5x Efficiency on a $0 budget
A story about AI, productivity and rapid tool development.
Premise
During my time at university, we dealt with several categories of “knowledge” work. The most memorable instance of this was during the last semester of the program, where we had to transcribe long-form interviews. The course centered around qualitative research methods.
“Because of the GDPR regulations you are not allowed to use online transcription services. Expect manual transcription to take four times the length of the interview.”
To the surprise of no one, the room was filled with murmurs and whispers. Five interviews, an hour apiece, multiplied by four. That’s 20 hours of work simply because we needed the interviews in text. The somewhat dyslexic classmate, taking the course in their third(!) language, looked over their shoulder towards me and the rest of our group with justified concern. For quite obvious reasons they would not be able to keep pace.
I was already going over our options in my head. There were three main ones.
- Grind: Doing it the assumed/instructed way, meaning we either had to throw a team member under the bus or increase our workload to cover for them. Far from ideal.
- Gamble: We could, of course, quietly disregard the GDPR. This would put us at risk of getting caught and not passing the course, which was not acceptable a couple of months before graduation.
- Innovate: Find a way to, at least partially, automate the process. Without breaking any laws. Within two weeks.
Deep down I knew where this was heading. I was pretty damn sure this could be automated locally. And I wasn’t about to let some low-value busywork trip us up just a few months from getting our degrees.
With only partially justified confidence, I told them that we would figure something out.
Getting to Work
I was not too engaged in the process of planning the interviews. I was, however, knee-deep in researching the state of the art in transcription tech.
A few weeks prior to this course, I had taken a course in applied deep learning. I had an absolute blast during that course, managing to get a solid grasp on which types of problems are actually suitable for AI/ML. Because of this knowledge, I was not surprised to end up looking at open source models for Automatic Speech Recognition. (Shout-out to Hugging Face)
(Very) Rapid Development
The most promising concept was to cram OpenAI’s Whisper onto a local machine and glue the whole thing together with some Python scripts to fit our use case. Tick. Tock.
Meanwhile, we were booking interviews, doing all kinds of planning and time was moving fast.
Early experiments were messy and difficult to properly fit into our workflow, at least without dramatically slowing it down. With no other good alternatives in sight, continued iteration was the only viable option.
Heading into the second and final week before our deadline, the pressure to get this new tool working was rising. I was not excited by the prospect of reverting to manual brute force transcription.
We were now starting to have actual recordings to process, and it was getting close. Monday was spent doing test runs for the recognition and text formatting. The tool became more streamlined as the day progressed and by Tuesday morning we had a locally running command line tool that turned an audio file into a text file.
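For the curious, here is a minimal sketch of what such a tool can look like. It is not Crokket's actual code: the function names, the timestamped output format and the default "small" model are my assumptions for illustration. The only Whisper calls used are `whisper.load_model` and `model.transcribe` from the open source `openai-whisper` package, which runs entirely on the local machine.

```python
import argparse
import sys
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Render a position in the recording as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Turn Whisper-style segments into one timestamped line each.

    Each segment is a dict with at least "start" (seconds) and "text".
    The timestamps make it easy to jump back into the audio when correcting.
    """
    lines = []
    for segment in segments:
        stamp = format_timestamp(segment["start"])
        lines.append(f"[{stamp}] {segment['text'].strip()}")
    return "\n".join(lines) + "\n"


def transcribe_file(audio_path: Path, model_name: str = "small") -> str:
    """Transcribe one audio file locally and return the formatted text."""
    # Imported lazily so the formatting helpers work without the model installed.
    # Assumes: pip install openai-whisper, plus ffmpeg available on the PATH.
    import whisper

    model = whisper.load_model(model_name)  # "small" is an assumed default
    result = model.transcribe(str(audio_path))
    return format_segments(result["segments"])


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Audio file in, text file out.")
    parser.add_argument("audio", type=Path, help="path to the recording")
    parser.add_argument("--model", default="small", help="Whisper model name")
    args = parser.parse_args(argv)

    # Write the transcript next to the recording, e.g. interview.mp3 -> interview.txt
    out_path = args.audio.with_suffix(".txt")
    out_path.write_text(transcribe_file(args.audio, args.model), encoding="utf-8")
    print(f"Wrote {out_path}")


# Only start the CLI when a file was actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved under a hypothetical name like `transcribe.py`, it would be run as `python transcribe.py interview.mp3`. Nothing leaves the machine, which was the whole point given the GDPR constraint.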
The Fastest Team in Town
We made use of my old gaming PC with an RTX 3070 graphics card to achieve a raw speed of approximately 9x real time. The actual speed, however, includes reading and making corrections. On average, we spent 45 minutes for every hour of material. HUGE gains when compared to the expected 4 hours.
We offered to help the other teams get the tool running for their interviews as well, but only one of them took us up on the offer.
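The arithmetic behind those numbers, as a quick back-of-the-envelope script. All figures come from this story; the variable and function names are just for illustration.

```python
def hours_needed(recorded_hours: float, work_per_recorded_hour: float) -> float:
    """Total working hours for a given amount of recorded material."""
    return recorded_hours * work_per_recorded_hour


RECORDED_HOURS = 5 * 1.0  # five interviews, an hour apiece

manual_hours = hours_needed(RECORDED_HOURS, 4.0)        # the expected 4x rule of thumb
assisted_hours = hours_needed(RECORDED_HOURS, 45 / 60)  # 45 minutes per recorded hour
speedup = manual_hours / assisted_hours

# Raw machine time at roughly 9x real time, before any human corrections.
machine_minutes_per_recorded_hour = 60 / 9

print(f"Manual:   {manual_hours:.2f} h")
print(f"Assisted: {assisted_hours:.2f} h")
print(f"Speedup:  {speedup:.1f}x")
print(f"Machine time per recorded hour: {machine_minutes_per_recorded_hour:.1f} min")
```

So the machine itself needed under seven minutes per hour of audio. The rest of the 45 minutes was us reading and correcting, and the end-to-end gain lands a little above 5x.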
Finally, it was over. A solid 5x speed boost on our most time-consuming task and some bonus free time.
Was it worth the effort? Definitely.
The transcription tool saw continued use during ✨thesis season✨, helping some of the students in transcribing their interview material. The way we approached tooling for our thesis is a story for another time, but to say that we had a significant advantage is no exaggeration.
Open Sourcing
The tool was given the name Crokket, referencing one of my favorite quotes of all time, as commonly reiterated by an old friend:
“som en krocketklubba i skenbenet”
which roughly translates to “like a croquet mallet to the shin”.
Crokket (sondelll/crokket) was open source from the start, under an MIT license. Sharing is caring, right?
These days it’s sparsely maintained and may no longer work. I can happily say that it got the job done, and that is, after all, what it was built for. Mission accomplished.
The Status Quo and You
Most people are undoubtedly wasting both time and resources. A large portion of the problem lies in how we value the act of doing work. As counterintuitive as it may sound, people should not by default be rewarded for doing a lot of work. People should be rewarded for getting things done while doing less work, with all the accompanying implications.
You can’t buy a competitive edge through a piece of software, and you haven’t been able to in a long time. Now and for the foreseeable future, good tools are built by teams that understand their problem domain.