Deepgram, a startup applying machine learning to audio data, is releasing its machine transcription platform this morning for free. No more will you have to pay for other services like Trint to get the dirty work of automated transcription done. Why give it away? Hint: it has something to do with data.
Machine transcription isn't solved. In fact, machine anything isn't solved. And it seems like everyone these days is making haste to build their own Fort Knox of data to solve machine everything. Deepgram's approach is to make its transcription service free for anyone to upload their audio content and receive searchable text in return.
This approach isn't unique -- as I said, everyone needs data. Don't forget that image CAPTCHAs are basically a means of forcing plebeians to label image data sets for training machine learning models.
Deepgram is using deep learning for its transcription tool (surprise!) -- good old convolutional and recurrent neural networks. Everything is generalized in the free version, but paid offerings might include custom training on company and product names as well as terms of art in a given industry.
I uploaded an hour-long interview I did about a week ago to the service to test it out. The file was recorded in a noisy restaurant and consisted of two people having a dialog. The transcription quality was far from perfect -- but it wasn't meaningfully worse than anything else on the market.
I searched for a specific quote I remembered and, after three attempts, found the segment of dialog. I wouldn't be able to copy and paste it without angering the interviewee, but it would have given me the context I needed to tell my story. The search process took about five minutes and, to Deepgram's credit, it was obvious that searches were using the sounds of words to find more matches. The thing to remember is that the service costs considerably less than more accurate human transcription and will improve with time.
"ASR is not solved," Scott Stephenson, co-founder and CEO of Deepgram, explained to me in an interview. "It's solved for specific data sets but with noisy accented call data, any service will do a poor job with it."
In addition to the platform, Deepgram is offering a mostly free API for machine transcription. If you use over a million minutes, you will be charged -- computation is expensive, so it wouldn't make sense to allow someone to troll the company with a 50 terabyte audio file.
While humans still reign supreme in the transcription world, it's possible that synthesized audio could tilt the odds in favor of the machines in the near future. Projects like WaveNet and Lyrebird, which generate speech from text, could help augment systems with data for the uncommon words most likely to trip up machine transcription systems like Deepgram's and those made by the tech giants.