Here’s a Czech radio interview with transcriptions produced by several different models:
Kozik a deda.mp3
Quick evaluation:
Note: These models will all do a much better job on English, although Scribe did as well on Czech as it would on English. Rev and Nova 2 have newer versions that don’t support Czech yet.
- Scribe is the clear winner by a significant margin. It produced an accurate transcription at the level of a professional transcriber. Somebody reading the transcript would get an accurate idea of what was said. It transcribes the speaker turns accurately, with two exceptions. Near the end, it assigned new labels to speakers it had already identified. This happens after the radio show plays a snippet from a radio play with multiple characters. There’s also one place where it misses a turn change and assigns both turns to one speaker.
- DeepGram Nova 2 could not detect speakers well and made many errors that make the transcript almost unusable without frequent reference to the original recording.
- Whisper is better than Nova but makes notable errors in individual words, names, place names and occasionally elsewhere. Mostly the errors keep the meaning, but sometimes they can confuse the reader. It transcribes numbers as digits even when words would be appropriate. In one case, it transcribed the Czech word ‘ten’ (this/that) as 10.
- **Rev.ai** is somewhere between Whisper and Nova. It has significant errors but gets the speakers mostly right.
- Gemini 2.0 Pro is not a speech transcription model, but it came second after Scribe. It made noticeable errors (especially in a few names and place names), but most of them would not have confused the reader about what was being said. Curiously, the biggest problem it had was transcribing numbers as digits where words would have been more appropriate. It also mixes up a few speaker turns.
- Word has a transcription feature, but it’s not clear what model it uses. I was decently impressed with the quality: about at the level of Whisper or Gemini. It really struggles with place names and unusual proper nouns.
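The comparison above is based on reading the transcripts, not on a computed score. For anyone who wants to put a rough number on differences like these, here is a minimal sketch of word error rate (WER), the usual metric for transcription accuracy. It is not something used for this evaluation. The function name `word_error_rate` and the Czech sample sentence are made up for illustration, and the sketch assumes you have a hand-corrected reference transcript to compare against. It ignores speaker labels and does not strip punctuation, so treat the result as approximate.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; lowercases and splits on
    whitespace only, so punctuation attached to a word counts as part of it.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: 'ten' transcribed as the digit 10 is one substitution
# out of four reference words.
print(word_error_rate("ten pes je velký", "10 pes je velký"))
```

A score like this counts the ‘ten’ to 10 mistake the same as any other single-word substitution, so it would not by itself capture how confusing an error is for the reader or whether the speakers were labelled correctly.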
ElevenLabs Scribe
DeepGram Nova 2
Whisper 2 (via DeepGram)
Gemini 2.0 Pro Experimental 02-05
Microsoft Word Transcription