WER is slightly misleading, but Whisper Large v3 WER is classically around 10%, I think, and 12% with Turbo.
The thing that makes it particularly misleading is that models that do transcription to lowercase and then use inverse text normalization to restore structure and grammar end up making a very different class of mistakes than Whisper, which goes directly to final form text including punctuation and quotes and tone.
But nonetheless, they're claiming such a lower error rate than Whisper that it's almost not in the same bucket.
On the topic of things being misleading, GPT-4o transcriber is a very _different_ transcriber to Whisper. I would say not better or worse, despite characterizations such. So it is a little difficult to compare on just the numbers.
There's a reason that quite a lot of good transcribers still use V2, not V3.
tekacs|25 days ago
The thing that makes it particularly misleading is that models that do transcription to lowercase and then use inverse text normalization to restore structure and grammar end up making a very different class of mistakes than Whisper, which goes directly to final form text including punctuation and quotes and tone.
But nonetheless, they're claiming such a lower error rate than Whisper that it's almost not in the same bucket.
tekacs|25 days ago
There's a reason that quite a lot of good transcribers still use V2, not V3.
GaggiX|25 days ago
mdrzn|25 days ago
For Whisper API online (with v3 large) I've found "$0.00125 per compute second" which is the cheapest absolute I've ever found.
emmettm|25 days ago