dylanbfox | 4 years ago

This is tricky. The de facto metric for evaluating an ASR model is Word Error Rate (WER), but results can vary widely depending on the pre-processing that's done (or not done) to the transcription text before WER is calculated.

For example, if you take the WER of "I live in New York" against "i live in new york", the result is 60%: three of the five words ("I", "New", "York") differ only in capitalization, but each one counts as a full substitution.
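
To make that concrete, here's a minimal sketch of WER as word-level edit distance divided by reference length. The function name is illustrative, not any particular library's API:

    def word_error_rate(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # Dynamic-programming edit distance over word sequences.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    print(word_error_rate("I live in New York", "i live in new york"))  # 0.6
    print(word_error_rate("I live in New York", "I live in New York"))  # 0.0

Since string comparison is exact, the three capitalization mismatches score the same as three completely wrong words.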

This is why public WER results vary so widely.

We publish our own WER results and normalize both the human and the automatic transcripts as much as possible, to get as close to "true" numbers as we can. But in reality, we see a lot of people comparing ASR services simply by diffing raw transcripts.
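
As a rough illustration of the kind of normalization we mean (reusing the word_error_rate sketch above; the exact rules, like number expansion or punctuation handling, vary by pipeline and this is an assumption, not our exact code):

    import re

    def normalize(text: str) -> str:
        # Lowercase, drop punctuation, and collapse whitespace so that
        # formatting differences don't count as word errors.
        text = text.lower()
        text = re.sub(r"[^\w\s']", "", text)  # keep apostrophes for contractions
        return " ".join(text.split())

    ref = normalize("I live in New York.")
    hyp = normalize("i live in new york")
    print(word_error_rate(ref, hyp))  # 0.0 after normalization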
