> brought back competitive open source audio transcription

Bear in mind that there are a lot of very strong _open_ STT models that Mistral's press release didn't bother to compare against, which gives the impression that Voxtral is the best new open thing since Whisper. Here is an open benchmark: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard . The strongest model Mistral compared to is Scribe, ranked 10th here.

This benchmark is for English, but many of those models are multilingual (e.g., https://huggingface.co/nvidia/canary-1b-flash ).

espadrine|7 months ago

The best model there is 2.5B parameters. I can believe that a model 10x bigger is somewhat better.

One point of comparison is OpenAI Whisper v3, which achieves 7.44 WER on the ASR leaderboard and shows up as ~8.3 WER on FLEURS in the Voxtral announcement[0]. If FLEURS scores run about +1 WER higher on average than the ASR leaderboard, that would imply Voxtral does have a lead on the leaderboard.

[0]: https://mistral.ai/news/voxtral
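
To make that arithmetic concrete, here is a rough sketch. It assumes the gap between FLEURS and the leaderboard is a constant offset for every model, which is a strong assumption and not something either benchmark guarantees. The function name and constants are mine, for illustration only; the two Whisper figures are the ones quoted above.

```python
# Whisper v3 figures quoted above; the FLEURS one is approximate ("~8.3").
WHISPER_V3_LEADERBOARD_WER = 7.44
WHISPER_V3_FLEURS_WER = 8.3


def estimate_leaderboard_wer(fleurs_wer: float) -> float:
    """Shift a FLEURS WER by the offset observed for Whisper v3.

    Assumption (hypothetical): every model loses the same number of
    WER points on FLEURS relative to the leaderboard as Whisper v3 does.
    """
    offset = WHISPER_V3_FLEURS_WER - WHISPER_V3_LEADERBOARD_WER  # about 0.86
    return fleurs_wer - offset


if __name__ == "__main__":
    # Sanity check: Whisper's own FLEURS score maps back to its leaderboard score.
    print(round(estimate_leaderboard_wer(WHISPER_V3_FLEURS_WER), 2))
```

Plugging in another model's FLEURS WER gives only a ballpark figure, since the offset is calibrated on a single model.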

nomad_horse|7 months ago

There are larger models in there, an 8B and a 6B. By this logic they should rank above the 2B model, yet we don't see this. That's why we have open standard benchmarks: to measure this directly, rather than hypothesizing from model sizes or doing cross-dataset arithmetic.

Also note that Voxtral's capacity is not necessarily all devoted to speech, since it "Retains the text understanding capabilities of its language model backbone".
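
For anyone unfamiliar with the metric being measured: WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. A minimal sketch, assuming plain whitespace tokenization (real leaderboards typically normalize casing and punctuation first, which this skips); the function name is mine:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, scores are only comparable when computed on the same test set with the same normalization, which is the argument for a shared benchmark.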

jiehong|7 months ago

I just can’t find dictation apps for Mac using those models except for open whisper.

IBM’s Granite models seem to be multilingual and well ranked, but I can’t find any app using them.

Anybody aware of a dictation app using one of those "better" models?

M4v3R|7 months ago

Have you tried https://spokenly.app/ ?

They do support Voxtral, among others.

irqlevel|7 months ago

Try https://whisperclip.com . It delivers low-latency, real-time voice-to-text streaming to any macOS app.