nomad_horse|7 months ago
Bear in mind that there are a lot of very strong _open_ STT models that Mistral's press release didn't bother to compare against, which gives the impression that its models are the best new open thing since Whisper. Here is an open benchmark: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard . The strongest model Mistral compared against is Scribe, which ranks 10th there.
This benchmark is English-only, but many of those models are multilingual (e.g. https://huggingface.co/nvidia/canary-1b-flash ).
espadrine|7 months ago
One point of comparison is OpenAI's Whisper v3, which scores 7.44 WER on the Open ASR leaderboard and shows up at ~8.3 WER on FLEURS in the Voxtral announcement[0]. If FLEURS WER runs about 1 point higher on average than the leaderboard's, then subtracting that offset from Voxtral's FLEURS numbers would imply that Voxtral does lead on the ASR leaderboard as well.
[0]: https://mistral.ai/news/voxtral
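The estimate above relies on two things: WER (word error rate) and a constant FLEURS-to-leaderboard offset. Here is a minimal Python sketch, assuming WER is the usual word-level edit distance divided by the number of reference words (reported as a percentage), and assuming the Whisper v3 gap (~8.3 vs 7.44) carries over unchanged to other models. That second assumption is the comment's hypothesis, not an established fact. The function names are hypothetical, and real leaderboards typically normalize text (casing, punctuation) before scoring, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: (substitutions + deletions + insertions) / reference words,
    computed as a word-level Levenshtein distance (no text normalization)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def estimate_leaderboard_wer(fleurs_wer: float,
                             anchor_leaderboard: float = 7.44,
                             anchor_fleurs: float = 8.3) -> float:
    """Shift a FLEURS WER by the gap observed for Whisper v3.
    Assumes (hypothetically) that the gap is the same for every model."""
    offset = anchor_fleurs - anchor_leaderboard  # ~0.86 WER points
    return fleurs_wer - offset


# One substituted word out of six -> ~16.67% WER
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

To use it for the comparison, pass Voxtral's FLEURS WER from the announcement into `estimate_leaderboard_wer` and compare the result with the leaderboard column; the output is only as good as the constant-offset assumption.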
nomad_horse|7 months ago
Also note that Voxtral's capacity is not necessarily all devoted to speech, since it "Retains the text understanding capabilities of its language model backbone".
jiehong|7 months ago
IBM’s Granite models seem multilingual and well ranked, but I can’t find any app using them.
Is anybody aware of a dictation app using one of those "better" models?
M4v3R|7 months ago
They do support Voxtral, among others.
irqlevel|7 months ago