From the WER numbers alone, the difference for English itself looks very small, but I've found WER to be a misleading assessment metric.
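To illustrate why WER can mislead when models differ in how they format their output, here is a minimal sketch. It assumes a plain word-level Levenshtein WER, (substitutions + deletions + insertions) / reference length, and a hypothetical `normalize` helper that only lowercases and strips punctuation. Real evaluations apply much more thorough normalization. The two sentences are made up for illustration and do not come from any benchmark.

```python
import re


def normalize(text: str) -> list[str]:
    """Hypothetical, deliberately simple normalizer: lowercase, replace
    punctuation (except apostrophes) with spaces, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # \w is Unicode-aware, so "č", "ř" survive
    return text.split()


def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word error rate via word-level edit distance."""
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[n][m] / n if n else 0.0


ref = "Hello, world. It's 5 o'clock."
hyp = "hello world it's five o'clock"

# Raw whitespace split: every token differs by case or punctuation.
print(wer(ref.split(), hyp.split()))        # 1.0
# After normalization only "5" vs "five" remains.
print(wer(normalize(ref), normalize(hyp)))  # 0.2
```

The same transcript scores anywhere from 100% to 20% WER depending purely on formatting conventions, which is why comparing "lower WER" models trained with different output styles can be misleading.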
Czech pronunciation is extremely regular and straightforward (it sounds close to Latin or even Italian), with no weird "which vowel was that?" or "half the word is silent" features and only a few exceptions. Usually, if a letter is written, you pronounce its sound, and if you hear a sound, you write its letter.
I don't know Czech, but Italian is extremely consistent in the way it's written, so it's at the top of the list despite having about one or two orders of magnitude less data.
I'm more impressed by Korean! I didn't even realize it was that good in V2. I've seen a lot of systems perform really poorly on it (as judged by my Korean girlfriend, not me), and Korea is a country of only 52M people (between Spain and Italy).
I also find it funny that Portuguese also scores better than English (Brazilian here). I guess it's probably the nature of the language, its phonetics...
It looks like it's basically Whisper v2 with extra training on datasets for specific languages, which brought incidental improvements to the rest. Support for some languages is still really bad (from real-world experience).
tekacs|2 years ago
Having extensively tested Whisper v2 large against other 'lower WER' models and found them wanting (because of differences in their methodology for generating output), I'm super curious to get a feel for how v3 holistically behaves.
Will probably test it right now. :)
Void_|2 years ago
And I can confirm: my app Whisper Memos (https://whispermemos.com) is very popular in the Czech Republic.
It makes perfect sense. Whisper is almost as good at transcribing Czech as English!
Toutouxc|2 years ago
A great example: for most words from any language that uses a subset of the Czech alphabet, a Czech speaker can just pronounce the word instead of spelling it, and another Czech speaker will be able to write it down.
e.g. "messerschmitt", "nešamas", "cadeira", "philosophy", "tastaturi", "nicchia", "kaupunki", "abordagem", "povjerilac", "primauté" are all foreign words whose pronunciation in Czech is completely unambiguous.
GaggiX|2 years ago
godelski|2 years ago
A funny note: if Siri is set to Korean and reads out your texts that arrive in English, they sound like a racist imitation of a Korean accent. It is absolutely hilarious.
vitorgrs|2 years ago
It works amazingly well for PT-BR in Whisper V2. I can't even imagine it being better, and yet V3 promises to be better...
mesmertech|2 years ago
ComputerGuru|2 years ago
WXLCKNO|2 years ago
crucialfelix|2 years ago