nshm | 2 years ago

Good improvements for many languages; the numbers are here:

https://github.com/openai/whisper/blob/main/language-breakdo...

tekacs|2 years ago

From the WER numbers alone it looks like a very small difference for English itself, but I've found WER to be a misleading assessment mechanism.

Having extensively tested Whisper v2 large against other 'lower WER' models and found them wanting (because of differences in their methodology for generating output), I'm super curious to get a feel for how v3 holistically behaves.

Will probably test it right now. :)
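For anyone unfamiliar with the metric: WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. Below is a minimal sketch of why it can mislead when models format their output differently. The `normalize` helper and the example sentences are my own assumptions for illustration, not the normalization or data used in the Whisper evaluation.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation and symbols."""
    return re.sub(r"[^\w\s']", "", text.lower())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words to match an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(
                prev[j] + 1,         # deletion (ref word missing from hyp)
                cur[j - 1] + 1,      # insertion (extra word in hyp)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


reference = "it costs twenty dollars"
hypothesis = "It costs $20."  # arguably a fine transcript, just formatted differently

print(wer(reference, hypothesis))                        # 0.75
print(wer(normalize(reference), normalize(hypothesis)))  # 0.5
```

The same audio and the same "understanding" give a noticeably different score depending only on casing, punctuation, and how numbers are written, which is one way two models' WER figures can fail to reflect how their output actually reads.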

Void_|2 years ago

I don't understand how the Czech Republic, a country with a population of 10M, is among the best.

And I can confirm: my app Whisper Memos (https://whispermemos.com) is very popular in the Czech Republic.

It makes perfect sense. Whisper is almost as good at transcribing Czech as English!

Toutouxc|2 years ago

Czech pronunciation is extremely regular and straightforward (sounds close to Latin or even Italian) with no weird "which vowel was that" or "half the word is silent" features and just a few exceptions. Usually if you write a letter, you pronounce the sound, and if you hear a sound, you write the letter.

A great example is that — for most words from any language that uses a subset of the Czech alphabet — a Czech speaker can just pronounce the word instead of spelling it and another Czech speaker will be able to write it down.

e.g. "messerschmitt", "nešamas", "cadeira", "philosophy", "tastaturi", "nicchia", "kaupunki", "abordagem", "povjerilac", "primauté" are all foreign words with very unambiguous pronunciation in Czech.

GaggiX|2 years ago

I don't know Czech, but Italian is extremely consistent in the way it's written, so it's at the top of the list with about one or two orders of magnitude less data.

godelski|2 years ago

I'm more impressed by Korean! I didn't even realize it was that good in V2. But I've seen a lot of systems perform really poorly (judged by my Korean gf, not me), and Korea is only a country of 52M (between Spain and Italy).

A funny note: if Siri is set to Korean mode and reads out incoming texts written in English, they sound like a racist imitation of a Korean accent. It is absolutely hilarious.

vitorgrs|2 years ago

I also find it funny that Portuguese is better than English too (Brazilian talking here). I guess it's probably the nature of the languages, phonetics or so...

Whisper V2 already works amazingly in PT-BR. I can't even imagine it being better, and it turns out V3 promises to be better...

mesmertech|2 years ago

Wow, a fellow Slovak indie developer, kinda rare to see.

ComputerGuru|2 years ago

It looks like it's basically whisper-2 with extra training against datasets for specific languages that brought incidental improvements to the rest. Support for some of the languages is still really bad (from real-world experience).

WXLCKNO|2 years ago

Curious as to how Dutch has the lowest error rate

crucialfelix|2 years ago

They enunciate.