top | item 33892562


kamilafsar | 3 years ago

I wonder how strongly the consistency between a language's written and spoken forms correlates with the per-language breakdown here: https://github.com/openai/whisper/blob/main/language-breakdo...

For instance, I know Turkish is very consistent: its writing system was overhauled in 1928, a few years after the founding of the Republic of Turkey. Turkish ranks quite high in the breakdown. I don't think that's because there's loads of data available, but because of its consistency. By contrast, English has loads of data, which should compensate for its inconsistency.
