(no title)
setzer22 | 5 years ago
I'd feel way better about it if they went with a slightly worse DeepSpeech-based implementation but kept it working in the free-software spirit they've been known for all these years.
Also, on desktop devices DeepSpeech inference is cheap enough that they could even go the extra mile and work on some Wasm magic to get offline recognition.
That's the kind of work I'd expect from Mozilla! Not wiring your data collection up to the Google Cloud APIs and calling it a day! I'm genuinely disappointed in them...
posguy | 5 years ago
For comparison, Baidu had 5,000 hours of English to train its versions of DeepSpeech and DeepSpeech2 on, and so had better results years ago. Google, Microsoft, IBM, and other companies have users providing more audio samples daily, enabling much better-quality speech-to-text.
Mozilla's Common Voice project currently has only 1,492 hours of validated English: https://commonvoice.mozilla.org/en/datasets