HackedBunny|5 years ago
I just ran a speech-to-text converter on a very clear clip of former Doctor Who actor Tom Baker talking in an interview.
The DeepSpeech converter uses the very latest AI deep-learning advancements to 'listen' to the audio and output the spoken words as text.
After 3 long minutes of running it on a 30-second clip, it printed out its interpretation:
"hooloomooloo how booboorowie i have a honeymoon"
magicalhippo|5 years ago
I found that the language model they supplied was trained on data that did not contain the words I needed, and I got significantly improved results by building my own language model with the KenLM[1] tools.
[1]: https://kheafield.com/code/kenlm/
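Building a custom language model with KenLM is a two-step process; a sketch assuming you have compiled the KenLM binaries and have a plain-text corpus (one sentence per line) containing your target vocabulary:

```shell
# Estimate a 5-gram language model from the domain-specific corpus
bin/lmplz -o 5 < corpus.txt > lm.arpa

# Convert the ARPA file to KenLM's binary format for fast loading
bin/build_binary lm.arpa lm.binary
```

Recent DeepSpeech releases then package the binary LM plus a vocabulary into a "scorer" file with a `generate_scorer_package` tool shipped alongside the native client; the exact flags vary by release, so check the version you installed.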
fxtentacle|5 years ago
Did you maybe not convert your WAV to the correct sampling rate?
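This is a plausible failure mode: DeepSpeech's released models expect 16 kHz, 16-bit mono PCM, and feeding in audio at another rate produces exactly this kind of garbage output. A minimal stdlib-only sketch for checking a file's rate and naively downsampling (the linear-interpolation resampler is illustrative only; a proper resampler should low-pass filter before decimating):

```python
import wave

def wav_rate(path):
    """Return the sample rate of a WAV file."""
    with wave.open(path, 'rb') as w:
        return w.getframerate()

def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (illustrative only)."""
    if src_rate == dst_rate:
        return list(samples)
    ratio = src_rate / dst_rate
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(int(samples[j] + (nxt - samples[j]) * frac))
    return out

# Example: 10 ms of 44.1 kHz audio becomes 160 samples at 16 kHz
if wav_rate.__doc__:  # trivial guard so the sketch runs standalone
    clip = list(range(441))
    resampled = resample_linear(clip, 44100, 16000)
```

In practice a tool like `sox clip.wav -r 16000 -c 1 -b 16 out.wav` is the easier fix, but the check above makes the mismatch visible before blaming the model.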
bmn__|5 years ago
Besides the hypothesis that DS sucks, the software could also very well be just fine, and you may have made methodological errors.