Ask HN: Why don't we use subtitled films/tv to train speech recognition?
Speech recognition as a technology has always appeared to move slowly, although with the rise of mobile devices it is becoming increasingly popular.
Is anyone doing anything like this?
tcarnell | 14 years ago
Over time you could build up a database of voice prints and grammars not just for celebrities and politicians, but also for criminals (for automatic identification).
I had this idea almost 4 years ago and submitted it to the company, but it wasn't taken seriously.
If anybody is interested in this, let me know!
molloye | 14 years ago
I suspect a lack of data is not the biggest challenge in improving speech recognition
amirmc | 14 years ago
The search aspect of this is very interesting and I hadn't thought of it before (though in hindsight it seems like an obvious benefit).
eftpotrm | 14 years ago
If it were me... Project Gutenberg has free books available in both audio and text formats. You may well run into the same issue of the spoken and written text not exactly matching (it's not something I've looked into), but I wouldn't be surprised if the mismatch were rather smaller than what I've observed in subtitles, and the data concerned is in a more easily parsed format.
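You could sanity-check that guess by diffing a narration transcript against the book text. A rough sketch, with invented two-line samples and difflib word matching as a crude stand-in for proper forced alignment:

```python
import difflib

def mismatch_ratio(book_text: str, transcript: str) -> float:
    """Rough fraction of words that differ between the written text
    and a (hypothetical) transcript of the narration."""
    a, b = book_text.lower().split(), transcript.lower().split()
    sm = difflib.SequenceMatcher(a=a, b=b)
    matched = sum(block.size for block in sm.get_matching_blocks())
    return 1.0 - matched / max(len(a), len(b))

book = "it was the best of times it was the worst of times"
spoken = "it was the best of times it was the worst of times indeed"
print(round(mismatch_ratio(book, spoken), 2))  # 0.08: one extra spoken word
```

A low ratio across a whole audiobook would suggest the pairing is clean enough to use for training.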
killa_bee | 14 years ago
Some related papers:

Moore, R. K. 'There's no data like more data (but when will enough be enough?)', Proc. Inst. of Acoustics Workshop on Innovation in Speech Processing, IoA Proceedings vol. 23, pt. 3, pp. 19-26, Stratford-upon-Avon, 2-3 April 2001.

Charles Yang, 'Who's afraid of George Kingsley Zipf?', Ms., University of Pennsylvania. http://www.ling.upenn.edu/~ycharles/papers/zipfnew.pdf
fraser | 14 years ago
1. The audio track is censored but the subtitles are not, or vice versa.
2. Actors improvise the audio, while the subtitles are based on the script.
3. English translations were done by the cheapest person possible, so there are lots of partial words where the audio wasn't clear and the transcriber didn't understand the context.
4. A recent show (2011) seemed to have a garbage symbol every other character; I'm not sure if this is a double-byte character issue or just a bad translation.
5. Several shows, such as American Idol and America's Got Talent, display song lyrics, and I would think singing would require changes to the algorithm.
I wish you well with the idea, but now you have a little more information.
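Several of these failure modes are detectable mechanically, so you could filter out the worst cues before training. A sketch with purely illustrative patterns (not a vetted rule set):

```python
import re

# Heuristic filter for subtitle cues, aimed at the failure modes above:
# censor symbols, bracketed sound effects, mojibake, and song lyrics.
BAD_PATTERNS = [
    re.compile(r"[*#]{2,}"),      # censored words like "f***"
    re.compile(r"\[[^\]]*\]"),    # sound-effect captions: "[applause]"
    re.compile("\ufffd"),         # replacement chars from bad encodings
    re.compile("^\\s*\u266a"),    # leading music note marking lyrics
]

def usable(cue: str) -> bool:
    """True if none of the known-bad patterns appear in the cue."""
    return not any(p.search(cue) for p in BAD_PATTERNS)

cues = ["What are you doing?", "f*** off", "[applause]", "\u266a sweet dreams"]
print([c for c in cues if usable(c)])  # only the first cue survives
```

This won't catch improvised dialogue diverging from the script; that probably needs an acoustic confidence check rather than a text filter.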
sycren | 14 years ago
It may also be possible to automate the entire process, as we have both the audio and the words spoken at a particular time.
To take it a step further, we have millions of recorded songs with lyrics that can also be used. It's a gold mine of information that can be repurposed.
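The automation idea is mostly parsing: standard subtitle files (SRT) already carry timestamps, so pairing each line of text with its audio span is straightforward. A minimal sketch, assuming well-formed input:

```python
import re

# Minimal SRT parser: turns a subtitle file into (start_sec, end_sec, text)
# triples that could index clips in the matching audio track. The time line
# uses the standard SRT format "HH:MM:SS,mmm --> HH:MM:SS,mmm".
TIME = re.compile(r"(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)")

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(srt: str):
    pairs = []
    for block in srt.strip().split("\n\n"):
        lines = block.splitlines()
        m = TIME.match(lines[1])  # line 0 is the cue index
        if m:
            start = to_seconds(*m.groups()[:4])
            end = to_seconds(*m.groups()[4:])
            pairs.append((start, end, " ".join(lines[2:])))
    return pairs

sample = """1
00:00:01,000 --> 00:00:02,500
Hello there.

2
00:00:03,000 --> 00:00:04,000
General Kenobi."""
print(parse_srt(sample))
```

Each triple could then be used to cut the audio track into short utterances with known transcripts, which is roughly the shape of data a recognizer is trained on.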
0x12 | 14 years ago
The problem would be the disproportionate weight given to words like 'I', 'love', 'you', and 'baby'. Songs are probably not the best training data when it comes to building a well-rounded vocabulary.
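The skew is easy to see with a simple word count. A toy illustration on invented lyric lines (not real data):

```python
from collections import Counter

# A made-up "lyrics corpus": a handful of words carry most of the mass,
# so a language model trained on it would badly overweight them.
lyrics = "i love you baby i love you i need you baby oh baby".split()
counts = Counter(lyrics)
top = counts.most_common(3)
share = sum(c for _, c in top) / len(lyrics)
print(top, round(share, 2))  # three words cover ~69% of the tokens
```

Balanced speech corpora are deliberately designed to avoid exactly this kind of concentration.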
SandB0x | 14 years ago
http://www.comp.leeds.ac.uk/me/Publications/cvpr09_bsl.pdf
adsahay | 14 years ago
The big problem with using these sources is the huge vocabulary. Speech recognition works better for small vocabularies than for large ones.
fbnt | 14 years ago
Training a speech recognition engine is quite a sophisticated process, and it usually requires at least a clean (non-noisy) set of samples, which you can't find in dubbed movies and certainly not in music.