whodunser | 9 years ago | on: Announcing AudioSet: A Dataset for Audio Event Research
whodunser's comments
whodunser | 9 years ago | on: Introducing Keras 2.0
So if you are relying on the docs to edit old code, you may become a teensy bit frustrated!
whodunser | 9 years ago | on: Baidu Deep Voice Explained: Part 1: The Inference Pipeline
From my perspective, Baidu's approach is a little embarrassing: it uses many separate modeling stages in both the training and production of TTS. When the rest of the community is moving toward end-to-end training, this many stages sounds excruciating. Merlin[0], which was a pretty good standard for 2016, has the same painful feel, with two DL stages (duration, acoustic) followed by some conditioning and then a synthesis step.
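(For illustration, a Merlin-style multi-stage pipeline hands data between separately trained models; the sketch below uses hypothetical stub stages, not Merlin's actual API, just to show the handoffs that make such systems painful to train and tune as a whole.)

```python
# Hypothetical sketch of a multi-stage TTS pipeline, Merlin-style.
# Each stage below would be a separately trained model in a real system;
# here they are trivial stubs that only illustrate the data handoffs.

def text_to_phonemes(text):
    return list(text.lower())                 # stand-in for the linguistic front-end

def duration_model(phonemes):
    return [1] * len(phonemes)                # DL stage 1: per-phoneme durations

def acoustic_model(phonemes, durations):
    # DL stage 2: acoustic features conditioned on durations
    return [(p, d) for p, d in zip(phonemes, durations)]

def vocoder(features):
    return len(features)                      # synthesis step: frames -> waveform (length)

def synthesize(text):
    phonemes = text_to_phonemes(text)
    durations = duration_model(phonemes)
    features = acoustic_model(phonemes, durations)
    return vocoder(features)                  # errors in each stage compound downstream

print(synthesize("hello"))
```

An end-to-end system would collapse all of these into a single model trained on (text, audio) pairs, which is why the multi-stage approach feels dated.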
The more important technical contribution seems to be the hand-tuned synthesis code that makes their generation faster; cool but not particularly sexy (and there are few details). The details on training hyperparams are nice to have too, of course.
Contrary to the post, I would be very surprised if the voice sample included in the post was actually generated by Deep Voice -- it has none of the robotic qualities the researchers themselves point out in their blog post[1]. More likely it is a demonstration of the loss in their last, WaveNet-like stage. This was also pointed out in the previous HN discussion[2].
Lastly, Andrew Ng is neither thanked in the paper nor mentioned on any webpage -- are we sure this was work he supervised?
[0] https://github.com/CSTR-Edinburgh/merlin
[1] http://research.baidu.com/deep-voice-production-quality-text...
whodunser | 9 years ago | on: Transcribing the Phyllis Diller Gag File
Compare to hampanda.com (from Deepgram, YC W16)
whodunser | 9 years ago | on: Deep Voice: Real-Time Neural Text-To-Speech
whodunser | 9 years ago | on: Launch HN: FloydHub (YC W17) - Heroku for Deep Learning
The Jupyter jobs look neat, but I assume they are billed for continuous wall-clock time? It would be cool if they were somehow billed only for actual compute time, but I understand that would be difficult.
Are these instances guaranteed to be in a given region, in case I wanted to route more complex debug output / intermediate files to S3?
whodunser | 9 years ago | on: Trump silences government scientists with gag orders
[0] http://www.seattletimes.com/seattle-news/politics/trump-admi...
whodunser | 9 years ago | on: How to Get into Natural Language Processing
http://smmry.com (demo here)
https://np.reddit.com/r/autotldr/comments/31bfht/theory_auto...
whodunser | 9 years ago | on: The jobs that really smart people avoid
Throw in uncertainty, and the inability to judge the quality of work that far in the future, and maybe that justifies the low pay of many basic research jobs? It would be interesting to see an analysis of this across fields, countries, years, etc.
Looks like the... tag names? and example URLs have been released, but the videos and sound are under their respective licenses -- i.e., mostly the standard YouTube license.
This is neat. Can an ML model developed on this dataset be used for commercial purposes? I guess, at minimum, the paper and tag list are provided as help for those corporations that wish to build or use a private dataset for similar purposes?
[0] https://github.com/audioset/ontology