PieSquared | 6 years ago
Second, I don't see any reason why there shouldn't be an open-source Tacotron or WaveNet implementation that's as good as Google's. Implementing and training these models is expensive but not prohibitively so (nowadays, you could probably do it for $5,000 to $10,000, including experimentation costs).
That said, the quality of a text-to-speech system is only partially determined by the quality of these models. Much, if not most, of the work of building a high-quality TTS system goes into things like high-quality data collection systems, good data annotations, good normalization and NLP tailored to the system's domain, multi-language support, and optimized inference implementations for server or mobile platforms.
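To make the normalization point a bit more concrete, here's a toy sketch of the kind of text front end I mean. Everything in it is made up for illustration: the `ABBREVIATIONS` table, `number_to_words`, and `normalize` are hypothetical names, and it only handles integers below 1000. It isn't how Tacotron, WaveNet, or any production system does it; real front ends also have to deal with dates, currency, ordinals, and context-dependent readings.

```python
import re

# Hypothetical abbreviation table. A real system would tailor this to its
# domain: "st." could just as well mean "saint" as "street".
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return words if rest == 0 else f"{words} {number_to_words(rest)}"


def normalize(text: str) -> str:
    """Lowercase, expand known abbreviations, spell out small integers,
    and strip remaining punctuation."""
    out = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]{1,3}", token):
            out.append(number_to_words(int(token)))
        else:
            # Keep only letters, digits, and apostrophes.
            out.append(re.sub(r"[^a-z0-9']", "", token))
    return " ".join(t for t in out if t)


print(normalize("Dr. Smith lives at 221 Baker St."))
```

Even this tiny version shows why the work is domain-specific: whether "221" should be read as "two hundred twenty one" or "two two one", and whether "St." is a street or a saint, depends entirely on what the system is reading.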