Metavoice is one of a dozen or so GPT-based TTS systems that have appeared since Tortoise, and honestly it's not that great. You can clearly hear "glass scratches" in its output, which is because they trained on MP3-compressed data.
There are much clearer-sounding systems around. You can listen to StyleTTS2 to compare.
Is the crispness of compressed audio really the benchmark for TTS improvements? I feel like that's an aside. A valid point, but not much of a detraction.
I had forgotten about StyleTTS2, which was discussed here on HN a couple of months ago. Maybe that's what made me feel that there's something going on.
standardly|2 years ago
nshm|2 years ago
qwertox|2 years ago
popalchemist|2 years ago
ionwake|2 years ago