item 39376036

nshm|2 years ago

Metavoice is one of a dozen or so GPT-based TTS systems that have appeared since Tortoise, and honestly it's not that great. You can clearly hear "glass scratches" in its output; that's because it was trained on MP3-compressed data.

There are much clearer-sounding systems around. Listen to StyleTTS2 for comparison.

standardly|2 years ago

Is the crispness of compressed audio really the benchmark for TTS improvements? I feel like that's an aside: a valid point, but not much of a detractor.

nshm|2 years ago

Yes, it is one of the important aspects, particularly if you use TTS to create an audiobook or for video production.

qwertox|2 years ago

I had forgotten about StyleTTS2, which was discussed here on HN a couple of months ago. Maybe that's what made me feel that something was going on.

popalchemist|2 years ago

I've tested both. StyleTTS2 is impressive, especially its speed, but its prosody is lacking compared to Metavoice's.

ionwake|2 years ago

Is it possible to run Metavoice and other PyTorch systems on Apple silicon, e.g. the M1? I keep running into issues.