wczekalski | 2 years ago

One thing I've seen done for style cloning is a high-quality, fine-tuned TTS -> RVC pipeline, where RVC is used to "enhance" the TTS output: TTS handles intonation and pronunciation, RVC handles voice texture. With StyleTTS and this pipeline you should get close to ElevenLabs.

eigenvalue | 2 years ago

I suspect they are doing many more things to make it sound better. I certainly hope open source solutions can approach that level of quality, but so far I've been very disappointed.