top | item 38337625

One thing I've seen done for style cloning is a high-quality, fine-tuned TTS -> RVC pipeline to "enhance" the output: TTS handles intonation and pronunciation, RVC handles voice texture. With StyleTTS and this pipeline you should get close to ElevenLabs.

eigenvalue|2 years ago

I suspect they are doing many more things to make it sound better. I certainly hope open source solutions can approach that level of quality, but so far I've been very disappointed.

KolmogorovComp|2 years ago

RVC? R… Voice Model?

a2128|2 years ago

Retrieval-based Voice Conversion - https://github.com/RVC-Project/Retrieval-based-Voice-Convers...

stavros|2 years ago

Retrieval-based voice conversion, apparently.