xutan's comments

xutan | 2 years ago | on: NaturalSpeech 2: Zero-shot speech and singing synthesizers

Thanks for your interest in NaturalSpeech and NaturalSpeech 2!

NaturalSpeech focuses on synthesizing human-level, high-quality speech by training on a single-speaker, recording-studio dataset.

NaturalSpeech 2 trains on 44K hours of multi-speaker, in-the-wild data with more than 5K speakers, and focuses on synthesizing any speaker's voice in a zero-shot way given only a short speech prompt. When the speech prompt contains background noise, NaturalSpeech 2 will mimic that noise as well. If you want a clean voice, just provide a clean speech prompt.

There is more discussion on Reddit as well: https://www.reddit.com/r/singularity/comments/12rubq4/latent...
