Is the approach to do accented TTS (or just use reference recordings) and then apply a tone color conversion model that only changes the timbre? Because if I say a completely different sentence, it still says the original words, haha.
Hmmm. Initially impressive but upon retries and reflection ... not that great. It doesn't even maintain timing ... unless that's part of the transform.
Indeed, yeah, that’s one of the key weaknesses of the approach we’re using. It overrides the speaker’s cadence and accent while keeping their voice profile / timbre in place. Other techniques may avoid this, but they may also not transfer the accent to the resulting clip as effectively. So far we’re using this to support pedagogical (and lead-gen) use cases, where we think it works well enough.
lukeinator42|9 months ago
PaulDavisThe1st|10 months ago
ilyausorov|10 months ago