The cadence of the synthesized voices is still noticeably artificial, even for the short demo phrases. This is not to say this isn't impressive. But how much does this method improve when it isn't constrained by a 5-second sample? If we feed it several hours of public speeches from Martin Luther King Jr., or hell, tens of hours of audio from President Obama or Trump – will it have the same artificial cadence, even if the tone and pitch of the imitated voice are accurate?
Dockson|6 years ago
ry4nolson|6 years ago
Only requires 5 seconds of voice audio to synthesize believable speech.
danso|6 years ago