Your average mobile processor doesn't have anywhere near enough processing power to run a state-of-the-art text-to-speech network in real time. Most text-to-speech on mobile hardware is streamed from the cloud.
I had a lot of success using FastSpeech2 + MB MelGAN via TensorFlowTTS: https://github.com/TensorSpeech/TensorFlowTTS. There are demos for iOS and Android which will allow you to run pretty convincing, modern TTS models with only a few hundred milliseconds of processing latency.
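A common way to check whether a TTS setup really keeps up "in real time" is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced. An RTF below 1.0 means audio is generated faster than it plays back. Below is a minimal sketch, assuming a hypothetical `synthesize(text)` callable that returns raw audio samples. It is not part of TensorFlowTTS's API; you would wrap whatever model pipeline you are actually running.

```python
import time
from typing import Callable, Sequence, Tuple


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical TTS wrapper
    text: str,
    sample_rate: int,
) -> Tuple[float, float]:
    """Return (latency_seconds, rtf) for one synthesis call."""
    start = time.perf_counter()
    samples = synthesize(text)
    latency = time.perf_counter() - start
    # Duration of the generated audio in seconds.
    duration = len(samples) / sample_rate
    if duration <= 0:
        raise ValueError("synthesizer returned no audio")
    # RTF < 1.0 means faster than real time.
    return latency, latency / duration


# Usage with a dummy synthesizer that "generates" one second of silence at 22.05 kHz.
latency, rtf = real_time_factor(lambda t: [0.0] * 22050, "hello", 22050)
print(f"latency={latency * 1000:.1f} ms, RTF={rtf:.3f}")
```

Note that the latency figure and the RTF answer different questions. Latency is how long you wait before hearing anything, and RTF is whether synthesis can keep pace with playback over a long utterance.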
Not only is state-of-the-art TTS much more demanding (and much, much higher quality) than Dr. Sbaitso[0], but so are the not-quite-so-good TTS engines built into both Android and iOS.
That said, having only skimmed the paper, I didn’t notice a discussion of the compute requirements for inference (only for training), but it does say it's a 28.7-million-parameter model, so I reckon this could run in real time on a phone.
[0] judging by the videos of Dr. Sbaitso on YouTube, it was only one step up from the intro to Impossible Mission on the Commodore 64.
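For a rough sense of scale, the 28.7 million parameter figure can be turned into a weight-storage estimate. This is a back-of-envelope sketch only. It counts the weights alone, ignores activations and runtime overhead, and says nothing about compute throughput. The fp16 and int8 rows assume the model could be converted to those precisions without unacceptable quality loss, which is an assumption, not something the paper is claimed to show.

```python
# Parameter count quoted from the paper; everything else is illustrative.
PARAMS = 28_700_000


def weight_megabytes(n_params: int, bytes_per_param: float) -> float:
    """Storage needed for the weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return n_params * bytes_per_param / 1e6


for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: {weight_megabytes(PARAMS, nbytes):.1f} MB")
# fp32: 114.8 MB
# fp16: 57.4 MB
# int8: 28.7 MB
```

Even at full fp32 precision that fits comfortably in a modern phone's memory, which is consistent with the guess above. Whether it runs in real time still depends on the per-step compute, which these numbers don't capture.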
Arbortheus|3 years ago
kevin_thibedeau|3 years ago
ben_w|3 years ago
ccbccccbbcccbb|3 years ago
SemanticStrengh|3 years ago
snek_case|3 years ago