top | item 46641399

kamranjon | 1 month ago

That's a pretty crazy requirement for something to be "useful", especially something that runs so efficiently on CPU. Many content creators from non-English-speaking countries can benefit from this type of release by translating transcripts of their content to English and then running them through a model like this to dub their videos in a language that can reach many more people.

phoronixrly|1 month ago

You mean YouTubers? And have them (manually) synchronise the text to their video, especially when YouTube apparently offers voice-to-voice translation out of the box, to my and many others' annoyance?

littlestymaar|1 month ago

YouTube's voice-to-voice is absolutely horrible, though. Letting YouTubers clone their own voice would make it much, much more appealing.

ethin|1 month ago

Uh, no? This is not at all an absurd requirement? Screen readers literally do this all the time, with voices built the classic way of making a speech synthesizer, no AI required. eSpeak is an example, or MS OneCore. The NVDA screen reader has an option for automatic language switching, as does pretty much every other modern screen reader in existence. And absolutely none of these use AI models to do that switching, either.

kube-system|1 month ago

They didn’t say it was a crazy requirement. They said it was crazy to consider it useless without meeting that requirement.