item 42709453

lc64 | 1 year ago

"was trained on <100 hours of audio"

How the hell was it trained on so little data?

bbminner | 1 year ago

I suppose it means per speaker. And it's based on a simplified StyleTTS 2, which, from my brief dive into the subject, seems to be one of the smaller models that achieves great quality.

Havoc | 1 year ago

Yeah, that surprised me as well; it seems low compared to what's used for text LLMs. To be fair, though, 100 hours of speaking is a lot of speaking.

edude03 | 1 year ago

But it covers five(?) languages, so if they're split evenly, that's just 20 hours per language.
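Quick back-of-envelope on how much speech that actually is. This is a rough sketch only: the even split across 5 languages and the ~150 words per minute speaking rate are my assumptions, not figures from the model's release.

```python
# Back-of-envelope: how much speech is "<100 hours", split across languages?
# ASSUMPTIONS (not from the release): an even split across 5 languages and a
# conversational speaking rate of ~150 words per minute.
TOTAL_HOURS = 100
NUM_LANGUAGES = 5
WORDS_PER_MINUTE = 150


def hours_per_language(total_hours: float, num_languages: int) -> float:
    """Even split of the training audio across languages."""
    return total_hours / num_languages


def approx_words(hours: float, wpm: float = WORDS_PER_MINUTE) -> float:
    """Rough word count for a given amount of continuous speech."""
    return hours * 60 * wpm


if __name__ == "__main__":
    per_lang = hours_per_language(TOTAL_HOURS, NUM_LANGUAGES)
    print(f"{per_lang:.0f} hours per language")
    print(f"~{approx_words(TOTAL_HOURS):,.0f} words total")
    print(f"~{approx_words(per_lang):,.0f} words per language")
```

Under those assumptions that's roughly 900,000 spoken words in total, or about 180,000 per language: a lot of talking, but tiny next to the text corpora used for LLMs.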