item 42709453 (no title)
lc64 | 1 year ago "was trained on <100 hours of audio" How the hell was it trained on that little data?
bbminner|1 year ago I suppose it means per speaker. And it is based on a simplified StyleTTS 2, which from my small dive into the subject seems to be one of the smaller models achieving great quality.
Havoc|1 year ago Yeah, that surprised me as well - seems low vs. what is used on text LLMs. To be fair, 100 hours of speaking is a lot of speaking though.
edude03|1 year ago But it covers five(?) languages, so if all are equal it’s just 20 hours per language.
unknown|1 year ago
[deleted]