top | item 38113299

(no title)

ccoreilly | 2 years ago

ElevenLabs and Gemelo.AI are services that both support text input streaming for exactly this use-case. I am not aware of any open-source Incremental TTS (this is the term used in research afaik) model, but you can already achieve something similar by buffering tokens and sending them to the TTS model on punctuation characters.
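To make the buffering idea concrete, here is a minimal sketch in Python. The function name `chunk_on_punctuation` and the `BOUNDARY` character set are my own choices for illustration, not part of any TTS service's API. It only checks the end of the buffer, so punctuation in the middle of a single token is not split, and it will also flush on things like "3." in "3.14".

```python
from typing import Iterable, Iterator

# Assumption: these characters are good enough as flush points.
# A real setup would likely need language-specific rules.
BOUNDARY = ".!?;:"


def chunk_on_punctuation(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and yield text each time the buffer ends in punctuation."""
    buffer = ""
    for token in tokens:
        buffer += token
        stripped = buffer.rstrip()
        # Flush once the accumulated text ends with a boundary character.
        if stripped and stripped[-1] in BOUNDARY:
            yield stripped.lstrip()
            buffer = ""
    # Flush whatever is left when the token stream ends.
    leftover = buffer.strip()
    if leftover:
        yield leftover


if __name__ == "__main__":
    stream = ["Hello", ",", " world", ".", " How", " are", " you", "?", " Fine"]
    for chunk in chunk_on_punctuation(stream):
        # In practice each chunk would be sent to whatever TTS model you use.
        print(chunk)
```

Each yielded chunk would then go to the TTS model as one request, so audio for the first sentence can start while later tokens are still arriving.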


selfhoster11|2 years ago

ElevenLabs only has streaming output available. I've had a look at both recently and ElevenLabs doesn't have streaming input listed as a feature. Would be cool if it had it, though. You could probably approximate this on a sentence level, but you would need to do some normalisation to make the speech sound even.
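One possible reading of the normalisation step, assuming it means evening out loudness between separately synthesised sentences, is to scale each audio chunk to a common RMS level. The sketch below is only that: the names `rms` and `normalise_chunk` and the default `target_rms` are hypothetical, samples are assumed to be floats in [-1.0, 1.0], and it does nothing about differences in pitch or prosody between chunks.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a chunk of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalise_chunk(
    samples: Sequence[float],
    target_rms: float = 0.1,  # assumed target level, tune by ear
    peak_limit: float = 1.0,
) -> List[float]:
    """Scale a chunk towards target_rms without letting any sample exceed peak_limit."""
    level = rms(samples)
    if level == 0.0:
        # Silent or empty chunk: nothing to scale.
        return list(samples)
    gain = target_rms / level
    peak = max(abs(s) for s in samples)
    # Cap the gain so the loudest sample stays within the peak limit.
    gain = min(gain, peak_limit / peak)
    return [s * gain for s in samples]
```

Running every sentence-level chunk through `normalise_chunk` before playback should keep one sentence from sounding noticeably louder than the next.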