top | item 31462185


holdenc137 | 3 years ago

Basically, it is real. Because the set of scripts that can be generated is known in advance, fragments of speech (e.g. 3-, 4- and 5-word phrases) were recorded, so the intonation comes for free.

It would be great to do it with an off-the-shelf TTS engine, but I don't think they're quite there yet. I know my recording skills and microphone technique are rubbish - but if I knew what I was doing on that front, I think you'd be really hard pushed to tell it was stitched-together phrases.
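
For anyone curious, the phrase-stitching idea above could be sketched roughly like this. It is only an illustration under assumptions, not the actual code behind the project: the function name `plan_clips`, the `clips` mapping from phrase to recording file name, and the greedy longest-match strategy are all made up for the example. It only works out which recordings to play in which order; the audio joining itself is left out.

```python
def plan_clips(script, clips, max_len=5):
    """Return the recordings that cover `script`, in playback order.

    `clips` is a hypothetical mapping of a lower-case phrase to the file
    holding its recording. At each position the longest recorded phrase
    (up to `max_len` words) is preferred, so more of the natural
    intonation inside a fragment is kept.
    """
    words = script.lower().split()
    plan = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in clips:
                plan.append(clips[phrase])
                i += n
                break
        else:
            # No recorded fragment starts at this word.
            raise KeyError(f"no recording covers: {words[i]!r}")
    return plan
```

Greedy matching is the simplest choice here, and it can fail on a script that a different split would cover. That matters less when the set of possible scripts is known up front, since every script can be checked against the recordings ahead of time.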


planetsprite|3 years ago

The potential is 100x more with vocal synthesis imo. No need to make programmatic mad-libs style formats. Complete freedom, even though the quality isn't optimal.

holdenc137|3 years ago

Totally agree. I think we're probably only a year or so off TTS that can put some proper intonation into a sentence - hopefully then it'll be indistinguishable from live speech.

I've tried to listen to books with today's TTS and it soon becomes really grating (to my ears at least). It only needs the tiniest slip every few sentences and you can't listen any more.