top | item 32931087

no1youknowz | 3 years ago

This is awesome. But I really want the other way.

To be able to give it text and hear the speech. A TTS (text to speech).

As a language learner, the ability to create my own sentences (based on existing ones I have, changing a word here or there) would be amazing.

How long till we have this, I wonder. I know I could use a service to do this currently, but I'd prefer having something running locally.

Hopefully someone on the OpenAI team reads this. :)

TaylorAlexander|3 years ago

I suspect this is coming. We do have decent text-to-speech systems already, but in the vein of “we used neural networks and now it’s very, very good,” you can imagine extending something like GPT-3 with this speech-to-text system so you could speak to it for input. A natural progression is that it then uses text to speech to return the output, so you end up with a voice-oriented conversational system.

So I think TTS is a logical part of the system. I also think that there are peculiarities of voice interaction that aren’t captured in text training datasets, so they would need to do some fine tuning on actual voice conversation to make it feel natural.

All in due time I suppose.

visarga|3 years ago

A full NLP system would include speech recognition, TTS, a large language model, and a vector search engine. The LM should be multi modal, multi language and multi task, "multi-multi-model" for short haha. I'm wondering when we'll have this stack as default on all OSes. We want to be able to search, transcribe, generate speech, run NLP tasks on the language model and integrate with external APIs by intent detection.

On the search part there are lots of vector search companies: Weaviate, Deepset Haystack, Milvus, Pinecone, Vespa, Vald, GSI and Qdrant. But vector search has not become generally deployed on most systems; people are just finding out about this new kind of search. Large language models are still difficult to run locally, and all these models require plenty of RAM and GPU. So the entry barrier is still high.
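For anyone unfamiliar with what "vector search" means here, a minimal sketch of the core idea follows: documents and queries are represented as vectors, and search returns the documents whose vectors are closest to the query. The tiny hand-written vectors, the document names, and the `search` helper are all made up for illustration. Real engines like the ones listed above typically use learned embeddings and approximate nearest-neighbor indexes rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Brute-force search: score every document, return the top_k ids by similarity."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical toy index: in practice these vectors would come from an embedding model.
index = {
    "tts_notes": [0.9, 0.1, 0.0],
    "asr_notes": [0.1, 0.9, 0.0],
    "cooking": [0.0, 0.1, 0.9],
}

query = [0.8, 0.2, 0.0]  # a query vector pointing mostly in the "tts" direction
results = search(query, index)
print(results)
```

The brute-force scan is fine for a handful of documents but scales linearly with the index size, which is one reason dedicated engines exist.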

freedomben|3 years ago

Likewise, TTS is what I really want. My goal is to be able to create audio books from text. I've been using Amazon Polly and it's acceptable quality, but I would be ecstatic to be able to do it locally on my own hardware.

visarga|3 years ago

Check out NaturalReader. It has hundreds of amazing voices, a system for highlighting text as it is being read, works on books (pdf) and webpages, and is available on phones and in browsers on all platforms. So I could have the same voice on Mac, Linux and iPhone.