Speech-to-video synthesis: Real-time rendering of speech
1 point | cwbuilds | 2 years ago
After some research and no luck finding anyone who seems to be working on this, I thought I'd try a Hail Mary and post here.
I'm looking to speak to anyone who is working on speech-to-video (real-time speech rendering). Software already exists that can take audio (speech) input and render a video of a person or avatar speaking, but the rendering takes a long time.
How long will it be before video of a person or avatar speaking can be rendered in near real time, with latency similar to that of existing speech-to-text models?
What would a prototype that reduces the latency look like? Is anyone working on anything like this?
For context, I run a language learning app where you can practice speaking with an AI. It would be far more engaging if the user had an avatar or person to speak to, rather than staring at the chat history whilst talking to the AI conversation partner.
Thanks, Chris
For context, here's the original post: https://news.ycombinator.com/item?id=36973400
billconan|2 years ago
https://www.heygen.com/article/unleashing-the-power-of-realt...
https://docs.trypromptly.com/guides/realtime-avatar-with-rag
cwbuilds|2 years ago