Speech-to-video synthesis: Real-time rendering of speech

1 point | cwbuilds | 2 years ago

Hi guys,

After some research and no luck finding anyone who seems to be working on this, I thought I'd try a Hail Mary and post here.

I'm looking to speak to anyone who is working on speech-to-video (real-time speech rendering). Software already exists that can take audio (speech) input and render a video resembling a person or avatar speaking, but rendering takes a long time.

How long will it be before video of the person/avatar speaking can be rendered in near real time, with latency similar to that of existing speech-to-text models?

What would a prototype that reduces the latency look like? Is anyone working on anything like this?

For context, I run a language learning app where you can practice speaking with an AI. It would be far more engaging if the user had an avatar/person to speak to, rather than staring at the chat history whilst talking to the AI conversation partner.

Thanks, Chris

For context, here's the original post: https://news.ycombinator.com/item?id=36973400

2 comments

billconan|2 years ago

This?

https://www.heygen.com/article/unleashing-the-power-of-realt...

https://docs.trypromptly.com/guides/realtime-avatar-with-rag

cwbuilds|2 years ago

Wow. Yes, thank you so much! I knew of HeyGen, but had no idea they'd done this with real-time avatars.