top | item 46774352

For real-time, we stream over WebRTC. The pipeline is streaming STT, then a low-latency LLM, then TTS, and the client drives the Live2D parameters. Lip sync is currently simple (phoneme / amplitude-based), and we're testing viseme extraction. Rhubarb is on our list, but we're cautious about the added latency.
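
To illustrate what amplitude-based lip sync can look like, here is a minimal sketch, not our actual client code. It assumes mono float samples in [-1.0, 1.0] arriving in short frames, and it produces a 0..1 mouth-open value of the kind you might feed to a Live2D mouth parameter such as `ParamMouthOpenY`. The class name, thresholds, and smoothing rates are all hypothetical values picked for the example.

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (samples in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class AmplitudeLipSync:
    """Maps per-frame loudness to a smoothed mouth-open value in [0, 1].

    All defaults are illustrative assumptions, not tuned values.
    """

    def __init__(
        self,
        noise_floor: float = 0.02,  # RMS at or below this keeps the mouth closed
        full_open: float = 0.3,     # RMS at or above this opens the mouth fully
        attack: float = 0.6,        # smoothing rate while the mouth is opening
        release: float = 0.25,      # smoothing rate while the mouth is closing
    ) -> None:
        if full_open <= noise_floor:
            raise ValueError("full_open must be greater than noise_floor")
        self.noise_floor = noise_floor
        self.full_open = full_open
        self.attack = attack
        self.release = release
        self.value = 0.0  # current smoothed mouth-open value

    def update(self, samples: Sequence[float]) -> float:
        """Consume one audio frame and return the new mouth-open value."""
        level = frame_rms(samples)
        span = self.full_open - self.noise_floor
        # Linear map from [noise_floor, full_open] to [0, 1], clamped.
        target = min(1.0, max(0.0, (level - self.noise_floor) / span))
        # Open quickly, close more slowly, so the mouth does not flicker.
        rate = self.attack if target > self.value else self.release
        self.value += (target - self.value) * rate
        return self.value
```

Calling `update()` once per audio frame and writing the result to the mouth parameter on each render tick is the whole loop. It adds no lookahead, which is why this approach is cheap on latency compared with anything that needs to analyze a longer window of audio first.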