Great work on open-sourcing the orchestrator. Full-duplex and barge-in are definitely the hardest parts to nail: getting the audio buffers cleared and the LLM stream killed in under 500 ms makes or breaks the "human" feel.

Curious how you're handling VAD in noisy environments. Do you find the RMS-based approach holds up well for telephony, or are you considering a more robust model-based VAD (like Silero) for the future?

We're tackling similar low-latency orchestration challenges at eboo.ai. It's great to see more Go-based tools in this space. Subscribed to the repo!

dani-lokutor|13 days ago

Barge-in is a total nightmare. Clearing those buffers fast enough to kill the 'ghost audio' without the LLM stuttering is exactly what we’re fighting right now.

You're spot on about VAD, too. RMS is our 'MVP debt': it's fine for clean mics, but we're definitely looking at a Silero bridge for telephony and other noisy environments.

Also, we actually built this because we run Lokutor (ultra-low latency TTS). If you guys at eboo.ai are hunting for faster inference, hit me up. I'd love to get you a key to play with.