Nice work — real-time voice plumbing always looks “simple” until you build it.

A few things that helped us keep cost + complexity sane on similar voice-agent flows:

- Treat the call as a state machine (collect slots -> confirm -> execute). Don’t let the LLM free-run every turn; use small models for routing/slot-filling, escalate only on ambiguity.
- Put hard guardrails on “thinking”: max tokens/turn + short system prompts. It’s shocking how often cost is prompt bloat + retry loops.
- If you’re using Twilio, Media Streams + a streaming STT/TTS loop reduces latency and avoids “LLM per sentence” patterns.
- Phone-number discovery: try a tiered approach (cached business DB / Places API / fallback scrape) and cache aggressively; scraping every time is where it gets gnarly.
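To make the first point concrete, here is a rough sketch of the collect -> confirm -> execute flow as an explicit state machine. Everything in it is made up for illustration: the slot names (`name`, `date`), the `cheap_slot_filler` stand-in for a small model, and the `book` action are assumptions, not anything from a real stack.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    COLLECT = auto()
    CONFIRM = auto()
    DONE = auto()


# Hypothetical slots for a booking-style call.
REQUIRED_SLOTS = ("name", "date")


@dataclass
class Call:
    state: State = State.COLLECT
    slots: dict = field(default_factory=dict)
    executed: bool = False


def cheap_slot_filler(utterance: str) -> dict:
    """Stand-in for a small routing/slot-filling model.

    It only parses 'key=value' tokens so the sketch stays runnable.
    """
    found = {}
    for token in utterance.split():
        key, sep, value = token.partition("=")
        if sep and key in REQUIRED_SLOTS and value:
            found[key] = value
    return found


def book(call: Call) -> None:
    """Hypothetical side effect (the 'execute' step)."""
    call.executed = True


def step(call: Call, utterance: str) -> str:
    """Advance the call by one caller turn and return the agent's reply."""
    if call.state is State.COLLECT:
        call.slots.update(cheap_slot_filler(utterance))
        missing = [s for s in REQUIRED_SLOTS if s not in call.slots]
        if missing:
            return f"What is your {missing[0]}?"
        call.state = State.CONFIRM
        return f"Book for {call.slots['name']} on {call.slots['date']}? (yes/no)"

    if call.state is State.CONFIRM:
        answer = utterance.strip().lower()
        if answer == "yes":
            book(call)
            call.state = State.DONE
            return "Booked."
        if answer == "no":
            call.slots.clear()
            call.state = State.COLLECT
            return f"Okay, starting over. What is your {REQUIRED_SLOTS[0]}?"
        # Ambiguous answer: this is the one place a larger model
        # would be called. Here we just re-ask.
        return "Sorry, was that a yes or a no?"

    return "This call is already complete."
```

Feeding it `"name=Ana"`, then `"date=2024-05-01"`, then `"yes"` walks a `Call()` through all three states. The point is that the transitions live in plain code, so the expensive model only gets called at the one branch marked as ambiguous.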

We build production voice agents at eboo.ai and have hit the same Twilio + latency + cost cliffs — happy to share patterns if you want to compare notes.

gitpullups|6 days ago

is this an ad? written by an LLM?