PranayKumarJain | 6 days ago
A few things that helped us keep cost + complexity sane on similar voice-agent flows:
- Treat the call as a state machine (collect slots -> confirm -> execute). Don’t let the LLM free-run every turn; use small models for routing/slot-filling, escalate only on ambiguity.
- Put hard guardrails on “thinking”: max tokens/turn + short system prompts. It’s shocking how often cost is prompt bloat + retry loops.
- If you’re using Twilio, Media Streams + a streaming STT/TTS loop reduces latency and avoids “LLM per sentence” patterns.
- Phone-number discovery: try a tiered approach (cached business DB / Places API / fallback scrape) and cache aggressively; scraping every time is where it gets gnarly.
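To make the first bullet concrete, here is a rough sketch of the collect -> confirm -> execute loop with a cheap extractor first and escalation only when it is unsure. Everything named here (`Call`, `step`, `REQUIRED_SLOTS`, the `small`/`large` callables) is made up for illustration and is not our production code; the extractors stand in for whatever small and large models you actually call.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, Optional


class State(Enum):
    COLLECT = auto()
    CONFIRM = auto()
    EXECUTE = auto()


# Hypothetical slots for illustration; a real flow defines its own.
REQUIRED_SLOTS = ("name", "date")

# An extractor maps an utterance to slot values, or None when it is unsure.
Extractor = Callable[[str], Optional[Dict[str, str]]]


@dataclass
class Call:
    state: State = State.COLLECT
    slots: Dict[str, str] = field(default_factory=dict)
    escalations: int = 0  # how often the larger model was needed


def step(call: Call, utterance: str, small: Extractor, large: Extractor) -> State:
    """Advance the call by one caller turn and return the new state."""
    if call.state is State.COLLECT:
        found = small(utterance)
        if found is None:
            # Ambiguous for the small model: escalate this turn only.
            call.escalations += 1
            found = large(utterance) or {}
        call.slots.update(found)
        if all(name in call.slots for name in REQUIRED_SLOTS):
            call.state = State.CONFIRM
    elif call.state is State.CONFIRM:
        # No model call needed for a yes/no confirmation.
        if utterance.strip().lower() in {"yes", "yep", "correct"}:
            call.state = State.EXECUTE
        else:
            call.slots.clear()
            call.state = State.COLLECT
    return call.state
```

Once `step` returns `State.EXECUTE`, the caller of this function would run the actual action (booking, lookup, and so on). The point is that the model only ever fills slots; the transitions themselves are plain code, so a bad generation can't wander off-script.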
We build production voice agents at eboo.ai and have hit the same Twilio + latency + cost cliffs — happy to share patterns if you want to compare notes.
gitpullups|6 days ago