Show HN: Dograh – an OSS Vapi alternative to quickly build and test voice agents
16 points | a6kme | 2 months ago | github.com
I assumed the hard work was just wiring LiveKit/Pipecat + STT/TTS + an LLM. It wasn’t.
Even with solid OSS (Pipecat/LiveKit), we still had to do a lot of plumbing: variable extraction, tracing, testing, etc., and any workflow change required a redeploy.
We eventually realized we'd spent more time building infrastructure than building the actual agents. Everything felt custom. We hit every possible pain point with Pipecat- and Vapi-style systems.
So we built Dograh - a fully open-source voice agent framework that includes all the boring, painful pieces by default.
What’s different:
- Pipecat-based engine, but forked: custom event model and concurrency fixes
- One-click starter template, generated by an LLM agent, to get going quickly for any use case
- Drag-and-drop visual agent builder for quick iteration (the thing we wished existed earlier)
- Variable extraction layer (name/order/date/etc.) baked into the LLM loop
- Built-in telephony integration (Twilio/Vonage/Vobiz/Cloudonix)
- Multilingual support end-to-end
- Select any LLM, TTS, or STT provider (add their credits, if any)
- AI-to-AI call testing: automatically stress-test an agent before shipping (still a work in progress, so patchy for now)
- Fully Open Source
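To make the "variable extraction baked into the LLM loop" bullet concrete, here is a minimal sketch of how such a layer can work with LLM function calling: the agent declares a tool whose arguments are the variables to capture, and each tool call is merged into per-call session state. The names here (`EXTRACTION_TOOL`, `SessionState`) are illustrative, not Dograh's actual API.

```python
import json

# Hypothetical variable-extraction layer: the LLM is given a "tool" whose
# arguments are the variables we want pulled out of the conversation; each
# tool call it makes is merged into the call's session state.
EXTRACTION_TOOL = {
    "type": "function",
    "function": {
        "name": "extract_variables",
        "description": "Record variables mentioned by the caller.",
        "parameters": {
            "type": "object",
            "properties": {
                "caller_name": {"type": "string"},
                "order_id": {"type": "string"},
                "callback_date": {"type": "string", "description": "ISO 8601 date"},
            },
        },
    },
}

class SessionState:
    """Accumulates extracted variables across LLM turns."""
    def __init__(self):
        self.variables = {}

    def apply_tool_call(self, arguments_json):
        # Ignore unknown keys so a drifting model can't pollute state.
        allowed = EXTRACTION_TOOL["function"]["parameters"]["properties"]
        args = json.loads(arguments_json)
        for key, value in args.items():
            if key in allowed and value:
                self.variables[key] = value
        return self.variables

# Simulated tool calls, as the LLM would emit them mid-conversation:
state = SessionState()
state.apply_tool_call('{"caller_name": "Ada", "mood": "tired"}')  # "mood" dropped
state.apply_tool_call('{"order_id": "A-1042"}')
print(state.variables)  # {'caller_name': 'Ada', 'order_id': 'A-1042'}
```

The key design point is that extraction rides on the same LLM loop as the conversation itself, so no second pass over the transcript is needed.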
It's built and maintained by YC alumni / exit founders who got tired of rebuilding the same plumbing.
Why we open-sourced it: We kept feeling that the space was drifting toward closed SaaS abstractions (Vapi, Retell). Those are good for demos, but once you need data controls, privacy, or self-hosted/offline deployment, you end up stuck. We wanted a stack where you can see every part, fork it, self-host it, and patch it as needed.
Try it:
- Repo: https://github.com/dograh-hq/dograh
This spins up a basic multilingual agent with everything pre-wired.
Who this is for:
- Anyone looking to self-host a Vapi-like platform for data privacy and similar requirements.
- Anyone trying to build production-grade voice agents without reinventing audio plumbing.
- Anyone who has tried to glue STT→LLM→TTS together manually and knows the exact pain this is built for.
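For readers who haven't tried the manual glue: here is the STT→LLM→TTS loop stripped to its skeleton. All three stage functions are stand-ins for real provider SDKs, not Dograh code; the real pain comes from what this sketch omits (streaming, barge-in, retries, tracing).

```python
import asyncio

# Deliberately minimal skeleton of a single conversational turn.
# stt/llm/tts are fake stand-ins for real provider calls.

async def stt(audio_chunk):
    return audio_chunk.decode()          # pretend transcription

async def llm(history, user_text):
    history.append({"role": "user", "content": user_text})
    reply = f"You said: {user_text}"     # pretend completion
    history.append({"role": "assistant", "content": reply})
    return reply

async def tts(text):
    return text.encode()                 # pretend synthesis

async def handle_turn(audio_chunk, history):
    text = await stt(audio_chunk)
    reply = await llm(history, text)
    return await tts(reply)

history = []
out = asyncio.run(handle_turn(b"what is my order status", history))
print(out.decode())  # You said: what is my order status
```

Everything beyond this toy (interruption handling, partial transcripts, latency budgets per stage) is exactly the plumbing the post is talking about.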
Happy to answer technical questions, show the architecture, or hear how we can improve the product.
pritesh1908|2 months ago
We are happy to share some technical details for anyone interested. A lot of Dograh's internal work went into extending the pipeline with custom Frames and Processors, building a ReactFlow-based visual agent builder, and writing an engine that parses that agent JSON and drives conversational LLM loops with function calling. We also added easier access to extracted variables, call transcripts, and recordings: things that are needed in any production deployment.
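A rough sketch of what "an engine that parses the agent JSON" can look like: nodes carry prompts, edges carry conditions, and the engine picks the next node from a condition the LLM reports via function calling. The field names here loosely mirror ReactFlow's nodes/edges JSON; Dograh's real schema may differ.

```python
import json

# Hypothetical agent graph: nodes hold prompts, edges hold transition
# conditions. In the real engine the condition would come from an LLM
# function call, not a hard-coded string.
AGENT_JSON = json.loads("""
{
  "nodes": [
    {"id": "greet", "prompt": "Greet the caller and ask intent."},
    {"id": "order", "prompt": "Collect the order ID."},
    {"id": "bye",   "prompt": "Thank the caller and hang up."}
  ],
  "edges": [
    {"source": "greet", "target": "order", "condition": "order_intent"},
    {"source": "greet", "target": "bye",   "condition": "no_intent"},
    {"source": "order", "target": "bye",   "condition": "done"}
  ]
}
""")

def next_node(graph, current, condition):
    for edge in graph["edges"]:
        if edge["source"] == current and edge["condition"] == condition:
            return edge["target"]
    return None

def prompt_for(graph, node_id):
    return next(n["prompt"] for n in graph["nodes"] if n["id"] == node_id)

# One simulated traversal through the graph:
node = "greet"
path = [node]
for cond in ["order_intent", "done"]:
    node = next_node(AGENT_JSON, node, cond)
    path.append(node)
print(path)  # ['greet', 'order', 'bye']
```

Each visited node's prompt is what gets injected into the conversational LLM loop for that segment of the call.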
One thing we are still trying to understand better: how teams handle long-running conversations while keeping context tight and cheap. Would love to hear how others have approached that.
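One common pattern for the context question above is a rolling summary: keep the last N turns verbatim and fold older turns into a running summary produced by a cheap LLM call. A minimal sketch, with the `summarise()` stub standing in for that LLM call and the threshold chosen arbitrarily:

```python
# Rolling-summary context management: retain recent turns verbatim,
# compress everything older into a running summary string.
KEEP_LAST = 4  # verbatim turns to retain

def summarise(turns):
    # Stand-in for an LLM summarisation call.
    return " / ".join(t["content"] for t in turns)

def compact(history, summary):
    if len(history) <= KEEP_LAST:
        return history, summary
    old, recent = history[:-KEEP_LAST], history[-KEEP_LAST:]
    new_summary = (summary + " / " if summary else "") + summarise(old)
    return recent, new_summary

history = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
history, summary = compact(history, "")
print(len(history))  # 4
print(summary)       # turn 0 / turn 1
```

The trade-off is that extraction-relevant details can get lost in the summary, which is one reason to pull variables out eagerly during the conversation rather than re-reading the full transcript later.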
a6kme|2 months ago
But when we switched to OSS stacks (Pipecat, LiveKit), we realised that even with great OSS, the plumbing was still painful and necessary: no standard way to extract variables from conversations (name/date/order ID), no straightforward tracing of LLM calls, no way to run AI-to-AI test loops, and no fast workflow iteration, since every change meant another redeploy.
The infrastructure glue kept ballooning, and each time it felt like rebuilding the same system from scratch.
Dograh came out of that combination of cost pain and integration pain. Happy to dig deeper into anything.
eddywebs|2 months ago
1) It would be great to provide different voice personas like Vapi does; maybe it's there already, but I couldn't find the config. 2) My agent reported some lag in getting responses during the call; perhaps that's just a resource issue?
Either way, you're off to a great start, and I look forward to seeing this project grow. Starred the repo on GH; I think I was the 100th one :).
a6kme|2 months ago
1. A selector for different voice personas, like Vapi's, is in our pipeline. 2. The lag can be due either to system resource constraints or to inference lag from the LLM providers. We are constantly trying to squeeze out every millisecond to combat latency.
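Telling the two lag sources apart starts with per-stage timing. A generic sketch, with toy lambdas standing in for the real STT/LLM/TTS calls and the stage names purely illustrative:

```python
import time

# Per-stage latency instrumentation: wrap each pipeline stage in a timer
# so provider inference lag can be separated from local resource lag.
def stage(name, fn, timings, *args):
    start = time.perf_counter()
    result = fn(*args)
    timings[name] = (time.perf_counter() - start) * 1000  # milliseconds
    return result

timings = {}
text = stage("stt", lambda a: a.upper(), timings, "hello")
reply = stage("llm", lambda t: f"echo {t}", timings, text)
audio = stage("tts", lambda t: t.encode(), timings, reply)
print(sorted(timings))  # ['llm', 'stt', 'tts']
```

With these numbers per call, a consistently slow "llm" stage points at the inference provider, while uniform slowness across stages points at the host.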
Thank you again for your kind words.
Multicomp|2 months ago
I hope you find product market fit and are able to do what you desire with this product. In the meantime, I am grateful that you are helping us advance towards the Star Trek Voice Computer being defictionalized!
a6kme|2 months ago
Among many other useful and fun things, yes: the dream of having a Star Trek Voice Computer, or a benevolent HAL, is not very far away. :)