Launch HN: Vocera (YC F24) – Testing and Observability for Voice AI
71 points | Sid45 | 1 year ago
We were working on voice agents in healthcare, and kept running into the same problem: manual testing was incredibly time-consuming and error-prone. Testing voice AI in a comprehensive way was far more difficult than we had anticipated – not just the setup, but the ongoing monitoring of production calls. Despite our best efforts, some calls still failed once we went live.
The main challenges we faced were: (1) Demonstrating reliability to customers for production was really tough; (2) Manual testing was incomplete and didn't cover edge cases; (3) We couldn’t easily simulate all possible conversations, especially with diverse customer personas; (4) Monitoring every production call manually was a huge time sink.
We built Vocera to solve these problems. Vocera automatically simulates real personas, generates a wide range of testing scenarios from your prompts/call scripts and monitors all production calls. The result? You can be sure your voice agents are reliable, and you get real-time insights into how they’re performing.
Our platform tests how your AI responds to diverse personas, evaluates the conversation against different metrics and gives you directed feedback on the issues.
What’s different about us is that we don’t just automate the evaluation: we generate scenarios and metrics automatically, so developers don’t have to spend time defining their own scenarios or eval metrics. This saves them a ton of time. Of course, we give them the option to define these manually as well. We also provide detailed analytics on the agent's performance across simulations, so developers don’t need to listen to every call recording manually.
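A minimal sketch of what persona-driven scenario generation and metric evaluation could look like. All names here are hypothetical illustrations, not Vocera's actual API:

```python
import itertools

# Hypothetical sketch: cross personas with intents drawn from a call script
# to enumerate test scenarios, then score a simulated transcript against
# simple keyword-based metrics (a stand-in for LLM-graded evals).

PERSONAS = ["impatient caller", "elderly patient", "non-native speaker"]
INTENTS = ["book appointment", "refill prescription", "billing question"]

def generate_scenarios(personas, intents):
    """One scenario per (persona, intent) pair."""
    return [{"persona": p, "intent": i} for p, i in itertools.product(personas, intents)]

def evaluate_transcript(transcript, metrics):
    """Apply every metric check to a transcript and collect pass/fail results."""
    return {name: check(transcript) for name, check in metrics.items()}

metrics = {
    "greeted_caller": lambda t: t.lower().startswith("hello"),
    "confirmed_intent": lambda t: "confirm" in t.lower(),
}

scenarios = generate_scenarios(PERSONAS, INTENTS)
print(len(scenarios))  # 9 scenarios from 3 personas x 3 intents
report = evaluate_transcript("Hello! Let me confirm your appointment.", metrics)
print(report)  # {'greeted_caller': True, 'confirmed_intent': True}
```

In practice the scenario list and the metric checks would themselves be generated by a model from the agent's prompt or call script; the fixed lists above just show the shape of the loop.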
If you’re building voice agents and want to ensure they’re reliable and production-ready, or if you’re just interested in the challenges of Voice AI, we’d love to chat.
We’d love to get your feedback, thoughts, or experiences related to testing voice agents!
Areibman|1 year ago
Given the size of the niche (developers building voice agents), do you find there's a lot of demand for testing and observability? From my anecdata, many of the voice AI agent builders are using SDKs and builder tools (Voiceflow, Vapi, Bland, Vocode, etc). Observability is usually already a baked-in pattern with these SDKs (testing I'm not so sure about).
One conversation I had with a voice agent builder: "Our product is complex enough where external testing tools don't make sense. And we know when things are not working because we have close relationships with power users and companies." Whose problem are you solving?
Your tool looks very powerful, but might the broader opportunity be just to use your evals to roll out the best voice agents yourself?
Sid45|1 year ago
What they love about our platform is having both testing and observability in one place. Observability helps identify issues while testing allows them to simulate and prevent those problems before they escalate. This dual approach is especially helpful for teams dealing with voice-specific challenges, industry-specific nuances, or company-specific edge cases.
Our tool is particularly valuable for teams stuck with manual testing—it saves time iterating on the bot and ensures the edge cases are taken care of.
ghodoussikian|1 year ago
One trend I’ve noticed is there’s a really heavy focus on pre-deploy tests, which makes a lot of sense. But one big gap is the lack of ability to surface the scenarios that you don’t even know are occurring after deployment. These are the scenarios that human agents are great at handling and AI agents often fall flat on; in a sales context, that can have a direct impact on a customer's bottom line.
I think rather than attempting to deploy a perfect agent, having a mechanism to surface issues would lend much more peace of mind when launching AI voice agents. Would be happy to chat more if additional context/real-world examples would be helpful. Congratulations again on the launch!
Background: have worked on contact center unstructured analytics for several years.
Sid45|1 year ago
Agree with everything you said. That is why we have our observability platform, which lets your live calls be monitored. The idea is to use the observability platform to generate real-life simulations, so that as you make fixes, you can test them in a simulation environment.
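The loop described above—promoting a flagged production call into a replayable regression scenario—might be sketched like this (field names are invented for illustration):

```python
# Hypothetical sketch: a production call flagged by observability becomes a
# reusable test scenario, replayed in simulation after each fix.

def call_to_scenario(call):
    """Promote a flagged production call into a regression-test scenario."""
    return {
        "persona": call.get("caller_profile", "unknown"),
        "opening_utterance": call["transcript"][0],
        "expected": "no_failure",
        "source": call["call_id"],  # trace back to the original call
    }

failed_call = {
    "call_id": "c-1042",
    "caller_profile": "frustrated patient",
    "transcript": ["I already told you my date of birth twice!"],
}

scenario = call_to_scenario(failed_call)
print(scenario["source"])  # c-1042
```

The point of the design is that every production failure permanently grows the pre-deploy test suite, so the same mistake can't silently regress later.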
BrandiATMuhkuh|1 year ago
I've been working with auto-generated content for the past 8 years (both algorithmic and LLM-based). One of the biggest challenges is detecting and preventing regressions after "improving" the prompt, algorithm, or model.
Just yesterday, I deployed a voice agent (OpenAI + Twilio), and it's clear that there are countless scenarios where the agent might not perform as expected. For example, if you ask the agent to speak in German, but your tool-call names or returned data are in English, the agent might suddenly switch to speaking English.
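The failure mode described above lends itself to a simple regression check. This is a rough sketch with a keyword heuristic standing in for a real language-ID model; all names are hypothetical:

```python
# Flag replies that appear to violate the caller's requested language.
# A real pipeline would use a language-identification model; this small
# German-stopword heuristic is just a stand-in for illustration.

GERMAN_HINTS = {"der", "die", "das", "und", "ist", "nicht", "ich", "ihr", "termin"}

def looks_german(reply: str) -> bool:
    words = set(reply.lower().replace(".", " ").replace(",", " ").split())
    return bool(words & GERMAN_HINTS)

def check_language_regression(requested_lang: str, replies: list) -> list:
    """Return replies that break the requested language (German-only sketch)."""
    if requested_lang != "de":
        return []
    return [r for r in replies if not looks_german(r)]

bad = check_language_regression("de", [
    "Ihr Termin ist am Montag.",       # stays in German: OK
    "Your appointment is on Monday.",  # English leaked from a tool result: flagged
])
print(bad)  # ['Your appointment is on Monday.']
```

Running a check like this over every simulated and production transcript is exactly the kind of tedious-but-mechanical verification that's painful to do by listening to recordings.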
Overall, I believe voice agents will primarily be used and developed by SMEs, but they often lack the time or expertise to account for all these edge cases.
Btw: here is the agent's number; sorry, it's in German: +43732 350011
tabarnacle|1 year ago
Sid45|1 year ago
Glad you liked the voice quality.
shreyapathak|1 year ago
Great to see the focus on robust and exhaustive evaluations. With large-scale usage of products, everything that can go wrong usually does, so such evals will go a long way!
How do you intend to grow the product?
Sid45|1 year ago
ishantarunesh|1 year ago
Sid45|1 year ago
topicseed|1 year ago
AkashKaStudio|1 year ago
suyashb613|1 year ago
shyam_manchhani|1 year ago
Sid45|1 year ago
savy91|1 year ago
Sid45|1 year ago
Having said that, we have a role play which you can try. In the role play, you talk with our AI, and then we evaluate your performance.
filipeisho|1 year ago
Sid45|1 year ago
Yes, we will be hiring founding engineers pretty soon. Please reach out to founders@vocera.ai if you are interested.
nextworddev|1 year ago
Aurornis|1 year ago
Having some experience with the healthcare industry, seeing the name Vocera here is incredibly confusing.
Vocera is a very common communication platform and set of devices used in hospitals: https://vocera.stryker.com/s/product-hub/vocera-smartbadge These things are everywhere in healthcare already. If someone came to me and suggested using “Vocera” for a healthcare related tech thing, my mind would assume it’s the Stryker product. It’s that common.
So unfortunately I’d recommend a name change as a high priority. Dealing with healthcare tech is difficult enough, but using the same name as a very popular and established healthcare tech product is going to be an unnecessary obstacle in getting traction. Not to mention that Stryker’s Vocera division will have some things to say about this.
Sid45|1 year ago
We will discuss and evaluate the name internally to avoid confusion or potential issues. Thanks a lot for your feedback!
technics256|1 year ago
AlphaWeaver|1 year ago
Beyond just accidental confusion, I'd be worried about real legal issues with trademark infringement. Trademarks primarily exist to prevent customers from being confused about which business they're interacting with, and this is a great example of the types of things they're trying to prevent. (I am not a lawyer, so take this with a grain of salt.)
rahulgoel|1 year ago
https://www.stryker.com/us/en/portfolios/medical-surgical-eq...
Sid45|1 year ago
doubleg72|1 year ago