top | item 42307393

Launch HN: Vocera (YC F24) – Testing and Observability for Voice AI

71 points| Sid45 | 1 year ago

Hey HN, we’re Shashij, Sidhant, and Tarush, founders of Vocera AI (https://www.vocera.ai) – a platform that automates the testing and monitoring of AI voice agents. We’re building the product we wish we had when we were developing voice agents. Here’s a demo video: https://www.youtube.com/watch?v=aRtAb_E60jY. You can engage in a roleplay as well which we made just for fun: https://www.vocera.ai/talk.

We were working on voice agents in healthcare, and kept running into the same problem: manually testing was incredibly time-consuming and error-prone. Testing voice AI in a comprehensive way was far more difficult than we had anticipated – not just the setup, but the ongoing monitoring of production calls. Despite our best efforts, some calls still failed once we went live.

The main challenges we faced were: (1) Demonstrating reliability to customers for production was really tough; (2) Manual testing was incomplete and didn't cover edge cases; (3) We couldn’t easily simulate all possible conversations, especially with diverse customer personas; (4) Monitoring every production call manually was a huge time sink.

We built Vocera to solve these problems. Vocera automatically simulates real personas, generates a wide range of testing scenarios from your prompts/call scripts and monitors all production calls. The result? You can be sure your voice agents are reliable, and you get real-time insights into how they’re performing.

Our platform tests how your AI responds to diverse personas, evaluates the conversation against different metrics and gives you directed feedback on the issues.

What’s different about us is that we don’t just automate the evaluation. We generate scenarios and metrics automatically, so developers do not have to spend time defining their scenarios or eval metrics. This saves them a ton of time. Obviously, we give them the option to define these manually as well. Also, we provide detailed analytics on the agent's performance across simulations so developers do not need to listen to all call recordings manually.

If you’re building voice agents and want to ensure they’re reliable and production-ready, or if you’re just interested in the challenges of Voice AI, we’d love to chat.

We’d love to get your feedback, thoughts, or experiences related to testing voice agents!

33 comments

Areibman|1 year ago

Congrats on the launch!

Given the size of the niche (developers building voice agents), do you find there's a lot of demand for testing and observability? From my anecdata, many of the voice AI agent builders are using SDKs and builder tools (Voiceflow, Vapi, Bland, Vocode, etc). Observability is usually already baked-in pattern with these SDKs (testing I'm not so sure of).

One conversation I had with a voice agent builder: "Our product is complex enough where external testing tools don't make sense. And we know when things are not working because we have close relationships with power users and companies." Whose problem are you solving?

Your tool looks very powerful, but might the broader opportunity be just to use your evals to roll out the best voice agents yourself?

unknown|1 year ago

[deleted]

Sid45|1 year ago

Great question! We've seen significant demand for testing and observability across sectors like healthcare, insurance, home services, and e-commerce. You’re correct—many of our customers also rely on tools like Voiceflow, Vapi, and others to build their agents.

What they love about our platform is having both testing and observability in one place. Observability helps identify issues while testing allows them to simulate and prevent those problems before they escalate. This dual approach is especially helpful for teams dealing with voice-specific challenges, industry-specific nuances, or company-specific edge cases.

Our tool is particularly valuable for teams stuck with manual testing—it saves time in iterating the bot and ensure the edge cases are taken care of

ghodoussikian|1 year ago

Congratulations on the launch! This looks like a really powerful framework.

One trend I’ve noticed is there’s a really heavy focus on pre-deploy tests which makes a lot of sense. But one big gap is the lack of ability to surface the scenarios that you don’t know are even occurring after deployment. These are the scenarios that human agents are great at handling and ai agents often fall flat: in a sales context that can have a direct impact on a customers bottom line.

I think rather than attempting to deploy a perfect agent, having a mechanism to surface issues would lend much more peace of mind when launching ai voice agents. Would be happy to chat more if additional context/real world examples would be helpful. Congratulations again on the launch!

Background: have worked on contact center unstructured analytics for several years.

Sid45|1 year ago

Yes, would love to chat. You can block my calendar here: cal.com/kabrasidhant.

Agree to everything you said. That is why we have our observability platform, which allows your live calls to be monitored. The idea is to use the observability platform to run real-life simulations so as you make fixes, you can test it in simulation environment

BrandiATMuhkuh|1 year ago

Congrats on the launch! I can definitely see the value in that.

I've been working with auto-generated content for the past 8 years (both algorithmic and LLM-based). One of the biggest challenges is detecting and preventing regressions after "improving" the prompt, algorithm, or model.

Just yesterday, I deployed a voice agent (OpenAI + Twilio), and it's clear that there are countless scenarios where the agent might not perform as expected. For example, if you ask the agent to speak in German, but your tool uses call names or returns data in English, the agent might suddenly switch to speaking English.

Overall, I believe voice agents will primarily be used and developed by SMEs, but they often lack the time or expertise to account for all these edge cases.

Btw: here is the number agent. sorry it's in German: +43732 350011

tabarnacle|1 year ago

I was able to easily flip the script on the return scenario to convince the rep that they were the one calling me to return - and then flipped it again. The quality of the voice was great, though.

Sid45|1 year ago

Thats great. We also generate adversarial scenarios for our customer's voice agents like you did. The roleplay example was made for you to get a sample evaluation of your performance. In reality, we generate simulations automatically and provide analytics on your AI agent's performance, as demonstrated in the demo video.

Glad you liked the voice quality.

unknown|1 year ago

[deleted]

shreyapathak|1 year ago

Congrats on the launch!

Great to see the focus on robust and exhaustive evaluations. With large-scale usage of products, everything that can go wrong usually does so such evals will go a long way!

How do you intend to grow the product?

Sid45|1 year ago

Thanks a lot. The focus currently is to link the observability and testing environment so that test sets can automatically be created based on findings in actual production calls. Currently, it requires human intervention to create a scenario for testing based on findings of production calls.

ishantarunesh|1 year ago

I run an AI services company and we've built voice bots for multiple clients. Is there a way for us to evaluate the agents on Vocera (these are custom builds not VAPI or Synthflow etc)

Sid45|1 year ago

Yes, Ishan, we are working with clients who have their own custom builds instead of using a platform. All we need is your bot context and an endpoint to run simulations.

topicseed|1 year ago

Evaluate them how? Specifically the voice/audio or you're happy to evaluate them with the transcript with something like Modelmetry?

AkashKaStudio|1 year ago

Do you have a flow/customization where the customer asks to wait for X seconds? And is this just telephony over Websocket or is a WebRTC stream supported as well?

suyashb613|1 year ago

How do you handle sensitive data in production calls, especially for industries like healthcare and finance?

shyam_manchhani|1 year ago

Can users customize scenario generation to focus on specific conversational intents or user behaviors?

Sid45|1 year ago

Yes, apart from using our AI-generated scenarios, users can also define their scenarios

savy91|1 year ago

Is there any way to sign up and try this without going through the sales call/demo?

Sid45|1 year ago

Our onboarding currently includes a quick demo to ensure users get the most value out of the tool and understand its capabilities in detail.

Having said that we have a role play which you can try. In roleplay, you talk with our AI and then we evaluate your performance.

filipeisho|1 year ago

This is super dope! Are you looking to hire? Your product made me super excited.

Sid45|1 year ago

Thanks a lot.

Yes, we will be hiring for founding engineers pretty soon. Please reach out to founders@vocera.ai if you are interested

nextworddev|1 year ago

This market will be killed by Twilio soon

Aurornis|1 year ago

> We were working on voice agents in healthcare

Having some experience with the healthcare industry, seeing the name Vocera here is incredibly confusing.

Vocera is a very common communication platform and set of devices used in hospitals: https://vocera.stryker.com/s/product-hub/vocera-smartbadge These things are everywhere in healthcare already. If someone came to me and suggested using “Vocera” for a healthcare related tech thing, my mind would assume it’s the Stryker product. It’s that common.

So unfortunately I’d recommend a name change as a high priority. Dealing with healthcare tech is difficult enough, but using the same name as a very popular and established healthcare tech product is going to be an unnecessary obstacle in getting traction. Not to mention that Stryker’s Vocera division will have some things to say about this.

Sid45|1 year ago

You are right. We were unaware of Stryker's Vocera prominence when we chose the name. Healthcare voice agents are a huge space. We have 6 clients in healthcare already.

We will discuss and evaluate the name internally to avoid confusion or potential issues. Thanks a lot for your feedback

technics256|1 year ago

This is the first thing I thought too, the name confused me with the company and brand Vocera, I thought it was from them.

AlphaWeaver|1 year ago

Came here to say this exactly - any nurse, PA, or other hospital staff person likely has heard of "Vocera" the hospital communication platform.

Beyond just accidental confusion, I'd be worried about real legal issues with trademark infringement. Trademarks primarily exist to prevent customers from being confused about which business they're interacting with, and this is a great example of the types of things they're trying to prevent. (I am not a lawyer, so take this with a grain of salt.)

rahulgoel|1 year ago

Congrats on the launch. For a sec, I thought this was related to the healthcare comms firm owned by Stryker.

https://www.stryker.com/us/en/portfolios/medical-surgical-eq...

Sid45|1 year ago

yeah, we are Vocera AI.

doubleg72|1 year ago

Vocera is a company that already exists in healthcare and you have some potential legal issues with keeping this name.

unknown|1 year ago

[deleted]