
Show HN: I built an AI voice agent for Gmail

35 points | chrisnolet | 1 year ago | pocket.computer

Hello again, HN! I’ve been using my DSL to create new voice experiences.

I’ve made an AI-powered email client for Gmail that you talk to, using your microphone. (I highly recommend using earbuds or headphones! Or the best is with Ray-Ban Meta glasses.)

Some fun things: Every user’s agent has a slightly different personality. You can train it by asking it to remember things for next time. And it presents some generative UI while you use it.

This is the first time I’m showing this publicly. I’d love your feedback! What works well, and what doesn’t?

I previously did a Show HN for ‘D&D meets Siri’: https://news.ycombinator.com/item?id=41328794. I’m thinking of releasing the framework/DSL that I’m using to craft these experiences. Would that be interesting? Would you want to build voice apps?

30 comments

upwardbound2|1 year ago

This looks incredibly cool and I really want to try it with my real email account (rather than a throwaway test account). In order to enable people to consider taking that leap, can you please provide more information about where the data will be sent and stored, and your legal liability, if any? Everyone's real email accounts contain extremely sensitive financial and medical secrets that allow identity theft or could even physically endanger the person if they are a reporter in a corrupt regime or something like that.

- Can you please provide a list of the companies that you send data to? Do you use OpenAI? Speaking plainly, I do not trust OpenAI to honor any legal commitments about what they will or won't do with any data sent to them. They are being sued because they systematically violated copyright law at a mass scale -- data theft -- and so I absolutely do not ever want even a single one of my emails going to that company. (Fool me once, ..)

- What exactly do you mean by this line in the Privacy Policy? "We do not use user data obtained through third-party APIs to develop, improve, or train generalized AI and/or ML models." https://pocket.computer/privacy If I read this literally, it sounds like you are saying that you won't use my private emails to train AGI (Artificial General Intelligence, aka superintelligence), which is good I guess, but I also don't really want you to train any AI/ML models of any kind with my emails, because of very real concerns about training data memorization and regurgitation.

Thank you. Providing honesty and transparency and engaging with privacy rights advocates like immigrants' rights advocates would be very good to consider. If you make a mistake here it could result in innocent families being split apart by ICE, for example.

oxcabe|1 year ago

These concerns, IMO, are at least as important as the actual value proposition.

If you don't mind the question, is there any LLM provider off the top of your head that seems to be doing data privacy & protection well enough for a use case like this?

Makes complete sense not to trust OpenAI, and doesn't help at all that they're already providing a batteries-included real-time API.

chrisnolet|1 year ago

Thank you so much for this question, and for your thoughtful post below. It's really easy to put privacy and security to one side when you're launching a startup. And lots of users don't mind privacy when they're signing up for products. But it's something that's personally very close to my heart, and I put a tremendous effort into privacy and security because I knew I wouldn't be able to sleep at night if I cut any corners.

I worked at Apple for many years and their approach to privacy really left a mark on me. I strongly believe that preserving privacy is a moral obligation. (Especially when you're handling people's emails.)

Now, while the beta is running, when you log in to Pocket, there is a big blue switch above the fold under the title 'Privacy.' It says: 'Share recordings with our team.' If you leave it on, that's really helpful for me! But it does exactly what it says, and if you have anything sensitive you don't want to share with me, turn it off.

For your questions:

- The voice data is routed through Retell and the transcripts are passed to OpenAI's API.

- Sensitive data is retained by Retell for 10 minutes (when sharing is off).

- Sensitive data is retained by OpenAI for 30 days 'to identify abuse.'

I'm working with OpenAI to get Zero Data Retention. As it stands, their commitment has been that they will not use API input or output to train models. (I personally trust that commitment, but I understand the skepticism and if that's a deal-breaker for you.)

Retell is HIPAA-compliant and SOC 2 Type II certified. They've been great to work with.

- Regarding the privacy policy: 'User data obtained through third-party APIs (will not be used) to develop, improve, or train generalized AI and/or ML models.' This language was actually required by Google. The use of the word 'generalized' here is actually less specific; it's not AGI, but includes any kind of foundation model. There might be a point in the future where we can fine-tune one model per user with a LoRA, but I agree that the risk of PII leaking from a shared model is far too great.

- The company is a Delaware C-corp and subject to U.S. and California laws.

I really appreciate the opportunity to discuss this. I want to put privacy and security first always, and make sure that's baked into the company culture. Thanks for advocating!

vishrajiv|1 year ago

It works surprisingly well! I thought it’d just be a read interface but it archives and marks as read like I asked. This would be useful when I commute to work.

Drafting replies would be necessary, of course.

It sounds like you have a library to make these voice apps. To my knowledge, people just use providers like Vapi or Retell. What’s the difference here?

chrisnolet|1 year ago

Thanks for trying it!

You can draft emails and reply to threads as well, actually! And if you're unsure of what to say, you can throw some hints at the agent and it'll generate a draft in your tone of voice. (The agent analyzes your past emails to match your style.)

For your question: The providers (Vapi, Retell) handle the big pieces well. My framework/DSL sits on top, helping developers manage the conversation in TypeScript.

Quick example... When you start your first session, we spin up a 'worker agent' to figure out your name, then we say something nice, and display a personalized welcome message:

  dataStore.userName = await aside(z.string(), this.emailText, `What is the user's first name?`);

  prompt(`Say something nice about ${dataStore.userName}.`);
  display(`Welcome to _Pocket_, ${dataStore.userName}.`);

The primitives are powerful. And the DSL makes it simple to wrangle conversational pathways. But my favorite part is that it's all just TypeScript, so you can use NPM packages to make your voice agents actually do things very easily.
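To make the snippet above concrete, here is a minimal, self-contained sketch of how those three calls could fit together. Everything outside the three original lines is an assumption for illustration: the framework isn't published, so `aside`, `prompt`, and `display` are hypothetical stubs, `z` is a tiny stand-in for the Zod schema object rather than the real library, and the "worker agent" is faked with a regex on the email sign-off instead of a model call.

```typescript
// Hypothetical stand-ins for the DSL primitives; the real framework is unpublished.
type Validator<T> = { parse: (value: unknown) => T };

// Tiny stand-in for Zod's z.string(), so the sketch has no dependencies.
const z = {
  string: (): Validator<string> => ({
    parse: (value: unknown): string => {
      if (typeof value !== "string") throw new Error("expected a string");
      return value;
    },
  }),
};

// Records what the agent would be told to say or show, in order.
const transcript: string[] = [];

// Stub "worker agent": a real one would ask a model the question about the
// context. This one just looks for a sign-off like "Thanks,\nChris".
async function aside<T>(schema: Validator<T>, context: string, question: string): Promise<T> {
  transcript.push(`aside: ${question}`);
  const match = context.match(/(?:Thanks|Best|Cheers),\s*(\w+)/);
  // Validate the answer against the schema before handing it back.
  return schema.parse(match ? match[1] : "there");
}

// Stub: queue an instruction for the voice agent.
function prompt(instruction: string): void {
  transcript.push(`prompt: ${instruction}`);
}

// Stub: queue a piece of generated UI.
function display(markdown: string): void {
  transcript.push(`display: ${markdown}`);
}

const dataStore: { userName?: string } = {};

// The three lines from the example above, wrapped in a function.
async function welcome(emailText: string): Promise<string[]> {
  dataStore.userName = await aside(z.string(), emailText, `What is the user's first name?`);

  prompt(`Say something nice about ${dataStore.userName}.`);
  display(`Welcome to _Pocket_, ${dataStore.userName}.`);
  return transcript;
}
```

Calling `welcome("See you Friday.\nThanks,\nChris")` fills in `dataStore.userName` and queues one prompt and one display entry. The schema argument is what lets the caller get a typed, validated value back from the side question rather than free-form text.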

It's very cool and I hope to share more in the future!

vidyesh|1 year ago

I'm not a Gmail (web interface) user, so I haven't tried it, but congrats on the launch! I like how your landing page is so simple and small. And the domain is amazing!

I just wanted to point out, I love my Firefox but the gradient animation is so bad on Firefox!

At first I thought, oh, cool background animation. Then I checked the dev tools to see what it was, only to realize it isn't supposed to look like the color-band animation I'm seeing!

Chromium-based browsers render it so subtly that I barely notice the animation. I had to ask myself whether it even exists there (it does!).

chrisnolet|1 year ago

Oooh, thanks for the report! It looks like Firefox doesn't have dithering on gradients. (There's a bug report, but it's been open for 14 years!)

The gradient animation is super-subtle :) Do you think I should disable it for Firefox users, or do you still think 'cool background animation' in spite of the banding?
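If disabling it turns out to be the way to go, a rough sketch of the check could look like the following. This assumes user-agent sniffing is acceptable for a purely cosmetic toggle; `shouldAnimateGradient` is a hypothetical helper name, not something from the actual site.

```typescript
// Hypothetical helper: decide whether to run the background gradient animation.
// Desktop and Android Firefox identify themselves with a "Firefox/<version>"
// token. Firefox on iOS uses "FxiOS" and renders with WebKit, so it is
// deliberately not matched here.
function shouldAnimateGradient(userAgent: string): boolean {
  return !/\bFirefox\//.test(userAgent);
}
```

In the browser this would be called as `shouldAnimateGradient(navigator.userAgent)`, with the result used to add or skip the CSS class that starts the animation.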

protocolture|1 year ago

>I previously did a Show HN for ‘D&D meets Siri’:

I have been messing around with something similar for roleplaying. If you have source code or something to release, I would be interested.

chrisnolet|1 year ago

Nice! I’d love to check it out when you’re ready to share it. I’ll likely release the source code for mine if/when I publish the DSL.

In the meantime, since the original link has changed, feel free to try it out at: https://pocket.computer/dungeons. Happy to chat more if you want to know how parts of it are done!

wferrell|1 year ago

Is there a YouTube of this?

chrisnolet|1 year ago

Update to add: https://youtube.com/shorts/D8-sdTm8bd4

Thanks for the nudge!

camkego|1 year ago

I’d really prefer to see a video before trying, also.

jz10|1 year ago

I'm super curious how these special TLDs perform when it comes to SEO and user recall

chrisnolet|1 year ago

Me, too! I have pocketcomputer.com as well, which I use for email. (I learned a long time ago how confusing it is to read out a special TLD over the phone!)

chrisnolet|1 year ago

Footnote to add that it works with Google Workspace accounts, too!

hollowturtle|1 year ago

Exactly how is it secure?

chrisnolet|1 year ago

I had to pass the Google CASA audit and implement a ton of security procedures. Basically everything is encrypted, we don’t store your emails, we follow verified best practices for session tokens, and so on. I probably went a little overboard to be honest, but it’s people’s emails and I need to respect the gravity of that.

moralestapia|1 year ago

Nice, is this using an offline model? (For the AI)

chrisnolet|1 year ago

It's using OpenAI's API at the moment, actually. An offline model could _probably_ handle the conversation and tool calling, but it just needs to be really fast to keep up with conversational speeds. (And really, GPT-4o is a bit too slow for my liking in this current iteration. I'm hoping that GPT-4.5 will be faster.)

I'm writing up a full accounting of the stack for the post above, so check back for that and let me know if that doesn't answer your questions/concerns!

stuckkeys|1 year ago

Would be cool to watch a demo use case.