Show HN: I built an AI voice agent for Gmail
35 points | chrisnolet | 1 year ago | pocket.computer
I’ve made an AI-powered email client for Gmail that you talk to, using your microphone. (I highly recommend using earbuds or headphones! Ray-Ban Meta glasses work best of all.)
Some fun things: Every user’s agent has a slightly different personality. You can train it by asking it to remember things for next time. And it presents some generative UI while you use it.
This is the first time I’m showing this publicly. I’d love your feedback! What works well, and what doesn’t?
I previously did a Show HN for ‘D&D meets Siri’: https://news.ycombinator.com/item?id=41328794. I’m thinking of releasing the framework/DSL that I’m using to craft these experiences. Would that be interesting? Would you want to build voice apps?
upwardbound2|1 year ago
- Can you please provide a list of the companies that you send data to? Do you use OpenAI? Speaking plainly, I do not trust OpenAI to honor any legal commitments about what they will or won't do with any data sent to them. They are being sued because they systematically violated copyright law at a mass scale -- data theft -- and so I absolutely do not ever want even a single one of my emails going to that company. (Fool me once, ...)
- What exactly do you mean by this line in the Privacy Policy (https://pocket.computer/privacy)? "We do not use user data obtained through third-party APIs to develop, improve, or train generalized AI and/or ML models." If I read this literally, it sounds like you are saying that you won't use my private emails to train AGI (Artificial General Intelligence, aka superintelligence), which is good I guess, but I also don't really want you to train any AI/ML models of any kind with my emails, because of very real concerns about training data memorization and regurgitation.
Thank you. Being honest and transparent, and engaging with privacy rights advocates such as immigrants' rights advocates, would be well worth considering. A mistake here could result in innocent families being split apart by ICE, for example.
oxcabe|1 year ago
If you don't mind the question, is there any LLM provider off the top of your head that seems to be doing data privacy & protection well enough for a use case like this?
Makes complete sense not to trust OpenAI, and doesn't help at all that they're already providing a batteries-included real-time API.
chrisnolet|1 year ago
I worked at Apple for many years and their approach to privacy really left a mark on me. I strongly believe that preserving privacy is a moral obligation. (Especially when you're handling people's emails.)
Now, while the beta is running, when you log in to Pocket, there is a big blue switch above the fold under the title 'Privacy.' It says: 'Share recordings with our team.' If you leave it on, that's really helpful for me! But it does exactly what it says, and if you have anything sensitive you don't want to share with me, turn it off.
For your questions:
- The voice data is routed through Retell and the transcripts are passed to OpenAI's API.
- Sensitive data is retained by Retell for 10 minutes (when sharing is off).
- Sensitive data is retained by OpenAI for 30 days 'to identify abuse.'
I'm working with OpenAI to get Zero Data Retention. As it stands, their commitment has been that they will not use API input or output to train models. (I personally trust that commitment, but I understand the skepticism, and I understand if that's a deal-breaker for you.)
Retell is HIPAA-compliant and SOC 2 Type II certified. They've been great to work with.
- Regarding the privacy policy: 'User data obtained through third-party APIs (will not be used) to develop, improve, or train generalized AI and/or ML models.' This language was actually required by Google. The use of the word 'generalized' here is actually less specific; it's not AGI, but includes any kind of foundation model. There might be a point in the future where we can fine-tune one model per user with a LoRA, but I agree that the risk of PII leaking from a shared model is far too great.
- The company is a Delaware C-corp and subject to U.S. and California laws.
I really appreciate the opportunity to discuss this. I want to put privacy and security first always, and make sure that's baked into the company culture. Thanks for advocating!
vishrajiv|1 year ago
Drafting replies would be necessary, of course.
It sounds like you have a library to make these voice apps. To my knowledge, people just use providers like Vapi or Retell. What’s the difference here?
chrisnolet|1 year ago
You can draft emails and reply to threads as well, actually! And if you're unsure of what to say, you can throw some hints at the agent and it'll generate a draft in your tone of voice. (The agent analyzes your past emails to match your style.)
For your question: The providers (Vapi, Retell) handle the big pieces well. My framework/DSL sits on top, helping developers manage the conversation in TypeScript.
Quick example... When you start your first session, we spin up a 'worker agent' to figure out your name, then we say something nice, and display a personalized welcome message:
The primitives are powerful. And the DSL makes it simple to wrangle conversational pathways. But my favorite part is that it's all just TypeScript, so you can use NPM packages to make your voice agents actually do things very easily. It's very cool and I hope to share more in the future!
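A rough sketch of the first-session flow described above, in plain TypeScript. This is not the framework's DSL, which hasn't been released; every name here (`WorkerAgent`, `SessionUI`, `findFirstName`, `startFirstSession`, `createRecordingUI`) is hypothetical. A real worker agent would presumably call an LLM and run asynchronously; here it is assumed to be a simple synchronous function so the example runs on its own.

```typescript
// Hypothetical sketch: none of these names come from the actual framework.

// A "worker agent" is modeled as a function that does one background task
// and hands its result back to the main conversation.
type WorkerAgent<In, Out> = (input: In) => Out;

// What the conversation can do: speak, or show generated UI.
interface SessionUI {
  say(text: string): void;
  display(text: string): void;
}

// Assumption: the name is guessed from a "From"-style header such as
// `Ada Lovelace <ada@example.com>`. Returns null if no name is present.
const findFirstName: WorkerAgent<string, string | null> = (fromHeader) => {
  const displayName = fromHeader
    .replace(/<[^>]*>/, "")   // drop the <address> part
    .trim()
    .replace(/^"|"$/g, "");   // drop surrounding quotes, if any
  const first = displayName.split(/\s+/)[0];
  return first ? first : null;
};

// First session: run the worker, say something nice, show a welcome message.
function startFirstSession(fromHeader: string, ui: SessionUI): void {
  const name = findFirstName(fromHeader);
  ui.say(name ? `Nice to meet you, ${name}!` : "Nice to meet you!");
  ui.display(name ? `Welcome, ${name}` : "Welcome");
}

// In-memory UI that records what was spoken and shown.
interface RecordingUI extends SessionUI {
  spoken: string[];
  shown: string[];
}

function createRecordingUI(): RecordingUI {
  const spoken: string[] = [];
  const shown: string[] = [];
  return {
    spoken,
    shown,
    say: (text) => { spoken.push(text); },
    display: (text) => { shown.push(text); },
  };
}

// Usage
const demoUI = createRecordingUI();
startFirstSession("Ada Lovelace <ada@example.com>", demoUI);
console.log(demoUI.spoken, demoUI.shown);
```

Keeping the speech and display calls behind a small interface means the same flow could be pointed at a voice provider such as Retell or Vapi, or at a test recorder like the one above.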
vidyesh|1 year ago
I just wanted to point out, I love my Firefox but the gradient animation is so bad on Firefox!
At first I thought "oh cool, a background animation," then checked the dev tools to see what it was, only to realize it isn't supposed to look like the color-banded animation I'm seeing!
Chromium-based browsers render it so subtly that I barely notice the animation. I even wondered whether it exists there at all (it does!).
chrisnolet|1 year ago
The gradient animation is super-subtle :) Do you think I should disable it for Firefox users, or do you still think 'cool background animation' in spite of the banding?
protocolture|1 year ago
I have been messing around with something similar for roleplaying. If you have source code or something to release, I would be interested.
chrisnolet|1 year ago
In the meantime, since the original link has changed, feel free to try it out at: https://pocket.computer/dungeons. Happy to chat more if you want to know how parts of it are done!
wferrell|1 year ago
chrisnolet|1 year ago
Thanks for the nudge!
camkego|1 year ago
jz10|1 year ago
chrisnolet|1 year ago
chrisnolet|1 year ago
hollowturtle|1 year ago
chrisnolet|1 year ago
moralestapia|1 year ago
chrisnolet|1 year ago
I'm writing up a full accounting of the stack for the post above, so check back for that and let me know if that doesn't answer your questions/concerns!
stuckkeys|1 year ago