Launch HN: Martin (YC S23) – Using LLMs to Make a Better Siri
153 points | darweenist | 1 year ago | reply
I’ve been a Siri power user for a long time, mainly because I’ve always liked using voice as an interface. But legacy voice assistants like Siri, Google Assistant, and Alexa were never well integrated enough or reliable enough to actually save time. Maybe 1 in 5 commands ends up executing as smoothly as you expected, and the most useful thing they do is play a song or set an alarm. The advent of LLMs seemed like a great opportunity to push the state of the art forward a notch or two!
Our goal is to do 2 things better:
1) Deeper integrations with productivity-related apps you use every day, like calendar, email, messages, WhatsApp, and soon Google Docs, Slack, and phone calls.
2) Better memory of each user based on their past conversations and integrations, so Martin can start to anticipate parameters in the user’s commands (e.g. text the guy from yesterday about the plans we made this morning)
A great way that our early users use Martin is having morning syncs and evening debriefs with the software. At the start/end of each day, they’ll have a 5-10 minute sync about their TODOs for the next day, and Martin will brief them on upcoming tasks and news they’re typically interested in.
Unlike other voice assistants, Martin can also hold full text conversations with your contacts on your behalf from its own phone number. For example, you can tell it to plan a lunch with a friend, and it can text back and forth with that friend to figure out a time and place. After the text conversation between your friend and Martin is over, Martin reports back to you via a notification and a text. You can also monitor all of its messages with your contacts in the app.
We started building Martin exactly 1 year ago, during our YC batch. It’s definitely a hard product to “complete” because of the many unsolved technical challenges, but we’re making progress step by step. First was the voice interface, which Siri still hasn’t gotten right after more than a decade. We have 2 modes: push-to-talk and hands-free. Hands-free is great for obvious reasons. We’ve gotten our latency down to only a couple of seconds max for most commands, and we’ve tuned our own voice activity detection model to minimize the chance of Martin cutting you off (a common problem with voiceGPTs). Even then, Martin may still cut you off if you pause for 3-5 seconds in the middle of a thought, so we made a push-to-talk mode. It covers the cases where you want to describe something in detail or just brain-dump to Martin and might need 20-30 seconds to finish speaking. Just hold down, speak, and release when you’re done, like a walkie-talkie.
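To make the hands-free cut-off concrete, here is a minimal sketch of silence-timeout endpointing. It is only an illustration under assumptions, not Martin's actual model: the per-frame speech flags stand in for the output of some voice activity detector, and `end_of_utterance`, `frame_ms`, and `max_pause_ms` are hypothetical names.

```python
from typing import Iterable, Optional


def end_of_utterance(frames: Iterable[bool], frame_ms: int, max_pause_ms: int) -> Optional[int]:
    """Return the index of the frame at which a hands-free listener would stop.

    `frames` holds one flag per audio frame: True if a (hypothetical) voice
    activity detector heard speech in that frame. Listening stops once the
    speaker has started talking and then stayed silent for at least
    `max_pause_ms`. Returns None if that never happens.
    """
    silent_ms = 0
    heard_speech = False
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_ms = 0          # any speech resets the pause timer
        elif heard_speech:
            silent_ms += frame_ms  # only count silence after speech began
            if silent_ms >= max_pause_ms:
                return i
    return None


# 1 s of speech, a 4 s thinking pause, then 1 s more speech (500 ms frames).
frames = [True] * 2 + [False] * 8 + [True] * 2
cut_at_3s = end_of_utterance(frames, frame_ms=500, max_pause_ms=3000)
cut_at_5s = end_of_utterance(frames, frame_ms=500, max_pause_ms=5000)
```

With a 3-second limit the 4-second pause ends the utterance early, while a 5-second limit lets the speaker finish. Push-to-talk sidesteps the timer entirely, because the button release marks the end of the utterance.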
We’ve also had to tackle a very long tail of integrations, and we want to do each one well. For example, when we launched Google Calendar, we wanted to make sure you could add a Google Meet link, invite your contacts to the events, and access secondary calendars. And, you should be able to say things like “set reminders leading up to the event” or “text Eric the details of this event.” So, we pretty much release one new major integration every month.
Finally, there’s the problem of personalization / LLM memory, which is still largely unsolved. From each conversation that a user has with their Martin, we try to infer what the user is busy with, worried about, or looking forward to, so in their next “morning sync” or “evening debrief”, Martin can proactively suggest to-dos or goals/topics to discuss with the user. Right now, we use a few different LLMs and many chain-of-thought steps to extract clues from each conversation and have Martin “reflect” periodically to build its memory. But, with all that said, we still have a lot of work to do here, and this is just a start!
You can try Martin by going to our website (https://www.trymartin.com) and starting a 7-day free trial. Once you start your trial, you’ll get an access code emailed to you along with the download link for our iOS app. After you enter your access code into the app, you can integrate your calendar, contacts, etc. If you find Martin useful after the trial, we charge our early users (who are generally productivity gurus and prosumers with multiple AI subscriptions) a $30/month subscription.
We can’t wait to hear your thoughts. Any cool experiences with Siri, things you wish a voice assistant could do, or ideas about LLM memory, tool calling, etc. - I’d love to discuss any of these topics with you!
[+] [-] jmagnuss|1 year ago|reply
[+] [-] darweenist|1 year ago|reply
We recently got our CASA Tier-2 compliance done (Cloud Application Security Assessment). We've also gone through Google's OAuth compliance process for every new integration we add that's related to Google. These assessments scan our app and make sure that our software meets pretty stringent standards when it comes to data security and encryption, and that we're not using the data for anything other than the specific features we promise (i.e. not sharing or selling to advertisers, etc.). You can read more about CASA here (https://appdefensealliance.dev/casa). We haven't gone through SOC2 yet, but we plan to soon, once we have a few more integrations.
[+] [-] toddmorey|1 year ago|reply
As you develop your messaging, I wanted to share the questions I had as I think a lot of users will ask the same:
1. What powers Martin? Is it a custom LLM or powered by OpenAI, Anthropic?
2. Is any of my data ever used in training?
3. Will I always be notified before new texts / calls / actions are taken on my behalf? Does the AI present as me or are my contacts aware that it's an AI assistant that may provide incorrect information?
4. Can I easily and quickly remove all my data and context?
[+] [-] idealboy|1 year ago|reply
I’ve been putting it through its paces and it’s handling some complicated requests correctly the first time. For example:
“There is an art festival in my city this weekend. They have a jazz stage my wife and I would like to check out. Find the schedule for each day and create one event every day it’s happening. In the event description put the schedule for each day, and invite my wife.”
It got it right the first time. Pretty amazing.
I see some folks saying it’s just a “wrapper for an LLM” like that’s easy to do. LLMs are not faerie powder that just work for every use case. The personal assistant use case is extremely difficult, which is why the big players haven’t done it yet.
So bravo for the bravado and actually making it work. Privacy is a concern, but honestly I’m not that worried that you can find out which art festival I’ll be at this weekend. But an oncology appointment? I might.
You should create a system where you cannot access user data, and it can never be shared with third parties. Make that system open source to prove it. Give up the potential upside of using this data for revenue so that Martin becomes what it can be. Otherwise, I’ll never feel confident telling Martin anything I don’t want advertisers to know.
[+] [-] darweenist|1 year ago|reply
We can certainly publish more privacy guarantees in the future - thanks for the suggestions. Our business model is subscriptions, so we won't be going anywhere near ads or data sharing.
[+] [-] scottydelta|1 year ago|reply
[+] [-] kjkjadksj|1 year ago|reply
https://ios.gadgethacks.com/how-to/60-ios-features-apple-sto...
[+] [-] dom96|1 year ago|reply
[+] [-] blueboo|1 year ago|reply
[+] [-] fragmede|1 year ago|reply
[+] [-] pulvinar|1 year ago|reply
I'm sure it won't be long before we see apps that listen, record any "Hey Siri" they hear, and then synthesize that voice to give your phone commands to "tell me my passwords", or more insidious and difficult-to-detect commands.
It seems Apple's new version will be facing this problem too.
[+] [-] ryankrage77|1 year ago|reply
[+] [-] yewenjie|1 year ago|reply
- How did you solve the long-term memory problem? What kind of issues are you facing with scaling the number of tools?
- I like the idea but there's one crucial thing missing for me. I will happily pay for your app if it lets me bring my own API keys/ endpoints for models that I can host, so that I know my data is private and secure.
[+] [-] darweenist|1 year ago|reply
- Right now, we use a combination of RAG and chain of thought for storing memories. At different time intervals, we'll create memories at different levels of granularity. For example, at the end of every conversation, we'll embed some vectors based on specific commands from the user. At the end of each day, we'll have the LLM reflect on key questions related to a user's routine. And every few days, it'll reflect on the user's short/long term goals. This has worked to some degree, but we're still in the very early stages of figuring out how to do long-term memory for an assistant.
- Scaling the number of tools is definitely a struggle since we want to make our integrations as thorough as we can. It takes time, so we just try to keep growing the list consistently. We have an internal goal of adding at least one new major integration every month.
- Love the idea of bringing your own API keys/endpoints. We've gotten this feedback before, so we'll seriously consider it in our next few sprints!
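For readers trying to picture the multi-granularity schedule described above, here is a minimal sketch. It is an assumption-laden illustration, not Martin's implementation: `MemoryStore`, `summarize`, and the 3-day goal interval are hypothetical, `summarize` is a placeholder where a real system would call an LLM, and the embedding/retrieval (RAG) step is left out entirely.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class MemoryStore:
    """Toy stand-in for a memory backend; every name here is hypothetical."""
    pending_commands: List[str] = field(default_factory=list)  # since last daily reflection
    routine_notes: List[str] = field(default_factory=list)     # one per day
    goal_notes: List[str] = field(default_factory=list)        # one every few days
    last_goal_reflection: Optional[date] = None


def summarize(kind: str, texts: List[str]) -> str:
    # Placeholder for an LLM "reflection" call; here it just joins the inputs.
    return f"{kind}: " + " | ".join(texts)


def end_of_conversation(store: MemoryStore, commands: List[str]) -> None:
    # Finest granularity: keep the specific commands from this conversation.
    store.pending_commands.extend(commands)


def end_of_day(store: MemoryStore, today: date, goal_interval_days: int = 3) -> None:
    # Daily granularity: reflect on the day's commands as a routine note.
    if store.pending_commands:
        store.routine_notes.append(summarize("routine", store.pending_commands))
        store.pending_commands = []
    # Coarsest granularity: every few days, reflect on the routine notes.
    due = (store.last_goal_reflection is None
           or (today - store.last_goal_reflection).days >= goal_interval_days)
    if due and store.routine_notes:
        store.goal_notes.append(summarize("goals", store.routine_notes))
        store.last_goal_reflection = today


store = MemoryStore()
for day in range(1, 5):
    end_of_conversation(store, [f"command from day {day}"])
    end_of_day(store, date(2024, 1, day))
```

After four simulated days the store holds four routine notes and two goal notes (one from the first day, one from the fourth). The point of the tiers is that coarser memories are built from finer ones rather than from raw transcripts.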
[+] [-] frankdenbow|1 year ago|reply
[+] [-] edanm|1 year ago|reply
As someone who is very interested in using this, may I make two suggestions:
1. Have a list of integrations somewhere on the homepage. It might be there, but if so I missed it. I immediately wanted to know if it can integrate with Obsidian, for example, or Omnifocus. I'm sure others will want to know if "email" means Google only, or Outlook, etc.
2. Make the trial longer. When I see 7 days, what I immediately think is "not enough time to really test this". I'm a busy person, I'm not going to change habits overnight, and unless this thing will immediately integrate into my daily routine (it won't), I'll probably only use it casually the first few times. It would be much better to give me more time to test it. (This is not business advice - maybe I'm wrong and 7 days is better to actually convert users! I'm just giving my immediate reaction.)
[+] [-] robertlagrant|1 year ago|reply
[+] [-] cube2222|1 year ago|reply
My main problem with startups around this is that it’s just a big ask to get access to all my data and store it in their cloud.
[+] [-] toomuchtodo|1 year ago|reply
[+] [-] cellis|1 year ago|reply
[+] [-] darweenist|1 year ago|reply
[+] [-] vendiddy|1 year ago|reply
It has been a few years and my Android assistant is still dumb. I would have expected the iOS/Android assistants to be much better by now.
[+] [-] apwell23|1 year ago|reply
I am guessing it's against their philosophy to release a product that only works sometimes from the get-go. That's why they have been so demure about the whole AI stuff.
[+] [-] Gys|1 year ago|reply
[+] [-] ianbicking|1 year ago|reply
Anyway, a few thoughts...
1. I find the event planning stuff to be kind of stale. Like maybe it will be cool this time, but so far it's part of every demo and concept around AI assistants and it's never ACTUALLY been cool. I wish this was trying to be cool in a new way.
2. The turn-taking for voice input looks kind of awkward. I get why it has to be that way, and there's not really a better solution, but... well, maybe it would be possible to use visual output and voice input, or generally make them complement each other. Many details are better to show visually and can be tedious to listen to.
3. I like the patient and attentive secretary model more than the turn-taking chat. The confirmation turn-taking is a trust exercise (did the AI _really_ hear and understand what I said?) but I think there are other ways to handle that trust. Like being more trustworthy (modern non-streaming speech recognition works really well!), making things easy to undo, or detecting unlikely commands and requiring verification.
4. For example, when reviewing a to-do list, I'd rather it show the list and I can just say "yeah, I finished item 1 and 2, and I was able to pick up milk but there's still some other groceries I need to get for tonight" and have it complete and revise entries based on that.
5. Generally to-do and task management is 10x more interesting to me than calendaring. But you should have a theory, not just be a layer over something else. I should be able to break down tasks, complete subtasks, identify partial completion and have it identify the remaining portions, get suggestions on breakdown, get advice on which tasks to complete when, etc.
6. Another interesting thing would be a kind of personal database. I would love to be able to unload a lot of information from my head and know that it will be put someplace where it can be meaningfully retrieved, combined with other data, etc. Like if I have certain bill payments or house maintenance I want to remember or something, I don't want to turn that into calendar items. Lots of them aren't even fully articulated, or the structure will emerge as more information becomes available. But I want to get started before I have carefully defined the task, and an AI assistant could do that.
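The idea in point 3 (verify only when it matters) could be sketched roughly as follows. This is one possible policy under stated assumptions, not anything Martin is known to do: the `Command` fields, the `confidence` score, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Command:
    action: str        # e.g. "complete_todo" or "send_text"
    confidence: float  # hypothetical 0..1 score that the request was understood
    undoable: bool     # can the action be reversed after the fact?


def needs_confirmation(cmd: Command, min_confidence: float = 0.8) -> bool:
    """Ask before acting only when a mistake would be costly or likely."""
    if not cmd.undoable:
        return True                         # irreversible: always verify first
    return cmd.confidence < min_confidence  # reversible: verify only if unsure


checks = [
    needs_confirmation(Command("complete_todo", 0.95, undoable=True)),
    needs_confirmation(Command("complete_todo", 0.40, undoable=True)),
    needs_confirmation(Command("send_text", 0.95, undoable=False)),
]
```

Under this policy a confidently understood, reversible action runs without a confirmation turn, while an unlikely command or one that cannot be undone (such as a text sent to a contact) still prompts the user.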
[+] [-] darweenist|1 year ago|reply
We'll be experimenting a lot and releasing updates to Martin's home screen layout and feed soon! There's a good chance this will come with changes in how we do task management altogether.
Also totally hear you on the personal database idea. We've been toying with similar ideas for a while. Many users already do this with Martin, basically brain dump in a long voice session, and it'll suggest reminders/calendar events for you. We're still figuring out how to display this personal DB info to the user in a UI though, so would love to hear your suggestions.
[+] [-] thriftwy|1 year ago|reply
1. https://languagehat.com/sorokins-norma/
[+] [-] rakkhi|1 year ago|reply
[+] [-] darweenist|1 year ago|reply
[+] [-] hahnbee|1 year ago|reply
[+] [-] edreichua|1 year ago|reply
[+] [-] darweenist|1 year ago|reply
[+] [-] ilrwbwrkhv|1 year ago|reply
Maybe the founders they are funding are not diverse enough. Is there too much tracking on which universities they went to? So the same set is applying and getting funded?
[+] [-] salamo|1 year ago|reply
The exception for me would be situations where I can't use my hands, like driving. I don't want to have to look at a screen. If a voice agent could replicate the functionality of CarPlay, that would be really useful.
[+] [-] darweenist|1 year ago|reply
[+] [-] written-beyond|1 year ago|reply
Also, I feel you about running into all of the challenges you're facing with LLMs. We've run into quite a few roadblocks, but your comment summarises it best. Just keep working on it step by step.
[+] [-] darweenist|1 year ago|reply
[+] [-] 0x62|1 year ago|reply
- do some research on a given company/individual/website and give me a summary.
- preferably also identify a contact email.
- handle selecting a good time for meetings according to my availability and preferences.
- handle the communication with the other party.
- let me know when it is arranged, or if it's given up.
I signed up and gave it a UK phone number, and got a UK number back for texting Martin. I'm not sure why it has to be SMS when it could be an in-app chat. I was expecting to get a confirmation SMS or similar, but it just accepted it straight away. When I texted the number I was given (several times), it was delivered but there was no reply.
Martin sent me an email welcoming me. I replied asking it to set up a meeting for early next week with another email address. Martin replied saying it is unable to email people on my behalf, and suggested I set it up myself.
> Unfortunately, I am currently unable to send emails to other people on your behalf. However, you can easily send an email to ** to schedule the meeting for early next week in the afternoon.
I reminded Martin that there is an example on the website homepage of doing just that, and it replied saying it can indeed schedule meetings, and asked for the details again. I replied with the same details, and it confirmed the meeting was set up.
I checked my other email, and there was no message setting anything up. I told Martin that the other party needs to know about the email, and it replied with:
> Understood. I'll make sure to inform ** about the meeting details.
Still nothing received. Furthermore, I checked the app and I haven't even connected my calendar, so I'm surprised it didn't warn me or prompt me to do this when I asked for a meeting.
I gave up with that and decided to try something else. I forwarded Martin an email thread from a lead, which included a lot of back story on their organization, offering, and some areas that they think we could potentially collaborate on. I asked Martin to find out more about the company, and evaluate the options for collaboration.
This lead is in the AI space, with their primary product being a document digitisation solution to help surface and discover business documents.
Martin replied describing it as a "nearbound revenue platform to streamline revenue operations", with a key feature being "Automated lead scoring and distribution to prioritize high potential leads". As far as evaluating the collaboration opportunities, it instead gave me a list of collaboration features within the platform, none of which exist.
At the end, it linked to a blog post about their recent funding round. Except, the blog post was from a completely unrelated company with a similar name. Bear in mind that the originally forwarded email was from their business email account, and the body contained multiple links and references to their website.
I decided to try one more test, and asked it to do some research on my own business website and let me know what it finds out. It's been 20 minutes, and I haven't had a reply. I checked the app to see if there was any indication it's working on something for me, but nothing there either.
I love the idea of Martin, but I'll be canceling my trial - it just doesn't seem anywhere near ready yet - especially given I have to trust it to communicate on my behalf.
[+] [-] darweenist|1 year ago|reply
- I totally resonate with your criteria for an AI PA - this is very much what we're working towards with our email integration. We had been focused on voice for a while, but recently started tackling all the email use cases. Really want to get these right for you!
- Sorry for the poor onboarding job - we should make it more clear that you have to sync your calendar before we send you an email inviting you to send and forward scheduling items to Martin.
- For sending emails to contacts, this is one of our upcoming integrations that we've been building for a while - but just not ready yet! We want to make it able to send/reply to emails and fully act on threads that you attach it to. This means issuing you a unique email address for "your Martin" and managing its behavior on threads and memory of other contacts. It's a harder problem than we first anticipated, so we're working through it steadily! It should be ready in the next month or so. For now, the communications feature is just limited to texting contacts on your behalf.
- For "deep searches", it definitely isn't the greatest at digging into a topic or generating a thorough briefing for you right now. We're not sure how deep we'll go into this use case in the future, but we do plan on integrating with more specialized functions, like LinkedIn, Twitter, Maps, etc. which should make this a lot better.
Sorry again for the poor onboarding experience. I think we also got an email from you, so will reply there as well to ask for more feedback!
[+] [-] hnlurker22|1 year ago|reply
[+] [-] awwstn|1 year ago|reply
[+] [-] yrcyrc|1 year ago|reply
[+] [-] logicchains|1 year ago|reply
[+] [-] tiahura|1 year ago|reply
Want big $$$? Support Exchange 365 via MS Graph API or whatever today's preferred API is.
[+] [-] darweenist|1 year ago|reply
Exchange is on our list! Started the Microsoft compliance process a couple months ago - expecting to get support for at least Outlook or Word rolled out this year.
[+] [-] yewenjie|1 year ago|reply
[+] [-] lannisterstark|1 year ago|reply
[+] [-] darweenist|1 year ago|reply