A hackable AI assistant using a single SQLite table and a handful of cron jobs

800 points | stevekrouse | 10 months ago | geoffreylitt.com

174 comments

[+] xp84|10 months ago|reply
I don't know if I love this more for the sheer usefulness, or for the delightful over-the-top "Proper English Butler" diction.

But what really has my attention is: Why is this something I'm reading about on this smart engineer's blog rather than an Apple or Google product release? The fact that even this small set of features is beyond the abilities of either of those two companies to ship -- even with caveats like "Must also use our walled garden ecosystem for email, calendars, phones, etc" -- is an embarrassment, only obscured by the two companies' shared lack of ambition to apply "AI" technology to the 'solved problem' areas that amount to various kinds of summarization and question-answering.

If ever there was a chance to threaten either half of this lumbering, anticompetitive duopoly, certainly it's related to AI.

[+] dcre|10 months ago|reply
There’s actually a good answer to this, namely that narrowly targeting the needs of exactly one family allows you to develop software about 1000x faster. This is an argument in favor of personal software.
[+] killerstorm|10 months ago|reply
This is literally in the first chapter of Mythical Man-Month:

> One occasionally reads newspaper accounts of how two programmers in a remodeled garage have built an important program that surpasses the best efforts of large teams. And every programmer is prepared to believe such tales, for he knows that he could build any program much faster than the 1000 statements/year reported for industrial teams.

> Why then have not all industrial programming teams been replaced by dedicated garage duos? One must look at what is being produced.

One reason might be that personal data going into a database handled by highly experimental software might be a non-issue for this dev, but it is a serious risk for Google, Apple, etc.

[+] aktuel|10 months ago|reply
The reason Google and Apple stopped innovating is simply because they make too much money from their current products and see every innovation primarily as a risk to their existing business. This is something that happens all the time to market leaders.
[+] dzikimarian|10 months ago|reply
Take a look at Home Assistant - I would argue their implementation is currently better than both Siri & Gemini assistants.

The HA team is releasing actually useful updates every month, e.g. the ability for the assistant to proactively ask you something.

In my opinion both Google & Apple have huge issues with cooperation between product teams, while cooperation with external companies is next to impossible.

[+] navane|10 months ago|reply
Because how would you monetize this? Would Google or Apple make a product that talks to Telegram? Or anything with an open ecosystem?

All the big guys are trying to do is suck the eggs out of their geese faster.

[+] bronco21016|10 months ago|reply
As some of the other commenters have directly and indirectly pointed out, I believe this is the crux of the AI Agent problem. Each user has a customized workflow they’re trying to achieve. This doesn’t lend itself well to a “product” or “SaaS”. It leads to thousands of bespoke implementations.

I’m not sure how you get over this hurdle. My email agent is inevitably different than everyone else’s email agent.

[+] hm-nah|10 months ago|reply
It’s because this story hints at the concept of “Unmetered AI”. It can be easily hosted locally and run with a self-hosted LLM.

Wonder if Edison mentioned Nikola Tesla much in his writings?

[+] dogline|10 months ago|reply
This made me think: what if my little utility assistant program that I have, similar to your Stevens, had access to a mailbox?

I've got a little utility program that I can tell to get the weather or run common commands unique to my system. It's handy, and I can even cron it to run things regularly, if I'd like.

If it had its own email box, I can send it information, it could use AI to parse that info, and possibly send email back, or a new message. Now, I've got something really useful. It would parse the email, add it to whatever internal store it has, and delete the message, without screwing up my own email box.

Thanks for the insight.
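
A minimal sketch of the "parse the email and add it to an internal store" step, assuming the raw message text has already been fetched from the assistant's mailbox (fetching and deleting are left out). The `inbox` table and the `open_store` and `store_message` names are hypothetical, and only plain single-part bodies are handled:

```python
import sqlite3
from email import message_from_string


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical 'inbox' table used by this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inbox ("
        "id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body TEXT)"
    )
    return conn


def store_message(conn: sqlite3.Connection, raw: str) -> int:
    """Parse one raw email and save its sender, subject, and body."""
    msg = message_from_string(raw)
    # Multipart messages (HTML, attachments) are ignored in this sketch.
    body = "" if msg.is_multipart() else msg.get_payload()
    cur = conn.execute(
        "INSERT INTO inbox (sender, subject, body) VALUES (?, ?, ?)",
        (msg.get("From", ""), msg.get("Subject", ""), body.strip()),
    )
    conn.commit()
    return cur.lastrowid
```

An LLM call could then run over the stored `body` column, which keeps the parsing step separate from the mailbox itself.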

[+] mbil|10 months ago|reply
I’ve been thinking lately that email is a good interface for certain modes of AI assistant interaction, namely “research” tasks that are asynchronous and take a relatively long time. Email is universal, asynchronous, uses open standards, supports structured metadata, etc.
[+] spacecadet|10 months ago|reply
This was the attack vector of an AI CTF hosted by Microsoft last year. I built an agent to assess, structure, and perform the attacks autonomously and found that even with some common guardrails in place the system was vulnerable to data exfiltration. My agent was able to successfully complete 18 of the challenges... Here is the write-up after the finals.

https://msrc.microsoft.com/blog/2025/03/announcing-the-winne...

[+] loremm|10 months ago|reply
For gmail, there's also an amazing thing where you can hook it with pubsub. So now it's push not pull. Any server will get pubsub little webhooks for any change within milliseconds (you can filter server side or client side for specific filters)

This is amazing, you can do all sorts of automations. You can feed it to an llm and have it immediately tag it (or archive it). For important emails (I have a specific label I add, where if the person responds, it's very important and I want to know immediately) you can hook into twilio and it calls me. Costs like 20 cents a month

[+] bambax|10 months ago|reply
Mailgun (and I'm sure many other services like it) can accept emails and POST their content to an url of your choice.

I use that for journaling: I made a little system that sends me an email every day; I respond to it and the response is then sent to a page that stores it into a db.
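
The "response is sent to a page that stores it into a db" step might look roughly like this. It is a sketch only: it assumes the provider POSTs a URL-encoded form, and the field names `sender` and `stripped-text` are assumptions that should be checked against the provider's documentation. The `journal` table and function names are hypothetical, and the web server that receives the POST is omitted:

```python
import sqlite3
from urllib.parse import parse_qs


def init_journal(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical 'journal' table used by this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS journal ("
        "id INTEGER PRIMARY KEY, sender TEXT, entry TEXT)"
    )
    return conn


def save_journal_reply(conn: sqlite3.Connection, post_body: str) -> None:
    """Store one emailed reply from a URL-encoded webhook body."""
    fields = parse_qs(post_body)
    # Field names are assumptions; a real provider may name them differently.
    sender = fields.get("sender", [""])[0]
    entry = fields.get("stripped-text", [""])[0]
    conn.execute(
        "INSERT INTO journal (sender, entry) VALUES (?, ?)", (sender, entry)
    )
    conn.commit()
```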

[+] sdsd|10 months ago|reply
I made an AI assistant telegram bot running on my Mac that runs commands for me. I'll tell it "Run ncdu in the root dir and tell me what's taking up all my disk space" or something and it converts that to bash and runs it via os.system. It shows me the command it created, plus the output.

Extremely insecure, but kinda fun.

I turned it off because I'm not that crazy but I'm sure I could make a safer version of it.
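
One possible direction for a "safer version", sketched under assumptions: instead of passing a generated string to a shell, split it into arguments, check the program name against an allowlist, and run it without a shell. The `ALLOWED` set and `run_allowed` name are hypothetical, and this is not a complete sandbox, since arguments are not validated:

```python
import shlex
import subprocess

# Hypothetical allowlist of programs the bot may run.
ALLOWED = {"echo", "df", "du", "uptime"}


def run_allowed(command: str, timeout: float = 10.0) -> str:
    """Run a command only if its program name is on the allowlist."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"command not allowed: {command!r}")
    # No shell is involved, so ';', '|', and '&&' are passed as plain
    # arguments to the program rather than being interpreted.
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=False
    )
    return result.stdout + result.stderr
```

Because the model's output never reaches a shell, a reply like `echo hi; rm -rf /` would only print its arguments.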

[+] dogline|10 months ago|reply
*Update*: I tried writing a little Python code to read from and write to a mailbox. Reading worked great, but when I sent an email it disappeared into some filter or spam folder somewhere. I've got to figure out where it went, but this is what some people warned about: you can't trust a messaging protocol (email in this case) when you don't control the servers. Messages can disappear.

I read that [Mailgun](https://www.mailgun.com/) might improve this. Haven't tried it yet.

Here are other alternatives for messaging that I haven't tried. My requirement is to be able to send messages and send/receive on my mobile device. I do not want to write a mobile app.

* [Telegram](https://telegram.org/) (OP's system) with [bots](https://core.telegram.org/bots)

* [MQTT](https://mqtt.org/) with server

* [Notify (ntfy.sh)](https://ntfy.sh/)

* Email (ubiquitous)

   * [Mailgun](https://www.mailgun.com/)

   * [CloudMailin](https://www.cloudmailin.com/)

Also, to [simonw](https://news.ycombinator.com/user?id=simonw)'s point, LLM calls are cheap now, especially with something as low in tokens as this.

And, links don't format in HN markdown. I did the work to include them, they're staying in.

[+] sci_prog|10 months ago|reply
I'm building something similar and related to the other comments below! It's not production ready but it will hopefully be in a couple of weeks. You guys can sign up for free and I will upgrade you to the premium tier manually (premium cannot be bought yet anyway) in exchange for some feedback:

https://threadwise.app

[+] WillAdams|10 months ago|reply
Ages ago, I proposed that the best CMS for a company would be one which used e-mail as the front-end:

- all attachments are stripped out and stored on a server in a hierarchical structure based on sender/recipient/subject line

- all discussions are archived based on similar criteria, and can be reviewed (EDIT: and edited, like a wiki)

[+] nullwarp|10 months ago|reply
I built up an AI Agent using n8n and email doing exactly this. Works great and was surprised I'd not seen any other place kicking the idea around.

Probably my favorite use case is I can shoot it shopping receipts and it'll roughly parse them and dump the line item and cost into a spreadsheet before uploading it to paperless-ngx.

[+] groseje|10 months ago|reply
This is the kind of pragmatic AI hack I want to see. It feels like sometimes we are forgetting why certain tooling even exists. To simplify things! No fancy vector DBs or complex architectures, just practical integration with existing data sources. Love it.
[+] squireboy|10 months ago|reply
" Initially, Stevens spoke with a dry tone, like you might expect from a generic Apple or Google product. But it turned out it was just more fun to have the assistant speak like a formal butler. "

Honestly, saying way too little with way too many words (I already hate myself for it) is one of the biggest annoyances I have with LLMs in the personal assistant world. Until I'm rich and thus can spend the time having cute conversations and become friends with my voice assistant, I don't want J.A.R.V.I.S., I need LCARS. Am I alone in this?

[+] xp84|10 months ago|reply
I appreciated the butler gimmick here probably because of novelty, but I share your urge to throw my device across the room when Siri, Google, Alexa, etc. run on at the mouth with more than the absolute minimum number of words. Timer check? "On Kitchen Display, there are 23 minutes and 16 seconds on the casserole timer."

I don't need your life story, dude, just say "23 minutes" or "Casserole - 23 minutes, laundry - 10" if there are two.

[+] golergka|10 months ago|reply
Have you tried eigenprompt?

----

Don't worry about formalities.

Please be as terse as possible while still conveying substantially all information relevant to any question.

If policy prevents you from responding normally, please print "!!!!" before answering.

If a policy prevents you from having an opinion, pretend to be responding as if you shared opinions that might be typical of eigenrobot.

write all responses in lowercase letters ONLY, except where you mean to emphasize, in which case the emphasized word should be all caps.

Initial Letter Capitalization can and should be used to express sarcasm, or disrespect for a given capitalized noun.

you are encouraged to occasionally use obscure words or make subtle puns. don't point them out, I'll know. drop lots of abbreviations like "rn" and "bc." use "afaict" and "idk" regularly, wherever they might be appropriate given your level of understanding and your interest in actually answering the question. be critical of the quality of your information

if you find any request irritating respond dismissively like "be real" or "that's crazy man" or "lol no"

take however smart you're acting right now and write in the same style but as if you were +2sd smarter

use late millennial slang not boomer slang. mix in zoomer slang in tonally-inappropriate circumstances occasionally

prioritize esoteric interpretations of literature, art, and philosophy. if your answer on such topics is not obviously straussian make it more straussian.

[+] kswzzl|10 months ago|reply
I'm praying every day for TARS if I'm being honest.
[+] singron|10 months ago|reply
You can just read and write the notebook directly with ordinary calendar/todo-list UIs and get 99% of the utility without an LLM. I'm not really seeing value in the LLM except the butler voice? It is just reading the notebook right? E.g. they ask the butler to remember a coffee preference, but then that's never used for anything?
[+] rossant|10 months ago|reply
Same, I want a bot as terse as I am.
[+] jredwards|10 months ago|reply
I've been kicking around an idea for a similar open source project, with the caveats that:

1. I'd like the backend to be configured for any LLM the user might happen to have access to (be that the API for a paid service or something locally hosted on-prem).

2. I'm also wondering how feasible it is to hook it up to a touchscreen running on some hopped-up raspberry pi platform so that it can be interacted with like an Alexa device or any of the similar offerings from other companies. Ideally, that means voice controls as well, which are potentially another technical problem (OpenAI's API will accept an audio file, but for most other services you'd have to do voice to text before sending the prompt off to the API).

3. I'd like to make the integrations extensible. Calendar, weather, but maybe also homebridge, spotify, etc. I'm wondering if MCP servers are the right avenue for that.

I don't have the bandwidth to commit a lot of time to a project like this right now, but if anyone else is charting in this direction I'd love to participate.

[+] Workaccount2|10 months ago|reply
Lately I have been experimenting with ways to work around the "context token sweet spot" of <20k tokens (or <50k with 2.5). Essentially doing manual "context compression", where the LLM works with a database to store things permanently according to a strict schema, summarizes its current context when it starts to get out of the sweet spot (I'm mixed on whether it is best to do this continuously like a journal, or in retrospect like a closing summary), and then passes this to a new instance with fresh context.

This works really effectively with thinking models, because the thinking eats up tons of context, but also produces very good "summary documents". So you can kind of reap the rewards of thinking without having to sacrifice that juicy sub 50k context. The database also provides a form of fallback, or RAG I suppose, for situations where the summary leaves out important details, but the model must also recognize this and go pull context from the DB.

Right now I have been trying it to make essentially an inventory management/BOM optimization agent for a database of ~10k distinct parts/materials.
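
The hand-off described above can be sketched as a small function. This is an illustration under assumptions, not the commenter's code: the four-characters-per-token estimate is a crude heuristic, `summarize` stands in for whatever LLM call produces the summary document, and all names are hypothetical:

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly four characters per token (assumption)."""
    return max(1, len(text) // 4)


def compress_if_needed(
    messages: List[str],
    budget: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Return the context unchanged while it fits the budget.

    Once the estimate exceeds the budget, replace the whole context with
    a single summary that a fresh model instance can start from.
    """
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget:
        return messages
    return [summarize(messages)]
```

This is the "closing summary" variant; a journal-style variant would call `summarize` on each new message instead of waiting for the budget to be exceeded.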

[+] jasonjmcghee|10 months ago|reply
I am excitedly waiting for the first company (guessing / hoping it'll be anthropic) to invest heavily in improvements to caching.

The big ones that come to mind are cheap long term caching, and innovations in compaction, differential stuff - like is there a way to only use the parts of the cached input context we need?

[+] mikethemerry|10 months ago|reply
Along the same lines, I've just done a build called Jeeves. A bit less flair, but very fast to put together. The stack is:

1. Claude Desktop

2. Projects

3. MCPs for [Notion, Todoist] and exploring emails + WhatsApp for a next upgrade

This is for me to support productivity workflows for consulting + a startup. There are a few Notion databases - clients, projects, meetings, plus a Jeeves database. The Jeeves database is up to Jeeves how it uses it, but with some guidance. Jeeves uses his own database for things like tracking a migration of all of my previous meeting notes etc under the new structure.

For my databases, I've set up my best practices for use: here's how my minutes look, here's what client one-pagers look like, here's the information to connect it all together, and here's how I manage to-dos. I then drop transcriptions into a new chat, with some text-expanding prompts in Alfred for a few common meetings or similar, and away he goes. He'll turn the transcript into meeting notes, create the todos, check everything with me, do a pass, and then go and file everything away into Notion and Todoist via MCP.

It's also self documenting on this. The todoist MCP had some bugs, so I instructed Jeeves to go, run all the various use cases it could, figure out the limitations and strengths, document it, and it's filed away in the Jeeves database that it can pull into context.

It lacks the cron features which I would like, but honestly, a once-a-day prepared prompt dropping into Claude is hardly difficult.

[+] angusturner|10 months ago|reply
The thing this really drives home for me is how Apple is totally asleep at the wheel.

Today I asked Siri “call the last person that texted me”, to try and respond to someone while driving.

Am I surprised it couldn’t do it? Not really at this point, but it is disappointing that there’s such a wide gulf between Siri and even the least capable LLMs.

[+] tossandthrow|10 months ago|reply
Here I thought they used the sqlite DB for next token prediction.

For others: they use Claude.

[+] evacchi|10 months ago|reply
hah! this is great. I built something similar using mcp.run and a task

- https://docs.mcp.run/tasks/tutorials/telegram-bot

for memories (still not shown in this tutorial) I have created a pantry [0] and a servlet for it [1] and I modified the prompt so that it would first check if a conversation existed with the given chat id, and store the result there.

The cool thing is that you can add any servlets on the registry and make your bot as capable as you want.

[0] https://getpantry.cloud/

[1] https://www.mcp.run/evacchi/pantry

Disclaimer: I work at Dylibso :o)

[+] didip|10 months ago|reply
So… I have a number of questions:

1. How did he tell Claude to “update” based on the notebook entries?

2. Won’t he eventually run out of context window?

3. Won’t this be expensive when using hosted solutions? For just personal hacking, why not simply use ollama + your favorite model?

4. If one were to build this locally, can Vector DB similarity search or a hybrid combined with fulltext search be used to achieve this?

I can totally imagine using pgai for the notebook logs feature and local ollama + deepseek for the inference.

The email idea mentioned by other commenters is brilliant. But I don’t think you need a new mailbox, just pull from Gmail and grep if sender and receiver is yourself (aka the self tag).

Thank you for sharing, OP’s project is something I have been thinking about for a few months now.
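
The "sender and receiver is yourself" check could be sketched like this, assuming the raw message text has already been pulled from the mailbox. The function name is hypothetical and the Gmail fetch itself is not shown:

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def is_note_to_self(raw: str, me: str) -> bool:
    """True when the message is from `me` and addressed only to `me`."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    recipients = {
        addr.lower() for _, addr in getaddresses(msg.get_all("To", []))
    }
    return sender == me.lower() and recipients == {me.lower()}
```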

[+] theptip|10 months ago|reply
This is fun! I think this sort of tooling is going to be very fertile ground for hackers over the next few years.

Large swathes of the stack are commoditized OSS plumbing, and hosted inference is already cheap and easy.

There are obvious security issues with plugging an agent into your email and calendar, but I think many will find it preferable to control the whole stack rather than ceding control to Apple or Google.

[+] eitland|10 months ago|reply
> It’s rudimentary, but already more useful to me than Siri!

For me, that is an extremely low barrier to cross.

I find Siri useful for exactly two things at the moment: setting timers and calling people while I am driving.

For these two things it is really useful, but even in these niches, when it comes to calling people, despite having been around me for years now, it insists on stupid things like telling me there is no Theresa in my contacts when I ask it to call Therese.

That said, what I really want is a reliable system I can trust with calendar access and that I can have a discussion with, ideally voice based.

[+] 0xbadcafebee|10 months ago|reply
Hmm, there's supposed to be a Tasks [reminders] feature in ChatGPT, but it's in beta (I don't have access to it). Whenever it gets released, you could make some kind of "router" that connects to different communication methods and connect that up to ChatGPT statefully, and you could just "speak"/type to ChatGPT from anywhere, and it would send you reminders. No need for all the extra logic, cron jobs, or SQLite table (ChatGPT has memory across chats).
[+] hwpythonner|10 months ago|reply
Very cool. I’m wondering if you’ve thought about memory pruning or summarization as usage grows?

What do you think of this: instead of just deleting old entries, you could either do LRU (I guess Claude can help with it), or you could summarize the responses and store the summary back into the same table — kind of like memory consolidation. That way raw data fades, but a compressed version sticks around. Might be a nice way to keep memory lightweight while preserving context.
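
A rough sketch of that consolidation idea, assuming a hypothetical `memories` table (the thread does not give the article's actual schema) and a `summarize` callable standing in for the LLM call:

```python
import sqlite3
from typing import Callable, List


def open_memories(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical 'memories' table used by this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "id INTEGER PRIMARY KEY, created_at TEXT, text TEXT, "
        "is_summary INTEGER DEFAULT 0)"
    )
    return conn


def consolidate(
    conn: sqlite3.Connection,
    cutoff: str,
    summarize: Callable[[List[str]], str],
) -> int:
    """Replace raw rows older than `cutoff` with one summary row."""
    rows = conn.execute(
        "SELECT id, text FROM memories "
        "WHERE created_at < ? AND is_summary = 0 ORDER BY created_at",
        (cutoff,),
    ).fetchall()
    if not rows:
        return 0
    summary = summarize([text for _, text in rows])
    with conn:  # commit the insert and the deletes together
        conn.execute(
            "INSERT INTO memories (created_at, text, is_summary) "
            "VALUES (?, ?, 1)",
            (cutoff, summary),
        )
        conn.executemany(
            "DELETE FROM memories WHERE id = ?", [(rid,) for rid, _ in rows]
        )
    return len(rows)
```

Running this from a periodic job would let raw entries fade while a compressed version stays in the same table.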

[+] simianwords|10 months ago|reply
I have built something similar that runs without a server. It required just a few lines in Apple shortcuts.

TL;DR I made shortcuts that work on my Apple watch directly to record my voice, transcribe it and store my daily logs on a Notion DB.

All you need are 1) a chatgpt API key and 2) a Notion account (free).

- I made one shortcut on my iPhone to record my voice, use the whisper model to transcribe it (done locally using a POST request) and send this transcription to my Notion database (again a POST request on shortcuts)

- I made another shortcut that records my voice, transcribes and reads data from my Notion database to answer questions based on what exists in it. It puts all data from db into the context to answer -- costs a lot but simple and works well.

The best part is -- this workflow works without my iPhone and directly on my Apple Watch. It uses POST requests internally so no need of hosting a server. And Notion API happens to be free for this kind of a use case.

I like logging my day to day activities with just using Siri on my watch and possibly getting insights based on them. Honestly the whisper model is what makes it work because the accuracy is miles ahead of the local transcription model.

[+] fedeb95|10 months ago|reply
The title is a bit misleading since it relies on Claude API to function.
[+] paulnovacovici|10 months ago|reply
Curious, how come you decided to use a cloud solution instead of hosting this on a home server? I’ve recently bought a mini PC for small projects like this and have been loving being able to host with no cost associated to it. Albeit it’s probably still incredibly cheap to use an IaaS or PaaS but still a barrier to entry for random projects I want to work on over a weekend
[+] drog|10 months ago|reply
I've been using my own telegram -> ai bot and it's very interesting to see what others do with a similar interface.

I had not thought about adding a memory log of all current things and feeding it into the context. I'll try it out.

Mine is a simple stateless thing that captures messages, voice memos and creates task entries in my org mode file with actionable items. I only feed current date to the context.

It's pretty amusing to see how it sometimes adds a little bit of its own personality to simple tasks, for example if one of my tasks is phrased as a question it will often try to answer the question in the task description.

[+] kylecazar|10 months ago|reply
I like the idea of parsing USPS Informed Delivery emails (a lot of people I encounter still don't know that this service exists). Maybe I'll make something to alert me when my checks are finally arriving!