top | item 47049307

Gave Claude photographic memory for $0.0002/screenshot

6 points| jzapletal | 13 days ago |github.com

7 comments

jzapletal|13 days ago

We built an open-source tool that screenshots your desktop and feeds summaries to Claude/Cursor via MCP.

What surprised us:

- Cost: $0.0002/screenshot (we budgeted 100x more), guess cloud vision APIs got cheap fast

- CPU: 5% (exp. 50%) and laptop stays cool

- Quality: night and day vs local models, we tried running vision locally first and it was mediocre

It works by triggering a screenshot on activity, sending it to a cloud vision model for summarization, then deleting the screenshot and storing only the text in local SQLite. You query it via MCP – "what was I working on before lunch?" and Claude actually knows.

quinncom|13 days ago

Screen sharing to any remote API is a nonstarter for me. I don’t care if the API claims ZDR; Snowden’s revelations are still echoing. So, I appreciate that the app supports a custom endpoint for local models.

Which local models did you try? GLM-OCR seems like it would excel at this: https://huggingface.co/zai-org/GLM-OCR

BloondAndDoom|13 days ago

This is great stuff, have you tried with local models? Summarization etc. is easy but I haven’t played with image to text models locally? Any ideas. I can run 32b models fine and for summarization kind of tasks they are extremely good I’d even say more than necessary

fidorka|13 days ago

Nice, could I provide this memory to openclaw as well?