Show HN: I replaced vector databases with Git for AI memory (PoC)
198 points | alexmrv | 6 months ago | github.com
The insight: Git already solved versioned document management. Why are we building complex vector stores when we could just use markdown files with Git's built-in diff/blame/history?
How it works:
- Memories stored as markdown files in a Git repo
- Each conversation = one commit
- git diff shows how understanding evolves over time
- BM25 for search (no embeddings needed)
- LLMs generate search queries from conversation context

Example: Ask "how has my project evolved?" and it uses git diff to show actual changes in understanding, not just similarity scores.
This is very much a PoC - rough edges everywhere, not production ready. But it's been working surprisingly well for personal use. The entire index for a year of conversations fits in ~100MB RAM with sub-second retrieval.
The cool part: You can git checkout to any point in time and see exactly what the AI knew then. Perfect reproducibility, human-readable storage, and you can manually edit memories if needed.
GitHub: https://github.com/Growth-Kinetics/DiffMem
Stack: Python, GitPython, rank-bm25, OpenRouter for LLM orchestration. MIT licensed.
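For a concrete picture, here is a minimal sketch of the core loop (not the repo's actual code; the paths and function names are illustrative):

```python
# Minimal sketch of the idea, not DiffMem's actual code.
# Assumes memories live as markdown files inside a Git repo.
from pathlib import Path

from git import Repo              # GitPython
from rank_bm25 import BM25Okapi   # rank-bm25

REPO_PATH = Path("memory-repo")   # illustrative location
repo = Repo(REPO_PATH)

# Load every markdown memory file and build a BM25 index over its tokens.
docs = sorted(REPO_PATH.glob("**/*.md"))
bm25 = BM25Okapi([d.read_text().lower().split() for d in docs])

def search(query: str, k: int = 3) -> list[Path]:
    """Return the top-k memory files for an (LLM-generated) query."""
    scores = bm25.get_scores(query.lower().split())
    ranked = sorted(zip(scores, docs), key=lambda p: p[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]

def write_memory(rel_path: str, text: str, message: str) -> None:
    """Update a memory file and record the change as one commit."""
    (REPO_PATH / rel_path).write_text(text)
    repo.index.add([rel_path])
    repo.index.commit(message)

# "How has my project evolved?" -> diff one memory file across history.
print(repo.git.log("-p", "--follow", "projects/diffmem.md"))
```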
Would love feedback on the approach. Is this crazy or clever? What am I missing that will bite me later?
BenoitP|6 months ago
aszen|6 months ago
alexmrv|6 months ago
Then mention she is 10,
A few years later she is 12, but now I call her by her name.
I have struggled to get any of the RAG approaches to handle this effectively. It is also 3 entries, but 2 of them are no longer useful; they are nothing but noise in the system.
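The git-backed version keeps exactly one current entry per fact and pushes the stale values into history. Roughly like this (the file path and the name are made up):

```python
# Sketch: one fact lives in one file; updates overwrite it in place,
# so stale values exist only in Git history, never in the search index.
from pathlib import Path

from git import Repo

repo = Repo("memory-repo")    # illustrative repo
rel = "people/daughter.md"    # illustrative memory file

# Three updates over the years, but always ONE current entry on disk:
for text, msg in [
    ("My daughter is 10.\n", "daughter is 10"),
    ("My daughter is 12.\n", "daughter is 12"),
    ("Emma (my daughter) is 12.\n", "call daughter by her name"),
]:
    (Path(repo.working_dir) / rel).write_text(text)
    repo.index.add([rel])
    repo.index.commit(msg)

# BM25 only ever indexes the latest text; the two stale facts are
# recoverable on demand instead of polluting retrieval:
print(repo.git.log("-p", rel))
```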
petesergeant|6 months ago
> Why are we building complex vector stores
Because we want to use embeddings.
OutOfHere|6 months ago
Because BM25 essentially relies on word matching, there is no way it will extend to concept matching.
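A quick illustration with rank-bm25 (the library the post uses): a synonym query scores zero because no tokens overlap.

```python
from rank_bm25 import BM25Okapi

corpus = [
    "my automobile needs new tires".split(),
    "the weather is nice today".split(),
]
bm25 = BM25Okapi(corpus)

print(bm25.get_scores("automobile".split()))  # doc 0 scores > 0: exact token match
print(bm25.get_scores("car".split()))         # all zeros: the concept is missed
```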
cchance|6 months ago
lsb|6 months ago
That is what's known in FAISS as a "flat" index, just one thing after another. And obviously you can query by primary key to the key-value store that is git, and do atomic updates as you'd expect. In SQL land this is an unindexed column, you can do primary key lookups on the table, or you can look through every row in order to find what you want.
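For reference, that flat baseline is only a few lines in FAISS (dimensions and data here are arbitrary):

```python
import numpy as np
import faiss

d = 1024                                            # embedding dimensionality
xb = np.random.rand(10_000, d).astype("float32")    # stored embeddings
xq = np.random.rand(1, d).astype("float32")         # query embedding

index = faiss.IndexFlatL2(d)   # "flat": brute-force scan, no compression
index.add(xb)
distances, ids = index.search(xq, 10)   # exact 10 nearest neighbors
```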
If you don't need fast query times, this could work great! You could also use SQL (maybe an AWS Aurora Postgres/MySQL table?) and stuff the fact and its embedding into a table, and get declarative relational queries (find me the closest 10 statements users A-J have made to embedding [0.1, 0.2, -0.1, ...] within the past day). Lots of SQL databases are getting embedding search (Postgres, sqlite, and more) so that will allow your embedding search to happen in a few milliseconds instead of a few seconds.
It could be worth sketching out how to use SQLite for your application, instead of using files on disk: SQLite was designed to be a better alternative to opening a file (what happens if power goes out while you are writing a file? what happens if you want to update two people's records, and not get caught mid-update by another web app process?) and is very well supported by many language ecosystems.
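A sketch of that SQLite route using only the standard library plus numpy; the schema is invented for illustration, and the nearest-neighbor step is still a brute-force scan:

```python
import sqlite3

import numpy as np

con = sqlite3.connect("memories.db")
con.execute(
    "CREATE TABLE IF NOT EXISTS memory ("
    "  id INTEGER PRIMARY KEY,"
    "  fact TEXT NOT NULL,"
    "  embedding BLOB NOT NULL)"   # raw float32 bytes
)

def add_memory(fact: str, emb: np.ndarray) -> None:
    # 'with con' wraps the write in a transaction; grouping several
    # updates in one block makes them atomic, which is the crash-safety
    # argument for SQLite over bare files.
    with con:
        con.execute(
            "INSERT INTO memory (fact, embedding) VALUES (?, ?)",
            (fact, emb.astype(np.float32).tobytes()),
        )

def nearest(query: np.ndarray, k: int = 10) -> list[str]:
    # Brute-force scan in Python; extensions like pgvector push this
    # distance computation into the database itself.
    rows = con.execute("SELECT fact, embedding FROM memory").fetchall()
    scored = [
        (np.linalg.norm(query - np.frombuffer(blob, dtype=np.float32)), fact)
        for fact, blob in rows
    ]
    return [fact for _, fact in sorted(scored)[:k]]
```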
Then, to take full advantage of vector embedding engines: what happens if my embedding is 1024 dimensions and each one is a 32-bit floating-point value? Do I need to save all of that precision?

- Is 16-bit okay? 8-bit floats?
- What about reducing the dimensionality?
- Is accuracy and recall good enough if I represent each dimension with an index into a palette of the best 256 floats for that dimension?
- What about representing each pair of dimensions with an index into a palette of the best 256 pairs of floats for those two dimensions?
- What about, instead of looking through every embedding one by one, using the knowledge that people talk about one of three major topics: keep a separate index for each topic, find your closest topic first (or maybe the closest two topics?), and then search only those smaller indices?

Each of these hypotheticals is literally a different "index string" in an embedding search library called FAISS, and could easily be thousands of lines of code if you did it yourself.
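Concretely, each of those hypotheticals maps onto a FAISS factory string (sizes follow the 1024-dimension example above; the quantized and IVF variants also need index.train(xb) before adding vectors):

```python
import faiss

d = 1024
flat   = faiss.index_factory(d, "Flat")          # full 32-bit precision scan
fp16   = faiss.index_factory(d, "SQfp16")        # 16-bit scalar quantization
int8   = faiss.index_factory(d, "SQ8")           # ~256-value palette per dimension
pq     = faiss.index_factory(d, "PQ512x8")       # 256-entry palette per dimension PAIR
topics = faiss.index_factory(d, "IVF3,PQ512x8")  # 3 topic buckets, then PQ inside each

topics.nprobe = 2   # search the closest two topics, not just one
```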
It’s definitely a good learning experience to implement your own embedding database atop git! Especially if you run it in production! 100MB is small enough that anything reasonable is going to be fast.
mingtianzhang|6 months ago
meander_water|6 months ago
Also, there are tradeoffs associated with using BM25 instead of embedding similarity. You're essentially trading semantic understanding for computational speed and keyword matching.
[0] https://github.com/xhluca/bm25s
mattnewton|6 months ago
I found it was both much simpler and more accurate at the cost of marginally more time and tokens, compared to RAG on embedded chunks with a vector store.
Shameless plug: https://www.matthewnewton.com/blog/replacing-rag
jarirajari|6 months ago
cc_ashby|6 months ago
alexmrv|6 months ago
rekttrader|6 months ago
No shade on your project; this is an emerging space and we can all use novel approaches.
Keep it up!
alexmrv|6 months ago
bob1029|6 months ago
For code search only, BM25 might be a bit overkill and not exactly what you want. FM indexes would be a simpler and faster way to implement pure substring search.
Maybe having both kinds of search at the same time could work better than either in isolation. You could frame them as "semantic" and "exact" search from the perspective of the LLM tool calls. The prompt could then say things like "for searching the codebase use FunctionA, for searching requirements or issues, use FunctionB."
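That split is just two tool schemas handed to the model. A sketch in the OpenAI-style function-calling format, with invented names:

```python
# Hypothetical tool definitions exposing both retrievers to the LLM.
tools = [
    {
        "type": "function",
        "function": {
            "name": "semantic_search",   # e.g. BM25 over requirements/issues
            "description": "Search requirements, issues, and notes by topic.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "exact_search",      # e.g. FM-index substring match over code
            "description": "Find exact substrings in the codebase.",
            "parameters": {
                "type": "object",
                "properties": {"substring": {"type": "string"}},
                "required": ["substring"],
            },
        },
    },
]
```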
albertdessaint|6 months ago
namrog84|6 months ago
I could envision a bunch of use cases where this works well. I've personally encountered scenarios where the AI gets hung up on an irrelevant, outdated fact, but could still look it up if specifically needed.
I could even see an automated short summary of the outdated history being kept updated in the vector DB from this too, so not all context is lost.
Keep up the great work!
richardblythman|6 months ago
jmtulloss|6 months ago
alexmrv|6 months ago
jerpint|6 months ago
https://github.com/jerpint/context-llemur
The major difference is that a conversation doesn't get stored; the LLM (or you) can use the MCP/CLI to apply the relevant context updates.
danshalev7|6 months ago
aszen|6 months ago
alexmrv|6 months ago
And you can _choose_ to explore the history, which in the most common case is not even needed.
entanglr|6 months ago
simplecto|6 months ago
The use of commit-hooks is also very clever (mentioned here in the replies)
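The hook itself isn't shown in this thread; one plausible shape is a post-commit hook that rebuilds the BM25 index after every memory commit (paths and filenames hypothetical):

```python
#!/usr/bin/env python3
# .git/hooks/post-commit  (make executable with chmod +x)
# Hypothetical: rebuild and pickle the BM25 index after each commit,
# so searches never run against a stale index.
import pickle
from pathlib import Path

from rank_bm25 import BM25Okapi

docs = sorted(Path(".").glob("**/*.md"))   # hooks run from the worktree root
corpus = [d.read_text().lower().split() for d in docs]

with open(".bm25_index.pkl", "wb") as fh:
    pickle.dump({"paths": docs, "bm25": BM25Okapi(corpus)}, fh)
```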
_pdp_|6 months ago
nunodonato|6 months ago
mjk3026|6 months ago
cc_ashby|6 months ago
page_index|6 months ago
skyzouwdev|6 months ago
thecopy|6 months ago
alexmrv|6 months ago