top | item 42725163

Build a Database in 3000 Lines with 0 Dependencies

422 points| not_a_boat | 1 year ago |build-your-own.org | reply

56 comments

order
[+] btown|1 year ago|reply
If you want 1 dependency rather than 0, there are various low-level libraries designed for building full-featured databases.

For one of these, RocksDB, one reference point is how CockroachDB was built on top of them for many years and many successful Jepsen tests (until they transitioned to an in-house solution).

https://www.cockroachlabs.com/blog/cockroachdb-on-rocksd/

https://www.cockroachlabs.com/blog/pebble-rocksdb-kv-store/

Another possibility is Apple's FoundationDB, with an interesting discussion here: https://news.ycombinator.com/item?id=37552085

[+] vrnvu|1 year ago|reply
A similar resource I recently discovered and it’s not that popular: https://github.com/pingcap/talent-plan/tree/master/courses/r...

Writing a Bitcask(KV wal) like db in Rust. Really cool and simple ideas. The white paper is like 5 pages.

[+] asa400|1 year ago|reply
Bitcask is great, it's such an elegant idea. I've built a few different toy implementations in different languages and learned something new each time. YMMV based on how many deps you do or don't want to use, how complete you want to go, but it's a totally doable small-ish project.
[+] andai|1 year ago|reply
I looked into this at one point, I was typing out entire codebases for didactic purposes: SQLite 3 was 120,000 lines of code, but SQLite 2 was 12,000.

So for a bit more effort you get a battle tested real world thing!

[+] cryptonector|1 year ago|reply
The proprietary test suite for SQLite3 is much much larger still. The battle-testedness comes in great part from that.
[+] ianmcgowan|1 year ago|reply
Really puts the auto- in didact! Very curious to hear how this worked for you; it’s almost directly the opposite of the copilot approach.

I learned assembler by typing in listings from magazines and hand dis-assembling and debugging on paper. Your approach seems similar in spirit, but who has the times these days?

[+] esmy|1 year ago|reply
Wait you took a repo and started typing it into the IDE? Could you please expand on what benefits you noticed and how it affected your understanding of the language? It sounds like a fascinating way to force attention to the code simply reading it wouldn't.
[+] tonyedgecombe|1 year ago|reply
>I was typing out entire codebases for didactic purposes

I've read about an author who did this (I can't remember their name right now), writing down the works of another author they wanted to learn from.

[+] frankie_t|1 year ago|reply
can you elaborate on typing out for didactic purposes, please?
[+] 867-5309|1 year ago|reply
one does not simply type out 120 000 lines of code..
[+] cryptonector|1 year ago|reply
Re: copy-on-write (CoW) B-tree vs append-only log + non-CoW B-tree, why not both?

I.e., just write one file (or several) as a B-tree + a log, appending to log, and once in a while merging log entries into the B-tree in a CoW manner. Essentially that's what ZFS does, except it's optional when it really shouldn't be. The whole point of the log is to amortize the cost of the copy-on-write B-tree updates because CoW B-tree updates incur a great deal of write magnification due to having to write all new interior blocks for all leaf node writes. If you wait to accumulate a bunch of transactions then when you finally merge them into the tree you will be able to share many new interior nodes for all those leaf nodes. So just make the log a first-class part of the database.

Also, the log can include small indices of log entries since the last B-tree merge, and then you can accumulate even more transactions in the log before having to merge into the B-tree, thus further amortizing all that write magnification. This approaches an LSM, but with a B-tree at the oldest layer.

[+] qianli_cs|1 year ago|reply
I've read a similar series from Phil back in 2020: "Writing a SQL database from scratch in Go" https://notes.eatonphil.com/database-basics.html

The code is available on GitHub: https://github.com/eatonphil/gosql (it's specifically a PostgreSQL implementation in Go).

It's cool to build a database in 3000 lines, but for a real production-ready database you'll need testing. Would love to see some coverage on correctness and reliability tests. For example, SQLite has about 590 times more test code than the library itself. (https://www.sqlite.org/testing.html)

[+] _zoltan_|1 year ago|reply
I've started learning Velox last year, and it's a staggering amount of code. Sure, it has a ton of dependency because it wants to support so many things, but I feel like the core itself is also very complex.

I'm not sold on complexity being a necessity in software engineering, as I'm sure a lot of you also aren't. Yet we see a lot of behemoth projects.

[+] wwarren|1 year ago|reply
Man I got the first edition of this book and it was so bad. Hopefully this is better…
[+] miffy900|1 year ago|reply
Can you elaborate on why it was so bad?
[+] koushik_x|1 year ago|reply
I've recently started to think about covering some advanced topics like compilers, databases, interpreters. Do u know any good oss repos where I can learn and build them from scratch on my own
[+] amelius|1 year ago|reply
Waiting for a post where he writes a distributed database.
[+] MarcelOlsz|1 year ago|reply
You're going to feel awful silly when he does it in like, 3 lines.