If you want 1 dependency rather than 0, there are various low-level libraries designed for building full-featured databases.
For one of these, RocksDB, one reference point is how CockroachDB was built on top of them for many years and many successful Jepsen tests (until they transitioned to an in-house solution).
Bitcask is great, it's such an elegant idea. I've built a few different toy implementations in different languages and learned something new each time. YMMV based on how many deps you do or don't want to use, how complete you want to go, but it's a totally doable small-ish project.
Really puts the auto- in didact! Very curious to hear how this worked for you; it’s almost directly the opposite of the copilot approach.
I learned assembler by typing in listings from magazines and hand dis-assembling and debugging on paper. Your approach seems similar in spirit, but who has the times these days?
Wait you took a repo and started typing it into the IDE? Could you please expand on what benefits you noticed and how it affected your understanding of the language? It sounds like a fascinating way to force attention to the code simply reading it wouldn't.
Re: copy-on-write (CoW) B-tree vs append-only log + non-CoW B-tree, why not both?
I.e., just write one file (or several) as a B-tree + a log, appending to log, and once in a while merging log entries into the B-tree in a CoW manner. Essentially that's what ZFS does, except it's optional when it really shouldn't be. The whole point of the log is to amortize the cost of the copy-on-write B-tree updates because CoW B-tree updates incur a great deal of write magnification due to having to write all new interior blocks for all leaf node writes. If you wait to accumulate a bunch of transactions then when you finally merge them into the tree you will be able to share many new interior nodes for all those leaf nodes. So just make the log a first-class part of the database.
Also, the log can include small indices of log entries since the last B-tree merge, and then you can accumulate even more transactions in the log before having to merge into the B-tree, thus further amortizing all that write magnification. This approaches an LSM, but with a B-tree at the oldest layer.
It's cool to build a database in 3000 lines, but for a real production-ready database you'll need testing. Would love to see some coverage on correctness and reliability tests. For example, SQLite has about 590 times more test code than the library itself. (https://www.sqlite.org/testing.html)
I've started learning Velox last year, and it's a staggering amount of code. Sure, it has a ton of dependency because it wants to support so many things, but I feel like the core itself is also very complex.
I'm not sold on complexity being a necessity in software engineering, as I'm sure a lot of you also aren't. Yet we see a lot of behemoth projects.
I've recently started to think about covering some advanced topics like compilers, databases, interpreters. Do u know any good oss repos where I can learn and build them from scratch on my own
[+] [-] btown|1 year ago|reply
For one of these, RocksDB, one reference point is how CockroachDB was built on top of them for many years and many successful Jepsen tests (until they transitioned to an in-house solution).
https://www.cockroachlabs.com/blog/cockroachdb-on-rocksd/
https://www.cockroachlabs.com/blog/pebble-rocksdb-kv-store/
Another possibility is Apple's FoundationDB, with an interesting discussion here: https://news.ycombinator.com/item?id=37552085
[+] [-] vrnvu|1 year ago|reply
Writing a Bitcask(KV wal) like db in Rust. Really cool and simple ideas. The white paper is like 5 pages.
[+] [-] asa400|1 year ago|reply
[+] [-] andai|1 year ago|reply
So for a bit more effort you get a battle tested real world thing!
[+] [-] cryptonector|1 year ago|reply
[+] [-] ianmcgowan|1 year ago|reply
I learned assembler by typing in listings from magazines and hand dis-assembling and debugging on paper. Your approach seems similar in spirit, but who has the times these days?
[+] [-] esmy|1 year ago|reply
[+] [-] tonyedgecombe|1 year ago|reply
I've read about an author who did this (I can't remember their name right now), writing down the works of another author they wanted to learn from.
[+] [-] frankie_t|1 year ago|reply
[+] [-] 867-5309|1 year ago|reply
[+] [-] cryptonector|1 year ago|reply
I.e., just write one file (or several) as a B-tree + a log, appending to log, and once in a while merging log entries into the B-tree in a CoW manner. Essentially that's what ZFS does, except it's optional when it really shouldn't be. The whole point of the log is to amortize the cost of the copy-on-write B-tree updates because CoW B-tree updates incur a great deal of write magnification due to having to write all new interior blocks for all leaf node writes. If you wait to accumulate a bunch of transactions then when you finally merge them into the tree you will be able to share many new interior nodes for all those leaf nodes. So just make the log a first-class part of the database.
Also, the log can include small indices of log entries since the last B-tree merge, and then you can accumulate even more transactions in the log before having to merge into the B-tree, thus further amortizing all that write magnification. This approaches an LSM, but with a B-tree at the oldest layer.
[+] [-] airstrike|1 year ago|reply
[+] [-] phoronixrly|1 year ago|reply
[+] [-] qianli_cs|1 year ago|reply
The code is available on GitHub: https://github.com/eatonphil/gosql (it's specifically a PostgreSQL implementation in Go).
It's cool to build a database in 3000 lines, but for a real production-ready database you'll need testing. Would love to see some coverage on correctness and reliability tests. For example, SQLite has about 590 times more test code than the library itself. (https://www.sqlite.org/testing.html)
[+] [-] anacrolix|1 year ago|reply
[+] [-] ComputerGuru|1 year ago|reply
[+] [-] bArray|1 year ago|reply
The idea in mind was to use it for something like an RSS feed reader.
[+] [-] codr7|1 year ago|reply
https://github.com/codr7/whirlog
[+] [-] _zoltan_|1 year ago|reply
I'm not sold on complexity being a necessity in software engineering, as I'm sure a lot of you also aren't. Yet we see a lot of behemoth projects.
[+] [-] wwarren|1 year ago|reply
[+] [-] teleforce|1 year ago|reply
https://leanpub.com/build_your_own_database_from_scratch/c/L...
To be honest it seems a bit strange to pay for the code since all the source codes to build the database is literally inside the book.
[+] [-] miffy900|1 year ago|reply
[+] [-] koushik_x|1 year ago|reply
[+] [-] amelius|1 year ago|reply
[+] [-] MarcelOlsz|1 year ago|reply
[+] [-] unknown|1 year ago|reply
[deleted]
[+] [-] swyx|1 year ago|reply
[+] [-] sandeep1998|1 year ago|reply
[+] [-] fatih-erikli-cg|1 year ago|reply
[deleted]