(no title)
pmeunier | 2 years ago
It took a little bit of work to get that down to actual fast code, for example I had to write my own key-value store, which wasn't a particularly pleasant experience, and I don't think any existing programming language could have helped, it would have required a full linear logic type system. But at least now that thing (Sanakirja) exists, is more generic, and modular than any storage library I know (I've used it to implement ropes, r trees, radix trees…), and its key-value store is faster than the fastest C equivalent (LMDB).
Could we do the same in Haskell or OCaml? As much as I like these two languages, I don't think I could have written Sanakirja in a garbage-collected language, mostly because Sanakirja is generic in its underlying storage layer: it could be mmap, a compressed file, an entire block device in Unix, an io_uring buffer ring, or something else. And the notion of ownership of the objects in the dictionary is absolutely crucial: Sanakirja allows you to fork a key-value store efficiently, so one question is, what should happen when your code drops a reference to an object from the kv store? what if you're deleting the last fork of a table containing that object? are these two the same thing? Having to explain these to a GC would have been hard I think.
I wouldn't have done it in C/C++ either, because it would have taken forever to debug (it already did take a long time), and even C++-style polymorphism (templates) isn't enough for the use of Sanakirja we have in Pijul.
remember the "poop" paper about mmap for databases, right? Well, guess what: having a generic key-value store implementation allowed me to benchmark their claims, and actually compare congestion, throughput, speed between mmap and io_uring. Conclusion: mmap rocks, actually.
zellyn|2 years ago
Did you write up the mmap results as a paper? Sounds like it would be quite useful. (Is this the paper you're referring to? https://db.cs.cmu.edu/papers/2022/cidr2022-p13-crotty.pdf)