top | item 39627454

(no title)

pmeunier | 2 years ago

Yes indeed, we would not have started Pijul without the hope that at least theoretically, patch-based designs could be faster than snapshots. "Patch-based is slow" without any other argument is not a very informed claim, Pijul is actually faster than Git in some cases (in fact Pijul is faster where it matters most IMHO: large files and/or large repos, conflicts and blames). Not because we're better at tweaking C code (we're definitely not!), but because we designed our datastructures like theorists, and only then looked at how (and whether!) to implement things. One advantage we had over Linus is that we had no time pressure: we could well use Darcs, Git, or Mercurial to write Pijul initially (we used Darcs, actually), and it didn't matter much if we failed.

It took a little bit of work to get that down to actual fast code, for example I had to write my own key-value store, which wasn't a particularly pleasant experience, and I don't think any existing programming language could have helped, it would have required a full linear logic type system. But at least now that thing (Sanakirja) exists, is more generic, and modular than any storage library I know (I've used it to implement ropes, r trees, radix trees…), and its key-value store is faster than the fastest C equivalent (LMDB).

Could we do the same in Haskell or OCaml? As much as I like these two languages, I don't think I could have written Sanakirja in a garbage-collected language, mostly because Sanakirja is generic in its underlying storage layer: it could be mmap, a compressed file, an entire block device in Unix, an io_uring buffer ring, or something else. And the notion of ownership of the objects in the dictionary is absolutely crucial: Sanakirja allows you to fork a key-value store efficiently, so one question is, what should happen when your code drops a reference to an object from the kv store? what if you're deleting the last fork of a table containing that object? are these two the same thing? Having to explain these to a GC would have been hard I think.

I wouldn't have done it in C/C++ either, because it would have taken forever to debug (it already did take a long time), and even C++-style polymorphism (templates) isn't enough for the use of Sanakirja we have in Pijul.

remember the "poop" paper about mmap for databases, right? Well, guess what: having a generic key-value store implementation allowed me to benchmark their claims, and actually compare congestion, throughput, speed between mmap and io_uring. Conclusion: mmap rocks, actually.

discuss

order