Definitely interested in how you achieved another 2-10x over the btree approach. I was surprised that the btree was as effective as it was, but I’d be curious to know how you squeezed a bit more out of it.
The btree works great, and has barely changed. I made it faster with two tricks:
1. I made my own rope library (jumprope) using skip lists. Jumprope is about 2x faster than ropey on its own. And I have a wrapper around the skip list (called “JumpropeBuf” in code) which buffers a single incoming write before touching the skip list. This improves raw replay performance over ropey by 10-20x iirc.
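To illustrate the buffering idea (this is a simplified sketch, not jumprope’s actual code — `BufferedRope` is a made-up name, and a plain `String` stands in for the skip-list rope): consecutive keystrokes usually land right after the previous one, so they can be coalesced into one pending write and applied to the underlying structure in a single operation.

```rust
// Sketch of the "JumpropeBuf" idea: coalesce adjacent inserts into one
// pending write before touching the underlying rope. `String` stands in
// for the real skip-list rope; names here are illustrative.
struct BufferedRope {
    rope: String,    // stand-in for the skip-list rope
    buf: String,     // pending insert, not yet applied
    buf_pos: usize,  // document position where the pending insert starts
}

impl BufferedRope {
    fn new() -> Self {
        BufferedRope { rope: String::new(), buf: String::new(), buf_pos: 0 }
    }

    fn insert(&mut self, pos: usize, s: &str) {
        // If the new insert lands at the end of the pending buffer
        // (the common case while typing), just extend the buffer.
        if pos == self.buf_pos + self.buf.len() {
            self.buf.push_str(s);
        } else {
            // Otherwise flush the old buffer and start a new one.
            self.flush();
            self.buf_pos = pos;
            self.buf.push_str(s);
        }
    }

    fn flush(&mut self) {
        if !self.buf.is_empty() {
            // One write into the real structure instead of many.
            self.rope.insert_str(self.buf_pos, &self.buf);
            self.buf.clear();
        }
    }

    fn content(&mut self) -> &str {
        self.flush();
        &self.rope
    }
}

fn main() {
    let mut r = BufferedRope::new();
    // Typing "abc" becomes one write into the rope instead of three.
    r.insert(0, "a");
    r.insert(1, "b");
    r.insert(2, "c");
    assert_eq!(r.content(), "abc");
}
```

The win comes from skipping the tree traversal entirely for runs of adjacent edits, which dominate real editing traces.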
2. Text (“sequence”) CRDTs replicate a list / tree of fancy “crdt items” (items with origin left / origin right / etc). This special data structure needs to be available both to parse incoming edits and generate local edits.
Turns out that’s not the only way you can build systems like this. Diamond types now just stores the list of original edits. [(Edit X: insert “X” position 12, parent versions Y, Z), …]. Then we recompute just enough of the crdt structure on the fly when merging changes.
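A rough sketch of what “just store the original edits” might look like (the struct names and fields here are hypothetical, not diamond types’ real types): each edit records what the user actually did plus the versions it was made against. A purely linear history can then be replayed with no CRDT machinery at all.

```rust
// Hypothetical oplog of original editing operations, as opposed to
// replicated CRDT items. Names are illustrative only.
#[derive(Debug, Clone)]
enum Op {
    Insert { pos: usize, content: String },
    Delete { pos: usize, len: usize },
}

#[derive(Debug, Clone)]
struct Edit {
    id: (String, u64),           // (agent, sequence number)
    parents: Vec<(String, u64)>, // versions this edit was made against
    op: Op,
}

// Replaying a linear history needs no CRDT structure at all -
// just apply each op to the document in order.
fn replay_linear(edits: &[Edit]) -> String {
    let mut doc = String::new();
    for e in edits {
        match &e.op {
            Op::Insert { pos, content } => doc.insert_str(*pos, content),
            Op::Delete { pos, len } => {
                doc.replace_range(*pos..*pos + *len, "");
            }
        }
    }
    doc
}

fn main() {
    let edits = vec![
        Edit {
            id: ("a".into(), 0),
            parents: vec![],
            op: Op::Insert { pos: 0, content: "helo".into() },
        },
        Edit {
            id: ("a".into(), 1),
            parents: vec![("a".into(), 0)],
            op: Op::Insert { pos: 3, content: "l".into() },
        },
    ];
    assert_eq!(replay_linear(&edits), "hello");
}
```

The CRDT structure (origin left/right etc.) only gets materialized, on the fly, for the spans of history that are actually concurrent.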
This has a bunch of benefits - it makes it possible to prune old changes, and it lowers memory usage (you can just stream writes to disk). The network and disk formats aren’t dependent on some weird crdt structure that might change next week. (Yjs? RGA? Fugue?). File size is also smaller.
And the best bit: linear traces don’t need the btree step at all. Linear traces go as fast as the rope. Which - as I said above, is really really fast. Even when there are some concurrent edits and the btree is created, any time the document state converges on all peers we can discard all the crdt items we generated so far and start again. Btree operations are O(log n). This change essentially keeps resetting n, which gives a constant-factor performance improvement.
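The reset-on-convergence idea can be sketched like this (a toy model — a `BTreeMap` stands in for the real btree of CRDT items, and the names are made up): items accumulate only while branches are concurrent, and once every peer has seen every edit they carry no further information and can all be dropped.

```rust
use std::collections::BTreeMap;

// Toy sketch: temporary CRDT items are kept only while history is
// concurrent. Once all peers converge on a single version, the items
// can be discarded, so the tree's n keeps resetting to zero.
struct MergeState {
    // Stand-in for the real btree of CRDT items, keyed by item id.
    crdt_items: BTreeMap<u64, String>,
}

impl MergeState {
    fn new() -> Self {
        MergeState { crdt_items: BTreeMap::new() }
    }

    fn merge_concurrent_edit(&mut self, id: u64, item: String) {
        // O(log n) insert while branches are concurrent.
        self.crdt_items.insert(id, item);
    }

    fn on_convergence(&mut self) {
        // Document state now agrees everywhere: the items carry no
        // further information, so drop them all and start from n = 0.
        self.crdt_items.clear();
    }

    fn len(&self) -> usize {
        self.crdt_items.len()
    }
}

fn main() {
    let mut m = MergeState::new();
    m.merge_concurrent_edit(1, "item a".into());
    m.merge_concurrent_edit(2, "item b".into());
    assert_eq!(m.len(), 2);
    m.on_convergence(); // all peers have converged
    assert_eq!(m.len(), 0); // later merges start from a small tree again
}
```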
The downside is that the code to merge changes is more complex now. And it’s slower for super complex traces (think dozens of concurrent branches in git).
I’m writing a paper at the moment about the algorithm. Should be up in a month or two.
josephg|2 years ago