Why CRDT didn't work out as well for collaborative editing xi-editor

[+] tomsmeding|6 years ago|reply

Relevant discussion from a couple of days ago: https://news.ycombinator.com/item?id=19845776

[+] chubot|6 years ago|reply

I don't have much experience in this area, but I'd be interested in an overview of how different pieces of sofware handle the concurrent / multiplayer editing problem, like:

- Etherpad

- Google docs

- Apache / Google Wave (open sourced: http://incubator.apache.org/projects/wave.html)

- repl.it https://repl.it/site/blog/multi

- figma https://www.figma.com/blog/multiplayer-editing-in-figma/ (image editing rather than text editing)

So is the gist of it that OT relies on central servers and they all use OT rather than CRDT? That was not entirely clear to me.

Looking at the xi analysis, it sounds like "IME" is specific to a desktop application using X (?), so it doesn't apply to any of this web software.

The rest of them seem like they do apply?

Is the problem that you have to pay a "CRDT tax" for every piece of state in the application? I thought the same was true of OT. Don't you have to express every piece of state within those constraints too?

repl.it doesn't seem to have a problem with syntax highlighting or brace matching (try it, it's pretty slick). So is it just that they paid that tax with a lot of code or is there something else fundamentally different about xi vs. repl.it ? Or maybe xi is going for a lot lower latency than web-based editors?

recent thread about OT vs. CRDT that might be interesting: https://news.ycombinator.com/item?id=18191867

[+] DannyBee|6 years ago|reply

TL;DR CRDT is completely irrelevant to any of the highlighting/etc stuff

Most highlighters are lexers. Advanced highlighters/folders are parsers.

The lexing/parsing that is required for highlighting is easy to make incremental for all sane programming languages.

for LL(star) grammars, adding incrementality is completely trivial (i sent patches to ANTLR4 to do this)

for LR(k) grammars, it's more annoying but possible (tree-sitter does this)

For lexing, doing it optimally is annoying, doing it near optimally is very easy.

optimal incremental lexing requires tracking, on a per token basis, how far ahead in the character stream the recognizer looked (easy), and computing the affected sets (annoying)

Most real programming languages have a lookahead of 1 or 2. Near-optimally requires tracking only the max lookahead used, and assuming all tokens need that max-lookahead. In a world where min token length is 1, that means you only need to re-lex an additional (max lookahead) tokens before the changed range. In a world where min token length is 0, it's all zero length tokens + (max lookahead) tokens before the changed range. This does not require any explicit per-token computation.

Again for basically all programming languages, this ends up re-lexing 1 or 2 more tokens total than strictly necessary.

Tree-sitter does context-aware on-demand lexing. i have patches on the way for ANTLR to do the same.

The only thing CRDT helps with in this equation at all is knowing what changed and producing the sequence of tree-edits for the lexer/parser.

The lexer only cares about knowing what character ranges changed, which does not require CRDT. The typical model for this kind of edit is what vscode's document text changes provide (IE For a given text edit, old start , old end, new start , new end)

The parser only cares about what token ranges changed, which does not require CRDT.

[+] raphlinus|6 years ago|reply

Just to answer the IME question, it refers to "Input Method Editor" and is an important problem to solve for all platforms, desktop, mobile, and Web. The API's for IME are often crufty and it's easy to get edge cases wrong. These days, lots of people care because emoji, but formerly it was something that English language speakers tended to ignore.

[+] antoncohen|6 years ago|reply

Figma has a blog post about it, they use Fractional Indexing.

https://www.figma.com/blog/realtime-editing-of-ordered-seque...

[+] jahewson|6 years ago|reply

OT doesn’t have to depend on central servers, but it is much simpler and less resource-intensive to do it that way. This is what Etherpad, Google Wave and Google Docs do.

As you say, both OT and CRDT come with a “tax” in that you must structure your application’ edits in a way that they can interpret. However, this is easier with OT for the text editing case, as OT uses position based addressing, whereas CRDT is identifier based (I’m assuming this is where the canonical IME issue mentioned comes from - somebody please correct me if I’m wrong).

[+] zamalek|6 years ago|reply

I'm pretty certain that these are OT: Etherpad, Docs, Wave. I've done quite a bit of investigation into these too and I'm convinced that OT is the mathematically inferior, but practically superior model (it is far easier to reason about, which is important because not everyone on your team might read compsci books).

You can abstract OT/CRDT behind a step-locked view model, which should be fine for in-process. IME can be solved in this view model, too, by halting updates to the CRDT/OT between IME entries.

The problem is that, with xi, this would necessitate implementing that view model in each front-end, so there's certainly the "CRDT tax" there. xi is a bit of an oddball.

[+] macintux|6 years ago|reply

Another OT/CRDT thread from just a few days ago: https://news.ycombinator.com/item?id=19845776

[+] shereadsthenews|6 years ago|reply

I don’t think Google docs is distributed at all. I think all edits happen in a single process on a single machine. That’s why the limit of concurrent editors is so low.

[+] lewisjoe|6 years ago|reply

> So is the gist of it that OT relies on central servers and they all use OT rather than CRDT?

You are right. I'd take shot at blowing out the why part. Practically, real-time collaboration apps can be categorized into two:

#1 The document editors, which stores information on the cloud. I'll refer these as document editors henceforth.

#2 The ones that doesn't need a thick server, except for broadcasting edits. Like plain text editors and code editors. I'll refer these as code editors for simplicity sake.

Both have a slightly different set of problems.

Document editors, usually have a slightly richer document representation schema (since most of them are rich text documents anyway). But they have the luxury of a central server. A central server means, now they have the power for good versioning, rollbacks to a consistent state, picking winners and ensuring ordering of edits when syncing large edits.

Code editors, although their doc representation is usually arrays of characters (it could be strings, ropes, or whatever but essentially just a set of characters), these don't have the luxury of a central server that could decide a winner when concurrent edits take place. This means the clients have to be very intelligent and converge to the same representation no matter how out-of-order the edits arrive.

Now coming back to the two algorithms camps - OT & CRDT:

The general outlook is that, OT - while being simple, they have this classic TP2 problem that would make it harder to arrive at a consistent state eventually - without a help from a central server. The alternative is to have a complicated system, almost as complicated as CRDTs. (More about this can be read through the Internet. If anybody's interested I'll post links to stuff I've read through)

CRDT - CRDT have (or maybe had?) a strong reputation of being efficient at arriving at a consistent state with out of order edits. They can do it, because basically no characters in a CRDT document is deleted anyway (they are just marked deleted and are termed tombstones). So no information loss, which means a clever algorithm can write a function to easily arrive at a consistent state at each site. This means, a central server is now only optional.

If I didn't lose you until now, you'd have intuitively guessed why document editors, tended to always pick OT and code editors prefer CRDT over OT.

This is complete oversimplifications and certainly there's a ton of things I left out for simplicity sake.

Source: I'm in the business of building a powerful Google Docs alternative, a word processor that runs on the browser: https://writer.zoho.com (Zoho Writer)

[+] lewisl9029|6 years ago|reply

> Indeed, the literature of CRDT does specify a mathematically correct answer. But this does not always line up with what humans would find the most faithful rendering of intent.

This is a very salient point that anyone thinking of using CRDTs to "solve" synchronization in an user-facing application needs to take into consideration. Yes, CRDTs will guarantee that clients converge to an identical, mathematically "consistent" state eventually, but there's no guarantee whether or not that mathematically "consistent" state would make any sense to the application business logic that needs to consume that state, or to the human that needs to reason about the rendered result. That is a completely different can of worms that we'll still have to tackle to build a usable application.

Here's great example to illustrate this from Martin Kleppmann's talk on this topic: https://youtu.be/yCcWpzY8dIA?t=2634

Rest of the talk is also highly recommended for anyone interested in an approachable primer on CRDTs.

The trade-offs to CRDTs mentioned by the author in the context of text-editors make sense, but I would be curious to hear from the Xray team on what their current thinking on the topic is, given that they have collaborative editing as an explicit core objective (which might shift the value prop in favor of using CRDTs relatively speaking since in Xi it seems to be only an aspirational goal), and that their approach to implementation was similar but not quite identical to Xi's:

> Our use of a CRDT is similar to the Xi editor, but the approach we're exploring is somewhat different. Our current understanding is that in Xi, the buffer is stored in a rope data structure, then a secondary layer is used to incorporate edits. In Xray, the fundamental storage structure of all text is itself a CRDT. It's similar to Xi's rope in that it uses a copy-on-write B-tree to index all inserted fragments, but it does not require any secondary system for incorporating edits.

https://github.com/atom/xray#text-is-stored-in-a-copy-on-wri...

[+] the_duke|6 years ago|reply

> but I would be curious to hear from the Xray team on what their current thinking on the topic is

Xray is dead:

https://www.reddit.com/r/rust/comments/bdf3lx/we_need_to_sav...

[+] cryptonector|6 years ago|reply

This really is the most important thing to get from all of this. In an editor the users can see any bad outcomes and correct them. In, say, a database system, bad outcomes can be much more problematic.

[+] laughinghan|6 years ago|reply

@dang, can we edit the title? It's inaccurate. This post is about how the CRDT didn't work for asynchronous editing by automated tools like syntax highlighting, automatic bracket balancing, etc. The post explicitly contrasts those use cases with collaborative editing as a use case that the author didn't implement, but where they think "the CRDT is not unreasonable".

[+] coldtea|6 years ago|reply

Maybe first build a capable editor, with plugins, etc (xi-editor is not that yet) and worry about "collaborative editing" later?

And even for that, I think simply "taking turns" (where users share an editor session, can chat with each other, and can switch on sequentially who gets to actively edit) is enough for 99% of cases, and is not more difficult than mere single-person editing (since there are no conflicts).

[+] kahnjw|6 years ago|reply

Collaborative editing is very difficult to implement in a pre-existing codebase. Decisions made during the initial design will be prohibitive to just bolting on collaborative editing in the future.

To me, xi seems like an aspirational project, so this seems like the perfect place to take some time and design for these features up front.

[+] espadrine|6 years ago|reply

I think (from reading the documentation) an implicit aim of Fuchsia is to treat a device no longer as a holder of documents (a storage medium for photos / messages / candy crush scores) but as a view into a universal storage space.

https://webcache.googleusercontent.com/search?q=cache:S9poew...

[+] plorkyeran|6 years ago|reply

Collaborative editing is not something you can just bolt on after the fact if you want it to actually work well. Things like building a robust server that can be exposed to the internet can certainly wait, but how you are you supposed to develop a plugin ecosystem when you haven't even settled on a conceptual model for how to store and manipulate text yet?

[+] marcosdumay|6 years ago|reply

Start by redoing everything that the mature alternatives do is an advice for creating neither successful not useful things.

By all means, focus on creating a kick-ass collaborative editor, and add just the editing capabilities needed to make it good at collaborative editing.

[+] lucb1e|6 years ago|reply

At the risk of asking a stupid question: is there a reason other than offline support why we bother with conflict resolution algorithms?

Every time concurrent editors come up, one of the main points of discussion is the pros and cons of different possible conflict resolution algorithms. People seem to be spending a lot of time on debating and implementing that. The way I see it, whichever packet reaches the server first gets applied first. Send something like "line 9 column 19: insert <Enter>", and when another client whose cursor is on line 15 receives that, it moves the cursor down to line 16 and scrolls the view down one line. Because you can see each other's cursors and selections, it shouldn't be hard to avoid typing in the same place. Unless you have round trip times of multiple seconds (satellite uplinks maybe?), and unless you edit continuously with more than, say, one people per ten sentences, you should hardly ever need it, and if it happens, the person editing will notice within two seconds and just wait a second for the other to finish. It's not as if you can reliably apply edits anyway: as the article already describes, changing a line from ABC to EFG concurrently with someone modifying B to D, does not really have a good outcome. In a more realistic example, it would be changing "its" to "it's" concurrently with changing the word to "that". There is no good solution (the server wouldn't know which person to ignore: the apostrophe inserter or the replacer), so someone will have to resolve it manually anyway, so why bother with complex resolution algorithms? Heck, I'd be fine if my editor would do exclusive locks for the line I'm on before I can start typing.

For slow things like the customer report, internal documents, code, etc., I use something like git. Collaborative editing is (to me) for realtime things like jotting down notes about what I'm working on and looking at what others are working on right now, where even a proper revision control system is too cumbersome (git pull, vim notes.txt, small edit, :wq, git commit, git push, repeat) because someone might be working on the same thing. In such a case, where I need to work together on a file in real time, I'm not working offline, so this conflict resolution is by definition unnecessary. Is that different from the majority of people that use collaborative editing?

[+] braythwayt|6 years ago|reply

Well (strokes grey beard), before we talk about offline support, we should consider that there are two kinds of "online editing."

The first kind of "online editing" is where you make a request to a server, and nothing happens until the server acknowledges it and sends you an approval. That's synchronous.

The second type of "online editing" is where you have an independent process in your browser or client, and it communicates asynchronously with a server, while simultaneously allowing you to edit a local model of the data.

In the first case, we really need an editor to send every keypress to the server and wait for a response.

In the second case, we really are talking about the browser or local app or whatever being offline, it just synchronizes very frequently with the server.

I think that the moment you want to support typing faster than the round trip to the server, you are already talking about offline editing.

And in most cases, yes you do want to support typing faster than teh round trip to the server, because what works in an urban area with high-speed internet becomes comically unusable in low-bandwidth and low-latency environments.

---

So... I suggest that we almost always want to design for "offline editing," it's just that we don't always imagine someone writing an essay on a weekend camping trip, then synchronizing when they get back to the office on Monday.

[+] laughinghan|6 years ago|reply

> it shouldn't be hard to avoid typing in the same place [...] if it happens, the person editing will notice within two seconds and just wait a second for the other to finish

But if it does happen, it needs to result in a consistent state in both people's editors, right?

> In a more realistic example, it would be changing "its" to "it's" concurrently with changing the word to "that". There is no good solution (the server wouldn't know which person to ignore: the apostrophe inserter or the replacer)

Discarding the apostrophe seems like a good solution to me, what's wrong with it?

As long as, if the replacer hits Undo, "it's" is what is restored. (Not sure if any existing algorithms do so, but I believe it's possible and am intending for the collaborative editor I'm designing to do so. Haven't implemented it yet though, so who knows.)

[+] k__|6 years ago|reply

I never understood this either.

Why not simply lock the file automatically when someone types and unlock it after a few seconds pause.

What do I get from parallel edits? When I finish my stuff and my collaborator isn't finished, we will end up with broken code until they have finished editing.

Sure it is nice to be able to write comments at some place while someone else edits code at another place, but then row-based locking would do the trick too.

[+] OJFord|6 years ago|reply

> why we bother with conflict resolution algorithms? […] The way I see it, […] Send something like "line 9 column 19: insert <Enter>", and when another client whose cursor is on line 15 receives that, it moves the cursor down to line 16 and scrolls the view down one line.

What you describe is (an aspect of) the algorithm you want to do away with!

[+] nicodemus26|6 years ago|reply

I think CRDTs would make much more sense in a projectional editor than a text one. When the changes are mutations to the abstract syntax tree its more well defined how a merge would end. Also, the merge results don't have the opportunity to be invalid syntax.

[+] msvan|6 years ago|reply

It is really crazy that we go through these massive hoops to simulate what would be trivial to do in an AST editor. I recently read through some of the literature on projectional editors, and while they have historically had some usability issus I really hope that this will change in the future.

[+] catpolice|6 years ago|reply

"For syntax highlighting, any form of OT or CRDT is overkill; the highlighting is a stateless function of the document, so if there's a conflict, you can just toss the highlighting state and start again."

I first became interested in CRDTs in a case where this wasn't really true. I was writing an IDE for a custom in-house DSL - think of the application as a special language for interacting with a gigantic and very strange database. Basically, the problem was that the use case really stretched the bounds of what is normally done with syntax highlighting. Some requirements:

- It had syntax and semantic highlighting, where the visual feedback associated with a term would depend on the results of queries to the remote database

- It had to be able to handle documents of several megabytes (and many thousands of terms) fairly smoothly, with as little noticeable lag or flicker as possible

- It couldn't swamp the database with unnecessary requests

- The document itself had implicit procedural state (e.g. if you wrote a command that, if evaluated, would alter the state of a term on the database, appearances of that term later in the document needed to be highlighted as if those changes had already been applied)

So I definitely couldn't throw out metadata and start over with every change. I ended up with a kind of algebraic editing model that allowed me to put bounds on what needed to be updated with every edit and calculate a minimal set of state changes to flow forward. It was extraordinarily complicated. I never got around to learning enough about CRDTs to determine if they'd be simpler than the solution I came up with, but they do seem to target some similar issues.

[+] raphlinus|6 years ago|reply

Yeah definitely. I would consider this a form of semantic analysis, of the kind provided by Language Server implementations rather than the kind of syntax highlighting provided by syntect. Also note, the syntax highlighting done in xi-editor is both incremental and async[11]. This actually worked out well, and I would preserve it going forward. What I wrote above actually overstates the case, I think the ideal solution is an extremely simple form of OT so you can reuse as much as possible of the already-computed highlighting state, but you certainly throw away the highlighting result for the region being edited. Preserving "before" is trivial, and preserving "after" should be a simple matter of fixing up line counts.

[11]: https://xi-editor.io/docs/rope_science_11.html

[+] hansjorg|6 years ago|reply

I'm assuming CRDT refers to conflict free replicated data type: https://en.m.wikipedia.org/wiki/Conflict-free_replicated_dat...

OT, operational transformation: https://en.m.wikipedia.org/wiki/Operational_transformation

[+] spooneybarger|6 years ago|reply

That's a correct assumption.

[+] colemickens|6 years ago|reply

Thank you for writing this up Raph. I've been following CRDT usage in Xray/Xi and am curious to see where collaborative editing goes. I appreciate you thinking about it upfront.

[+] lacampbell|6 years ago|reply

Having a bit of difficulty following this, so I'll break down my understanding of CRDTs and see if someone can help me out.

A CRDT can be thought of as an algebraic structure, consisting of data type D, and a join function. So for all a, b, c in D, it's:

Associative:

  join(a, join(b, c)) == join(join(a, b), c)

Commutative:

  join(a, b) == join(b, a)

Idempotent:

  join(a, a) == a

Partially ordered:

  if join(a, b) == b then a <= b
  a <= a == true
  if (a < b) and (b > a) then a == b
  if (a <= b) and (b <= c) then (a <= c)

So given all of that, I am not sure why the example in the article holds. I assume it's a consequence of the partial ordering, but I don't know what the partial ordering is. What's the join operation and what's the data type?

[+] BoiledCabbage|6 years ago|reply

I'm just learning this, but it seems to be that in some CRDTs "a" and "b" represent two different edits to the common state made by two different users and the join operations is how you combine those two edits into a single joint edit that will produce the same results regardless of which order the edits arrive at the "server".

In other CRDTs "a" and "b" each represent the new states after two users have made different edits to the same common source, and the "join" function is how you combine those independent edited docs/states into a single state again.

Ie if you think of fit branches, set of rules on edits to ensure that no matter the order you merge branches back together conflicts can be automatically resolved and will always reach the same end state with the changes "appropriately" incorporated.

[+] josephg|6 years ago|reply

I replied with my thoughts to the github issue, but they might be of interest to people reading along here too. I've got some experience on these systems (wave, sharejs, sharedb, etc).

> As a side note, I've heard an interesting theory about why CRDT-type solutions are relatively popular in the cloud. To do OT well, you need to elect a centralized server, which is responsible for all edits to a document. I believe the word for this is "server affinity," and Google implements it very well. They need to, for Jupiter-style OT (Google Docs) to work.

You don't need to do this. (Although I'm not sure if we knew that on the wave team). You can implement an OT system on top of any database that has a transactional write model. The approach is to enter a retry loop where you first try to apply the operation (but in a way that will reject the operation if the expected version numbers don't match). If an error happens, fetch the concurrent edits, transform and retry. Firepad implemented this retry loop from the client, and it worked much better than I expected. Here is a POC of a collaborative editor on top of statecraft - https://home.seph.codes/edit/test . The only OT code on the server is this middleware function:

https://github.com/josephg/statecraft/blob/b6a82f34268238c90... .

In my experience the reason why semi- or fully- centralized systems are popular in products like google docs is that they're easier to implement. Access control in a decentralized system like git is harder. Gossip networks don't perform as well as straight offset-based event logs (kafka and friends). And if you have a canonical incoming stream of edits, its easier to reason about.

---

> I have a stronger conclusion: any attempt to automate resolving simultaneous editing conflicts that, e.g., git merge could not resolve, will fail in a way that fatally confuses users.

I think you have to act with intent about what you want to happen when two users edit the same text at the same time. There are basically 2 approaches:

1. Resolve to some sort of best-effort outcome. (Eg "DE F G" or "E F GD")

2. Generate an error of some sort (eg via conflict markers) and let the user explicitly resolve the conflict

As much as it pains me to say, for code I think the most correct answer is to use approach (1) when the code is being edited live and (2) when the code is being edited offline / asyncronously. When we can see each other's changes in realtime, humans handle this sort of thing pretty well. We'll back off if someone is actively editing a sentence and we'll see them typing and let them finish their thought. If anything goes wrong we'll just correct it (together) before moving on. The problem happens when we're not online, and we edit the same piece of code independently, "blind" as it were. And in those cases, I think version control systems have the right approach - because the automated merge is often wrong.

(More: https://github.com/xi-editor/xi-editor/issues/1187#issuecomm... )

[+] EGreg|6 years ago|reply

From these comnents it seems that OT requires a central server while CRDT can have a far more flexible topology. Is this true? And don’t we have robust implementations of CRDT for simple trees?

[+] jahewson|6 years ago|reply

No, OT can handle decentralization without issues. It’s just usually far more desirable to centralize it.

[+] unknown|6 years ago|reply

[deleted]

[+] sara7262|6 years ago|reply

[deleted]

[+] microcolonel|6 years ago|reply

I think it would be interesting to let the language mode control the rope, or delegate subtrees of the rope to a mode. This way, you could represent things like lexical scope in the tree of the rope, and a language-specific tokenizer could further reduce the complexity of syntax formatting.

Emacs has the concept of "faces", and many Emacs major modes have proper parsers, lexers, and even some static analyzers that they use to apply the faces. If the rope resembled the AST, then many of the issues Raph talks about could be greatly reduced by localizing edits to their area of influence. If you edit inside a token, and somebody else deletes that whole token, then it is pretty clear how to resolve that. You could conceive of natural language modes which produce humanistic hierarchies, or modes with internal formats other than text (which may have a cached text view on them) like spreadsheets or debuggers.

[+] marknadal|6 years ago|reply

TLDR;

CRDTs cannot be "bolted ontop".

----

I really don't like this answer, but it is sadly true - even as an expert in the space (my database https://github.com/amark/gun is one of the few CRDT-based systems out there). And there is a simple reason for this:

Distributed systems are composable, they can be used to build higher-level strongly consistent systems on top. (Note: Sacrificing AP along the way, but then you can have a "tunable" system where each record you save you decide what consistency requirement you need, fast or slow.)

However centralized systems are not composable, you can't go "down" the latter of abstraction by adding more stuff.

[+] raphlinus|6 years ago|reply

What you say is certainly true, but also not a fair characterization of what I wrote; we deliberately designed xi-editor from the early days to be consistent with the CRDT model (see the "rope science" series for thinking from the very early days).

But yes, if you have an existing editor or application and you try to just add CRDT, there are a lot of things that can go wrong.

[+] atheowaway4z|6 years ago|reply

Parsing & syntax highlighting before CRDT might give better results

83 comments