top | item 41374801

(no title)

felipefar | 1 year ago

CRDTs are powerful, but it's unfortunate that they leave behind a trail of historical operations (or elements), both in their ops- or state-based variants. Even with compression, it's still a downside that makes me concerned about adopting them.

Even so, the discussion surrounding them made me excited by the possibility of implementing conflict-free (or fine-grained conflict resolution) algorithms over file-based storage providers (Dropbox, Syncthing, etc.).

discuss

order

josephg|1 year ago

Author here. I've had this conversation with people a lot. And people in the CRDT world also talk about it a lot. But in practice, at least with text editing, the overhead is so tiny that I can't imagine it ever coming up in practice. Diamond types - my post-CRDT project - does (by default) grow without bound over time. But the overhead is usually less than 1 byte for every character ever typed. If I turn on LZ4 compression on the stored text, documents edited with diamond types are often smaller than the resulting document state. Even though we store the entire editing history!

I know a bunch of ways to solve this technically. But I'm just not convinced its a real problem in most systems.

(I have heard of it being a problem for someone using yjs for a 3d modelling tool. While dragging objects, they created persistent edits with each pixel movement of the mouse. But I think its smarter to use ephemeral edits for stuff like that - which aren't supported by most crdt libraries.)

Git also suffers from this problem, by the way. Repositories only grow over time. And they grow way faster than they would if modern CRDT libraries were used instead. But nobody seems bothered by it. (Yes, you can do a shallow clone in git. But almost nobody does. And you could also do that with CRDTs as well if you want!)

Vinnl|1 year ago

I don't think the concern was necessarily about overhead, but about sensitive data. This is a problem in Git too: people make the mistake of reverting a commit with sensitive values and think it's gone, but the history is still out there.

Edit: or maybe that was the concern, but this other concern exists too :)

jakelazaroff|1 year ago

If you’re not building something fully decentralized, you may be able to loosen some of the constraints CRDTs require. As an example, if you can guarantee that all clients have received changes later than X date, you can safely drop any operations from before that date.

shipp02|1 year ago

Could you also make it fully decentralized but require clients to come online within a deadline (1 day, week) or risk losing their local changes? This would also allowing trimming history but with loss of some functionality to sync.

jchanimal|1 year ago

The full op-log plus deterministic merging is a great fit for immutable block storage, which can have other security, performance, and cost benefits. I'm building Fireproof[1] to take advantage of recent research in this area. An additional benefit to content addressing the immutable data is that each operation resolves to a cryptographically guaranteed proof (or diff), enforcing causal consistency and allowing stable references to be made to snapshots. This means your interactive, offline-capable, losslessly-merging database can run on the edge or in the browser, but still have the integrity you'd have looked to a centralized database or the blockchain for in the past. (Eg you can drop a snapshot CID into a PDF for signature, or a smart contract, and remove all ambiguity about the referenced state.

[1] https://github.com/fireproof-storage/fireproof

jitl|1 year ago

how do such immutable systems deal with redaction eg for GDPR delete requests? Do you need to repack the whole history, and break the existing signature chain?

LAC-Tech|1 year ago

CRDTs are powerful, but it's unfortunate that they leave behind a trail of historical operations (or elements), both in their ops- or state-based variants. Even with compression, it's still a downside that makes me concerned about adopting them.

There's nothing inherent in the concept of a CRDT that requires you to leave behind a trail of historical operations or elements.

You'd be better off directing your criticism and specific implementations than making this blanket statement about what is, at the end of the day, a set of mathematical laws that certain data types / databases follow.

eviks|1 year ago

What's the concern if you can delete history?

orthecreedence|1 year ago

In a distributed system, which is often a place CRDTs thrive, deleted history means a new client cannot be brought up to the current state unless there is some form of state summarization. Doing this summarization/checkpointing in a consistent manner is difficult to do without some form of online consensus mechanism, which is also difficult to do in a distributed system with clients popping in and out all the time.