zawaideh|8 months ago
If the friend is online then sending operations is possible, because they can be decrypted and merged.
ath92|8 months ago
So instead of merging changes on the server, all you need is some way of knowing which messages you haven’t received yet. Importantly this does not require the server to be able to actually read those messages. All it needs is some metadata (basically just an id per message), and when reconnecting, it needs to send all the not-yet-received messages to the client, so it’s probably useful to keep track of which client has received which messages, to prevent having to figure that out every time a client connects.
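The "blind relay" described above can be sketched in a few lines. This is a hypothetical minimal model, not any particular product's protocol: the server stores opaque (encrypted) payloads it cannot read, assigns each an id, and remembers per client which messages have already been delivered.

```python
import itertools

class BlindRelay:
    """Hypothetical sketch of a sync server that stores opaque
    (encrypted) messages and tracks, per client, which ones that
    client has already received. It never decrypts anything."""

    def __init__(self):
        self._log = []                # list of (msg_id, opaque_bytes)
        self._ids = itertools.count()
        self._cursor = {}             # client_id -> index of next undelivered msg

    def publish(self, opaque_payload):
        msg_id = next(self._ids)
        self._log.append((msg_id, opaque_payload))
        return msg_id

    def fetch_missing(self, client_id):
        """Return every message this client hasn't seen, then advance its cursor."""
        start = self._cursor.get(client_id, 0)
        missing = self._log[start:]
        self._cursor[client_id] = len(self._log)
        return missing

relay = BlindRelay()
relay.publish(b"ciphertext-1")
relay.publish(b"ciphertext-2")
assert relay.fetch_missing("alice") == [(0, b"ciphertext-1"), (1, b"ciphertext-2")]
relay.publish(b"ciphertext-3")
assert relay.fetch_missing("alice") == [(2, b"ciphertext-3")]   # only the new one
```

The per-client cursor is exactly the "metadata (basically just an id per message)" the comment mentions: it lets the server skip straight to the undelivered suffix on reconnect.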
josephg|8 months ago
Generally there are two categories of CRDTs: state-based and operation-based CRDTs.
State-based CRDTs are like a variable which is set to a new value each time it changes (think CouchDB, if you’ve used it). In that case, yes, you generally do update the whole value each time.
Operation-based CRDTs - used in things like text editing - are more complex, but, like the parent said, deal with editing events. So long as a peer eventually gets all the events, it can merge them together into the resulting document state. CRDTs have a correctness criterion that the same set of operations always merges into the same document, on all peers, regardless of the order in which you receive the messages.
Anyway, I think the parent comment is right here. If you want efficient E2E encryption, using an operation-based CRDT is probably a better choice.
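The two categories can be illustrated with toy examples. This is a hedged sketch, not the design of any real library: a state-based last-writer-wins register whose merge is just a commutative `max`, and an op-based grow-only set where applying the same op set in any order converges to the same state.

```python
import itertools

# State-based: replicas exchange whole states; merge must be commutative,
# associative, and idempotent. Here: last-writer-wins by (timestamp, replica_id).
def lww_merge(a, b):
    # state is a tuple (timestamp, replica_id, value)
    return max(a, b)

# Op-based: replicas exchange events; any delivery order of the same set of
# ops must converge. Here: a grow-only set of "add" events.
def apply_ops(ops):
    state = set()
    for op in ops:
        state.add(op)
    return state

ops = [("add", "x"), ("add", "y"), ("add", "z")]
# Every possible delivery order yields the same document state:
states = {frozenset(apply_ops(p)) for p in itertools.permutations(ops)}
assert len(states) == 1

a, b = (5, "alice", "v1a"), (7, "bob", "v1b")
assert lww_merge(a, b) == lww_merge(b, a) == b   # newer timestamp wins either way
```

The convergence check in the middle is the correctness criterion from the comment above, made executable for one tiny op set.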
charcircuit|8 months ago
Joker_vD|8 months ago
This scheme doesn't require the two people to be online simultaneously — all updates are mediated via the sync server, after all. So, where am I wrong?
eightys3v3n|8 months ago
This could be done to reduce the time required for a client to catch up once it comes online (otherwise it would need to replay every change that has happened since it last connected to reach the converged state). But the article also mentions something about keeping the latest version quickly accessible.
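The catch-up shortcut can be sketched as resuming from a flattened snapshot plus only the ops that came after it, instead of replaying the log from the beginning. The document model and op format here are hypothetical (a dict of key/value set-ops), chosen only to make the idea concrete.

```python
# Hypothetical op format: an op is (key, value), meaning "set key to value".
def apply_op(doc, op):
    key, value = op
    doc = dict(doc)          # keep it pure: return a new document
    doc[key] = value
    return doc

def catch_up(snapshot, snapshot_version, op_log):
    """Replay only the ops newer than the snapshot."""
    doc = snapshot
    for op in op_log[snapshot_version:]:
        doc = apply_op(doc, op)
    return doc

op_log = [("title", "draft"), ("title", "final"), ("body", "hello")]
snapshot = {"title": "final"}        # flattened state after the first 2 ops

# Resuming from the snapshot gives the same result as a full replay:
assert catch_up(snapshot, 2, op_log) == {"title": "final", "body": "hello"}
assert catch_up({}, 0, op_log) == catch_up(snapshot, 2, op_log)
```

In the E2E-encrypted setting the snapshot would itself be an encrypted blob produced by a client, since the server can't flatten ciphertexts it cannot read.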
unknown|8 months ago
[deleted]
crdrost|8 months ago
The reason CRDT researchers don't like the sync server is that it's the very thing CRDTs are meant to solve. CRDTs are a building block for theoretically-correct eventual consistency: that's the goal. Which means our one source of truth now exists in N replicas, those replicas are getting updated separately, and now: why choose eventual consistency rather than strong consistency? You always want strong consistency if you can get it, but sometimes the cost of synchronously coordinating the replicas is too high.
So now we have a sync server like you planned? Well, if we're at the scale where CRDTs make sense then presumably we have data races. Let's assume Alice and Bob both read from the sync server and it's a (synchronous, unencrypted!) last-write-wins register, both Alice and Bob pull down "v1" and Alice writes "v1a" to the register and Bob in parallel writes "v1b" as Alice disconnects and Bob wins because he happens to have the higher user-ID. Sync server acknowledged Alice's write but it got lost until she next comes online. OK so new solution, we need a compare-and-swap register, we need Bob to try to write to the server and get rejected. Well, except in the contention regime that we're anticipating, this means that we're running your sync server as a single-point-of-failure strong consistency node, and we're accepting the occasional loss of availability (CAP theorem) when we can't reach the server.
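The Alice/Bob race above can be made concrete. This is a toy sketch of the two register semantics being contrasted, not anyone's real server: the last-write-wins register acknowledges Alice's write and then silently drops it when Bob ties on version and wins on user-ID, while the compare-and-swap register rejects Bob's stale write instead.

```python
class LWWRegister:
    """Last-write-wins: highest (version, writer_id) silently wins."""
    def __init__(self, version, value, writer):
        self.state = (version, writer, value)

    def write(self, version, value, writer):
        self.state = max(self.state, (version, writer, value))
        return True                      # always "acknowledged"

class CASRegister:
    """Compare-and-swap: a write against a stale version is rejected."""
    def __init__(self, version, value):
        self.version, self.value = version, value

    def write(self, expected_version, value):
        if expected_version != self.version:
            return False                 # writer must re-read and retry
        self.version, self.value = self.version + 1, value
        return True

lww = LWWRegister(1, "v1", "")
assert lww.write(2, "v1a", "alice")      # Alice's write is acknowledged...
assert lww.write(2, "v1b", "bob")        # ...but Bob ties on version,
assert lww.state == (2, "bob", "v1b")    # wins on user-ID, and "v1a" is lost

cas = CASRegister(1, "v1")
assert cas.write(1, "v1a") is True       # Alice succeeds
assert cas.write(1, "v1b") is False      # Bob is rejected (stale version)
```

The CAS version surfaces the conflict, but at the cost described above: the register is now a strongly consistent single point of failure.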
Even worse, such a sync server _forces_ you into strong consistency even if you're like "well the replicas can lose connection to the sync server and I'll still let them do stuff, I'll just put up a warning sign that says they're not synced yet." Why? Because they use the sync server as if it is one monolithic thing, but under contention we have to internally scale the sync server to contain multiple replicas so that we can survive crashes etc. ... if the stuff happening inside the sync server is not linearizable (aka strongly consistent) then external systems cannot pretend it is one monolithic thing!
So it's like, the sync server is basically a sort of GitHub, right? It's operating at a massive scale, so internally it presumably needs many Git clones of the data so that if the primary replica goes down it can still serve your repo, merge a pull request, and whatever else. But then it absolutely sucks to merge a PR and find out that afterwards it's not merged, so you go into panic mode and try to fix things, only to discover 5 minutes later that the PR is now merged. And a really active eventually-consistent CRDT system has a lot of potential for exactly that kind of bug.
For the CRDT researcher the idea of "we'll solve this all with a sync server" is a misunderstanding that takes you out of eventual-consistency-land. The CRDT equivalent that lacks this misunderstanding is, "a quorum of nodes will always remain online (or at least will eventually sync up) to make sure that everything eventually gets shared," and your "sync server" is actually just another replica that happens to remain online, but isn't doing anything fundamentally different from any of the other peers in the swarm.
blamestross|8 months ago
Or the user's client can flatten un-acked changes and tell the server to store that instead.
It can just always flatten until it hears back from a peer.
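Client-side flattening can be sketched like this. The op format is hypothetical (key/value set-ops); the point is only that a run of un-acked local ops can be compacted into a smaller, equivalent batch before being handed to the server.

```python
def flatten(ops):
    """Collapse a run of (key, value) set-ops: only the last write per
    key matters, so the flattened batch is equivalent but smaller.
    Relies on dicts preserving insertion order."""
    latest = {}
    for key, value in ops:
        latest[key] = value
    return list(latest.items())

# Three keystrokes on "title" plus one edit to "body" compact to two ops:
unacked = [("title", "a"), ("title", "ab"), ("title", "abc"), ("body", "hi")]
assert flatten(unacked) == [("title", "abc"), ("body", "hi")]
```

This only works for op types whose composition stays expressible as one op; once a peer has acknowledged some prefix, those ops are frozen and can no longer be flattened away.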
The entire scenario is over-contrived. I wish they had just shown it off instead of inventing a justification for it.
clawlor|8 months ago