item 39690453

andras_gerlits | 1 year ago

Well, no: by making latency events extremely unlikely through redundant channels.
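As a rough illustration of why redundant channels help, here is a minimal sketch that assumes each channel spikes independently with the same probability `p` (a simplifying assumption; real channels often share failure causes). A message is only delayed when every channel spikes at once, so the probability falls geometrically with the number of channels, but it never reaches zero for `p > 0`. The function name is hypothetical.

```python
def prob_all_channels_spike(p: float, n: int) -> float:
    """Probability that all n redundant channels suffer a latency spike at
    the same moment, ASSUMING the channels fail independently with equal p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1:
        raise ValueError("need at least one channel")
    return p ** n


if __name__ == "__main__":
    # With a 1% per-channel spike rate, each extra channel cuts the
    # chance of an end-to-end spike by another factor of 100.
    for n in (1, 2, 3):
        print(n, prob_all_channels_spike(0.01, n))
```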

https://medium.com/p/5e397cb12e63#7df1

Considering that this is maybe the 12th time I'm linking the explanation, I now think you're not looking for a discussion; you're here for a fight. Please let me know if you find any problems with the article's reasoning. It was vetted by a lot of people before, so if you do, I really need to notify them too.

sausagefeet | 1 year ago

> So how can we fix this problem? The same way we solve all our failure issues in IT: redundancy.

Redundancy doesn't fix any problems; it just makes them less likely to occur. Again, this not only does nothing to address the CAP theorem, it is already done for a wide array of problems. There is nothing new here.

andras_gerlits | 1 year ago

Okay, so we can move beyond CAP. We don't talk about implementation in either the science paper or its intro (which is what this thread is about). I mostly write about implementation; Mark mostly writes about the science. So yes, redundancy will only make latency spikes less likely. Notice, however, that the way we establish strong consistency is also based on communication. There's a part of the essay I keep linking where I talk about what happens to nodes with flaky connections; it's called "Stability".

https://medium.com/p/5e397cb12e63#373c

There are three separate mitigations that let total-order strong consistency keep marching on even if a specific child group is intermittently isolated.

The first is redundancy through deterministic replication: there are always many replicas, and they aren't just shards but full copies of the _consistency ledger_. The ledger isn't a database or a cache; it's just the thing that establishes consistency between nodes. These instances all "race each other" to be the first to deliver each output to other nodes.
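The "race" between replicas can be pictured with a small asyncio sketch. This is not the authors' implementation; it only assumes that every replica deterministically computes the same output from the same ordered inputs (here a hypothetical `sum` stands in for the ledger logic, and `asyncio.sleep` stands in for per-replica latency). The consumer takes whichever copy arrives first and cancels the rest, so one slow replica doesn't delay anyone.

```python
import asyncio


async def replica(name: str, inputs: list[int], delay: float) -> tuple[str, int]:
    # Deterministic replication: every replica applies the same (hypothetical)
    # function to the same ordered inputs, so all produce the same output.
    await asyncio.sleep(delay)  # stand-in for network/processing latency
    return name, sum(inputs)


async def first_output(inputs: list[int], delays: dict[str, float]) -> tuple[str, int]:
    """Start one task per replica and return the first result to arrive."""
    if not delays:
        raise ValueError("need at least one replica")
    tasks = [asyncio.create_task(replica(n, inputs, d)) for n, d in delays.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # The slower replicas would produce the same value, so drop them.
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


if __name__ == "__main__":
    # Replica "b" is fastest here, so its copy of the output wins the race.
    print(asyncio.run(first_output([1, 2, 3], {"a": 0.05, "b": 0.0, "c": 0.1})))
```

Because the outputs are identical by construction, it doesn't matter which replica wins; that is what makes "first answer wins" safe in this toy model.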

The second is the latency mitigation we talked about earlier; I don't think we need to waste more breath on that.

The third is that the consistency mechanism requires an explicit association instruction to interleave a child's versions into its parent's (so that these versions can be exposed to nodes observing it from afar). If the child goes AWOL, it can't associate its versions with its parent, so it won't hold up everyone else either. In this case, the total order isn't affected by the child group's local order, which is still allowed to make progress as long as it doesn't try to update any record that is distant from it.
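A toy model of the third point, purely as an illustration under stated assumptions: the class names, the `connected` flag, and the rule that an isolated child may only write its own local records are all hypothetical, not the actual protocol. A child's versions enter the parent's total order only when they are explicitly associated; an isolated child contributes nothing to that round but keeps appending to its own local log.

```python
from dataclasses import dataclass, field


@dataclass
class ChildGroup:
    name: str
    local_records: set[str]          # records "near" this child (hypothetical)
    connected: bool = True
    local_log: list[tuple[str, int]] = field(default_factory=list)
    associated_upto: int = 0         # how much of local_log the parent has seen

    def write(self, record: str, value: int) -> None:
        # Local order keeps progressing even while isolated, but an isolated
        # child may not touch records that are distant from it.
        if not self.connected and record not in self.local_records:
            raise RuntimeError(f"{self.name} is isolated; cannot update {record!r}")
        self.local_log.append((record, value))


@dataclass
class Parent:
    total_order: list[tuple[str, str, int]] = field(default_factory=list)

    def associate(self, child: ChildGroup) -> int:
        """Interleave the child's not-yet-associated versions into the total
        order. An isolated child contributes nothing and blocks nobody."""
        if not child.connected:
            return 0
        new = child.local_log[child.associated_upto:]
        for record, value in new:
            self.total_order.append((child.name, record, value))
        child.associated_upto = len(child.local_log)
        return len(new)
```

For example, if child `b` goes AWOL after one write, `Parent.associate(b)` returns 0 while `a`'s versions still enter the total order; once `b` reconnects, its backlog of local versions is associated in one step.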