
You Can’t Sacrifice Partition Tolerance (2010)

93 points | xk3 | 4 years ago | codahale.com

59 comments

[+] eloff|4 years ago|reply
You can make partition tolerance arbitrarily low in practice with enough work and resources, but you can never eliminate it. Note that P covers more than network failures: to the rest of the system, a dead machine and an inaccessible machine are equally unavailable.

So in theory you have to choose, in the event of P, whether to sacrifice A or C. That's CAP in a nutshell: you have to plan for P, so choose your failure mode for that case.

And this is where people went wrong. They figured you can't sacrifice availability, so then you can't have consistency. If you can't have consistency in failure modes, you may as well not have it at all.

That's really dumb, because in reality you're always going to have some downtime. Most use cases can afford a little of it; just work to keep it at acceptable levels. By sacrificing consistency you enter a realm where logic and cause and effect no longer apply - it's hell to program with that model - and the resulting bugs will cost you availability anyway.

CP should be the default choice.

[+] lostcolony|4 years ago|reply
It's not all or nothing, and it's not about the downtime. It has everything to do with what guarantees you need about your data. It turns out, there are various levels of consistency (strict, sequential, linearizable, etc), and the stronger the guarantees you need, both the harder it is to enforce at a system-wide level, AND the harder it is to have high throughput (both in the 'happy' case, and in any sort of negative event).

It's not a question, really, of "choosing" to be CP, but of figuring out where on the spectrum of consistency vs availability you want to fall (that shade will then land in the 'consistent' or 'available' camp, as generally understood). Sometimes it's okay to just accept a best-effort write (such as a cache). Sometimes you REALLY need to ensure ONE view of the truth across the entire system (such as banking). Most cases fall somewhere in the middle, where you can make tradeoffs. For instance, user metadata stores are great candidates for eventual consistency: conflicts are very unlikely (a user isn't generally updating their data in two different locations), but read throughput can be fairly high; relaxing your consistency guarantees makes sense (easier to implement, fewer resources needed to maintain invariants, and higher availability in the event of a partition).
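To make the "relaxed consistency for user metadata" case concrete, here is a minimal last-writer-wins register sketch. The class and its timestamping scheme are illustrative assumptions, not any particular store's API:

```python
import time

class LWWRegister:
    """Last-writer-wins register: each replica keeps (timestamp, value),
    and merging simply keeps the newer write. Reads may be stale, but all
    replicas converge once they exchange state (eventual consistency)."""

    def __init__(self):
        self.timestamp = 0.0
        self.value = None

    def write(self, value, now=None):
        self.timestamp = now if now is not None else time.time()
        self.value = value

    def merge(self, other):
        # Anti-entropy: adopt the other replica's state if it is newer.
        if other.timestamp > self.timestamp:
            self.timestamp = other.timestamp
            self.value = other.value

# Two replicas accept writes independently during a partition...
a, b = LWWRegister(), LWWRegister()
a.write("avatar-v1", now=1.0)
b.write("avatar-v2", now=2.0)
# ...and converge on the newest write once they can talk again.
a.merge(b); b.merge(a)
assert a.value == b.value == "avatar-v2"
```

Note the tradeoff this encodes: the rare concurrent update silently loses one write, which is exactly why it fits low-conflict data like user metadata and not banking.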

[+] jameshart|4 years ago|reply
> You can make partition tolerance arbitrarily low

I guess you mean 'the need for partition tolerance'. And that is honestly a better way of reading the P - effectively as 'partitionable'.

CAP says if you are making a partitionable system, it can't guarantee both consistency and availability. If you want consistency and availability, you will have to make the system unpartitionable - effectively, not distributed.

The idea of 'partition tolerance' just makes no sense, because it's not really possible to make a 'partition intolerant' partitionable system.

If the definition of a 'partition tolerant' system is one which actively determines, in the face of network partition, whether to stop answering requests because it can't communicate with other nodes, or to nonetheless answer requests in the knowledge that its answers might be inconsistent...

... how would a 'partition intolerant' node be able to do anything different? I suspect the most likely outcome for a system which can't tolerate partition is to lose both consistency AND availability.

[+] jude-|4 years ago|reply
> By sacrificing consistency you enter a realm where logic and cause and effect no longer apply

The "C" in CAP refers to linearizable consistency, which is much stricter than causal consistency. You can build an AP system that is causally consistent.
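Vector clocks are one standard way an AP system can track causality without coordinating on every write. A minimal sketch of the happens-before test (the dict-of-counters representation is an assumption for illustration):

```python
def happens_before(vc_a, vc_b):
    """True if the event with clock vc_a causally precedes the one with vc_b:
    every component of vc_a is <= the matching component of vc_b, and the
    clocks are not identical."""
    nodes = set(vc_a) | set(vc_b)
    return (all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
            and vc_a != vc_b)

def concurrent(vc_a, vc_b):
    # Neither event causally precedes the other.
    return not happens_before(vc_a, vc_b) and not happens_before(vc_b, vc_a)

post      = {"alice": 1}
reply     = {"alice": 1, "bob": 1}   # bob saw the post before replying
unrelated = {"carol": 1}

assert happens_before(post, reply)   # causal order is preserved
assert concurrent(reply, unrelated)  # no ordering between these two
```

A causally consistent store delivers `reply` only to replicas that have already seen `post`, while concurrent events may be observed in either order.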

[+] ignoramous|4 years ago|reply
Google Spanner (and one other system I worked with) had a neat trick up its sleeve: comparatively lightweight witness replicas that participate in quorums and keep a time-truncated (recent) unmaterialized log, but answer no queries. Witnesses could, however, be used to bring full nodes back up to speed. This meant the overall availability of the cluster generally went up, assuming coordinated failure is rare and there's enough (hardware and software) heterogeneity in the cluster [0].
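The witness idea can be reduced to a toy model (this is a sketch of the general technique, not Spanner's actual implementation): witnesses vote toward the quorum like any replica, but only full replicas materialize state and serve reads.

```python
class Node:
    def __init__(self, name, witness=False, up=True):
        self.name = name
        self.witness = witness  # witnesses vote but hold no materialized state
        self.up = up

def can_commit(nodes):
    # A write commits if a majority of voters (full replicas AND
    # witnesses) is reachable.
    alive = sum(1 for n in nodes if n.up)
    return alive > len(nodes) // 2

def can_serve_read(nodes):
    # Only a full replica can answer queries.
    return any(n.up and not n.witness for n in nodes)

cluster = [Node("full-1"), Node("full-2"), Node("w-1", witness=True)]
cluster[1].up = False            # lose one full replica
assert can_commit(cluster)       # quorum of 2/3 survives thanks to the witness
assert can_serve_read(cluster)   # the remaining full replica answers queries
```

The payoff is the asserted case: a cheap witness keeps the write quorum alive through a single full-replica failure, without paying for a third full copy of the data.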

Regarding consistency in distributed systems, there's a lot to untangle here; even experts like Pat Helland are left lamenting the confusion in both academia and engineering circles [1].

[0] http://www2.cs.uh.edu/~paris/MYPAPERS/Icdcs86.pdf

[1] https://pathelland.substack.com/p/dont-get-stuck-in-the-con-...

[+] zffr|4 years ago|reply
> By sacrificing consistency you enter a realm where logic and cause and effect no longer apply - it's hell to program with that model - and the resulting bugs will cost you availability anyway.

Sacrificing strong consistency doesn’t mean your system won’t be eventually consistent. CRDTs are a great example of how you can get availability and simple consistency semantics.
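A grow-only counter is the classic first CRDT example: because merge is an element-wise max, merges commute and are idempotent, so replicas converge no matter how or when state is exchanged. A minimal sketch:

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot,
    and merge takes the element-wise max across slots. Merges commute,
    so every replica converges to the same total without coordination."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

# Both sides keep accepting writes during a partition...
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
# ...and merging in either order heals to the same answer.
a.merge(b); b.merge(a)
assert a.value() == b.value() == 5
```

This is the "simple consistency semantics" point above: you give up reading an up-to-the-moment total, but you never have to resolve a conflict by hand.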

[+] analyst74|4 years ago|reply
Consistency requires more than just choosing the right database; you also have to design your overall distributed system with it in mind. You have to be very careful when it comes to caching, error handling, load balancing, etc.

Of course, if it's important enough for your use-case, there is always a way to get there.

[+] dang|4 years ago|reply
Some past threads:

You Can’t Sacrifice Partition Tolerance (2010) - https://news.ycombinator.com/item?id=11700856 - May 2016 (21 comments)

You Can’t Sacrifice Partition Tolerance (2010) - https://news.ycombinator.com/item?id=8214310 - Aug 2014 (4 comments)

You Can't Sacrifice Partition Tolerance (response to Stonebraker) - https://news.ycombinator.com/item?id=1821887 - Oct 2010 (13 comments)

You Can't Sacrifice Partition Tolerance - https://news.ycombinator.com/item?id=1768312 - Oct 2010 (50 comments)

[+] chii|4 years ago|reply
I wish HN could aggregate these old threads into the resubmission directly (and automagically?). And maybe randomize the comment sorting so that the discussion can continue from the existing comments rather than rehashing them.
[+] GuB-42|4 years ago|reply
The title should be more explicit: "You can't sacrifice partition tolerance in a distributed system".

But you can if you don't have a distributed system. I think it is a good argument for monoliths, "real" non-cloud servers and mainframes.

In reality, there are usually backups, and hardware is not perfect, so the whole system is most likely somewhat CP anyway, but having everything on the same machine means you have much less to worry about regarding P in your application.

Edit: BTW, I know that monoliths don't scale, but you don't always need to scale, or not yet.

[+] deegles|4 years ago|reply
I think about this a lot. Amazon has (among others) the m5zn.12xlarge EC2 instance with 48 vCPUs, 192 GB of memory, 100 Gbps networking and 19 Gbps of EBS bandwidth... you could run a heck of a monolith before needing to scale horizontally.
[+] tracnar|4 years ago|reply
What's often missing from these discussions is that the trade off in the CAP theorem can change across the system and over time. If you have a cache on your client, you're trading off consistency for availability, same if you do an optimistic update on the client-side. But that does not change the consistency of your database core. You can also "downgrade" to eventual consistency if you detect a network problem.

I feel that a more principled way to navigate between these trade offs and awareness from the application and database of the current consistency guarantees of the data would help to make more robust systems. E.g. you'd want strong consistency by default, with a fallback to strong eventual consistency when the network is poor and the use-case allows it (e.g. not for changing your password, but ok for posting a comment). It needs to be reflected in the UI, and ideally you want the trade off to be decided at each level in a consistent way (client, local database, core database, ...).
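One way to make that per-operation tradeoff explicit is to route each write through a policy that tries a strong write first and falls back only when the use-case allows it. This is a hypothetical sketch: `store`, its `write` method, and `UnavailableError` are assumed names for illustration, not a real client API.

```python
class UnavailableError(Exception):
    """Raised when a quorum cannot be reached within the timeout."""

def save(store, key, value, critical):
    try:
        # Strong (quorum) write by default.
        return store.write(key, value, consistency="quorum", timeout=0.5)
    except UnavailableError:
        if critical:
            # e.g. a password change: fail loudly rather than fork state.
            raise
        # e.g. posting a comment: accept an eventually consistent write
        # and let the UI surface it as "pending".
        return store.write(key, value, consistency="one")
```

The decision (which operations are `critical`) lives in the application, where the tradeoff is actually understood, and the degraded path is visible in the code rather than an implicit property of the database.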

[+] forty|4 years ago|reply
Good article but I'm still struggling to understand CAP in practice.

In general, the whole distributed system will include some kind of database and some kind of application servers. What would AP mean in this context? That application server nodes are able to return meaningful (i.e. non-failing) responses even if they are unable to communicate with the DB? Wouldn't that require every application node to have a whole local copy of the data it needs to work?

[+] TBurette|4 years ago|reply
The CAP theorem applies whenever you have a distributed data store. This means a system you can write data to and read back from. This system is made of multiple nodes connected through a network and the clients can connect and make read/write requests from any node.

If you don't meet those conditions CAP doesn't apply to you. For example :

- an SQL database with master/slave replication where all the clients only ever access the database from the same node (the clients will always have a consistent view of the one node they access)

- multiple processes on the same machine doesn't apply (it's a concurrent datastore but not a distributed one)

- a single SQL database instance doesn't apply (data is not distributed even if there are multiple clients. Confusingly enough if clients can keep some data and work in offline mode, then it would apply because the clients would be considered as a node of the system)

CAP was clearly created in the context of distributed databases: SQL, NoSQL, CDNs, ... Can it apply to an entire system made of multiple components such as a database, multiple services, ...? Things get muddier, but I'd say yes, as long as you check all the boxes: data spread across multiple nodes that clients can access, where the connections can get severed.

Application servers with no local copy losing their connection to the database: no. Where is the distribution of data? Where can a network failure partition the data? There is not much of an A or C choice to make when a partition occurs, so applying CAP wouldn't be very interesting here.

However, application servers with a local copy: yes, CAP applies. In case of a partition there is a trade-off to be made between C and A.

To complicate things further, CAP has basically two versions. There is Brewer's version, which talks in general terms, is not formal, and gets hazy when you drill down into the details. The other is the version in Gilbert and Lynch's paper, which provided a proof of the CAP theorem. In that version the definitions are very strict, and consequently many real-world aspects don't fit. Depending on who you ask, you may get different answers.

For example, take a database distributed across multiple servers in a datacenter, with a reverse proxy that can detect partitions and always redirects read/write queries to non-partitioned nodes. In that context you could say you have all three of C, A and P, because there is never going to be a request to a partitioned node. According to the Gilbert and Lynch version of CAP this is not C (consistent), because you have non-failing nodes that can read stale data (whether any request actually reaches them is outside the purview of the proof).

[+] RcouF1uZ4gsC|4 years ago|reply
The OP is correct in theory, but not in practice.

If you have a lot of money and are willing and able to build a private network with enough redundancy, you can have both C and A in practice, and P will be low enough not to practically affect the service.

See

https://cloud.google.com/blog/products/databases/inside-clou...

Notice who the author is: Eric Brewer

[+] lostcolony|4 years ago|reply
Redundancy doesn't solve CAP. Even with a dozen connections you can end up with issues (A sent a packet, B received it, but the connection you picked dropped the ack; what happens?)
[+] lmm|4 years ago|reply
> If you have a lot of money and are willing and able to build a private network with enough redundancy, you can have both C and A in practice, and P will be low enough not to practically affect the service.

Actually I'd say this is something that's correct in theory but not in practice. In theory you can build a network so redundant that you will never experience a partition. In practice you will always have bugs in your redundancy arrangement, operational errors by sysadmins, and so on.

> If you have a lot of money and are willing and able to build a private network with enough redundancy, you can have both C and A in practice, and P will be low enough not to practically affect the service.

You still have to choose whether to drop C, A, or both when P does happen. Four 9s means about an hour of downtime a year (five 9s, about five minutes); maybe for some businesses that's small enough to be irrelevant, but by no means all.
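For reference, the downtime arithmetic behind "the nines" is a one-liner:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960

def downtime_minutes(nines):
    # availability = 1 - 10^-nines, so the unavailable fraction is 10^-nines
    return MINUTES_PER_YEAR * 10 ** -nines

for n in (3, 4, 5):
    print(f"{n} nines: ~{downtime_minutes(n):.1f} min/year")
# roughly 8.8 hours, 53 minutes, and 5.3 minutes per year, respectively
```

And that budget is per cause: a partition, a bad deploy, and a hardware failure all draw from the same yearly allowance.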

[+] linuxhansl|4 years ago|reply
Not sure why you got downvoted. I came here to say the same.

While it is in theory true that you cannot have a CA system, in practice, if you control your H/W and data centers as well, you can build in enough redundancy that a P is no more likely than (say) being hit by a meteor. So still not a CA system, but practically CA 99.9..% of the time (add as many 9's as you have money).

Edit: Spelling.

[+] redis_mlc|4 years ago|reply
> If you have a lot of money and are willing and able to build a private network with enough redundancy

Google has done this. Possibly AT&T also.

> Notice who the author is: Eric Brewer

Note that Professor Brewer is a teacher/academic, and CAP is intended for instructional purposes.

As a DBA, I find CAP totally inadequate for any business purpose.

Try explaining to a business owner, "Yeah, when our network partitioned, we couldn't accept purchases in both partitions." Their response will be, "So all that money you spent on HA was really based on a lie?"