HAT, not CAP: Introducing Highly Available Transactions

65 points | pbailis | 13 years ago | bailis.org

8 comments

jmileham | 13 years ago
In order to reconcile ACID with CAP, this defines a weakened form of ACID to mean whatever-some-databases-currently marketed-as-ACID-compliant-support in order to say that you can still offer effective ACID compliance and still choose CA over partition tolerance (in the http://codahale.com/you-cant-sacrifice-partition-tolerance/ sense). For a lot of applications, the isolation guarantees being given up aren't, or shouldn't be, negotiable (if you try to sneak by without them, you'll get data integrity issues at scale).

Not saying that the solution doesn't provide a valuable framework for building robust applications that can overcome those issues (necessarily pushing some of that complexity up the stack to the application developer), but the marketing seems a little bit suspicious?

Edited to add: In fairness, the article doesn't actually claim to have evaded CAP; it recognizes that HAT is a compromise. But I believe it's easy to understate the practical problems with non-serializable transactions. Without serializability, it becomes impossible to prevent duplicate transactions from being created on the split-brain nodes. In banking, for instance, this would be a Bad Thing, and lead to potentially hairy application-specific mop-up when the nodes resync.
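To make the split-brain worry concrete, here is a minimal sketch, not anything taken from the paper. The `Replica` class, the `resync` merge rule, and the amounts are all hypothetical; the only assumption is that each side of a partition approves withdrawals against its own local view.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that stays available by deciding from local state only."""
    name: str
    balance: int
    log: list = field(default_factory=list)

    def withdraw(self, amount: int) -> bool:
        # The funds check sees only this replica's view of the balance.
        if amount > self.balance:
            return False
        self.balance -= amount
        self.log.append(amount)
        return True


def resync(initial_balance: int, replicas: list) -> int:
    # Naive merge (an assumption): replay every withdrawal accepted on any side.
    return initial_balance - sum(sum(r.log) for r in replicas)


initial = 100
a = Replica("a", initial)
b = Replica("b", initial)

# During the partition, each side approves a withdrawal the other cannot see.
ok_a = a.withdraw(80)
ok_b = b.withdraw(80)

merged = resync(initial, [a, b])
print(ok_a, ok_b, merged)  # True True -60
```

Both withdrawals pass their local check, and the overdraft only shows up at resync time, which is where the application-specific mop-up would have to happen.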

pbailis | 13 years ago
Good point, and well-taken. As I mention in http://www.bailis.org/blog/hat-not-cap-introducing-highly-av... (and devote a full section to in the paper, including documented isolation anomalies like lost updates, write skew, and anti-dependency cycles), there are many guarantees that aren't achievable in a highly available environment. Our goal is to push the limits of what is achievable, and, by matching the weak isolation provided by many databases, hopefully provide a familiar programming interface.
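For readers unfamiliar with the lost update anomaly named above, a minimal sketch follows. It is an illustration only: the `begin` and `commit` helpers and the `counter` key are hypothetical, and the store is assumed to accept writes last-writer-wins with no conflict check.

```python
# Shared store; both transactions start from the same committed state.
store = {"counter": 10}


def begin(store):
    # Each transaction reads from its own snapshot of the store.
    return dict(store)


def commit(store, key, value):
    # Assumption: last writer wins, with no write-write conflict detection.
    store[key] = value


t1 = begin(store)
t2 = begin(store)

# Both transactions increment the value they read, then write it back.
commit(store, "counter", t1["counter"] + 1)
commit(store, "counter", t2["counter"] + 1)

final = store["counter"]
print(final)  # 11, not 12: one increment was lost
```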

As I tried to stress in the post, we aren't claiming to "beat CAP" or provide "100% ACID compliance"; we're attempting to strengthen the semantic limits of highly available systems. I intended "HAT, not CAP" as a play on acronyms, not as a claim to achieve the impossible.

edit: We're also certainly not claiming to have a "CA" solution, whatever that means. There's a lot of confusion between "CAP atomicity"==linearizability and "ACID atomicity"=="transactional atomicity"/"all or nothing"; see http://www.bailis.org/blog/hat-not-cap-introducing-highly-av...

haberman | 13 years ago
> In order to reconcile ACID with CAP, this defines a weakened form of ACID to mean whatever-some-databases-currently marketed-as-ACID-compliant-support

Are you referring to the isolation guarantees? "Repeatable Read" (which this provides) is a pretty reasonable standard of isolation; while "Fully Serializable" is stronger, it's also more expensive. Engines like PostgreSQL that can be run in either mode are most often run in "repeatable read" mode AFAIK.

3amOpsGuy | 13 years ago
To be fair to the article, it's pretty upfront about the constraints. From the article:

'Of course, there are several guarantees that HATs cannot provide. Not even the best of marketing teams can produce a real database that “beats CAP”; HATs cannot make guarantees on data recency during partitions, although, in the absence of partitions, data may not be very stale. HATs cannot be “100% ACID compliant” as they cannot guarantee serializability'

My concern is the pitched low-latency use case: if I understand correctly, there's no way to avoid an extra round trip?

Could be very useful all the same.

ryanpers | 13 years ago
Interesting paper. I hope to see a follow-up that actually describes the algorithm in full. As written, it doesn't cover the failure recovery, data-drift, and timeout cases.

Also, maybe you could speak to a few constraints:
- missing updates
- unique index

and outline your thoughts as to how an application developer might avoid pitfalls. Most applications I have seen tend to require/run into these issues.
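As a concrete version of the unique index concern, here is a minimal sketch. Everything in it is hypothetical (the `Replica` class, `find_conflicts`, the usernames), and it assumes each replica can only check uniqueness against rows it holds locally.

```python
class Replica:
    """Hypothetical replica that enforces uniqueness against local rows only."""

    def __init__(self, name):
        self.name = name
        self.rows = {}  # username -> owner id

    def insert(self, username, owner):
        # The uniqueness check cannot see rows on the other side of a partition.
        if username in self.rows:
            return False
        self.rows[username] = owner
        return True


def find_conflicts(left, right):
    # Keys claimed on both sides by different owners violate the unique index.
    shared = left.rows.keys() & right.rows.keys()
    return sorted(k for k in shared if left.rows[k] != right.rows[k])


a = Replica("a")
b = Replica("b")

# Two different users register the same name on opposite sides of a partition.
ok_a = a.insert("alice", 1)
ok_b = b.insert("alice", 2)

conflicts = find_conflicts(a, b)
print(ok_a, ok_b, conflicts)  # True True ['alice']
```

Both inserts succeed locally, so the violation can only be detected after the fact; which row survives would be up to the application.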