top | item 19343398

A generalised solution to distributed consensus

454 points| based2 | 7 years ago |blog.acolyer.org | reply

56 comments

order
[+] karl_gluck|7 years ago|reply
Is anyone else surprised at the relatively mild reaction to this result here on Hacker News? Only a couple dozen comments, and many are addressing other topics.

Can we take a second to appreciate that this result will rapidly become the first thing taught in every single distributed systems class? And if this holds as a generalization of trustful distributed consensus as a field, then she has defined its Turing Machine equivalent. And it is even remarkably easy to understand! That is wild!!

Props to the authors. Personally, this is going on the top of my “to memorize” list.

[+] disposablename|7 years ago|reply
This is a very techincal topic, of course the vote to comment ratio is going to be very high.
[+] Vervious|7 years ago|reply
I disagree. This paper generalizes variants of Paxos - certainly not the class of all consensus algorithms. Moreoever, it generalizes Paxos variants, with different requirements - weaker quorums are always a tradeoff with liveness in the presense of faults.

tldr; I don't see world-changing novelty here, theoretically. There are dozens of other consensus algorithms in the world, some randomized, some not quorum based, etc. This is interesting if you're in Paxos land, trying to tune your Paxos for different fault-tolerance requirements. It's no way comparable to a computational model.

[+] anigbrowl|7 years ago|reply
Not really, in fact it's something I've come to expect here. There's a loss in social status for appearing too enthusiastic about a new thing (that might not be so great) and if the new thing replaces a difficult-to-use old thing then the people who invested effort in understanding the old thing don't want to participate in making it obsolete.

Don't forget that people often just want to use an algorithm and are not necessarily that interested in the underlying mathematical structure, only in whether it works reliably and does a better job than the existing solution.

[+] brighton36|7 years ago|reply
I'm very skeptical that many people will want this. Data misrepresentation is rarely (never?) due to a fault in the persistence medium. Its usually a political reason, or human error that perverts consensus.
[+] sagichmal|7 years ago|reply
Great paper with an incredibly important result. It will take years to shake out in industry and implementations, but should dramatically improve situations where strong consistency is important.
[+] pathseeker|7 years ago|reply
Can you explain how? In many cases when a "generalized result" comes out there was already a generalized result that covered all but obscure corner cases.
[+] 0xb100db1ade|7 years ago|reply
Would someone ELI5?
[+] cbhl|7 years ago|reply
Here's my attempt:

Making multiple computers all store the same thing is hard because they have to send messages between each other, and sometimes messages get lost.

Other people have tried to write down how they solved this problem, but their explanations were hard for adults to understand.

This person has tried a new way to solve this problem, and a new way to explain their solution that uses ideas that many big kids who go to big kid school to study computers can understand.

[+] bch|7 years ago|reply
I can’t, but I can say:

1) Heidi Howard’s papers are very fun to read. She’s done lots of work I’ve enjoyed esp wrt Raft 2) She touches on two things in this story’s paper that I first read in another paper of hers that are a joy to realize for two reasons a) She made a practical improvement in the already simple Raft protocol b) quorum != (obvious) majority - in (eg) Raft she identifies that quorum actually spans two states, which are

* leader election and

* “regular” mode with a leader and requisite # followers

Read/watch on and be amazed[0][1]

[0] http://hh360.user.srcf.net/blog/2016/08/majority-agreement-i...

[1] https://youtu.be/gYkueS5sKqo

[+] grogers|7 years ago|reply
Really cool result and a very readable paper. I'm struggling to figure out how this would be applicable to iterated consensus (i.e. logs). It seems like the extension examples are focused on reducing the message round trips for specific cases that don't matter when applied to muti-paxos or raft.
[+] ccleve|7 years ago|reply
I have a lot of questions about this paper. Parts of it aren't clear to me.

Is anyone aware of a forum where consensus algorithms are discussed? A mailing list, subreddit, or website?

I found a Raft mailing list a while ago, but I'm looking for a place for a more general discussion, where real experts hang out.

[+] macintux|7 years ago|reply
For questions specifically about this paper, you might have luck with Adrian Colyer's take on it: https://blog.acolyer.org/2019/03/08/a-generalised-solution-t...

Or Howard's blog post: https://hh360.user.srcf.net/blog/2019/02/towards-an-intuitiv...

Twitter used to be a useful place for general distsys discussions, but I haven't been around for a few years now so I'm not certain how much still takes place.

This tweet, for example, I would have expected to see more people chime in on.

https://twitter.com/adriancolyer/status/1103935905948028928

There's a distributed systems theory Google Group, but it appears to have completely died off:

https://groups.google.com/forum/#!forum/distsys-discuss

To inquire what avenues for theoretical discussion are currently active, I would suggest reaching out on Twitter to Chris Meiklejohn (cmeik), Michael Bernstein (mrb_bk) or Kyle Kingsbury (aphyr).

[+] jvanderbot|7 years ago|reply
I believe you have to go through literature to get this kind of back-and-forth. At least, that's how I manage, even if that feels slow and opinionated.
[+] anonymousDan|7 years ago|reply
My understanding is the cost of all these weaker quorums is reduced fault tolerance. Is that correct?
[+] DblPlusUngood|7 years ago|reply
Yes. In theory, maybe it's useful to have more flexibility to trade-off durability for performance (smaller quorum size hopefully reduces deciding latency [unless your small quorum contains the straggler!]) for specific kinds of data in the same replica group.

In practice though, it seems easier just to run separate instances of Paxos.

[+] jrootabega|7 years ago|reply
Forgive the tangent, but could quantum entanglement one day provide consensus with trivial (if expensive) complexity? E.g., nodes maintain a supply of preentangled particles, observing them at predetermined timestamps, and using that known shared randomness to make decisions?

Edit: ok after that sat there for a while I realized that generating shared pseudorandom bits would be just as effective.

[+] karl_gluck|7 years ago|reply
If you abstract quantum-entangled particles to a communication channel with zero latency, you still have the consensus problem—you just run into the edge cases less frequently. So it would help, but consensus is still a fundamental challenge with parallel processes.
[+] thaumaturgy|7 years ago|reply
As an alternative to Paxos-related consensus algorithms, some projects may want to take a look at the Raft consensus protocol: https://www.brianstorti.com/raft/

Whereas Paxos tries to maintain a fully distributed consensus state, Raft instead proposes a protocol for automatically electing a leader node from a group of follower nodes, and coordinating the replication of the write log through that leader node.

I haven't had the pleasure of experimenting with either one though and would love to hear from people who have.

[+] Vervious|7 years ago|reply
Raft is arguably more difficult to understand at a deep level. After extensive study, I've noticed that Raft hides all of its complexity under its leader election setup. There's nuance after nuance. With Paxos, the complexity is explicit and at the forefront, which is actually better, when trying to reason about a distributed system and its invariants.
[+] DannyBee|7 years ago|reply
This paper is deliberately a generalization of all of them, actually. Raft can also be explained in terms of the rules/system she provides.

She happens to use PaxOS as examples to show they are non-optimal, in part because Raft is already deliberately non-optimal.

The other reason is that Raft is completely trivial in her framework.

It is equivalent to only having one thing decide the registers (and all the hard work is in deciding which thing).

[+] ccleve|7 years ago|reply
It took me months to do a reliable Raft implementation. The author of the Raft paper, Diego Ongaro, wrote in a comment on a Raft mailing list that it could take weeks.

The Raft paper is really easy to read and understand, but the actual code is very tricky.

[+] killjoywashere|7 years ago|reply
Isn't raft the default for Bitcoin?
[+] thecoinrepublic|7 years ago|reply
The client maintains a decision table with one entry for each quorum of each register set.