Paxos | WingNews

[+] tombert|4 years ago|reply

I have a bit of a man-crush on Leslie Lamport (and have (unsuccessfully) tried to get TLA+ to be used at every job I've had for the last five years), and I have read through both the original Paxos paper in addition to the Paxos Made Simple paper, and I'm just going to say that I still found it about five times harder to grok than Raft. Honestly the TLA+ specification of Paxos made it click more than the papers for me, but that could just be because of the repeated readings before.

It's a beautiful algorithm, but Raft has the advantage of being a lot simpler.

[+] uvdn7|4 years ago|reply

I am also a huge fan of Lamport. Paxos is so concise and of such beauty that once you understand it, all other consensus protocols seem just different ways of phrasing Paxos with additional features (e.g. membership changes, make it into a log, etc.). Paxos is the smallest, most elegant consensus building block that I have seen so far. Raft's protocol covers a much bigger area than just consensus (it gives you a log, supports reconfiguration, etc.). Paxos' entire protocol fits a single slide. Period.

I wrote a few posts about how to understand paxos, and this is my personal favorite – https://blog.the-pans.com/understanding-paxos/.

Lamport did comment on it (via email) saying that understanding Paxos is less important than proving it works (via TLA+).

[+] solidangle|4 years ago|reply

You might be interested in this paper [1] which presents Paxos in the same way as Raft is presented in the Raft paper. In my view it really illustrates that most of the simplicity of Raft comes from its excellent presentation and not necessarily from its algorithmic simplicity.

[1]https://arxiv.org/pdf/2004.05074

[+] loquor|4 years ago|reply

> I have a bit of a man-crush on Leslie Lamport

+1 - I really admire him too. I wonder, what makes people like him and Edsger Dijkstra so charismatic in this sense?

[+] unknown|4 years ago|reply

[deleted]

[+] daveevad|4 years ago|reply

My grey matter is failing me but does anyone remember a rant about 5-10 years ago about developing software in this era that had some punch line about consensus algorithms that went a little like, "so, there's this guy named Diego?"

edit: found it

https://circleci.com/blog/its-the-future/

[+] didip|4 years ago|reply

That convo reminded me of Google inside joke: i just want to serve 5TB.

https://youtu.be/3t6L-FlfeaI

[+] kybernetikos|4 years ago|reply

I thought you were going to link my favourite humor piece on consensus https://scholar.harvard.edu/files/mickens/files/thesaddestmo...

[+] lowbloodsugar|4 years ago|reply

>Why don’t I just use Google’s thing?

>-You think that’s going to be around in 6 months?

bahaha.

[+] hardwaresofton|4 years ago|reply

Shameless plug of a blog series where I try to list and summarize every paxos variant:

https://vadosware.io/post/paxosmon-gotta-concensus-them-all/

[+] anchpop|4 years ago|reply

Great read. One thing you may be interested to know: when do you need paxos or similar, and when can you get away with gossip and CRDTs? the answer is that the set of domains that can be expressed using CRDTs is exactly the set of monotonic logic.

It’s a bit tricky to explain precisely in an HN comment but basically, something is monotonic if, once you learn that something is true, nothing can disprove it. For example, let’s say you know of a bunch of nodes on a graph, and you’re slowly learning about edges between the nodes. Once you see a cycle, no amount of new edges will ever “disprove” the cycle. So this means that, if what you care about is cycles, then a CRDT is fine. However, let’s say you’re trying to do disturbed garbage collection of nodes that have no edges leading to them. (And as before you’re slowly learning about new edges.) You can see a node has no edges leading to it that you know of, but if you think a node of garbage-collectible you might always be proved wrong by learning about a new edge. So the problem of distributed garbage collection can’t be solved using CRDTs.

There’s a paper on this that you’ll find if you google for the CALM theorem.

[+] AtlasBarfed|4 years ago|reply

This is utterly fantastic stuff. Assuming it's correct, which with Paxos...

[+] bla3|4 years ago|reply

That was an interesting read, thanks!

[+] gralx|4 years ago|reply

Lamport has published a TLA+ spec of the Paxos consensus algorithm¹, but none I could find of the Paxos algorithm itself (aka multi-Paxos).

In 2016 some academics at Stony Brook University did, along with a machine-checked TLAPS proof, updated in 2019:

https://arxiv.org/abs/1606.01387

¹Paxos consensus refines specs for consensus and voting. See the spec and accompanying material here: https://lamport.azurewebsites.net/tla/paxos-algorithm.html

[+] rdtsc|4 years ago|reply

Is there a mistake on the second diagram where prepare(1,e) is sent to Byzantium? Wonder if it should be prepare(1,a), since Athens hasn't learned the (1,e) value which only Ephesus and Delphi know about at that point?

Otherwise I think it's a pretty good description of Paxos. I like that it walks through a good number of failure scenarios and mentions interesting corner cases such as a lack of an explicit commit in the original papers and that reading also needs to run the full Paxos algorithm.

[+] Jtsummers|4 years ago|reply

It does seem to be a typo, based on the table and text below it.

[+] arsh|4 years ago|reply

There are huge gaps between the scientific paper and how you actually operate and scale Paxos from an engineering point of view. See "Paxos Made Live - An Engineering Perspective" by Tushar Chandra et.al.

Paxos also has drawbacks like being slow or lacking liveness guarantees. Some alternatives are discussed in https://aws.amazon.com/builders-library/leader-election-in-d...

[+] bullen|4 years ago|reply

I prefer a simpler approach where only 100% read uptime is guaranteed: http://talk.binarytask.com/task?id=4662372011677848884

The way to do this is to asynconously write to all nodes in real-time, here you can try it out: http://root.rupy.se

Go to http://root.rupy.se/node?make&info and press Make and you'll se the replication in real-time! (although it's too fast so you'll only see it as being delivered in one go, if you wireshark it you'll see multiple HTTP chunks...)

The source is also here: https://github.com/tinspin/rupy in Root.java

[+] kkwteh|4 years ago|reply

Paxos has a reputation of being hard to understand. I was under the impression that Raft was now preferred for distributed consensus. The creators of the Raft algorithm explicitly had understandability as a goal when designing it and called out Paxos as being hard to understand.

https://pdos.csail.mit.edu/6.824/papers/raft-extended.pdf

[+] solidangle|4 years ago|reply

(Multi-)Paxos and Raft are incredibly similar. I would even argue that Raft is just a variant of Paxos in the same way that e.g. Fast Paxos and EPaxos are variants of Paxos. Most of the Raft algorithm is identical to the Paxos algorithm. The main difference between the two algorithms is how leadership election works. In my view the main reason why Raft is considered to be simpler is because it was presented in a really clear paper. Paxos becomes equally simple when using a similar representation as used in that paper. See this paper: https://arxiv.org/pdf/2004.05074

[+] jldugger|4 years ago|reply

> Unmesh Joshi

It feels kinda strange to see an author other than Martin Fowler blogging on martinfowler.com....

[+] sidcool|4 years ago|reply

Unmesh Joshi is the Chief Technologist at ThoughtWorks and a colleague of Martin. Martin publishes a lot of posts on behalf of many colleagues to give visibility to tech posts that interest Martin. It's more like a collaboration and acknowledging contributions.

[+] Jtsummers|4 years ago|reply

This came up with another martinfowler.com post, and I checked the past couple years. Roughly half the content is Fowler's, and the other half are other writers (from memory, not recounting it now). I don't know the ratio going back prior years (only checked 2021 and 2020) but I've seen lots of posts by others on his site for the past few years.

https://martinfowler.com/articles/patterns-of-distributed-sy... - other content by the same author under title Patterns of Distributed Systems.

[+] m00x|4 years ago|reply

Martin Fowler is just a brand now. He's not a spectacular dev that knows all the right ways of doing things, he's just good at finding decent approaches to things and great at writing and communicating them. It's good that he found other writers to keep high quality content.

I used to hate people that preached his ways, but I recognized they're often pretty good. People just take them too much as laws and not opinions.

People have to remember that he's just human like the rest of us.

[+] DerpyBaby123|4 years ago|reply

Maybe so, although he has been featuring other authors on his blog regularly now. He works with them to keep the quality high, check them out.

[+] egberts1|4 years ago|reply

Paxos reminds me of all the TCP variants: Tahoe, Reno, Vegas, Veno, Westwood, Africa, Fast-Reno, Eifel, DOOR, Nice, YeAH, BIC, CUBIC, Illinois, Libra, Hybla, ARENO, Fusion.

[+] throwaway984393|4 years ago|reply

I think a flow chart would have been easier to understand, or perhaps a video or .gif

[+] uvdn7|4 years ago|reply

No, it doesn't. A video might help you _feel_ like you understand it but it doesn't really help you understand it. That's just not how distributed systems work.

47 comments