Distributed Snapshots: Chandy-Lamport Protocol

72 points | federicoponzi | 1 year ago | blog.fponzi.me

4 comments

I found this [0] a very accessible explanation as well.

[0] https://blog.acolyer.org/2015/04/22/distributed-snapshots-de...

scrubs|1 year ago

Unusually well-written article for distributed systems work involving TLA+. Thanks. I liked it and learned something. Bookmarked.

wg0|1 year ago

Noob question: do Raft and Paxos solve a different problem?

yencabulator|1 year ago

Those are about distributed consensus, making sure participants come to the same conclusion about something and nobody has the wrong answer.

Distributed snapshots try to do as little work as possible to get a consistent view of the distributed computation, without forcing the heavy cost of consensus on it. For example, if node A is sending a message to node B, we don't care which of these we capture:

- 1: A before it sends the message, B before it receives the message

- 2: A after it has sent the message, the in-flight message itself, and B before it receives the message

- 3: A after it has sent the message, B after it has received the message

No matter which of those states we restore, the computation will continue correctly.
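
To make that concrete, here is a toy sketch in Python. It is not the Chandy-Lamport protocol itself (no markers, no channel recording); it only checks the claim that each of the three captures above resumes to the same final state. The names `Snapshot`, `resume`, and `TRANSFER`, and the idea of A sending part of a balance to B, are assumptions made up for the illustration.

```python
from dataclasses import dataclass, field

TRANSFER = 3  # hypothetical payload: A moves 3 units to B


@dataclass
class Snapshot:
    a: int                                         # A's local state (a balance)
    b: int                                         # B's local state (a balance)
    in_flight: list = field(default_factory=list)  # messages captured on channel A -> B
    a_has_sent: bool = False                       # part of A's local state


def resume(snap):
    """Restore a snapshot and run the computation to completion."""
    a, b, channel = snap.a, snap.b, list(snap.in_flight)
    if not snap.a_has_sent:
        # A has not sent yet, so it sends after the restore.
        a -= TRANSFER
        channel.append(TRANSFER)
    while channel:
        # B receives whatever is still on the channel.
        b += channel.pop(0)
    return a, b


consistent = [
    Snapshot(a=10, b=0),                                  # case 1
    Snapshot(a=7, b=0, in_flight=[3], a_has_sent=True),   # case 2
    Snapshot(a=7, b=3, a_has_sent=True),                  # case 3
]
results = [resume(s) for s in consistent]
print(results)  # all three end at (7, 3)

# An inconsistent cut: A before the send, B after the receive.
# Restoring it makes A send again, so B counts the transfer twice.
inconsistent = Snapshot(a=10, b=3)
print(resume(inconsistent))  # (7, 6)
```

The fourth combination, A before the send paired with B after the receive, is the one a snapshot protocol has to rule out: it records a message as received that was never sent, and restoring it breaks the total.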