rystsov|2 years ago
Raft & Paxos: any number of nodes may be down; as long as a majority is available, the replicated system is available and doesn't lie.
Kafka as described in the post (*): any number of nodes may be down and at most one power outage (loss of unsynced data) is allowed; as long as a majority is available, the replicated system is available and doesn't lie.
The counter-example simulates a single power outage.
(*) https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-...
judofyr|2 years ago
https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-... talks about the case of a single failure and it shows how (a) Raft without fsync() loses ACK-ed messages and (b) Kafka without fsync() handles it fine.
This post, on the other hand, talks about a case where we have (a) one node being network partitioned, (b) the leader crashing, losing data, and coming back up again, all while (c) ZooKeeper doesn't catch that the leader crashed and elects another leader.
I definitely think the title/blurb should be updated to clarify that this only applies in the "exceptional" case of >f failures.
I mean, the following paragraph seems completely misleading:
> Even the loss of power on a single node, resulting in local data loss of unsynchronized data, can lead to silent global data loss in a replicated system that does not use fsync, regardless of the replication protocol in use.
The next section (and the Kafka example) is talking about loss of power on a single node combined with another node being isolated. That's very different from just "loss of power on a single node".
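To make the difference concrete, here is a toy sketch of the combined scenario as I read it. It is not Kafka's or Raft's actual protocol: the `Node` class, the `run` function, and the rule that a follower truncates its log to match the leader's are all made-up simplifications. It assumes node C is isolated, the write is acknowledged once A (the leader) and B hold it, A then loses power and restarts while still being treated as leader, and B resynchronises with it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    synced: list = field(default_factory=list)    # records flushed to disk
    unsynced: list = field(default_factory=list)  # records only in the page cache

    def log(self):
        return self.synced + self.unsynced

    def append(self, record, fsync):
        (self.synced if fsync else self.unsynced).append(record)

    def power_loss(self):
        # A power outage drops everything that was never fsynced.
        self.unsynced.clear()

    def truncate_to(self, leader_log):
        # Simplification: the follower adopts the leader's log wholesale.
        self.synced = list(leader_log)
        self.unsynced = []


def run(fsync):
    a, b, c = Node("A"), Node("B"), Node("C")
    leader, reachable_followers = a, [b]  # C is cut off by a network partition

    # The write is acknowledged once the leader and every reachable follower hold it.
    for node in [leader, *reachable_followers]:
        node.append("m1", fsync)
    acked = ["m1"]

    # Single power outage on the leader; assumed to restart while still leader.
    leader.power_loss()

    # The follower resynchronises with the leader it still believes in.
    for node in reachable_followers:
        node.truncate_to(leader.log())

    survivors = [r for r in acked if any(r in n.log() for n in (a, b, c))]
    return acked, survivors


if __name__ == "__main__":
    print("no fsync:", run(fsync=False))
    print("fsync:   ", run(fsync=True))
```

In this toy model the acknowledged record `m1` survives with fsync and is gone from every node without it, but only because the isolated node and the unnoticed leader restart are both in play, which is the distinction being drawn above.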
rystsov|2 years ago
When we combine network partitioning with the loss of a local data suffix on a single node, it leads either to a consistency violation or to the system being unavailable despite the majority of the nodes being up. At the moment Kafka chooses availability over consistency.
Also, I read the Kafka source, and the role of network partitioning doesn't seem to be crucial. I suspect that it's also possible to cause a similar problem with a single-node power outage and unfortunate timing: https://twitter.com/rystsov/status/1641166637356417027
comet-engine|2 years ago
slt2021|2 years ago
frant-hartm|2 years ago