fizwhiz | 2 years ago
So their own homegrown leader election algorithm?
> BlazingMQ’s leader election and state machine replication differ from those of Raft in one way: in Raft's leader election, only the node having the most up-to-date log can become the leader. If a follower receives an election proposal from a node with a stale view of the log, it will not support it. This ensures that the elected leader has up-to-date messages in the replicated stream, and simply needs to sync up any followers which are not up to date. A good thing about this choice is that messages always flow from leader to follower nodes.
> BlazingMQ’s elector implementation relaxes this requirement. Any node in the cluster can become a leader, irrespective of its position in the log. This adds additional complexity in that a new leader needs to synchronize its state with the followers and that a follower node may need to send messages to the new leader if the latter is not up to date. However, this deviation from Raft and the custom synchronization protocol comes in handy because it allows BlazingMQ to avoid flushing (fsync) every message to disk. Readers familiar with Apache Kafka internals will see similarities between the two systems here.
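For anyone skimming, here is a rough sketch of the difference the quoted passage describes. It is not BlazingMQ's or any Raft library's actual code: `LogPosition`, `raft_grants_vote`, `relaxed_grants_vote`, and `sync_direction` are hypothetical names, the Raft check covers only the "candidate's log is at least as up to date" comparison (term bookkeeping and one-vote-per-term are left out), and comparing `(last_term, last_index)` pairs is a simplification that ignores diverging logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPosition:
    """Hypothetical summary of where a node's replicated log ends."""
    last_term: int   # term of the node's last log entry
    last_index: int  # index of the node's last log entry


def raft_grants_vote(voter: LogPosition, candidate: LogPosition) -> bool:
    """Raft-style check: support the candidate only if its log is at least
    as up to date as the voter's (compare last term, then last index)."""
    return (candidate.last_term, candidate.last_index) >= (
        voter.last_term,
        voter.last_index,
    )


def relaxed_grants_vote(voter: LogPosition, candidate: LogPosition) -> bool:
    """Relaxed check as the quote describes it: log position does not
    disqualify a candidate. (Assumption: other election rules still apply.)"""
    return True


def sync_direction(leader: LogPosition, follower: LogPosition) -> str:
    """Which way messages must flow after an election to reconcile logs."""
    lead = (leader.last_term, leader.last_index)
    follow = (follower.last_term, follower.last_index)
    if lead > follow:
        return "leader -> follower"
    if lead < follow:
        return "follower -> leader"
    return "in sync"
```

Under the Raft-style check, `sync_direction` can only ever return "leader -> follower" or "in sync" for an elected leader. Under the relaxed check, "follower -> leader" becomes possible, which is the extra synchronization step the quote mentions.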
"a new leader needs to synchronize its state with the followers and that a follower node may need to send messages to the new leader if the latter is not up to date". I thought a hallmark of HA systems was fast failover? If I come to your house and knock on the door, but it takes you 10mins to get off the couch to open the door, it's perfectly acceptable for me to claim you were "unavailable". Pedants will argue the opposite.
anentropic | 2 years ago
how is this possibly a selling point?
fizwhiz | 2 years ago
> Just like BlazingMQ’s other subsystems, its leader election implementation (and general replicated state machinery) is tested with unit and integration tests. In addition, we periodically run chaos testing on BlazingMQ using our Jepsen chaos testing suite, which we will be publishing soon as open source. We have also tested our implementation with a TLA+ specification for BlazingMQ’s elector state machine.
scottlamb | 2 years ago
> how is this possibly a selling point?
Context.
In an overview doc? The informal version is accessible to more audiences.
As the canonical design doc for the system? It's certainly not.
KRAKRISMOTT | 2 years ago