nano_o | 6 years ago
A couple of nitpicks: it would be nice to see what happens when the leader fails. Optimizing for the case of a stable leader might have an impact on recovery time.
Another important aspect of fault tolerance is whether you can really survive any minority of nodes crashing. For example, if only the strictly necessary number of nodes keep up with the leader and most of those crash, the system will have a really hard time recovering: the slow nodes have accumulated a backlog and must catch up before the system can continue operating.
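To make the backlog argument concrete, here is a rough back-of-the-envelope sketch. It is not taken from the paper: the function name `recovery_stall_seconds`, the node names, the log lengths, and the catch-up rate are all made-up assumptions, and it assumes a majority-quorum protocol in which surviving nodes catch up in parallel at the same rate.

```python
def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def recovery_stall_seconds(log_lengths, crashed, catch_up_rate):
    """Estimate how long commits stall after a crash (hypothetical model).

    log_lengths:   dict mapping node name -> entries already replicated
    crashed:       set of node names that have failed
    catch_up_rate: entries per second a lagging node can absorb (assumed
                   identical for all nodes, with catch-up done in parallel)

    Returns the time until a majority of the full cluster holds the longest
    surviving log, or None if no majority is alive.
    """
    quorum = majority(len(log_lengths))
    survivors = {n: l for n, l in log_lengths.items() if n not in crashed}
    if len(survivors) < quorum:
        return None  # more than a minority crashed; cannot make progress

    target = max(survivors.values())
    # Backlog of each survivor, smallest first. Progress resumes once the
    # quorum-th most up-to-date survivor has closed its gap.
    gaps = sorted(target - l for l in survivors.values())
    return gaps[quorum - 1] / catch_up_rate


# Five nodes: the leader and two followers keep up, two followers lag.
logs = {"leader": 1_000_000, "f1": 1_000_000, "f2": 1_000_000,
        "s1": 400_000, "s2": 400_000}

print(recovery_stall_seconds(logs, {"s1", "s2"}, 50_000))  # slow nodes crash
print(recovery_stall_seconds(logs, {"f1", "f2"}, 50_000))  # fast followers crash
```

In both cases only a minority (two of five) crashes, but under these assumed numbers losing the two slow nodes costs nothing while losing the two fast followers stalls commits until a 600,000-entry backlog is replayed. A benchmark that only measures the stable-leader case would not show that difference.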
A performance number that does not take those things into account may not be very realistic. Nevertheless the idea is pretty good.
tptacek|6 years ago
nano_o|6 years ago
My point is that it would be nice to benchmark protocols that take into account the issues I brought up, and measure what happens in the worst failure scenarios they are supposed to tolerate. Otherwise we get a false sense of what performance can be achieved if one really cares about fault-tolerance.
This small issue does not diminish the main contribution of the paper in any way.