top | item 8056403

(no title)

leif | 11 years ago

We are working on several ways to prove to ourselves and the community that what we have is correct. The most important is obviously testing. We have tests that demonstrate the problems we found, and Ark passes those tests. Publishing this explanation and soliciting feedback is another.

We have run Jepsen and have not been able to get it to show data loss in TokuMX. The problems it found in MongoDB were already fixed in other ways in earlier versions of TokuMX, but we're trying to get Jepsen to demonstrate the other problems we've found.

Model checking may be another way we can prove correctness, but since Ark is so similar to Raft, I think the Raft model in TLA+[1] is probably sufficient. Anyway, we'd also need a proof that the model is equivalent to the implementation, and I don't know of a way to do that, so I think functional tests are more important.

In any case, we'll look in to using a model checker, and any help would be greatly appreciated. If you're interested, feel free to email me.

[1]: https://ramcloud.stanford.edu/~ongaro/raft.tla

discuss

order

sseveran|11 years ago

If you don't know how to use a model checker you have no business creating a distributed algorithm. Either implement someone else's or do it right. The world has enough broken distributed algorithms and we really don't need any more. Either you are using Raft and you can depend on its proofs and just do the work to ensure that the model is implemented correctly or you are using something different and you need to write the proofs and models.

The casual attitude that most people show when building these algorithms mind blowing.