top | item 23036004


scruffups | 5 years ago

The post you're replying to does list compensating transactions (a form of rollback).

One gotcha that is not covered by Sagas (I could be wrong) is when one or more of the network paths involved in the distributed tx become unreachable (a network partition event) and you have no idea of the state of that part of the tx. Do you re-try that part and risk sending the same instruction twice (ok in some cases but not all) vs risk of having sent no instruction? If I had to implement a distributed tx, I would first verify my mental model using TLA+, use a (persistent) transactional messaging system with at-least-once delivery as the backbone, and make other accommodations for such scenarios.


mindcrime|5 years ago

> Do you re-try that part and risk sending the same instruction twice (ok in some cases but not all) vs risk of having sent no instruction?

If you can make your compensating action idempotent, then yes, you can just keep retrying it. If it can't be made so for whatever reason, then a failure at that point demands manual intervention.
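To make that concrete, here is a minimal sketch of an idempotent compensating action that is safe to retry. Everything in it is hypothetical and only for illustration (`RefundService`, `compensate_with_retry`, the idempotency key format); it is not taken from any particular Saga framework. The service remembers which keys it has already applied, so a retry after a lost reply does not refund twice. If the retries run out, the code gives up and signals that a human has to look at it.

```python
class RefundService:
    """Hypothetical downstream service; applies each compensation at most once."""

    def __init__(self, fail_first_n=0):
        self.applied = {}                 # idempotency key -> refunded amount
        self.balance = 0
        self._failures_left = fail_first_n

    def refund(self, key, amount):
        # Apply the effect only the first time this key is seen.
        if key not in self.applied:
            self.applied[key] = amount
            self.balance += amount
        # Simulate a lost reply: the effect happened, but the caller cannot tell.
        if self._failures_left > 0:
            self._failures_left -= 1
            raise TimeoutError("reply lost")
        return "ok"


def compensate_with_retry(service, key, amount, max_attempts=5):
    """Retry the compensation with the same key; return the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        try:
            service.refund(key, amount)
            return attempt
        except TimeoutError:
            continue  # safe to retry because refund() is idempotent per key
    raise RuntimeError("compensation still failing; needs manual intervention")


if __name__ == "__main__":
    service = RefundService(fail_first_n=2)
    attempts = compensate_with_retry(service, "saga-42:refund", 100)
    print(attempts, service.balance)
```

The important part is that the caller reuses the same key on every attempt. A fresh key per retry would defeat the deduplication.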

scruffups|5 years ago

I suppose redundant communication channels (that go over different network modalities, e.g., data center native, satellite, 5G, etc.) can be used to recover from a network partition. Still, having a protocol with an at-least-once delivery guarantee is important, as it ensures that no messages are lost due to an unexpected crash of the sender/caller or receiver/callee.
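A rough sketch of what I mean by at-least-once delivery, with made-up names (`Outbox`, `Receiver`, `flaky_channel`) and a plain dict standing in for persistent storage: the sender keeps a message until it sees an ack, and the receiver drops duplicates by message ID. A real system would put the outbox and the seen-set on durable storage so both survive a crash; this only shows the shape of the protocol.

```python
class Outbox:
    """Hypothetical sender-side outbox; a dict stands in for durable storage."""

    def __init__(self):
        self.pending = {}  # message id -> payload, removed only after an ack

    def enqueue(self, msg_id, payload):
        self.pending[msg_id] = payload

    def flush(self, send):
        # Try every pending message once; keep the ones that were not acked.
        for msg_id, payload in list(self.pending.items()):
            if send(msg_id, payload):
                del self.pending[msg_id]


class Receiver:
    """Hypothetical receiver that deduplicates by message id."""

    def __init__(self):
        self.seen = set()
        self.handled = []

    def deliver(self, msg_id, payload):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.handled.append(payload)
        return True  # ack, including for duplicates


def flaky_channel(receiver, drop_acks):
    """Channel that delivers every message but loses the first `drop_acks` acks."""
    remaining = {"n": drop_acks}

    def send(msg_id, payload):
        acked = receiver.deliver(msg_id, payload)
        if remaining["n"] > 0:
            remaining["n"] -= 1
            return False  # the sender never sees this ack
        return acked

    return send


if __name__ == "__main__":
    outbox, receiver = Outbox(), Receiver()
    outbox.enqueue("m1", "debit")
    outbox.enqueue("m2", "credit")
    send = flaky_channel(receiver, drop_acks=1)
    outbox.flush(send)   # m1 is delivered but its ack is lost
    outbox.flush(send)   # m1 is sent again; the receiver ignores the duplicate
    print(receiver.handled, outbox.pending)
```

The lost ack is exactly the "no idea of the state" case: the sender cannot distinguish it from a lost message, so it resends and relies on the receiver to deduplicate.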