scruffups | 5 years ago
One gotcha that Sagas don't cover (I could be wrong) is when one or more of the network paths involved in the distributed tx become unreachable (a network partition) and you have no idea what state that part of the tx is in. Do you retry that part and risk sending the same instruction twice (OK in some cases but not all), or do you hold off and risk never having sent the instruction at all? If I had to implement a distributed tx, I would first verify my mental model using TLA+, use a (persistent) transactional messaging system with at-least-once delivery as the backbone, and make other accommodations for such scenarios.
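To make the "same instruction twice" risk concrete, here is a minimal sketch of how a receiver might tolerate at-least-once delivery by deduplicating on a message ID. Everything in it (the `Receiver` class, the `message_id` scheme, the in-memory `seen` set) is made up for illustration; a real system would have to persist the seen IDs in the same transaction as the state change, otherwise a crash between the two brings the problem right back.

```python
class Receiver:
    """Hypothetical receiver that applies each instruction at most once."""

    def __init__(self):
        self.seen = set()   # IDs of instructions already applied (in-memory for the sketch)
        self.balance = 0

    def handle(self, message_id, amount):
        # A redelivered message carries the same ID, so it is skipped.
        if message_id in self.seen:
            return False
        self.balance += amount
        self.seen.add(message_id)
        return True


receiver = Receiver()
first = receiver.handle("debit-42", -10)
# The sender never saw the ack (partition), so it sends the same instruction again.
second = receiver.handle("debit-42", -10)
```

With this in place the sender can retry blindly after a partition: the duplicate is acknowledged but has no further effect.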
mindcrime | 5 years ago
If you can make your compensating action idempotent, then yes, you can just keep retrying it. If it can't be made so for whatever reason, then a failure at that point demands manual intervention.
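A rough sketch of what that could look like, assuming the compensating action is idempotent and that failures surface as `ConnectionError`. The names `run_compensation`, `ManualInterventionRequired`, and `refund_order_17` are hypothetical, and the bounded retry budget is an assumption of the sketch (nothing above says when to stop retrying); once the budget runs out, it hands the problem to a human.

```python
import time


class ManualInterventionRequired(Exception):
    """Raised when retries are exhausted and a human has to look at it."""


def run_compensation(action, max_attempts=5, base_delay=0.0):
    # Safe only because `action` is assumed idempotent: repeating it
    # after a lost acknowledgement must not change the outcome.
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except ConnectionError:
            if attempt < max_attempts:
                # Exponential backoff between attempts.
                time.sleep(base_delay * 2 ** (attempt - 1))
    raise ManualInterventionRequired(f"gave up after {max_attempts} attempts")


refunded = set()        # a set makes the effect idempotent
calls = {"n": 0}


def refund_order_17():
    calls["n"] += 1
    refunded.add("order-17")            # the effect lands every time...
    if calls["n"] < 3:
        raise ConnectionError("ack lost")  # ...but the first two acks are lost
    return "refunded"


result = run_compensation(refund_order_17)
```

The refund is "applied" three times here but the end state is the same as applying it once, which is exactly what makes blind retrying acceptable.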
scruffups | 5 years ago