top | item 36130894

(no title)

lorendsr | 2 years ago

At some point, if you can't automatically fix something, you have to stop and report to a human for manual intervention/repair. While a saga doesn't guarantee that you avoid manual repair, it significantly reduces the need for it. If each of these has a 1% chance of non-retryable failure:

Step1

Step2

Step1Undo

then this has a 1% chance of needing manual repair (it's okay if step1 fails, but if step1 succeeds and step2 fails, we need to repair):

do Step1

do Step2

and this has a .01% chance (we only repair if Step2 and Step1Undo fails, 1% * 1%):

do Step1

try {

  do Step2
} catch {

  do Step1Undo

}

discuss

order

nivertech|2 years ago

There is also the case when Step1 was successfull, but the Saga Orchestrator (or Saga participant in case of Choreography) for some reason (like communication error) doesn't know about it.

In case Step1's service doesn't expose an API to poll its status, then the only recourse is to execute it again (with the same input key, assuming it's idempotent ;)