top | item 46060405


parados | 3 months ago

True, and I agree, but from their report they do seem to be doing Root Cause Analysis (RCA) even if they don't call it that.

RCA is a really bad way of investigating a failure. Simply put: if you show me your RCA, I know exactly where you couldn't be bothered to look any further.

I think most software engineers using RCA confuse the "cause" ("Why did this happen") with the solution ("We have changed this line of code and it's fixed"). These are quite different problem domains.

Using RCA to determine "Why did this happen" is only useful for explaining the last stages of an accident. It focuses on cause->effect relationships and tells a relatively simple story, but one that is easy to communicate (hi there, managers and media!). But RCA only encourages simple countermeasures, which will probably be ineffective and easily outrun by the complexity of real systems.

However, one thing RCA is really good at is allocating blame. If your organisation is using RCA then, whatever you pretend, your organisation has a culture of blame. With a blame culture (rather than a reporting culture) your organisation is much more likely to fail again. You will lack operational resilience.
