Everywhere "defense in depth" is used (not only in computing, but also in e.g. aviation), there can't be one single cause for a disaster: every one of the layers has to fail for a disaster to happen.
Almost every accident has more than one cause, even where there is no defense in depth. Take a car accident: person A was on the phone and person B didn't spot A's deviation in time to do anything about it. A is the root cause. If B had reacted faster there wouldn't have been an accident, but there would still be cause for concern and there would still be a culprit. Such near misses and saves by others work like defense in depth in effect, even if it wasn't engineered in. Person B isn't liable, even though their lack of attention to what was going on is a contributory factor. So root causes matter: they are the first and clearest thing to fix. Other layers may be impacted and may require work, but that isn't always the case.
In software the root cause is often a very simple one: an assumption didn't hold.
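To make the "assumption didn't hold" point concrete, here is a minimal sketch of turning a hidden assumption into an explicit check. The function name `average_latency_ms` and the scenario are made up for illustration, not taken from any real incident.

```python
def average_latency_ms(samples):
    """Return the mean of a list of latency samples in milliseconds.

    Hypothetical example: the naive version silently assumes that
    `samples` is never empty and fails with ZeroDivisionError when
    that assumption stops holding.
    """
    # State the assumption explicitly, so a violation shows up at the
    # point where it breaks rather than somewhere downstream.
    if not samples:
        raise ValueError("assumption violated: expected at least one latency sample")
    return sum(samples) / len(samples)
```

With the check in place, the failure names the broken assumption directly, which tends to shorten the path from symptom to root cause.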
I think if the people you're working with insist on narrowing it down to a single Root Cause, they're missing the entire point of the exercise. I work with large drones day to day, and when we do an accident investigation we're always looking for root causes, but there are almost always several. I don't think we've ever had a post-accident RCA investigation that resulted in only one corrective action. Several times we have narrowed it down to a single software bug, but for a software bug to cause a crash, a number of other factors always have to align (e.g. the pilot was unfamiliar with the recovery procedure, multiple cascaded failures, etc.).
Yes, that's true in the general sense. But root causes are interesting because they can lead to insights that help the lowest levels of engineering become more robust. At a higher level it is all about systems and the way parts of those systems interact: fault tolerance (massively important) and ensuring faults do not propagate beyond the systems they originate in. Propagation is what can turn a small problem into a huge disaster. And without knowing the root cause you won't be able to track those proximate causes and do something about them. So RCA is a process, not a way to identify the single culprit. This is more about the interpretation of the term RCA than about what RCA really does.
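As a rough illustration of keeping a fault inside the system it originates in, here is a small sketch. Everything in it (`contained`, `fetch_recommendations`, `render_page`) is a hypothetical name invented for the example, and a real system would need something more careful than a blanket `except Exception`.

```python
import functools
import logging

logger = logging.getLogger("containment_sketch")


def contained(fallback):
    """Decorator: log a subsystem failure and return `fallback`
    instead of letting the exception propagate to the caller."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Record the fault where it happened so it can still be
                # traced afterwards, then degrade instead of crashing.
                logger.exception("fault contained in %s", func.__name__)
                return fallback
        return inner
    return wrap


@contained(fallback=[])
def fetch_recommendations(user_id):
    # Hypothetical subsystem that is currently failing.
    raise TimeoutError("recommendation service did not answer")


def render_page(user_id):
    # The page still renders, just without recommendations.
    recs = fetch_recommendations(user_id)
    return {"user": user_id, "recommendations": recs}
```

Here `render_page(42)` returns a page with an empty recommendation list rather than failing outright. The log entry is what keeps the contained fault visible for a later RCA; swallowing it silently would hide the root cause.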
rob74|2 years ago
jacquesm|2 years ago
tonyarkles|2 years ago
jacquesm|2 years ago
mytailorisrich|2 years ago
NikolaNovak|2 years ago