swordsmith8 | 3 years ago

At Monte Carlo, we did some work on root cause analysis (RCA) for data failures such as ETL job failures, timeouts, and data delays. I think there's a lot that can be done from a data science perspective to automate RCA, or at least to provide better insight into data pipeline problems.

We put together this blog post showing how an orchestration DAG (like a dbt schedule DAG) can be converted into a Bayesian network (BN). You can then ask causal attribution questions in the form of conditional probability queries against the BN. The idea is still pretty basic and preliminary, but I think it could be extended in all sorts of interesting ways, e.g., attributing bad row-level data to upstream transformations.
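For readers who want something concrete, here is a minimal sketch of the idea, not the method from the post. It assumes a hypothetical five-model DAG, made-up failure priors, and a noisy-OR conditional probability for each node, which is one common way to encode "a failed upstream model tends to break its children". Inference is brute-force enumeration over all joint states, which only works for small graphs; a library such as pgmpy would be the usual choice for a real DAG.

```python
from itertools import product

# Hypothetical orchestration DAG: each model maps to its upstream parents.
DAG = {
    "raw_orders": [],
    "raw_customers": [],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "orders_mart": ["stg_orders", "stg_customers"],
}

# Assumed prior that each node fails on its own (timeout, bad deploy, ...).
P_SELF = {
    "raw_orders": 0.05,
    "raw_customers": 0.02,
    "stg_orders": 0.01,
    "stg_customers": 0.01,
    "orders_mart": 0.01,
}

# Assumed probability that a failed parent makes its child fail (noisy-OR link).
P_PROPAGATE = 0.9


def p_fail(node, state):
    """Noisy-OR CPD: P(node fails | states of its parents)."""
    p_ok = 1.0 - P_SELF[node]
    for parent in DAG[node]:
        if state[parent]:
            p_ok *= 1.0 - P_PROPAGATE
    return 1.0 - p_ok


def joint(state):
    """P(full assignment) as the product of every node's CPD."""
    prob = 1.0
    for node in DAG:
        p = p_fail(node, state)
        prob *= p if state[node] else 1.0 - p
    return prob


def posterior(query, evidence):
    """P(query failed | evidence) by enumerating all 2^n joint states."""
    nodes = list(DAG)
    num = den = 0.0
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        # Skip assignments that contradict the observed evidence.
        if any(state[n] != v for n, v in evidence.items()):
            continue
        p = joint(state)
        den += p
        if state[query]:
            num += p
    return num / den


if __name__ == "__main__":
    # Attribution query: orders_mart failed, which upstream model is to blame?
    evidence = {"orders_mart": True}
    for node in DAG:
        if node not in evidence:
            print(node, round(posterior(node, evidence), 3))
```

Adding more observations sharpens the attribution. For example, also observing `{"stg_customers": False}` shifts blame toward the `raw_orders` branch, because the customers branch no longer explains the downstream failure.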

Would be interested to hear what people think.
