How do you monitor data pipelines?
2 points | garymlin | 5 years ago
Things that would be important to catch:
* A table that should be getting new data every day is no longer receiving data
* A table’s schema changes
* A column that should be unique is no longer unique
* A column that shouldn’t have nulls has nulls
* A numerical column has values that go beyond the expected range
* The distribution of categorical values drifts past some threshold (e.g., more than 80% “no” values in a column)
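To make the list above concrete, here is a rough sketch of those checks in plain Python. Everything specific in it is an assumption for illustration: the function name `check_table`, the columns `id`, `amount`, `answer`, and `loaded_at`, the 0 to 10,000 range, and the one-day freshness window are all hypothetical, and a real pipeline would more likely run equivalent queries against the warehouse than load rows into memory.

```python
from collections import Counter
from datetime import datetime, timedelta


def check_table(rows, expected_columns, now,
                max_age=timedelta(days=1),
                amount_range=(0, 10_000),
                max_no_share=0.8):
    """Return a list of problems found in `rows` (a list of dicts).

    All column names and thresholds here are hypothetical examples.
    """
    problems = []

    # Freshness: the newest row must be more recent than `max_age`.
    timestamps = [r["loaded_at"] for r in rows if r.get("loaded_at") is not None]
    if not timestamps or now - max(timestamps) > max_age:
        problems.append("stale: no rows loaded within max_age")

    # Schema: every row must have exactly the expected columns.
    if any(set(r) != set(expected_columns) for r in rows):
        problems.append("schema: columns differ from expected")

    # Uniqueness: `id` must not repeat.
    ids = [r.get("id") for r in rows]
    if len(ids) != len(set(ids)):
        problems.append("unique: duplicate values in id")

    # Nulls: `amount` must always be present.
    if any(r.get("amount") is None for r in rows):
        problems.append("nulls: amount contains nulls")

    # Range: non-null `amount` values must fall inside `amount_range`.
    low, high = amount_range
    if any(r.get("amount") is not None and not (low <= r["amount"] <= high)
           for r in rows):
        problems.append("range: amount outside expected range")

    # Distribution: share of "no" in `answer` must not exceed `max_no_share`.
    counts = Counter(r.get("answer") for r in rows)
    if rows and counts["no"] / len(rows) > max_no_share:
        problems.append("distribution: too many 'no' values in answer")

    return problems
```

Each check appends a short message rather than raising, so a single run reports every failing condition at once and the result can be sent to whatever alerting is already in place.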
Also, are there other obvious things that would be important to catch?