
whytai | 2 years ago

For testing:

- we have a dedicated dev environment that gives analysts a dev/test loop. Unfortunately, none of the pipelines can be run locally.

- we have CI jobs and unit tests that are run on all pipelines
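
To give a flavor of the unit-test side: the tests our CI runs are mostly plain Python tests over pipeline transform logic. The transform and its contract below are hypothetical examples I made up for illustration, not our actual pipelines:

```python
# Hypothetical pipeline transform: keep the latest record per event id.
def dedupe_events(events):
    latest = {}
    for e in sorted(events, key=lambda e: e["ts"]):
        latest[e["id"]] = e  # later timestamps overwrite earlier ones
    return list(latest.values())

def test_dedupe_keeps_latest():
    events = [
        {"id": 1, "ts": 1, "v": "old"},
        {"id": 1, "ts": 2, "v": "new"},
        {"id": 2, "ts": 1, "v": "only"},
    ]
    out = dedupe_events(events)
    assert len(out) == 2
    assert {e["v"] for e in out} == {"new", "only"}

test_dedupe_keeps_latest()
```

The point is that the transform logic is factored out of the DAG definition so it can be tested without running the pipeline.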

Observability:

- we have data quality checks for each dataset, organized by tier. This also integrates with our alerting system to send pagers when data quality dips.

- Airflow and our query engines hive/spark/presto each integrate with our in-house lineage service. We have a lineage graph that shows which pipelines produce/consume which assets but it doesn't work at the column level because our internal version of Hive doesn't support that.

- we have a service that essentially surfaces observability metrics for pipelines in a nice UI

- our Airflow is integrated with PagerDuty to send pagers to owning teams when pipelines fail.
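
As a rough sketch of how the tiered data-quality checks feed alerting (the rule names and the tier0-pages policy here are illustrative, not our in-house system):

```python
# Illustrative tiered data-quality rules; the dataset name and thresholds
# are made up for this sketch.
RULES = {
    "events_daily": {"tier": "tier0", "min_rows": 1000, "max_null_frac": 0.01},
}

def check_dataset(name, rows, null_frac):
    rule = RULES[name]
    failures = []
    if rows < rule["min_rows"]:
        failures.append(f"row count {rows} < {rule['min_rows']}")
    if null_frac > rule["max_null_frac"]:
        failures.append(f"null fraction {null_frac} > {rule['max_null_frac']}")
    if not failures:
        return "ok", []
    # Tier decides severity: tier0 failures page the owning team,
    # lower tiers just log/ticket.
    return ("page" if rule["tier"] == "tier0" else "log"), failures
```

So a tier0 dataset that comes in short on rows turns into a page, while the same failure on a low-tier dataset just gets logged.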

We'd like to do more, but nobody has really put in the work to make a good static analysis system for Airflow/Python. Couple that with the lack of support for column-level lineage OOTB and it's easy to get into a mess. For large migrations (Airflow/infra/Python/dependency changes) we still end up doing ad hoc analysis to make sure things go right, and we often miss important things.
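
As a toy example of the kind of static analysis I mean: walking each pipeline file's AST to report which modules it imports, so a dependency migration can at least enumerate every affected DAG mechanically instead of by grep. This is just a sketch of the idea, not anything we've built:

```python
import ast

def imported_modules(source: str) -> set[str]:
    """Return every module name imported by a Python source file."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module)
    return mods

# e.g. run over a (made-up) DAG file's source:
dag_src = "from airflow.operators.bash import BashOperator\nimport pendulum\n"
affected = imported_modules(dag_src)
```

A real version would also need to resolve dynamic imports and operator arguments, which is where it gets hard and why nobody has done it properly.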

Happy to talk more about this if you're interested.
