Ask HN: How to Effectively Test ETL?

3 points | poplarstand | 5 years ago

Hey all,

Some time ago I published a Kaggle dataset as part of a passion project. It was generated by a fairly involved handwritten Python pipeline that I hacked together while learning Pandas. The dataset became far more popular than I expected, and now I'm anxious about possible bugs.

I'd like to solicit your advice on building out a decent test suite. How would you go about validating data? Are there any books on best practices that you'd recommend?
