Ask HN: How to Effectively Test ETL?
3 points | poplarstand | 5 years ago
Some time ago I published a Kaggle dataset as part of a passion project. It was generated by a fairly involved handwritten Python pipeline that I hacked together while learning Pandas. The dataset became far more popular than I expected, and now I'm anxious about possible bugs.
I'd like to solicit your advice on building out a decent test suite. How would you go about validating data? Are there any books on best practices that you'd recommend?
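To make the question concrete, here is a minimal sketch of the kind of invariant check I have in mind. Everything specific in it is an assumption for illustration: the function name `find_problems`, the column names (`id`, `date`, `value`), and the rule that values must be non-negative are hypothetical and not my real schema. It works on plain dict rows, which I believe `DataFrame.to_dict("records")` would give me.

```python
def _is_missing(v):
    # None or NaN (NaN is the only common value that is not equal to itself).
    return v is None or v != v


def find_problems(rows, required=("id", "date", "value")):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `rows` is an iterable of dicts. The column names and rules here are
    hypothetical placeholders, not the real dataset's schema.
    """
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Completeness: every required field must be present and non-null.
        missing = [k for k in required if _is_missing(row.get(k))]
        if missing:
            problems.append(f"row {i}: missing {missing}")
            continue  # the remaining checks need these fields

        # Uniqueness: the key column should not repeat.
        if row["id"] in seen_ids:
            problems.append(f"row {i}: duplicate id {row['id']}")
        seen_ids.add(row["id"])

        # Range: an assumed domain rule that values are non-negative.
        if row["value"] < 0:
            problems.append(f"row {i}: negative value {row['value']}")
    return problems


if __name__ == "__main__":
    sample = [
        {"id": 1, "date": "2020-01-01", "value": 1.0},
        {"id": 1, "date": "2020-01-02", "value": -2.0},
        {"id": 2, "date": None, "value": 3.0},
    ]
    for problem in find_problems(sample):
        print(problem)
```

Collecting every problem instead of failing on the first one seemed more useful for auditing an already published dataset, but I don't know whether that is how people usually structure this.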