I've taken a stab at a solution via https://github.com/data-catering/data-caterer. It focuses on making integration tests easier by generating data across batch and real-time data sources, whilst maintaining any relationships across the datasets. You can have it automatically pick up the schema definition from your database's metadata and generate matching data. Once your app/job/data consumer(s) have used the data, you can run data validations to check everything ran as expected, then clean up the generated data at the end (including data pushed to downstream data sources) if you're running in a shared test environment or locally. All of this runs within 60 seconds. It also gives you the option of running other kinds of tests, such as load/performance/stress testing, by generating larger volumes of data.
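Data Caterer has its own configuration API, so purely as an illustration of the schema-driven generate → validate → clean-up loop described above, here's a toy Python sketch against SQLite (all function names here are hypothetical, not Data Caterer's API):

```python
import sqlite3
import random
import string

def column_generators(conn, table):
    """Read the table's schema from database metadata and map each
    column's declared type to a simple value generator (toy examples)."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    gens = {}
    for _, name, ctype, *_ in cols:
        t = ctype.upper()
        if "INT" in t:
            gens[name] = lambda: random.randint(1, 1000)
        elif "REAL" in t or "FLOA" in t or "DOUB" in t:
            gens[name] = lambda: round(random.uniform(0, 100), 2)
        else:
            gens[name] = lambda: "".join(random.choices(string.ascii_lowercase, k=8))
    return gens

def generate_rows(conn, table, n):
    """Generate and insert n rows based on the introspected schema."""
    gens = column_generators(conn, table)
    cols = list(gens)
    placeholders = ",".join("?" for _ in cols)
    rows = [tuple(gens[c]() for c in cols) for _ in range(n)]
    conn.executemany(
        f"INSERT INTO {table} ({','.join(cols)}) VALUES ({placeholders})", rows
    )
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance REAL, owner TEXT)")
generate_rows(conn, "accounts", 10)

# validation step: check the data is in the expected state after the
# consumer (here, nothing) has processed it
count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
assert count == 10

# clean-up step: remove the generated data so a shared environment stays clean
conn.execute("DELETE FROM accounts")
```

A real run would point the same loop at your actual source/sink connections and schema registry rather than an in-memory table.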