jm1271 | 3 years ago

Thanks for this post! Naive question: why not "just use Great Expectations"? At first blush GE seems like it has a lot of what you need out of the box: checks definable in YAML, extensibility, and connectors to many major data sources.

Was there something you all found lacking there which made "roll your own" the right approach here?

jabagonuts|3 years ago

As a software engineer new to the data space, I am baffled by why people recommend great_expectations. It has a lot of questionable dependencies that inflate image sizes and lead to conflicts at scale. It is also a very ambitious project that fails to deliver on many fronts, including documentation and basic data quality checks. Writing your own checks is far too complex: there are a lot of very abstract concepts you have to understand before you can write a single line of code. If you think I’m wrong, stop now and go look at some of their code examples. You’re better off using Python’s built-in unittest to run a query and then make assertions on the result as a task in your DAG.
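
For anyone wondering what the unittest approach might look like, here is a rough sketch, not anyone's production setup. The `orders` table, its columns, and the `make_connection` and `run_checks` helpers are all made up for illustration, and an in-memory SQLite database stands in for whatever warehouse you actually query.

```python
import sqlite3
import unittest


def make_connection():
    # Hypothetical stand-in for a real warehouse connection:
    # an in-memory SQLite database seeded with a tiny "orders" table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 9.5), (2, 20.0), (3, 4.25)],
    )
    conn.commit()
    return conn


class OrdersQualityChecks(unittest.TestCase):
    def setUp(self):
        self.conn = make_connection()

    def tearDown(self):
        self.conn.close()

    def scalar(self, sql):
        # Run a query that returns a single value and unwrap it.
        return self.conn.execute(sql).fetchone()[0]

    def test_table_not_empty(self):
        self.assertGreater(self.scalar("SELECT COUNT(*) FROM orders"), 0)

    def test_no_null_ids(self):
        nulls = self.scalar("SELECT COUNT(*) FROM orders WHERE id IS NULL")
        self.assertEqual(nulls, 0)

    def test_ids_unique(self):
        dupes = self.scalar("SELECT COUNT(id) - COUNT(DISTINCT id) FROM orders")
        self.assertEqual(dupes, 0)

    def test_amounts_non_negative(self):
        bad = self.scalar("SELECT COUNT(*) FROM orders WHERE amount < 0")
        self.assertEqual(bad, 0)


def run_checks():
    # Load and run the checks programmatically so a scheduler task
    # can call this function and act on the boolean result.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrdersQualityChecks)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A DAG task could then call `run_checks()` and raise an exception when it returns `False`, so the run fails and downstream tasks do not consume bad data. How you wire that into your scheduler depends on the tool you use.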