I think using a data warehouse as your data lake or lake house is optimal. Even for data that isn't relational. Storage is so cheap now and is decoupled from compute costs for several providers that I don't even give it a thought. You get a fast, scalable SQL interface which is still nice and useful for non-relational data. Then all, or most, of the transformations needed for analysis can be pure SQL using a tool like DBT. In my experience, it greatly simplifies the entire pipeline.
CRConrad|4 years ago
I don't get it... Looks to me like DBT is a Python SQL wrapper / big library that among other things includes an SQL generator / something else like that -- but not "pure" SQL?
rpedela|4 years ago
You can give it truly pure SQL in both models and scripts, and mixing in Jinja if you need it for dynamic models. But I'd recommend at least using ref/source.