phoobahr | 2 years ago
With columnar data, DuckDB is so much faster at this.
For one of my projects I have what sounds like a dumb workflow:
- JSON API fetches get cached in sqlite3
- Parsing the JSON gets done with sqlite3 JSON operators (Fast! Fault tolerant! Handles NULLs nicely! Fast!!)
- Collating data later gets queried with duckdb
- Everything gets munged and aggregated into the shape I want and is persisted in parquet files
- When it's time to consume it, duckdb queries my various sources, does my (used to be expensive) group-bys on the fly, and spits out pandas data frames
- Lastly, those data frames are small-ish, tidy and flexible
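
As a rough sketch, the workflow above might look something like this in Python. Everything named here is hypothetical and not from the original project: the `api_cache` table, the `city` and `temp` fields, the example URLs, and the `readings.parquet` path. The sketch uses `json_extract` in place of the `->`/`->>` operators so it also runs on older SQLite builds, and it assumes the third-party `duckdb` and `pandas` packages are installed for the last step.

```python
import json
import sqlite3


def cache_responses(con, responses):
    """Store raw JSON payloads, keyed by URL, in a SQLite cache table."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS api_cache (url TEXT PRIMARY KEY, body TEXT)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO api_cache (url, body) VALUES (?, ?)",
        [(url, json.dumps(payload)) for url, payload in responses.items()],
    )


def parse_cached(con):
    """Pull fields out of the cached JSON with SQLite's JSON functions.

    json_extract yields NULL for a missing key, so incomplete payloads
    simply produce None in the result instead of raising.
    """
    return con.execute(
        """
        SELECT json_extract(body, '$.city') AS city,
               json_extract(body, '$.temp') AS temp
        FROM api_cache
        ORDER BY url
        """
    ).fetchall()


def aggregate_to_parquet(rows, parquet_path):
    """Aggregate parsed rows with DuckDB, persist to Parquet, read back as a DataFrame."""
    import duckdb  # third-party, imported lazily so the SQLite half runs without it
    import pandas as pd

    df = pd.DataFrame(rows, columns=["city", "temp"])
    duck = duckdb.connect()  # in-memory DuckDB database
    duck.register("readings", df)  # expose the DataFrame as a queryable view
    duck.execute(
        "COPY (SELECT city, avg(temp) AS avg_temp FROM readings GROUP BY city) "
        f"TO '{parquet_path}' (FORMAT PARQUET)"
    )
    # Consumption step: query the Parquet file and hand back a pandas DataFrame.
    return duck.execute(
        f"SELECT * FROM read_parquet('{parquet_path}') ORDER BY city"
    ).df()


# Hypothetical payloads standing in for real API fetches.
sample = {
    "https://api.example.test/a": {"city": "Oslo", "temp": 3.5},
    "https://api.example.test/b": {"city": "Oslo", "temp": 4.5},
    "https://api.example.test/c": {"city": "Lima"},  # no "temp" key
}

con = sqlite3.connect(":memory:")
cache_responses(con, sample)
rows = parse_cached(con)

# With duckdb and pandas installed:
# frame = aggregate_to_parquet(rows, "readings.parquet")
```

The payload with no `temp` key comes back as a NULL rather than an error, which is the "handles NULLs nicely" behaviour mentioned above; DuckDB's `avg` then skips that NULL when aggregating.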
So yeah, on paper it sounds like these 3 libraries overlap too much to be used at the same time, but in practice they can each have their place and interact well.