top | item 46431946

(no title)

sandblast2 | 2 months ago

I consult for a small company which feeds some of the largest market research companies. This company finds data providers for each country, collect the data monthly and need to massage it into a uniform structure before handing it over. I help them scripting this. I found importing the monthly spreadsheets into mongodb and querying the set can replace an awful lot of manual scripting work. That aggregator queries are a good fit for an aggregator company shouldn't be that big of a surprise, I guess.

The mongodb instance is ephemeral, the database itself is ephemeral, both only exist while the script is running which can be measured in seconds. The structure is changing from month to month. All this plays to the strengths of mongodb while avoiding the usual problems. For eg one stage of the aggregate pipeline can only be 100MB? A source csv is a few megabytes at most.

Ps.: no, Excel can't do it, I got involved with this when the complexity to do it in Excel has become unbearable.

discuss

solatic|2 months ago

duckdb wouldn't help?

https://duckdb.org/docs/stable/data/csv/overview

https://duckdb.org/docs/stable/sql/functions/aggregates

cpursley|2 months ago

Postgres has jsonb helper functions for this.

unknown|2 months ago

[deleted]