RobinL|1 month ago
Worse in some ways, better in others. DuckDB is often an excellent tool for this kind of task. Since it can run parallelized reads, I imagine it's often faster than command-line tools, and with easier-to-understand syntax.
briHass|1 month ago
I've been using this pattern (scripts or code that execute commands against DuckDB) to process data more recently, and the ability to do deep investigations on the data as you're designing the pipeline (or when things go wrong) is very useful. With a code-based solution (reading data into objects in memory), it's much more challenging to view the data. Using debugging tools to inspect objects on the heap is painful compared to being able to JOIN/WHERE/GROUP BY your data.
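For anyone who hasn't tried it, here's a rough sketch of what I mean by inspecting an intermediate pipeline stage with SQL instead of a debugger. The `orders` table and its columns are made up for illustration. It assumes the `duckdb` Python package is installed, and falls back to the stdlib `sqlite3` module as a stand-in so the snippet still runs without it.

```python
# Sketch only: the table, columns, and rows are hypothetical.
try:
    import duckdb  # third-party package, assumed installed
    con = duckdb.connect(":memory:")
except ImportError:
    import sqlite3  # stand-in so the sketch runs without DuckDB
    con = sqlite3.connect(":memory:")

# Pretend this table is an intermediate stage of the pipeline.
con.execute("CREATE TABLE orders (customer TEXT, status TEXT, amount DOUBLE)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "paid", 10.0),
        ("alice", "failed", 5.0),
        ("bob", "paid", 7.5),
        ("bob", "paid", 2.5),
    ],
)

# Ad hoc investigation: filter and aggregate the stage with plain SQL
# rather than walking in-memory objects in a debugger.
summary = con.execute(
    "SELECT customer, COUNT(*) AS n, SUM(amount) AS total "
    "FROM orders WHERE status = 'paid' "
    "GROUP BY customer ORDER BY customer"
).fetchall()

for row in summary:
    print(row)
```

When something looks off, you just change the WHERE or GROUP BY and rerun, without touching the pipeline code.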
groundzeros2015|1 month ago
mrgoldenbrown|1 month ago
The bottleneck in the example was maxing out disk IO, which I don't think duckdb can help with.
chuckadams|1 month ago
On the other hand, unix sockets combined with socat can perform some real wizardry, but I never quite got the hang of that style.