throwaway-aws9 | 3 months ago
Here's an oldie on the topic: https://adamdrake.com/command-line-tools-can-be-235x-faster-...
faizshah | 3 months ago
Go try doing an aggregation of 650 GB of JSON data using normal CLI tools vs duckdb or clickhouse. These tools are pipelining and parallelizing in a way that isn't easy to do with just GNU Parallel (trust me, I've tried).
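Much of the gap faizshah describes comes down to the merge step: engines like DuckDB and ClickHouse stream partial aggregates and combine them internally, whereas with GNU Parallel you have to write that combine yourself. A minimal Python sketch of the map/merge pattern (the `date` field and line-chunking are illustrative assumptions, not from the thread):

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def aggregate_chunk(lines):
    """Map phase: partial count of posts per date within one chunk of JSON lines."""
    counts = Counter()
    for line in lines:
        rec = json.loads(line)
        counts[rec["date"]] += 1  # "date" is a hypothetical field name
    return counts

def parallel_count(chunks, workers=4):
    # Run one partial aggregation per chunk in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(aggregate_chunk, chunks)
    # Merge phase: the step GNU Parallel leaves to you, and that
    # columnar engines pipeline internally.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

With plain CLI tools, each parallel worker would emit its own per-date counts and you would still need a second pass (e.g. another `awk`) to merge them, which is exactly where the hand-rolled approach gets awkward at 650 GB.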
CraigJPerry | 3 months ago
working memory requirements
So for each date in the dataset we need 16 bytes to accumulate the result. That's ~180 years' worth of daily post counts per MB of RAM - but the dataset in the post was just one year.
This problem should be mostly network-limited in the OP's context; decompressing snappy-compressed parquet should run at circa 1 GB/s. The "work" of parsing a string to a date and accumulating isn't expensive compared to snappy decompression.
I don't have a handle on the 33% longer runtime difference between duckdb and polars here.
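As a sanity check on the 16-bytes-per-date estimate above (assuming an 8-byte date key plus an 8-byte counter, which is one plausible layout):

```python
# Back-of-envelope for the accumulator size in the parent comment.
BYTES_PER_DATE = 16        # assumed: 8-byte date key + 8-byte count
DAYS_PER_YEAR = 365

bytes_per_year = BYTES_PER_DATE * DAYS_PER_YEAR  # 5,840 bytes per year of daily counts
years_per_mib = (1 << 20) // bytes_per_year      # ~179 years of dates fit in 1 MiB
```

So the working set for a one-year daily aggregation is a few KB; memory is nowhere near the bottleneck here.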
jgalt212 | 3 months ago
https://www.youtube.com/watch?v=3t6L-FlfeaI