djhworld | 4 months ago
> Workers download, decompress, and materialize their shards into DuckDB databases built from Parquet files.
I'm interested to know whether the 5s query time includes this materialization step of downloading the files etc., or whether this is the result from workers that have been "pre-warmed". Also, is the data in DuckDB in memory or on disk?
philbe77 | 4 months ago
You can have GizmoEdge reference cloud (remote) data as well, but of course that would be slower than what I did for the challenge here...
The data is on disk - on locally mounted NVMe on each worker - in the form of a DuckDB database file (once the worker has converted it from Parquet). I originally kept the data in Parquet, but the DuckDB format was about 10 to 15% faster - and since I was trying to squeeze out every drop of performance, I went ahead and did that...
Thanks for the questions.
GizmoEdge is not production-ready yet - this was just to demonstrate the art of the possible. I wanted to divide and conquer a huge dataset with a lot of power...
philbe77 | 4 months ago
DuckDB blog: https://duckdb.org/2025/10/09/benchmark-results-14-lts