philbe77 | 4 months ago
You can have GizmoEdge reference cloud (remote) data as well, but of course that would be slower than what I did for the challenge here...
The data is on disk - on locally mounted NVMe on each worker - in the form of a DuckDB database file (once the worker has converted it from parquet). I originally kept the data in parquet, but the DuckDB format was about 10 to 15% faster - and since I was trying to squeeze every drop of performance, I went ahead and did that...
Thanks for the questions.
GizmoEdge is not production yet - this was just to demonstrate the art of the possible. I wanted to divide-and-conquer a huge dataset with a lot of power...
philbe77 | 4 months ago
DuckDB blog: https://duckdb.org/2025/10/09/benchmark-results-14-lts