_willmanning | 7 months ago

I'll chime in (as a Vortex maintainer) to say that we are greatly indebted to Azim's work on FastLanes & ALP. Vortex heavily utilizes his work to get state-of-the-art performance.

I would add that Vortex doesn't have standalone ClickBench results. Azim is presumably referring to the duckdb-vortex results, which were run on an older version of DuckDB (1.2) than the duckdb-parquet ones (1.3). We'll get those updated shortly; we just released a new version of Vortex & the DuckDB extension. Meanwhile, I believe the DataFusion-Vortex vs DataFusion-Parquet results show substantial speedups across the board.

The folks over at TUM (who originally authored BtrBlocks) did a reasonable amount of micro-benchmarking of Vortex vs Parquet in their recent "AnyBlox" paper for VLDB 2025: https://gienieczko.com/anyblox-paper

They essentially say in the paper that Vortex is much faster than the original BtrBlocks because it uses better encodings (specifically citing FastLanes & ALP).

I'm looking forward to seeing the FastLanes ClickBench results when they're ready, and Azim, we should work together to benchmark FastLanes against Vortex!

azimafroozeh|7 months ago

As we discuss in the FastLanes paper, the way BtrBlocks (and now Vortex) implements cascaded encodings is essentially a return to block-based compression such as Zstd, which we're trying to avoid as much as possible. This design doesn't work well with modern vectorized execution engines or GPUs: the decompression granularity is too large to fit in CPU caches or GPU shared memory. So Vortex ends up being yet another Parquet-like file format, repeating the same mistakes. And if it still underperforms compared to Parquet... what’s the point?

We just released FastLanes v0.1, and more results — including ClickBench — are coming soon. Please do benchmark FastLanes — and keep us posted!
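
For a rough sense of the granularity argument above, here is a back-of-the-envelope sketch in Python. Every number in it is an assumption chosen for illustration, not a figure from this thread: 64-bit values, a 1,024-value vector, a 65,536-value block, a 32 KiB L1 data cache, and 48 KiB of GPU shared memory per thread block. It only compares the decompressed footprint of one unit against those budgets; it says nothing about how either format actually performs.

```python
# Illustrative sizes only; none of these numbers come from the thread.
VALUE_BYTES = 8                  # assumed: 64-bit values
L1_CACHE_BYTES = 32 * 1024       # assumed: per-core L1 data cache
GPU_SHARED_BYTES = 48 * 1024     # assumed: shared memory per GPU thread block


def decompressed_bytes(values_per_unit: int, value_bytes: int = VALUE_BYTES) -> int:
    """Size of one fully decompressed unit (vector or block) in bytes."""
    return values_per_unit * value_bytes


def fits(values_per_unit: int, budget_bytes: int) -> bool:
    """True if one decompressed unit fits within the given memory budget."""
    return decompressed_bytes(values_per_unit) <= budget_bytes


# Hypothetical decompression granularities to compare.
units = {
    "vector (1,024 values)": 1024,
    "block (65,536 values)": 65536,
}

for name, n in units.items():
    size_kib = decompressed_bytes(n) / 1024
    print(
        f"{name}: {size_kib:.0f} KiB, "
        f"fits L1: {fits(n, L1_CACHE_BYTES)}, "
        f"fits GPU shared memory: {fits(n, GPU_SHARED_BYTES)}"
    )
```

Under these assumed sizes, the small unit stays within both budgets and the large one exceeds both, which is the shape of the concern raised above. Different value widths or cache sizes would shift the exact crossover point.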