jmakov | 7 months ago

Any comparison with https://github.com/vortex-data/vortex? Also any plans to integrate with polars?

azimafroozeh|7 months ago

Vortex borrows a few ideas from the FastLanes project, such as bit-packing and ALP. However, it’s unclear how well these are implemented: their performance on ClickBench appears worse than Parquet's in both storage size and decompression speed, which is counterintuitive.

Technically, Vortex is documented more like a BtrBlocks-style format, which we’ve benchmarked and compared against in depth.

_willmanning|7 months ago

I'll chime in (as a Vortex maintainer) to say that we are greatly indebted to Azim's work on FastLanes & ALP. Vortex heavily utilizes his work to get state-of-the-art performance.

I would add that Vortex doesn't have standalone ClickBench results. Azim is presumably referring to the duckdb-vortex results, which were run on an older version of DuckDB (1.2) than the duckdb-parquet ones (1.3). We'll get those updated shortly; we just released a new version of Vortex & the DuckDB extension. Meanwhile, I believe the DataFusion-Vortex vs DataFusion-Parquet results show substantial speedups across the board.

The folks over at TUM (who originally authored BtrBlocks) did a reasonable amount of micro-benchmarking of Vortex vs Parquet in their recent "Anyblox" paper for VLDB 2025: https://gienieczko.com/anyblox-paper

They essentially say in the paper that Vortex is much faster than the original BtrBlocks because it uses better encodings (specifically citing FastLanes & ALP).

I'm looking forward to seeing the FastLanes ClickBench results when they're ready, and Azim, we should work together to benchmark FastLanes against Vortex!