apavlo|5 months ago
→ Meta's Nimble: https://github.com/facebookincubator/nimble
→ CWI's FastLanes: https://github.com/cwida/FastLanes
→ SpiralDB's Vortex: https://vortex.dev
→ CMU + Tsinghua F3: https://github.com/future-file-format/f3
On the research side, we (CMU + Tsinghua) weren't interested in developing new encoders and instead wanted to focus on the WASM embedding part. The original idea came as a suggestion from Hannes@DuckDB to Wes McKinney (a co-author with us). We just used Vortex's implementations since they were in Rust, and with some tweaks we could get most of them to compile to WASM. Vortex is orthogonal to the F3 project and has the engineering energy necessary to support it. F3 is an academic prototype right now.
I note that the Germans also released their own file format this year that uses WASM as well. But they WASM-ify the entire file rather than individual column groups:
→ Germans: https://github.com/AnyBlox
rancar2|5 months ago
Centigonal|4 months ago
Will one of the new formats absorb the others' features? Will there be a format war à la Iceberg vs. Delta Lake vs. Hudi? Will there be a new consortium now that everyone's formats are out in the wild?
digdugdirk|5 months ago
Also, back on topic - is your file format encryptable via that WASM embedding?
tomnicholas1|5 months ago
I would love to bring these benefits to the multidimensional array world, via integration with the Zarr/Icechunk formats somehow (which I work on). But this fragmentation of formats makes it very hard to know where to start.