item 44563986

Nelkins | 7 months ago

Cool, but this is very specific to DataFusion, no? Is there any chance this would be standardized so other Parquet readers could leverage the same technique?

gdubya | 7 months ago

The technique can be applied by any engine, not just DataFusion. Each engine would have to know about the indexes to make use of them, but readers that don't know about them simply fall back to the standard Parquet metadata, so the data remains readable by all.
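To make the fallback concrete, here is a toy sketch in Python. It is not real Parquet. It only mimics the parts of the layout that matter here: the `PAR1` magic bytes at both ends, and a footer whose length is stored as a 4-byte little-endian integer just before the trailing magic. It uses JSON where real Parquet uses Thrift-encoded `FileMetaData`. Two further assumptions: the index blob is written between the column data and the footer, and its location is recorded under a hypothetical key-value metadata key, `example.index_offset`. A plain reader follows only the footer's column offsets, so it never touches the index bytes. An index-aware reader looks up the key and returns `None` when it is absent.

```python
import json
import struct

MAGIC = b"PAR1"
# Hypothetical key name for this sketch; each engine would choose its own.
INDEX_KEY = "example.index_offset"


def write_file(columns, index_blob=None):
    """Write a toy Parquet-like file: magic, column chunks, optional index, footer, footer length, magic.

    Assumption: the footer is JSON here; real Parquet uses Thrift-encoded FileMetaData.
    """
    out = bytearray(MAGIC)
    chunks = {}
    for name, values in columns.items():
        payload = json.dumps(values).encode()
        # Record where this column chunk starts so readers can seek to it via the footer.
        chunks[name] = {"offset": len(out), "length": len(payload)}
        out += payload
    kv = {}
    if index_blob is not None:
        # The index sits between the data and the footer; only a key-value entry points at it.
        kv[INDEX_KEY] = json.dumps({"offset": len(out), "length": len(index_blob)})
        out += index_blob
    footer = json.dumps({"columns": chunks, "key_value_metadata": kv}).encode()
    out += footer
    out += struct.pack("<I", len(footer))  # 4-byte little-endian footer length
    out += MAGIC
    return bytes(out)


def read_footer(data):
    """Locate the footer from the end of the file, as Parquet readers do."""
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet-like file")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    return json.loads(data[-8 - footer_len:-8])


def read_columns(data):
    """A plain reader: follows footer offsets only, so unknown bytes are never read."""
    footer = read_footer(data)
    result = {}
    for name, loc in footer["columns"].items():
        start = loc["offset"]
        result[name] = json.loads(data[start:start + loc["length"]])
    return result


def read_index(data):
    """An index-aware reader: returns the embedded index, or None if the file has none."""
    kv = read_footer(data)["key_value_metadata"]
    if INDEX_KEY not in kv:
        return None
    loc = json.loads(kv[INDEX_KEY])
    return data[loc["offset"]:loc["offset"] + loc["length"]]
```

Reading the columns back from a file written with and without `index_blob` gives identical results, which is the property that keeps such files readable by engines that are unaware of the index.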

aerzen | 7 months ago

But does DataFusion publish a specification of how this metadata should be read, along with a test suite for verifying implementations? If not, no other implementation can reliably use it.

ethan_smith | 7 months ago

The Arrow/Parquet community is already discussing standardization on the Parquet format GitHub repository. This approach intentionally uses existing extension points in the format specification, so files remain compatible while those discussions progress.