aynyc | 18 days ago

What's the difference between feather and parquet in terms of usage? I get the design philosophy, but how would you use them differently?

tosh|18 days ago

parquet is optimized for storage and compresses well (=> smaller files)

feather is optimized for fast reading

aynyc|18 days ago

Given that storage keeps getting cheaper, wouldn't most firms want to use feather for analytic performance? Yet everyone uses parquet.

willtemperley|18 days ago

Feather (Arrow IPC) is zero-copy and an order of magnitude simpler. Parquet has a lot of compatibility issues between readers and writers.

Arrow is also directly usable as the application memory model. It’s pretty common to read Parquet into Arrow for transport.

aynyc|18 days ago

When you say compatibility issues, do you mean they are more problematic or less?

> It’s pretty common to read Parquet into Arrow for transport.

I'm confused by this. Are you referring to Arrow Flight RPC? Or are you saying that distributed analytic engines use Arrow to transport Parquet data between queries?