I recently looked into this as well, specifically how the two formats differ. As it stands right now, the “Feather” file format seems to be a synonym for the Arrow IPC file format, or “Arrow files” [0]. There should be basically no overhead when reading one into the Arrow memory format, since the on-disk layout matches the in-memory layout [1]. Parquet files, on the other hand, are stored in a different format and therefore incur some decoding overhead when read into memory, but they offer more advanced mechanisms for on-disk encoding and compression [1]. As far as I can tell, the main trade-off is deserialization overhead vs. on-disk file size. If anyone has more information or experience with the topic, I'd love to hear it!
[0] https://arrow.apache.org/faq/#what-about-the-feather-file-fo...
[1] https://arrow.apache.org/faq/#what-is-the-difference-between...
EDIT:
More information: https://news.ycombinator.com/item?id=34324649