top | item 43404444

alexott | 11 months ago

Plain Parquet has a lot of problems. That’s why Iceberg and Delta arose.

timenova|11 months ago

Can you elaborate on what kinds of problems plain Parquet has?

pacbard|11 months ago

Apache Iceberg builds an additional layer on top of Parquet files that lets you do ACID transactions, rollbacks, and schema evolution.

A Parquet file is a static file that holds all the data associated with a table. You can't insert, update, or delete rows in place; the file is what it is. That works OK for small tables, but it becomes unwieldy if you need to replace the whole table every time your data changes.

At a 300,000 ft overview, Apache Iceberg fixes this problem by adding a metadata layer on top of many smaller Parquet files.
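
To make the "metadata layer over smaller files" idea concrete, here is a rough toy sketch in Python. It is not Iceberg's actual format or API: `ToyTable`, `metadata.json`, and the `data-N.json` files are all made-up names, and JSON files stand in for Parquet files so the example needs only the standard library. The point is just that data files are never modified; an append writes one new small file, and a commit is a swap of the metadata that lists which files make up the current snapshot.

```python
import json
from pathlib import Path


class ToyTable:
    """Toy snapshot table: immutable data files plus a small metadata file.

    Illustration only. JSON files stand in for Parquet files, and none of
    these names correspond to the real Iceberg format or API.
    """

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.meta_path = self.root / "metadata.json"
        if not self.meta_path.exists():
            # Snapshot 0 is the empty table (no data files).
            self._write_meta({"current": 0, "snapshots": [[]]})

    def _read_meta(self):
        return json.loads(self.meta_path.read_text())

    def _write_meta(self, meta):
        # Write to a temp file, then rename over the old metadata, so a
        # reader sees either the old snapshot list or the new one.
        tmp = self.meta_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(meta))
        tmp.replace(self.meta_path)

    def append(self, rows):
        """Write one new data file and commit a snapshot that includes it."""
        meta = self._read_meta()
        new_id = len(meta["snapshots"])
        data_file = f"data-{new_id}.json"
        (self.root / data_file).write_text(json.dumps(rows))
        files = meta["snapshots"][meta["current"]] + [data_file]
        meta["snapshots"].append(files)
        meta["current"] = new_id
        self._write_meta(meta)
        return new_id

    def rollback(self, snapshot_id):
        """Point the table back at an older snapshot; no data is deleted."""
        meta = self._read_meta()
        if not 0 <= snapshot_id < len(meta["snapshots"]):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        meta["current"] = snapshot_id
        self._write_meta(meta)

    def scan(self):
        """Read every row from the files listed in the current snapshot."""
        meta = self._read_meta()
        rows = []
        for name in meta["snapshots"][meta["current"]]:
            rows.extend(json.loads((self.root / name).read_text()))
        return rows
```

Usage would look something like `t = ToyTable("/tmp/toy"); s1 = t.append([{"id": 1}]); t.append([{"id": 2}]); t.rollback(s1)`, after which `t.scan()` only sees the first file even though the second one is still on disk. The real thing adds a lot more (manifests, schemas, partitioning, concurrent-writer handling), but the shape is similar: small immutable files plus a pointer you can move.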