zhousun | 1 year ago

The only data stack that Iceberg (or a lakehouse in general) will never replace is OLTP systems: for high-concurrency updates, optimistic concurrency control on top of an object store is simply a no-go.

Out of the box, Iceberg is NOT good at streaming use cases. Unlike formats such as Hudi or Paimon, the table format has no concept of a merge or an index. However, the beauty of Iceberg is that it is very unopinionated, so it is indeed possible to design an engine that stream-writes to Iceberg. As far as I know, this is how engines like Upsolver were implemented:
1. Keep an in-memory buffer to track incoming rows before flushing a new version to Iceberg (every 10s to a few minutes).
2. Build an indexing structure so the writer can emit position deletes / deletion vectors instead of equality deletes.
3. Have the writer also try to merge small files and optimize the table.
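To make steps 1 and 2 concrete, here is a minimal in-memory sketch. It is not Upsolver's code and does not use any real Iceberg API: `BufferedWriter`, `Snapshot`, the file naming, and the size-based `flush_threshold` are all hypothetical stand-ins (a real engine would flush on a timer and write actual Parquet and delete files). Step 3 (compaction) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical stand-in for one committed table version."""
    data_file: str
    rows: list
    position_deletes: list  # (data_file, row_position) pairs made obsolete


@dataclass
class BufferedWriter:
    """Toy stream writer: buffers upserts, then commits them as one version."""
    flush_threshold: int = 3  # assumption: flush by row count, not by time
    buffer: dict = field(default_factory=dict)     # key -> latest pending row
    key_index: dict = field(default_factory=dict)  # key -> (data_file, position)
    snapshots: list = field(default_factory=list)

    def upsert(self, key, row):
        # Step 1: only the latest row per key is kept until the next flush.
        self.buffer[key] = row
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        data_file = f"data-{len(self.snapshots):05d}.parquet"
        rows, deletes = [], []
        for position, (key, row) in enumerate(self.buffer.items()):
            # Step 2: the index tells us exactly where the old row lives,
            # so we record a position delete instead of an equality delete.
            previous = self.key_index.get(key)
            if previous is not None:
                deletes.append(previous)
            self.key_index[key] = (data_file, position)
            rows.append(row)
        self.snapshots.append(Snapshot(data_file, rows, deletes))
        self.buffer.clear()
```

The point of the index is that readers never have to scan old files for matching keys: each commit already says which (file, position) pairs are dead.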

And stay tuned: we at https://www.mooncake.dev/ are working on a solution that mirrors a Postgres table to Iceberg and keeps the two always up to date.
