ylow | 1 year ago
Can you elaborate? As I understand it, Delta Lake provides transactions on top of existing data and effectively stores "diffs", because it knows what each transaction did. With regular snapshots, though, it's much harder to figure out the effective diff, and that is where deduplication comes in. (Much like how git actually stores a snapshot of every file version, just very aggressively compressed.)
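To make the snapshot case concrete, here is a rough sketch of what I mean by deduplication, assuming fixed-size chunks keyed by their SHA-256 hash. The names (`store_snapshot`, `restore`, `CHUNK_SIZE`) are made up for illustration and are not taken from Delta Lake, git, or any other tool.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, just to keep the example readable


def store_snapshot(store: dict[str, bytes], data: bytes,
                   size: int = CHUNK_SIZE) -> list[str]:
    """Split a snapshot into chunks and store each distinct chunk once.

    Returns a manifest: the ordered list of chunk hashes for this snapshot.
    """
    manifest = []
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # a chunk we have already seen costs nothing
        manifest.append(key)
    return manifest


def restore(store: dict[str, bytes], manifest: list[str]) -> bytes:
    """Rebuild a snapshot from its manifest."""
    return b"".join(store[key] for key in manifest)


# Two full snapshots that differ in a single chunk.
store: dict[str, bytes] = {}
v1 = b"aaaabbbbccccdddd"
v2 = b"aaaabbbbXXXXdddd"
m1 = store_snapshot(store, v1)
m2 = store_snapshot(store, v2)

# Eight chunks were written in total, but only five distinct ones are kept.
print(len(store))
```

Nobody told the store what changed between `v1` and `v2`; the shared chunks just hash to the same key. Fixed-size chunks stop lining up as soon as bytes are inserted or removed, which is why content-defined chunking is usually preferred in practice.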