I clicked because of the bait-y title, but ended up reading pretty much the whole post, even though I have no reason to be interested in ZFS. (I skipped most of the stuff about logs...) Everything was explained clearly, I enjoyed the writing style, and the mobile CSS theme was particularly pleasing to my eyes. (It appears to be Pixyll theme with text set to the all-important #000, although I shouldn't derail this discussion with opinions on contrast ratios...)For less patient readers, note that the concise summary is at the bottom of the post, not the top.
Aachen|1 year ago
> As we’ve seen from the last 7000+ words, the overheads are not trivial. Even with all these changes, you still need to have a lot of deduplicated blocks to offset the weight of all the unique entries in your dedup table. [...] what might surprise you is how rare it is to find blocks eligible for deduplication are on most general purpose workloads.
> But the real reason you probably don’t want dedup these days is because since OpenZFS 2.2 we have the BRT (aka “block cloning” aka “reflinks”). [...] it’s actually pretty rare these days that you have a write operation coming from some kind of copy operation, but you don’t know that came from a copy operation. [...] [This isn't] saving as much raw data as dedup would get me, though it’s pretty close. But I’m not spending a fortune tracking all those uncloned and forgotten blocks.
> [Dedup is only useful if] you have a very very specific workload where data is heavily duplicated and clients can’t or won’t give direct “copy me!” signal
The section labeled "summary" imo doesn't do the article justice by being fairly vague. I hope these quotes from near the end of the article give a more concrete idea of why (not) use it
unknown|1 year ago
[deleted]
londons_explore|1 year ago
Didn't read the 7000 words... But isn't the dedup table in the form of a bunch of bloom filters so the whole dedup table can be stored with ~1 bit per block?
When you know there is likely a duplicate, you can create a table of blocks where there is a likely duplicate, and find all the duplicates in a single scan later.
That saves having massive amounts of accounting overhead storing any per-block metadata.
emptiestplace|1 year ago
going_north|1 year ago
[1] https://despairlabs.com/blog/posts/2024-10-27-openzfs-dedup-...
ThePowerOfFuet|1 year ago
unknown|1 year ago
[deleted]