top | item 37194166

allanjude|2 years ago

That is not how this will work.

The reason the parity ratio stays the same, is that all of the references to the data are by DVA (Data Virtual Address, effectively the LBA within the RAID-Z vdev).

So the data will occupy the same amount of space and parity as it did before.

All stripes in RAID-Z are dynamic, so if your stripe is 5 wide and your array is 6 wide, the 2nd stripe will start on the last disk and wrap around.
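The wrap-around can be illustrated with a simplified model (round-robin column placement, not OpenZFS's actual allocator):

```python
# Simplified model of dynamic stripes: logical columns of consecutive
# 5-wide stripes laid round-robin across a 6-disk array.
disks, width = 6, 5
layout = [[(stripe * width + col) % disks for col in range(width)]
          for stripe in range(3)]
for stripe, cols in enumerate(layout):
    print(stripe, cols)
# 0 [0, 1, 2, 3, 4]
# 1 [5, 0, 1, 2, 3]   <- 2nd stripe starts on the last disk and wraps
# 2 [4, 5, 0, 1, 2]
```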

So if your 5x10 TB disks are 90% full, after the expansion each disk will still hold the same 5.4 TB of data and 3.6 TB of parity as it did before, and the pool will now be 10 TB bigger.
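The per-disk numbers work out as follows (a quick check, reading the 5.4/3.6 figures as per-disk for a 5-wide RAIDZ2 with a 3 data : 2 parity split):

```python
# 5 x 10 TB disks, 90% full, RAIDZ2 (3 data : 2 parity per 5-wide stripe)
disk_tb, fill = 10.0, 0.9
used_per_disk = disk_tb * fill            # 9.0 TB in use on each disk
data_per_disk = used_per_disk * 3 / 5     # 5.4 TB of data
parity_per_disk = used_per_disk * 2 / 5   # 3.6 TB of parity
added_capacity = 10.0                     # one more 10 TB disk joins the pool
print(data_per_disk, parity_per_disk, added_capacity)
```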

New writes will be 4+2 instead, but the old data won't change (that is how this feature is able to work without needing block-pointer rewrite).

See this presentation: https://www.youtube.com/watch?v=yF2KgQGmUic

crote|2 years ago

The linked pull request says "After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks". That'd mean that the disks do not contain the same data, but it is getting moved around?

Regardless, my entire point is that you still lose a significant amount of capacity due to the old data remaining as 3+2 rather than being rewritten to 4+2, which heavily disincentivizes expanding arrays that are reaching capacity - but that is the only time people would want to expand their array.

It just seems to me like they are spending a lot of effort on a feature which you frankly should not ever want to use.

rincebrain|2 years ago

I don't think that's true.

I don't use raidz for my personal pools because it has the wrong set of tradeoffs for my usage, but if I did, I'd absolutely use this.

Yes, your data has the old data:parity ratio for older data, but you now have more total storage available, which is the entire goal. Sure, it'd be more space-efficient to go rewrite your data, piecemeal or entirely, afterward, but you now have more storage to work with, rather than having to remake the pool or replace every disk in the vdev with a larger one.

ilyt|2 years ago

> So the data will occupy the same amount of space and parity as it did before.

So you lose data capacity compared to "dumb" RAID6 on mdadm.

If you expand RAID6 from 4+2 to 5+2, you go from spending 33.3% of raw capacity on parity to 28.6%.

If you expand RAIDZ from 4+2 to 5+2, your new data will spend 28.6% on parity, but your old data (which is the majority - because if it weren't, you wouldn't be expanding) will still spend 33.3%.
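The percentages follow from parity / (data + parity):

```python
# Parity overhead as a fraction of raw stripe width.
def parity_overhead(data_disks, parity_disks):
    return parity_disks / (data_disks + parity_disks)

before = parity_overhead(4, 2)  # 2/6 = 33.3%
after = parity_overhead(5, 2)   # 2/7 ~ 28.6%

# An mdadm RAID6 reshape moves everything to the new ratio; RAIDZ expansion
# applies it to new writes only, so old blocks keep paying the 33.3%.
print(f"{before:.1%} -> {after:.1%}")
```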

wkat4242|2 years ago

Could you force a complete rewrite if you wanted to? That would be handy. Without copying all the data elsewhere of course. I don't have another 90TB of spare disks :P

Edit: I suppose I could cover this with a shell script needing only the spare space of the largest file. Nice!
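The in-place variant sketched here would rewrite each file via a temporary copy next to it, so only the largest file's worth of spare space is needed at any moment (a sketch, not ZFS-specific; it assumes no snapshots pin the old blocks, and it runs on a throwaway temp directory rather than a real pool):

```python
# Rewrite a file in place: copy to a temp name, then atomically swap it back.
# The copy is written fresh, so on ZFS it would use the current layout.
import os
import shutil
import tempfile

root = tempfile.mkdtemp()                 # stand-in for a dataset directory
path = os.path.join(root, "big.bin")
with open(path, "wb") as f:
    f.write(b"x" * 1024)                  # sample file

tmp = path + ".rewrite"
shutil.copy2(path, tmp)                   # fresh copy, preserving metadata
os.replace(tmp, path)                     # atomic swap into place
print(os.path.getsize(path))              # 1024
```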

lloeki|2 years ago

> Could you force a complete rewrite if you wanted to?

On btrfs that's a rebalance, and part of how one expands an array (btrfs device add + btrfs balance)

(Not sure if ZFS has a similar operation, but from my understanding resilvering would not be it)

Not that it matters much though, as RAID5 and RAID6 can't be depended upon, and the array failure modes are weird in practice, so in the context of expanding storage it really only matters for RAID0 and RAID10.

https://arstechnica.com/gadgets/2021/09/examining-btrfs-linu...

https://www.unixsheikh.com/articles/battle-testing-zfs-btrfs...

keeperofdakeys|2 years ago

The easiest approach is to make a new subvolume, and move one file at a time. (Normal mv is copy + remove which doesn't quite work here, so you'd probably want something using find -type f and xargs with mv).
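The per-file move described above can be sketched as follows (a hedged sketch in Python rather than find/xargs, demonstrated on throwaway temp directories; on btrfs the destination would be the new subvolume):

```python
# Move files one at a time into a new location: copy each file (a real copy,
# so the data is rewritten), then remove the original to free old blocks.
import os
import shutil
import tempfile

src_root = tempfile.mkdtemp()   # stands in for the old subvolume
dst_root = tempfile.mkdtemp()   # stands in for the new subvolume

# Sample content to move.
os.makedirs(os.path.join(src_root, "a"), exist_ok=True)
with open(os.path.join(src_root, "a", "file.txt"), "w") as f:
    f.write("data")

for root, _dirs, files in os.walk(src_root):
    for name in files:
        src = os.path.join(root, name)
        rel = os.path.relpath(src, src_root)
        dst = os.path.join(dst_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copy, not rename: data is written fresh
        os.remove(src)          # free the old blocks file by file
```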