timewizard|8 months ago
That's pretty much built into most mass storage devices already.
> If a disk bitflips one of my files
The likelihood and consequence of this occurring are in many situations not worth the overhead of adding additional ECC on top of what the drive does.
> ext* won't do anything about it.
What should it do? Blindly hand you the data without any indication that there's a problem with the underlying block? Without an fsck what mechanism do you suppose would manage these errors as they're discovered?
yjftsjthsd-h|8 months ago
> What should it do? Blindly hand you the data without any indication that there's a problem with the underlying block?
Well, that's what it does now, and I think that's a problem.
> Without an fsck what mechanism do you suppose would manage these errors as they're discovered?
Linux can fail a read, and IMHO should do so if it cannot return correct data. (I support the ability to override this and tell it to give you the corrupted data, but certainly not by default.) On ZFS, if a read fails its checksum, the OS will first try to get a valid copy (e.g. from a mirror, or from a second copy if you've set copies=2). If the error can't be recovered, the file read fails and the system reports/records the failure, at which point the user should probably go do a full scrub (which for our purposes should probably count as fsck) and restore the affected file(s) from backup. (Or possibly go buy a new hard drive, depending on the extent of the problem.) I would consider that ideal.
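To make that read path concrete, here is a rough sketch of the idea, not actual ZFS code. The names read_block and ChecksumError are made up for illustration, SHA-256 stands in for whatever checksum the filesystem really uses, and the "copies" are just in-memory byte strings rather than blocks on a mirror.

```python
import hashlib


class ChecksumError(IOError):
    """Raised when no copy of a block matches its stored checksum."""


def read_block(copies, expected_sha256):
    """Return the first copy whose SHA-256 matches; otherwise fail the read.

    `copies` is a list of bytes objects standing in for redundant on-disk
    copies (a mirror, or copies=2). Hypothetical helper, not a real API.
    """
    for data in copies:
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
    # No valid copy anywhere: report the failure instead of returning bad data.
    raise ChecksumError("all copies failed checksum; restore from backup")


good = b"important file contents"
checksum = hashlib.sha256(good).hexdigest()   # stored when the block was written
flipped = bytes([good[0] ^ 0x01]) + good[1:]  # simulate a single bit flip

# First copy is corrupt, second copy is fine: the read still succeeds.
recovered = read_block([flipped, good], checksum)
```

With only the flipped copy available, read_block raises ChecksumError, which is the "fail the read by default" behavior argued for above.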
throw0101d|8 months ago
> That's pretty much built into most mass storage devices already.
And ZFS has shown that it is not sufficient (at least for some use-cases, perhaps less of a big deal for 'residential' users).
> The likelihood and consequence of this occurring are in many situations not worth the overhead of adding additional ECC on top of what the drive does.
Not worth it to whom? Not having the option available at all is the problem. I can do a zfs set checksum=off pool_name/dataset_name if I really want that extra couple percentage points of performance.
> Without an fsck what mechanism do you suppose would manage these errors as they're discovered?
Depends on the data involved: if it's part of the file system tree metadata there are often multiple copies even for a single disk on ZFS. So instead of the kernel consuming corrupted data and potentially panicking (or going off into the weeds) it can find a correct copy elsewhere.
If you're in a fancier configuration with some level of RAID, then there could be other copies of the data, or it could be rebuilt through ECC.
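As a minimal sketch of the "rebuilt" case, assuming plain single XOR parity (RAID-5 style) and tiny made-up blocks: the parity can regenerate any one missing block, but something still has to say which block is the bad one, and that is the job the checksum does. The helper name xor_blocks is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]  # three data blocks on three disks
parity = xor_blocks(data)           # parity block on a fourth disk

# Suppose a checksum flagged block 1 as bad: rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Here rebuilt comes out identical to the original data[1].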
With ext*, LVM, and mdadm no such possibility exists because there are no checksums at any of those layers (perhaps if you glom on dm-integrity?).
And with ZFS one can set copies=2 on a per-dataset basis (perhaps just for /home?), and get multiple copies strewn across the disk: won't save you from a drive dying, but could save you from corruption.
yjftsjthsd-h|8 months ago
I looked at that, in hopes of being able to protect my data. Unfortunately, I considered this something of a fatal flaw:
> It uses journaling for guaranteeing write atomicity by default, which effectively halves the write speed.
- https://wiki.archlinux.org/title/Dm-integrity
timewizard|8 months ago
Which implies you can already correct errors through a simple majority mechanism.
> or it could be rebuilt through ECC.
So just by having the appropriate level of RAID you automatically solve the problem. Why is this in the fs layer then?
ars|8 months ago
That's 10^14 bits for a consumer drive. That's just 12TB. A heavy user (lots of videos or games) would see a bit flip a couple times a year.
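For anyone who wants to check the arithmetic, a quick sketch that takes the spec-sheet figure at face value (one error per 10^14 bits read). The 25 TB/year read volume is purely an assumed number for a heavy user, not a measurement.

```python
URE_BITS = 10**14                      # bits read per unrecoverable error (spec-sheet figure)
bytes_per_error = URE_BITS / 8         # 8 bits per byte
tb_per_error = bytes_per_error / 1e12  # decimal terabytes


def expected_errors(tb_read):
    """Expected number of errors after reading `tb_read` terabytes."""
    return tb_read / tb_per_error


print(tb_per_error)         # 12.5
print(expected_errors(25))  # 2.0 (assumed 25 TB read in a year)
```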
magicalhippo|8 months ago
According to that 10^14 metric I should see read errors just about every month. Except I have just about zero.
My current disks are ~4 years old and run 24/7, and excluding a bad cable incident I've had a single case of a read error (recoverable, thanks ZFS).
I suspect those URE numbers come from the manufacturers establishing that the disk will reliably do 10^14, without actually trying to find the real number, because 10^14 is good enough.
Dylan16807|8 months ago
timewizard|8 months ago