item 41079581


Tobu | 1 year ago

> Error handling on CRC read error > 2 or more copies of file, CRC on error, read other copy, data returned to userspace, does not correct bad copy

That's been implemented; in Linux 6.11 bcachefs will correct errors on read. See

> - Self healing on read IO/checksum error

in https://lore.kernel.org/linux-bcachefs/73rweeabpoypzqwyxa7hl...

This makes it possible to scrub from userspace by walking the filesystem and reading everything (tar -c /mnt/bcachefs >/dev/null).
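Here is a rough Python equivalent of that `tar` one-liner, as a minimal sketch. It assumes, per the comment above, that on bcachefs with Linux 6.11+ an ordinary read verifies checksums and rewrites a bad copy when a good one exists. `scrub_tree` is a hypothetical helper name, not a bcachefs tool. It can only verify data that actually gets read from disk.

```python
import os

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time; the data itself is discarded


def scrub_tree(root: str) -> tuple[int, list[str]]:
    """Read every regular file under `root` and throw the data away.

    The only purpose is to push each file through the filesystem's read path,
    where (assumption: bcachefs on Linux 6.11+) checksums are verified and bad
    copies are repaired. Returns (bytes_read, descriptions_of_failures).
    """
    total = 0
    failed: list[str] = []
    # followlinks=False: don't descend into symlinked directories twice.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, FIFOs, sockets and device nodes; only regular
            # files hold file data (and opening a FIFO could block forever).
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while chunk := f.read(CHUNK_SIZE):
                        total += len(chunk)
            except OSError as exc:
                # Unreadable files land here: permission errors, or an I/O
                # error such as EIO (assumed: when no copy passes its checksum).
                failed.append(f"{path}: {exc}")
    return total, failed
```

Usage would be something like `nbytes, errors = scrub_tree("/mnt/bcachefs")`. Like the `tar` version, files already in the page cache are served from memory, so recently read data may not be re-read from disk.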


amtadt | 1 year ago

Self-healing is dangerous because it can potentially corrupt good data on disk if RAM or another system component is flaky.

Repro: the supposedly only good copy is read into RAM, RAM corrupts a bit, the CRC is recalculated using the corrupted bit, and the corrupted copy is written back to disk(s).

cesarb | 1 year ago

> crc is recalculated using corrupted bit

Why would it need to recalculate the CRC? The correct CRC (or other hash) for the data is already stored in the metadata trees; it's how it discovered that the data was corrupted in the first place. If it writes back corrupted data, it will be detected as corrupted again the next time.
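To make that argument concrete, here is a toy model, not bcachefs code. `Extent` and `read_self_healing` are hypothetical names, `zlib.crc32` stands in for whatever checksum the filesystem actually uses, and the re-check of the buffer just before writing it back is an illustrative extra guard, not something this thread says bcachefs does. The key property is that the checksum comes from metadata and is never recomputed from the buffer being repaired.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Extent:
    """Hypothetical model of one replicated extent.

    `csum` lives in metadata (recorded when the data was first written) and is
    never recomputed from a buffer during repair.
    """
    csum: int
    replicas: list[bytearray] = field(default_factory=list)


def checksum(data: bytes) -> int:
    # Stand-in checksum: zlib.crc32 (plain CRC-32), chosen only for illustration.
    return zlib.crc32(data)


def read_self_healing(ext: Extent) -> bytes:
    """Return verified data and rewrite replicas that fail the stored checksum."""
    good = None
    bad = []
    for i, copy in enumerate(ext.replicas):
        if checksum(copy) == ext.csum:
            if good is None:
                good = bytes(copy)
        else:
            bad.append(i)
    if good is None:
        # No copy verifies: report an error instead of "repairing" anything.
        raise OSError("EIO: no replica matches the stored checksum")
    # Illustrative guard: re-verify the in-memory buffer against the *stored*
    # checksum right before writing it back, so a bit flipped in RAM since the
    # first check is not copied over the bad replica.
    if checksum(good) != ext.csum:
        raise OSError("EIO: buffer changed after verification; repair skipped")
    for i in bad:
        ext.replicas[i] = bytearray(good)
    return good
```

Because `csum` never changes, a corrupted copy that does get written back (the scenario above) simply fails verification on the next read and is repaired again from a copy that still matches.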

newZWhoDis | 1 year ago

That’s why you need ECC RAM.

Our RAM should all be ECC and our OSes should all be on self-healing filesystems.