prirun | 7 months ago
I believe that using non-ECC RAM is a potential cause of silent disk errors. If you read a sector without error and a cosmic ray then flips a bit in the RAM holding that sector, you now have a bad copy of the sector with no error indication. Even if the backup software hashes the bad data and records the hash alongside it, it's too late: the hash is of bad data. If you are lucky and the hash is computed before the bit flip, at least the hash won't match the bad data, so when you try to restore the file, you'll get an error at restore time. It's impossible to recover the correct data from that backup, but at least you'll know it's bad.
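To make the timing point concrete, here is a minimal Python sketch. It is only an illustration: `flip_bit` and the 512-byte `sector` buffer are made-up stand-ins for a real RAM error and a real disk read, and SHA-256 is just one hash a backup tool might use.

```python
import hashlib


def sha256(data: bytes) -> str:
    """Return the hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated RAM error)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# Hypothetical 512-byte sector that was read from disk without error.
sector = bytes(range(256)) * 2

# Case 1: the hash is computed BEFORE the bit flip.
hash_before = sha256(sector)
corrupted = flip_bit(sector, 1234)  # the bad copy is what gets stored
detected_at_restore = sha256(corrupted) != hash_before

# Case 2: the hash is computed AFTER the bit flip.
hash_after = sha256(corrupted)
undetected = sha256(corrupted) == hash_after

print("hash before flip, mismatch found at restore:", detected_at_restore)
print("hash after flip, corruption goes unnoticed:", undetected)
```

In the first case the stored hash disagrees with the stored data, so a restore can at least report the problem. In the second case the hash faithfully describes the corrupted bytes and nothing ever flags it.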
The good news is that if you back up the file again, it will be read correctly from disk, and the result will differ from the previous, corrupted backup. The bad news is that most backup software skips files based on metadata such as ctime and mtime, so until the file changes, it won't be re-saved.
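Roughly, the metadata-based skip looks like the sketch below. This is not any particular backup tool's logic; `FileMeta` and `needs_backup` are hypothetical names used only to show why the bad copy sticks around.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileMeta:
    """Metadata a backup tool might record per file (hypothetical)."""
    size: int
    mtime_ns: int
    ctime_ns: int


def needs_backup(current: FileMeta, last_saved: Optional[FileMeta]) -> bool:
    """Re-save only if the file is new or its metadata has changed."""
    return last_saved is None or current != last_saved


# The file on disk is fine and unchanged since the (corrupted) backup.
meta = FileMeta(size=4096, mtime_ns=1_700_000_000_000, ctime_ns=1_700_000_000_000)

first_run = needs_backup(meta, None)   # new file: gets saved
second_run = needs_backup(meta, meta)  # unchanged metadata: skipped

print("first run saves the file:", first_run)
print("second run saves the file:", second_run)
```

Because the decision never looks at file contents, a corrupted copy in the backup is not replaced until something touches the file and changes its metadata.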
We are so dependent on computers these days that it's a real shame not all computers come standard with ECC RAM. The real reason for that is that server manufacturers want to charge higher prices to data centers for "real" servers with ECC.