valbu | 3 years ago

It's not often that I find a comment describing such a similar experience. I had intermittent data corruption caused by a memory bit error in the very upper address range, which was almost always unused, so it went unnoticed for a long time. The ZFS backend PC was actually OK; it was the main PC using the share that had the bad memory, but the point remains. After that, no more non-ECC memory on any computer for me (OK, except some laptops).

joecool1029 | 3 years ago

I'm only going to add a related anecdote that wasn't a failing of ECC vs non-ECC but rather of BIOS behavior.

Background: Lenovo Thinkpad T520 laptop, random crashes and data corruption.

Diagnosis: I eventually let Memtest86+ run repeatedly for about a week and it didn't show any errors. Just as I was about to give up, I pressed a key on the keyboard and errors immediately filled the screen. This suggested that the EC (embedded controller), or maybe some BIOS-controlled keyboard driver, was writing to memory it shouldn't have been.

Fix: I am a Linux user, and the kernel has an option to reserve low memory for a poorly behaved BIOS that likes to write where it shouldn't. CONFIG_X86_RESERVE_LOW should be set to at least 64 KB and increased up to 640 KB if these issues continue to happen. There are some other options to scan for this misbehavior, but I honestly don't know how Linux currently handles it: https://lkml.org/lkml/2013/11/11/683
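
For anyone who wants to check what their own kernel was built with, here is a rough sketch in Python that reads a kernel config and reports the CONFIG_X86_RESERVE_LOW value against the 64 to 640 KB range mentioned above. The function names and the sample config text are made up for illustration, and the option may simply be absent on kernels that handle this differently, so treat a "not found" result as inconclusive rather than as a problem.

```python
import re


def reserve_low_kb(config_text):
    """Return the CONFIG_X86_RESERVE_LOW value in KB, or None if it is absent."""
    match = re.search(r"^CONFIG_X86_RESERVE_LOW=(\d+)\s*$", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def check_reserve_low(config_text, minimum_kb=64, maximum_kb=640):
    """Describe how the configured value compares with the range from the comment."""
    value = reserve_low_kb(config_text)
    if value is None:
        return "CONFIG_X86_RESERVE_LOW not found in this config"
    if value < minimum_kb:
        return f"{value} KB reserved: below the suggested {minimum_kb} KB"
    if value > maximum_kb:
        return f"{value} KB reserved: above the {maximum_kb} KB upper bound"
    return f"{value} KB reserved: OK"


# Hypothetical config excerpt, for illustration only.
sample = "CONFIG_X86_64=y\nCONFIG_X86_RESERVE_LOW=64\n"
print(check_reserve_low(sample))
```

On many distributions the config for the running kernel is at a path like /boot/config-<version>, but that location is an assumption and varies by distro; read whichever file applies and pass its contents to check_reserve_low().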