Nomadeon | 1 year ago

As we went from zero to 10K+ embedded systems (full PCs with significant RAM), the issues got weirder.

The best was a one-off error log along the lines of "unknown type System.DateTime". Huh? That's a system-defined type that just went missing. Never saw it again.

Another, at a different employer, was a crash that occurred after a check that absolutely should have kept the crashing code from ever being reached. Single-threaded. Simple microcontroller. Had to reflash it to flip the bit back. After doing the math on how much RAM we had in the wild vs. the cosmic bit-flip rates reported for supercomputers, we had to expect one flip per year.
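For anyone who wants to redo that kind of math for their own fleet, here is a rough sketch of the calculation. It assumes the soft-error rate is quoted in FIT per megabit (failures per 10^9 device-hours per Mbit) and that flips scale linearly with total RAM. The function name and every input value below are placeholders for illustration, not the numbers from the fleet described above.

```python
HOURS_PER_YEAR = 24 * 365


def expected_flips_per_year(devices: int, bytes_per_device: float,
                            fit_per_mbit: float) -> float:
    """Expected bit flips per year across a whole fleet.

    fit_per_mbit is an assumed soft-error rate in FIT per megabit,
    where 1 FIT = 1 failure per 10^9 device-hours.
    """
    # Total RAM deployed in the field, in megabits.
    total_mbit = devices * bytes_per_device * 8 / 1e6
    # Failures per hour across the fleet, scaled up to one year.
    return total_mbit * fit_per_mbit * HOURS_PER_YEAR / 1e9


if __name__ == "__main__":
    # Placeholder inputs only: substitute your own fleet size, RAM size,
    # and a rate taken from a source you trust.
    flips = expected_flips_per_year(devices=10_000,
                                    bytes_per_device=64 * 1024,
                                    fit_per_mbit=1000.0)
    print(f"expected flips per year: {flips:.3f}")
```

The result is only as good as the rate you plug in. Published rates vary widely with memory technology, altitude, and shielding, so treat the output as an order-of-magnitude estimate.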

If it's a safety-critical system, server or not, use ECC RAM!!

repiret|1 year ago

I am of the opinion that far more than safety-critical systems should use ECC. You should use ECC any time bit flips might cost you more money than the ECC does, which is why I insist on ECC for my desktop computers.

eschneider|1 year ago

Given a large enough installed base, any unlikely but possible problem will occur for some segment of the user population. Guaranteed. :/
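To put a number on that intuition: if each unit independently hits the problem with probability p over some period, the chance that at least one of n units hits it is 1 - (1 - p)^n, which climbs toward 1 as n grows. A minimal sketch, assuming independent units and a made-up function name:

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent units sees an
    event that each unit sees with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    if p == 1.0:
        # log1p(-1) is undefined, so handle the certain case directly.
        return 1.0 if n > 0 else 0.0
    # 1 - (1 - p)**n, computed with log1p/expm1 so that very small p
    # does not get lost to floating-point rounding.
    return -math.expm1(n * math.log1p(-p))
```

Independence is a simplifying assumption. Units that share a firmware bug or a bad component batch fail together, so real fleets can behave worse than this suggests.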