danjayh | 6 years ago

"The fault occurs when bits inside the microprocessor are randomly flipped from 0 to 1 or vice versa. This is a known phenomenon that can happen due to cosmic rays striking the circuitry."

This is called a Single Event Upset (SEU). For those of you who aren't in the industry, the problem is essentially that any bit inside the flight computer (RAM, cache, NVM, registers, ANY bit) can change state randomly at ANY time. It's rare, but when you get into millions of flight hours, it WILL happen. The software and hardware have to be designed to mitigate the problems this causes.
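
To make the "ANY bit" point concrete, here is a minimal Python sketch of a single flipped bit in a stored value and a checksum that notices it. The altitude value, the `flip_bit` helper, and the choice of CRC-32 are all assumptions for illustration, not how any particular flight computer does it.

```python
import struct
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (a simulated upset)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# Hypothetical stored value: an altitude in feet as a 32-bit little-endian float.
stored = struct.pack("<f", 35000.0)
checksum = zlib.crc32(stored)  # computed when the value was written

# Flip bit 30, which is the top exponent bit of the float.
upset = flip_bit(stored, 30)
value_after = struct.unpack("<f", upset)[0]

# The value is now wildly wrong, but the checksum mismatch reveals it.
detected = zlib.crc32(upset) != checksum
print(value_after, detected)
```

One flipped exponent bit turns a plausible altitude into a number close to zero, which is why a stored value cannot be trusted without some independent check on it.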

GistNoesis|6 years ago

If this is due to cosmic rays, doesn't flying over the poles make such an event more likely?

How can we be sure there aren't local bursts of cosmic rays that would suddenly cause several of those Single Event Upsets, for example when there is a high likelihood of seeing northern lights?

How do you test your hardware and software to show that it is indeed cosmic-ray proof?

fit2rule|6 years ago

You test the living crap out of it, not just in the lab on the workbench but also online: while the thing is running in operation, it is constantly testing itself to ensure that the hardware is performing as expected.

Online software tests check for cosmic ray bit flips about 1000 times a second, in addition to whatever hardware mechanisms are in place to detect them (ECC, etc.). This is a standard module in most SIL-4 applications where a 2-of-3 consensus model is used.
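
For anyone who hasn't seen 2-of-3 consensus before, a rough Python sketch of the voting step looks like this. The function names are made up for illustration and this is not any vendor's actual voter. It assumes three channels that each compute the same integer output.

```python
from typing import Optional


def vote_2oo3(a: int, b: int, c: int) -> Optional[int]:
    """Return the value at least two channels agree on, or None if all differ."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None  # no majority: the caller has to fall back to a safe state


def bitwise_majority(a: int, b: int, c: int) -> int:
    """Vote each bit position independently across the three words."""
    return (a & b) | (a & c) | (b & c)


good = 0b1010
flipped = good ^ 0b0100  # one channel suffers a single bit flip

print(vote_2oo3(good, flipped, good))         # the two healthy channels win
print(bitwise_majority(good, flipped, good))  # the flipped bit is outvoted
print(vote_2oo3(1, 2, 3))                     # total disagreement gives None
```

The whole-value vote can report that there is no majority, while the bitwise vote always produces some result. Which one fits depends on whether the system would rather stop or keep going when the channels disagree.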

What I don't understand is why Boeing isn't using a 2-of-3 computer architecture in this application. Or maybe they are, and the '3 voting units' are considered to be 'one computer' and they've just added another one to be sure.

In rail transportation systems, this is taken even further by using 2-of-3 configurations where each computer is a completely different architecture.