jorticka | 2 years ago

If it's consistent and persistent, wouldn't that classify as broken hardware requiring device change?

Even with 3 chips, if one is permanently wrong, you are left with only 2 working ones, so there is no redundancy left for further degradation.
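A minimal sketch of that three-chip scenario. It assumes each chip returns one integer per computation and that a simple strict-majority voter combines them. The function name `majority` and the values `42`, `43` and `7` are hypothetical and used only for illustration:

```python
from collections import Counter
from typing import Optional, Sequence


def majority(outputs: Sequence[int]) -> Optional[int]:
    """Return the value reported by a strict majority of units, or None if there is none."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None


CORRECT = 42
STUCK = 7  # hypothetical constant wrong answer from a permanently broken chip

# Three chips, one permanently wrong: the vote still masks the fault.
print(majority([CORRECT, CORRECT, STUCK]))  # 42

# A later transient error on one of the two healthy chips leaves no majority,
# so the fault can still be detected but no longer corrected.
print(majority([CORRECT, 43, STUCK]))  # None
```

Once one chip is stuck, the two remaining healthy chips only give detection, not correction.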

> just compute 3x

That might be difficult if the CPU is broken. How can you be sure you actually computed it 3 times if you can't trust the logic?

MertsA | 2 years ago

>wouldn't that classify as broken hardware requiring device change?

Yes, but you need to catch it first to know which hardware to take out of production.

>That might be difficult if the CPU is broken. How can you be sure you actually computed it 3 times if you can't trust the logic?

That's kind of my point. Either it's a heisenbug and you never see those results again when you rerun the original program, or the hardware is permanently broken and you need to swap out the sketchy CPU. If you only care about the first case, then you only need one core. If you care about the second case, you need three if you want to produce an accurate result rather than just determining that one of them is faulty. It's like the old adage about clocks on ships: take one clock or take three, never two.
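The ship-clock adage can be sketched as a small reconciliation function. This is an illustrative assumption, not anyone's production design: the name `reconcile`, integer readings, and strict-majority voting are all hypothetical choices. One reading is trusted blindly. Two readings can only reveal a disagreement. Three can outvote a single faulty unit and also name it, which is what you need to know which part to pull:

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def reconcile(readings: Sequence[int]) -> Tuple[Optional[int], List[int]]:
    """Combine redundant readings into (agreed value or None, indices of suspect units).

    None means a disagreement was detected but no strict majority exists,
    so no single unit can be blamed.
    """
    if not readings:
        raise ValueError("need at least one reading")
    value, count = Counter(readings).most_common(1)[0]
    if count * 2 <= len(readings):
        # No strict majority. With two units this is always the case on a
        # mismatch: you know something is wrong, but not which one.
        return None, []
    # Units that disagree with the majority are candidates for removal.
    suspects = [i for i, r in enumerate(readings) if r != value]
    return value, suspects


print(reconcile([10]))          # one clock: (10, []), trusted blindly
print(reconcile([10, 11]))      # two clocks: (None, []), mismatch but no culprit
print(reconcile([10, 10, 11]))  # three clocks: (10, [2]), unit 2 is outvoted
```

Tallying the suspect indices across many jobs would separate a one-off heisenbug from a unit that keeps dissenting and should be taken out of production.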

namibj | 2 years ago

You don't need to know which of the two was bad. It's not worth the extra overhead just to avoid scrapping both in the rare case that you catch a persistent glitch. Sudden hardware death (a blown VRM, for example) will dominate either way, so you might as well build your "servers" as two parts that check each other and force a reset when they disagree. If it gets stuck in a reboot loop, you take it out of the fleet.
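A rough simulation of that pair-and-reset scheme. It assumes a job can simply be retried after a reset and that a fixed retry budget (the hypothetical `max_resets`) distinguishes a transient glitch from a reboot loop. The function `run_lockstep`, the `"ok"`/`"evict"` results, and the example units are all illustrative inventions, not a real platform API:

```python
from typing import Callable, Dict


def run_lockstep(
    compute_a: Callable[[int], int],
    compute_b: Callable[[int], int],
    job: int,
    max_resets: int = 3,
) -> str:
    """Run a job on two paired units that check each other.

    On disagreement the pair is reset and the job retried. If the pair keeps
    disagreeing (a reboot loop), report it for eviction from the fleet
    without trying to work out which of the two units is bad.
    """
    for _ in range(max_resets + 1):
        if compute_a(job) == compute_b(job):
            return "ok"
        # A real system would force-reset both units here; this sketch just retries.
    return "evict"


def healthy(x: int) -> int:
    return x * x


def broken(x: int) -> int:
    return x * x + 1  # persistently wrong unit


def make_flaky() -> Callable[[int], int]:
    """A unit that glitches only on its first call, like a transient upset."""
    calls: Dict[str, int] = {"n": 0}

    def flaky(x: int) -> int:
        calls["n"] += 1
        return x * x + (1 if calls["n"] == 1 else 0)

    return flaky


print(run_lockstep(healthy, healthy, 5))       # ok
print(run_lockstep(healthy, make_flaky(), 5))  # ok: glitch clears after one reset
print(run_lockstep(healthy, broken, 5))        # evict: reboot loop, pull the whole pair
```

The design choice mirrors the comment: both units are scrapped together on a persistent mismatch, trading a second part for not needing a third voter.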