_rtld_global_ro | 3 years ago

Given that Intel's AVX extension could cause silent failures on servers (which run very high workloads for prolonged periods, compared to end-user computers), I'm not sure it would be a big win for servers either: https://arxiv.org/pdf/2102.11245.pdf.

jcranmer|3 years ago

I'm downvoting you because the assertion you're implying--that use of AVX increases soft failure rates more than using non-AVX instructions would--is not supported by the source you cite as a reference.

tialaramex|3 years ago

Indeed, I'd summarise that source as "At Facebook sometimes weird stuff happens. We postulate it's not because of all the buggy code written by Software Engineers like us, it must be hardware. As well as lots of speculation about hypothetical widespread problems that would show we're actually not writing buggy software, here's a single concrete example where it was hardware".

If anything I'd say that Core 59 is one of those exceptions that prove the rule. This is such a rare phenomenon that when it does happen you can do the work to pin it down and say yup, this CPU is busted - if it was really commonplace you'd constantly trip over these bugs and get nowhere. There probably isn't really, as that paper claims, a "systemic issue across generations" except that those generations are all running Facebook's buggy code.