The Pentium FDIV bug, reverse-engineered

[+] jghn|1 year ago|reply

An anecdote regarding this bug that always cracks me up. My college roommate showed up with a shiny new Pentium machine that year, and kept bragging about how awesome it was. We used some math software called Maple that was pretty intensive for PCs at the time, and he thought he was cool because he could do his homework on his PC instead of on one of the Unix machines in the lab.

Except that he kept getting wrong answers on his homework.

And then he realized that when he did it on one of the Unix machines, he got correct answers.

And then a few months later he realized why ....

[+] Timwi|1 year ago|reply

“some math software called Maple”. I still use a version of Maple that I bought in 1998. I found subsequent versions of it much harder to use and I've never found any open-source software that could do what it can do. I don't need anything fancy, just to occasionally solve an equation or a system of equations, or maybe plot a simple function. That old copy of Maple continues to serve me extremely well.

[+] mleo|1 year ago|reply

The mention of Maple brings back vivid memories of freshman year of college when the math department decided to use the software as part of instruction and no one understood how to use it. There was a near revolt by the students.

[+] molticrystal|1 year ago|reply

The story was posted a couple days ago and ken left a couple comments there: https://news.ycombinator.com/item?id=42388455

I look forward to the promised proper write up that should be out soon.

[+] pests|1 year ago|reply

There is a certain theme of posts on HN where I am just certain the author is gonna be Ken, and again I'm not disappointed.

[+] mega_dingus|1 year ago|reply

Oh to remember mid-90s humor

How many Intel engineers does it take to change a light bulb? 0.99999999

[+] zoky|1 year ago|reply

Why didn’t Intel call the Pentium the 586? Because they added 486+100 on the first one they made and got 585.999999987.

[+] perdomon|1 year ago|reply

This dude pulled out a microscope and said "there's your problem." Super impressive work. Really great micro-read.

[+] layer8|1 year ago|reply

To be fair, he knew what the problem was (errors in the lookup table) beforehand.
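For anyone who wants to see what those table errors looked like from the software side, here is a minimal sketch of the widely circulated test division, 4195835 / 3145727. It is not from the article; the function names are made up for illustration, and the flawed quotient in the constant is the commonly reported value, quoted from memory rather than measured. On a correct FPU the residual is zero; affected Pentiums were widely reported to return 256.

```python
from fractions import Fraction


def division_residual(x: float, y: float) -> float:
    """Return x - (x / y) * y, which should be 0.0 for this test pair on a correct FPU."""
    return x - (x / y) * y


def relative_error(x: int, y: int, quotient: float) -> float:
    """Relative error of a floating-point quotient against the exact rational x / y."""
    exact = Fraction(x, y)
    return float(abs(Fraction(quotient) - exact) / exact)


# Assumption: the commonly reported (truncated) result from a flawed Pentium.
REPORTED_FLAWED_QUOTIENT = 1.333739068902

if __name__ == "__main__":
    x, y = 4195835, 3145727
    print("residual on this machine:", division_residual(float(x), float(y)))
    print("relative error here:     ", relative_error(x, y, x / y))
    print("relative error (flawed): ", relative_error(x, y, REPORTED_FLAWED_QUOTIENT))
```

The flawed quotient is off in roughly the fifth significant digit, which is far larger than ordinary rounding error and is why it showed up in homework answers and spreadsheets.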

[+] sgerenser|1 year ago|reply

I wonder at what node generation doing this type of thing becomes impossible (due to features being too small to be visible with an optical microscope)? I would have guessed sometime before the first Pentium, but obviously not.

[+] ornornor|1 year ago|reply

These things completely fly above my head. I wish I could understand them because it’s pretty cool, but I got lost at the very succinct SRT explanation and it went downhill from there. Still, die shots are always pretty to look at.

[+] Cumpiler69|1 year ago|reply

This is probably one of the reasons Intel went to a microcode architecture after.

I wonder how many yet-to-be-discovered silicone bugs are out there on modern chips?

[+] Lammy|1 year ago|reply

Older Intel CPUs were already using microcode. Intel went after NEC with a copyright case over 8086 microcode, and after AMD with a copyright case over 287/386/486 microcode:

- https://thechipletter.substack.com/p/intel-vs-nec-the-case-o...

- https://www.upi.com/Archives/1994/03/10/Jury-backs-AMD-in-di...

I would totally believe the FDIV bug is why Intel went to a patchable microcode architecture however. See “Intel P6 Microcode Can Be Patched — Intel Discloses Details of Download Mechanism for Fixing CPU Bugs (1997)” https://news.ycombinator.com/item?id=35934367

[+] kens|1 year ago|reply

Intel used microcode starting with the 8086. However, patchable microcode wasn't introduced until the Pentium Pro. The original purpose was for testing, being able to run special test microcode routines. But after the Pentium, Intel realized that being able to patch microcode was also good for fixing bugs in the field.

[+] wmf|1 year ago|reply

They always used microcode: https://www.righto.com/2022/11/how-8086-processors-microcode...

I'm not sure when Intel started supporting microcode updates but I think it was much later.

[+] userbinator|1 year ago|reply

Look at how long the public errata lists are, and use that as a lower bound.

[+] KingLancelot|1 year ago|reply

Silicone is plastic, Silicon is the element.

[+] Thaxll|1 year ago|reply

So nowadays this table could have been fixed with a microcode update, right?

[+] phire|1 year ago|reply

The table couldn't be fixed. But it can be bypassed.

The microcode update would need to disable the entire FDIV instruction and re-implement it without using any floating point hardware at all, at least for the problematic divisors. It would be as slow as the software workarounds for the FDIV bug (average penalty for random divisors was apparently 50 cycles).
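A rough sketch of what "checking for problematic divisors" could look like in software. This is an assumption-laden illustration, not Intel's actual workaround: it relies on the commonly cited characterization that at-risk divisors have a significand starting 1.0001, 1.0100, 1.0111, 1.1010 or 1.1101 followed by six 1 bits, and on the often-described trick of scaling both operands by 15/16 to move the divisor out of the bad region. The names `is_at_risk_divisor` and `guarded_divide` are hypothetical.

```python
import struct

# Assumption: the five 4-bit patterns after the leading 1 that are commonly
# cited as selecting the bad lookup table entries.
AT_RISK_PREFIXES = {0b0001, 0b0100, 0b0111, 0b1010, 0b1101}


def is_at_risk_divisor(d: float) -> bool:
    """Conservatively flag a double whose significand matches the cited at-risk pattern."""
    bits = struct.unpack("<Q", struct.pack("<d", d))[0]
    exponent = (bits >> 52) & 0x7FF
    if exponent == 0 or exponent == 0x7FF:
        # Zero, subnormal, infinity, NaN: not handled by this sketch.
        return False
    fraction = bits & ((1 << 52) - 1)
    top4 = fraction >> 48                 # first four bits after the leading 1
    next6 = (fraction >> 42) & 0b111111   # the following six bits
    return top4 in AT_RISK_PREFIXES and next6 == 0b111111


def guarded_divide(n: float, d: float) -> float:
    """Divide, rescaling both operands by 15/16 first when the divisor looks at risk."""
    if is_at_risk_divisor(d):
        # In plain doubles the scaling can add its own rounding error.
        return (n * 0.9375) / (d * 0.9375)
    return n / d


if __name__ == "__main__":
    print(is_at_risk_divisor(3145727.0))
    print(guarded_divide(4195835.0, 3145727.0))
```

The check itself is only a handful of integer operations, but it has to run before every division, which is where the average penalty comes from.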

The main advantage of a microcode update is that all FDIVs are automatically intercepted system-wide, while the software workarounds needed to somehow find and replace all FDIVs in the target software. Some did it by recompiling, others scanned for FDIV instructions in machine code and replaced them. Both approaches were problematic, and self-modifying code would be hard to catch.
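To illustrate why scanning machine code is fragile, here is a naive sketch of such a scanner. It is not any real patcher's algorithm: it simply looks for byte pairs that match the x87 divide encodings as I understand them from the instruction set reference (register forms `D8`/`DC`/`DE` followed by `F0`..`FF`, memory forms `D8`/`DA`/`DC`/`DE` with a ModRM reg field of 6 or 7), without decoding instruction boundaries. The function name is hypothetical.

```python
# Assumption: first opcode bytes of the x87 divide family
# (FDIV, FDIVR, FDIVP, FDIVRP, FIDIV, FIDIVR).
REGISTER_FORM_OPCODES = {0xD8, 0xDC, 0xDE}
MEMORY_FORM_OPCODES = {0xD8, 0xDA, 0xDC, 0xDE}


def find_division_candidates(code: bytes) -> list[int]:
    """Return offsets of byte pairs that look like x87 divide instructions.

    No instruction-boundary decoding is done, so data bytes and immediates
    that happen to match are reported too.
    """
    hits = []
    for i in range(len(code) - 1):
        opcode, modrm = code[i], code[i + 1]
        mod = modrm >> 6
        reg = (modrm >> 3) & 0b111
        if mod == 0b11:
            # Register-stack forms: second byte F0..FF.
            if opcode in REGISTER_FORM_OPCODES and modrm >= 0xF0:
                hits.append(i)
        elif opcode in MEMORY_FORM_OPCODES and reg in (6, 7):
            # Memory-operand forms: /6 is divide, /7 is reverse divide.
            hits.append(i)
    return hits


if __name__ == "__main__":
    # nop; fdiv st(0), st(1); nop
    print(find_division_candidates(bytes([0x90, 0xD8, 0xF1, 0x90])))
    # mov eax, 0x0000F1D8: the immediate contains D8 F1
    print(find_division_candidates(bytes([0xB8, 0xD8, 0xF1, 0x00, 0x00])))
```

The second example is a false positive: the bytes belong to an immediate operand, so patching them would corrupt the program. A scanner that never sees code generated at run time has the opposite problem and misses real divides.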

A microcode update "might" have allowed Intel to argue their way out of an extensive recall. But 50 cycles on average is a massive performance hit, given that FDIV takes 19 cycles for single-precision. BTW, this microcode update would have killed performance in Quake, which famously depended on floating point instructions (especially the expensive FDIV) running in parallel with integer instructions.

[+] jeffbee|1 year ago|reply

With a microcode update that ruins FDIV performance, sure. Even at that time there were CPUs still using microcoded division, like the AMD K5.

50 comments