The 8088 processor in the first IBM PC had a bug that gave me some grief.
(The code below is likely to have bugs of its own - I wrote it from memory as an illustration of the CPU bug - and thanks to 'tlb' for catching an error in my first draft. I also left out the question of what data segment the various MOV instructions use for their memory references, as it isn't relevant to this CPU bug.)
If you needed to work in a different stack from the one you were currently running on, you might do something like this:
mov saveSP, sp
mov sp, mySP
...
mov sp, saveSP
This saves the original SP (Stack Pointer) register, loads it with your private value, and then restores SP when you are done.
Suppose you wanted to switch not only to your own stack pointer but also your own stack segment. With 16-bit registers you could only address 64KB at a time, and you would need to change a segment register to access memory outside that range.
So you would save, change, and restore both the SS (Stack Segment) and SP registers:
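Roughly like this (same caveats as above: a from-memory sketch, mirroring the corrected listing further down but without its CLI/STI pairs; saveSS and mySS are illustrative variables alongside saveSP and mySP):
mov saveSS, ss
mov saveSP, sp
mov ss, mySS      ; SS is new here, SP is still the old value
mov sp, mySP
...
mov ss, saveSS    ; same window again on the way back
mov sp, saveSP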
Now imagine that an interrupt fired between one of the changes to SS and the matching change to SP. The interrupt would push its return state and run its handler on the new stack segment but with the old stack pointer, corrupting memory and crashing.
Not to worry! Intel had your back. The documentation promised that after a MOV SS or POP SS, interrupts would automatically be disabled until the next instruction (the matching MOV SP or POP SP) completed.
But they kinda forgot to implement that feature. So if you followed the docs, you would have these very rare and intermittent crash bugs.
Word got around fairly soon, and the fix was simple enough. You disable interrupts yourself around the paired instructions:
mov saveSS, ss
mov saveSP, sp
cli
mov ss, mySS
mov sp, mySP
sti
...
cli
mov ss, saveSS
mov sp, saveSP
sti
This still left you unprotected against NMI (Non-Maskable Interrupt), but by the time most of us built NMI switches for our IBM PCs, we'd also upgraded to newer CPUs with this bug fixed. It was only the earliest 8088s (and perhaps 8086s) that had the bug.
"As someone who worked in an Intel Validation group for SOCs until mid-2014 or so I can tell you, yes, you will see more CPU bugs from Intel than you have in the past from the post-FDIV-bug era until recently."
It's funny to hear that the bug increases are an effect of Intel trying to compete with ARM SoCs in mobile devices, because the errata those have are much worse --- and indeed a lot of embedded stuff is like that because the general line of thought there is that bugs are worked around in software and there's little expectation of being able to run existing code flawlessly, unlike with a PC.
> the general line of thought there is that bugs are worked around in software and there's little expectation of being able to run existing code flawlessly, unlike with a PC.
Nowadays there’s hardly a device that can’t easily be updated after shipment - so the cost and effort required to make a perfect error-free CPU is not as incentivized.
Stratoscope|4 years ago
tlb|4 years ago
anonymousiam|4 years ago
A most prescient remark in 2014.
Here's where they are more recently:
https://www.zdnet.com/article/intel-fixed-236-bugs-in-2019-a...
https://www.techradar.com/news/latest-intel-cpus-have-imposs...
Flow|4 years ago
Did they really intend to just "skip" validation or did they try to automate it further, to decrease time to produce a new chip?
userbinator|4 years ago
amelius|4 years ago
How does that work for Apple's M1?
bombcar|4 years ago