I am impressed by his original troubleshooting, but this follow-up seems impractical. Of his three suggestions, only the third (Intel providing improved board-testing tools) even seems like it could possibly prevent this sort of problem. Asking for hardware-enforced "sane" behavior is like asking, "why doesn't my computer know I don't want my program to deadlock, segfault, or loop indefinitely?" That is, a controller that could guarantee that would have to solve the Halting Problem. Improved drivers, his second suggestion, are always a good thing, but drivers only get patched to handle broken hardware in response to the discovery of broken hardware. There is no way to anticipate every particular way a NIC could possibly be broken ahead of time.
A company from Taipei flashes some Intel equipment; it then appears to function correctly, but it can be bricked remotely with a specially crafted incoming packet.
I think you're reading way more into this than there is to it.
If this were a DoS backdoor, it wouldn't have been much harder to make it less discoverable. Just use two magic bytes, or three. The chance of a false positive would be virtually zero, and yet you'd still be able to use basic ICMP/ping to trigger it if needed.
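To put a rough number on that intuition, here is a back-of-the-envelope sketch in Python. It assumes, unrealistically, that payload bytes are uniformly random and that the trigger is an exact match at one fixed offset; both are simplifying assumptions for illustration only.

```python
# Back-of-the-envelope model: how often would an n-byte "magic" value at a
# fixed payload offset match by pure accident, assuming uniform random bytes?
# Each additional magic byte divides the accident rate by 256.

def accidental_triggers_per_year(n_magic_bytes: int, packets_per_sec: int) -> float:
    """Expected number of packets per year that match by pure chance."""
    p_match = 256.0 ** -n_magic_bytes          # chance a single packet matches
    packets_per_year = packets_per_sec * 365 * 24 * 3600
    return p_match * packets_per_year

# On a link sustaining 100k packets/sec, going from 3 to 4 magic bytes
# cuts the expected accidental-trigger rate by another factor of 256.
rate3 = accidental_triggers_per_year(3, 100_000)
rate4 = accidental_triggers_per_year(4, 100_000)
```

Whether the result counts as "virtually zero" depends on the packet rate and pattern length plugged in; the point of the sketch is only that the accident rate falls off geometrically with each extra byte.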
Firmware images usually have checksums. Was this an Intel blob suffering from bitrot, or does Intel provide some more or less error-prone way to build your own firmware images for NICs?
I suspect NICs these days are tiny computers in their own right. As a motherboard manufacturer, you can probably program them to do all sorts of nifty things, with the possible downside of strange behavior if you get it wrong.
It is not the firmware that was corrupted. It is the EEPROM that was incorrectly programmed.
noonespecial|13 years ago
http://blog.krisk.org/2013/02/packets-of-death-update.html
I used to use "Lanner" gear for VoIP, and these had embedded Intel Ethernet controllers. I don't have any more of them to test, but I swear I've seen it on them as well. We suspected power-supply problems because the link lights would just go dark once in a blue moon and need a power cycle to set right, but we were never able to reproduce it.
jessaustin|13 years ago
The market demands controllers with flexible and expandable functionality. Board manufacturers use the EEPROM to specify exactly what behavior is required. If a particular manufacturer underestimates the importance of correctness and doesn't perform the code review and testing necessary to prevent a PoD, that isn't Intel's fault.
gonzo|13 years ago
GiHe|13 years ago
I have a number of systems with Intel motherboards that demonstrate the same kind of problems. The motherboards in question have two Intel Ethernet controllers, one of which is an 82574L.
The systems connect to two different networks. When the systems attach to one of the networks (but not the other) using the 82574L interface (but not the other), that interface dies after some unpredictable amount of time.
I have tried posting comments to the Intel engineer's blog post (and PM-ing the engineer directly), but they do not appear. In fact, there seem to be no comments at Intel's site, despite the post having nearly 6,000 views at the time of writing.
Something is not right here.
kkielhofner|13 years ago
As I say in my updated post, this is a complex issue with clear combinatorial factors. More than likely it's not limited to one chip, one packet, or one EEPROM configuration. A quick search of the web shows various unexplained issues with this family of Intel Ethernet controllers randomly exhibiting the exact behavior I've described: different controllers, different motherboard OEMs, different EEPROM settings. Are all of these issues related to some kind of "packet of death"? Certainly not. But are at least some of them? Almost certainly, even if they're not vulnerable to my (extremely specific) packet of death. We still don't know exactly why this is happening (even in my extremely specific case).
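For anyone wanting to bisect this kind of issue themselves, the test vectors are just frames with one byte varied at one offset, replayed at the interface (e.g. with tcpreplay). A minimal sketch of constructing such a frame in Python follows; the offset, value, and MAC addresses are placeholders for illustration, not the actual trigger from the post.

```python
# Illustrative only: build a raw Ethernet frame with a chosen byte value at a
# chosen offset, the kind of test vector used when bisecting a "packet of
# death". Offset/value/MACs below are arbitrary placeholders.

def make_test_frame(offset: int, value: int, length: int = 1514) -> bytes:
    frame = bytearray(length)                      # zero-filled payload
    frame[0:6] = b"\xff" * 6                       # dst MAC: broadcast (placeholder)
    frame[6:12] = b"\x02\x00\x00\x00\x00\x01"      # src MAC: locally administered
    frame[12:14] = (0x0800).to_bytes(2, "big")     # EtherType: IPv4
    frame[offset] = value                          # the single byte under test
    return bytes(frame)

# Vary one byte at a time while holding the rest of the frame constant.
frame = make_test_frame(offset=0x40, value=0x99)
```

Sending the frame requires a raw socket (root) or a replay tool, which is why the sketch stops at constructing the bytes.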
kkielhofner|13 years ago
Would you be able to e-mail me (CAPTCHA here):
http://tinyurl.com/66srzt
I'd like to discuss your issue further. Thanks!
brownbat|13 years ago
Company has US branch that's a government contractor: http://government-contractor.bizdirlib.com/ceo/Synertron_Tec...
Charming.
ersii|13 years ago
Taiwan (Republic of China) is, by the way, basically its own country, with its own leadership and currency. I find it somewhat hard to put China (People's Republic of China) and Taiwan (Republic of China) together.
eps|13 years ago
fulafel|13 years ago
noonespecial|13 years ago
mrb|13 years ago
The EEPROM is typically 4 kB on the 82574. When it is reprogrammed, either by the end user (e.g. via ethtool(1)) or by the manufacturer, the programming procedure recomputes a checksum over the first 128 bytes, IIRC (when reprogramming via ethtool, the kernel driver e1000e is responsible for automatically updating the checksum).
So all in all, no, the packet of death issue was not caused by bitrot.
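The checksum convention described above matches the long-standing Intel NVM scheme: the first 64 16-bit little-endian words (128 bytes) must sum to 0xBABA, with the last word of that region serving as the checksum word. A sketch of what a reprogramming tool has to recompute — the 0xBABA constant and word range are taken from the e1000-family convention, so treat the exact details as an assumption here:

```python
import struct

# Intel-style NVM checksum: the 16-bit little-endian words 0x00..0x3F
# (128 bytes) must sum to 0xBABA mod 2^16, with word 0x3F reserved as the
# checksum word. This is what a driver has to fix up after an EEPROM write.

CHECKSUM_TARGET = 0xBABA

def fix_checksum(eeprom: bytearray) -> None:
    """Recompute word 0x3F so that words 0x00..0x3F sum to 0xBABA."""
    words = struct.unpack_from("<63H", eeprom, 0)        # words 0x00..0x3E
    checksum = (CHECKSUM_TARGET - sum(words)) & 0xFFFF
    struct.pack_into("<H", eeprom, 0x3F * 2, checksum)   # word 0x3F

def checksum_ok(eeprom: bytes) -> bool:
    """True if the first 128 bytes carry a valid checksum."""
    words = struct.unpack_from("<64H", eeprom, 0)
    return sum(words) & 0xFFFF == CHECKSUM_TARGET
```

Note that a valid checksum only proves the image wasn't corrupted in transit; it says nothing about whether the configuration words themselves are sane, which is exactly why a "correctly checksummed" but mis-programmed EEPROM can ship.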
Maci|13 years ago