The case of the 500-mile email (2002)

368 points | thunderbong | 4 years ago | ibiblio.org

93 comments

PhilRodgers | 4 years ago
On a similar theme, I remember reading a story about a server that would crash mysteriously every couple of weeks. They eventually worked out that this happened whenever there was a new moon or a full moon, and the resulting high tide caused a battleship moored in a nearby harbour to rise just high enough that its powerful radar would interfere with the server.
gabriel_fishman | 4 years ago
On a much smaller scale, I once worked for a wireless ISP. We had a customer who called in late September saying her service had been out for a few weeks. I went to her house and discovered that she was in a wheelchair and couldn't reach the controls for her air conditioner, so she was turning it on and off using the on/off switch on a power strip that was sitting on a desk. Her router was plugged into the same power strip. So as soon as the weather got cool enough to not need the AC, she lost her internet.
Thlom | 4 years ago
I once dealt with a moored ship whose satellite Internet was extremely unreliable. It worked for two seconds, then stopped working for two seconds, over and over. The only time it worked reliably was when there was no wind at all. After comparing satellite images with the antenna's pointing angle and the ship's position, we eventually figured out that the antenna was pointing directly at a windmill. When the rotor turned, its blades intermittently blocked the signal between the satellite and the antenna. Luckily they were able to move the ship 50 meters forward, and magically the Internet started working again.
hinkley | 4 years ago
I recall that in the early days of WiFi, the advice in ops circles was that if you were trying to bridge two buildings wirelessly, you had to set it up in the summer, not the winter, because the water in the leaves of deciduous trees is enough to attenuate the signal. Set it up in winter and, come spring, you go from "it's working" to "we have to start over," or worse, "yeah, we can't actually do this."
moepstar | 4 years ago
Heh, I remember a story about a server somewhere in a train station in Russia (IIRC) that would sometimes spontaneously reboot...

Turns out, this happened whenever a train carrying radioactive waste passed by, flipping a few bits in memory and causing a crash...

Can't seem to find the story online, though...

mrtksn | 4 years ago
There must be a site collecting all of these stories, but I couldn't find it just now. The story about the car that would break down if you bought vanilla ice cream is one of my favorites. There’s also the story about the switch that isn't connected to anything but crashes the server every single time.
js2 | 4 years ago
> "When did this start? A few days ago, you said, but did anything change in your systems at that time?"

> "Well, the consultant came in and patched our server and rebooted it."

> Having established that--unbelievably--the problem as reported was true, and repeatable

As far as problems go, this is an easy one to solve. Accurate description of the problem. Accurate reporting of what changed. Problem is consistently repeatable.

Compared to a problem that doesn’t reliably reproduce and for which the person reporting it claims nothing changed, this one is child’s play.

But it’s still amusing to read every time.

I had something similar occur in the early 2000s on a patch release of Solaris 2.6 (I think) where the sleep call was broken and would always return almost instantly. This caused all sorts of weird behaviors on the running system.

I also recall the first time I ran into an MTU issue, on a dedicated frame relay link we used to administer our web farm in the late nineties. One day a developer reported that they could log in to our bastion host, but when they ran “ls -l” in a big enough directory, their ssh connection would hang. It turned out the connection hung whenever a packet near the MTU was generated, and we eventually tracked it down to an issue with the frame relay connection. We experimented with MTUs until we found one that worked. It then took a while to convince our provider what was going on, but they eventually replaced a line card on the far end of the connection, which allowed us to raise the MTU back to 1500.
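Finding a working MTU by hand is essentially a search problem. The sketch below bisects over packet sizes, assuming a hypothetical `probe(size)` callable that reports whether a packet of that size made it across the link (in practice, something like a don't-fragment ping of that size). It also assumes the link behaves monotonically: if a size gets through, every smaller size does too.

```python
from typing import Callable


def largest_working_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes probe(low) is True and that probe is monotonic
    (every size below a working size also works).
    """
    while low < high:
        # Round the midpoint up so the loop always makes progress.
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid works; the answer is mid or larger
        else:
            high = mid - 1  # mid fails; the answer is below mid
    return low


if __name__ == "__main__":
    # Simulated link that silently drops anything above 1476 bytes
    # (a made-up figure for illustration).
    simulated_probe = lambda size: size <= 1476
    print(largest_working_size(simulated_probe, 576, 1500))  # 1476
```

Bisection needs only about 10 probes to cover the 576 to 1500 range, which matters when each probe is a slow round trip over a flaky link.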

Another fun problem I had was with an email-to-SMS gateway I wrote for myself that worked by posting to a Verizon web form. I developed it on a Mac (probably 2002 or so), where it worked fine, but when I deployed it to my Linux box on the same home network, the script couldn’t connect to Verizon’s site. It turned out the Linux box was setting a flag on the TCP connection (ECN, I think) that was tripping up Verizon’s web firewall. The workaround was disabling ECN on the Linux box.

ethbr0 | 4 years ago
> Compared to a problem that doesn’t reliably reproduce and for which the person reporting it claims nothing changed, this one is child’s play.

You forgot: plus inaccurate reporting of the problem.

We had an employee in IT at a client who would tell us "It's broken." This went on, for every report, for three years, with us asking the same follow-up questions every time: broken for whom, in what way, when doing what, what changed, etc.

As far as I know, that individual still works there.

People like that are the best argument for why a basic income and removing some folks from the workforce would increase efficiency.

kingcharles | 4 years ago
In the mid-90s I used to repair PCs. A customer brought in a PC whose left mouse button didn't work.

Easy. Replace mouse. NOPE.

OK, software issue. Reinstall mouse driver. NOPE.

OK, deeper software issue. Replace HDD from working PC. NOPE.

OK, replace RAM? NOPE.

OK, replace motherboard and all add-in cards. NOPE.

At this point we have a different HDD, motherboard, CPU, RAM, video card and mouse. Still left mouse button doesn't work. Mouse moves fine. Right button works.

Only thing left is the case and the PSU.

Replace PSU. Left mouse button works perfectly.

FML.

lostgame | 4 years ago
This is just such a classic story describing the Murphy’s Law of working with computers. This gave me a chuckle.

Solving programming problems is sometimes similar.

glitchc | 4 years ago
Wha? Was it a PS/2 mouse? x86 system? More details please.
ww520 | 4 years ago
This reminds me of a problem I'm currently having. My iPhone sometimes freezes completely when I ride BART, requiring a hard reboot. I've noticed it happens when passing the Daly City station. It seems there's a tower nearby whose strong signal causes the problem. Probably the signal-strength level read by the hardware triggers an out-of-bounds error somewhere and corrupts the phone's memory.
not1ofU | 4 years ago
Do you have an IMSI-catcher [0] detection app on your phone? I used to have the same issue (in an EU country) on the metro. There was one particular stop, above ground and near an international conference centre. Every time I went through that station, my phone would lock up and need a hard reboot (removing the battery), until I removed the IMSI-catcher detection software. After that, I used flight mode on that metro line.

Edit: rooted Android HTC phone.

[0]: https://en.wikipedia.org/wiki/IMSI-catcher

jrockway | 4 years ago
Does this happen to anyone else? There aren't a ton of iPhone variants out there, so if it's a baseband-level defect, it would be happening a lot.

I'll also say, if it's just the signal strength being too high, it seems unlikely that would cause memory corruption. The signal strength is probably just an integer, and there aren't any operations defined on integers that involve using other bytes of memory. (If you have a uint8 and add 1 to 255, you just get 0; it doesn't upgrade the integer to a uint16 and overwrite adjacent memory.)
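The wraparound can be checked without writing C. The sketch below uses Python's `ctypes` to lay out two adjacent unsigned 8-bit fields; the struct and field names are made up for illustration. Incrementing the first field past 255 wraps it to 0 (ctypes does no overflow checking on its integer types), and the neighbouring byte is left untouched.

```python
import ctypes


class Readings(ctypes.Structure):
    # Two adjacent 8-bit unsigned fields; names are illustrative only.
    _fields_ = [("signal_level", ctypes.c_uint8), ("neighbor", ctypes.c_uint8)]


def add_u8(a: int, b: int) -> int:
    """Pure-Python equivalent of uint8 addition: keep only the low 8 bits."""
    return (a + b) & 0xFF


if __name__ == "__main__":
    r = Readings(255, 42)
    r.signal_level += 1                 # 256 is truncated to 8 bits -> 0
    print(r.signal_level, r.neighbor)   # 0 42
    print(ctypes.sizeof(Readings))      # 2: no extra storage appeared
    print(add_u8(255, 1))               # 0
```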

3np | 4 years ago
Probably just someone broadcasting a new 0-day
geoffmunn | 4 years ago
At a very large bank here in Australia & NZ, all XML messages going through the main message bus had a trailing space character appended to the end, which broke XML validation on the receiving endpoint.

So the solution was for all endpoints to trim the very last character - not just if it was a space, but to chop off the last character. Apparently this had been the solution for years.

This worked really well until one day someone (probably a new grad) noticed the trailing-space issue and figured they'd fix it.

A bank-wide P1 incident followed, because every single XML message was now unparsable: chopping the last character turned the closing tag into a malformed '</xml'. Every single application in the bank had to do an emergency update to its XML parser.
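The trap is that the workaround was unconditional. The sketch below contrasts a blind "drop the last character" step with one that strips only trailing whitespace; the function names are hypothetical and this is not the bank's actual code. Parsing uses the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def chop_last_char(message: str) -> str:
    # The workaround as described: always drop the final character,
    # assuming the bus appended a stray space.
    return message[:-1]


def strip_trailing_whitespace(message: str) -> str:
    # Only remove whitespace, so this is a no-op once the bus is fixed.
    return message.rstrip()


def parses(message: str) -> bool:
    """Return True if the message is well-formed XML."""
    try:
        ET.fromstring(message)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    before_fix = "<xml>payment</xml> "  # bus appends a space
    after_fix = "<xml>payment</xml>"    # someone "fixes" the bus
    for msg in (before_fix, after_fix):
        print(parses(chop_last_char(msg)), parses(strip_trailing_whitespace(msg)))
    # True True
    # False True
```

The conditional version survives both the broken bus and the fixed one, which is exactly the property the unconditional chop lacked.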

lqet | 4 years ago
Isn't XML with trailing whitespace still valid XML?
potamic | 4 years ago
Why didn't they just rollback the fix instead?
dang | 4 years ago
Past related threads (fewer than I expected, given how often it has been reposted):

We can't send email more than 500 miles (2002) - https://news.ycombinator.com/item?id=23775404 - July 2020 (135 comments) (<-- thanks ayewo for finding this!)

The case of the 500-mile email (2002) - https://news.ycombinator.com/item?id=14676835 - July 2017 (56 comments)

Every time we lift a pallet from the shipping room, the server times out (2006) - https://news.ycombinator.com/item?id=13347058 - Jan 2017 (82 comments)

The case of the 500-mile email - https://news.ycombinator.com/item?id=10305377 - Sept 2015 (1 comment)

The 500-mile email (2002) - https://news.ycombinator.com/item?id=9338708 - April 2015 (139 comments)

The case of the 500-mile email - https://news.ycombinator.com/item?id=1293652 - April 2010 (24 comments)

The case of the 500-mile email - https://news.ycombinator.com/item?id=385068 - Dec 2008 (28 comments)

The case of the 500-mile email - https://news.ycombinator.com/item?id=123489 - Feb 2008 (7 comments)

zenexer | 4 years ago
I know this gets posted quite often, but I still enjoy reading it every time.
abeppu | 4 years ago
I would enjoy seeing a "greatest hits" list of pages that are repeatedly submitted and discussed here.
trollied | 4 years ago
I think I read about this for the first time on a dialup BBS in the 90s :)
thot_experiment | 4 years ago
A classic and wonderful piece of internet lore. If I ever have kids, this is one of the ones I'll be telling around the campfire. The one about the internet going down because a delivery truck blocked the line of sight (LoS) is a good one too.
abalaji | 4 years ago
No matter how many times this gets posted, I make sure to read it. Such a good story, especially the ending with its seldom-used Unix tools.
post-it | 4 years ago
It's a lot like the SR-71 speed check story for me.
jancsika | 4 years ago
Could you build this into a protocol?

Like an ssh setting that only allows incoming connections that can prove (well, suggest) their proximity by a series of latency tests?
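Physics gives a one-sided guarantee here: a distant client can add delay but cannot remove it, so the fastest observed round trip puts an upper bound on distance. The sketch below assumes signals travel no faster than light in a vacuum (real fiber is slower, roughly two-thirds of that, so the bound is generous). The function names and radius values are hypothetical, and a relay placed near the server would defeat the check, which is why it only "suggests" proximity.

```python
from collections.abc import Sequence

SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def max_distance_km(rtt_seconds: float) -> float:
    """Farthest the peer can be, given one round trip at light speed."""
    return SPEED_OF_LIGHT_KM_PER_S * rtt_seconds / 2


def plausibly_within(rtts_seconds: Sequence[float], radius_km: float) -> bool:
    """Accept only if the fastest measured round trip fits inside radius_km."""
    if not rtts_seconds:
        return False
    return max_distance_km(min(rtts_seconds)) <= radius_km


if __name__ == "__main__":
    samples = [0.012, 0.0065, 0.009]  # made-up round-trip times in seconds
    print(round(max_distance_km(min(samples))))  # 974
    print(plausibly_within(samples, 1000))        # True
    print(plausibly_within(samples, 500))         # False
```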

rtkwe | 4 years ago
You could, but it's much easier to get servers in a nearby area with AWS and other easy virtual hosting providers.
netflixandkill | 4 years ago
There's no reason you couldn't, but distance is not the only source of latency, so you're unlikely to find an existing case of someone doing that intentionally.

Easy enough to whitelist geo-ip matches or large net block ranges for a similar result.
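A net-block allowlist is only a few lines with Python's standard `ipaddress` module. The ranges below are the RFC 5737 documentation blocks, standing in for whatever networks you would actually trust; a GeoIP lookup would need a third-party database and is left out of this sketch.

```python
import ipaddress

# Hypothetical allowlist; replace with the ranges you actually trust.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(address: str) -> bool:
    """True if the address falls inside any allowed network."""
    ip = ipaddress.ip_address(address)
    # A membership test across IPv4/IPv6 simply returns False.
    return any(ip in network for network in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for addr in ("192.0.2.17", "203.0.113.5", "2001:db8::1"):
        print(addr, is_allowed(addr))
```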

betaby | 4 years ago
You can set a TTL limit in the kernel. Close enough to a latency limit.
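A TTL check acts as a hop-count limit rather than a true latency limit. The sketch below estimates hops from a received TTL by assuming the sender started from one of the common defaults (64, 128 or 255, and 32 on some older stacks); that is a heuristic, and a sender that picks a non-default starting value can look closer than it is. The related Generalized TTL Security Mechanism (RFC 5082) avoids the guesswork by having the sender always start at 255. The function names here are hypothetical.

```python
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def estimated_hops(observed_ttl: int) -> int:
    """Guess hops travelled, assuming the sender used a common default TTL."""
    if not 0 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 0 and 255")
    # Pick the smallest common default that could have produced this value.
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial - observed_ttl
    raise AssertionError("unreachable: 255 covers every valid TTL")


def accept(observed_ttl: int, max_hops: int) -> bool:
    """Accept a packet only if it appears to be within max_hops."""
    return estimated_hops(observed_ttl) <= max_hops


if __name__ == "__main__":
    for ttl in (62, 119, 50):
        print(ttl, estimated_hops(ttl), accept(ttl, max_hops=3))
```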
dataflow | 4 years ago
I think DRM would be a much better use case than security (SSH).
eb0la | 4 years ago
It happened to me, too. Back in 1995 I was in charge of the SPARC server that handled email. I got a call telling me that we couldn't send mail outside Spain. Back then we had a slow internet connection (128K, if I remember correctly), and sometimes the academic network had trouble talking to the outside world. Two days later we had more complaints, and this time it couldn't be a network issue. We had the same problem as in the story: an OS upgrade had made sendmail use a default config instead of ours. Fortunately, mail didn't bounce, and after the fix the server ran at a load average above 20 for two days.

Good news was no spam came that week.

tempestn | 4 years ago
Less of a crazy bug than a funny one: I had a friend named Peter March. When his paycheck fell on April 1st and was made out to Peter April, he obviously thought it was an April Fools' joke. It wasn't.
hoppla | 4 years ago
I worked for a company where a proxy server was used for all internet access. Every now and then a page would not load. Error logs pointed me to the usual culprit: DNS. Looking at DNS traffic in tcpdump, everything seemed normal, except that some DNS replies came from RFC 1918 addresses instead of the DNS server's public IP address. When I talked to the ISP, they blamed me (the proxy) for reusing UDP sockets, and said it was by design that their load balancer only supported one DNS request at a time per UDP socket. If two or more were in flight, only the first response would be NATed properly. Luckily, I knew the ISP used the same proxy product internally, so when I asked how they had configured their proxy to avoid this issue, they fixed the load balancer within the hour.
ronzensci | 4 years ago
We have a banking website that refuses to log in when I connect on the 5 GHz ("5G") WiFi band but lets me log in when I connect on the regular (non-5 GHz) WiFi. How does the website know which WiFi band I'm connecting on?
3np | 4 years ago
Could it be that one of the networks assign IPv4 or IPv6 and the other doesn't, and you therefore end up hitting different IPs?
hoppla | 4 years ago
Perhaps they're calling JavaScript functions that haven't loaded yet… I would check the developer console for errors.
beebeepka | 4 years ago
Back in the day, I had a Nokia E71 smartphone that I used to keep next to my work-provided laptop.

My laptop would freeze for a couple of seconds right before each incoming call. Every single time.

It wasn't all that baffling to me, so I decided to test it by placing the phone on top of my huge desktop tower. My overclocked computer simply restarted itself instead of freezing. Props to Lenovo, I guess.

zoomablemind | 4 years ago
Has anyone else run into an old Visual Studio bug (VC6, I think) where the compiler would flag the last line of a source file as an error if it didn't end with a CR/LF newline? The code looked perfectly correct in the editor, yet it wouldn't build.
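Older C and C++ standards required a non-empty source file to end with a newline, so compilers of that era had some excuse for being picky. A pre-build step along the lines of the sketch below would sidestep the problem; the function name is hypothetical, and it assumes CR/LF line endings to match the Windows toolchain.

```python
def ensure_trailing_newline(source: str, newline: str = "\r\n") -> str:
    """Append a line ending if a non-empty file doesn't already end with one."""
    if not source or source.endswith(("\n", "\r")):
        return source
    return source + newline


if __name__ == "__main__":
    print(repr(ensure_trailing_newline("int main() { return 0; }")))
    print(repr(ensure_trailing_newline("int main() { return 0; }\r\n")))
```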
tardismechanic | 4 years ago
In all the best ways possible, this reads like an Asimov short story.

Sigh, I miss him so much...

IncRnd | 4 years ago
While an extremely interesting read, I've read this about once a month on HN. The story keeps getting posted.