Count to ten when a plane goes down

1112 points| UltraMagnus | 11 years ago |johncbeck.tumblr.com

273 comments


noisy_boy|11 years ago

Firstly, firing the intern doesn't make sense - it was a mistake waiting to happen and he just happened to do it at the wrong time.

Secondly, the punishment meted out should be:

1. Proportional to the degree of carelessness (in this case not that much since he accidentally hit a wrong key adjacent to the right one, didn't mow down anybody while driving drunk)

2. Inversely proportional to the likelihood of the error (in this case the likelihood was very high since the reset key was a. uncovered/single-press b. right next to single reset key).

3. Proportional to intention (this was a completely unintentional error)

If you say, that the punishment should also be dependent on the degree of damage, I would say that the responsibility of managing the risk of such damage wasn't his but of the person responsible for implementing such a high risk design. If such a person is not around, find the person who approved such a design. Government departments are usually very good with paper trail.

jaredandrews|11 years ago

The author has actually responded to this on his blog: http://johncbeck.tumblr.com/post/92502108047/so-what-did-you....

    So what did you do after you got fired from the embassy?

    What I didn’t say is that it was the last day of my summer internship. The next summer they invited me back again. Everyone understood it was a mistake, but by officially firing me, someone had been punished … :)
So I guess it wasn't a big deal after all.

zeidrich|11 years ago

Punishment is not a good way to correct behavior.

Your number 2 is particularly wrong. For punishment to work in affecting behavior, you can't punish for a very unlikely event, especially accidental.

Punishment changes behavior by making people anxious and afraid of the punishment. If you punish something that's very unlikely, it does no good. It's like if pushing ctrl-F restarted the stations, and the guy has never pushed ctrl-F before, and never been punished for it. It's very unlikely that he would ever push ctrl-F. But he happens to trip getting up and accidentally hit ctrl-F while he's catching his balance. Does that warrant a heavier punishment? It's more unlikely, certainly, but what would the punishment change about his behavior?

Punishment works because you are afraid of it. It works because you want to avoid it. But accidents don't happen because of defiance or a rational decision making process.

If you were punished moderately and frequently for a common mistake like hitting F7, it could correct behavior because you would be more vigilant when hitting F6. Having it be proportional to the degree of carelessness in terms of correcting behavior is not important. If someone is more careless, they will get more frequent punishment. If the punishment is too strong, it will just make people fearful instead of correcting the behavior.

Firing someone who consistently makes mistakes is a corrective action, not punitive.

Punishment is generally more of a cultural thing and less of a means of correcting an issue. Punishment is expected, so it's delivered. In western culture we have a particular need to find someone responsible and punish them. Rarely though do you feel "I don't want to get punished, so I am going to do this right." but it's not uncommon to think "I don't want to get punished so I'll avoid this altogether."

Corrective behavior is better when it's not punitive. Look at the design of the software, correct that problem. Look at the systems that allowed this to happen, correct them. Work with the staff and find out why this could happen, help them correct it. If people are punished for writing the software poorly, they're just going to cover up the flaws that they find instead of bringing them to light to correct them. If staff are punished for making mistakes, they're going to hide them instead of seeing if they can fix them.

Punishment is often just a game to abdicate responsibility. "Oh, it wasn't my fault. It was his fault. The proof that it is his fault is that he got punished for it. I've done my part to solve this problem."

Especially in complex environments like corporations and government, I think that the last thing you should do is look for a person to blame. Instead of looking for the person responsible for implementing the design, or the person who approved it. Look at why it was implemented, how it was approved. Instead of pinning it on an individual, pin it on a system.

I think you should only look at an individual if they are committing malfeasance for the purpose of benefiting themselves outside of the system. If the person approved the design because they weren't aware of the potential risk, then find out why. If they approved it because there was supposed to be another safeguard to stop it from accidentally happening, find out why that wasn't there. If they approved it because they gave the contract to their friend who wasn't the best decision, and overlooked issues for a cut, then go ahead and blame them.

If there's a problem with the person, say the designer was just irreconcilably bad, then remove him. If it's a problem with training, then train him. If it was something he did as a greenhorn in the past, and now he's much better, then for God's sake don't punish him for a mistake he made years ago when he was put into a project that was more important than the skills he was hired with, unless he grossly lied about his skills.

nathanb|11 years ago

I think you're wrong to ignore the consequences of the actions as an input to the punishment. A small amount of unintentional carelessness that causes huge damage could still be punished. One could argue that a certain degree of mindfulness in critical situations is a job requirement, and casually making a careless -- though unintended -- mistake demonstrates a lack of mindfulness indicating that the person is not properly qualified for the job.

I understand that we don't want a culture that fires people for making the sort of mistake anyone might make. But to be so careless on a day that is clearly an exception where something more important than standard business procedure is going on, can't you at least see why firing the intern for such a lack of mindfulness might at least make sense, even if you disagree with it?

I interned for a government organization that maintains hydroelectric dams and the software that controls them throughout the Southeastern US. A careless mistake could -- in the worst case -- cause blackouts, cost the company millions of dollars, or even cost lives (if the data-control feedback loop caused a turbine to spin up at the wrong time or to fail to shut off in an emergency). And, as is quite common in organizations with non-software-engineers running the show, the development processes were entirely haphazard. The environment was such that it would be really easy for me to push unreviewed code, or to make a stupid deployment mistake, or to be careless in a number of ways that the system didn't protect me against.

But it was OK, because they hired smart, competent people who understand the need to triple-check, if necessary, before committing. People who understood the gravity of the situation, and who didn't phone it in if they weren't feeling it that day. If I demonstrated that I wasn't one of those people, I would fully expect to be fired.

acdha|11 years ago

He's used a longer version of that story to view this as a management lesson – it's definitely not just “blame the intern”:

http://globis.jp/774-2

mikegreco|11 years ago

The author states they felt it was appropriate when they were fired. In what world would it be appropriate to get fired for a single, simple, incredibly easy to make mistake? Doubly insane when there were exactly zero safeguards in place to prevent the mistake from being made.

dimitar|11 years ago

According to his next post:

What I didn’t say is that it was the last day of my summer internship. The next summer they invited me back again. Everyone understood it was a mistake, but by officially firing me, someone had been punished … :)

bonaldi|11 years ago

The world where you know that a) you have an incredibly powerful key with no safeguards at your fingertips and b) you might be in a breaking news situation and nonetheless you go for the key right next to the dangerous one carelessly enough that you miss?

Think about it: Unix is equally as "insane". If you're the guy on the console who meant to clean out some crap dir and accidentally typoed "rm -rf /" and then caused an international crisis you're going to get fired too.

Then years later HN will call for Dennis Ritchie to get fired instead.

nerfhammer|11 years ago

He publicly embarrassed USG, POTUS and a major US ally. SK is going to call up the state dept and demand an explanation. Someone has to be fired. This isn't some startup in California where everyone just plays it cool. The termination of his boss and his boss's boss and his boss's boss's boss all the way up were probably considered as well.

pfisch|11 years ago

He was probably fired because they realized they couldn't put a summer intern in control of such a critical system. I would make the same call.

dm2|11 years ago

I do agree with you.

An equally sufficient solution would have been to install a safety switch on any button with that much importance. Something like this but probably smaller, or just a plastic cover that fit over the F7 key: http://www.thinkgeek.com/product/15a5/

A fireable offense would be lying about the action or trying to cover it up.

Should the lady who asked for the reset be fired for playing a game and asking him to reset the computer?

Should the technician that didn't install some sort of safety be fired for not foreseeing this issue?

He would be much less likely to make the same mistake in the future than the person who would replace him.

If there are terminals that could erase a presidential report and there is no backup available, you send a non-critical staff member to guard every one of those terminals, or at least put a sticky-note in the middle of the monitor.

I'd say several other people deserved to be fired for this, but the intern was not one of them.

mseebach|11 years ago

Appropriate or not, it's probably what was to be expected in what I imagine even in 1981 was not the most enlightened HR management regime (the US foreign service). Also, 32 years is a lot of time to wash away the bitterness of having been unfairly fired from a summer job, especially if you, as the author, ended up doing pretty well for yourself.

Also, whether or not it was appropriate is completely irrelevant to the story being told.

ufmace|11 years ago

On the other hand, they're hiring people to do these things, not computer programs. Shouldn't we expect them to notice that when doing routine-thing-x, it's awfully easy to accidentally do catastrophically-dangerous-thing-y, and thus it would be a very good idea to be extremely slow and deliberate when doing routine-thing-x?

I have to routinely create and drop databases on my local system. Our production databases, which I also have to connect to, contain hundreds, maybe thousands, of person-months of work. I realized that it would be a good idea, before issuing DROP DATABASE commands, to deliberately stop and double-check what server I'm connected to. Luckily, I haven't screwed that one up yet.
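The "stop and double-check what server I'm connected to" habit can also be turned into a small guard. This is only a rough sketch under assumptions: the host name `prod-db.example.com` and the function `confirm_drop` are hypothetical, not part of any real database tool.

```python
# Hypothetical guard; the host names and function name are illustrative only.
PRODUCTION_HOSTS = {"prod-db.example.com"}  # assumed production host


def confirm_drop(host: str, database: str, typed: str = "") -> bool:
    """Return True only if dropping `database` on `host` should proceed.

    Non-production hosts pass straight through. For a production host the
    operator must retype "<host>/<database>" exactly, so a reflexive "y"
    is never enough.
    """
    if host not in PRODUCTION_HOSTS:
        return True
    return typed == f"{host}/{database}"
```

A caller would check `confirm_drop(...)` before issuing the actual DROP DATABASE statement. Retyping the full target is slower than a yes/no prompt, which is the point.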

logfromblammo|11 years ago

I sure am glad that I never accidentally pushed an unlabeled and unprotected "get fired immediately" button. If it is important to not have all the workstations on site shut down at once, go to the control terminal and disconnect the keyboard before the system startup employee comes in without any clue as to what is going on and starts his ordinary daily routine. Maybe write a note and wedge it into his keys?

Based solely on the shortened account, it was not appropriate at all to fire him. Convincing him that it was is just doubly inappropriate. There may be more to the story, but as it is, it looks like angry scapegoating against a hapless, lowest-level employee.

rrss1122|11 years ago

When so many people higher up are given incomplete information or even downright embarrassed on the world stage because of one simple mistake, I feel it is appropriate. It is still, however, doubly insane that one person's stray keystroke can do all of that.

sergiotapia|11 years ago

Insane, but somebody had to pay for that screw up and you can bet your ass it wasn't going to be the guy managing the newbie 23 year old. It'll be the newbie 23 year old himself.

ProAm|11 years ago

attention to detail is a more valuable skill than people realize

chengiz|11 years ago

IMO clearly he is not telling us the entire story.

njharman|11 years ago

In the world of bureaucratic need for scapegoats.

endtime|11 years ago

> In what world

Japan, I guess? I've never been there but the story was consistent with my impression of their work culture.

mathattack|11 years ago

Lots of fields are like that. You generally get paid better because of the risk. (I hope he was!)

ck2|11 years ago

How about when Russia returned the data recorders to South Korea after years of refusing, made a press spectacle of it, and then, once the press was gone, South Korea discovered the recorders were empty and missing the data tapes?

Or the US navy crew who received medals after shooting down the Iranian airline.

Once there is loss of life, it is 100% politics afterwards with little to no practicality, just look at all the mass shootings where there were zero changes afterwards. We simply do not value life, it is politics first.

nness|11 years ago

> Or the US navy crew who received medals after shooting down the Iranian airline.

You make it seem like they received the medal for having shot down the plane. In reality, those who were awarded medals were awarded Tour of Duty medals for their time spent in a combat zone. I believe the distinction is important, particularly since that class of medals is routinely awarded to individuals during their time in the military.

jackschultz|11 years ago

The issue for most news stories is that even when the truth comes out, the great majority of people will never hear the actual facts. One issue is that news stations move on from caring about the story quickly. Or, the bigger issue in my opinion, is that people won't believe the new, correct facts since the old ones will have been ingrained in their heads. Solving both these issues would be really helpful for society, but they are obviously damn hard to solve since we haven't really gotten anywhere in this space.

Aardwolf|11 years ago

When there is such a chaotic news story, I usually switch from news to Wikipedia. That has all the facts and continues the story even after all the media have lost interest.

justizin|11 years ago

It's a sign of poor management that someone has to be fired when something goes wrong, outages are learning situations for all involved, and it is widely held that the person who took the action that caused an outage is not responsible, but that all involved are responsible.

See John Allspaw's Swiss Cheese Theory : http://www.kitchensoap.com/2012/02/10/each-necessary-but-onl... .

[ Edit: I guess it's not Allspaw's model, but he applies it to systems engineering rather well - http://en.wikipedia.org/wiki/Swiss_cheese_model ]

"Accidents emerge from a confluence of conditions and occurrences that are usually associated with the pursuit of success, but in this combination—each necessary but only jointly sufficient—able to trigger failure instead."

The person who pushed the button is not at fault, the manager is not at fault, the guy who designed the button is not at fault - all are jointly responsible.

Blaming the intern does, however, reflect extremely poorly on Itoh and everyone else in the chain of command. A superior who demands retribution for a simple mistake that happened to cause him or her pain is basically worthless.

But, I forget, we're talking about Ronald Reagan.

tokenadult|11 years ago

I remember this time sequence very well because I was living in Taiwan when the incident happened. Yes, people who lived in east Asian time zones saw news reports that appeared to be based on knowledgeable sources that the plane might have landed safely with all passengers alive. This explanation of why the Western-aligned diplomats and military officials based in east Asia didn't have complete information when they were interviewed by the press is quite interesting, and explains puzzling memories I have from that day.

jere|11 years ago

>And let’s hope that there is no stupid 23-year-old with his finger on an important keyboard in this information chain.

No. This is something you would read in Design of Everyday Things where Don Norman would totally shame the engineers who made that system. Software shouldn't be designed with the assumption that no one makes errors.

ak39|11 years ago

What I find incredible to believe is that this problem could have happened without the F7 erroneous keystroke by a human. A simple power outage could have resulted in this exact same catastrophe.

Why didn't the backups work? System wasn't "robust" enough. (Did I just use the word "robust"?)

tootie|11 years ago

Alan Cooper's About Face is a great book on interaction design for computers and one of his axioms is "Hide the escape lever". Basically make sure the ejector seat control isn't right next to the throttle.

kosei|11 years ago

Really brave of the author to share this story. I know most people would be afraid to admit this kind of a public "mistake".

abcd_f|11 years ago

> That Korean announcement and the slow response by the US President — both caused by delayed real information — caused decades of conspiracy theories.

I appreciate that the OP was a part of the situation, but conspiracy theories were not caused by this.

It was time of very high tension between the US and Soviet Union. So when a plane veers off the course into not just Soviet airspace, but into an explicitly cordoned off top secret area, ignores all communication attempts, ignores the presence of fighter jets and just keeps on flying, then the situation itself is a fertile soil for conspiracy theories.

brudgers|11 years ago

'It was a time of very high tension' doesn't quite capture how different it was.

Through the glass of a yellow newspaper box, the Miami News headline that the Soviets had shot down a plane carrying a Congressman. My first thought was "This is the war." Not 'a' but 'the'. The primary stance of the US military was squared off against the USSR and had been for more than 30 years.

steven2012|11 years ago

Wow. I got goosebumps when I read that article. I'm old enough to actually remember when KAL 007 was shot down, and while I wasn't old enough to hear about the conspiracy theories, I do remember the thing about people being safe and landing in Russia. To think that this was just a small mistake on the part of someone, which caused international ripple effect, and who later blogged about it is really something incredible.

userbinator|11 years ago

"With great power comes great responsibility."

Incidentally, "features" like this are why I don't trust systems that have some centralised control - IMHO giving any one individual (or organisation, in many cases these days) such power over others is not a good thing.

disputin|11 years ago

Scapegoat. The ritual expulsion of the evil spirits wrapped neatly in a little parcel to appease the elders and thereby prevent them blaming each other - harmony continues in the hall of power. Meanwhile the problem was in the process, not the employee, so nothing has been fixed, and the guy who had learned the lesson is no longer there, and so the problem will recur with the next lamb to the slaughter.

ryanobjc|11 years ago

I disagree that it was appropriate that you were fired, but interesting story all around.

kysol|11 years ago

For security reasons I think that it might have been justified, considering the events that had just taken place. Still a crappy way to go out though.

sitinaud|11 years ago

The fact that he wasn't too concerned about having accidentally reset all the computers in the building suggests that he may not have had an appropriate temperament/attitude for a sysadmin managing critical systems.

lotsofmangos|11 years ago

I often suspect that most of the work involved in keeping a power hierarchy going, is involved with trying to pretend that this kind of shit doesn't happen all the time.

lamontcg|11 years ago

And then conspiracy theorists latch onto this kind of shit, but believe that it must be malicious silliness...

Why didn't Reagan respond immediately? Well, he was waiting to hear from Chancellor Gorkon that the KAL flight had been successfully beamed aboard and was en route to Pluto, of course... Clearly they'd have their shit together better so it couldn't have been a 23-year old rebooting all the computers accidentally and wiping out hours of critical work -- that would just be ridiculous...

privatedan|11 years ago

My first thought after reading the article was that it was ridiculous to fire/scapegoat the author for hitting the wrong key, too. This has happened to me before, where a single keystroke ( in my case, a line break in a config file ) caused me to take down a production system. My punishment? Designing a more robust system that would protect itself from a badly formatted config file. To this day, ten years later, a similar error has not been repeated, despite several attempts of people to push bad config files to our production systems. If I had been instead fired, no doubt a similar, but perhaps not exact, error would have been repeated every year or so.
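For anyone wondering what "protect itself from a badly formatted config file" might look like in its simplest form, here is a minimal sketch. It assumes a plain `key = value` file format and a hypothetical `validate_config` function; the actual system described above is not shown anywhere in this thread and was surely more involved.

```python
def validate_config(text: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks deployable.

    Assumed format (illustrative): one 'key = value' pair per line,
    with blank lines and '#' comments allowed.
    """
    problems = []
    seen = set()
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition("=")
        key = key.strip()
        if not sep or not key or not value.strip():
            # A stray line break typically shows up here as a half-finished pair.
            problems.append(f"line {number}: expected 'key = value', got {stripped!r}")
        elif key in seen:
            problems.append(f"line {number}: duplicate key {key!r}")
        else:
            seen.add(key)
    return problems
```

The deploy step would refuse to push any file for which this returns a non-empty list, so a stray line break gets rejected before it reaches production rather than after.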

If I had made the same mistake twice without any attempts to fix the situation long term, then, yes, I think that would have been a fire-able offense.

If you're working with people who care primarily about their own positions and egos without regard to the team as a whole, well, be prepared to be thrown under the bus when it comes time for those people to protect themselves.

joewaltman|11 years ago

Great story....thanks for your willingness to share.

peterwwillis|11 years ago

> On this day, I highlighted her workstation and hit the F6 key to reset. But my screen went temporarily black and then seemed to be starting again. I realized that I had mistakenly hit F7 and reset all the workstations in the embassy.

Ugh.

Those with automation capabilities: keep this lesson in mind, because it will happen to you in production one day. 'dsh -a reboot' is incredibly easy to type and can have disastrous effects. Creating abstraction layers around common admin tasks can help catch simple mistakes and give prompts before dangerous behavior.
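One possible shape for such an abstraction layer, as a sketch only: single-host actions go straight through, while anything touching more than one host requires the operator to retype the number of hosts affected. The names `hosts_to_reboot` and `ConfirmationRequired` are made up for illustration and are not part of dsh or any other tool.

```python
class ConfirmationRequired(Exception):
    """Raised when a wide-scope action has not been explicitly confirmed."""


def hosts_to_reboot(targets, typed_count=None, threshold=1):
    """Return the de-duplicated, sorted hosts that may be rebooted.

    Up to `threshold` hosts need no confirmation. Beyond that, the caller
    must pass the exact host count as a string (what the operator typed),
    so the "reset one" and "reset all" paths cannot be confirmed the same way.
    """
    targets = sorted(set(targets))
    if len(targets) > threshold and typed_count != str(len(targets)):
        raise ConfirmationRequired(
            f"about to reboot {len(targets)} hosts; retype the count to proceed"
        )
    return targets
```

The wrapper only decides whether to proceed; the actual reboot command would be run on whatever list it returns. Asking for the count rather than "y" is a deliberate choice, since a prompt that looks the same for both cases tends to get answered by habit.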

jheriko|11 years ago

I hope you fought that firing...

... incompetence like that comes from having F6 next to F7 and no checks or authorisation needed for a potentially dangerous action etc. Processes should be designed for people to make the common mistakes... it's what they do.

jheriko|11 years ago

nm. just seen the follow up. :)

hyperliner|11 years ago

"My boss, a >> Japanese << computer engineer named Itoh, poked his head in the door. "

hmmm, I am pretty sure Mr. Itoh was not Japanese working in the American embassy. I am pretty sure he was American.

billmalarky|11 years ago

If he was a first gen American his culture would have been greatly shaped by Japanese culture.

joshuaheard|11 years ago

I thought the headline meant to count to ten when a plane goes down...while you are in it!

instakill|11 years ago

Me too. I still don't know how counting to 10 will help you press the right key on your keyboard though.

feld|11 years ago

Reset all computers in the embassy with F7? No warning prompt?

Fire the idiot who wrote that function.

smacktoward|11 years ago

In fairness, it was a different world back then. There were so few people administering computer networks that you could generally assume someone who was doing so had been thoroughly trained; and the thing about highly trained people is that they tend to view things like failsafes and safeties as pointless time-wasters.

"I know what I'm doing when I hit F7, but the damn system makes me sit there for 30 seconds before it does what I told it to do! Piece of junk."

The result was that software in that era tended to come with a lot more sharp edges. The age of the Recycle Bin that would save you from yourself didn't arrive until administering systems became something the general public was expected to do.

oh_sigh|11 years ago

There was probably something like a warning prompt that came up, but he may have been used to disregarding the message because it was always what he intended to do.

dade_|11 years ago

Two servers side by side. Broken KVM that only switched the monitor. The screen was Windows NT, but the keyboard in front of the monitor was connected to another server running OS/2. Ctrl-Alt-Del. The Windows screen did nothing, but the sudden hard drive activity on the OS/2 server told me what I needed to know. Not thirty seconds later, I had visitors. I was around 21 at the time, so yeah... Experience.

mseebach|11 years ago

You know what's totally plausible? That hitting F6 and F7 prompted for confirmation, but using the same prompt - and he just hit "Y", "Return" as he'd done 1000 times before (just like everybody does with the UAC dialog in Windows 7 today), and that bit didn't make it into the story because it's totally irrelevant to the point he's making. Heck, it probably wasn't even F6 or F7. If he's a normal human, he likely can't remember.

hueving|11 years ago

It is possible there was a warning for both functions and the operator just acknowledged the warning assuming it was for the single reset without reading it. This is a common problem with warning too often in interfaces.

xophe|11 years ago

You must have missed the main point of this essay, so I'll reiterate here.

"take a deep breath, count to ten"

mullingitover|11 years ago

Stole the words right out of my keyboard. Who puts the 'reset every workstation in the building' button right next to the 'reset just one workstation' button with no confirmation prompt?

raverbashing|11 years ago

I don't remember how it is on the newer versions, but try using fdisk in an old linux distro and see how many confirmations you get when deleting a partition.

Or just the old rm -rf /

Older stuff asks for far fewer confirmations.

michaelneale|11 years ago

That was one of my reactions too (yeah it was a different age). My other reaction was about the total loss due to a restart - in what could have easily happened by faults in many other systems - presumably there was no way to save data as work went on, no journaling or anything. Indeed this was a different age.

discardorama|11 years ago

It's easy to have this attitude now. But if you ask people who were in IT 30 years ago, they'll tell you that systems in those days had _many_ sharp edges. You were expected to know your way around, and the consequences of mistakes were pretty severe.

kysol|11 years ago

My thoughts exactly. With that sort of "global" function, you would expect some sort of countdown timer prompt on each terminal that could be cancelled.

JustSomeNobody|11 years ago

Think about the resource constraints of systems back then.

SonicSoul|11 years ago

agreed 100% (if there truly wasn't a prompt). OP took the fall because the head administrator was too ashamed to admit that their system was so poorly designed to allow for this to happen.