top | item 39280756

Alaska Airlines flight 1282 NTSB preliminary report [pdf]

266 points | tomalpha | 2 years ago | ntsb.gov

266 comments

gtmitchell|2 years ago

A very thorough preliminary report. I've worked in quality systems for a long time, and this is a perfect example of a systemic failure. They've got work being handed off between Boeing employees and third-party contractors, with insufficient controls in place to verify that very basic tasks are being performed.

I'd be curious to know how many non-conformances they typically see during assembly of a plane, and whether management is actually allowing the quality department sufficient independence to investigate these issues and fully resolve them. I'm guessing that the production personnel are under tremendous time constraints and are constantly pressuring the quality assurance people to sign off on whatever paperwork is holding up the line, no matter the safety implications.

Also, I think a lot of middle and upper level management needs to lose their jobs over this. I hope this mess ends up in textbooks and gets beaten into the head of every MBA student in the country.

onetimeuse92304|2 years ago

> I'd be curious to know how many non-conformances they typically see during assembly of a plane (...)

Very likely that number is meaningless. I suspect this is the kind of environment that incentivises hiding non-conformances whenever possible.

For example, better quality control usually results in an increase in the number of recorded defects, at least temporarily. But that is just because a large portion of those defects went undetected before.

So... you are looking at a number that you have nothing to compare to, that depends on how closely the process is monitored, and that also depends a lot on the definition of what counts as a non-conformance.

It is like trying to give an answer to "what is the length of Britain's coastline?" Everybody knows that you can get whatever answer you want depending on how long the ruler is.

wmidwestranger|2 years ago

> Also, I think a lot of middle and upper level management needs to lose their jobs over this.

Given that a worker who raised this problem or took the initiative to resolve it would probably be punished, I completely agree.

It reminds me so much of how, once upon a time, engineering management seemed to exist in a cooperatively adversarial relationship with business managers, but not anymore. Now any sort of engineering in business seems to be completely business-managed and business-minded. I'm sure it's great for profits while it lasts, but I haven't observed engineering getting better, and I suspect business is suffering from overextending itself too; I just don't have any solid observations.

(Well, maybe one: my brother does warehouse/logistics management and says that, despite every reason in the world to do it, he has never seen the accounting software and the inventory software successfully and productively linked. So there's a big opportunity there for a serious player, but maybe it's not profitable enough given the difficulty?)

shadowgovt|2 years ago

Rumor has it the controls are there, but subvertible.

Apparently, there are two ticketing systems: one for the "history of the plane," which is Boeing-internal, and one for "day-to-day onsite work," which is visible to contractors and Boeing management. The work to fix the rivets was logged in the day-to-day system, but management and the onsite staff convinced themselves that merely opening the plug to fix the vacuum-seal trim did not constitute "removing" the plug. Since the history-of-plane log only had an entry for removing, not opening, they didn't log it there. The intent, however, was that there's no entry for "just opening" because there's no such thing as "just opening": breaching the pressure vessel at all constitutes "removal of plug."

The final inspection that should have caught the error would have been triggered by the update in the history-of-plane ticketing queue.

(And as for "how many non-conformances," the same source claims that Spirit is one of the few subcontractors with on-site staff at the factory, because their parent company delivers such consistently shoddy, out-of-compliance product that they are continuously doing final warranty work onsite. So maybe "fire that vendor" should be on the docket too.)
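To make the rumored gap concrete, here is a minimal Python sketch of the two-log setup described above. Everything in it is hypothetical: the class names, the "open"/"remove" action strings, and the `strict` flag are invented for illustration and do not reflect Boeing's actual systems. It only shows the claimed logic: if the final inspection is triggered solely by removal entries in the system of record, an "open" that is logged only in the day-to-day system triggers nothing.

```python
from dataclasses import dataclass, field

# Hypothetical model of the rumored two-log setup. Class names, action
# strings, and the "strict" flag are invented for illustration only.
PRESSURE_BREACHING_ACTIONS = {"open", "remove"}


@dataclass
class WorkRecord:
    part: str
    action: str  # e.g. "open" or "remove"


@dataclass
class Aircraft:
    # strict=True: any breach of the pressure vessel is logged as a removal
    # (the stated intent); strict=False: only a literal "remove" is logged.
    strict: bool
    day_to_day_log: list = field(default_factory=list)  # visible to contractors
    history_log: list = field(default_factory=list)     # system of record

    def log_work(self, record: WorkRecord) -> None:
        # Every job lands in the day-to-day log.
        self.day_to_day_log.append(record)
        if self.strict:
            mirror = record.action in PRESSURE_BREACHING_ACTIONS
        else:
            mirror = record.action == "remove"
        # Only mirrored jobs reach the history-of-plane log.
        if mirror:
            self.history_log.append(record)

    def final_inspections(self) -> list:
        # In this model, final inspections are driven solely by the history log.
        return [r.part for r in self.history_log]


loose = Aircraft(strict=False)
loose.log_work(WorkRecord("left MED plug", "open"))

strict = Aircraft(strict=True)
strict.log_work(WorkRecord("left MED plug", "open"))

print(loose.final_inspections())   # []
print(strict.final_inspections())  # ['left MED plug']
```

Under the loose interpretation the work is still visible in the day-to-day log, but no inspection is queued; treating any breach as a removal closes that gap in the model.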

lp4vn|2 years ago

>I'd be curious to know [...] whether management is actually allowing the quality department sufficient independence to investigate these issues and fully resolve them

If management in the aerospace industry works like management in the software industry, then I guess they are pushing for results as aggressively as possible without much concern for safety or anything else.

mihaaly|2 years ago

I have a tendency to hope for unrealistic things too. Not a reliable trait of mine, no. But in my clearer moments, I fear that world peace and the union of all nations and religions will come much sooner than unimaginative but determined bean counters will learn from the millions of catastrophes, past and yet to come, and give up pushing their core value and first rule, "take more, give less," to infinity and beyond. And give up their personal wealth with it.

fransje26|2 years ago

According to the "insider source," there were 392 nonconforming findings on fuselage door installations in the last 365 calendar days.

> As a result, this check job that should find minimal defects has in the past 365 calendar days recorded 392 nonconforming findings on 737 mid fuselage door installations (so both actual doors for the high density configs, and plugs like the one that blew out). That is a hideously high and very alarming number, and if our quality system on 737 was healthy, it would have stopped the line and driven the issue back to supplier after the first few instances.

Source:

https://leehamnews.com/2024/01/15/unplanned-removal-installa...

michaelt|2 years ago

> I'd be curious to know how many non-conformances they typically see during assembly of a plane

Well, the report says: "During the build process, one quality notification (QN NW0002407062) was noted indicating the seal flushness was out of tolerance by 0.01 inches."

So I'd say they've had about 2,407,062 quality issues :)

hipsterstal1n|2 years ago

I was speaking with some friends at a Christmas party who work for the Navy - they’ve taken deliveries of planes from Boeing with the same sort of issues that start in the factory. They even went as far as to say the whole lot of planes should’ve been rejected but weren’t. Multiple things not built to spec.

The planes they worked on did not share an assembly line with the 737 but another Boeing model…

TurkishPoptart|2 years ago

"....after the left mid exit door (MED) plug departed the airplane leading to a rapid decompression"

Lol, they said the door plug "departed" instead of "blew the f* off"

dzdt|2 years ago

The report seems to mesh with and confirm many details of the anonymous insider account at https://leehamnews.com/2024/01/15/unplanned-removal-installa.... The bolts were not reinstalled following work on the plug rivets/seal. The official system doesn't record that work was done requiring the bolts to be removed.

lambda|2 years ago

Yep, the biggest new things here are the discussion of witness marks showing no evidence of the bolts, a photo from before the rivet repair showing that at least two of the bolts were present, and a photo from after the rivet repair, taken during installation of the insulation, showing at least three bolts missing.

So this all just serves to confirm that report, and what people had suspected for a while: the bolts were simply missing, removed when the door plug was opened for the rivet rework and never reinstalled.

okdood64|2 years ago

Can someone explain to me why this door plug wasn't actually a plug that cabin pressure physically keeps sealed? That seems like a sensible failsafe.

stefan_|2 years ago

[deleted]

mcmatterson|2 years ago

The fact that a critical piece of the evidence was cell phone photos sent between workers coordinating door re-assembly doesn't exactly instill a whole lot of confidence in their permit-to-work process. I didn't like it when it was medical teams doing shift handover via a Google Doc, and I don't like it when it's a matter of flight safety either. Or, as Homer might eruditely say: "guess I forgot to put the bolts back in" [1]

[1] (https://www.youtube.com/watch?v=IiNPLIauEig)

ipython|2 years ago

This is a puzzling attitude to me. Every time we technologists see a crappy proprietary solution being used for a problem, the first exclamation is, "why not use <commodity solution X>? That's so dumb, they spent $10k on that tool when they could have spent $100 on X!"

There must be a middle ground here. The paradox is that Google, Apple, etc. have the ability to produce user-friendly software and hardware at scale, but they aren't considered "battle-proven." The expensive proprietary systems that are used instead tend to be hard to use and brittle, so what's the middle ground?

gowings97|2 years ago

The data/photos should be in the ERP/MES.

imoverclocked|2 years ago

> The investigation continues to determine what manufacturing documents were used to authorize the opening and closing of the left MED plug during the rivet rework.

I mean, there is already a ton of documentation and process surrounding the construction of an airplane. Adding more process doesn't safety make. Having a safety culture without the fear of retaliation, on the other hand, makes a world of difference.

nostromo|2 years ago

> Overall, the observed damage patterns and absence of contact damage or deformation around holes associated with the vertical movement arrestor bolts and upper guide track bolts in the upper guide fittings, hinge fittings, and recovered aft lower hinge guide fitting indicate that the four bolts that prevent upward movement of the MED plug were missing before the MED plug moved upward off the stop pads.

Ooofff. No bolts at all! How did this pass Boeing QA?

__loam|2 years ago

MBAs prioritizing the bottom line over engineering culture.

luxuryballs|2 years ago

Also I’m kinda surprised an airline wouldn’t inspect a newly acquired plane before putting customers in it.

userbinator|2 years ago

A lot of comments here are going on about process, as if humans are mindless and otherwise perfectly controllable robots...

I'm going to be contrarian and say that this is exactly the sort of thing that happens when you train humans to be robots: They lose all signs of common sense and critical thinking, and what's worse is that on top of that, they'll still have their inherent imperfection. Normally the former would counteract the latter, but not if you only make them rigidly follow some process all the time. They stop thinking about what they're doing. They stop paying attention to all the other things in their environment they would've noticed, and even if they do, they won't question it because they'll just assume someone else also following a rigid process will take care of it. They won't think "this door plug should've been bolted in place now that the work that needed it opened is done, but where are the bolts?"

I'm not saying to throw out all the process and make them figure everything out, but I think there has to be a balance, similar to how overautomation and overreliance on it have also led to avoidable incidents in aviation.

p_l|2 years ago

Nope.

The process was there so that people would know work was being done on the doors even if they weren't there for it. If you see unfinished work from a previous shift, it does not mean you can start messing with it; there might be context you do not know.

Which is why such things are supposed to be noted in the appropriate ways. It's also why aviation has so many procedures everywhere: we know and understand that sometimes you miss things, for any human reason, not just mismanagement. The process provides a reliable place to double-check against.

This is different from over-reliance on automation, which is arguably less an issue of automation itself (it's just more visible in such areas) than of falling out of practice because you don't encounter certain things very often. 96 people died because, in a stream of many deviations, the crew, among other things, had never trained to do an IFR landing without ILS, with or without the autopilot.

The process is the part that says "yeah, I haven't done this in a long time, I need to train, and here is documentation that says we need to do it and can't delay."

Similarly, CMES is supposed to track "work was done on this part of the ticket, now different work needs to be done, do not assume it will be done by other teams."

projektfu|2 years ago

Potentially the result when you rob workers of the right to pride in workmanship. The most common complaint from old Boeing people who have left is that after the merger the McDonnell Douglas people took over, and the company switched from pride in engineering and quality of workmanship to cost cutting and bean counting. Also, shortly after the merger the corporate HQ moved, reflecting the priorities of the CEO. It has since moved again, apparently to be better positioned for lobbying.

dreamcompiler|2 years ago

Summary: Fuselage was delivered to Boeing with some damaged rivets near the door plug. They had to remove the door plug to fix the rivets. Then they reattached the door plug but forgot to reattach the 4 bolts that would keep it in place. Possibly because of a shift change at the plant.

There was noticeable damage to the door plug's mechanical fittings from the violence of it being blown out of the plane. But the holes where the bolts belonged were pristine. That would not have been true if the holes had had bolts in them.

aftbit|2 years ago

> The accident airplane was required to be equipped with a CVR that retained, at minimum, the last 2 hours of audio information, including flight crew communications and other sounds inside the cockpit.

>The CVR was downloaded successfully; however, it was determined that the audio from the accident flight had been overwritten. The CVR circuit breaker had not been manually deactivated after the airplane landed following the accident in time to preserve the accident flight recording.

Classic. If they used CD-quality audio at 1,411 kbps, they could store 2 hours of audio in about 1.2 GB. Given how cheap flash is these days, why not 20x that, so that we don't have to rely on people pulling circuit breakers after accidents? If there's some concern about robustness and recertification, why not require all aircraft to carry two CVRs: one of the old "robust" style for kinetic accidents, and one that's less robust but has 20x the capacity, so we can record a full day after less violent accidents?
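The storage figure above checks out. As in the comment, this assumes uncompressed 16-bit stereo at 44.1 kHz; the actual audio format a CVR uses is not specified here. The 25-hour case matches the European figure mentioned elsewhere in the thread, and 40 hours is the "20x" of the 2-hour minimum.

```python
# Assumption (from the comment): uncompressed 16-bit stereo at 44.1 kHz.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2
BITRATE_BPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS  # 1,411,200 bit/s


def storage_bytes(hours: float, bitrate_bps: int = BITRATE_BPS) -> float:
    """Bytes needed to record `hours` of audio at a constant bitrate."""
    return bitrate_bps * hours * 3600 / 8


# 2 h minimum, 25 h recorder, and 20x the 2 h minimum
for hours in (2, 25, 40):
    print(f"{hours} h: {storage_bytes(hours) / 1e9:.2f} GB")
# 2 h: 1.27 GB
# 25 h: 15.88 GB
# 40 h: 25.40 GB
```

The 1.27 GB result is in decimal gigabytes; in binary units it is about 1.18 GiB, so "about 1.2 GB" holds either way.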

cjbprime|2 years ago

The largest US pilots union opposes it on pilot privacy grounds. (To be clear, I think having an expectation of vocal privacy while you are in charge of an airliner is absurd.)

dmitrygr|2 years ago

> Given how cheap flash is these days

How cheap is flash that will survive a sudden stop from 400 mph to 0 mph in no seconds flat, a post-crash fire, and/or years of submersion in salt water?

Flash data retention at high temps is TERRIBLE (and gets worse for MLC/TLC/etc), see any flash datasheet. It is NOT nearly as simple a problem as you might think.

Yes, it is a solvable problem, but please do not dismiss it outright as "trivial."

hinkley|2 years ago

The piece of hardware that was chosen for the avionics-adjacent software I was working on was chosen before any software was written, which was 3 years before the plane was 'supposed' to fly, and 5 years before anyone sane expected it to be in service.

Irritatingly, they didn't even pick the top-of-the-line machine from the vendor at that time. They picked a middling one, and then put an LTS OS version on it that didn't fully support the motherboard chipset. I spent way, way too much time and energy trying to get the software to run on the necessary timescales. It took me months to get anyone to let me talk to the vendor in order to sort out the fact that the storage was being run in legacy PATA mode, reducing our IO throughput by an order of magnitude and the application throughput by about a third.

Ten minutes on the phone and I got them to agree to give us a patch that aliased the chipset to one it was backward-compatible with, which was actually supported by the OS. But they really wanted us to take the newer version of the OS that didn't have this problem.

That's not even the most hardware-crippled I've ever been, but it was in the top three.

bronco21016|2 years ago

What’s missing from this accident investigation without the recording?

WalterBright|2 years ago

They should also have a video recorder on a 2 hour loop. Many difficult investigations would have been easy if the investigators could see what the instruments were showing and what the crew was doing. And even, who exactly was in the pilot's seat!

guardiangod|2 years ago

>[evidence] indicate[s] that the four bolts that prevent upward movement of the MED plug were missing before the MED plug moved upward off the stop pads.

Ok

>Photos from the interior repair that show the lack of bolts

Huh. Well that's conclusive.

ezfe|2 years ago

The photos may have been taken before the repair was concluded, so technically they are not conclusive. However, it certainly lines up.

WatchDog|2 years ago

> The flight crew reported that the cockpit door had opened during the depressurization event. In a revision to the Flight Crew Operations Manual, issued on January 15, 2024, Boeing confirmed that the door functioned as designed.

Interesting for terrorists. Cause a rapid decompression, and get easy access to the cockpit.

bradfa|2 years ago

Causing rapid decompression is quite hard. Opening a normal door is very difficult during flight except at very low altitudes.

mlindner|2 years ago

If they cause a rapid decompression they incapacitate themselves and won't be able to use the cockpit.

Also how do you cause a rapid decompression without a gun of some kind?

mihaaly|2 years ago

They might as well blow the whole thing up instead of settling for such half measures. A sufficiently rapid decompression is probably not reliable or predictable enough to be the kind of feat after which you can easily take control of the plane.

ledauphin|2 years ago

yeah, this is a significant hole in the locked cockpit door plan.

hn8305823|2 years ago

I wonder how close the door plug was to hitting the tailplane or vertical stabilizer/rudder?

reddit_clone|2 years ago

Considering all those scary scenarios, what happened was probably the most favourable outcome. It could have been a major disaster in a hundred different ways.

aftbit|2 years ago

Depressurization happened around 17:12:33 PST but the aircraft continued to climb until 17:13:41 PST, and the autopilot was configured for 10k ft at 17:13:56 PST. Why did it take the pilots a full minute to begin an emergency descent after the failure? I would expect that the nature of the accident would be clear nearly immediately, at least in the need to descend the aircraft.
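The gaps between the quoted timestamps can be computed directly. This short sketch uses only the three times from the comment; the date is irrelevant because all three fall within the same few minutes.

```python
from datetime import datetime

# Timestamps quoted in the comment (PST); strptime fills in a dummy date.
FMT = "%H:%M:%S"
depressurization = datetime.strptime("17:12:33", FMT)
climb_ended = datetime.strptime("17:13:41", FMT)
autopilot_set_10k = datetime.strptime("17:13:56", FMT)

print((climb_ended - depressurization).total_seconds())        # 68.0
print((autopilot_set_10k - depressurization).total_seconds())  # 83.0
```

So the climb continued for about 68 seconds after the depressurization, and the 10,000 ft autopilot target was set about 83 seconds after it.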

michaelt|2 years ago

According to the plane's "memory items" [1] in response to a cabin altitude warning or rapid depressurization, pilots must:

OXYGEN MASKS - DON

OXYGEN REGULATORS - Set to 100%

CREW COMMUNICATIONS - ESTABLISH

PRESSURIZATION MODE SELECTOR - MAN AC/MAN

OUTFLOW VALVE SWITCH - CLOSE

Hold in CLOSE until outflow Valve indicates fully closed

If Pressurization is Not Controllable

PASSENGER SIGNS - ON

PASSENGER OXYGEN SWITCH - ON

EMERGENCY DESCENT - ANNOUNCE

The pilot flying will advise the cabin crew, on PA system, of impending rapid descent. The pilot monitoring will advise ATC and obtain area altimeter setting.

PASSENGERS SIGN - ON

DESCENT - INITIATE

I do giggle a little at the thought of a door flying off, the air rushing out of the cabin, and the pilots responding by switching the seatbelt light on.

The plane was only at 16,000 feet when it lost its door and according to [2] you've got 20-30 minutes of 'useful consciousness' at such an altitude, even without your oxygen mask on. So there was no need for an abrupt dive.

[1] https://www.theairlinepilots.com/forumarchive/b737/b737memor...
[2] https://skybrary.aero/articles/time-useful-consciousness

rootusrootus|2 years ago

A minute is a long time when you're sitting at your computer. But after the sudden depressurization, I imagine the pilot is focused first on making sure he has complete control of the airplane, assessing the situation, running checklists. Besides, 10K is just barely above the normal pressurization altitude anyway, it doesn't pose an immediate risk to the passengers that justifies just nosediving towards the ground. Especially given how much air traffic is at lower altitudes that close to PDX.

Edit: Re-reading, it was more like 16K feet when it popped; 10K is what ATC assigned them when they requested lower. Still low enough not to be a critical emergency. Some people absolutely will get altitude sickness at that level, but it's likely to be mild. Many people climb mountains much taller.

bombcar|2 years ago

You follow the procedure because in an emergency you don't know what is going wrong.

Better to climb for a bit more as you get your oxygen mask on than to try to descend immediately and make some problem worse.

We know it was a door plug blowing out, but in the past it has been entire major sections of the airframe ripping off, in which case sudden extra stresses are not what you want.

ogurechny|2 years ago

Let's assume that they did change course once they were sure they could do so.

Pilots can't look in a rearview mirror and see the whole plane. Accident reports on engine malfunctions routinely mention that someone had to check the engine's appearance through a passenger window and relay that to the pilots. In the case of a blast so severe that the door flies off, it is safer to assume the worst: say, that part of the plane disintegrated because of a sudden collision. In such conditions, indications of what works and what doesn't are probably messy and unreliable, and there may not be enough means to control the plane properly. Lower the nose too much, and you might not be able to pull it up any more.

Pilots probably did checklists with one eye on the instruments to check that they were not losing speed, that angles were correct, autopilot inputs resulted in stable flying, and so on, and deduced that everything still worked. By that time, they were probably informed that the plane was seemingly intact, although with a hole in its side.

bronco21016|2 years ago

Step one is to put on the oxygen mask and establish communications. After the startle factor, the masks being put on, then declaring an emergency, a minute really isn’t that long.

blantonl|2 years ago

> I would expect that the nature of the accident would be clear nearly immediately

Not really. The cockpit door was blown open, and the pilots' headsets were blown off. It was a pretty chaotic event, and when you are flying an airplane, you definitely don't want to figuratively "jerk the wheel"; you remain calm and start running checklists.

throwworhtthrow|2 years ago

You can't leave your assigned altitude/trajectory without coordinating with ATC. Otherwise you may collide with another plane, which would make a bad situation worse.

NordSteve|2 years ago

Let's imagine you're the pilot, and you're super busy with an emergency. You also know that there are mountains to the east of the airport. You also have ATC on the radio and they know about all of the meaningful obstacles in your area. Asking for lower in this situation (rather than using your emergency authority) is exactly what you want to do.

Looking at the track, they descend to 10000' until they start their downwind to base turn. Once they start that turn, they get a lower altitude (looks like 7000') until they are established on final and can fly an approach.

sigwinch28|2 years ago

More altitude means more time to work the problem.

engcoach|2 years ago

Fast hands in the cockpit are scary. Pilots take their time in emergencies because rushing will take your birthday away

bagels|2 years ago

Clearance + they were probably putting on their masks, and other tasks.

flaminHotSpeedo|2 years ago

> The accident airplane was required to be equipped with a CVR that retained, at minimum, the last 2 hours of audio information, including flight crew communications and other sounds inside the cockpit ... The CVR was downloaded successfully; however, it was determined that the audio from the accident flight had been overwritten. The CVR circuit breaker had not been manually deactivated after the airplane landed following the accident in time to preserve the accident flight recording

How the fuck is this still a problem on brand new aircraft?
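The audio is lost because a CVR records on a loop: once its capacity is used up, new audio replaces the oldest. This toy sketch models only that overwrite behavior with Python's `collections.deque(maxlen=...)`. The one-entry-per-minute granularity, the 20-minute flight, and the time until the breaker is pulled are all hypothetical, and a real recorder's storage is not modeled. The 120-minute default is the 2-hour minimum quoted from the report; 1500 minutes corresponds to the 25-hour recorders mentioned elsewhere in the thread.

```python
from collections import deque


def retained_flight_minutes(flight_minutes: int,
                            minutes_until_breaker_pulled: int,
                            capacity_minutes: int = 120) -> int:
    """Minutes of the flight still on a loop recorder when power is cut.

    Toy model: one deque entry per minute of audio. A deque created with
    maxlen drops its oldest entry whenever a new one is appended while full,
    which mimics a recorder overwriting its oldest audio.
    """
    loop = deque(maxlen=capacity_minutes)
    for _ in range(flight_minutes):
        loop.append("flight")
    # The recorder keeps running on the ground until the breaker is pulled.
    for _ in range(minutes_until_breaker_pulled):
        loop.append("after landing")
    return loop.count("flight")


# Hypothetical 20-minute flight:
print(retained_flight_minutes(20, 30))         # 20: breaker pulled in time
print(retained_flight_minutes(20, 110))        # 10: half already overwritten
print(retained_flight_minutes(20, 130))        # 0: flight fully overwritten
print(retained_flight_minutes(20, 130, 1500))  # 20: 25-hour loop
```

The design point is that nothing is ever "deleted" on purpose; the flight audio simply ages out once the aircraft keeps powering the recorder for longer than the loop's spare capacity.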

johnflan|2 years ago

You gotta wonder how the technician explained away the four leftover bolts afterward.

ultimoo|2 years ago

I look forward to reading a report from NTSB's internet outage.

newZWhoDis|2 years ago

Seems to solidly confirm the leak.

jbverschoor|2 years ago

> In a revision to the Flight Crew Operations Manual, issued on January 15, 2024, Boeing confirmed that the door functioned as designed.

Smells like CISCO

mihaaly|2 years ago

What is the analogy of leaving out all bolts from that door?

'Forgetting' to put in any of the screws holding a gas tank in place in a car?

'Missing' all welds in one of a skyscraper's lower columns?

An 'oversight' of providing redundant instruments in an airplane with a natural tendency to stall?

What kind of hopeless shitshow is going on behind the company gates that these kinds of things can happen in succession?

A duck forgot how to swim, an eagle forgot how to fly, Boeing forgot how to build airplanes?

belltaco|2 years ago

>The CVR was downloaded successfully; however, it was determined that the audio from the accident flight had been overwritten. The CVR circuit breaker had not been manually deactivated after the airplane landed following the accident in time to preserve the accident flight recording

In addition to local storage, why isn't the audio (along with location, altitude, and some sensor information) also streamed using something like Starlink or Inmarsat to a secure location where you can store more data more cheaply and with more redundancy?

fredoralive|2 years ago

The current 2-hour limit (which is now 25 hours in Europe) is a legacy of privacy concerns. If pilots are concerned that their bosses would make a habit of pulling longer CVR units to micromanage what goes on in the cockpit (or using events several hours before an incident to somehow push blame for it onto the pilot), just imagine how much they'd love the idea of it being beamed to a remote location. Yes, I'm sure some complicated, byzantine cryptographic scheme could theoretically solve it, but I'm not sure they'd trust it.

There’s also bandwidth and satellite coverage not being magic of course.

skywhopper|2 years ago

This is an old system that works well and reliably for pretty much every incident. I’m not aware of another case of this sort of thing (relevant flight recorder data being overwritten) happening in recent years anyway. If you spend time constantly upgrading systems like this you’re asking for a higher failure rate, for very little gain.

That said, there's a standard and reliable 25-hour cockpit voice recorder that solves this problem, but it's only used outside the US. That's a case of regulatory inertia, and I suspect this incident will speed up changes in this area.

However, finally, and particularly in relation to your proposal of streaming cockpit voice recordings to some cloud server. There is some resistance to this (and to longer recordings in general) from air crew on privacy grounds. The privacy issue is less about how much personal info is revealed in a crash situation and more about how easy it would be for a bad actor in management —or whatever operations group runs the audio storage—to listen in on conversations. And you can be sure this would happen if something like your system were implemented without the appropriate regulatory controls (and tbh even with them it would probably still happen).

outworlder|2 years ago

Starlink is a consumer system. Won't happen without a specialized product. Inmarsat is expensive. And we are talking about streaming audio from all planes currently in flight.