How the pilots of Lion Air Flight 610 lost control

[+] Animats|7 years ago|reply

There have been a surprising number of air disasters in recent years caused primarily by air data sensors returning false data.

- Birgenair Flight 301 - B757. Pitot tube clogged possibly by insect nest, false overspeed indication, autopilot commanded pitch-up, alarms, stall warning, crew confused about speed, loss of control. 189 dead.

- Air France Flight 447 - Airbus A330. Well known. Pitot tube clogged by ice, confusion about airspeed, loss of control. 228 dead.

- Saratov Flight 703 - An-148. Pitot tube frozen. Three airspeed indicators all disagreed. Loss of control. 71 dead.

- Lion Air - Angle of attack vane failure, from parent article.

That information could be checked against GPS, and at least one aircraft does this. But that has its own problems.[1] Checking against an inertial system is another possibility. Those are complicated, though. The classic airspeed, altimeter, and angle of attack vane are so simple.

[1] https://www.gpsworld.com/gps-disruption-a-full-fledged-aviat...

[+] SagelyGuru|7 years ago|reply

So, all it takes is one malfunctioning sensor, which luckily does not happen very often. Add an autopilot which slavishly follows the faulty sensor, as only computers can do so well. Plus two confused pilots, trained to rely on autopilots, suddenly required to do some serious debugging of a complex multi-sensor, multi-control system in seconds, under extreme stress. To sum it all up, you are dead.

I contend that the primary cause is not the sensor but the overconfident reliance on autopilots. When an autopilot intervenes, its whole purpose is to resume stable flight. Should anything unacceptable happen instead, the autopilot ought to switch itself off. Too many (blind) cooks spoil the broth.

Actually, I am shocked at how sloppy their approach to reliability engineering must be not to have even thought of such a basic thing. Reminds me of that NY finance company that lost half a billion $ in half an hour because they could not turn off their buggy automatic trading system.

[+] anoncoward111|7 years ago|reply

This is in no way to detract from your analysis, but Air France wasn't instrument error, it was really pilot error.

The instrument malfunctioned, but one pilot was pulling a lever up, while another pilot was pulling another lever down. This generated a stall. It could have been avoided if the cabin had been better lit, so that the pilot making the error could have been seen making the mistake.

[+] 7952|7 years ago|reply

Of course it is better to have more sensors, but that is not the whole issue. The problem is that firstly, the automation is unable to deal with multiple complex failures. Secondly, pilots are not always capable of dealing with those same failures, particularly when the automation itself is part of the failure. The main solution to this is better training. But we should also question whether the automation should include better handling of failures before shunting the problem back to an overwhelmed pilot.

[+] beamatronic|7 years ago|reply

Could the ram air turbine tell you the airspeed? Or the GPS?

[+] WalterBright|7 years ago|reply

4 accidents over how many years and how many flights? I'd say that was very, very few.

Flight 301 - 1996
Flight 447 - 2009
Flight 703 - 2018
Lion Air - 2018

Flights per year: 40 million just this year

https://www.statista.com/statistics/564769/airline-industry-...

[+] sk5t|7 years ago|reply

In what way do you find the number of air disasters surprising? What number would be unsurprising?

[+] sebazzz|7 years ago|reply

Aren't sensors fitted in duplicate for redundancy?

[+] jeffrallen|7 years ago|reply

Garbage in, garbage out. The more things change, the more they stay the same.

[+] gugagore|7 years ago|reply

Do aircraft inertial measurement systems ever provide linear velocity information that isn't trash? I had assumed not (except for special cases where you can do e.g. a zero-velocity update)

An inertial measurement system cannot measure velocity. It's insensitive to which inertial frame it is in. (to move from a non moving frame to a moving frame, you must accelerate, which it could detect. but then you have to integrate these accelerations over time)
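
A minimal sketch of why integrated accelerations drift, in Python. The bias value, sample rate, and duration are made-up illustrative numbers, not the specification of any real inertial unit, and `integrate_velocity` is a hypothetical helper. Note that the integration also needs a starting velocity `v0` from somewhere else, which is the frame-insensitivity point above.

```python
def integrate_velocity(accels, dt, v0=0.0):
    """Dead-reckon velocity by summing acceleration samples (simple Euler steps)."""
    v = v0
    history = []
    for a in accels:
        v += a * dt
        history.append(v)
    return history

# ASSUMPTIONS (illustrative only): steady flight, so true acceleration is zero,
# and an accelerometer with a constant bias of 0.01 m/s^2 sampled at 1 Hz.
BIAS = 0.01
DT = 1.0
samples = [0.0 + BIAS] * 3600  # one hour of readings

velocity_error = integrate_velocity(samples, DT)[-1]
print(f"velocity error after one hour: {velocity_error:.1f} m/s")
```

Even a tiny constant bias grows linearly in the velocity estimate, so the estimate is only as good as the last time it was corrected by an outside reference.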

[+] WalterBright|7 years ago|reply

What the pilots experienced was indistinguishable from runaway stab trim, and shutting off the stab trim from the switches on the console is the correct response. There's a loud distinctive sound when the trim runs, and the wheels that bracket the console turn, so it's pretty obvious. The previous flight's pilots had indeed done this. The pilots are trained for this. They had 12 minutes to shut off the repeated action of the runaway stab trim.

Additionally, the airplane should have been grounded after the previous flight, as runaway stab trim is a serious problem, until the fault was found and corrected.

Equally badly, the flight crew was probably not informed of what had happened on the previous flight.

[+] cameldrv|7 years ago|reply

Well, sort of, but not quite. When the pilots clicked the trim up switch on the yoke, it disabled the MCAS system for 5 seconds -- i.e. it made the problem go away temporarily. Then MCAS came right back with more nose down trim. This is not a typical runaway trim situation where it's just continuously rolling in more trim. The pilots had no idea that this system existed or that a "runaway trim" failure could have these characteristics.

Sure, it's easy to say from the ground with what we know now that all they had to do was flip a couple of switches, and that a previous crew managed to land safely. However, the job of an airplane is not to be safe only with quick thinking, above average pilots. If a single sensor failure can present a situation that 99% of pilots will successfully diagnose and recover from, you're looking at multiple crashes per year.
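
Back-of-envelope version of that arithmetic, in Python. The 40 million flights per year is the figure quoted elsewhere in this thread; the per-flight failure probability is a number I made up purely for illustration, not real reliability data, and `expected_crashes_per_year` is a hypothetical helper.

```python
def expected_crashes_per_year(flights_per_year, failure_prob_per_flight, recovery_prob):
    """Expected number of failures per year that the crew does not recover from."""
    failures = flights_per_year * failure_prob_per_flight
    return failures * (1.0 - recovery_prob)

FLIGHTS_PER_YEAR = 40_000_000  # figure quoted elsewhere in the thread
FAILURE_PROB = 1e-5            # ASSUMPTION: invented for illustration only
RECOVERY_PROB = 0.99           # the "99% of pilots" case

crashes = expected_crashes_per_year(FLIGHTS_PER_YEAR, FAILURE_PROB, RECOVERY_PROB)
print(f"expected unrecovered events per year: {crashes:.1f}")
```

With these assumed inputs the result is about four per year. The exact number depends entirely on the assumed failure rate; the point is that a 1% miss rate multiplied by a large number of flights does not round to zero.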

[+] lsh123|7 years ago|reply

Exactly. While Boeing indeed added a new system, the malfunctions and resolutions matrix didn’t change. Thus Boeing didn’t need to change training and claimed that new planes can be flown without any additional training. It’s hard to fault Boeing here.

[+] privateSFacct|7 years ago|reply

What is never mentioned is that since forever runaway stabilizer is one of the FEW memory items.

Here it is. To be qualified as a pilot you have to have this memorized.

I. Runaway Stabilizer

CONTROL COLUMN - HOLD FIRMLY

AUTOPILOT (if engaged) - DISENGAGE

Do not re-engage the autopilot.

If the Runaway Continues

STAB TRIM CUTOUT SWITCHES (both) - CUTOUT

All this drama around "fighting the controls" and "fighting the plane" is weird. This is not some procedure you need to look up, this is one of a few memory items.

[+] dingaling|7 years ago|reply

That process isn't sufficient on the 737Max. That is the point; to make it fly like a 737NG Boeing added an additional system, MCAS, that requires the AoA sensors to be manually disconnected during a malfunction or it will continue commanding elevator pitch. You could run through your memory drill as many times as you like, it wouldn't have helped in this case.

[+] ratsimihah|7 years ago|reply

That would make sense. Do you think they lacked experience or received improper training?

From what I remember when preparing for the helicopter license, it was emphasized that being able to use manual sensors and controls was critical in case of such failures.

[+] AceyMan|7 years ago|reply

The page is a graphics-loaded page with floating paragraphs over a single "sheet of wallpaper" showing interior and exterior diagrams of the aircraft. I can see it not working worth a damn in any kind of Reader View.

FWIW, nothing new here, but it's a good 3rd grade overview for the layperson.

And I am a Boeing guy, through and through, but they screwed the pooch on this one. RIP, Lion Air 610.

/Acey

(me: FAA licensed dispatcher)

[+] Havoc|7 years ago|reply

Yeah the layout is quite annoying

[+] sargun|7 years ago|reply

Can you point us to a better article?

[+] gumby|7 years ago|reply

I can't see how you can learn anything from it outside of reader view either. Oh, and if you need any assistive technology (e.g. perhaps you're blind): tough luck!

[+] iamhamm|7 years ago|reply

These human-machine interface failures always fascinate me. The NYT makes the quip about not being able to look down and note the trim, but speed doesn’t need to be high to have these failures. It seems to be more related to system complexity misunderstandings or compounding interpretation issues. Look at the grounding of the Royal Majesty - not exactly some high speed object; it all amounted to a misunderstood icon. See: https://ti.arc.nasa.gov/m/profile/adegani/Grounding%20of%20t...

[+] WatchDog|7 years ago|reply

You would think it would make sense to automatically disable the MCAS system and sound a warning if the aircraft detects conflicting readings from its sensors.

[+] akira2501|7 years ago|reply

Due to the way the engines are mounted on the frame, which enhances their efficiency, they also cause more of an "upward thrust vector." This makes it very easy for this plane to reach dangerously high AOA in certain scenarios, particularly during turns.

The MCAS has a specific and important function and just turning it off is probably not going to increase safety. The real problem was that Boeing did not disclose the existence of this device and its functions in aircraft training, according to one source, because they did not want to inundate new pilots with too much information about the plane and its attendant safety systems.

Perhaps, had the pilots known, they would have seen the stick shaker/stall warning system activating on _one side only_ as a serious indication of an Airspeed/AOA system fault and the potential for incorrect MCAS outputs being generated.

They might have known to disable the electronic trim control, bypassing the MCAS, and then to manually fly and trim the plane with the aforementioned thrust vectoring taken into consideration. They could have trained for this. All of that would have given them the best safety margin for survival here.

[+] ams6110|7 years ago|reply

Yes, especially since there are only 2 AOA sensors so you can't know which one is incorrect by comparing to a third sensor. I've read that it's possible to derive AOA from other sensors but don't think these aircraft have that ability.
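
To make the two-versus-three point concrete, a small Python sketch. The threshold and the function names are hypothetical; this is not how any real air data system is implemented, just the general idea of disagreement detection versus voting.

```python
from statistics import median

DISAGREE_LIMIT_DEG = 5.0  # ASSUMPTION: illustrative threshold, not a real spec

def check_two(aoa_left, aoa_right, limit=DISAGREE_LIMIT_DEG):
    """Two sensors: a disagreement is detectable, but not which side is wrong."""
    if abs(aoa_left - aoa_right) > limit:
        return None  # no trustworthy value: inhibit automation and warn the crew
    return (aoa_left + aoa_right) / 2.0

def vote_three(a, b, c):
    """Three sensors: the median outvotes a single bad reading."""
    return median([a, b, c])

print(check_two(2.0, 22.0))         # disagreement detected, no value returned
print(vote_three(2.0, 22.0, 3.0))   # the outlier is ignored
```

With only two vanes, the most the system can do on a disagreement is stop trusting both, which is still a lot better than acting on the bad one.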

[+] reactor|7 years ago|reply

From the article, "Outside the plane, one of the plane’s angle of attack sensors falsely indicated that the plane’s nose was pointed too high, and the aircraft could stall."

I'm not in any way eligible to comment on aviation systems, but why does the equipment that measures the angle of the airplane need to be outside? Couldn't something like a gyroscope mounted inside do the job?

[+] JshWright|7 years ago|reply

It's measuring the angle of the plane relative to the air moving past it (i.e. the angle the plane is "attacking" the air). Knowing the angle of the plane in an absolute sense isn't as useful when it comes to detecting stalls, etc.
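
A rough Python illustration of the difference. It uses the textbook approximation that angle of attack is roughly pitch attitude minus flight path angle, which only holds under the assumptions of still air and wings level. The function names and the numbers are hypothetical.

```python
import math

def flight_path_angle_deg(vertical_speed, horizontal_speed):
    """Angle of the aircraft's actual trajectory relative to the horizon."""
    return math.degrees(math.atan2(vertical_speed, horizontal_speed))

def approx_aoa_deg(pitch_deg, vertical_speed, horizontal_speed):
    """AoA ~= pitch - flight path angle (ASSUMES still air and wings level)."""
    return pitch_deg - flight_path_angle_deg(vertical_speed, horizontal_speed)

# Same 5 degree nose-up attitude on a gyro, two very different situations:
level = approx_aoa_deg(5.0, 0.0, 100.0)      # flying level
sinking = approx_aoa_deg(5.0, -20.0, 100.0)  # descending at 20 m/s

print(f"level flight AoA: {level:.1f} deg")
print(f"sinking AoA:      {sinking:.1f} deg")
```

A gyroscope would report the same 5 degrees in both cases, while the wing sees a much larger angle in the second one. Wind and turbulence move the air mass itself, which is why the vane sits out in the airflow.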

[+] txcwpalpha|7 years ago|reply

Angle of Attack is based on the angle between the plane and the surrounding airflow, not based on gravity/landscape/etc.

[+] unknown|7 years ago|reply

[deleted]

[+] jumelles|7 years ago|reply

I'm amazed that Boeing isn't the focus of more - much, much more - criticism. It seems clear to me they are vastly more culpable than the pilots or airline.

[+] pcurve|7 years ago|reply

I agree that Boeing's share of responsibility is big. I think media coverage has largely reflected it too. I don't recall reading any article that put the blame on the pilots. Based on what I've read, it felt more like 70% Boeing, 29% airline, 1% pilot.

[+] cameldrv|7 years ago|reply

What the plane did here is not technically reasonable. We had optimal estimation and sensor fusion in the sixties. With two AoA sensors, three pitot/static systems, GPS, and a full IMU, the plane had more than enough information to determine that there was no high AoA situation calling for the computer's intervention.

Since this dangerous high AoA situation is so rare, even a simple rule requiring both AoA sensors to agree that AoA was dangerously high to override the pilot completely solves the problem. Even with just one AoA sensor and a little memory, the simple fact that the system had rolled in so much nose down trim, which should have lowered AoA, with no apparent effect should have clued it in to the fact that its model of the system was wrong and caused it to stop.
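
Both rules fit in a few lines. This is a Python sketch under my own assumptions: the threshold, the minimum-drop value, and the function names are hypothetical, and this is not Boeing's logic.

```python
AOA_LIMIT_DEG = 14.0  # ASSUMPTION: illustrative trigger threshold

def both_sensors_high(aoa_left, aoa_right, limit=AOA_LIMIT_DEG):
    """Rule 1: intervene only when both vanes independently report high AoA."""
    return aoa_left > limit and aoa_right > limit

def trim_is_ineffective(aoa_history, min_drop_deg=1.0):
    """Rule 2: True if AoA has not dropped despite repeated nose-down trim.

    aoa_history holds the AoA reading taken after each trim increment.
    """
    if len(aoa_history) < 3:
        return False  # not enough evidence yet
    return aoa_history[0] - aoa_history[-1] < min_drop_deg

print(both_sensors_high(22.0, 2.0))             # False: one vane disagrees
print(trim_is_ineffective([22.0, 22.0, 22.0]))  # True: trimming changed nothing
```

Either check alone would have told the computer to stop in a stuck-sensor case like the one in the article.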

This is not a particularly sophisticated insight. The engineers all knew this when they designed the system. They had a reason for doing it the way they did, though, and that was to slip the new system in a bit under the radar of the FAA. Since the MCAS is required to maintain the stability of the plane at high angle of attack, it should have been certified as a Stability Augmentation System. That would have subjected it to more redundancy requirements and eliminated this failure. The problem was that Boeing wanted the MAX to not require significant training or certification beyond the 737/737NG. A big new Stability Augmentation System would have required extra certification and probably pilot training. Instead, they chose to sort of launder this system through an existing one, the Elevator Feel Shift system. The EFS adds some nose down trim at certain speeds and altitudes to make the 737NG feel like a classic 737.

Since the FAA already determined that the EFS wasn't a stability augmentation system, but it could control the elevator trim, Boeing figured they could piggyback the new MCAS onto it, adding just one new input, the AoA sensor, and no new outputs. Since it only controlled the trim and not the main flight controls, Boeing could keep to its manual control philosophy, and they could slip it through certification. They couldn’t give the computer both AoA inputs, because the air data system is supposed to have two totally independent and manually selectable sets of sensors. If you give a computer both AoA inputs, you lose that redundancy concept.

So why did the EFS system never cause any problems despite having the same lack of redundancy? My guess is that the EFS is essentially an open-loop controller. It applies a fixed amount of forward trim for a given speed/altitude combination. If the pitot/static system goes haywire, the worst that happens is a fixed, moderate amount of nose down trim, easily and naturally compensated for by the pilot or autopilot. The MCAS appears to be closed loop in that it will just keep adding more and more nose down trim until the AoA sensor says things are OK again, the pilot pulls its circuit breaker, or the plane smashes into the ocean.
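
A toy Python comparison of the two behaviours with a sensor stuck high. It follows my guess above, not the actual EFS or MCAS logic; the trim units are arbitrary and every name and number is hypothetical.

```python
def open_loop_trim(steps, fixed_trim=1.0):
    """Open loop: a schedule picks one fixed trim value; it never accumulates."""
    return [fixed_trim for _ in range(steps)]

def closed_loop_trim(aoa_readings, limit=14.0, increment=1.0, max_trim=10.0):
    """Closed loop: keep adding nose-down trim while the sensor reads high."""
    trim = 0.0
    history = []
    for aoa in aoa_readings:
        if aoa > limit:
            trim = min(trim + increment, max_trim)  # bounded only by the stop
        history.append(trim)
    return history

stuck_sensor = [22.0] * 8  # ASSUMPTION: vane stuck at a high reading

print(open_loop_trim(8)[-1])                # stays at the fixed value
print(closed_loop_trim(stuck_sensor)[-1])   # keeps growing with each cycle
```

With a bad input the open-loop version is wrong by a fixed amount, while the closed-loop version is wrong by an amount that grows until something outside the loop stops it.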

People in the industry who blame the pilots even one bit are making a mistake the industry collectively stopped making seventy years ago. We don’t blame the pilot anymore, we blame the system. Given enough time, humans will make any mistake that can be made. If the plane cannot be flown completely safely by significantly below average pilots, it's an unsafe plane. Demanding that the system be safe even with imperfect pilots is why commercial aviation is so amazingly safe today. It’s also ironically why the MCAS is in there in the first place. Boeing could have just put in the manual to never exceed 14 degrees of AoA, and even average pilots would never have a problem. The FAA would never certify such a plane to carry paying passengers with such a limitation though. It would be too dangerous. Eventually someone would screw up and the plane would go out of control, so there had to be a computer to prevent this. As it stands, the plane does not IMO meet certification requirements for the transport category, and the only reason it isn't grounded is because there are literally half a trillion dollars worth of these things in service or on order. I'm racking my brain to think of another single product produced anywhere that is so valuable.

[+] unionemployee|7 years ago|reply

I agree that more focus should be placed on the system, however, with so many NTSB reports naming "pilot error" as the cause of accidents/incidents, I think we're still very much focused on the pilots.

[+] ScottBurson|7 years ago|reply

Very interesting comment. Closed-loop systems always add new failure modes. The FAA should know this.

[+] unknown|7 years ago|reply

[deleted]

[+] NelsonMinar|7 years ago|reply

All these years of listening to Boeing bigots talk about how bad Airbus planes are because they are too automated, flown by machines, unsafe... And then a Boeing automated system seems to be a significant cause of a fatal accident. Not sure what if any conclusion to draw from that, just the context.

[+] 1stranger|7 years ago|reply

Would it be possible to have a single switch that disables all automatic systems and puts the plane in complete manual control as much as possible? Can these planes even operate in a complete manual mode?

[+] drpixie|7 years ago|reply

Certainly possible but not required. In the B737 family, most of the main controls are essentially power assisted direct controls.

Boeing consistently has preferred more-or-less manual controls, while Airbus took the other direction (in which the pilot guides the computer, which operates the controls).

All airliners can fly happily in completely "manual" mode. Boeings are doing this most of the time, Airbuses only do this when the situation is outside the computer's envelope.

[+] BXLE_1-1-BitIs1|7 years ago|reply

There was an erroneous stick shaker that captured the crew's attention and diverted attention from the trim stuffing the nose down. The NYT graphics show a minuscule part of the problem. Most likely the FDR data has been played through a full motion simulator - it would be a hairy ride, but the crews have to keep it confidential until the Indonesian authorities release the data.

[+] heyjudy|7 years ago|reply

https://outline.com/pELX5F

[+] gumby|7 years ago|reply

It doesn't feel like the NYT is trying to communicate something to their reader. Or if they are I can't figure out what the message is.

Apart from the whizzbang combined with tiny paragraphs (making it hard to read or comprehend), not all the text is available to Safari's reader mode.

[+] hotswapster|7 years ago|reply

[deleted]

[+] Havoc|7 years ago|reply

Both inconsistent readings and a plane actively working against the pilots' will.

That seems like a pretty massive failing on the manufacturer's part.

[+] StreamBright|7 years ago|reply

Human technology has its limits, and reliability is one of them. The more complex the systems we produce, the less reliable they become. The correct course of action is not just to blame the manufacturer but to have a run book for every failure scenario that ensures the best outcome. System operations people are aware of this, as are emergency unit workers and pilots. In this situation the pilots failed to follow the standard protocol to disengage the autopilot and safely navigate the plane like the previous flight crew did.

[+] privateSFacct|7 years ago|reply

Except that runaway trim is a MEMORY checklist item.

Literally - they should have the procedure here memorized, and there are LOTS of cutout options, from temp right at fingertips to cutoff switches.

135 comments