
Half of European flights delayed due to system failure

215 points| el_duderino | 8 years ago |bbc.com | reply

103 comments

[+] tim333|8 years ago|reply
Apparently the Enhanced Tactical Flow Management System (ETFMS) packed up.

>“ETFMS facilitates improvements in flight management from the pre-planning stage to the arrival of the flight. It maximises the updating of flight-related data and thus improves the real picture of a given flight, thereby contributing to the Gate-to-Gate Concept,” Eurocontrol explains on its website.

>The agency initially reported that contingency procedures were immediately put in place which reduced the capacity of the European network by around 10 per cent. http://www.airtrafficmanagement.net/2018/04/eurocontrol-give...

They don't seem to say what went wrong but do say

>In over 20 years of operation, the ETFMS has only had one other outage which occurred in 2001. The system currently manages up to 36,000 flights a day.

Tech details from wikipedia fr:

Written in Ada and running on HP-UX, the system is based on an exchange of messages between the airlines (which file, change, and update flight plans), the air traffic control bodies, and the CFMU. The messages are written in ADEXP format.

ETFMS uses at least 5 fundamental notions:

flight plan: describes the 4D trajectory of an airplane.
regulation: aircraft rate applied to a "traffic volume". Example: 50 aircraft / hour.
"traffic volume": association of a geographical reference (air sector, waypoint, airport, etc.) and a set of aircraft flows.
list of takeoff slots, or slot list. Example: if the regulation rate is 30 planes / h, there will be a slot every 2 minutes: 10:00, 10:02, etc.
the delay: difference in time between the take-off time desired by the airline and the time calculated by ETFMS.
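
To make the slot and delay notions concrete, here is a rough Python sketch of the arithmetic. It is not how ETFMS actually allocates slots; the function names (`slot_times`, `allocate`) and the first-come-first-served rule are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta


def slot_times(start, rate_per_hour, count):
    """Return `count` evenly spaced takeoff slots, beginning at `start`."""
    interval = timedelta(hours=1) / rate_per_hour  # 30/h -> one slot every 2 min
    return [start + i * interval for i in range(count)]


def allocate(requested, start, rate_per_hour):
    """Toy first-come-first-served allocation (hypothetical, not the real rule).

    Each flight gets the earliest unused slot at or after its requested
    take-off time. Returns a list of (requested, slot, delay) tuples.
    """
    interval = timedelta(hours=1) / rate_per_hour
    next_free = 0  # index of the first slot nobody has taken yet
    result = []
    for req in sorted(requested):
        # Index of the first slot at or after `req` (ceiling division).
        wanted = max(0, -((start - req) // interval))
        k = max(wanted, next_free)
        slot = start + k * interval
        result.append((req, slot, slot - req))  # delay = slot minus desired time
        next_free = k + 1
    return result
```

With a regulation starting at 10:00 and a rate of 30 per hour, `slot_times` gives 10:00, 10:02, 10:04 and so on, and two flights that both want 10:00 end up with delays of 0 and 2 minutes.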

[+] makmanalp|8 years ago|reply
It would be fascinating to read about what bug took down a system with one outage in 20 years. I remember reading in the Chubby paper (from Google) that user error was the cause of more than half the downtime incidents. Wonder if that's the case here too.
[+] isostatic|8 years ago|reply
> Eurocontrol announced the system restart later in the day, after what it called extensive testing.

So they turned it off and on again.

[+] tnolet|8 years ago|reply
5hr delay on a 1,5hr flight from Berlin to Paris. If they'd use Docker with React this would have never happened!
[+] reificator|8 years ago|reply
> If they'd use Docker with React this would have never happened!

It's probably because they wrote it in Go. Did you know Go doesn't even have generics?

[+] Sharlin|8 years ago|reply
This is a great demonstration of Poe’s law.
[+] emersonrsantos|8 years ago|reply
For those curious, Eurocontrol MUAC (Maastricht Upper Area Control Center) migrated to 50 virtual SUSE Linux Enterprise servers running under the IBM z/VM hypervisor on an IBM z196 mainframe system in 2013.
[+] justadudeama|8 years ago|reply
Does any of this affect the nature of the failure? Have they said what went wrong?
[+] amaccuish|8 years ago|reply
anywhere we can read up on this? :)
[+] sleavey|8 years ago|reply
I was on a plane just about to push back at Heathrow, and the pilot informed us we'd be delayed 15 minutes due to this failure. In the end it was 10 minutes, and we landed only 5 minutes late at my destination. Doesn't appear to have been a big deal, at least for me.
[+] dx034|8 years ago|reply
10% reduction probably meant that they tried to keep most flights on time and "strategically" delayed some flights. Makes it worse for some passengers but keeps knock-on effects for transfer passengers under control.
[+] icebraining|8 years ago|reply
"departures are now limited to 10/hour at #brusselsairport"

Impressive that they maintain this rhythm even during a once-in-a-decade unexpected system malfunction.

[+] ttul|8 years ago|reply
I guess that’s the “paper and pencil” speed.
[+] njitbew|8 years ago|reply
Yes, this sucked. I just had a 1 hour delay on a 1 hour flight (AMS-ZRH). Unfortunately, no compensation until it's a 2 hour delay (but thank god it was only one hour!). Passengers who had a layover were noticeably less happy.
[+] CaptainZapp|8 years ago|reply
You probably wouldn't have been eligible for compensation, since this delay was definitely beyond the airline's control.

While a lot of airlines try to weasel out of their obligations (mechanical failure, for example, which however is the airline's responsibility) I would think such a case is pretty clear cut.

[+] tvanantwerp|8 years ago|reply
I was at a conference in Northern Virginia some months ago and saw a presentation from the folks at Upside, a startup specializing in booking business travel. They described the legacy system which handles pretty much all booking in the US, a system called SABRE. They described it as an ancient 6-bit computer system in Texas with no modern API. Everything they do tech-wise is a modern wrapper around that system. So I'm not at all surprised by any air travel computer failures if tech like that is central to the system.
[+] tyingq|8 years ago|reply
You're talking about TPF[1]. Many smart people and organizations have tried and failed to build something that could match it, including a company Google paid $700 million to buy. I personally know of at least 10 failed attempts :)

Not sure where "6 bit" is coming from though, and you can use gcc/c++ now, not just assembler[2]. And it's in Tulsa, Oklahoma, not Texas. Sabre's HDQ is in Texas, the data center is not. The hardware is very modern and new Z series mainframes in big loosely coupled clusters.

Amadeus, Sabre's main competition, also still has TPF at the core.

There is one notable non TPF reservation system. http://www.navitaire.com/ Last I checked, it couldn't scale well enough to handle a large airline.

Both Sabre and Amadeus are replacing TPF, but one function at a time (shopping, fare engine, booking, check in, etc). And very slowly.

[1]https://en.m.wikipedia.org/wiki/Transaction_Processing_Facil...

[2]https://www.ibm.com/support/knowledgecenter/en/SSB23S_1.1.0....

Fwiw, TPF is basically a huge, distributed, and transactionally consistent nosql database. Most orgs still using it have extracted most of the business logic that was in it, out to Linux boxes that front it. Not for stability reasons, but faster time to market with new features. To date, attempts to replace the high contention and high transaction rate nosql type traffic haven't scaled well enough.

Just in the US, 2.5 million people fly each day. And the processes to sell, board, etc each passenger are lots of transactions each. It's a pretty big scale. I think it's fairly close to Amazon sales per day, but with more contention and sub transactions.

All Visa credit card transactions are also still on TPF.

[+] userbinator|8 years ago|reply
I've heard of enough misguided "modernisations" (and failures thereof) that I think the "legacy system" was the part that stayed working throughout, and it's the newer stuff added around it that failed. The old stuff may be old but there's a reason it's old... it's outlasted any attempts at replacing it.
[+] dhimes|8 years ago|reply
Wow. I remember Sabre from GEnie - General Electric Network Information Exchange. In the 1980s. I’m not sure CompuServe was invented yet. My father and I could use “electronic mail” to stay in touch. For you youngsters: I had never yet had a “remote control” for a TV. Yep. Had to get up to adjust the volume or change the channel (don’t get me started on adjusting the antenna).

Wish I didn’t know this.

[+] ubernostrum|8 years ago|reply
In the air-travel world, a lot of seemingly-random limitations on systems that interface with reservations come from the fact that they were originally built with telephone interfaces intended for use only by trained staff.

Every few years, for example, someone digs up and reposts one of the articles explaining why some airlines didn't allow 'Q' and 'Z' in account passwords (they were passing things directly through to SABRE on the backend, and so only allowed letters that could be "dialed" on the 1960s rotary phones SABRE was designed to interface with).

[+] DavidAdams|8 years ago|reply
Can confirm. My flight from Zurich was delayed by over an hour today, causing my family and me to need to run, OJ Simpson-style, through the Philadelphia airport to make our connection.
[+] candiodari|8 years ago|reply
No worries.

Everyone at that organisation is paid boatloads of money. [1]

They don't pay taxes on it. (~10%, "for solidarity", which means they get to enjoy healthcare paid by ~55% taxed nationals)

And 90% of the organisation (especially the management) has absolutely nothing, nothing whatsoever, to do with guiding planes anywhere. In fact, those departments are severely understaffed. The department doing "regulatory support" is about 2/3rds of the organisation (tldr: making sure half the local government officials don't have to get their own coffee - and before you say it, no, Eurocontrol employees don't get them coffee, they're merely in charge of making sure someone's there to get them coffee, and steak, cake, and ... The coffees, I might add, are baffling. Done from a steam boiler machine in front of you, with fair trade beans, sweetened not with sugar, but with expensive imported bars of chocolate melted in milk that's frothed in front of you (they put in the chocolate somehow while they're frothing the milk with steam, melting it in while not getting the steam on the chocolate somehow), and you get the rest of the case of that (expensive) chocolate to take home for the kids. No, not when you ask, they'll ask you if you want that. Btw, it's not really the rest of the case they give you, you get a fresh case. Oh and of course, of the bar they opened they prepare just one coffee (about 1/4th of the bar). The rest gets thrown into the trashcan, they don't use the same bar for the next coffee. As for the steaks ... oh my God)

And in case you're wondering: the odds of 2 planes colliding with zero guidance outside of the ATC zones around airports (which aren't covered by Eurocontrol) over even a region as big as Europe are more than 10 billion to 1, against, per year. So if Eurocontrol didn't exist at all, and we just allowed every plane to fly wherever ... nothing would go wrong at all.

So ... what is the problem here ? Disruption of millions of travelers for no reason whatsoever ? Let's please not pretend anyone at Eurocontrol cares (well, they care about not being interfered with, and that will make them care NOW, but if one thing's guaranteed it's that the Software/ATC departments will remain the same size, and only the bribery departments will grow)

[1] http://www.eurocontrol.int/sites/default/files/content/docum...

[+] kzrdude|8 years ago|reply
Just like there are no bugs without security impact, is it realistic to say that this has no impact on flight security? Any error can be a contributing factor.
[+] webreac|8 years ago|reply
Safety is ensured by ATC controllers. ETFMS is there to ensure that traffic does not increase beyond the controllers' capacity. I have read that without ETFMS, the traffic is reduced by 10%. I have been involved in ATC simulations where controllers had to land about 40 flights per hour using new procedures and tools. We tried with 38 flights per hour, and it was too easy for the controllers: their work was perfect even without tools. With 42 flights, controllers were getting angry because the traffic could not be managed. At 40, we could see the benefits of our new procedures and tools (more regular separation of flights). IMHO 10% less traffic gives more than enough capacity margin to ensure safety.
[+] gsich|8 years ago|reply
Similar to every incident at a nuclear power plant. "no danger for the population"
[+] thaumasiotes|8 years ago|reply
> Just like there are no bugs without security impact

I'm compelled to once again bring up the report that went "when I zoom in, the text becomes blurry".

[+] ehudla|8 years ago|reply
Is that the Ada code base?
[+] maartn|8 years ago|reply
So Trump and Putin finally hooked up. No need to get all SuSe over it... nerds
[+] jumelles|8 years ago|reply
I fear this sort of thing is going to become more and more common at airports.
[+] isostatic|8 years ago|reply
Why?

Sure, the skies are more and more crowded, meaning more and more people will be affected by a once-in-20-years failure, but why would it become more and more common?

[+] bluedino|8 years ago|reply
Twenty years from now...uber car service system failure affecting 90% of North America
[+] jlgaddis|8 years ago|reply
The only reason this will never happen is because there's only a snowball's chance in hell that Uber will still be around in 20 years.
[+] matte_black|8 years ago|reply
In situations like this I’m glad I book my flights with Chase Sapphire Reserve. It comes with a trip delay reimbursement for up to $500 per ticket if a flight is delayed more than 6 hours. No sweat!
[+] ytwySXpMbS|8 years ago|reply
I'm glad I live in the EU in such circumstances: EU regulation 261 [1] covers so much, with €250 to €600 compensation depending on flight distance for delays over 4 hours, with a percentage of full compensation for shorter delays. No specific credit card required.

[1] https://en.wikipedia.org/wiki/Flight_Compensation_Regulation...

[+] joezydeco|8 years ago|reply
...for a $450 annual fee. If you plan to get kicked off a flight more than once a year, perhaps it's worth it.
[+] repiret|8 years ago|reply
I’ve never had a problem getting a refund from an airline (in the US) when I chose not to take a flight because it was significantly delayed.
[+] kylegordon|8 years ago|reply
Compensation is built into the EU regulations.
[+] Dig1t|8 years ago|reply
Found the Chase employee.