UK bank fined £49M over IT system meltdown

139 points| shaman1 | 3 years ago |bbc.com

80 comments

azalemeth|3 years ago

I have a bank account with TSB and got compensation as a result of this mix-up.

Some rather personal experiences of the fiasco:

– Rather pointlessly, the website changed from being mostly static to entirely written in a very JS-heavy, "dynamic" way. I still can't use it in my normal browser (FF) with its extensions because it relies heavily upon CORS requests and referrer information that my somewhat privacy-paranoid extensions block.

– This was introduced at the time of the switchover, and until that point the IT system used looked identical between Lloyds, TSB and Halifax / BOS systems (I have accounts with some of those)

– The online browser-based system was telemetry and JS heavy, replacing a far leaner page

– I was unable to log in during the time of the fiasco, mostly due to 403 errors or timeouts. Often the page would just hang as an async request wasn't answered.

– Once I did manage to log in, I was amazed to see another person's account details (!!!), replete with (their) name and statement.

– I was unable to use online banking to pay bills or check my balance – I could see someone else's account in detail but was too honest to do anything with that knowledge. I can't remember if my card stopped working but I was effectively forced to make other arrangements for quite an extended period of time.

varispeed|3 years ago

> – The online browser-based system was telemetry and JS heavy, replacing a far leaner page

I remember one of those banks using the "leaner" page also had heavy telemetry turned on at some point. I type very fast, so I noticed that when I was entering my user ID, it was lagging heavily. Then I opened developer tools, only to see that they were logging all keystrokes to analytics, including username and password. At first I thought I had a virus or something, but these appeared to be legit scripts from the bank. So I decided not to use that bank account for a while. I wonder why they would turn something like that on.

jbkkd|3 years ago

Honest question - why do you still have an account with them?

usr1106|3 years ago

As a privacy-aware user, when making a contract with a bank (or buying a flight ticket or whatever) you should get assertions that their web site meets certain quality standards so you can use your browser to access the account or actually check in.

Paper did not have those incompatibility problems...

However, from the BBC article I conclude that even customers with a default browser could not necessarily use their account

Edit: Forgotten not added.

Rastonbury|3 years ago

How did you get compensation? Thank goodness for Monzo and Revolut being so quick to set up, but I had money trapped in TSB for some time. I thought it would only last a day or two at most. The services and ability to get support were non-existent during that time; I totally stopped trying to call. I closed my TSB account shortly after.

antihero|3 years ago

> I still can't use it in my normal browser (FF) with its extensions because it relies heavily upon CORS requests and referrer information that my somewhat privacy-paranoid extensions block.

So you have extensions that literally break normal browser behaviour and you are blaming them somehow? CORS is part of browser security and should be respected.

Not saying that TSB aren't clearly a shitshow but maybe just disable the extension for that site.

iechoz6H|3 years ago

"I could see someone else's account in detail but was too honest to do anything with that knowledge"

Are you patting yourself on the back for not committing fraud?

arpinum|3 years ago

The 250+ page analysis of the incident was an excellent insight into how large IT projects fail: https://www.tsb.co.uk/news-releases/slaughter-and-may/slaugh...

money quote: > This situation has all the hallmarks of business management strong-arming the IT organization into an unrealistic timeline. When business leaders push for overly-aggressive timelines, or regulators ask for multiple competing risk frameworks and excessive after-the-fact incident reporting, this all puts a strain on the delivery organization’s ability to untangle the complexity before ‘go live’.

zeristor|3 years ago

The report is by Slaughter & May, one of the more delightful company names in the City of London.

My understanding was that they’re a law firm, perhaps they’ve also branched into IT consultancy?

ilyt|3 years ago

Sadly, far too often managers and decision-makers think timelines and deadlines are a topic for haggling, not discussion. They treat it the same as haggling at a bazaar.

heurisko|3 years ago

2008: "The UK-based IT department of the fifth largest bank continues to dwindle as more jobs go overseas... This round of cuts, starting in June and lasting 12 months, involves up to 250 permanent IT roles and 200 contractors from the bank's technical delivery division, responsible for software development and design." [1]

2018: "Timeline of trouble: how the TSB IT meltdown unfolded". [2]

It's probably more complicated than that, but perhaps not much more complicated.

[1] https://www.itpro.co.uk/197982/lloyds-tsb-cuts-more-uk-it-jo...
[2] https://www.theguardian.com/business/2018/jun/06/timeline-of...

makomk|3 years ago

That probably has very little to do with it. The immediate cause of all the problems was that Lloyds TSB was forcibly split up in order to try and increase competition and the Lloyds half kept the IT department, and when the TSB half tried to move over to the existing IT platform of their new parent company everything broke.

varispeed|3 years ago

There was also heavy enforcement of IR35 in the banking sector, which substantially reduced access to the talent pool.

There was also a tightening of posted worker regulations, so that banks couldn't ship workers from overseas as a source of cheap talented workforce.

jayceedenton|3 years ago

I will always remember this incident as the time when the UK general public were exposed, en masse, to Spring error messages.

The confusion caused by ordering a member of the general public not to request a bean from a bean factory in a destroy method implementation still makes me laugh, even now.

ilyt|3 years ago

I remember when we were telling devs to stop returning Java 503s with stack traces to the user.

Devs fixed it by returning 200s with stack traces.

And as the page was ESI-stitched together on Varnish, when they fucked up there wasn't just a stack trace, but a bunch of different ones in various parts of the page.
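
For illustration only, here is a minimal Python sketch of the behaviour being asked for: keep the error status, log the stack trace server-side, and show the user only a generic message with a reference. It is not what TSB or any bank runs. The names `safe_error_response`, `handle`, and the logger name `app.errors` are hypothetical.

```python
import logging
import traceback
import uuid

logger = logging.getLogger("app.errors")  # hypothetical logger name


def safe_error_response(exc):
    """Log the full stack trace server-side; return a generic 503 to the client."""
    incident_id = uuid.uuid4().hex[:8]  # short reference the user can quote to support
    details = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    logger.error("incident %s\n%s", incident_id, details)
    body = "Service temporarily unavailable. Reference: " + incident_id
    return 503, body


def handle(request_fn):
    """Run a request handler; turn any unhandled exception into a safe response."""
    try:
        return 200, request_fn()
    except Exception as exc:
        # The status stays an error (not 200) and the body carries no internals.
        return safe_error_response(exc)
```

The point of the reference ID in this sketch is that support staff can find the matching trace in the logs without the user ever seeing it.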

Nextgrid|3 years ago

The fact that those messages were visible to external users is a major problem and sign of incompetence.

grumpyprole|3 years ago

I really hope my bank runs COBOL rather than any of this Spring bean junk.

chinabot|3 years ago

If I can't raise a specific exception, or it's in a block of code where I don't expect an error, I generally make the exception memorable or dangerous-sounding just so the user is more inclined to report it (hopefully not via a tweet, though).

jesusthatsgreat|3 years ago

Sounds like classic A type personalities with zero technical chops deciding how long a technical project should take to further their own agenda:

> The Migration Programme experienced delays from the outset and fell behind the IMP timings. While progress had been made, on 20 September 2017 the firm decided that the Migration Programme would have to be re-planned. However, nine days after it had resolved to re-plan, and before it had concluded its re-planning exercise, TSB publicly announced it would now migrate in Q1 2018.

UltraViolence|3 years ago

It's not just the CEO who should've been fired. The COO, CTO and CIO also should've left the building with a cardboard box in their hands.

This is a shameless indulgence in incompetence and recklessness. They didn't even bother to test large swaths of the transitional data or have a fallback plan if things went wrong.

Most likely their customers will now simply leave and the bank will be shut down.

VincentEvans|3 years ago

I sneer at the emphasis on “1.4 billion records!” in the article as if it’s a lot.

At a recent place of employment I created and was responsible for a database that had about that many records; in actuality it was a single 2 TB Postgres DB and completely unremarkable.

I never claimed to have worked with big data.

gghhzzgghhzz|3 years ago

It's not really the quantity of data that is important in migrations like this.

It's what is and isn't in the data - often a lot of junk in my experience if the source system is a legacy system that has evolved

what meaning that data has within a completely different system

what the demands are on the completeness of that data in the new system

how to deal with exceptions

and whether that data can ever be frozen, or whether it is still online (as in the case of banking transactions)

This is unlikely to be simply a technical problem of ETLing tables, changing date ranges from inclusive to exclusive and mapping some address fields.

Of course the size of the data after a certain point does make a big difference to risk planning and business continuity planning. It's not possible to rollback and try again within the migration window should a catastrophic issue occur, and it's not possible to simply run some bulk updates to fix issues during the go-live validation.

It is noted though in this project that the data migration itself was not found to contribute to the failure.
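
To make the "easy part" concrete, here is a minimal sketch of the inclusive-to-exclusive date-range change mentioned above. It assumes day-granularity dates, and the function name `inclusive_to_exclusive` is hypothetical. The harder questions in the list (meaning, completeness, exceptions, live data) are not something a helper like this addresses.

```python
from datetime import date, timedelta


def inclusive_to_exclusive(start, end_inclusive):
    """Convert the closed range [start, end_inclusive] to half-open [start, end_exclusive)."""
    if end_inclusive < start:
        # A legacy record like this is an exception case needing a business decision.
        raise ValueError("range ends before it starts")
    return start, end_inclusive + timedelta(days=1)
```

For example, a source range of 20 April 2018 to 22 April 2018 inclusive becomes 20 April 2018 up to, but not including, 23 April 2018.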

pixl97|3 years ago

What were your latency requirements on pulling a record out and how complex were the joins to pull said records?

If you have a simple db structure with a few tables and very clear data/index rules then billions and billions of records is pretty easy. Your indexes cut out 99% of the work and everything runs smooth and efficient.

But then you can have eldritch horrors where your stored procedures look like seedy detective novels where you chase join after join and have scary high memory requirements on execution.

hennell|3 years ago

When you say "emphasis on", you mean the single mention with no distinction between that stat and any of the other facts of the case? I don't think they really put any weight on whether that's a lot or not.

Personally I would say it is a lot. No other number mentioned in the article even approaches a billion. Billions are big. It might not be true big-data big, but it is still a lot of customer records for a migration project (depending on what exactly they were trying to do within the migration) and it does illustrate why they had so many issues, because there was a lot to deal with.

teleforce|3 years ago

Just wondering if the migration disaster at this scale can be avoided using modern cluster and orchestration technology like Kubernetes?

petepete|3 years ago

Hopefully Virgin Money will get one too. They broke their Android app earlier this year and since they make you verify web logins using the app I was unable to access any of my business accounts for ~3 weeks.

If something really urgent had come up I could have done what I needed via telephone banking or in a branch, but it was a huge pain in the arse because of a single point of failure.

Just let me use a Yubikey as my second factor damnit.

insomniacity|3 years ago

+1 on the Yubikey. I'm pretty good at moving my savings around and getting the best interest rate possible - the side effect is a ton of accounts, which means I'm drowning in 'secure memorable passcode key PINs' and my SMS inbox is full of SMS 2FA codes, and I'm wondering what it would take to get a bank to offer Webauthn/FIDO.

How about a website where we pledged to open an account and deposit £X into savings, or switch current account, if they offered Webauthn/FIDO?

orf|3 years ago

shaman1|3 years ago

Quite thorough report, some points that stand out from the summary:

>SABIS was TSB’s principal outsourced provider

>SABIS relied extensively on 85 third parties (TSB’s fourth parties) to deliver the systems required for the migration and the operation of the platform, which required it to act as a service aggregator.

It amazes me the sheer complexity of a retail bank software system and I suspect most of it is due to legacy systems, legal requirements and lack of regular spring cleaning.

kmlx|3 years ago

i remember this. for at least a week people couldn't access their money. it was chaos. the bank lost lots of money and customers due to this botched transfer.

jesusthatsgreat|3 years ago

Not your keys, not your coins.

sherr|3 years ago

Not my main bank so this did not affect me badly but their online (web based) banking portal is still glitchy and not very good.

MaxBarraclough|3 years ago

Even if it's not your main account, it could be bad if your account details were leaked.

JCM9|3 years ago

I remember when this all happened. It would be interesting if it had been the result of some really interesting technical bug that nobody could have foreseen, followed by a fascinating effort to save the migration.

In fact it was all quite boring and simply the result of the sheer incompetence of the bank’s leadership in running bank IT.