Why TSB's risk management strategy failed

A number of commentators have pointed out that TSB chief executive Paul Pester’s love of surfing and triathlons should have made him well-placed to cope with adversity and sudden shocks – such as the total IT failure which has seriously dented his company’s reputation. You could be forgiven for thinking that a passion for extreme sports would also make the CEO that bit more accustomed to dealing with risk.

The troubled migration of 1.5 billion TSB customer records from one computer platform to another last month is the latest in a long line of large-scale IT failures which have beset big banks in the past decade. What should have been a well-planned, well-executed change programme for a trusted high street bank has instead become a textbook example of what happens when a programme fails to identify and manage its key risks effectively.

What went wrong?

To the average customer, TSB’s IT glitch may have seemed like a bolt from the blue. Up to 1.9 million customers were left unable to access their accounts on Monday 23rd April, after a scheduled computer systems migration the preceding weekend. But the crisis itself was years in the making.

TSB was bought from Lloyds Banking Group by the Spanish company Banco Sabadell in 2015. The scheduled migration was intended to move TSB’s customer data from its old Lloyds Banking Group legacy system to Sabadell’s in-house Proteo system.

Whilst TSB customers had been prepared for the fact that their mobile and internet banking service would be down over the preceding weekend, the IT glitches which greeted them the following week were soon causing alarm. Some of the most notable examples included:

Customers were unable to access their accounts through the mobile and Internet banking services, and were frequently greeted with error messages.
Some users were inadvertently able to see other customers’ TSB accounts, including sensitive information such as sort codes and account numbers.
Businesses were left unable to pay staff, or unable to determine whether bank transfers and payments had been processed properly.

TSB were also guilty of some serious PR gaffes in the aftermath of the initial meltdown, having published a press release (which was subsequently withdrawn) praising the successful completion of the migration. They also came under criticism for telling customers that any IT glitches were “intermittent”.

How will this affect TSB’s business operations?

The damage to TSB’s day-to-day business operations, reputation and overall client base will most likely be severe, and this latest IT crisis will undoubtedly cost the bank millions of pounds which it had originally hoped to save by implementing the migration. Some of the most significant short- to medium-term consequences for TSB will include the following:

The Financial Conduct Authority and Prudential Regulation Authority will both begin investigations into the IT failures at TSB. The bank is also likely to be investigated, and potentially fined, by the Information Commissioner’s Office for the data breaches which have occurred as a result of the crisis.
Some customers have, understandably, switched their bank accounts to other providers and many others are considering their future with TSB. However, TSB faces further reputational damage, as its IT meltdown means customers are now struggling to switch bank accounts.
TSB may also have breached the new GDPR legislation by sending some customers’ apology letters to the wrong address.
There have been reports of fraudsters taking advantage of the IT meltdown, with hundreds of reported instances of fraud against TSB customers.

TSB’s IT meltdown: the vital statistics

1.9 million customers were unable to gain access to their accounts in the aftermath of the crisis.
Since the crisis, the bank has received 93,700 complaints from customers.
40% of customers have reported that they are unable to get through on the phone.
Call waiting times continue to run to an average of 30 minutes.
TSB faces an unlimited fine from the Financial Conduct Authority.

Why did TSB’s risk management strategy fail?

Whilst we are still learning about the specific details of TSB’s technical failures in the run-up to the migration, there already seems to be a consensus that this crisis was easily avoidable. There are a number of reasons why TSB’s risk management strategy failed:

Failure to learn from past mistakes

Arguably one of the most avoidable mistakes made by both TSB and their buyers at Sabadell was the failure to learn the lesson of previous bank IT meltdowns. For example, RBS fell victim to a similar IT failure in 2012, when it made an update to its payment processing system. That glitch affected 17 million RBS, NatWest and Ulster Bank customers, and did severe financial and reputational damage to a bank which was already trying to repair its image in the aftermath of the financial crisis.

There is a clear, historical trend of large-scale changes/updates to banks’ legacy IT systems being prone to glitches, errors, and, in the worst-case scenario, a complete breakdown in mobile and Internet banking services, mainly due to the fact that the bigger, older banks are still using outmoded IT platforms to manage customers’ accounts. TSB was no exception.

The problem of legacy IT systems

Despite being sold to Sabadell in 2015, TSB’s IT system was still linked to, and controlled by, Lloyds Banking Group. TSB was operating on a cloned version of the Lloyds IT system, which was rented to the Sabadell-controlled TSB at a cost of £100 million a year. Insiders at TSB have said that the Lloyds system, itself a product of multiple mergers with other banks, was complex and sclerotic. In preparing for the migration, the Sabadell team did not have sufficient experience or knowledge of the system they were migrating TSB’s customer data from.

Budgeting failures

The TSB migration was beset by examples of inadequate and unrealistic budgeting. As well as the aforementioned annual fee paid to Lloyds Banking Group for continued use of their old system, TSB also received a budget of £450 million from Lloyds Banking Group. Apparently Sabadell were warned that the £450 million budget was insufficient for a project of this size and complexity.

Unrealistic time frames

The initial deadline for the migration was set for December 2017, and the time-frame for delivery was optimistically set at 18 months, despite the aforementioned complexity of TSB’s legacy system. The initial December 2017 deadline was missed, adding pressure to complete the migration in April, which led to the meltdown.

Groupthink

There was already an assumption amongst project leaders and major stakeholders that Sabadell’s Proteo system would be able to handle a migration of this size. Other IT migrations which Sabadell’s development team had overseen involved much smaller banks, with simpler IT systems.

Moving a customer base of TSB’s size and complexity over to a new, in-house IT system was probably the least cost-effective option; the proliferation of more digital-ready, off-the-shelf IT systems and the growth of fintech meant that Sabadell/TSB arguably had other options but chose not to explore them.

How could TSB have done things better?

Ultimately, TSB’s failures can be traced back to a number of key (generic) assumptions that went badly wrong, for example:

It will be fine – despite the fact that there have been several major banking IT migration failures in the last few years.
We understand the systems involved – clearly, they didn’t.
We have sufficient budget to do the migration – they didn’t.
We have sufficient time to prepare for the migration as planned – they didn’t.

This final assumption regarding timescales is, as usual, the big one that blows up on most projects.

The crisis facing TSB points to a wider problem with large organisations’ risk management approaches, and raises the question of how the banks can “de-risk” future IT migrations and programmes of change, minimising the chances of similar meltdowns occurring in the future.

How could banks “de-risk” their large-scale IT migrations?

A “de-risking” strategy which captures key strategic assumptions that are linked to milestones, budgets and benefits, and analyses their inherent risk, is a rigorous means of delivering favourable outcomes for large scale programmes and strengthening operational resilience. Techniques like Strategic Assumption Analysis are excellent at providing management insight into the real risks to the programme.

Additionally, banks need to:

Manage timescale risk by assessing the range of durations that the core activities on the critical path will take. Techniques such as Strategic Target Analysis are very effective at predicting the percentage confidence of achieving milestones and showing a “roadmap” of the assumptions that have to be managed to get back on track.
Manage cost risk by assessing the range of costs in all activities and capital items. Techniques such as Strategic Cost Analysis are very effective at predicting the confidence of staying within a particular budget and showing the assumptions that need to be managed to bring it back under control.
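
To illustrate what “percentage confidence of achieving milestones” means in practice, here is a minimal sketch of a generic Monte Carlo schedule simulation. It is not an implementation of Strategic Target Analysis or any De-RISK technique; the activity names, the three-point duration estimates and the triangular distribution are all illustrative assumptions, not figures from TSB’s actual plan.

```python
import random

# Hypothetical critical-path activities with three-point estimates in months:
# (optimistic, most likely, pessimistic). Illustrative assumptions only.
CRITICAL_PATH = {
    "map legacy data": (3, 5, 10),
    "build migration tooling": (4, 6, 12),
    "test at full volume": (2, 4, 9),
    "dress rehearsals and cutover": (1, 3, 6),
}


def simulate_total_duration(activities, rng):
    """Sample one possible total duration for activities run back to back."""
    # random.triangular takes (low, high, mode), in that order.
    return sum(
        rng.triangular(low, high, mode)
        for low, mode, high in activities.values()
    )


def confidence_of_meeting(deadline_months, activities, trials=20_000, seed=1):
    """Estimate the fraction of simulated outcomes that finish by the deadline."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    hits = sum(
        simulate_total_duration(activities, rng) <= deadline_months
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for deadline in (18, 22, 26):
        share = confidence_of_meeting(deadline, CRITICAL_PATH)
        print(f"Confidence of finishing within {deadline} months: {share:.0%}")
```

In this made-up plan the “most likely” durations add up to exactly 18 months, yet because each estimate has a longer pessimistic tail than optimistic one, the simulated confidence of hitting an 18-month deadline comes out well below 50%. The same approach can be applied to cost by replacing durations with cost ranges for each activity and capital item.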

At De-RISK, we have worked on many IT transformations/migrations of a similar size and scale. Programmes of this complexity will normally fail for two primary reasons:

CEOs, senior management and major business shareholders are unaware of, or unwilling to grasp, the complexity and cost of a migration of this size.
Staff members and employees directly responsible for the delivery of the project are collectively aware of the risks involved, but are either unwilling or unable to communicate/escalate these effectively to senior management.

Essentially, the inability to communicate the strategic assumptions that are being made, and expose their risk, is THE reason for failure.

The collective failure of TSB managers and key stakeholders in the project to fully analyse the inherent risk of their assumptions is continuing to have a disastrous effect on the company’s public image. At the time of writing, TSB customers are still experiencing problems accessing and utilising digital services and Paul Pester is facing calls for his resignation following a second grilling from MPs last week.

Some commentators have argued that the traditional high street banks are too reluctant to face the problem of outdated IT systems head-on, as upgrading IT and digital banking services for the 21st century is perceived as too complex a task.

Suffice it to say, in a world where customers are demanding an increasingly individual, bespoke digital banking experience, banks which don’t innovate will be left behind by newer fintech and rising challenger brands. But innovation requires effective risk management.

The necessity of innovating banks’ IT platforms means that CEOs cannot afford to merely pay lip service to the idea of risk management. Operational resilience can only be secured if banks are willing to foster a culture where the communication of potential risk is encouraged, normalised and positive.