XLOD London 2024: What’s next for operational resilience?

Experts from Morgan Stanley, BNY, Nomura and First Abu Dhabi Bank discussed this key issue at the leading compliance industry event.

This session at the premium surveillance event for finance featured expert practitioner comment from: Sharon French, MD, global head of operational resilience, Morgan Stanley; Nick Fuller, global head of third party risk management, BNY; Leila Gomes, MD, EMEA head of business resilience, Nomura; and David Arnold, UK chief risk officer, First Abu Dhabi Bank.

The session started with an audience survey question – how prepared are you for the upcoming regulations (DORA and the UK version)? The answers were:

  • fully – already met the requirements – 2%;
  • mostly – well along the path – 72%;
  • somewhat – still significant work to do – 26%.

The moderator hoped that the majority of attendees were indeed well through the process. DORA adds complexity as it is much more prescriptive, and its requirements around critical third parties create greater dependency on others. With important January and March deadlines in relation to DORA, the challenge is to encourage better preparation, testing and stress scenarios so that resilience becomes a progressive discipline.

The panel emphasized that the industry is only at the start of this journey and, with DORA’s technical standards still unreleased, there is a lot more to do, especially once feedback from the NCAs and the ESAs is available. Clients, stakeholders and third parties are all waiting for these disclosures.

What good looks like

The panel opined that it is still not obvious what good looks like to regulators in respect of operational resilience. In the same vein, how are institutions adapting to an enterprise view of risk? What requires adaptation, and what does appropriate look like? So much depends on the size, maturity, risk culture and product mix at an institution, all of which play a part in the outcome.

Preparedness can look very different from one firm to another, but many firms have been working on resilience for a long time now. For many it is a case of embedding this – the topic is a common one at the top table now. Responsibility falls on the first line for a control approach, but they might not always have the subject matter expertise to cover this, so some support from the second line could help. Key risk indicators and reporting should bubble up from action points to the relevant committees. The chief risk officer, operational risk team and information security team (alongside IT) should all be sufficiently involved to result in proper verification.

The debate moved to whether existing structures are appropriate for a cross-functional approach. This requires a clear target operating model with well-defined responsibilities. The third-party element, particularly interconnectedness, is very important to consider. Many of the regulatory requirements here should feed into a single framework; the regulations have nuances but a great deal of commonality. This is not straightforward, but deadlines are imminent, and even after they have passed there needs to be continual change.

Many are now conducting exercises and testing with third parties to establish how they would react to and absorb any shocks or disruption. This includes working with the contracting party. Bringing any service in-house or on premises has to be an option, so planning to avoid a stressed exit could also be of benefit. It is key to work with vendors who are willing to work with you; this can be a cultural journey for them as well as for you. They are getting similar requests from other clients, so it is very important to outline what is critical and what is not.

The panel touched on IRQs and DDQs, emphasizing the need for third parties to be tested against these and to share the results. Services that are viewed as systemic need special attention. The goal is to identify the weakest link in a supply chain.

Testing is the backbone of any program as it exposes vulnerabilities so that these can be examined and, if required, remediated. There is no shame in a ‘rinse and repeat’ policy here. This can be backed up by a higher frequency and variety of tests with third parties. Discussion of impact tolerance alone is insufficient; technical testing leads to a clear resiliency posture. This goes beyond assumptions so that there is comfort that systems will stand up under stress.

Key stakeholders

Next came the best approach for getting key stakeholders to the table and achieving progress on robust operational resilience. Workshops are an excellent alternative to live testing, which is tough to set up. They can identify the major points of potential failure that lurk in infrastructure and with major cloud providers. This sort of engagement is an excellent starting point and will tease out the core issues.

The CrowdStrike failure was discussed, and the conclusion was that many had not predicted and prepared for what happened. The same could be said of the pandemic. A key requirement is a focus on response as well as recovery. Many tend to take a ‘happy path’ in their preparation and this often proves inadequate – more challenge is needed for worst-case scenarios.

The goal is to mitigate intolerable harm to customers, employees and the market. Adaptability reigns over a more defensive posture. In the CrowdStrike case, financial services firms were not that badly affected, as the mapping many had done was sufficient mitigation.

Quality communication is a vastly underappreciated tool in effective resilience. It needs to be front and center in the approach to resilience and understood as a vital policy. Prevention can be improved by using the intelligence and management information gleaned from your own incidents and testing. These can signpost your key risk indicators, and you can also test your actual recovery time (RTA) – how long it takes before your data is restored. Analyzing vendors, how you would be affected if you lost their service and how you would continue, is also of value. Being able to evidence your work to regulators at a granular level also helps gain their comfort.

The end of the session covered how best to test the resilience of large cloud providers. While they are arguably more resilient than most, an alternative route for porting data to another provider should be an option, though this is far from straightforward. Interoperability and substitutability are often analyzed in terms of vendor risk. Deliberate choices need to be made in the original architecture, and this starts with procurement, where modularity and equivalent services can be selected to provide diversity of location and reduce concentration risk. Redundancy and adaptability levers can then be pulled if there is the need. Embedding some of these attributes can significantly improve the operational resilience framework at its outset.

This summary is not a full transcription of the session, but contains the sense of it as interpreted and reported by the GRIP subject matter expert who attended, who is an ex-compliance officer and regulator.