In July, a software update from the cybersecurity firm CrowdStrike caused Microsoft Windows operating systems to crash.
The US Government Accountability Office called this event “potentially the largest IT outage in history.”
The broad-based impact generated by the incident heightens the need to develop an effective risk management process to combat systemic cyber risks.
The CrowdStrike incident is a completely different genre of cyber event from the more conventional attacks on specific entities such as Target, Equifax or the SEC. This incident didn’t impact a particular targeted entity; it impacted an entire digital eco-system.
These systemic cyber events are growing in frequency and are potentially far more damaging than traditional entity attacks. They also raise unique and difficult questions for policy makers, questions that attacks on specific entities do not.
The most common question raised in the immediate aftermath of the CrowdStrike incident is “how did this happen?” However, a more important question is “why did this happen?”
Some will argue the answer to the “how” question is obvious: CrowdStrike screwed up. Either they didn’t test their fix properly before deploying it or they didn’t train their people adequately on how to deploy the suspect fix. Investigations will determine if there is a factual basis for such assertions. However, even if these allegations prove correct, they are only proximate causes of the event.
The ultimate, or real, cause of the CrowdStrike event, as with other systemic events like SolarWinds and NotPetya, is that our vastly integrated digital systems are subject to constant and ever more sophisticated disruptions, whether from attacks or, as in the CrowdStrike instance, from a faulty effort to prevent attacks.
As a result, we need to constantly and urgently modernize and update our technical and governance systems to keep up with the increased risk from systemic events.
Moreover, adapting traditional entity-oriented policies, such as increasing liability, would be difficult to implement and could actually be counterproductive. In an environment of constant and evolving attack methods, we absolutely need our best technical minds to be continually creating and deploying prevention and mitigation technologies.
Enhancing, or even threatening to enhance, liability on these companies will create disincentives for the needed development and deployment of pro-security methods.
Addressing the immediate need to deal with current systemic risk requires a risk management process that will enable a pro-active system to empirically identify areas of systemic risk and a collaborative process to better protect our digital eco-system.
One approach to enhancing security from systemic risk, while simultaneously not discouraging innovation and investment in cutting edge technology, would be built around dominant market penetration of key elements in the cyber eco-system.
Previous research has found that there are a comparatively small number of elements in our cyber eco-system characterized by massive market dominance, with one to three companies holding between 70 and 100% of the market.
This dominance includes areas such as desktop and mobile operating systems, web server software and operating systems, and electronic medical records.
By focusing on the identifiable elements of the system whose market penetration creates the greatest risk for systemic events, we can address our largest systemic threats through a risk management system. In such a process, when a portion of the system reaches a dangerous degree of market penetration, the company owning the technology would be required to report this to the federal government.
This would lead to collaboration between the company that developed the product and the appropriate government agency to design additional security measures for that element, which, due to its success in the market, is now a more attractive target for attackers and hence a systemic risk.
For example, a configuration change to the Orion software from SolarWinds may well have mitigated a great deal of the risk that led to the widespread and very damaging SolarWinds attack. Similarly, CrowdStrike, in its Preliminary Post Incident Review, identified three different processes that could have been used to limit the impact of its faulty update process.
It is a practical impossibility to expect IT vendors to completely lock down every element of every system. However, by pro-actively focusing on those elements that create the greatest risk we can, fairly quickly, mitigate a substantial portion of our current systemic risk. Of course, this does not solve our systemic risk problems; that will require far broader and more time-consuming policies. However, it does create a practical way to address the issue through a standard risk management process.
Since this process calls for unique engagement between the public and private sectors in the interest of preventing systemic impact, the process ought also to devise appropriate compensation for the company to make upgrades that are not justified by market conditions but are justified on a public policy basis.
This narrowly targeted approach will enhance protection from systemic events while maintaining the market incentives required to continually develop and deploy state-of-the-art security technologies.
It is important to appreciate that the usual reason a particular product reaches a significant state of market penetration is that the product is so good. The reason the CrowdStrike event had such widespread impact was that the market, including the US federal government, had judged the CrowdStrike service to be state-of-the-art (note: ISA has no affiliation with CrowdStrike).
We need to move past the adversarial mechanisms that have characterized most government responses to cyber threats and move toward a digital era of pro-active and collaborative structures that create a sustainably secure cyber eco-system.
In creating a new, more effective and sustainable system, several issues must be analyzed within the new systemic framework. Among the issues that need further investigation are:
- What sort of incentive structure is needed to encourage investment in technologies and techniques to prevent systemic risk that may not be supported by market mechanisms?
- What incentive structures are needed to ensure that governments and private companies adopt proven, effective practices to mitigate systemic cyber risk?
- How can system users impacted by a systemic event be made whole for the damages they suffer?
- Does cyber insurance need to be modified to make it a viable method for the transfer of systemic risk?
- How can policies such as secure by design and default be adapted to help mitigate systemic cyber risk?
- How does the advent of emergent technologies such as AI and quantum computing impact the threat of systemic risk?
- How can we discourage over-broad applications of counterproductive measures (e.g. liability) that create a disincentive for advanced security products?
- What reforms to the government-industry partnership structure are needed to enhance security from systemic events?