For almost 100 years, industry has treated every hazardous situation as a potential accident. The principle is to eliminate dangerous situations in order to avoid serious injuries, because a combination of factors can turn a hazard into an accident. The same applies in cybersecurity: a major IT incident often has several contributing causes.
Since 1931, industry has used the Heinrich pyramid to improve safety and reduce serious accidents. Heinrich calculated the ratio of no-injury, minor-injury, and serious-injury accidents. His theory was that accidents result from human error: by identifying and correcting these errors, we reduce the number of accidents and therefore of serious injuries. Since then, personal protective equipment (PPE) has become mandatory.
Limits of the Heinrich pyramid
Over nearly 100 years, his theory has been criticized several times, notably by Deming, for whom the main cause of accidents is deficient processes. The exact ratios vary by industry, but the shape of the pyramid is what matters: many dangerous situations for few major injuries. Whether the errors are human or process-related, root cause analysis helps resolve and prevent risky situations.
One recent criticism is that reducing the base of the pyramid does not automatically reduce the top. Remember that what counted as a no-injury or minor-injury accident in 1931 is very different from what counts as one today. Most staff then worked in factories, whereas today most of us work in offices, under very different working conditions.
When you add paper cuts to the incident records, you artificially inflate the base of the pyramid. This type of incident should not trigger a full investigation and corrective action. The base should include only situations that did not cause serious injury but could have done so in a different context. If I cut myself on a sheet of paper, I run no risk of ending up in the hospital. If I trip over a cable or a cardboard box left lying around in an office, on the other hand, there is a risk of serious injury.
Using the pyramid for cybersecurity
Cybersecurity is a branch of security and can therefore benefit from this approach and its century of lessons. The pyramid concept is used in incident reporting, monitoring, and threat hunting. The principle is to look for problems, anomalies, or minor events that could lead to more serious incidents.
Select data
An interesting source of data is the problems reported by staff. These reports sit upstream of formal incident reporting, and the resulting tickets are useful for identifying risks and detecting weak signals. When someone loses a phone, for example, it is time to check whether you have measures in place to keep the loss from having more serious impacts.
Those responsible for dealing with day-to-day problems should always keep this pyramid in mind. Could this problem have more impact? Any problem that affects a single person or device, even if handled quickly, should be subject to root cause analysis. Why didn’t this computer have the latest updates? Why have the updates been delayed? And so on.
You can also include in the base of your pyramid the anomalies and minor events detected by network monitoring. The important thing is to focus on the events or anomalies that could lead to a major incident.
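As an illustration only, the selection rule described above can be sketched in a few lines of Python. The `Event` record, its field names, the `could_escalate` flag, and the sample events are all hypothetical; a real help desk or monitoring tool will expose its own fields, and the judgment about escalation potential remains a human one.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    source: str           # hypothetical: "helpdesk" or "network-monitoring"
    summary: str
    severity: str         # hypothetical labels: "anomaly", "minor", "major"
    could_escalate: bool  # analyst judgment: could this become a major incident?


def pyramid_base(events):
    """Keep only non-major events that could plausibly escalate."""
    return [e for e in events if e.severity != "major" and e.could_escalate]


def tier_counts(events):
    """Count events in each tier of the base."""
    counts = Counter(e.severity for e in events)
    return {tier: counts.get(tier, 0) for tier in ("anomaly", "minor")}


# Illustrative sample data, not real records.
events = [
    Event("helpdesk", "Laptop missed the last two patch cycles", "minor", True),
    Event("helpdesk", "User forgot a password", "minor", False),
    Event("network-monitoring", "Unusual outbound traffic at night", "anomaly", True),
    Event("helpdesk", "Lost phone without a screen lock", "minor", True),
]

base = pyramid_base(events)
print(tier_counts(base))
```

The forgotten password plays the role of the paper cut here: it is recorded, but it stays out of the base because nothing about it suggests a path to a major incident.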
Choose a methodology to document your work
The first step is to choose where to store your data and analysis. Your IT service management or help desk management solution is probably the best option. You can add an attribute to identify cases with major incident potential.
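A minimal sketch of what such an attribute could look like, assuming tickets can be exported as simple records. The `Ticket` class, the `major_incident_potential` field, and the ticket numbers are invented for the example and do not correspond to any particular service management product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    ticket_id: str
    summary: str
    major_incident_potential: bool = False  # the added attribute (hypothetical name)
    root_cause: Optional[str] = None        # filled in once the analysis is done


def needs_root_cause_analysis(tickets):
    """Return flagged tickets whose root cause has not been documented yet."""
    return [
        t for t in tickets
        if t.major_incident_potential and t.root_cause is None
    ]


# Illustrative sample data, not real tickets.
tickets = [
    Ticket("T-101", "Printer out of toner"),
    Ticket("T-102", "Workstation missing latest updates", major_incident_potential=True),
    Ticket("T-103", "Lost phone", major_incident_potential=True,
           root_cause="No enforced screen lock policy"),
]

pending = needs_root_cause_analysis(tickets)
print([t.ticket_id for t in pending])
```

Keeping the flag and the root cause on the ticket itself means the people who handle day-to-day problems can start the analysis without switching tools.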
Then, a problem-solving method such as 8D or an A3 report can help you address the root cause of these problems. Ideally, the people processing the tickets should be able to use these tools and start the analysis.
As with any process, a monthly review of escalated and corrected situations will help you confirm that you are on the right track. Some cases will be harder to resolve: you will need to conduct a more in-depth study, invest in improving certain aspects of your security, or both.
The objective is to improve your processes and ways of working. Because each major incident is caused by several factors, addressing those factors early limits the risk at a lower cost.
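The monthly follow-up can be as simple as counting, per month, how many cases were escalated and how many were corrected. The sketch below assumes each case is reduced to a pair of dates (when it was flagged and, if applicable, when it was corrected); the function name and the sample dates are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def monthly_summary(cases):
    """Count escalated and corrected cases per calendar month.

    cases: iterable of (flagged_on, corrected_on) pairs, where
    corrected_on is None while the case is still open.
    """
    summary = defaultdict(lambda: {"escalated": 0, "corrected": 0})
    for flagged_on, corrected_on in cases:
        summary[flagged_on.strftime("%Y-%m")]["escalated"] += 1
        if corrected_on is not None:
            summary[corrected_on.strftime("%Y-%m")]["corrected"] += 1
    return dict(sorted(summary.items()))


# Illustrative sample data, not real cases.
cases = [
    (date(2024, 1, 5), date(2024, 1, 20)),
    (date(2024, 1, 18), date(2024, 2, 2)),
    (date(2024, 2, 10), None),  # still open
]

report = monthly_summary(cases)
for month, counts in report.items():
    print(month, counts)
```

A month in which escalations keep outpacing corrections is a hint that some cases need the deeper study or investment mentioned above.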