HUMAN FACTORS Part III – The Evolution of Threat and Error Management

The two B737 Max air crashes that happened within five months of each other in 2018 and 2019 sent shock waves through aviation circles around the planet. As the causes of the crashes became evident, there was an overwhelming fear that aircraft design was once again threatening lives - something not seen since the 1960s.

That fear turned to disgust when it was revealed that Boeing had decided to keep secret from the pilots who would fly and manage the B737 Max that a new Maneuvering Characteristics Augmentation System (MCAS) had been installed in the aircraft type. It was also found that the FAA had approved Boeing’s request to remove any reference to it from the Pilot Training Manuals.

The first crash, Lion Air Flight 610, occurred in October 2018. All 189 passengers and crew died. Boeing responded by continuing to maintain secrecy about the new system and instead issued instructions telling pilots how to recover when a malfunction caused the aircraft to enter a series of automated nose dives.

In December 2018 (three months before the second crash attributed to this system), the FAA quietly determined that the MCAS system could cause 15 crashes over the next 30 years.

Apparently, the FAA were cool with that. They did nothing. The FAA were in cahoots with Boeing to maximise Boeing’s profits. A trust misplaced.

Considering each B737 Max could carry up to 230 passengers, those 15 potential crashes could kill close to 3,500 innocent crew and passengers.

Way back in the 1980s, in Emergency Procedures Training as I trained to be a Flight Attendant, we learnt the term for this behaviour - Blood Money. In the context of aviation it means that a certain number of people have to die before an aircraft manufacturer will invest in removing a threat to the lives of passengers and crew. Evidently, close to 3,500 dead people was below that number.

Because of the FAA’s misplaced trust in Boeing to do the right thing (which Boeing didn’t do), and because the aircraft was not grounded in a timely manner, a second crash occurred in March 2019.

This time, all 157 people on Ethiopian Airlines Flight 302 died - only six minutes after take-off. The tech crew attempted to carry out the remedial actions prescribed by Boeing after the first crash, but they did not get all the way through the checklist. Their problems started only 44 seconds after take-off and, even though they correctly recognised the situation and began to follow the required steps, they made two critical errors.

The Swiss cheese model was in effect and was demonstrated with devastating results.

Captain Bob Henderson discusses the model here.

The holes in the cheese…

Part II of this series explored the development of the Slips, Lapses and Mistakes model of human performance and the circumstances under which these errors could occur.

Professor James Reason was probably the first to suggest that no system is "perfect" or immune from slips, lapses or mistakes, and that trying to eliminate them (as the aviation sector was then trying to do) was folly, because we simply cannot imagine everything that could go wrong. Since humans are involved at all stages and levels, these errors will occur at all stages. The Swiss Cheese model, as he first presented it, showed the slices of cheese as lines of defence in an operation or system that is accepted as flawed: layers of checks within the system designed to identify and trap such flaws.

In an aviation system, for example, maintenance personnel are trained to use only tools from a tool board when working on an aircraft and to always put an ID tag on the board in the place the tool was taken from. If for some reason this system fails and the tool is not returned, there is a good chance the tool has been left on an aircraft, where a thorough pre-flight check by either the despatch engineers or the pilots may find it.

As the model matured, the idea of latent and active failures was introduced, with the final slice of cheese representing “active failures”. This final slice came to be known as the last line of defence. Subsequently, latent failures became latent factors, as they could not be considered failures until after the event.

The hypothesis was that, when a number of defences are breached and the holes line up, an accident occurs.
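To make the idea of the holes lining up concrete, here is a minimal Python sketch - a hypothetical illustration rather than anything from Reason's own work - that simulates a hazard passing through several imperfect layers of defence. The layer names and failure probabilities are invented purely for illustration.

    import random

    # Hypothetical layers of defence and the chance that each one, on its own,
    # fails to trap a hazard on a given flight (figures invented for illustration).
    DEFENCES = {
        "tool control board": 0.01,
        "engineer pre-flight check": 0.02,
        "pilot pre-flight check": 0.05,
    }

    def holes_line_up(defences):
        """Return True when every layer fails at once - the holes lining up."""
        return all(random.random() < p_fail for p_fail in defences.values())

    def estimate_accident_rate(flights=1_000_000):
        """Estimate how often a hazard slips through every slice of cheese."""
        accidents = sum(holes_line_up(DEFENCES) for _ in range(flights))
        return accidents / flights

    print(f"Estimated accident rate: {estimate_accident_rate():.8f} per flight")

With these invented figures each layer fails fairly often on its own, but a hazard only becomes an accident when all three fail together - roughly one flight in 100,000 - which is why layered, imperfect defences, rather than any single perfect one, are what keep the system safe.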

Jim Reason did not draw the Swiss cheese model; that pictorial depiction of his theory was created elsewhere – and it has proved very sticky.

The latent factors are environmental, organisational and structural factors within the system. They could, for example, involve an error or omission in a training procedure that only becomes apparent when combined with a failure. The Three Mile Island nuclear accident is a prime example of latent factors enabling a failure pathway: warning flags showing a closed valve in the feedwater system were obscured by a maintenance tag; an indicator light for the relief valve showed only whether the valve was powered, rather than its actual position; high pipe temperatures were considered normal; and training was conducted in classrooms rather than in situ.

This focused a great deal of attention on the layers of latent issues, which collectively became the systemic issues underlying an accident - or, at least, the factors increasing the risk involved in any activity.

In 1994 a research programme was undertaken jointly by Delta Air Lines and the University of Texas Human Factors group to look at errors on the flight deck. This was the beginning of the Line Operations Safety Audit (LOSA). The audit was subsequently extended to incorporate decision-making as well as the recording of errors and error management. The research was headed by Professor Bob Helmreich and run by Dr James Klinect.

This paved the way for the development of the concept of Threat and Error Management (TEM). There are three basic components in the TEM framework - threats, errors and undesired (aircraft) states - and, from the perspective of their different users, each has a slightly different definition.

TEM built on existing theory by acknowledging that threats (latent factors) and errors (slips, lapses and mistakes) are part of everyday aviation operations and must be managed by aviation professionals, since both threats and errors carry the potential to generate undesired states. Undesired states, in turn, carry the potential for unsafe outcomes (an incident, harm to people or equipment, or an accident).

Undesired state management is an essential component of the TEM framework and is as important as threat and error management. Undesired state management largely represents the last opportunity to avoid an unsafe outcome and thus maintain safety margins in aviation operations.

 

TEM, therefore, is composed of:

  • Threats - generally defined as events or errors that occur beyond the influence of the line personnel, increase operational complexity, and which must be managed to maintain the margins of safety.

  • Errors - generally defined as actions or inactions by the line personnel that lead to deviations from organisational or operational intentions or expectations. Unmanaged and/or mismanaged errors frequently lead to undesired states. Errors in the operational context thus tend to reduce the margins of safety and increase the probability of an undesirable event.

  • Undesired states - generally defined as operational conditions where an unintended situation results in a reduction in margins of safety. Undesired states that result from ineffective threat and/or error management may lead to compromised situations and reduce the margins of safety in aviation operations. They are often considered the last stage before an incident or accident.

Note: “Line personnel”, as the phrase is used here, means air traffic controllers or flight crew.
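For readers who like to see the relationships spelt out, here is one way to express the three TEM components as simple Python data structures. This is purely an illustrative sketch - the class and field names are invented for this example and are not part of the formal ICAO framework.

    from dataclasses import dataclass, field
    from enum import Enum, auto

    class Outcome(Enum):
        """Possible end states once an undesired state exists."""
        RECOVERED = auto()   # safety margins restored by the crew or controllers
        INCIDENT = auto()
        ACCIDENT = auto()

    @dataclass
    class Threat:
        """An event beyond the line personnel's influence that adds complexity."""
        description: str
        managed: bool = False   # True once the crew or controllers have contained it

    @dataclass
    class Error:
        """An action or inaction by line personnel that deviates from intentions."""
        description: str
        managed: bool = False

    @dataclass
    class UndesiredState:
        """A reduction in safety margins arising from unmanaged threats or errors."""
        description: str
        threats: list = field(default_factory=list)
        errors: list = field(default_factory=list)

        def manage(self, recovered):
            # Undesired state management is the last opportunity to avoid an
            # unsafe outcome; for illustration, an unrecovered state is simply
            # recorded as an incident (in reality it could equally be an accident).
            return Outcome.RECOVERED if recovered else Outcome.INCIDENT

    # Example: an unexpected tailwind (threat) and a late flap selection (error)
    # combine into an unstable approach (undesired state) that the crew recovers.
    state = UndesiredState(
        "unstable approach",
        threats=[Threat("unexpected tailwind on final")],
        errors=[Error("late flap selection")],
    )
    print(state.manage(recovered=True))   # Outcome.RECOVERED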

The capacity and capability of line personnel, and especially pilots, to achieve and maintain competent TEM can be assessed through LOSA, which, to be effective, requires confidentiality of the findings and no regulatory or organisational jeopardy to the participants.

The real value of the TEM framework has been in informing:

  • safety analyses, whether of a single event or of systemic patterns within a large set of events,

  • licensing requirements, helping to clarify human performance needs, strengths and vulnerabilities, and thus allowing competencies to be defined from a broader safety management perspective, and

  • on-the-job training and guidance, helping an organisation develop its training requirements and improve the effectiveness of its training interventions and, consequently, of its organisational safeguards.
