Abstract
Objective:
The study contributes to human reliability analysis (HRA) by proposing a method that focuses more on human error causality within a sociotechnical system, illustrating its rationality and feasibility by using a case of the Minuteman (MM) III missile accident.
Background:
Due to the complexity and dynamics within a sociotechnical system, previous analyses of accidents involving human and organizational factors clearly demonstrated that the methods using a sequential accident model are inadequate to analyze human error within a sociotechnical system.
Methods:
Systems-theoretic accident model and processes (STAMP) was used to develop a universal framework of human error causal analysis. To elaborate the causal relationships and demonstrate the dynamics of human error, system dynamics (SD) modeling was conducted based on the framework.
Results:
A total of 41 contributing factors, categorized into four types of human error, were identified through the STAMP-based analysis. All factors are related to a broad view of sociotechnical systems and are more comprehensive than the causation presented in the official accident investigation report. Recommendations regarding both technical and managerial improvements to lower the risk of the accident are proposed.
Conclusion:
An interdisciplinary approach provides complementary support between system safety and human factors. The integrated method based on STAMP and an SD model contributes effectively to HRA.
Application:
The proposed method will be beneficial to HRA, risk assessment, and control of the MM III operating process, as well as other sociotechnical systems.
In the operating process of sociotechnical systems (such as aviation, marine, military, and nuclear industry), a large proportion of accidents are human-related. Griffith and Mahadevan (2011) stated that human error has been implicated in 30% to 90% of all industrial accidents. Human reliability analysis (HRA) deals with evaluating the risks caused by human error (Boring & Bye, 2008; Boring, Oxstrand, & Hildebrandt, 2009; Ung & Shen, 2011). In general, HRA has three fundamental functions (Ung & Shen, 2011): the identification of human error (Antonovsky, Pollock, & Straker, 2010; O’Connor, O’Dea, & Melton, 2007), the prediction of the associated likelihood (Gertman, Blackman, Marble, Byers, & Smith, 2005; van de Merwe, Øie, & Gould, 2012), and the reduction of that likelihood if required (Boring, 2009; Sheridan, 2008). There are numerous HRA methods that provide procedures for these fundamental functions, such as the technique for human error rate prediction (Vaurio, 2001), the human error assessment and reduction technique (Williams, 1988), and others. Although these HRA methods have solved a large number of problems in some areas, they have limitations when analyzing causality within a sociotechnical system (Paletz, Bearman, Orasanu, & Holbrook, 2009; Strauch, 2010). One major limitation is that few existing models attempt to provide a causal picture of human errors (Groth & Mosleh, 2012). Another is that most HRA methods are static analyses, in which the calculated likelihood of human error (LHE) is constant under the given conditions. However, when the dynamic behaviors within a sociotechnical system are considered, the LHE is not necessarily stable but may vary enormously with social and technical factors such as management, personal moral state, and so on.
To gain a better understanding and a comprehensive awareness of human error in sociotechnical systems, it is important to use a systemic approach that considers all the aspects that may lead to hazardous events. Systems-theoretic accident model and processes (STAMP; Leveson, 2011) is a typical systemic model. STAMP provides an approach to analyzing human error, equipment failure, and other factors that may cause accidents, and it is particularly appropriate for complex sociotechnical systems.
System Dynamics (SD) deals with the time-dependent behavior of managed systems, with the aim of describing the system and understanding, through qualitative and quantitative models, how information feedback governs its behavior, and designing robust information feedback structures and control policies through simulation and optimization. (Coyle, 1996, p. 10)
When analyzing human factors in accidents, SD can be taken as a means to depict the system behaviors, including the interactions among human error and other contributing factors, whereas the mechanism within the system behaviors is modeled with STAMP.
This paper aims to propose an HRA method that highlights causality within a sociotechnical system. The method is intended to accommodate modeling of all the factors that contribute to human-related accidents, including the performance shaping factors (PSFs) that should be considered in a sociotechnical system. First, the STAMP approach is used to form a generic framework of human error causal analysis. Next, to model and analyze the behavior dynamics of human error, the SD model is established based on the results of the STAMP-based analysis. Finally, recommendations on risk mitigation are proposed to support effective control of human error.
Method
STAMP was proposed based on the cybernetics of systems theory (Leveson, 2011). In STAMP, the most basic concept is not “event,” but “constraint,” which means that safety becomes a control problem where the goal of the control is to effectively enforce the safety constraints. The cause of an accident, instead of being understood as a series of events in the sequential accident models (e.g., domino [Darbra, Palacios, & Casal, 2010], Swiss cheese [Reason, Hollnagel, & Paries, 2006], etc.), is viewed as the result of inadequate control or enforcement of the safety-related constraints during system development and operation (Leveson, 2011). The sociotechnical system is accordingly viewed as a hierarchical control structure, where the controllers at each level impose constraints on the activities at the lower level(s).
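The hierarchical view can be made concrete with a small data structure. The sketch below is only an illustration and is not part of STAMP itself: the `Controller` class, the `unenforced` helper, and the controller and constraint names are hypothetical. It models each controller as imposing constraints on the level below and lists the constraints whose enforcement is inadequate, which is where STAMP locates the causes of an accident.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Controller:
    """One level of a hierarchical safety control structure (illustrative)."""
    name: str
    # Safety constraint imposed on the level below -> is it currently enforced?
    constraints: dict[str, bool] = field(default_factory=dict)
    subordinates: list[Controller] = field(default_factory=list)


def unenforced(controller, path=()):
    """Walk the hierarchy top-down and yield (path, constraint) pairs
    for every constraint that is not adequately enforced."""
    here = path + (controller.name,)
    for constraint, enforced in controller.constraints.items():
        if not enforced:
            yield " > ".join(here), constraint
    for sub in controller.subordinates:
        yield from unenforced(sub, here)


# Hypothetical three-level structure; names and constraints are assumptions.
operator = Controller("operator", {"keep flammable material away from equipment": False})
supervisor = Controller("supervisor", {"inspect the site periodically": True}, [operator])
manager = Controller("manager", {"provide task training": False}, [supervisor])

findings = list(unenforced(manager))
for where, constraint in findings:
    print(f"{where}: {constraint}")
```

In this representation a failed component or an erroneous action appears as an unenforced constraint at some level, and the path shows which higher levels were responsible for controlling it.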
To depict the dynamic processes responsible for the changes in the system, and to build the safety control structure with STAMP, SD modeling is adopted as an effective tool. Causal loop diagrams (CLDs) and stock and flow diagrams (SFDs; Dangelicon et al., 2010) are commonly used in SD modeling to illustrate the interacting structure within the system under investigation. Causal loop diagramming, as an inherent feature of SD, is a qualitative technique used to model the real world (Sterman, 2000). A CLD seeks to highlight complex interactions and feedbacks between variables, where causes and effects are often indiscernible (Goh, Love, Stagbouer, & Annesley, 2012). If quantitative SD analysis is required, the CLD can then be converted into an SFD, provided that the underlying mathematical relationships between the variables are set. It is important to note that the data collected should be sufficient to validate the SFD model and verify its output.
Based on STAMP and SD, a formalized framework is developed, as shown in Figure 1, to conduct HRA with an accident taken as a case. The framework consists of the following six components, which are cascaded to show how data extracted from the real world lead, through STAMP-based HRA thinking, to actions that in turn make the real world safer.

Figure 1. The STAMP-based HRA framework.
Component A: Description of the Human-Related Accident Timeline and Background
For the human-related accident to be analyzed, the timeline needs to be identified first so that the operating process involved can be understood. To identify the potential human error, the human activities should be highlighted along the timeline. In addition, as much information associated with the accident as possible should be collected. Both efforts are necessary to supply the following components of STAMP-based HRA with supporting information.
Component B: Description of the Accident Causes Identified in the Official Accident Report
Abundant information about accident causes can be extracted from the official accident report to facilitate the understanding of how human error and other risk aspects can be mitigated. However, such insights are more valuable if they are obtained before accidents occur rather than afterward. With the aim of comprehensively understanding the causality, and thus having the foresight to prepare technologically in case of accidents, a broad view of accident mechanisms can be analyzed based on the data from the official accident report.
Component C: Modeling of the Sociotechnical System With STAMP to Describe the Mechanism Within the System Behavior
Step C-a: Identify human errors and all the human-related contributing factors in the accident
First, human errors and all the human-related contributing factors must be identified. Systematic analysis based on STAMP is needed because people are facing increasingly complex accident processes where the factors contributing to human error often exist in almost all the aspects of a sociotechnical system. Leveson (2011) provides a generalized accident analysis process called CAST (causal analysis based on STAMP), from which some core steps are borrowed and adopted with some degree of modification to analyze human-related accidents. Step C-a is described in Table 1, where Steps 1 and 3 are borrowed from CAST (Leveson, 2011).
Table 1. Step C-a of STAMP-Based HRA
Step C-b: Analyze the causal relationships among human errors and human-related contributing factors, and identify the PSFs
The causal relationships include two parts: the effects between human errors and human-related contributing factors, and the effects among the contributing factors alone. The former is usually analyzed prior to the latter. According to the degree of influence of the human-related contributing factors on human errors, the identified contributing factors are then classified into two types: direct influencing factors and indirect ones. Feedback is one of the basic concepts in STAMP and plays an important role when safety is treated as a control problem. Human errors have feedback effects on other social and technical factors such as management, personnel, and so on, and all of these feedback effects should be identified as well. When determining PSFs, the direct human-related contributing factors are considered. Gertman et al. (2005) indicate that the ideal mean LHE can be taken as a function of PSF influence and that the LHE and the PSFs are positively correlated; that is, the LHE increases as the negative influence of a PSF grows.
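One way to express the LHE as a function of PSF influence is the multiplier form used in the SPAR-H method documented by Gertman et al. (2005): a nominal error probability is scaled by one multiplier per PSF, and an adjustment keeps the result below 1 when three or more PSFs are negative. The sketch below assumes that form. The function name `lhe`, the nominal value, and the multipliers are illustrative assumptions, not values from this study.

```python
from math import prod


def lhe(nominal, multipliers):
    """Likelihood of human error from a nominal value and PSF multipliers.

    A multiplier above 1 marks a negative PSF influence. With three or more
    negative PSFs the SPAR-H adjustment factor is applied so that the
    result stays within [0, 1].
    """
    composite = prod(multipliers)
    negative = sum(1 for m in multipliers if m > 1.0)
    if negative >= 3:
        return nominal * composite / (nominal * (composite - 1.0) + 1.0)
    return min(1.0, nominal * composite)


# Assumed nominal value and multipliers for four PSFs (illustrative only).
NOMINAL = 0.001
baseline = lhe(NOMINAL, [1, 1, 1, 1])   # all PSFs nominal
degraded = lhe(NOMINAL, [2, 1, 1, 1])   # one PSF has a negative influence
severe = lhe(NOMINAL, [10, 10, 5, 1])   # three PSFs are strongly negative

print(baseline, degraded, round(severe, 4))
```

The three calls show the positive correlation described above: the LHE grows as more PSFs take on a negative influence, while the adjustment prevents the product from exceeding a probability of 1.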
Component D: System Dynamics Modeling
Step D-a: Determine the CLDs
The causal relationships identified in Component C are used to construct the CLD. For each pair of a causing factor and the factor it affects, the polarity of the link is determined; the causal loops and their types (reinforcing or balancing) are then identified. Time lags are marked based on the time-delay-related contributing factors identified in Component C and on the objective control or feedback delays between the higher and lower levels of the control structure.
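Determining loop types from link polarities can be sketched as follows. A CLD is treated as a signed directed graph; a loop is reinforcing when the product of its link signs is positive and balancing when it is negative. The variable names and the links in the example are a simplified assumption made for illustration, not the full diagram of this study, and `find_loops` and `polarity` are hypothetical helpers suited only to small diagrams.

```python
def find_loops(links):
    """Enumerate the simple cycles of a signed digraph {(cause, effect): sign}.

    Each cycle is reported once, starting from its alphabetically smallest node.
    """
    graph = {}
    for src, dst in links:
        graph.setdefault(src, []).append(dst)
    loops = []

    def walk(start, node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                loops.append(list(path))
            elif nxt > start and nxt not in path:
                walk(start, nxt, path + [nxt])

    for start in sorted(graph):
        walk(start, start, [start])
    return loops


def polarity(loop, links):
    """Classify a loop by the product of its link signs."""
    sign = 1
    for src, dst in zip(loop, loop[1:] + loop[:1]):
        sign *= links[(src, dst)]
    return "reinforcing" if sign > 0 else "balancing"


# Simplified, assumed links: +1 means the effect moves with the cause,
# -1 means it moves in the opposite direction.
links = {
    ("LHE", "safety commitment"): +1,
    ("safety commitment", "training"): +1,
    ("training", "experience"): +1,
    ("experience", "LHE"): -1,
    ("LHE", "launch commitment"): +1,
    ("launch commitment", "safety commitment"): -1,
}

loops = find_loops(links)
for loop in loops:
    print(" -> ".join(loop), ":", polarity(loop, links))
```

Time lags are not modeled here; they could be stored as an extra attribute on each link and carried through to a later quantitative model.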
Step D-b: Discuss the dynamics in system behavior and human error
Loop dominance is an important concept in SD as a shift in loop dominance is responsible for most of the nonlinear behavior of complex systems (Goh, Love, Brown, & Spickett, 2012; Richardson, 1995). It is adopted here to analyze the possible dynamics in system behavior and human error by analyzing the migration from the favorable state (safety) to an accident. The potential change from a dominant loop into another is discussed to identify the key factors and effects during the migration, which will consequently be given a higher priority to consider when making recommendations on risk mitigations.
Component E: Comparison of C With B, and of D With B
The analysis results of Components C and D are compared with the investigation and analysis in the official accident report (Component B) to determine whether the approach proposed in this paper yields more constructive information (i.e., a more elaborate or comprehensive description of the accident scenarios) than the report does. The comparison assists in comprehensively understanding the mechanism of the accident and human error, as well as the dynamics of the system state migration, so as to prevent future accidents.
Component F: Actions to Control Human Error and Improve Safety
Feasible risk mitigation recommendations are proposed to control human error and improve safety.
Minuteman (MM) III Case Study
The application of the proposed method is illustrated in this section by taking a human-related fire accident during the MM III operating process as a case.
Component A: Description of the Human-Related Accident Timeline and Background
On May 23, 2008, a fire broke out in the MM III launch facility (LF) A06, located near F. E. Warren Air Force Base (AFB), Wyoming. Fortunately, the fire did not damage the missile. According to the official accident report (Walker, 2008), the fire caused an estimated $1,029,855.77 in damage. The direct cause of the accident was a loose connection on the capacitor C101A of the battery charger inside A06’s launcher equipment room (LER), but human error also played an important role. Table 2, refined from the accident report (Walker, 2008), shows the timeline of the human-involved situations and other factors.
Table 2. Timeline of the MM III Accident
Source. Adapted from Walker (2008).
Component B: Description of the Accident Causes Identified in the Official Accident Report
In the Statement of Opinion section of the accident report, Walker (2008) stated that “the loose connection was most likely caused by the failure of the technician who installed the capacitor to securely fasten the nut,” and concluded that the following factors substantially contributed to the accident (Table 3).
Table 3. Contributing Factors to the MM III Accident
Source. Adapted from Walker (2008).
Walker (2008) presented a reasonable explanation for the accident and expounded the causation, especially concerning the component failure and human error in the operating and maintenance process. However, building on the views of Walker (2008), further analysis can be performed by looking into the causality among the technical, organizational, and managerial factors.
Component C: Modeling of the Sociotechnical System With STAMP to Describe the Mechanism Within the System Behavior
Step C-a: Identify human errors and all the human-related contributing factors in the accident
The analyses for the steps in Table 1 are described below. Step 1: For the hazard of the fire, the safety constraint at the system level can be defined as follows: to avoid the concurrence and the interaction of flammable substances, oxidizer, and ignition source.
As implied in the accident description, the hierarchical control structure to enforce the safety constraint is shown on the left side of Figure 2. All the controllers in the hierarchical structure, including the operators and managers, play roles in enforcing the system-level constraint to prevent the fire accident. The specific safety-related requirements and constraints for each controller are shown on the right side of Figure 2.

Figure 2. The overall safety control structure and the safety constraints.
Step 2: The control structure in the operating process at the time of the accident is shown in Figure 3, where dotted lines indicate ineffective or missing controllers and control/feedback channels. The causal analysis results are shown in Table 4, including each controller’s behaviors or states that could lead to the accident, the classification of the contributing factors identified, and the basis for judgment as well as assumptions. The identified contributing factors in the operating process are numbered by OP (short for operating process).

Figure 3. The control structure in the operating process at the time of the accident.
Table 4. The Causal Analysis Results Based on STAMP for the Operating Process
Step 3: The expanded insight into the accident causes based on the views shown in the accident report is obtained in this step. The analysis results are summarized in Figure 4 and Table 5. Similarly to Step 2, the identified contributing factors at the higher levels of the control structure are numbered by MD (short for management department).

Figure 4. The control structure analysis toward the controllers at the higher levels.
Table 5. The Causal Analysis Results Based on STAMP for the Higher Levels of the Control Structure
Step 4: Based on the contributing factors identified in Tables 4 and 5, all the human errors in the operating and maintenance process, which are numbered by HE, are summarized in Table 6.
Table 6. Human Errors in the MM III Accident
Considering the complex causal relationships among the contributing factors and human errors, all the identified human-related contributing factors are categorized according to their relationships with human error and shown in Table 7 to facilitate the analysis in Step C-b.
Table 7. The Categorization of the Identified Human-Related Contributing Factors
Step C-b: Analyze the causal relationships among human errors and human-related contributing factors, and identify the PSFs
The analysis results of the effects among human error and human-related contributing factors are listed in Table 8. According to the categorization of identified human-related contributing factors, the analysis is elaborated as follows.
Table 8. The Summary of the Identified Causal Relationships
Equipment technology level (Effect 1)
The inadequate human-related control (HE1, HE2) in the operating process was exacerbated by the degraded equipment technology level (OP1.4). Because the supporting site equipment in the LF was still that of the older Minuteman II (“Explore the US ICBM Network,” n.d.), the equipment technology level had, according to reliability theory, been declining year by year since the MM II version was completed in 1967. The false alarms caused by the aging equipment slackened the vigilance of the operators and could increase the occurrence of human error.
Experience (Effect 2)
The MCC dealt with GMRs on the basis of obsolete experience (OP2.4), which was related to HE1. The MCC also allowed flammable substances to be present on the basis of obsolete experience, which was related to HE2. Because there was no training task for the battery charger modification (MD2.7), the 582 MMXS performed the modification on the basis of obsolete experience, which was related to HE4. Personal experience is built up through accumulated training and work, and it has a direct effect on human error.
Task pressure (Effect 3)
The resource or financial pressure on supervision and management of the equipment (MD1.5) may cause site supervision to be missed or wrongly implemented because of a shortage of personnel or resources. This has a direct effect on HE2. The task pressure on supervision and management of the equipment (MD2.8) may delay or halt equipment improvement, such as replacing the shotgun or duct tape with nonflammable materials, and it has a direct effect on HE4. From the viewpoint of human error, task pressure is related to mandatory tasks and available human resources.
Personal commitment to safety (Effect 4)
The inadequate control action of the MCC (HE1) and the delayed control action of the MMOC (HE3) reflected a weak personal commitment to safety in both mindset and actions (OP2.3 and OP3.2). This weak commitment was exacerbated by the long period without any accident, which meant a high risk. Hence, personal commitment to safety has a direct effect on human error, and it is influenced by the safety culture associated with management commitment to safety.
Management commitment to safety (from Effect 5 to Effect 10)
Management commitment to safety has an indirect influence on human error. The contributing factors related to training, task planning, and safety culture at the higher levels of the control structure indicate that managers paid insufficient attention to safety. The flawed management of training and task planning affects personal experience and task pressure. The weak safety culture influences personal commitment to safety, which in turn influences human error directly.
Management commitment to launching capability (Effect 11)
The main objective of a missile wing is to achieve a more powerful launching capability. This objective has no apparent direct influence on human error, but it affects management commitment to safety. Due to the pressure from national strategies and the commitment to launching capability, most managers tend to regard launching capability as more important than safety (MD4.4, MD4.5, MD4.10).
Human error feedback (1, 2, 3)
Based on the safety control structure, managers should exercise control according to the information or reports obtained from feedback channels. When the focus is on human error, the LHE can be viewed as a type of report that measures system risk. When managers detect human error, they respond to it, just as they respond to the incident rate used as a risk measurement in the literature (Mohaghegh, Kazemi, & Mosleh, 2009; Ouyang, Hong, Yu, & Fei, 2010; Zhang & Li, 2009). Hence, the feedback effects are reflected in management commitment to safety (human error feedback 2) and to launching capability (human error feedback 3). When personnel in the operating process become aware of their errors, their personal commitment to safety increases and their behavior improves; thus, the other feedback effect of human error (i.e., human error feedback 1) acts on personal commitment to safety.
In summary, equipment technology level, experience, task pressure, and personal commitment to safety have direct influences on human error, and they should be identified as PSFs, whereas management commitment to safety and management commitment to launching capability have indirect ones.
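The split into direct factors (the PSFs) and indirect factors follows mechanically from the causal links. The sketch below condenses the effects and feedbacks described in this step into a list of (cause, effect) pairs and classifies each factor by whether it links to human error directly or only through other factors. The condensed link list and the `classify` helper are assumptions made for illustration; Table 8 of the study is more detailed.

```python
HUMAN_ERROR = "human error"

# Condensed (cause, effect) links; an assumption based on the text above.
effects = [
    ("equipment technology level", HUMAN_ERROR),
    ("experience", HUMAN_ERROR),
    ("task pressure", HUMAN_ERROR),
    ("personal commitment to safety", HUMAN_ERROR),
    ("management commitment to safety", "experience"),
    ("management commitment to safety", "task pressure"),
    ("management commitment to safety", "personal commitment to safety"),
    ("management commitment to launching capability", "management commitment to safety"),
    # Feedback effects of human error on other factors.
    (HUMAN_ERROR, "personal commitment to safety"),
    (HUMAN_ERROR, "management commitment to safety"),
    (HUMAN_ERROR, "management commitment to launching capability"),
]


def classify(effects, target=HUMAN_ERROR):
    """Return (direct, indirect) sets of factors influencing the target."""
    direct = {cause for cause, effect in effects if effect == target}
    indirect, frontier = set(), set(direct)
    while frontier:
        # Step one link upstream, skipping factors that are already classified.
        upstream = {cause for cause, effect in effects if effect in frontier}
        frontier = upstream - direct - indirect - {target}
        indirect |= frontier
    return direct, indirect


direct, indirect = classify(effects)
print("PSFs (direct):", sorted(direct))
print("Indirect:", sorted(indirect))
```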
Component D: System Dynamics Modeling
Step D-a: Determine the CLDs
All the causal relationships among human errors and human-related contributing factors identified in Component C are combined with the basic structure of CLD (Figure 5), and thus the causal relationships in Figure 5 are essentially based on the control and feedback among controllers or levels in the control structure. In the form of causal loops, all the relationships in the operation system involved in the MM III accident are described; also, the development mechanism of human error identified by means of STAMP-based analysis is presented.

Figure 5. The causal loop diagram (overview) of the MM III operation system.
Loop B1 represents the dynamics of the training efforts made by managers at the higher levels of the control structure to accumulate personal experience and control human error. Similarly, Loops B2 and B3 represent the task planning efforts to relieve personal pressure and the safety improvement efforts to enhance personal awareness, respectively. Although the three loops appear helpful in reducing the LHE, it is critical to note that there tend to be time lags between the strategy of management commitment to safety and its implementation (Effects 8, 9, 10), as well as between the LHE and the higher level managers (MD 4.6). In practice, this means that it takes longer to rectify human error through careful study, consideration, and assignment of orders at the high levels of the control structure. This delay therefore encourages personnel to make immediate “knee-jerk” changes at the operating process level. Loop B4 represents these dynamics of personal efforts on the system, in which the knee-jerk reactions help to control human error without delay.
Loop R1 represents the military objective of launching capability pursued by the higher level managers in the control structure. This loop is significant because it shows that human behaviors in the operating process can be affected negatively, after a time delay, when the managers at the higher levels hold different opinions and need time for negotiation. More specifically, management commitment to safety is weakened when management commitment to launching capability dominates because of an increasing level of human error or a degrading equipment technology level; accordingly, the safety policies are not implemented effectively through Effects 8, 9, and 10, and the uncontrolled LHE then further reinforces management commitment to launching capability.
Step D-b: Discuss the dynamics in system behavior and human error
As discussed in Step C-a, the control flaws in the control structure reflect the violation of the safety constraints at the time of the accident. Before the accident, the system state had been migrating toward increasing risk over time. Two states in the migration, that is, the state in control and the state out of control, are assumed here to discuss the possible dynamics in system behavior and human error.
In the former state, the effects of the four PSFs on human behavior are appropriate, leading to a lower LHE. All the causal loops tend to be relatively stable, and no loop dominates. However, the equipment technology level degrades over time because of the lack of a timely equipment improvement management plan (MD 2.4). If the degradation is serious enough to increase the LHE, the human-related risk tends to be controlled first by personal knee-jerk efforts (Loop B4 dominates).
After the time delay discussed in Step D-a, the long-term safety activity plans from the higher level managers are implemented (Loop B1, B2, or B3 dominates). During this period, the LHE may fluctuate due to the time delay.
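The fluctuation caused by a delayed balancing loop can be illustrated with a toy discrete-time model. It is not the model of this study: the `simulate` function, the gain, and the delay are assumptions chosen only to show the effect. The state is the gap between the LHE and its target level in arbitrary units, and each step applies a correction proportional to the gap that was observed `delay` steps earlier.

```python
def simulate(gain, delay, steps, initial_gap=1.0):
    """Gap between the LHE and its target under a balancing loop.

    The correction applied at each step is proportional to the gap observed
    `delay` steps earlier (delay=0 means an immediate response).
    """
    gaps = [initial_gap] * (delay + 1)  # the gap is assumed constant beforehand
    for _ in range(steps):
        observed = gaps[-1 - delay]
        gaps.append(gaps[-1] - gain * observed)
    return gaps[delay:]


immediate = simulate(gain=0.5, delay=0, steps=10)  # response without delay
delayed = simulate(gain=0.5, delay=2, steps=10)    # response after two steps

print("immediate:", [round(g, 3) for g in immediate])
print("delayed:  ", [round(g, 3) for g in delayed])
```

With an immediate response the gap shrinks steadily toward zero. With the same gain and a two-step delay the corrections arrive too late, so the gap overshoots the target and swings back, which is the kind of fluctuation described above.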
Furthermore, the military objective of launching capability is strengthened by the feedback from the increasing LHE and the continuously decreasing equipment technology level. Management commitment to safety is weakened by the commitment to launching capability (Loop R1 dominates).
Finally, once the system situation has deteriorated too far, even the fullest attention to safety paid by personnel or managers can hardly reverse the increasing trend of the LHE. Under this circumstance, the state out of control will occur sooner or later.
Component E: Comparison of C With B, and of D With B
Through the STAMP-based analysis in Component C, 41 contributing factors were identified, which cover the accident causes presented in the official accident report (Component B). Unlike the officially reported causes, which focus primarily on obvious component failures and human errors, the factors identified in this paper relate to a broad view of sociotechnical systems. From the systemic view of STAMP, risk in sociotechnical systems can hardly be controlled by eliminating only individual component failures and human errors; it requires comprehensive control based on an understanding of the interactions among the multiple risk-contributing aspects.
Through the SD modeling in Component D, two critical deficiencies were identified. One is that the different opinions about safety and launching capability among the managers at the higher level will affect the human behaviors negatively. The other is the lack of control and feedback channels from the equipment technology level to the managers, which causes continuous degradation of the equipment.
Component F: Actions to Control Human Error and Improve Safety
Walker (2008) did not give recommendations for the prevention of future accidents. The recommendations here therefore start from the officially recognized accident causes and expand on them in the way discussed earlier. Based on the analysis in Component E, risk mitigation recommendations (covering both technical and managerial aspects), together with the time for their implementation, are proposed for safety improvement, as shown in Table 9.
Table 9. The Mitigation Recommendations for Safety Improvement
Conclusion and Future Work
Within the HRA community, there is a widely acknowledged need for an improved HRA method with a more robust scientific basis. The STAMP-based HRA framework presents two significant advances in developing a method like that.
First, the systemic model STAMP was applied to causal analysis. Traditional HRA methods tend to focus on easily sequenced and generally low-level tasks, which are not the main source of systemic errors. This paper analyzed a human-related accident and identified contributing factors and causal relationships based on STAMP, because it is important to understand accidents and human errors by considering the causality within a sociotechnical system. In the demonstration study, in which the MM III accident was taken as a case, 41 contributing factors were identified based on an overall view of the system, and more information regarding the accident causes was obtained, extending the analysis in the official accident report. Related managerial and technical recommendations were also given.
Second, based on the dynamics analysis by SD models, loop dominance in CLD was identified to analyze the possible migration in system behavior and the human error from the state “safety” toward the state “accident.” The key factors or effects in regard to the migration were discussed and then given higher priority to consider when making strategies or recommendations.
In addition, the development of a CLD such as that shown in Figure 5 provides building blocks for constructing an SD simulation model, which is often used to assess the impact of possible interventions. In follow-up work, the CLD is expected to be converted into a quantitative model; that is, an SFD will be created, made up of stocks linked with flows in accordance with SD rules. Stocks represent the accumulation of material or information; flows refer to the rate of change of a stock. Auxiliary or intermediate variables will also be inserted as functions of stocks, constants, or exogenous factors. SFD simulation will require quantitative equations to be defined among the stocks, flows, and variables. Once established, the SFD model can be used to quantitatively analyze how a human-related system behaves and responds to changes in the variables, stocks, and flows. In comparison with the CLD, the SFD simulation will be more rigorous but less accessible and understandable to managers (Goh, Love, Brown, et al., 2012; Richardson, 1995). The SFD simulation model will also be used to test a variety of risk control strategies and to evaluate the results obtained from a wider range of case studies. The simulation model can then be used to enhance the CLD so that both the rigor and the accessibility of the method can be achieved.
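As a rough indication of what such a conversion involves, the sketch below simulates a single stock with one inflow, one outflow, and one auxiliary variable, using Euler integration. Everything in it is assumed for illustration: the choice of "experience" as the stock, the equations, and the parameter values are hypothetical and do not come from the planned SFD.

```python
def run_sfd(steps, dt=0.25, training_rate=2.0, obsolescence=0.1, experience=5.0):
    """Simulate one stock (experience) with an inflow and an outflow.

    Returns a list of (experience, lhe) pairs, one per time step.
    """
    history = []
    for _ in range(steps):
        inflow = training_rate               # flow: experience gained per unit time
        outflow = obsolescence * experience  # flow: experience becoming obsolete
        # Auxiliary variable: an assumed LHE that falls as experience accumulates.
        lhe = 0.001 * (1.0 + 10.0 / (1.0 + experience))
        history.append((experience, lhe))
        experience += dt * (inflow - outflow)  # Euler update of the stock
    return history


history = run_sfd(steps=400)
first_experience, first_lhe = history[0]
last_experience, last_lhe = history[-1]
print(f"experience: {first_experience:.2f} -> {last_experience:.2f}")
print(f"LHE:        {first_lhe:.5f} -> {last_lhe:.5f}")
```

The stock settles where inflow equals outflow (here `training_rate / obsolescence`), and the auxiliary LHE falls accordingly. Changing a parameter such as `training_rate` is the kind of intervention test that a full SFD simulation would support.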
Key Points
The traditional HRA methods have limitations when dealing with causality within sociotechnical systems, as systems increase in complexity.
To improve HRA, the STAMP approach was used to develop a formalized framework of human error causal analysis.
An SD model was built to explore the causal relationships that lead to the migration of the system state toward an accident; the dynamics analysis was conducted based on the SD model.
The case study focused on a typical accident (the MM III missile accident that occurred in the United States in 2008). The case illustrated how human-related factors have a critical impact on system risk and provided risk mitigation recommendations.
Acknowledgements
This work was supported by a grant from the Major State Basic Research Development Program of China (973 Program) (No. 2014CB744904).
Hao Rong is a postgraduate of Beihang University. He obtained his MS degree (2013) in control science and engineering from the School of Reliability and Systems Engineering. His research is focused on system safety, risk assessment, hazard identification and analysis, and human factors.
Jin Tian is an assistant professor in the School of Reliability and Systems Engineering, Beihang University. She earned her PhD (2007) and BS (2002) degrees in systems engineering from Beihang University. Her research activities are focused on system safety, risk assessment, hazard identification and analysis, reliability of products, system engineering, and so on.
