Abstract
This study explores how experimental flight test professionals manage catastrophic risk in complex socio-technical systems. Using an ethnographic, mixed-methods approach with survey (n = 49; validation n = 21), interviews, and observation of flight test practitioners, the research found that flight test teams maintained both statistical (Bayesian, ISO 31000-based) and non-statistical (Precautionary, heuristic) methods in parallel. Statistical tools were applied where repeatable, deterministic elements permitted probabilistic reasoning, while resource-intensive Precautionary controls and experienced-practitioner heuristics were used where novelty, emergence, and an absence of prior knowledge precluded calculation of the probability of a hazard. Practitioners commonly reported corporate mandates to use probabilistic approaches but treated the outputs with caution, or disregarded them, when clearly unsuitable. The dual approach traded efficiency for effectiveness, ensuring identified risks were managed even when statistical measures were unreliable. Aligning statistical tools with deterministic system elements, and precautionary, heuristic approaches with complex system elements, offers a pragmatic, empirically grounded template for managing catastrophic risks in complex systems. This case study invites managers and researchers to reconsider reliance on probability where uncertainty is irreducible.
Introduction
Managers increasingly confront systems whose behavior is not merely complicated but genuinely complex. Whereas complicated systems are characterized by many but ultimately knowable interfaces, contemporary complex systems exhibit dynamism and functional emergence (Sterman, 2000) that continually alter system behavior. This temporal mutability renders system knowledge perishable, undermining the assumptions that support probability-based decision making under epistemic uncertainty (Paté-Cornell, 2012). As a result, risk management frameworks developed for static, complicated systems lose effectiveness when applied to complex systems (Luther et al., 2023). This paper reports on research conducted to learn how experimental flight test professionals respond: by maintaining parallel statistical and non-statistical risk practices. Aligning the attributes of risk management tools with the level of determinism in the system can achieve effective risk control where conventional probabilistic approaches alone would be inadequate.
The Boeing 737 Max accidents of 2018 and 2019 starkly illustrated this distinction when a company test pilot individually identified the hazard associated with the Maneuvering Characteristics Augmentation System, though the manufacturer's organizational risk framework failed to manage it (US Government House Committee on Transport and Infrastructure, 2020). Experimental flight test crews routinely demonstrate a capacity to manage catastrophic hazards during the development of novel technologies, and this research examines their practices using an opportunistic case study. As an initial academic recognition of this framework in industry, the study does not claim to be definitive but instead offers a contribution that opens a line of inquiry using the flight test approach to inform broader theories of risk management in complex socio–technical systems.
Risk Management in Flight Test
Flight testing provides feedback to aviation systems development, a continual maturing that renders the underlying system dynamic. Experimental flight test operates complex systems for which there is no direct prior knowledge; by definition, theirs is the first use. While there may be extensive modelling and simulation behind the design, these postulate scientific theories of performance rather than certainty of outcomes. In such circumstances, there is effectively no certain knowledge of how the system will behave in flight; the aircraft may not perform as expected, and the consequences would be fatal for the crew. This creates a continual n = 1 problem with statistical data in flight test, hampering probabilistic approaches to risk management.
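The n = 1 limitation can be illustrated with a short calculation. The sketch below is not drawn from the study data; it assumes independent, identically distributed trials (an assumption that experimental flight test generally violates, which only strengthens the point) and applies the standard exact binomial bound for zero observed failures. The function name `upper_bound_zero_failures` is hypothetical.

```python
def upper_bound_zero_failures(n_trials: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the per-trial failure probability
    after n_trials independent trials with zero observed failures.

    Solves (1 - p) ** n_trials = 1 - confidence for p.
    Independence between trials is an illustrative assumption.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


# The bound tightens only as failure-free trials accumulate.
for n in (1, 10, 100, 1000):
    print(n, round(upper_bound_zero_failures(n), 4))
```

Under these assumptions, a single successful first flight bounds the failure probability only below 0.95 at 95% confidence, which is of little practical use for a catastrophic hazard.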
The term safety is commonly associated with risk in flight test, recognizing safety as the set of mitigations to risks with consequences adverse to human health. The classification of risks as catastrophic stems from the permanent nature of fatal consequences. In the manner of Grechuk and Zabarankin (Grechuk and Zabarankin, 2014), such risks are modeled within a probabilistic setting that allows for outcomes representing catastrophic consequences unmeasurable in monetary terms, for example the loss of human life occurring with non–zero probability. This framing highlights the unique challenge faced by flight test crews, who must manage risks that cannot be meaningfully bounded within conventional probabilistic or financial measures.
This research investigated the risk management framework of the flight test community with a view to elucidating an exemplary framework for effective risk management in complex systems. Section 2 of this paper presents the research Methodology, and Section 3 addresses supporting academic Theory. Section 4 presents the Results, while Section 5 Discussion analyzes the results and examines the limitations to the research. Section 6 concludes with the attributes of an effective framework for risk management in complex systems and suggests a direction for future work.
Methodology
Study Design
The study employed an ethnographic, mixed-methods approach, drawing on observation, survey and interview of flight test professionals. The observed flight test operations existed independently of this research, so randomization and treatments were not undertaken (Rosenbaum, 2009), nor would they have been ethical. The study of flight test risk management practices was guided by Layder's (Layder, 2013) small scale research methods, while the two-step systematic approach of Gioia et al. (Gioia et al., 2012), using First Order Concepts supported by Second Order Themes, assured rigor. Observation of flight test risk management practices was enabled by a member of the research team who has experience in the conduct of experimental flight test, thereby enabling prolonged proximity to the community, direct observation of practices, and the contextualization and interpretation of the data collected.
The scope of this research was necessarily bounded by the opportunistic case study that presented itself, taking advantage of access to a unique approach to risk management in a critical-case operating environment. The methodology reflects the inherent limitations of such a study, including the small global population of practitioners and the simple treatment of interview data. While multidimensional techniques such as latent analysis, item response theory, or Bayesian network models, could in principle yield deeper insights into this empirical inquiry, the study was designed as an exploratory case study of a real–world, independent activity. Given the limited sample size and the ethnographic orientation of the research, such advanced methods were not feasible. Instead, the analysis emphasized thematic convergence and identification of practitioner heuristics, consistent with the study's aim of uncovering a distinctive risk management framework rather than validating a model. Nonetheless, this represents the first academic recognition of the flight test community's distinctive framework, providing a foundation for further inquiry. The research is not intended to refine flight test practice itself, but to reflect on wider approaches to managing risk in complex sociotechnical systems. By examining the characteristics of this distinctive framework, the study offers exploratory insights that may inform the development of more resilient risk management strategies in other high consequence environments.
The study was focused upon risk management practices employed in experimental flight test upon complex, socio-technical systems with the possibility of catastrophic consequences. The tight definition ensured the level of complexity was consistent, specifically: dynamic systems (Sterman, 2000), featuring emergent functions (Vester, 2012), with blended control between human and technical elements (Leveson, 2012), but without the potential for knowledge of the frequency of outcomes (Viscusi and DeAngelis, 2018). The survey was trialed to ensure consistent understanding and application. The terminology used was the vocabulary of the English language Test Pilot Schools, the professional Flight Test Societies, and the ISO 31000. Ethical clearance for the research was obtained from the University of Adelaide, Office of Ethics and Integrity.
Interviews were conducted in person and via video conferencing. The survey questions were used to initiate and direct discussion toward the framework underlying the participant's flight test risk management activity.
Participants
Research participants were screened to include only those who had practiced flight test in English, within the last 5 years, as part of organizations undertaking developmental flight test. Membership of a developmental flight test organization ensured that study participants were part of a group in which there was opportunity for mentoring, specifically the passage of risk management heuristics between staff.
Research participants were employed by Western militaries or aircraft manufacturers, with sufficient qualifications or experience to gain admission to the professional Flight Test Societies. Survey participants were recruited from the Flight Test Societies’ professional conferences, with participation being entirely voluntary. Participants were not paid or reimbursed and could remain anonymous.
Procedures
An online survey tool was utilized to deliver the survey and it randomized the presentation of alternative selections. Of 81 interactions with the survey, 49 valid responses were gathered, each referencing risk management for a complex socio-technical system, with potentially catastrophic consequences in at least one risk element. No questions were required to be completed to submit a valid response and consequently, there are incomplete sections in the dataset. Three demographic questions were used to screen responses to ensure recency (<5 years) of suitable experience, that the subject flight test system featured catastrophic consequences to the respondent, and that the referenced system was complex.
Some survey respondents were notable for their engagement. While noting the propensity for bias in this group, the study drew on their eagerness to share their experiences of managing risk in complex systems by inviting them to follow-up interviews. All interviewees had completed the survey beforehand. Interviews were sought and conducted to the point of thematic saturation.
Data Analysis
The raw data was coded to reveal themes and patterns to inform our research questions, while the analysis maintained linguistic approaches to capture qualitative nuance. The analysis of the framework was undertaken per the principles of Repenning (Repenning, 2021), focusing on the framework as the generalization of knowledge for interdisciplinary use. The Cynefin (Snowden and Boone, 2007) ontological framework was utilized in the analysis.
Validation
Validation was undertaken 18 months after the initial data collection, employing a similar ethnographic approach with survey (n = 21) and interview (n = 5) of a new cohort of participants from the same professional group. This process presented the research findings back to practitioners for confirmation of accuracy, mitigating potential biases inherent in self-reported data. Several strategies were employed to strengthen the validity of the results: participants were carefully filtered to ensure their direct relevance and experience in the flight test domain; anonymity was assured to reduce social desirability bias and encourage candid responses; and the survey instruments were designed to minimize leading questions, eliciting consistent, comparable data. The initial research was used to develop Causal Loop diagrams (Sterman, 2018) of the flight test risk management framework, which were subsequently employed in the validation activity. These were recognized by practitioners as a valid depiction of generalized process.
Theory
Rasmussen's (Rasmussen, 1997) observation that complexity obscures risk was elaborated by Amalberti (Amalberti, 2001) when he identified that contemporary complex systems were insensitive to risk management practices that had previously proven effective in complicated systems. Amalberti noted the presence of an asymptotic accident rate in complex systems and this study similarly identifies that asymptotic tendency when risk management frameworks designed for complicated systems exhibit limited effectiveness against complex systems. Sterman (Sterman, 2000) provides insight when differentiating complicated systems with many interfaces from those with dynamic interfaces, and Leveson (Leveson, 2004) identifies emergence as a defining attribute of complexity. Together, these perspectives provide an explanation as to why conventional frameworks plateau in effectiveness when confronted with complexity. The broader theoretical development of this argument is consolidated in (Luther et al., 2024), tracing the evolution of risk management frameworks alongside the emergence of complexity in socio–technical systems.
The flight test community has lore, transmitted through heuristics, that supports time-critical decision making and strategic process controls under conditions of uncertainty when catastrophic consequences would preclude adaptation. Viscusi and DeAngelis (Viscusi and DeAngelis, 2018) studied this challenge through a two-armed bandit problem terminated at the first loss, analogous to flight test where a fatal outcome precludes adaptation. Such operations also challenge notions of robustness and resilience, since fatal losses cannot be recovered. Instead, flight test professionals have evolved a framework that manages risk in complex systems without recourse to resilience. In this paper, the term risk management follows Borgonovo et al. (Borgonovo et al., 2018) in referring to discrete processes of analysis, assessment, and management, excluding subsequent decision analysis. Responding to Borgonovo et al.'s call for a generalizable framework, and consistent with Komljenovic and Loiselle's (Komljenovic et al., 2017) view of organizational learning as transferable across technologies, this research examines the profession of flight test as a working example of effective risk management in contemporary complex systems, of the kind observed by Amalberti to be unresponsive to conventional approaches. The flight test community's framework has proven effective in managing catastrophic hazards under conditions of dynamism and emergence, and the case study thereby illustrates how theory can be translated into practice.
Results
Survey
Risk Tools
The survey queried the risk management tools in use, permitting multiple selections and natural language responses. Question 1 provided respondents with a list of risk management tools and asked them to indicate which were in use within their organization. Table 1 presents statistical tools and Table 2 presents non-statistical tools.
Reported Usage of Statistical Risk Management Tools.
Reported Usage of non-Statistical Risk Management Tools.
The natural language responses elaborating the risk management tools in use elicited the following themes: Implementation of risk management practices in the style of ISO 31000, featuring a two–dimensional risk model with consequence and likelihood as the independent variables, was found to be approaching universal. No instances of alternative models were encountered. While participants often described their approaches as proprietary applications, this reflected a convergence toward the model codified in ISO 31000, rather than recognition of the standard itself. The dominant non–statistical approach to risk management was reliance on experienced practitioners, whose expertise and insight informed the management of complex risks. Experienced practitioner knowledge was integrated within collaborative work teams, transmitted through mentoring relationships, and institutionalized through formal review gates in the systems engineering process. Collectively, these multi–layered mechanisms rendered practitioner input pervasive, with its nomination or recognition found to be almost universal. Dissecting the flight test task, and/or implementing a step-wise approach was referred to as an incremental build-up. There was a vague conception that this strategy reduced the total risk, although participants rarely elaborated on the mechanism by which this was achieved. In practice, the approach was being implemented as a break–up of the task, an arbitrary parsing of the overall flight test activity into sub–tasks on the basis of workload, resource consumption, schedule, or some perception of divisible parts. This approach is analogous to, yet distinct from, the Precautionary approach (Cox, 2009), which deliberately partitions a task into increments, each constrained by an acceptable level of risk, thereby enabling completion of the overall task even where the aggregate risk would be deemed unacceptable. 
Modelling and simulation were nominated as risk management tools and commonly associated with an incremental build-up. The model was used to baseline expected system performance, and participants treated deviation from that baseline as an indicator of further deviation to come. Formal validation of the model against the system was not reported. The use of a Test Hazard Analysis (THA) was frequently noted. Each reported THA was an implementation of the ISO 31000 process, utilizing a survey of experienced practitioners, a non-systematic literature survey, and System Safety Assessments (SSA) to identify hazards and generate a hazard log. Bow-tie and Two-Dimensional Risk Analysis Matrices (2DRAM) were used for analysis. In nominating THA as a tool, participants did not recognize THA as a framework. The use of feedback was favorably reported. Closed loop feedback was rare, being constrained to Hazard Log review. Open loop feedback was prevalent, forming part of the generative safety culture (Reason, 1997) inside experimental flight test communities, where it takes the form of broadcasting stories to convey safety lore. The overhead associated with non-statistical approaches was reported as onerous, with reported cases of reverting to 2DRAM. Though access to non-statistical approaches to risk management was retained, statistical approaches were used in the first instance.
The survey queried hazard identification practices within the risk management framework and sought practices that would assure completeness. Table 3 categorizes processes undertaken to assure the breadth of consideration toward identifying hazards. The most frequent assurance practice for hazard identification was to baseline expected outcomes with experienced practitioners. Responses with an assurance activity were split across formal review and a review embedded in management procedures, though collectively, they account for two-thirds of responses.
Tools / Actions / Processes Taken to Assure Breadth of Consideration of Hazard Identification Activity.
Questions in the survey sought to understand the use of statistical tools within the framework. Table 4 shows that a majority of participants reported that a measure of the statistical probability of a hazard was factored into their referenced risk management practices. Two participants commented that the likelihood definitions applied in 2DRAM were modified to be “semi-quantitative”. Their definition of that term remains unknown.
Examining the use of Probability in Risk Management Within Organizations Undertaking Experimental Flight Test.
The survey queried the validation of assigned quantitative values of probability and Table 5 tabulates the responses. While some probability categorizations are validated upon assignment (58% - all, most, or some), few are subject to a closed loop validation or ongoing monitoring (71% - few, or not used). Most measures of probability did not accommodate any error (60% - not used) and the availability of numeric quantitative probability only differentiated the way some risks were managed in 40% of activities. Practices that would have assured the probability values used in statistical risk models were not widely adopted. Respondents could select more than one approach to validating probability values.
Examining the Validity of Probability Measures Used by Organizations Undertaking Experimental Flight Test.
The survey found that risk management frameworks were used in a common manner among flight test organizations. Table 6 reports the purposes of risk management practices within experimental flight test, which were consistent and largely homogeneous. Respondents could select more than one purpose.
Reported Usage of Risk Management Outcomes Within the Risk Management Framework.
Adherence to organizational risk management procedures, and the prevalence of workaround practices, were queried in the survey, with the results presented in Table 7. The question soliciting descriptions of workarounds to the risk management practices mandated in the framework met with a strong response. A significant group of responses simply rejected the possibility of workaround procedures being implemented: “nil workarounds”.
Adherence to Organizational Risk Management Process and the Potential for Workarounds.
The survey queried the presence of domain specific risk management practices. Table 8 presents the responses, indicating that two-thirds of referenced systems implement at least partial, domain specific risk management practices within their framework.
Use of Domain Specific Risk Management Frameworks.
Final questions addressed the effectiveness of the risk management framework/s in use against the different risk domains with potentially catastrophic consequences. Analysis of that data is presented in Table 9. Overall, practitioners reported a mean of 90% satisfaction with the risk management practices in place to address the different risk domains (high of 93% for project safety, low of 83% for environmental damage). Only 5% of participants reported any known project safety losses. Isolated instances of a project suspending a risk management framework were reported. There was a high rate of satisfaction with the effectiveness of the risk management frameworks being implemented. These frameworks took a dual statistical / non-statistical approach, reliant upon input from experienced practitioners, that was rarely reflected in formalized process.
Subjective Assessment of Effectiveness of the Risk Management Practices Referenced.
The themes of the interviews were extracted and are presented in Table 10. As outlined in the methodology, the findings are exploratory in nature, reflecting the subjective perspectives of participants and intended to uncover a novel approach to risk management. Repeated instances of individual reports were collated to identify areas of convergence, a process that may be understood as indicative of an underlying Bayesian approach to risk management. Consistent with the observations of Figini and Giudici (Figini and Giudici, 2011), this approach remains subject to the limitations inherent in merging subjective assessments to establish consensus.
Qualitative Themes Identified in the Interviews Conducted.
Consistency in the themes between the interviews was interpreted as evidence of saturation on the issues raised. Interviewees were universally familiar with the application of risk management tools in the flight test context and were able to readily assimilate the concepts of risk management frameworks as a conglomerate of tools and practices in a formalized process for controlling risks. They were less familiar with the idea of a structured framework of tools. Though none were formally assigned Risk Manager as a job title, interviewees were conscious of their role as risk managers in their conduct of flight test. Interviewees were conversant with the organizational objective of risk management to support decision making.
Theme 1, Boundary Conditions, was nominated as important in nearly half the interviews. Boundary conditions were defined as the established limits of aircraft performance that mark the outer edges of the verified flight envelope. They represent the delineation between areas of known aircraft performance, characteristic of complicated system intricacy, and areas still under development or lying outside the system operating envelope. The implementation of controls that constrained system operation within defined performance boundaries functioned as a risk management measure, ensuring that the system was restricted to domains of known performance.
The divergence between the notions of uncertainty in predicting unique future events (Knight, 1921; Keynes, 2014; Faulkner et al., 2021) and probabilistic forecasting within a system (Kay and King, 2020; Friedman, 1976) was apparent in Theme 4. Responses suggest that this distinction is readily apparent in the experiences of interviewees, corroborating the theoretical frameworks established by Faulkner et al. (Faulkner et al., 2021), Keynes (Keynes, 2014), and Knight (Knight, 1921). However, the academic theory was foreign to research participants. Interviewees reported significant discomfort in the practice of assigning quantitative probabilities to unique, uncertain events to enable statistical risk management tools. Interviewees reported assignment of quantitative probabilities as organizational practice, with one respondent noting the practice was mandated. Interviewee discomfort stemmed from the absence of an established statistical model to support probability assessments, which led to the use of subjective probability estimates considered reasonable for the purpose of meeting the requirements of mandated statistical risk management processes. These quantitative values were known to lack rigor, and the practice was undertaken only to clear corporate procedural hurdles.
Themes 9 & 10 spoke to using high levels of experience to assess complex risk and then natural language to convey nuance in those qualitative assessments. Participants reported using natural language to convey complex risk concepts and enable qualitative risk analysis. Participants reported their organizations trained staff to facilitate communication, with some organizations using the crew resource management training mandated in regulated airline operations. These actions were cited as assuring more favorable risk management outcomes.
Organizational Risk Policy
The survey and interviews indicated that participants’ organizations consistently mandated risk management through policy, typically employing the two–dimensional model of consequence and likelihood from the ISO 31000 framework. Reported practices showed widespread use of statistical tools, though these were not applied with explicit consideration of system attributes that might impact their effectiveness. Alongside statistical tools, most participants reported the use of non–statistical tools, suggesting that despite organizational efforts to rationalize around statistical approaches, alternative methods remain valued. Practitioners drew on non–statistical tools where statistical methods were perceived as insufficient, reflecting a tendency toward parallel approaches featuring a Precautionary (Cox, 2009) strategy for managing complex risks. In such cases, mitigation was directed toward reducing potential consequences to a tolerable level, irrespective of assessed likelihood. For the test aircrew who operate inside the complex system facing catastrophic consequences, no discernible probability threshold was evident.
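The contrast between a two-dimensional consequence and likelihood model and a Precautionary treatment can be sketched in a few lines. This is an illustration only: the category labels, the multiplicative scoring, and the thresholds are hypothetical and are not taken from ISO 31000 or from any participant organization, and the function names `matrix_rating` and `precautionary_treatment` are invented for this example.

```python
# Hypothetical ordinal scales; real matrices vary between organizations.
LIKELIHOOD = ["improbable", "remote", "occasional", "probable", "frequent"]
CONSEQUENCE = ["negligible", "minor", "major", "catastrophic"]


def matrix_rating(consequence: str, likelihood: str) -> str:
    """Rate a risk on a two-dimensional matrix using an assumed product score."""
    score = (CONSEQUENCE.index(consequence) + 1) * (LIKELIHOOD.index(likelihood) + 1)
    if score >= 12:
        return "high"
    if score >= 5:
        return "medium"
    return "low"


def precautionary_treatment(consequence: str, likelihood: str) -> str:
    """Treat catastrophic consequences irrespective of assessed likelihood."""
    if consequence == "catastrophic":
        return "mitigate consequence to a tolerable level"
    return "treat per matrix rating: " + matrix_rating(consequence, likelihood)


# A catastrophic hazard judged improbable rates "low" on this assumed matrix,
# yet the precautionary rule still directs mitigation of the consequence.
print(matrix_rating("catastrophic", "improbable"))
print(precautionary_treatment("catastrophic", "improbable"))
```

In this sketch the likelihood input never influences the treatment of a catastrophic hazard, which mirrors the reported absence of any discernible probability threshold for test aircrew.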
Assurance of Hazard Identification
In accordance with the ISO 31000 framework, flight test organizations sought to identify hazards by tracking occurrences to build statistical models. In practice, this approach yielded little useful data due to the extremely low frequency of such events, and participants indicated limited expectation toward effectiveness. Instead, assurance of hazard identification was primarily achieved through mandated reviews of test documentation and board-level reviews by experienced practitioners within systems engineering gate reviews. Hazard logs, maintained in line with ISO 31000 processes, provided a basis for these reviews. Engagement with experienced practitioners emerged as the predominant non–statistical means of assuring hazard identification, often reinforced through multiple layers of review. Interviewees also noted that the iterative nature of hazard identification reflected the influence of emergence in complex systems, necessitating repeated reassessment to maintain effectiveness.
Risk Analysis
Survey and interview data indicated that organizational processes generally mandated the use of statistical approaches to risk management. Participants frequently expressed difficulty and professional discomfort with the assignment of probabilities to hazards, noting that such values were often applied to satisfy policy requirements rather than to inform practice. Validation or revision of these probability estimates was rarely reported, and several participants suggested that the outputs of statistical tools were treated with caution or, in some cases, disregarded. All identified risks were nonetheless retained within organizational systems, regardless of the probability assigned.
Non–statistical tools, defined here as methods not reliant on knowledge or assignment of a probability value, were less frequently reported. Examples such as System Theoretic Process Analysis (STPA) (Leveson, 2012) and Bow–tie were noted. More commonly, participants described adopting precautionary strategies, consistent with Cox (Cox, 2009), in which potential consequences were bounded to tolerable levels in the absence of reliable probability data. In practice, while subjective probabilities were assigned in line with policy, subsequent risk management tended to rely on precautionary, non–statistical approaches. Reports did not indicate the application of margins of error to probability estimates, nor explicit consideration of potential cognitive bias.
Risk Acceptance
Interview discussions highlighted differences in how residual risk is approached between complicated and complex systems. In stable systems, mitigation efforts can reach a point where residual risk becomes negligible, making acceptance relatively straightforward. Such stable systems allow for stochastic convergence, where empirical frequencies of occurrence can guide probability-based mitigation. In contrast, catastrophic events in dynamic, complex systems were described as singular and uncertain, precluding the utility of probability-based approaches. The dynamic nature of complex systems precluded practitioners from relying on learning through repeated occurrences, as each instance exhibited novel characteristics. Consequently, risk acceptance in such contexts was itself novel and necessarily grounded in non-statistical approaches.
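The role of stochastic convergence can be illustrated with a small simulation. The study reports no such simulation; the sketch below is offered only as an aid to the argument, with hypothetical event rates and a hypothetical function name `observed_frequency`. It compares a stable process with one whose underlying rate shifts partway through the observation period.

```python
import random


def observed_frequency(probabilities, seed=0):
    """Fraction of trials in which an event occurs, given a per-trial
    probability for each trial. A fixed seed keeps the run repeatable."""
    rng = random.Random(seed)
    events = sum(1 for p in probabilities if rng.random() < p)
    return events / len(probabilities)


n = 20_000
stable = [0.10] * n                                # rate never changes
drifting = [0.02] * (n // 2) + [0.20] * (n // 2)   # rate shifts at the midpoint

stable_freq = observed_frequency(stable)
drifting_freq = observed_frequency(drifting)

# In the stable case the accumulated frequency approaches the true rate.
# In the drifting case it reflects neither the earlier nor the current rate.
print("stable:", round(stable_freq, 3), "true rate 0.10")
print("drifting:", round(drifting_freq, 3), "current rate 0.20")
```

Historical frequency is informative only while the system that generated it persists, which is the condition interviewees described as absent in dynamic, complex systems.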
The catastrophic nature of the consequences under consideration frequently led to all hazards being treated maximally, without prioritization, thereby rendering any inaccuracies in probability estimates effectively inconsequential.
In certain organizations, statistical analysis was employed to classify residual risk in alignment with the defined tiers of managerial risk acceptance authority.
System Boundaries
Interviews indicated that knowledge of the system operating boundary provided test crews with an expectation of performance within a bounded envelope, with deviations from expectation serving as cues of inadequate system knowledge and the potential for further deviation. Participants described control strategies that sought to ensure the performance boundary remained within the known limits of control authority. One commonly reported strategy was the incremental approach, implemented as a division of the performance envelope for testing. However, there was little evidence that this approach explicitly accounted for increased exposure at each step, and its effectiveness in assuring acceptable risk at each increment was not considered. Some interviewees suggested that aligning increments with known tolerable limits could strengthen this approach, potentially turning it into a Precautionary strategy (Cox, 2009).
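The exposure concern can be illustrated with a minimal sketch. It assumes, purely for illustration, that each increment carries a known and independent probability of a hazardous event, an assumption that the findings suggest rarely holds for novel systems. The function name `cumulative_exposure` and the per-step values are hypothetical and do not come from the study.

```python
from math import prod


def cumulative_exposure(step_probabilities):
    """Probability of at least one hazardous event across all increments.

    Assumes each step probability is known and independent of the others
    (an illustrative assumption, not a finding of the study).
    """
    for p in step_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # P(at least one event) = 1 - P(no event at any step)
    return 1.0 - prod(1.0 - p for p in step_probabilities)


# Ten envelope-expansion steps, each assumed to carry a 1% chance of an event.
steps = [0.01] * 10
print(round(cumulative_exposure(steps), 3))  # about 0.096
```

Under these assumptions, ten increments that each appear tolerable at 1% accumulate to roughly 9.6% over the whole test programme. This is the kind of step-by-step accounting that interviewees did not report performing.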
Attributes of the System Under Test
Interviews and survey responses suggested that flight test organizations did not explicitly differentiate between system attributes when selecting risk management practices. Statistical, quantitative tools were commonly described as efficient to apply but ineffective in the presence of complexity, though participants were not always able to articulate reasons for the ineffectiveness. Non-statistical, qualitative tools were seen as resource–intensive and reliant on practitioner expertise yet remained valued in practice. Despite corporate preferences for standardized statistical approaches, most frameworks incorporated both quantitative and qualitative tools. Hazard Logs, Incident Reporting, and 2DRAM were widely used, whereas methods such as practitioner surveys, Bow–Tie, and STPA were applied less frequently. This parallel application appeared to provide resilience, as all identified risks were addressed regardless of system attributes, though system attributes were not used to guide the choice of tools.
Participants expressed discomfort with the assignment of subjective probabilities to hazards in complex systems, observing that such estimates were frequently employed to satisfy policy requirements, while qualitative methods were concurrently applied to inform decision–making. When presented with the academic definition of complexity as precluding valid probability estimates (Luther et al., 2023), interviewees recognized this as consistent with their professional unease. Flight test practitioners emphasized that they often managed systems undergoing change, or systems being operated for the first time, conditions that precluded the development of statistically valid datasets. Consistent with Aven (2015), participants described a discontinuity between past performance and future uncertainty. In such contexts, statistical tools were of limited value, whereas in stable systems with repeated operation they were considered effective, drawing on valid performance models.
Cynefin
Interviews suggested that flight test practitioners often encountered risk across systems with varying levels of intricacy, defined here as the degree of determinism and the latency in realizing hazard consequences. The Cynefin ontological framework (Snowden and Boone, 2007) provided a useful lens for categorizing these attributes. Several participants discussed domain–specific controls, noting that the effectiveness of controls varied with system intricacy. For example, personal protective equipment (PPE) was regarded as offering limited value in Complicated or Complex domains, while incident reporting tools and hazard logs, effective in Complicated settings, were seen as less useful in Complex contexts. Conversely, controls perceived as effective in Complex domains were described as resource–intensive and difficult to communicate, and therefore less suited to Clear hazards where the overheads were disproportionate.
One interviewee questioned the width of the boundary between the Complicated and Complex domains. The discussion evolved to view the delineation as relative to the observer, contingent on available resources, and better understood as a broad transitional zone. Consequently, a system might appear Complex to an observer lacking time or expertise to learn the system but Complicated to an organization able to invest in learning the system. Established organizations were described as using systems engineering resources to shift systems toward the Complicated domain, whereas smaller organizations with fewer resources might adopt Precautionary approaches to manage risks that effectively remain Complex. This observation aligns with Murmann and Sardana (2013), who noted that entrepreneurs use Precautionary strategies when resources to reduce uncertainty are limited.
Communication
Interviews suggested that taxonomies were often used to capture nuanced reasoning in natural language, supporting the communication of complex hazards and risk concepts within flight test teams. Participants noted that natural language aided understanding and was reinforced by interpersonal relationships that facilitated communication. This practice accords with Wei and Zeshui (2016), though there was no indication that it enabled the propagation of learning outside the industry. Cultural esteem for experienced practitioners was evident, with mentoring and coaching seen as important in transmitting heuristic methods for identifying and mitigating complex risks. These heuristics were valued for their memorability, ease of communication through stories, and their abstraction, which allowed sharing across organizational boundaries without disclosing proprietary details.
Culture
Interviews and survey responses suggested the presence of a strong, pan–organizational safety culture (Reason, 1997), shaped over decades as the profession evolved from a period of high accident rates to one that emphasizes disciplined practice. This formally schooled, professional group provided a cultural consistency from which insights could be drawn.
The influence of this safety culture was evident in responses to the survey question on non–compliance with procedures (Table 7). Intended to elicit examples of workaround practices, the question was largely interpreted by participants as relating to safety compliance, leading to near–universal rejection of the idea of procedural deviation during operations. Free–text responses instead described strategies to avoid the need for workarounds, such as narrowing the scope to resolve conflicts. Real–time amendments to procedures without prior approval were not reported, reflecting established cultural norms.
Cultural lore within the flight test community was transmitted through stories, often originating from past accidents. These narratives were seen to reinforce lessons and perpetuate shared heuristics across organizational boundaries. Ireland (2016) highlights the principles of control, communication, coordination, and integration within Complex Systems Governance (CSG). While participants did not explicitly reference CSG, their emphasis on communication and system controls resonates with its principles. In this sense, the findings suggest parallels between flight test practices and the CSG framework.
Study Limitations
The requirement for participants to have an organizational background in an English-speaking environment effectively limited the sample to individuals trained in risk management practices mandated by the Western legal frameworks regulating aviation (International Civil Aviation Organization, 2016) and industrial safety standards (International Organization for Standardization, 2018).
While it cannot be concluded from this study that such filtering determined the observed tendencies, it is acknowledged that the participant group was homogeneous. The use of English and filtering to organizations practicing systems engineering had the effect of limiting participants to those for whom Cartesian-Newtonian reductionism (Dekker et al., 2011) formed the basis of their risk acceptance thresholds, and to those adopting an Anglo-Saxon cultural bias toward individual responsibility for decisions (Frisk and Bannister, 2017), which is complementary to the Newtonian cause-effect dependency. These characteristics may also reflect universal features of the flight test domain or requirements imposed by regulators and cannot be disentangled within the scope of this research.
This study is domain-specific, focusing on a unique industry group in which risk managers operate as integral actors within a system that would be lethal if risk were realized. The research necessarily relied on self-reported data and is therefore subject to potential biases, including halo effects, while the task of triangulating findings across different cultural or organizational contexts remains a matter for future investigation. Consequently, the outcomes have limited generalizability. The findings should be interpreted as exploratory, constituting a unique case study that documents novel practices, offering an initial contribution that may inform alternative approaches toward the challenge of managing risk in complex socio–technical systems.
Conclusion
The significance of this study lies in its potential to contribute to risk management strategies in contexts where the consequence of a single adverse event is unacceptable. Responding to Komljenovic and Loiselle's (Komljenovic et al., 2017) call for research into the influence of complex systems on risk management, the findings suggest that flight test practitioners employ statistical and non–statistical approaches in parallel. All available tools were applied to the system without delineation, despite recognition that not all tools were equally effective across system types. The flight test risk management framework incorporated methods intended to address both deterministic and non–deterministic system elements. Among the non–statistical approaches, the Precautionary strategy (Cox, 2009) was frequently described, reflecting its emphasis on bounding consequences rather than estimating likelihood. Other likelihood–independent practices, such as engineering reviews and checklists (Mosey, 2014), were also reported, alongside the use of System Theoretic Process Analysis (STPA) (Leveson, 2004), employing top–down risk controls to capture emergence.
Practical Application
This research offers an empirically grounded illustration of Aven's work (Abrahamsen and Aven, 2012; Aven, 2016), demonstrating how non–statistical tools may be applied to avoid reliance on subjective probability estimates when managing hazards in complex systems. The Cynefin framework (Snowden and Boone, 2007) was found useful in categorizing levels of determinism and latency, providing a lens through which appropriate tools could be aligned with systems categorized as Complicated or Complex. Although the duplication of statistical and non–statistical tools within flight test frameworks introduces overhead, this duality appeared to support effectiveness across different system types. Consistent with Komljenovic and Loiselle's (Komljenovic et al., 2017) view of organizational learning as transferable across technologies, the findings suggest that applying Cynefin as a management lens may help risk managers select appropriate risk management tools. Aligning tool use with system determinism could, in principle, improve the efficiency of risk management across industries.
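One way to picture this alignment is as a simple lookup from Cynefin domain to candidate tools. The sketch below is an illustrative reading of the interview findings, not a validated selection rule. The mapping, the tool labels, and the function name `tools_for` are assumptions introduced here, and the placement of PPE under the Clear domain is inferred from participants' remarks that it offered limited value elsewhere.

```python
# Illustrative mapping only: an assumed reading of the findings,
# not a selection rule reported by participants.
TOOLS_BY_DOMAIN = {
    "clear": ["personal protective equipment"],
    "complicated": ["hazard log", "incident reporting", "statistical risk matrix"],
    "complex": [
        "precautionary bounding of consequences",
        "STPA",
        "practitioner heuristics",
    ],
}


def tools_for(domain):
    """Return candidate risk management tools for a Cynefin domain name."""
    key = domain.strip().lower()
    if key not in TOOLS_BY_DOMAIN:
        # Domains the study did not discuss are deliberately left unmapped.
        raise KeyError(f"No tools recorded for domain: {domain!r}")
    return list(TOOLS_BY_DOMAIN[key])


print(tools_for("Complicated"))
print(tools_for("Complex"))
```

A lookup of this kind makes the selection step explicit, whereas participants reported applying all available tools without delineation. Any practical use would need the domain judgment itself to be made by experienced practitioners, since the boundary between Complicated and Complex was described as relative to the observer.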
Interview data highlighted opportunities at the boundary between Complicated and Complex domains. For organizations with established systems engineering capacity, investment in system knowledge may shift some risks toward the Complicated domain, enabling greater use of statistical tools with lower overheads. Where such knowledge cannot be attained, non–statistical approaches remain necessary. For smaller organizations with constrained resources, the adoption of Precautionary strategies may constitute a pragmatic approach to managing risks that, within their context, remain complex.
Closing
Managers across public and private sectors increasingly confront systems marked by dynamism and non–determinism, demanding risk management frameworks that satisfy both societal and regulatory obligations. This study identified the flight test risk management framework as an effective response to such complexity and, through a systems–thinking lens, examined the basis of its effectiveness. While this represents an important initial step, the growing prevalence of complex systems in technology–intensive domains highlights the need to adapt and extend risk management practices. Future research should explore the transferability of the flight test framework to other industries, critically examining the cultural and linguistic boundaries that may shape its broader applicability.
Footnotes
Ethical Approval
The University of Adelaide, Office of Research Ethics, Compliance and Integrity: H-2022-146
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Benjamin Luther acknowledges financial support received through the provision of an Australian Government Research Training Program Scholarship. The funding source was not involved in the conduct or reporting of this research.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
