Abstract
The use of artificial intelligence (AI) in digital health applications is increasing rapidly, creating new opportunities as well as safety challenges. AI-related errors can be systematic, repeating consistently across similar cases and, when deployed at scale, potentially affecting multiple patients within a short time. In this paper, we extend James Reason’s classic distinction between person-based and system-based approaches to error by incorporating a third, technology-based perspective to account for failures inherent to AI-driven digital health systems and their lifecycle, including issues related to data, model design, deployment, and monitoring. Protecting patients in an increasingly AI-driven healthcare system requires not only model-centric controls, but an integrated set of overlapping technical, organizational, and regulatory safeguards designed explicitly for AI-driven digital health. To explore how such safeguards may be structured, we conducted a qualitative, exploratory analysis of 15 real-world incidents retrieved from an AI incident database. We extracted contributing factors for these incidents and conducted a thematic analysis. Aiming for conceptual alignment with Reason’s Swiss cheese model for safety, the themes were grouped into higher-level system layers. As a result, we propose five protective layers to safeguard AI-driven digital health: (1) data governance and quality assurance; (2) model development, evaluation and change control; (3) sociotechnical integration and human oversight; (4) governance, regulatory and ethical compliance; and (5) post-market monitoring, incident response, and learning. To operationalize the layers, we propose safeguards that were also derived from the incident analysis. They are not intended as an exhaustive or consensus-based set of recommendations, but as an exploratory, incident-informed framework to support analysis and system design.
Finally, we argue for a “just culture” in AI-driven digital health, where researchers, healthcare professionals, and patients are encouraged to report critical incidents and errors, as this is essential for learning and system improvement.
Keywords
1. Introduction
Artificial Intelligence (AI)-driven digital health applications are becoming increasingly embedded across healthcare systems. For the purpose of this work, we understand AI-driven digital health as the use of AI within digital health technologies to analyze data, inform clinical and patient decisions, and support more personalized and efficient care. 1 We consider both AI-based systems that are regulated medical devices and non-regulated health applications. The boundary of AI-driven digital health is not regulatory status, but the functional role of AI systems in influencing health-related decisions or actions within a healthcare or health-relevant context. This includes systems that directly inform clinical decision-making as well as those that shape patient behavior or access to care. These applications span healthcare professional-facing tools such as AI-enabled clinical documentation systems and ambient listening technologies, 2 as well as patient-oriented applications including symptom-checking chatbots 3 or digital therapeutics. 4
While the adoption of AI-driven digital health has accelerated rapidly, this growth has been accompanied by substantial concerns regarding risks to patient safety and care quality. 5 In response, new conceptual frameworks such as digitalovigilance 6 and algorithmovigilance 7 have emerged to study how digital technologies and AI influence clinical outcomes and health system functioning. 8 Algorithmovigilance focuses on monitoring and managing risks associated with algorithms, a topic that specifically arose with the availability of AI and machine learning algorithms. 7 Digitalovigilance refers to the monitoring, evaluation and management of risks associated with digital health technologies and data-driven health ecosystems. 6 Such approaches reinforce the importance of moving beyond static validation toward dynamic, post-deployment surveillance to ensure ongoing clinical reliability and patient safety. 9 Nevertheless, understanding the full range of unintended consequences associated with AI-driven digital health remains incomplete and a systematic characterization of the potential harms associated with AI-driven digital health and the conditions under which they occur is missing.5,10
Recent analyses of AI-driven digital health incidents illustrate that algorithms may fail to detect critical clinical events or produce misleading outputs in high-stakes contexts. 11 Beyond this, serious errors linked to AI-driven digital health have been reported, along with indications that such incidents are underreported or inconsistently documented. 12 Coiera et al. noted “little reporting of patient harms from trials.” 13 Farrah Adegunle et al. concluded that current UK and US regulatory models “lack mechanisms to systematically detect or prevent bias.” 14 Muralidharan et al. found that among 692 FDA-approved AI/Machine Learning devices, only 3.6% reported race/ethnicity data, 99.1% provided no socioeconomic data, and only 9% included a prospective study for post-market surveillance. 15 Publicly available incident repositories such as the OECD AI Incidents and Hazards Monitor 16 suggest that incidents are being identified with increasing frequency, although their true magnitude remains uncertain. Limitations of the existing, rather passive surveillance strategies include underreporting, reporting bias, and limited capacity to capture software-specific harms, as acknowledged by Lakhan et al. 17
Traditionally, patient safety science has addressed risks through system-level learning, reporting, and layered safeguards. 18 As early as 2000, James Reason introduced an error approach that distinguishes person and system approaches to error. 18 Reason argued that human error is inevitable, even among skilled healthcare professionals. Therefore, safety should focus less on blaming individuals and more on designing systems that anticipate, detect, and mitigate errors. The introduction of AI, however, adds new sources of risk to healthcare systems. AI failures in healthcare manifest differently from human failures due to their systematic, scalable and often silent nature. When AI-driven digital health systems learn flawed patterns, they reproduce them consistently across similar cases, which can affect thousands of patients simultaneously. In contrast, a human would not make errors consistently and in different places at the same time. AI-driven digital health systems have the potential to generate outputs that are plausible yet incorrect, making the errors difficult to detect. 19 As a result, they may fail silently and with high confidence. Beyond these core characteristics, AI-driven digital health systems are susceptible to environmental changes, i.e. the models can perform well under the conditions they were trained on, but their accuracy can drop when the environment and context change. 20 Further, AI models often learn statistical patterns that correlate with outcomes, while the actual causes remain hidden. For example, a particular diagnosis may be strongly associated with a specific combination of symptoms. However, if a patient presents with an atypical symptom profile, the model may fail to provide an accurate diagnosis.
In this paper, we extend James Reason’s models of error and safeguards 18 to the emerging landscape of AI-driven digital health. We argue that protecting patients in a healthcare system that is increasingly AI-driven requires not only isolated controls, but an integrated system of overlapping technical, organizational, and regulatory safeguards designed explicitly for AI-driven digital health. Building on this perspective, we aim to complement and extend existing risk frameworks such as the National Institute of Standards and Technology AI Risk Management Framework, ISO 14971-based medical device risk management, and emerging concepts such as digitalovigilance and algorithmovigilance by providing a unified, incident-informed, and healthcare-specific safety model. In contrast to existing approaches, which often address individual dimensions of risk (e.g., technical performance, governance, or post-market monitoring) in isolation, our framework integrates these dimensions across multiple layers and explicitly captures how failures can propagate through the system. The study develops a conceptual framework informed by a qualitative, exploratory analysis of empirical cases. An exploratory incident review is used to ground the framework in observed real-world failures, rather than to derive a fully generalizable or formally validated qualitative theory.
2. An error approach to AI-driven digital health
We propose extending James Reason’s model of human error, which comprises the person and the system approach, with a technology approach to better capture failure modes specific to AI-driven digital health and to avoid attributing harm solely to healthcare professionals, patients or the healthcare system (see Figure 1). We elaborate on the person, technology, and system approaches below.
Figure 1. Error approaches to AI-based systems in healthcare. Building on the two error approaches of James Reason, we introduce a third approach referring to the technology, i.e. failures of AI-driven digital health systems.
2.1. Person approach
The person approach in James Reason’s model locates the source of error in individual healthcare professionals. 18 Errors originate from mental processes such as inattention, poor motivation, forgetfulness or carelessness. With patient-facing AI-driven digital health, errors can also be caused by patients, who may misuse these tools intentionally or unintentionally, in addition to the errors committed by healthcare professionals described in the original model. In contexts where AI outputs are perceived as authoritative by either patients or healthcare professionals, errors attributable to individuals may be reinforced through deskilling and automation bias. 21 Countermeasures should strengthen healthcare professionals’ and patients’ ability to calibrate trust, critically interpret AI outputs, and use AI-driven digital health systems as intended. For instance, ensuring that outputs are explainable and that the uncertainty of AI models is clearly communicated can help users recognize when additional verification is needed and thereby help prevent careless use. 22
2.2. Technology approach
As an additional analytic lens to the original safety model, we introduce the technology approach, which locates the source of failures in the AI-driven digital health systems themselves. Failures can be divided into active failures and latent conditions, in parallel with Reason’s terminology. 18 Active failures are visible manifestations of errors in patient care, such as unsafe inferences from input data, hallucinations, erroneous automatically generated diagnoses, or inappropriate treatment recommendations or health advice. Unlike individual human mistakes, such failures can occur consistently across similar cases, creating systematic error patterns that affect multiple patients simultaneously. They may also manifest subtly or with a delay, which makes them harder to detect than conventional clinical errors. Avoiding these kinds of failures requires critical evaluation of the AI algorithms and a clearly defined and described scope, so that users know for which application areas and tasks accurate results can and cannot be expected.
Latent technological conditions, by contrast, are vulnerabilities embedded within the AI lifecycle. These include biased or incomplete training datasets, inadequate validation in real-world clinical contexts, a lack of explainability, insufficient cybersecurity protections and insufficient regulatory compliance or governance. Addressing these issues requires robust design, careful selection of datasets, knowledge about the underlying datasets and their biases, thorough validation and continuous post-deployment monitoring as well as consideration of regulatory requirements regarding security and privacy.
2.3. System approach
The system approach understands errors as a consequence of broader organizational and contextual factors. Persons and AI-driven digital health systems can make mistakes, as discussed in the person and technology approaches. With the system approach, we acknowledge the conditions that allow failures to happen and to create harm. In the case of AI-driven digital health, this perspective highlights that even the most advanced algorithms are fallible and that their safe use depends on the environment in which they are deployed. Depending on its purpose and functionality, an AI-driven digital health application may have to be registered as software as a medical device and comply with the corresponding regulations to ensure its safety. However, workflows that restrict opportunities for human oversight, lack of access to care for patients, time pressures that prevent healthcare professionals from questioning automated recommendations, and inadequate organizational safeguards can transform isolated human or technology errors into systemic hazards. Therefore, responsibility does not lie with the individual healthcare professional, patient or the AI-driven digital health system alone, but with the absence of barriers and defenses that should prevent errors from propagating to patients.
3. Methodology for adapting the Swiss cheese model
To adapt Reason’s error approach and Swiss cheese model to AI-driven digital health, we conducted a qualitative, exploratory study. It combines a multiple case study design with inductive thematic analysis to enable theory-building grounded in empirical cases. A purposive sample of 15 AI-related healthcare incidents was collected from the AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) database (https://www.aiaaic.org). It is a publicly accessible repository that documents real-world incidents, failures, and controversies involving AI systems. It compiles cases from diverse sources, including academic literature, regulatory reports, investigative journalism, and credible media coverage. Each entry typically includes a description of the incident, contextual information, and references to original sources.
Incidents were included in our sample if they (i) involved an AI or machine learning system used in the health domain, (ii) described a negative outcome or identifiable risk (including near-misses), and (iii) provided sufficient detail to reconstruct contributing factors. The dataset was retrieved on March 24, 2026. In total, 38 health-related incidents were identified in the AIAAIC database and screened against these criteria. From these cases, a final sample of 15 incidents was selected using purposive sampling. Selection aimed to maximize diversity across types of AI systems, application contexts, and forms of failure or risk (e.g., clinical decision support, self-management applications, and resource allocation systems), to capture a broad range of contributing mechanisms. For example, for incidents related to the same system only one case was included. The extraction table in Appendix 1 shows all included incidents and the 15 selected cases. The screening and coding process was conducted by a single researcher.
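The screening logic described above can be sketched in a few lines. This is a minimal illustration only, not the procedure used in the study: the `Incident` record, its field names, and the `one_case_per_system` helper are hypothetical, and the purposive selection for diversity remains a manual judgment that the code does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    """Hypothetical screening record; field names are illustrative assumptions."""
    incident_id: str
    system: str                          # name of the AI system involved
    ai_in_health_domain: bool            # criterion (i)
    negative_outcome_or_risk: bool       # criterion (ii), includes near-misses
    sufficient_detail: bool              # criterion (iii)


def meets_inclusion_criteria(incident: Incident) -> bool:
    """An incident is eligible only if all three criteria hold."""
    return (
        incident.ai_in_health_domain
        and incident.negative_outcome_or_risk
        and incident.sufficient_detail
    )


def screen(incidents: List[Incident]) -> List[Incident]:
    """Keep only the incidents that satisfy criteria (i) to (iii)."""
    return [i for i in incidents if meets_inclusion_criteria(i)]


def one_case_per_system(incidents: List[Incident]) -> List[Incident]:
    """Keep the first eligible case for each system, preserving input order."""
    seen = set()
    kept = []
    for incident in incidents:
        if incident.system not in seen:
            seen.add(incident.system)
            kept.append(incident)
    return kept
```

Recording each criterion as an explicit field makes it possible to document why a case was excluded, which supports the transparency of a single-coder screening process.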
For each case, we extracted data including the clinical context (setting and domain), AI system characteristics, incident description, outcome (harm/near miss), contributing factors, detection mechanisms (how and whether the incident was identified) and existing or missing safeguards. Contributing factors were analyzed using inductive thematic coding. Thematic coding followed an iterative and reflexive process. Initial codes were generated from a subset of cases and compiled into a preliminary codebook (Appendix 1), which was continuously refined as additional cases were analyzed. Earlier cases were revisited to ensure consistent application of codes across the dataset. Coding decisions and category definitions were documented throughout the process.
The grouping of codes into higher-level system layers was guided by both empirical patterns in the data and conceptual alignment with Reason’s Swiss cheese model. This step involved iterative comparison across cases to identify clusters of contributing factors that operated at similar levels within the sociotechnical system. While no formal inter-rater reliability assessment or independent second coding was conducted, the analysis emphasized consistency through repeated comparison and iterative refinement. The resulting framework should therefore be understood as an interpretive, theory-building contribution rather than a definitive classification.
To support practical implementation, we used the selected incidents and the information on missing or existing safeguards to describe a set of minimum and advanced safeguards for each layer.
4. The Swiss cheese model adapted for safety related to AI-driven digital health
4.1. Exploratory incident analysis
Extraction table for the purposive sample of 15 AI incidents that occurred in healthcare, retrieved from the AIAAIC database.
From the identified contributing factors, we derived five protective layers that form the adapted Swiss cheese model described in the following subsection. Specifically, the theme “Data quality, bias, and representativeness” is mapped to a “Data governance and quality assurance” layer. The themes “Model design and technical limitations” and “Validation, reliability, and clinical safety” jointly inform a “Model development, evaluation, and change control” layer. The themes “Transparency, accountability, and governance” and “Privacy, security, and ethical risks” are addressed within a “Governance, regulatory, and ethical compliance” layer. In addition, we introduce a “Sociotechnical integration and human oversight” layer, corresponding to the theme “Workflow integration and sociotechnical fit”. Finally, a “Post-market monitoring, incident response, and learning” layer captures contributing factors related to “System-level and infrastructure failures”. When a contributing factor fit multiple themes, it was assigned to the layer representing the primary origin of the failure or the layer where intervention would most directly prevent recurrence.
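For readers who want to apply the mapping when coding their own incidents, the theme-to-layer assignment stated above can be written as a simple lookup table. The theme and layer names are taken from the text; the `layers_for` helper is a hypothetical convenience function and not part of the study.

```python
from typing import Dict, List

# Theme-to-layer mapping as described in the text.
THEME_TO_LAYER: Dict[str, str] = {
    "Data quality, bias, and representativeness":
        "Data governance and quality assurance",
    "Model design and technical limitations":
        "Model development, evaluation, and change control",
    "Validation, reliability, and clinical safety":
        "Model development, evaluation, and change control",
    "Workflow integration and sociotechnical fit":
        "Sociotechnical integration and human oversight",
    "Transparency, accountability, and governance":
        "Governance, regulatory, and ethical compliance",
    "Privacy, security, and ethical risks":
        "Governance, regulatory, and ethical compliance",
    "System-level and infrastructure failures":
        "Post-market monitoring, incident response, and learning",
}


def layers_for(themes: List[str]) -> List[str]:
    """Return the distinct layers touched by a list of themes, in first-seen order."""
    layers: List[str] = []
    for theme in themes:
        layer = THEME_TO_LAYER[theme]  # raises KeyError for an unknown theme
        if layer not in layers:
            layers.append(layer)
    return layers
```

Returning every layer an incident touches, rather than a single one, reflects the point that the layers are analytically distinct but not mutually exclusive.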
4.2. Adapted Swiss cheese model
The diversity of contributing factors requires several layers of safeguards to prevent patient harm in the context of AI-driven healthcare. The Swiss cheese model for safety, also introduced by James Reason, visualizes multiple protective layers against harm, each represented by a slice of Swiss cheese with “holes” symbolizing vulnerabilities. 18 When these holes align, accidents or adverse outcomes can occur. Applied to AI-driven digital health, the model highlights that safety is not achieved by any single safeguard, e.g. those that can be integrated into AI systems, but by the integrity of an entire system of overlapping protections. We propose five protective layers to reduce the risk of patient harm arising from failures in AI-driven digital health or the uncritical use thereof. While each layer has potential vulnerabilities, together they can provide a resilient safety net (see Figure 2). The layers should be understood as analytically distinct but not mutually exclusive; individual incidents may involve interacting failures across multiple layers. In the following, we describe these protective layers.
Figure 2. Swiss cheese model applied to AI-driven digital health: Five protective layers prevent occurrence of patient harm caused by errors, failures and latent conditions related to AI-driven digital health systems.
Data governance and quality assurance. AI models inherit the strengths and limitations of the data used to develop them. 23 Using high-quality, representative and sufficiently diverse datasets can reduce the risk of biased or inequitable outcomes.24,25 Key defenses include data governance (e.g. provenance tracking, version control and access management), systematic data quality checks (e.g. label auditing, missing data analysis and unit harmonization) and documentation of dataset composition and known limitations. Gaps in this layer can be caused by mislabeled outcomes, changes in clinical practice over time, limited representation of relevant subgroups and unrecognized differences in measurements. 26 For instance, early warning systems for sepsis trained predominantly on data from younger populations may underperform in older adults or patients with multimorbidity. 27
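Two of the data quality checks named above, missing data analysis and a check on subgroup representation, can be sketched as follows. This is an illustrative example under stated assumptions: records are plain dictionaries, the field names and the minimum-share threshold are hypothetical, and what counts as adequate representation has to be decided per use case.

```python
from collections import Counter
from typing import Any, Dict, List


def missing_rate(records: List[Dict[str, Any]], field: str) -> float:
    """Fraction of records in which `field` is absent or None."""
    if not records:
        raise ValueError("records must not be empty")
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def subgroup_shares(records: List[Dict[str, Any]], field: str) -> Dict[Any, float]:
    """Share of each subgroup among the records where `field` is recorded."""
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


def underrepresented(shares: Dict[Any, float], min_share: float) -> List[Any]:
    """Subgroups whose share falls below an assumed minimum threshold."""
    return sorted(group for group, share in shares.items() if share < min_share)
```

A subgroup that is entirely absent from the data does not appear in `subgroup_shares` at all, so the list of expected subgroups should be checked separately against the documented target population.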
Model development, evaluation and change control. A robust design and rigorous evaluation of AI-driven digital health are essential to ensure clinical reliability. 28 This layer is strengthened through pre-release testing, independent external validation across settings, calibration assessment, subgroup analyses, and ongoing revalidation following software or model updates. 29 In line with the principles of clinical evaluation of Software as a Medical Device (SaMD), 30 the evidence should address the analytical validity and clinical performance of the model in its intended context of use.15,16 Because clinical knowledge evolves, a defined model lifecycle is required to prevent outdated recommendations due to outdated data in the trained model. 31 This should include versioning, change-control criteria and timely retirement of obsolete models. 32 Communicating uncertainty (e.g. calibrated confidence or ‘insufficient information’ flags) can facilitate safer interpretation. Holes in this layer arise when models are overfitted to narrow datasets, are poorly calibrated, are affected by data leakage or spurious correlations, lack transparency appropriate to risk, or are insufficiently validated under real-world conditions and workflow constraints.
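As one concrete way to carry out the calibration assessment mentioned above, the following sketch computes the expected calibration error (ECE) for a binary classifier with equal-width probability bins. ECE is a commonly used summary of calibration, but the paper does not prescribe a specific metric; the choice of ECE, the function name, and the default of ten bins are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE: weighted mean gap between predicted and observed event rates.

    probs  -- predicted probabilities of the positive class, each in [0, 1]
    labels -- observed outcomes, 0 or 1
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and of equal length")
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")

    # Assign each prediction to an equal-width bin; p == 1.0 goes to the last bin.
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_predicted - observed_rate)
    return ece
```

Running the same function separately on each relevant subgroup gives a simple form of subgroup analysis: a model can look well calibrated overall while being poorly calibrated for a group that is small in the evaluation data.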
Sociotechnical integration and human oversight. Even well-designed AI-driven digital health systems can cause harm if they are not safely integrated into clinical pathways. Ensuring safety requires sociotechnical integration, including clear role definitions (specifying who is accountable for acting on outputs), escalation and fallback procedures, user-friendly interfaces, and workflows that allow for human oversight.33,34 For patient-facing tools (e.g. a chatbot delivering cognitive behavioral therapy), safeguards should include the ability to detect symptom deterioration, clear emergency guidance and a defined handover process to healthcare professionals or crisis services. 35 In contrast to the layer “Model development, evaluation and change control” that captures failures intrinsic to the AI system and its lifecycle management, this layer captures failures emerging from the interaction between the system, users, workflows, and organizational context during real-world deployment. This layer can be compromised when AI-generated recommendations are poorly timed, alert thresholds are miscalibrated (leading to false positives and fatigue 36 ), risk escalation is inadequate, or the tool increases the workload, thereby undermining a healthcare professional’s ability to verify outputs.
Governance, regulatory and ethical compliance. Another protective layer is robust governance that is aligned with regulatory, ethical and professional standards. For AI systems that qualify as medical device software, this involves complying with the relevant medical device regulations (e.g. the Medical Device Regulation (MDR)) and implementing a risk management process, for example in accordance with ISO 14971, supported by software lifecycle processes (IEC 62304) and usability engineering (IEC 62366-1). When it comes to AI-specific governance, organizations can draw on AI risk management and management system standards (e.g. ISO/IEC 23894 and ISO/IEC 42001), as well as information security management standards (ISO/IEC 27001). In the EU context, aligning with the AI Act and clarifying its interaction with medical device regulation can strengthen defenses. 37 However, overlaps between these frameworks can create compliance gaps if responsibilities and documentation requirements are fragmented. In addition, limited transparency and the use of opaque (“black-box”) models 38 can undermine healthcare professionals’ ability to verify AI-generated outputs and appropriately calibrate trust.22,39 Transparency measures, such as model cards, 40 intended-use statements41,42 and the explicit disclosure of AI involvement to patients, are therefore critical to support informed consent, accountability, and safe clinical use.
Post-market monitoring, incident response and learning. Ensuring safety of AI-driven digital health systems requires continuous oversight during real-world use. Defenses include live performance monitoring, drift detection, periodic benefit–risk reviews and clear triggers for retraining, revalidation, rollback and retirement. 43 Post-market monitoring is also a formal regulatory expectation for high-risk AI systems in the EU. Effective safeguards include prospective incident registries, automatic risk reporting, the systematic capture of near misses and ‘silent failures’, predefined incident response workflows and the clear assignment of surveillance responsibility across providers and deployers. This layer’s safety measures can be compromised by a lack of monitoring, reliance on technical metrics alone, slow detection or reporting of incidents, failure to communicate findings to all users, and unclear accountability for corrective actions.
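Drift detection with predefined triggers, as listed among the defenses above, can be illustrated with the population stability index (PSI), which compares the binned distribution of an input or score at validation time with the distribution observed in production. This is only a sketch: PSI is one of several possible drift measures, and the thresholds of 0.1 and 0.25 are widely used rules of thumb rather than values taken from this paper or from any regulation.

```python
import math
from typing import Sequence


def population_stability_index(
    expected_counts: Sequence[float],
    actual_counts: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI between a reference and a current binned distribution.

    Both arguments hold counts per bin, using the same bin edges.
    Shares are floored at `eps` so that empty bins do not cause log(0).
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    if expected_total <= 0 or actual_total <= 0:
        raise ValueError("each distribution needs at least one observation")

    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        expected_share = max(expected / expected_total, eps)
        actual_share = max(actual / actual_total, eps)
        psi += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return psi


def monitoring_action(
    psi: float, review_at: float = 0.1, revalidate_at: float = 0.25
) -> str:
    """Map a PSI value to a predefined action (thresholds are assumptions)."""
    if psi >= revalidate_at:
        return "revalidate"
    if psi >= review_at:
        return "review"
    return "continue"
```

Because a distribution shift does not necessarily mean that clinical performance has degraded, a trigger of this kind is best treated as a prompt for review alongside outcome-based measures, in line with the warning against relying on technical metrics alone.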
4.3. Safeguards for the five protective layers
Safeguards per protective layer.
The suggested safeguards are intended to operationalize the protective layers and support implementation in practice. We distinguish between minimum and advanced safeguards. Minimum safeguards represent baseline requirements that should be in place prior to deployment, whereas advanced safeguards provide additional risk mitigation, particularly for high-risk applications or more complex AI-driven digital health systems. The relevance and prioritization of specific safeguards may vary depending on the AI modality, clinical context, and clinical risk profile.
The incidents analyzed in this study span a heterogeneous set of modalities, including predictive models embedded in clinical workflows, generative AI systems (e.g., chatbot-based interfaces), computer vision applications, and hardware-integrated sensing systems, which exhibit distinct dominant failure modes. The adapted Swiss cheese model should therefore be applied in a modality-sensitive manner, where safeguards are not uniformly weighted but prioritized according to the dominant risk profile of the system under consideration. The layered structure remains applicable across modalities, but its practical implementation requires adaptation to the specific technical and clinical context. For example, predictive analytics systems are particularly sensitive to issues such as calibration drift, dataset shift, and alert fatigue, placing greater emphasis on safeguards related to model development, evaluation and change control (Layer 2), as well as sociotechnical integration and human oversight (Layer 3). In contrast, generative AI systems such as large language model-based chatbots introduce risks related to hallucinations, inappropriate responses, and missing escalation pathways, increasing the importance of safeguards in Layer 3 (sociotechnical integration and human oversight) and Layer 4 (governance, regulatory and ethical compliance).
4.4. Evidence-informed examples to illustrate the framework
To illustrate the adapted Swiss cheese model, we map the 15 real-world incidents to the framework. Across the analyzed cases, incidents rarely resulted from a single point of failure; rather, they emerged when multiple weaknesses aligned across the proposed protective layers. Several recurring patterns can be identified.
Limitations in the data layer were a frequent initiating factor in the incidents. Issues such as biased or unrepresentative data, dataset shift, and low data quality propagated through subsequent layers, often remaining undetected until harm occurred. Cases involving demographic bias (e.g., skin color, race-based adjustments) illustrate how upstream data assumptions can lead to systematic inequities in clinical outcomes.
Limitations in model design and evaluation including poor calibration, oversimplification, and modality-specific weaknesses such as algorithmic limitations contributed to unreliable or unsafe outputs. These risks were amplified when models were deployed without sufficient external validation or robustness testing.
Failures in sociotechnical integration (e.g. poor workflow integration, overreliance on AI outputs, and lack of guardrails) led to inappropriate use of AI systems in practice. Even technically sound models failed when deployed in contexts for which they were not adequately designed or when users misinterpreted their outputs.
Most notably, the post-market monitoring and learning layer was consistently weak or absent. In many cases, there were no systematic mechanisms to detect performance degradation, unsafe outputs, or emerging risks in real-world use. As a result, failures such as dataset drift, bias, or hardware malfunctions persisted longer than necessary, increasing the likelihood and severity of harm.
These findings suggest that AI-related incidents in healthcare are best understood as system-level failures, arising from the interaction of technical, organizational, and human factors. Strengthening safety therefore requires not only improving individual components but ensuring that robust and complementary safeguards are implemented across all layers, with particular emphasis on continuous monitoring and governance.
We present three representative cases in detail to illustrate how failures propagate across layers. A complete mapping of all analyzed cases across the five protective layers is provided in Appendix 1.
The Apple Watch blood oximeter case (AIAAIC0898) illustrates how failures originating in the data layer can propagate across the system. The device exhibited reduced accuracy for individuals with darker skin tones, reflecting insufficient representativeness in the underlying data and limitations related to measurement location. This bias was not adequately mitigated at the model or validation stage, and no systematic subgroup monitoring was in place post-deployment. While some users mitigated the issue through double-checking, the absence of robust safeguards across layers resulted in inequitable outcomes and potential delays in treatment.
The incident concerning the Babylon symptom checker (AIAAIC0160) demonstrates how limitations in model design, combined with sociotechnical factors, can lead to harm. The system exhibited overconfidence and insufficient clinical validation, producing unsafe triage recommendations. These risks were amplified by its deployment in a consumer-facing context without adequate guardrails or escalation mechanisms. The lack of systematic monitoring further limited the detection of unsafe outputs. This case highlights how model-level weaknesses, when combined with inappropriate integration into real-world use, can result in patient harm.
The case related to the Abbott glucose monitoring system (AIAAIC2140) illustrates how failures can arise from interactions between hardware and algorithmic components. A manufacturing defect, combined with insufficient fail-safes in the algorithm, led to incorrect glucose readings and subsequent dosing errors. Although post-market reports eventually identified the issue, the absence of robust real-time monitoring and rapid response mechanisms delayed mitigation. This case underscores the importance of lifecycle quality control, system-level validation, and effective post-market surveillance.
5. Discussion
This paper argues that safeguarding AI-driven digital health requires moving beyond a metrics-centric view of quality (e.g. accuracy, AUROC and F1) and adopting a patient safety model that recognizes harm as the convergence of vulnerabilities in people, technology and systemic structures. We selected Reason’s model of error and the Swiss cheese model because they offer a well-established approach to patient safety that explains how patient harm rarely originates from a single mistake, but rather emerges when weaknesses in people, technology and systems align. Digital health and AI are quintessentially sociotechnical; safety depends not only on algorithmic performance, but also on data practices, workflow integration, governance and monitoring during real-world use.
5.1. Practical implications for AI-driven digital health
As mentioned earlier, the introduced layers should be understood as analytically distinct but not mutually exclusive; individual incidents may involve interacting failures across multiple layers. From a practical perspective, this suggests that contributing factors should not be forced into a single-layer classification where overlap exists. Instead, identifying all layers to which a contributing factor pertains may provide a more accurate representation of the underlying failure dynamics. Likewise, implementing safeguards across multiple layers may be more effective than isolated interventions, as this allows risks to be addressed at different points of manifestation.
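The multi-layer tagging described above can be illustrated with a minimal Python sketch. It assumes that each contributing factor is stored together with the set of layers it pertains to, rather than with a single layer label. The names Layer, layers_involved and cross_layer_factors, as well as the example factors and their tags, are hypothetical illustrations and not part of the coding scheme used in this study.

```python
from enum import Enum


class Layer(Enum):
    """The five protective layers proposed in this paper."""
    DATA_GOVERNANCE = 1   # data governance and quality assurance
    MODEL_LIFECYCLE = 2   # model development, evaluation and change control
    INTEGRATION = 3       # sociotechnical integration and human oversight
    COMPLIANCE = 4        # governance, regulatory and ethical compliance
    POST_MARKET = 5       # post-market monitoring, incident response, learning


def layers_involved(factors: dict[str, set[Layer]]) -> set[Layer]:
    """Return the union of all layers touched by any contributing factor."""
    involved: set[Layer] = set()
    for layers in factors.values():
        involved |= layers
    return involved


def cross_layer_factors(factors: dict[str, set[Layer]]) -> list[str]:
    """Return (sorted) the factors that pertain to more than one layer."""
    return sorted(name for name, layers in factors.items() if len(layers) > 1)


# Hypothetical tagging of one incident; the factor wording and the layer
# assignments are illustrative assumptions only.
example_incident = {
    "unrepresentative training data": {Layer.DATA_GOVERNANCE},
    "bias not caught during validation": {
        Layer.DATA_GOVERNANCE,
        Layer.MODEL_LIFECYCLE,
    },
    "no subgroup monitoring after deployment": {Layer.POST_MARKET},
}

print(sorted(layer.name for layer in layers_involved(example_incident)))
print(cross_layer_factors(example_incident))
```

Storing a set of layers per factor keeps overlaps visible, so that an analysis can count both how many layers an incident touched and which factors span layer boundaries.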
One persistent challenge is that the translation of research into practice is often guided by performance metrics (e.g. accuracy, F1 score), which do not specify the evidence necessary for safe deployment. Although reporting guidelines for clinical AI studies have been introduced (e.g. CONSORT-AI/SPIRIT-AI for trials,44,45 TRIPOD+AI for clinical prediction models 46 and DECIDE-AI for early-stage clinical evaluation 47 ), inconsistent reporting and adherence continue to undermine a harmonized safety assessment of AI models and algorithms, the reproducibility of evaluations and clinical validity. 48 From a Swiss cheese model perspective, harmonized reporting is particularly important for Layers 1 and 2 (data governance and quality assurance, and model development, evaluation and change control, respectively) because these layers require transparent documentation of dataset provenance and representativeness, labelling practices, subgroup analyses, calibration, intended use and update/revalidation policies.
Further, healthcare institutions deploying AI-driven digital health systems must treat their implementation as a safety-critical intervention. This requires human factors and workflow testing (Layer 3), clear accountability for responding to outputs (Layer 4), pathways for escalating safety-critical situations, and a usable monitoring infrastructure that is established and maintained over time (Layer 5, e.g. tracking alert burden, overrides, performance drift and subgroup safety signals). However, systematic learning from post-market incidents involving digital health interventions is still in its infancy. In the United States, although the FDA’s MAUDE database provides a publicly searchable collection of medical device reports, the scale and variability of the data make it challenging to conduct a systematic assessment, and the database is known to have limitations as a surveillance source. 49
Crucially, existing incident infrastructures also make it difficult to identify events specifically linked to AI functionality, as structured identifiers (e.g. an Artificial Intelligence/Machine Learning flag and software/version fields) are not consistently available. This limits the reliable retrieval and synthesis of information, and therefore adverse event identification for AI as a medical device requires targeted search strategies and remains methodologically challenging.50,51
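To make the role of structured identifiers concrete, the following Python sketch shows how an explicit AI/ML flag and a software version field would simplify retrieval. It is an illustration under stated assumptions: the record type IncidentReport, its field names and the helper functions are hypothetical and do not reflect the schema of MAUDE or of any other existing incident database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IncidentReport:
    """Hypothetical incident record; not the schema of any real database."""
    report_id: str
    device_name: str
    description: str
    ai_ml_enabled: Optional[bool] = None    # None means "not recorded"
    software_version: Optional[str] = None  # None means "not recorded"


def ai_related(reports, version=None):
    """Reports explicitly flagged as AI/ML, optionally limited to one version."""
    return [
        r for r in reports
        if r.ai_ml_enabled is True
        and (version is None or r.software_version == version)
    ]


def unclassified(reports):
    """Reports without an AI/ML flag; these still need a targeted text search."""
    return [r for r in reports if r.ai_ml_enabled is None]


# Invented example records for illustration only.
reports = [
    IncidentReport("r1", "Device A", "incorrect reading", True, "2.1.0"),
    IncidentReport("r2", "Device A", "incorrect reading", True, "2.0.3"),
    IncidentReport("r3", "Device B", "battery fault", False, "1.4"),
    IncidentReport("r4", "Device C", "unsafe recommendation"),
]

print([r.report_id for r in ai_related(reports, version="2.1.0")])
print([r.report_id for r in unclassified(reports)])
```

With such fields, retrieval reduces to a simple filter, whereas records without the flag (here the last example record) remain dependent on free-text search strategies.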
In this opaque landscape of AI-driven digital health, healthcare professionals and patients need support to calibrate their trust. This includes providing training and developing skills, 12 designing workflows that allow for verification and escalation, and ensuring that patient-facing tools offer clear guidance on when and how to seek human assistance. It is particularly important to maintain a clinician-in-the-loop pathway (or an explicit escalation route) because patients may hesitate to report unexpected events due to uncertainty, low health literacy or fear of misunderstanding, which reduces the likelihood that early warning signs will be captured and acted upon.
5.2. Comparison to other safety approaches
Reason’s model is not the only useful approach to safety; its value lies in its explanatory clarity and barrier logic. Complementary frameworks can help to specify, test and ensure those barriers. Standards for medical device risk management and software lifecycles (e.g. ISO 14971, IEC 62304 and usability engineering guidance) provide a structured hazard–control lifecycle and are particularly relevant for regulated digital therapeutics and AI medical devices.
AI systems that are classified as software as a medical device are subject to established regulatory requirements, including risk management, clinical evaluation, and post-market surveillance. Many consumer-facing or wellness tools operate outside these frameworks despite having potential health impacts. Safeguards such as transparency, user guidance, and post-market monitoring may be less consistently applied or rely on voluntary standards. This creates potential gaps, particularly where such tools are used in ways that influence health-related decisions. The proposed framework can help to bridge this divide by highlighting a common set of safety layers that are relevant across both contexts, while allowing for differences in the rigor and formalization of safeguards depending on the level of risk and regulatory oversight.
General AI governance frameworks (e.g. the NIST AI Risk Management Framework (https://airc.nist.gov/airmf-resources/airmf/) and the ISO/IEC 23894:2023 Information technology – Artificial intelligence – Guidance on risk management) provide cross-sector risk management functions and organizational capabilities. Our five-layer model can be operationalized using the four NIST AI RMF functions (Govern, Map, Measure and Manage).
This mapping enables digital health developers to translate patient safety narratives into governance and operational actions that can be assigned, audited and improved iteratively.
Sociotechnical patient safety models (e.g., Systems Engineering Initiative for Patient Safety (SEIPS) model 52 ) are closely aligned with our integration layer, as they explain how work-system design shapes outcomes. However, our framework goes further by emphasizing AI lifecycle issues (e.g. data provenance, drift and update governance) that are not always considered in traditional healthcare workflow models.
Safety-II/resilience engineering 53 complements the adapted Swiss cheese model by emphasizing how work usually goes right, which is highly relevant for AI-enabled care, where healthcare professionals routinely adapt to uncertainty. This approach can enhance Layers 3 and 5 by shifting the focus from preventing failure to fostering adaptive capacity, recovery, and learning.
Comparison with other risk management frameworks.
While these frameworks provide important guidance on risk management, sociotechnical system design, and post-market surveillance, they typically address specific dimensions of safety in isolation. In contrast, our framework integrates these perspectives into a unified, incident-informed model that explicitly links failure modes across multiple layers, from data and model design to governance, clinical integration, and post-market monitoring.
The added value of the proposed framework lies in four key contributions. First, it introduces an explicit barrier-based, layered logic adapted from Reason’s Swiss cheese model, enabling the analysis of how failures emerge and propagate across interconnected system levels. Second, the framework is empirically grounded in real-world incident analysis, linking abstract risk categories to observed failure patterns in AI-enabled healthcare. Third, it provides a healthcare-specific sociotechnical integration that connects technical, human, organizational, and governance dimensions within a single structure, rather than addressing these domains separately. Fourth, it offers a practical mapping of safeguards across layers, including minimum and advanced safeguards, which supports implementation and prioritization in real-world settings. Rather than replacing existing approaches, the framework integrates and operationalizes them within a unified structure that reflects how risks manifest in practice.
5.3. Guidance to practical application
The suggested framework is intended to be applied prospectively, i.e., as a proactive safety assessment tool. It visualizes how errors in AI-driven digital health systems occur and which protective layers may help prevent these errors from resulting in patient harm. To apply the model to a specific AI-driven digital health solution, potential hazards first need to be identified. Subsequently, safeguards for the identified hazards need to be developed and integrated into systems and processes. Examples of such safeguards are provided in Table 2.
This assessment process can be repeated iteratively to identify remaining hazards or “holes” within the protective layers. Additional mitigation strategies can then be developed to minimize or eliminate these vulnerabilities.
Given the limited resources available in healthcare, the development of mitigation strategies has to be prioritized. Such prioritization may consider several factors, including the potential impact on patient safety, feasibility, resource requirements, interdependence of safeguards, and generalizability across settings. For example, a model trained on data that are biased or unrepresentative of the target population is likely to pose a substantial risk of patient harm; therefore, safeguards addressing data representativeness should be prioritized. Some safeguards can be implemented relatively easily, even in resource-constrained settings, such as clear documentation of the underlying dataset, intended purpose, and target population. In addition, safeguards may need to be prioritized for layers whose failure could lead to downstream system failures. For instance, poor data quality may compromise the reliability of the entire system. The relative risk associated with failures in each layer depends heavily on the specific purpose of the AI system and the healthcare environment in which it is deployed. Importantly, no single protective layer acts in isolation. Effective safety depends on a robust and interconnected system of safeguards operating across all layers. As a rule of thumb, we suggest that, after technical vulnerabilities have been addressed, broader measures such as fostering a culture of safety and providing user training should be implemented. Even after deployment, regular reviews are essential to identify emerging vulnerabilities and maintain system safety over time.
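One possible way to make such a prioritization explicit is a simple weighted score over the factors named above. The following Python sketch is illustrative only: the 1 to 5 rating scale, the weights, the example safeguards and their ratings are assumptions chosen for demonstration and were not derived from the incident analysis.

```python
from dataclasses import dataclass

# Hypothetical weights; a real assessment would need to justify and validate them.
WEIGHTS = {
    "safety_impact": 3,
    "feasibility": 2,
    "resource_need": 1,          # subtracted: higher need lowers priority
    "downstream_dependence": 2,
    "generalizability": 1,
}


@dataclass
class Safeguard:
    name: str
    safety_impact: int          # 1 (low) to 5 (high)
    feasibility: int            # 1 (hard) to 5 (easy)
    resource_need: int          # 1 (cheap) to 5 (expensive)
    downstream_dependence: int  # 1 (isolated) to 5 (other layers depend on it)
    generalizability: int       # 1 (site-specific) to 5 (broadly applicable)


def priority_score(s: Safeguard) -> int:
    """Weighted sum of the ratings; resource need counts against the score."""
    return (
        WEIGHTS["safety_impact"] * s.safety_impact
        + WEIGHTS["feasibility"] * s.feasibility
        + WEIGHTS["downstream_dependence"] * s.downstream_dependence
        + WEIGHTS["generalizability"] * s.generalizability
        - WEIGHTS["resource_need"] * s.resource_need
    )


def rank(safeguards):
    """Return safeguards ordered from highest to lowest priority score."""
    return sorted(safeguards, key=priority_score, reverse=True)


# Invented ratings for three example safeguards.
candidates = [
    Safeguard("real-time monitoring dashboard", 4, 2, 5, 2, 3),
    Safeguard("dataset and intended-use documentation", 3, 5, 1, 4, 5),
    Safeguard("data representativeness check", 5, 3, 3, 5, 4),
]

for s in rank(candidates):
    print(priority_score(s), s.name)
```

A score of this kind can only support, not replace, expert judgment, because the relative risk of each layer depends on the purpose of the AI system and its deployment context.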
5.4. Limitations
This work has some limitations. First, the Swiss cheese model risks over-simplifying complex adaptive systems if interpreted as a linear chain of failures. We mitigate this by emphasizing dynamic holes and by explicitly integrating person, technology, and system lenses; nonetheless, system-theoretic and resilience approaches may be required for tightly coupled environments.
Second, AI-driven digital health is heterogeneous: predictive models, generative systems, and autonomous decision support differ in their hazard profiles and sources of error. The incidents drawn from the incident database span a broad range of AI systems with distinct hazard profiles. For example, predictive models are particularly susceptible to calibration drift and dataset shift 54 ; computer vision systems often face challenges related to data representativeness and subgroup performance 55 ; and generative systems introduce risks such as hallucinations, unsafe outputs, and failures in escalation.56,57 Rather than assuming uniformity across these systems, the proposed adaptation of the Swiss cheese model is intended to accommodate modality-specific risks within shared system-level safeguards. Each layer addresses different failure modes depending on the type of AI system, thereby enabling both generalizability and specificity in risk mitigation. The proposed protective layers should therefore be regarded as guidance; they cannot be considered complete for all AI-driven digital health technologies, and specific tailoring is needed.
Third, the exploratory study was based on 15 randomly selected cases that fulfilled our inclusion criteria. This is a limited set of incidents, although the cases referred to a diverse set of technologies. The heterogeneity of the included cases, while enabling cross-cutting insights, may obscure modality-specific risk patterns. Moreover, the information available on the incidents originated mainly from online news or social media posts and may be biased by media reporting.
The proposed safeguard catalogue should be interpreted in light of the study’s exploratory design. The layers and safeguards were derived inductively from a limited and heterogeneous sample of 15 incidents and are not intended to represent an exhaustive or consensus-based set of recommendations. Rather than prescribing definitive controls, the framework offers an analytically grounded structure for identifying and organizing failure modes and safeguards across sociotechnical layers, and should be adapted to specific AI modalities, clinical contexts, and regulatory environments.
The thematic analysis was conducted by a single researcher using an inductive, iterative approach. Initial codes were generated from the first cases and compiled into a preliminary codebook, which was continuously refined as additional cases were analyzed. To promote internal consistency, earlier cases were revisited and recoded where necessary, and coding decisions were systematically documented. Through this process of constant comparison across the 15 cases, the thematic categories and their grouping into higher-level constructs were progressively stabilized. The proposed Swiss cheese model for AI safety was presented to several audiences at international conferences (e.g. a keynote at the AI in Health conference in Cambridge, September 8-10, 2025), and the informal feedback received was incorporated into the adapted version. Nevertheless, the absence of a second independent coder or formal inter-rater reliability assessment represents a limitation. The mapping of incident-level observations to thematic groups and subsequently to protective layers involves interpretive judgment, which may introduce subjectivity. As a result, the derived layer structure should be understood as an exploratory analytical framework rather than a definitive classification. This constraint may limit the transferability of the findings, and future research could strengthen robustness by incorporating multiple coders, formal codebook validation procedures, or external validation of the thematic structure.
6. Conclusion
The success of AI-driven digital health depends less on achieving perfect system performance than on creating robust, adaptive safeguards that can protect patients despite limitations in both humans and AI, and on taking a systemic view of how AI-driven systems are used in healthcare. We argue for a “just culture” in the context of AI in healthcare, in which researchers, healthcare professionals, and even patients are encouraged to report critical incidents and errors associated with AI-based digital health tools. Such reporting is essential for learning from incidents rather than concealing them, as only recognized failure modes can be systematically addressed through appropriate safeguards. However, a “just culture” must be carefully balanced with accountability: while most incidents may arise from system-level issues that warrant learning and improvement, cases of negligence or misconduct require an appropriate response. Moreover, effective reporting infrastructures must address practical constraints, including patient privacy, legal liability, and vendor-related limitations, particularly in the context of proprietary AI systems. Importantly, enabling patients to report incidents requires accessible, transparent, and non-punitive reporting channels, alongside protections against blame or retaliation. Establishing such a balanced and inclusive reporting culture is essential to ensure safety and enable continuous learning from AI-related failures in healthcare.
Future work should (i) translate each layer into measurable maturity levels (minimum vs advanced practices), (ii) define and validate core safety metrics for AI-enabled workflows (calibration drift, subgroup disparity, alert burden, override patterns, incident rates), (iii) evaluate how “dynamic holes” evolve longitudinally as models and workflows change, and (iv) test whether a layered barrier approach reduces safety incidents in prospective, multicenter implementations. Finally, measures have to be taken to support and facilitate the recommended “just culture”, e.g. by establishing low-threshold reporting options for observed safety hazards related to AI-driven digital health.
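As a starting point for item (ii), two of the named metrics could be computed as in the following Python sketch. The definitions are provisional assumptions for illustration, not validated metrics: subgroup disparity is taken here as the gap between the highest and lowest mean absolute error across subgroups, and the override pattern is reduced to the share of alerts that were overridden. The function names and input formats are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def subgroup_error_gap(records):
    """Mean absolute error per subgroup and the gap between worst and best.

    records: iterable of (subgroup_label, absolute_error) pairs.
    Returns (means_by_subgroup, gap); gap is 0.0 if there are no records.
    """
    by_group = defaultdict(list)
    for subgroup, abs_error in records:
        by_group[subgroup].append(abs_error)
    means = {group: mean(errors) for group, errors in by_group.items()}
    if not means:
        return {}, 0.0
    return means, max(means.values()) - min(means.values())


def override_rate(overridden_flags):
    """Share of alerts that were overridden.

    overridden_flags: iterable of booleans, True if the alert was overridden.
    Returns 0.0 if no alerts were recorded.
    """
    flags = list(overridden_flags)
    return sum(flags) / len(flags) if flags else 0.0


# Invented measurements for two subgroups labelled "A" and "B".
means, gap = subgroup_error_gap([("A", 1.0), ("A", 3.0), ("B", 4.0)])
print(means, gap)
print(override_rate([True, False, False, True]))
```

Which threshold on such a gap or rate should trigger a review is a separate question that would need to be defined and validated for each system and setting.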
Supplemental material
Supplemental material for Safeguarding AI-driven digital health – An adaptation of the Swiss cheese model for safety by Kerstin Denecke in Digital Health.
Footnotes
Ethical considerations
No ethics approval was needed for this research.
Author contributions
Conceptualization, Methodology, Investigation, Project administration, Writing – original draft, Writing – review and editing: KD.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
