Abstract
Engineering trustworthy artificial intelligence (AI) is important to adoption and appropriate use, but there are challenges to implementing trustworthy AI systems. It is difficult to translate trust studies from the laboratory to the field. It is also difficult to operationalize “trustworthy AI” frameworks and principles to inform the actual development of AI. We address these challenges with an approach based in reported incidents of trust loss “in the wild.” We systematically identified 30 cases of trust loss in the AI Incident Database to gain insight into how and why humans lose trust in AI in various contexts. These factors could be codified into the development cycle in various forms such as checklists and design patterns to manage trust in AI systems and avoid similar incidents in the future. Because it is based in real incidents, this approach offers recommendations that are concrete and actionable for teams addressing real use cases with AI systems.
Introduction
As Artificial Intelligence (AI) is developed and deployed in more high-consequence work environments, there is an increased need to design more trustworthy AI. Even AI with exceptional performance is for naught if end users and associated stakeholders do not trust the AI or its outputs, making trust critical to the adoption of AI technologies (Lyons et al., 2016; Dorton & Harper, 2021). A recent panel introduced the concept of trust engineering: the idea that impacts to trustworthiness must be explicitly considered at each step in the development process of AI (Ezer et al., 2019).
In recent years there has been a proliferation of frameworks for AI that is trustworthy, assured, responsible, etc.; however, these principles-based frameworks and guidelines are largely redundant (Blasch et al., 2021) and are difficult for teams that develop AI to put into practice (Munn, 2022). Some studies have shown that such principles-based approaches have failed to impact the process or the outcome of AI development efforts (e.g., McNamara et al., 2018).
We argue that because of the limitations of such top-down, principles-based approaches, the use of more naturalistic, bottom-up, evidence-based approaches to engineer trustworthy AI merits investigation. The Naturalistic Decision Making (NDM) tradition focuses on developing understanding of human cognition situated in the context of work (vs. in a controlled lab setting) and enables the engineering of systems to support cognitive processes (Klein et al., 1993). Naturalistic inquiry has been effectively applied to glean insights on how trust is formed in AI in high-consequence work (Rotner et al., 2021; Dorton & Harper, 2022a; Dorton, 2022), work adaptations to changes in trust (Dorton et al., 2022), and long-term trust dynamics with AI (Dorton & Harper, 2022).
Therefore, we seek to develop resources and tools for trust engineering of AI based on the naturalistic research tradition: evidence-based tools grounded in actual cases in which trust in AI was gained or lost “in the wild.” In this initial phase of research, we wanted to cast a wide net across domains, far beyond our immediate access to participants for interviews or job observation. In other words, we wanted to conduct naturalistic inquiry without relying on direct inquiry with participants.
Given these motivations and challenges, our overarching goal was to examine whether a repository of real-world incident reports can be used to conduct exploratory naturalistic research on trustworthy AI without relying on direct interactions with AI users. More specifically, we attempted to answer the following research questions: (1) what kinds of AI failures correlate with trust loss, (2) what factors contribute to these failures, (3) who loses trust in AI, and (4) when in the lifecycle trust is lost.
Methods
For this study we used the AI Incident Database, a crowdsourced catalog of publicly reported cases of AI harms that was created for the purpose of reducing future harms through application of lessons learned (McGregor, 2020). The database only compiles reports of incidents and does not impose a taxonomy on types of AI, users, cases, or failures, although at the time of writing such a taxonomy is in development and various metadata are provided (Pittaras & McGregor, 2022).
While targeted at reducing harms, we believe the database can be applied to examine other phenomena such as trust. As such, we devised a procedure and set of exclusion criteria to identify incidents in which humans demonstrably lost trust in an AI. From the initial dataset of 388 unique incidents (downloaded Nov. 14, 2022), we performed the following steps to arrive at a set of 30 trust incidents.
Excluded all incidents that did not have at least one report containing one or more keywords: trust, confident, confidence, faith, worry, worried, concern (
Excluded incidents where we could not identify evidence of trust loss, a specific human (individual or group) who lost trust, and the AI system in which trust was lost (
Excluded duplicate incidents that contained redundant reports from other incidents (
Excluded one incident [Incident 179] because it was an academic study, thereby falling outside the scope of naturalistic inquiry (
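The first exclusion step above (keyword screening) can be sketched in code. This is a minimal illustration, not the procedure the authors actually ran: the data layout (a mapping from incident ID to a list of report texts), the function names, and the choice of case-insensitive substring matching are all assumptions, and the sample reports are invented for the example.

```python
import re

# Keywords listed in the first exclusion step.
KEYWORDS = ["trust", "confident", "confidence", "faith", "worry", "worried", "concern"]

# Assumption: case-insensitive substring matching, so that variants such as
# "distrust" or "concerns" also count. The source does not state the matching rule.
KEYWORD_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def has_keyword(report_text: str) -> bool:
    """Return True if the report text contains at least one keyword."""
    return KEYWORD_PATTERN.search(report_text) is not None


def filter_incidents(incidents: dict[int, list[str]]) -> dict[int, list[str]]:
    """Keep only incidents with at least one report containing a keyword."""
    return {
        incident_id: reports
        for incident_id, reports in incidents.items()
        if any(has_keyword(report) for report in reports)
    }


# Toy data, invented for illustration (not drawn from the AI Incident Database).
sample = {
    1: ["The robot stopped unexpectedly.", "Residents said they no longer TRUST the system."],
    2: ["The model mislabeled several images."],
    3: ["Parents raised concerns about the content shown."],
}
kept = filter_incidents(sample)
print(sorted(kept))  # [1, 3]
```

The later exclusion steps (confirming evidence of trust loss, removing duplicates, and excluding the academic study) rely on human judgment and are not represented here.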
For each trust incident, we identified attributes from the associated report to answer the four aforementioned research questions. We explored the sample of incidents and inductively coded them into different categories based on their attributes, where the first author performed the initial coding and categorization, and the second author then reviewed and validated the categories incidents were placed into.
Results
Analysis of the incidents provided insights into how trust was lost in various operational contexts. Table 1 summarizes some of these results, while others are reported in the following sections. Table 1 is organized by AI type to be more convenient for those who may work in specific industries and/or with specific types of AI. Within each category of AI, we identified categories of failures leading to trust loss, factors contributing to the loss of trust, and relevant examples.
Table 1. Overview of Incidents (N = 30).
Note. Numbers in brackets correspond to the incident ID in the AI Incident Database.
Types of Failures
Table 1 shows types of AI failures associated with each type of AI. Despite the unique labels across types of AI, we found commonalities across AI types. For instance, we found Undesired Output failures for both Content Service and Conversational AI types. With minor abstraction, the Unexpected Move failure category for the Vehicle/Robot AI type is conceptually similar to the Undesired Action failure category for Conversational AI (i.e., in both cases the AI did something the human did not want).
In summary, a plurality of failures across all types of AI resulted from the AI doing something users did not want or expect, or failing to do something users did want or expect. This initial research is meant to be exploratory in nature (i.e., we are merely describing and summarizing cases, and not asserting a typology of failure types); we would more systematically revisit these categorizations if future research goals depended on identifying distinct failure types.
What Factors Contributed to Trust Loss
Numerous factors contributed to trust being lost in different types of AI. For example, failures categorized as Negative Affect were associated with stakeholder concerns such as biased training data, historical inequities, and lack of transparency and privacy. Incidents describing a Failure to Stop typically involved detection errors and the infeasibility of transferring vehicle control to humans in emergencies when they are not paying attention. Incidents of both failure types concerned the larger sociotechnical system: community members in the former, and transportation systems in the latter. Table 1 shows factors for each failure type.
There was a many-to-many relationship between failure types and contributing factors. For example, privacy concerns were associated with both Negative Affect failures and Undesired Action failures, though not necessarily for the same reasons. The incidents serve as concrete examples to provide situational context for the underlying factors affecting trust.
Though not explicit in Table 1, the existence of a harm was a universal factor in losing trust. Stakeholders lost trust in the AI because they observed or experienced something unacceptable. Harms described in these incidents varied considerably, ranging from death of an individual or financial burdens imposed on thousands of families, to showing unsavory content to children or using bigoted language.
Who Lost Trust
We identified five types of stakeholders who lost trust in AI. As noted in the methods section, we used quotes addressing trust directly where available, but we also inferred trust loss from statements and actions such as expressing safety concerns and suspending use of a deployed system.
In 11 out of 30 incidents (37%), the AI failure impacted the trust of the
In nine incidents (30%), the
In five incidents (17%), the
Three separate incidents (10%) describe a
In the 2 remaining incidents (7%), the AI impacted an otherwise unrelated
Some incidents could be assigned to multiple categories. For instance, in four out of five cases, the developer was also the party deploying the AI. When deployers lost trust, it was sometimes informed by concerns from end-users. For simplicity, however, we assigned each incident to one category, using the following order, which is intended to reflect each party’s relative ability to make decisions about the AI: creator, deployer, regulator, end-user, third party.
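The single-category assignment rule described above can be expressed as a short sketch. The precedence order comes from the text; the function name, the lowercase labels, and the set-based input are assumptions made for illustration.

```python
# Precedence order stated in the text, from most to least ability
# to make decisions about the AI.
PRECEDENCE = ["creator", "deployer", "regulator", "end-user", "third party"]


def assign_category(parties_losing_trust: set[str]) -> str:
    """Return the highest-precedence stakeholder type among those who lost trust."""
    for category in PRECEDENCE:
        if category in parties_losing_trust:
            return category
    raise ValueError("no recognized stakeholder type in: %r" % (parties_losing_trust,))


# Hypothetical incident in which both a deployer and end-users lost trust:
# the deployer takes precedence.
print(assign_category({"end-user", "deployer"}))  # deployer
```

A first-match scan over an ordered list keeps the rule explicit and makes it easy to test alternative orderings.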
When Trust was Lost
We also analyzed when trust was lost, although we could not track or analyze those data in the same way as for the other research questions, for multiple reasons. First, this information was not always clear from the incident reports. Second, there are many possible reference points for tracking these data such as date of creation, deployment, update, or first use. We did not have a useful systematic rule for choosing among these dates as each incident had its own unique temporal features. In cases when a system was deployed and then suspended, we tracked the time from deployment to suspension. For the 14 incidents in which deployers or creators withdrew support for their AI system, the duration between deployment and trust loss ranged from a single day to multiple years.
In one case (7%), a potential deployer decided not to deploy the AI at all [Incident 54].
In two cases (14%), it was clear within a day of deployment that the AI wasn’t performing as needed.
In six cases (43%) it took between one month and approximately one year.
In five cases (36%) it took multiple years to lose trust. These were typically Burden at Scale failures (accumulated errors or workload over time), or a Negative Affect failure following a shift in the context of use. For example, when the COVID-19 pandemic led to an increase in remote teaching, students lost trust and lodged complaints against test monitoring software that had been used in the classroom for years [Incident 138].
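As a quick arithmetic check, the percentages reported for these 14 incidents can be reproduced from the counts. The counts are taken from the text above; the dictionary labels are shorthand introduced here, not the authors' category names.

```python
# Counts reported for the 14 incidents in which deployers or creators
# withdrew support. Labels are shorthand for this example only.
duration_counts = {
    "never deployed": 1,
    "within a day": 2,
    "one month to about one year": 6,
    "multiple years": 5,
}

total = sum(duration_counts.values())

# Share of incidents in each duration bin, rounded to whole percentages.
percentages = {
    label: round(100 * count / total)
    for label, count in duration_counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {duration_counts[label]} of {total} ({pct}%)")
```

The rounded shares come out to 7%, 14%, 43%, and 36%, matching the figures in the text.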
Discussion
Key Findings
We successfully accomplished the overarching research objective of determining whether the AI Incident Database could be used to conduct exploratory naturalistic research on trustworthy AI, in lieu of relying on direct interactions with AI users. The AI Incident Database allowed us to answer various research questions about how trust in AI is lost “in the wild” across various contexts. To be more precise, we found the AI Incident Database to be a viable means to conduct naturalistic research, although it appears to have advantages and disadvantages when compared to interview-based knowledge elicitation methods such as the critical incident technique (e.g., Dorton & Harper, 2022a; Dorton, 2022):
The lack of direct interaction forced us to work with what information was present in the existing reports associated with each incident. The reports were not examining incidents through the lens of trust, providing us with far less relevant information than if we had conducted interviews.
Although we focused on direct quotes from incident reports wherever possible, we were forced to rely on interpretations of the incidents by the authors of reports, who were typically journalists and not researchers. In addition to direct quotes, we inferred trust loss from reported words and actions. The use of inference decreases the validity of qualitative research (Johnson, 1997).
The unstructured nature of crowdsourced reports made some research questions more difficult to answer than others. For example, we could analyze the temporal element of incidents in fewer than half of the cases (n = 14, 47%).
The lack of structure in reports was beneficial in giving us a broader perspective than if we merely interviewed users of AI. Crowdsourced reports provided insights on trust dynamics with AI developers, organizations deploying AI, regulatory bodies, and other third parties.
The use of the AI Incident Database required less time and effort to analyze than the time required to conduct interviews (when considering experimental design, institutional review board approvals, interviews, transcriptions, and analysis).
We found that using the AI Incident Database saved us marginal time and effort, at the cost of losing depth of context versus interview-based methods. Thus, we believe that this approach has value as a “quick and dirty” exploratory research method, which could be used to inform and complement subsequent research on AI trust using direct inquiry.
Other Insights
This research enabled us to glean insights above and beyond accomplishing the overarching research question. In many cases, these insights align to other findings in the literature and have implications for trust engineering of AI.
We found that trust could be lost for a wide variety of stakeholders (users, developers, deployers, etc.). These findings align somewhat with the concept of supradyadic trust (i.e., trust outside of the user-AI dyad; Dorton, 2022). They also align with recent research on system-wide trust, or the idea that different components within a sociotechnical system (AI, people, work, etc.) may engender or exhibit different levels of trust than the entire system (e.g., O’Hear et al., 2022).
We also found that trust was mediated by more than just the performance of the AI and was often grounded in the impact of the AI on the broader sociotechnical system over time. This aligned with what Dorton and Harper (2022a) called the role of utility (impact on the work system, irrespective of AI performance) in trust development, and more broadly, other work showing that there are dozens of factors mediating trust in automation (e.g., Chiou & Lee, 2023; Schaefer et al., 2016). Negative affect was also a factor in several cases, which aligns closely with findings on algorithm aversion, where stakeholders trust AI outputs less than they trust other humans, especially for tasks that are perceived as being subjective in nature (Hou & Jung, 2021).
Finally, we found that the temporal element of trust loss was highly variable across incidents. In a few cases the deployers of AI immediately recognized issues with deploying the AI in the real world; however, in many cases trust was lost only after smaller harms compiled or compounded over time. This aligns with other research on how people may gain or lose trust in AI over shorter or longer durations (e.g., Gutzwiller & Reeder, 2021; Dorton & Harper, 2022b).
Future Work
Although we succeeded in our overarching research goal, this study was merely part of a first step toward developing evidence-based guidance for trust engineering of AI. We are currently exploring several options for translating these findings into actionable tools for AI development teams. Such tools may include checklists or inventories for formative or summative use, or evidence-based design patterns or anti-patterns that AI developers can employ (e.g., Bogner et al., 2021; Mo et al., 2021).
Additionally, further work should refine and expand on this method for conducting naturalistic research on a repository of real-world incident reports. We aim to not only advance the emerging discipline of AI trust engineering, but to also help demonstrate the value of such incident repositories for AI research. These repositories can serve as a first step to complement and inform other forms of naturalistic research, which fills a critical gap in the way forward to actionable guidance for building trustworthy AI.
