Abstract
While there is growing consensus that the analytical and cognitive tools of artificial intelligence (AI) have the potential to transform government in positive ways, it is also clear that AI challenges traditional government decision-making processes and threatens the democratic values within which they are framed. These conditions argue for conservative approaches to AI that focus on cultivating and sustaining public trust. We use the extended Brunswik lens model as a framework to illustrate the distinctions between policy analysis and decision making as we have traditionally understood and practiced them and how they are evolving in the current AI context along with the challenges this poses for the use of trustworthy AI. We offer a set of recommendations for practices, processes, and governance structures in government to provide for trust in AI and suggest lines of research that support them.
Introduction
Most industry analysts focused on government agree that artificial intelligence (AI), an umbrella term for analytical and cognitive tools that can be used to derive insight from government’s vast stores of data, will transform government in the immediate future (Chenok et al., 2017; Dhasarathy et al., 2019; Eggers et al., 2017). AI is viewed as providing “actionable” insight that improves decision making, produces efficiencies, and stimulates innovation across the wide range of government operations. Further, AI tools are frequently cast as politically neutral, or objective, ostensibly overcoming human decision-making bias in their operations and outcomes and thus presenting a powerful argument for their adoption (Agrawal et al., 2019).
In this turn toward data-driven policy analysis and decision making, government follows closely on the heels of industry in seeking to exploit AI technologies. Although government lags behind industry in many ways, there is little doubt that it will acquire its own suites of AI tools and strategies. This conclusion is underscored by President Trump’s recent executive order announcing the American AI Initiative, which directs federal agencies to cultivate AI experimentation and development (AP, 2019). Similar motivation is reflected in a recent European Commission report on AI policy and investment recommendations, which noted that AI applications in government are “emerging very rapidly leading to a potential revolution in the role and structure of government and its relationship with individuals and businesses” (AI High Level Expert Group [HLEG], 2019b).
Although AI is widely regarded as revolutionary, our understanding of the benefits and costs of deploying it in government contexts is still developing. Industry analysts have recognized that this powerful combination of data and computational strategies may challenge democratic institutions in significant ways (e.g., Araya, 2019). What we know now suggests that using AI-powered analytical tools can produce substantial threats to privacy, autonomy, equity, and fairness for individuals in their quotidian pursuits and in their roles as citizens (Eubanks, 2018); the long-term consequences have yet to be explored. The U.S. executive order acknowledges the relevance of “civil liberties, privacy and American values” to what is essentially a new information policy but leaves these issues unelaborated (AP, 2019). Similar concerns about democratic values are reflected in two European Commission reports on AI (AI HLEG, 2019a, 2019b).
Ethical issues presented by AI strategies have entered public discourse, stimulating fears about AI and threatening to lead to public rejection of these tools. A recent Pew study reported that a majority of those surveyed viewed four types of algorithmic decision making as unfair and unacceptable for making decisions with real-world consequences for humans (Smith, 2018). Another survey of Americans suggests mixed support for AI as well as a perceived need to carefully manage AI development, particularly in the area of privacy (Zhang & Dafoe, 2019). Unfortunately, Americans also report a lack of confidence in the ability of any institution, including government, to manage AI development (Zhang & Dafoe, 2019).
These findings suggest that the effective use of AI in government is more than a matter of technical advances. Instead, public trust in AI must be cultivated and sustained; otherwise, useful AI systems may be rejected and government decision making may lose its legitimacy (AI HLEG, 2019b). We argue that public trust can be achieved when AI development takes place in contexts characterized by policies situated firmly in democratic rights and supported by well-documented and fully implemented governance practices. The importance of ethics is recognized by industry and public watchdogs through an array of frameworks, such as IEEE’s (2019) “Ethically Aligned Design” and the Asilomar AI Principles (Boddington, 2017; Future of Life Institute, 2017). Although these frameworks propose values and criteria to guide the design and development of AI tools, they are rarely accompanied by tangible implementation of those values (Crawford et al., 2019; Hagendorff, 2020). Indeed, ethics “owners” in industry find powerful internal logics aligned against such implementation: the tendency to dismiss critique, the pressures of the bottom line, and the assumption that technology itself will solve ethical problems (Metcalf et al., 2019). In government contexts, we know of no guidance for how to design tools, processes, policy, and governance structures that implement even the most accepted AI ethical principles.
We attempt to provide a bit of such guidance in this article. We consider a broad range of literature addressing the uncertainties of AI and weigh the consequences of this uncertainty for use of AI in government policy and decision making. Our analysis exposes a stark contrast between the reasoning used by traditional decision support tools for government and that enabled by current AI development. We argue that the latter jeopardizes values used traditionally to frame innovations in data and technology as well as government’s ability to cultivate and sustain public trust in AI. We propose several conditions that create a foundation for deploying trustworthy AI applications for government. We focus specifically on the use of AI in government contexts characterized by policy development and execution as well as by citizen–government interactions related to services, benefits, fraud detection, personnel decisions, criminal justice, and other similar contexts.
This article is organized into six sections including this “Introduction” section. In the second section, we identify AI strategies that are under development by governments and consequential for the public. We suggest that, at least in the foreseeable future, the AI tools most likely to affect citizens are those referenced in the literature as artificial narrow intelligence (ANI). In the third section, we draw on the extended lens model (ELM)—a conceptual model of human judgment—to help illustrate a set of distinctions between policy analysis and decision making as we have historically understood them with their contrasting logic in the current AI context. Using the model as a framework, we show that AI introduces challenges in two major areas: (1) assessing the adequacy of data used in AI applications and (2) evaluating the outcomes of advanced analytics in light of values that infuse our democratic norms. These issues are taken up in the fourth and fifth sections, where we also recommend actions to create conditions for trustworthy AI. We conclude in the sixth section with a set of larger policy and research recommendations.
The Looming Prospect of Data-Driven, AI-Powered Government
What Do We Mean by AI?
When we think of AI, what often appears in the popular imagination are robotic entities with intellectual capabilities that far exceed those of humans. This is referred to as artificial superintelligence (e.g., Bostrom, 2003; Mitchell, 2019; Wirtz et al., 2019), which is viewed as currently unrealized, although opinions differ over when such capabilities might be achieved. In contrast, artificial general intelligence (AGI) refers to a type of AI that is able to learn and transfer its skills and knowledge into other domains without help from humans (Bostrom & Yudkowsky, 2014; Helbing et al., 2017; Wirtz et al., 2019). The status of AGI development is somewhat less clear, with arguments over whether software meets these qualifications largely dependent on what counts as “learning.” Deep neural networks comprise one form of AI in which self-programming takes place; however, as Mitchell (2019, p. 112) points out, these systems make errors of the kind to which humans are not susceptible, so “how can we say that these networks ‘learn like humans’ or ‘equal or surpass humans’ in their abilities?”
More relevant to our concerns is artificial narrow intelligence (ANI), a less advanced but currently far more common form of AI, which consists of software able to solve problems in a single domain. Common examples of ANI include machine learning and its many variations, neural networks, deep neural networks, and natural language processing. These techniques are often applied in contexts such as autonomous vehicles, computer vision, and facial recognition. ANI is the subject of forecasts about the use of AI by government (Gasser & Almeida, 2017); this is what we will mean when referring to AI in the following discussions.
We distinguish ANI from current uses of robotic process automation (RPA), a form of automated software “bots” that lacks “cognitive” decision-making processes (Eggers et al., 2017). RPAs carry out repetitive tasks such as opening email and attachments, scraping data from the web, and other operations that, when automated, save time and labor. Instead, we focus on AI applications that involve a form of learning related to decision making that can be consequential for citizens or businesses. For example, the city of Las Vegas developed an AI system to scan tweets, extracting those mentioning an experience with food poisoning with the purpose of identifying restaurant locations to be prioritized for inspection (Eggers et al., 2017). In this case, a form of “supervised” AI was used in which humans coded tweets for indications of food illnesses, enabling the system to learn the language characteristics that distinguished between those that did and did not represent food illness. This process enabled the city to identify and inspect venues, improving the effectiveness of its inspection system.
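To make the supervised setup concrete, the following is a minimal sketch and not the system Las Vegas deployed. It assumes a toy naive Bayes classifier trained on a handful of invented, human-labeled messages; the function names, the example texts, and the choice of naive Bayes are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Invented, human-coded examples: True = mentions a food-illness experience.
LABELED_TWEETS = [
    ("got food poisoning after dinner at the buffet", True),
    ("sick all night after eating sushi downtown", True),
    ("stomach cramps and vomiting after that taco place", True),
    ("great dinner at the buffet tonight", False),
    ("loved the sushi downtown", False),
    ("the new taco place has amazing salsa", False),
]


def train(labeled):
    """Count word frequencies per label (the 'learning' step)."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts


def classify(model, text):
    """Return the label with the higher naive Bayes log-probability."""
    word_counts, doc_counts = model
    vocab = set(word_counts[True]) | set(word_counts[False])
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / sum(doc_counts.values()))
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(LABELED_TWEETS)
print(classify(model, "food poisoning after lunch"))
print(classify(model, "amazing dinner tonight"))
```

The point of the sketch is that the human coding of the examples, not the software, defines what counts as a report of food illness; the classifier only generalizes from those labels.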
Supervised AI can be expensive due to the costs of human participation. Thus, “unsupervised” AI, in which there is no human guidance, can seem to be vastly superior. The software learns to identify and categorize instances of a phenomenon on the basis of features in the data itself. However, as we shall see, this can produce problematic results because it is often unclear which features drive the categorization (Burrell, 2016). A recent study of facial recognition algorithms conducted by the National Institute of Standards and Technology demonstrates that even supervised algorithms differ markedly in their rates of accuracy (both false positives and false negatives) in ways that vary substantially by ethnicity, age, and sex (Grother et al., 2019).
So far, governments use AI technologies to make welfare payments, detect fraud, make immigration decisions, answer citizen queries, automate help desk or call center operations, adjudicate bail hearings, and plan new infrastructure projects (Martinho-Truswell, 2018). Although government’s experience with AI applications has yet to be exhaustively documented, it is clear that some projects are problematic. Gilman (2020) reports that AI is increasingly used by states to determine welfare benefits and detect fraud, with substantial examples of error. For example, Michigan implemented an AI system that found 5 times more incidents of unemployment insurance fraud than the prior system, but 93% of the determinations turned out to be in error. Similar algorithmic errors have been found in Medicaid eligibility determinations in Indiana, Arkansas, Ohio, and Idaho. Pointing to errors in countries around the globe, the United Nations Special Rapporteur on Extreme Poverty and Human Rights (Alston, 2019) reports that social welfare systems are “increasingly driven by digital data and technologies that are used to automate, predict, identify, surveil, detect, target and punish” and recommends “regulation of digital technologies, including artificial intelligence, to ensure compliance with human rights….”
What’s New About AI?
Data have always been the basis for government decision making; technology has supported data analysis, and democratic values traditionally have guided innovation. One of the earliest agendas for digital government research proposed a focus on technology tools that support policy deliberations and management decisions (Dawes et al., 1999). Digital government scholars then envisioned governance processes characterized by access to the technology and information used to support decision making, a citizen-centered perspective in service delivery, and the enduring democratic values of transparency, accountability, and trust in government (Dawes, 2008; Harrison & Sayogo, 2014). Open government principles of transparency, participation, and collaboration (still promulgated by the 78-nation Open Government Partnership; https://www.opengovpartnership.org/stories/charting-the-next-three-years-for-ogp/) provided for citizens’ access to government data, involvement in decision making, and engagement with government leadership with the goal of establishing trust (Harrison et al., 2012).
Earlier limitations on data availability and integration across domains posed seemingly intractable boundary conditions for the tasks considered feasible. But it is now evident that a new form of digital government is emerging, redefined by decision making made possible through Big Data in all its forms coupled with its exploitation using AI and other analytical tools (Puron-Cid et al., 2016). In this new digital government, decisions turn on the adequacy, the provenance, and the relationship of data to the problem at hand, as well as the formidable complexities inherent in the operations of current AI tools, creating conditions for the erosion of trust in government decision making. We illustrate this in a comparison below between the ELM and current AI decision making.
The ELM Versus Data-Driven AI Decision Making
Multicriteria decision making is one of the most common approaches to understanding decision making and decision support systems from both researcher (Wallenius et al., 2008) and practitioner perspectives (Dhami & Mumpower, 2018). Computer science applications using multicriteria decision making are plentiful in the management science and information systems literature (Wallenius et al., 2008). The lens model (Stewart, 1990), developed initially by Egon Brunswik as a theory to explain how living organisms interact with their environment, was popularized by Kenneth Hammond and widely adopted by cognitive psychologists interested in better understanding and improving human judgment and decision making (Dhami & Mumpower, 2018).
The lens model (Figure 1) assumes that living organisms perceive and interpret elements in their environment (observed events) through a series of data elements (cues), aggregated using some form of calculation to make a judgment (forecast) about them. Lines between cues represent correlations between them. Lines between the cues and the observed event in the figure represent relations between the cues and the event in the environment, and lines between cues and forecast represent the ways in which the forecaster is using these cues. Both relationships in the lens model can be expressed formally as a pair of probabilistic models between the cues and the observed event and between the cues and the forecast:
$$O = M_{OX}(X_1, \ldots, X_n) + E_O, \qquad Y = M_{YX}(X_1, \ldots, X_n) + E_Y$$
where $O$ is the observed event, $Y$ is the forecast, $X_i$ represents the cues, $M_{OX}$ and $M_{YX}$ represent the models describing the relationships between the cues and the event and the cues and the forecast, and $E_O$ and $E_Y$ represent the probabilistic errors in each model. The line labeled G represents the correlation between the event and the forecast. The lens model has been applied both to improve judgment precision and consistency and to develop decision support systems (Rohrbaugh, 1992).
Figure 1. Brunswik’s lens model. Source. Adapted from Brunswik (1952).
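The pair of models described above can be sketched numerically. The example below is only an illustration under stated assumptions: it assumes both models are linear and fitted by ordinary least squares, which the lens model itself does not require, and every name and data value (two cues, eight cases) is invented.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def fit_linear(cues, target):
    """Ordinary least squares via the normal equations (intercept + one weight per cue)."""
    rows = [[1.0] + list(row) for row in cues]
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, target))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[row][c] -= factor * aug[col][c]
    coef = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        tail = sum(aug[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (aug[i][k] - tail) / aug[i][i]
    return coef


def predict(coef, cues):
    """Apply a fitted linear model to each row of cues."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in cues]


# Invented data: two cues per case, the observed event (O), and a judge's forecast (Y).
CUES = [(3.9, 1), (3.2, 4), (2.8, 6), (3.5, 2), (3.0, 5), (3.7, 3), (2.5, 2), (3.4, 7)]
EVENT = [7.5, 7.0, 7.2, 6.8, 7.1, 8.0, 5.0, 8.6]
FORECAST = [8.5, 6.5, 5.8, 7.4, 6.2, 8.0, 5.0, 7.0]

event_hat = predict(fit_linear(CUES, EVENT), CUES)        # cues -> event model
forecast_hat = predict(fit_linear(CUES, FORECAST), CUES)  # cues -> forecast model

env_predictability = pearson(EVENT, event_hat)            # how well cues explain the event
forecaster_reliability = pearson(FORECAST, forecast_hat)  # how consistently the judge uses cues
event_forecast_corr = pearson(EVENT, FORECAST)            # correlation of event and forecast

print(round(env_predictability, 3), round(forecaster_reliability, 3),
      round(event_forecast_corr, 3))
```

The residuals of each fitted model play the role of the error terms, so a low correlation between event and forecast can be traced either to an unpredictable environment or to inconsistent use of the cues.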
For example, a decision support system could be designed to filter and rank candidates (multiple Ys) for fieldwork positions in the Department of Social Services. The application program may use several pieces of data (recommendation letters, college transcripts, personal statements, and work experience) as potential cues to forecast the future performance of candidates (Ys) in order to rank them in terms of their match to the job. In this traditional approach to decision support, expert judges would construct the model by drawing upon domain knowledge as well as data utilization in preparing the model for use. What cues need to be included in the analysis? What analytical or statistical models are most useful in this case to match the environment? Given that models were constructed using data generated by human judges, another key issue is related to the understanding of the mental models and values of these human judges, as well as their potential strengths and biases.
Using the lens model to improve decisions and decision support has shown that both the predictability of the environment and the reliability of the judgments are influenced by additional factors (see Figure 2). Environmental predictability depends on the relationship between the event and the cues but may also be affected by errors in the selection of cues actually being used (the difference between “true descriptors” and cues). Reliability of the forecaster, on the other hand, is associated with information processing, but also with how information is acquired for the analysis or interpreted by the forecaster (difference between cues and “subjective cues”).

Figure 2. Extended lens model. Source. Adapted from Stewart & Lusk (1994).
The first type of error—the difference between true descriptors and cues—may emerge because of problems of fidelity of the information system such as data quality or use of a proxy to observe qualities that are hard to observe directly. The second one—differences between cues and subjective cues—emerges when data need to be extracted from some context such as a document, an image, or human interpretation of a qualitative indicator.
In our recruitment example, assume that grade point average (GPA) is a cue that an application is using to assess a skill among applicants. Being a proxy of skill, GPA is not an exact match to the true descriptor of the skill. In addition, if a candidate is graduating, it is possible the actual GPA reported in their transcript is not current. Moreover, if the number is scraped from the document by a web tool, there is a potential source of error in the process.
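The two error sources in the GPA example can be illustrated with a small simulation. This is a sketch under invented assumptions: skill is treated as a standard normal "true descriptor," GPA as skill plus proxy noise, the scraped GPA as GPA plus extraction noise, and all noise levels are arbitrary choices made for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 5000

skill = [rng.gauss(0, 1) for _ in range(N)]            # true descriptor (not observable)
gpa = [s + rng.gauss(0, 0.7) for s in skill]           # cue: an imperfect proxy for skill
scraped_gpa = [g + rng.gauss(0, 0.7) for g in gpa]     # subjective cue: extraction error added
performance = [s + rng.gauss(0, 0.5) for s in skill]   # the event being forecast

r_true = pearson(skill, performance)
r_cue = pearson(gpa, performance)
r_subjective = pearson(scraped_gpa, performance)

print("true descriptor:", round(r_true, 2))
print("cue (GPA):      ", round(r_cue, 2))
print("subjective cue: ", round(r_subjective, 2))
```

Under these assumptions each added layer of error weakens the relationship with the event, which is the sense in which fidelity and acquisition problems limit what any forecast built on the cue can achieve.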
This ELM helps us understand the ways in which human judgment processes differ from those that are produced using AI strategies. Below, we explain four such issues and their relevance to trustworthiness within AI-based decision support.
1. The fidelity of the information system. The ELM assumes the cues selected to represent “true” descriptors are valid proxies and that they are accurately measured. Error is created when this assumption is violated, such as when cues are inaccurately measured or do not, in fact, validly represent the concepts embodied by the true descriptors.
Within the AI context, data must also be complete, valid, and accurate, but the challenges here are formidable since data for the analysis come from multiple sources assembled in contexts with different and potentially unknown data management practices. This can lead to high variability in the validity and quality of the data. An additional issue arises when data are collected in contexts that may not be appropriate for use within the domain under current investigation.
2. Reliability of information acquisition by the judge. The ELM acknowledges that data in the model generation process will become subjective in some respects simply by virtue of inherent interpretive and application processes that are part of any preprocessing and handling by humans.
In AI contexts, as data are cleaned, integrated, and applied to problem-solving, many decisions and judgments are made by humans who unavoidably bring their own biases to bear on the outcomes. AI software development requires multitudes of decisions about issues such as which data to include in an analysis, how to handle missing values, how data are weighted, and how to integrate data with different levels of aggregation, all of which contribute to the opacity of the algorithmic processes. Since public servants often lack expertise in AI, such decisions are likely to be made by software engineers.
3. Robust environmental models versus data-driven approaches. In traditional knowledge generation, efforts to find true descriptors are part of what one calls “theorizing” or domain-based model building. Such activity is conducted by domain experts and is based on what is known about the domain of the environmental object being modeled. Error is created when the “true” descriptors cannot be located.
In contrast, AI strategies such as deep neural networks often eschew both theory and domain knowledge, focusing instead on using available data to locate a combination of predictive factors and derive a set of decision rules. This accounts for the “data-driven” nature of AI solutions (and the oft-cited claim that causal models are not needed since prediction is sufficient for many contexts; see, e.g., Kleinberg et al., 2015) and for their susceptibility to issues related to fairness and privacy.
4. Assessment of the forecast. In traditional model building, analysts can follow the sequence of information processing, interrogate the validity of fundamental assumptions, and look for mistakes in statistical or computational processes bearing on the adequacy of the forecast. In these ways, traditional model building assumes it is possible to achieve a close match between the environment and the forecast model.
In contrast, AI strategies are not necessarily amenable to human assessment and critique. This is because characteristics of learning algorithms make it difficult to examine internal decision processes. The complexity of model development makes it difficult for even experts to fully grasp the nature of the calculations, with significant implications for explaining how an algorithm works or representing its logic to those affected by the decisions made. Further, AI models may overfit their training data and then fail to predict well on new data.
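The overfitting problem noted under Issue 4 can be shown with a minimal sketch. It assumes invented data generated from a simple linear relationship plus noise and compares a straight-line fit with a deliberately extreme "memorizing" model (a one-nearest-neighbor lookup); both models and all parameter values are hypothetical stand-ins rather than any particular AI system.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable


def make_data(n):
    """Invented data: y depends linearly on x, plus random noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Simple least-squares line: captures the trend, ignores the noise."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


def fit_memorizer(xs, ys):
    """1-nearest-neighbor lookup: reproduces every training label, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = make_data(400)
test_x, test_y = make_data(400)
line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

for name, model in (("line", line), ("memorizer", memorizer)):
    print(name,
          "train error:", round(mse(model, train_x, train_y), 2),
          "test error:", round(mse(model, test_x, test_y), 2))
```

The memorizing model looks perfect when judged on the data it was built from and worse than the simple line on new cases, which is why an assessment limited to training data says little about the adequacy of the forecast.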
These four issues may be sorted into two types: first, challenges stemming from the use of government data (Issues 1 and 2) and, second, challenges stemming from algorithmically driven decision making (Issues 3 and 4). In the fourth section, we explain a set of concerns related to Issues 1 and 2, including basic requirements for ensuring the integrity of data used in AI applications. In the fifth section, we review a set of complexities arising from Issues 3 and 4, discussing implications for the values of transparency, accountability, and responsibility in government decision making as well as for the ability to ensure fairness and privacy.
Trustworthy Data for AI
Availability and quality of data are long-standing issues in decision support (Dawes et al., 1999), but data today differ in scope and in their potential uses. The Big Data of government (taxes, health, public safety, social welfare, and education, to name a few) are now supplemented by Open Government Data and the tsunami of data produced by the Internet of Things. Interoperability constraints have now largely been overcome by advanced integration technologies (Klenk & Majerol, 2018).
These enormous data volumes make AI feasible (Piletic, 2018). The more information available to AI, the more systems can learn and improve accuracy (Marr, 2017). On the other hand, without AI, Big Data loses some of its potential for innovation. In this sense, Big Data and AI are mutually enabling. However, problematic data introduce risks that can lead to economic devastation for industry and that may erode trust in and the legitimacy of government. Thus, trustworthy AI rests on the quality of the data that fuel it.
High-Quality Data
Although much attention is now devoted to improving data analytics, much less attention is devoted to improving data management, which is essential to data analysis of any kind, but critical to trustworthiness within the AI context. Unfortunately, providing a sound foundation for building AI systems is no easy accomplishment, since government agencies do not typically possess appropriately curated data resources (Mehr, 2017). Moreover, critics remind us that Big Data are not necessarily better data and that data taken from their original contexts threaten to lose their original meaning (boyd & Crawford, 2011).
Data management is relevant at two points in the system. At the operational level, data management improves the general fidelity of the information systems and ensures the existence of metadata documenting data’s “origin, format, lineage, and how it is organized, classified, and connected;” metadata are vital for determining the uses to which the data can be applied (Brunet, 2018). In the AI preprocessing stage, data management enables reliability of data acquisition and replicability of the analysis, both of which are essential to preserving the transparency of any analysis.
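As one way to picture what such metadata might look like in practice, the sketch below defines a hypothetical record and a simple fitness-for-use check. The class, its field names, the one-year staleness threshold, and the example values are all assumptions made for illustration; they are not drawn from any metadata standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative, not a standard."""
    name: str
    origin: str            # agency or system that produced the data
    file_format: str       # e.g. "csv"
    collected_for: str     # purpose for which the data were originally collected
    last_updated: date
    lineage: List[str] = field(default_factory=list)  # transformations applied so far

    def record_step(self, description: str) -> None:
        """Append one preprocessing or integration step to the lineage."""
        self.lineage.append(description)


def fitness_warnings(meta: DatasetMetadata, intended_use: str, as_of: date,
                     max_age_days: int = 365) -> List[str]:
    """Return reasons the data set may be unsuitable for the intended use."""
    warnings = []
    if meta.collected_for != intended_use:
        warnings.append("collected for a different purpose: " + meta.collected_for)
    if (as_of - meta.last_updated).days > max_age_days:
        warnings.append("data older than %d days" % max_age_days)
    if not meta.lineage:
        warnings.append("no lineage recorded")
    return warnings


claims = DatasetMetadata(
    name="unemployment_claims",
    origin="state labor department case system",
    file_format="csv",
    collected_for="benefit payment processing",
    last_updated=date(2018, 1, 15),
)
claims.record_step("dropped rows with missing claimant ID")

for warning in fitness_warnings(claims, "fraud detection", as_of=date(2020, 1, 15)):
    print(warning)
```

Recording each preprocessing step in the lineage is what makes the analysis replicable at the AI preprocessing stage, while the purpose and date fields support the operational question of whether the data suit a new use at all.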
Data Security and Privacy
Government agencies are required by statute and regulation to safeguard access to personally identifiable data, although, as a recent data breach at the Office of Personnel Management illustrates, they do not always measure up to these challenges (Marks, 2018). Privacy has become even more relevant because the greatest value obtained from AI comes from integrating volumes of data from multiple sources that, at the same time, can readily generate trails of potentially identifiable data (Johnson, 2017; Kitchin, 2016). When joined and analyzed together, integrated data sets offer “the ability to assemble multiple views of the customer [that] may provide inappropriate insights” (O’Leary, 2014, p. 72). Such integration is what Big Data and AI make possible and valuable. There is an inherent tension here because “the utility and privacy of data are intrinsically connected, no regulation can increase data privacy without also decreasing data utility” (Ohm, 2010).
Addressing Data Challenges for Trustworthy AI
We propose two recommendations related to data used in AI. First, government needs to implement enterprise data management as an essential foundation for trustworthy AI. While others exist, one of the most important frameworks for data management has been created by the Data Management Association and is expressed in the Data Management Body of Knowledge (DMBOK; DAMA International, 2017). DMBOK is an industry standard for assessing and strategizing about data management with recommendations for nine dimensions of data management: data governance, data architecture and design, database management, data access (security) management, data quality management, master data management, data warehouse and business intelligence management, records management, and metadata management.
The key to establishing standards for all dimensions is governance, which consists of structures and practices required for authority and control (planning, monitoring, and enforcement) over the management of data assets. Formal policy-making bodies and regularized, systematic practices enable organizations to make decisions about data and to create policies that stipulate how people and processes are expected to function in relation to data (Harrison et al., 2019). The overarching purpose of data governance is to ensure that data used throughout the organization are managed according to policies and best practices with implications for several important outcomes (Ladley, 2012). Establishing well-functioning and sustainable data governance structures enables an organization to minimize security issues and establish basic safeguards related to creation, access, and use of data.
Second, governments seeking to deploy AI should cultivate data literacy. In the new digital government, the integrity, security, and appropriateness of data are prerequisites for trustworthy decision making. Data literacy must be seen as an imperative for any government employee whose responsibilities bear on data collection, manipulation, and use. Data literacy encompasses knowledge of data management and enables practitioners to promote the creation, maintenance, and security of quality data in systems supporting daily operations (Eversden, 2019). In addition, literacy for AI practitioners requires careful documentation of judgment calls in the processes of data cleansing and preparation as well as continuous reflection on the need to understand limitations of the data set and what questions can be asked of it (boyd & Crawford, 2011). Such practices enable policy and decision makers to assess and critique the grounds upon which AI development takes place, enabling transparency and providing a basis for accountability.
Trustworthy AI Algorithms
In comparison with the ELM, AI development may produce decision-making models with biased results, additional privacy risks, and outcomes that are produced substantially out of direct human oversight and are thus resistant to evaluation.
Fairness and Privacy
Recent research shows that AI makes it possible to infer training data using the learned model, producing a different kind of risk to privacy (Kearns & Roth, 2019). Beyond data breaches and the need for effective anonymization, this challenge concerns data inference: reconstructing training data from the learned model, a form of reverse engineering that overfitting can make easier (Yeom et al., 2018). Protecting against the privacy risks created by such algorithmic inference will become more important as the use of AI extends to new domains of government services, and as requirements for transparency increase, resulting in more publicly available models and hence increasing the risk of data inference.
The challenge of fairness is another source of uncertainty in applying algorithmic AI in government. Government profiling and service targeting are not new; however, the use of algorithms may increase such dangers, disrupting policy and governance principles and reducing citizen equality in the service of achieving efficiencies (Henman, 2019). Further, data taken from their original production venues and used elsewhere for AI-driven decision making produce outcomes that can distort (Crawford, 2013; Dawes & Helbig, 2015), stigmatize (Lageson, 2016), and discriminate (Brayne, 2017; Erickson et al., 2018).
Finally, AI algorithms may amplify hidden biases embedded in the data or even create new ones (Kearns & Roth, 2019). The digital footprint (data points) produced by each individual through their use of digital services provides great opportunities for personalization but also for profiling and discrimination. The vast amount of data available about each individual, commonly referred to as a data vector, includes numerous correlations with gender, race, or religion that are all potential sources of bias. Acknowledged manifestations of this problem exist in currently well-known examples of biases in algorithmic policing (Brayne, 2017) and recruitment tools (Dastin, 2018). Restricting the use of certain inputs such as race and gender in the AI process is not a solution to bias because of the multiple correlations within individual data vectors. The algorithm will treat men and women differently by discovering variables that covary with gender within the vector. Solutions to this problem are not available within the current state of knowledge.
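Why removing a protected attribute does not remove bias can be seen in a small simulation. The sketch below rests entirely on invented assumptions: a binary protected attribute, a single proxy feature that matches it 90% of the time, biased historical approval rates of 70% and 30%, and a trivially simple "model" that learns approval rates from the proxy alone.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
N = 4000

# Each record: (protected attribute, proxy feature, historical decision).
records = []
for _ in range(N):
    in_group_a = rng.random() < 0.5
    # The proxy agrees with the protected attribute 90% of the time.
    proxy = in_group_a if rng.random() < 0.9 else not in_group_a
    # Historical decisions were biased: group A approved far more often.
    approved = rng.random() < (0.7 if in_group_a else 0.3)
    records.append((in_group_a, proxy, approved))


def approval_rate(rows):
    """Share of rows with a positive historical decision."""
    return sum(1 for _, _, approved in rows if approved) / len(rows)


# "Training" never sees the protected attribute, only the proxy.
learned = {value: approval_rate([r for r in records if r[1] == value])
           for value in (True, False)}


def predict(proxy):
    """Approve when the learned approval rate for this proxy value is at least 50%."""
    return learned[proxy] >= 0.5


def positive_rate(group_value):
    """Share of a protected group that the proxy-only model approves."""
    group = [r for r in records if r[0] == group_value]
    return sum(1 for _, proxy, _ in group if predict(proxy)) / len(group)


gap = positive_rate(True) - positive_rate(False)
print("approval rate, group A:", round(positive_rate(True), 2))
print("approval rate, group B:", round(positive_rate(False), 2))
print("gap:", round(gap, 2))
```

Although the protected attribute is never an input, the model reproduces and here even sharpens the historical disparity, because the proxy carries nearly the same information.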
Opacity, Transparency, and Explainability
AI opacity stems from three sources, each highly relevant to government contexts (Burrell, 2016). First, opacity is created from the secrecy with which models are guarded by their owners, most frequently private sector vendors (Burrell, 2016). Second, opacity derives from the lack of in-house expertise of government employees, limiting government’s ability to develop AI systems of its own or to evaluate the quality of vendor products (Desouza, 2018). Third, opacity is inherent in the functioning of algorithms, partly because the interactions among large numbers of variables make it difficult for nonexperts to comprehend them (Burrell, 2016).
Algorithm secrecy masks numerous points of judgment embedded in code, including decision points such as: What data to include or exclude in the analytical process? How to resolve trade-offs between competing harms or benefits? and How to balance the likelihood of false negatives versus false positives (Brauneis & Goodman, 2018)? Further, machine learning algorithms often overfit their training data, usually with added predictors, producing models with poor performance on new data (Yeom et al., 2018). The trade-off between model fit and performance requires expert judgment for a satisfactory resolution (Anastasopoulos & Whitford, 2019). These decisions have policy consequences that may well be left to the control of system designers working for private vendors, leaving policy makers potentially unaware of the consequentiality of such choice points.
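The balance between false positives and false negatives is typically set by a single cutoff on a model's score. The sketch below uses invented scores and outcomes (an assumption for illustration, not data from any cited system) to show that moving the cutoff trades one kind of error for the other, which is why the choice is a policy judgment and not a purely technical one.

```python
def error_counts(scores, labels, threshold):
    """Count false positives and false negatives at a given cutoff."""
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    return fp, fn


# Hypothetical risk scores and true outcomes (1 = the event occurred).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

for threshold in (0.2, 0.5, 0.8):
    fp, fn = error_counts(scores, labels, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On these six cases, a low cutoff of 0.2 yields two false positives and no false negatives, while a high cutoff of 0.8 yields the reverse. No cutoff eliminates both.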
In addition, as AI applications learn and program in the process of computation, their logics are “not intelligible to humans” (Burrell, 2016). Deep learning approaches and other machine learning systems produce predictions based on models that involve many computational layers and nodes preventing even expert analysts from examining decision rules (Mittelstadt et al., 2016). Since AI algorithms are data-driven, they are also case-specific and evolving and thus do not necessarily produce consistent decisions. Decisions about the same person or comparable people may differ at varying points in time. “This plasticity creates challenges for understanding and interrogating a model’s behavior, as input-output behavior can vary from case to case and can vary over time” (Mulligan et al., 2020, p. 140). Opacity of these kinds makes it difficult to explain, in terms that will be understandable to humans, the basis for particular decisions.
Thus, the essential difference between traditional models and those produced in AI applications is that “we are asking a machine to make inferences and conclusions for us” (Holt et al., 2019). And yet, numerous decisions are made in the construction of the model that warrant intervention by humans with domain expertise and policy-relevant judgment. It is not possible to determine or understand how decisions are produced, creating inexplicable outcomes with potential losses of trust in both the system and government (Mehr, 2017; Van Engers & de Vries, 2019). Issues of accountability loom large when governments do not understand and cannot make the bases for their decisions sensible to those who are their targets (Brauneis & Goodman, 2018; Chui et al., 2018). Under these conditions, “the public cannot assess the efficacy and fairness of the governmental process, and the government agent has lost competence to do the public’s work in any kind of critical fashion” (Brauneis & Goodman, 2018, p. 109).
Improving Trustworthiness for AI-Powered Decision Making
To address algorithm-related privacy problems, the developing field of machine ethics (Allen et al., 2006) has undertaken initial steps using the concept of differential privacy (Han et al., 2017; Kearns & Roth, 2019). The approach involves the use of algorithms that produce randomized data following known probability distributions. Given that the algorithm developer knows the way in which data are scrambled, it is possible to extract main patterns and models while protecting individuals’ privacy. Although initial results are promising, more research is needed to better understand the strengths, weaknesses, and necessary improvements to the approach and also to develop alternative approaches (Kearns & Roth, 2019).
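Randomized response, a classic survey technique, offers one simple instance of this idea. The sketch below is an assumption chosen for illustration and is not necessarily the mechanism used in the work cited above; the function names, the 75% truth probability, and the simulated population are all hypothetical. Each individual's answer is scrambled by a known random process, so no single report can be taken at face value, yet the analyst can invert the known noise to recover the population-level rate.

```python
import random


def randomize(truth, rng, p_truth=0.75):
    """Report the true bit with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_rate(reports, p_truth=0.75):
    """Invert the known noise: E[report] = p_truth * rate + (1 - p_truth) / 2."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth


rng = random.Random(42)
true_bits = [i < 3000 for i in range(10000)]  # hypothetical population, 30% "yes"
reports = [randomize(bit, rng) for bit in true_bits]
estimate = estimate_rate(reports)
```

With these settings a "yes" report is seven times more likely from a true "yes" than from a true "no" (0.875 versus 0.125), which bounds what any one report reveals. The estimate nonetheless lands close to the true 30% because the noise averages out across many respondents.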
In the area of fair models (algorithms), some initial steps are also being taken. Although common approaches to fairness conceptually involve the idea of being “blind” to individual characteristics (gender, race, and religion) to avoid discrimination on the basis of such characteristics, current efforts in producing fair models show that it is necessary to adopt an open and transparent definition of the populations that are to be protected as well as definitions of accuracy of the model (Kearns & Roth, 2019). Some experts even argue that once a precise mathematical definition of fairness is developed, algorithms can supervise and avoid bias in an automatic way (Etzioni & Etzioni, 2016). However, others contest this position suggesting instead that AI should be used only to augment human decision making without replacing humans in the decision-making process (Shein, 2018).
In the meantime, algorithmic opacity will continue to jeopardize government’s traditional values of transparency, explainability, and accountability. These conditions argue for conservative use of AI support in decisions with consequences for individual citizen welfare (Mehr, 2017). Substantial human oversight and domain expertise would seem to be minimal requirements for AI system development. The goal should be an accumulated track record of successful AI development in low-risk applications that provide test beds for experimentation, along with other strategies discussed below. As Burrell (2016) suggests, experimenting with simplified forms of machine learning that enable “feature extraction,” which identifies features critical to classification, may provide an initial strategy for experimentation.
Although AI decision making has been celebrated for its ostensible ability to operate without human intervention (Henman, 2019), we view this as an untrustworthy strategy for government decision making. A more useful approach may be to view AI-driven systems as composed of computational components, human actors, and institutional arrangements that guide the use of AI (Johnson & Verdicchio, 2017) through the creation of AI governance structures. Such an approach would highlight questions relevant to trustworthiness such as: Who decides which AI systems are designed? Who makes decisions about their design and implementation? Which tasks are delegated to machines and which to humans? How are humans who work with AI systems trained? and How can algorithmic decisions be appealed or challenged within legal and regulatory processes? AI governance structures must be populated by the widest range of stakeholders including policy makers, government domain experts, and AI systems developers inside and external to government, along with individuals who will be affected by AI decision making (Dhasarathy et al., 2019; Gasser & Almeida, 2017).
We call attention to three further strategies likely to improve trustworthy AI development: algorithmic accountability, algorithmic audits, and participatory AI development and testing. Algorithms will only be sufficiently transparent if government creates and maintains records that document its objectives for algorithms and vendors disclose sufficient information describing how algorithms are developed, making it possible to trace the decisions taken in the construction of AI algorithms (Brauneis & Goodman, 2018). Some government legislatures have considered mandating practices designed to improve transparency and accountability for government-secured vendor-created AI systems (Pangburn et al., 2019). Whether or not such mandates are enacted, it behooves public managers to insist on creating the ability to document the data used and the decisions essential to the construction of accountable AI models.
Algorithmic audits using a variety of research and statistical methods can be used to assess the extent and type of harmful bias that may inhere in the algorithms used by publicly available online service providers such as Google and Netflix (Sandvig et al., 2014); public managers should insist on analogous assessments related to government AI products. A more pointed approach consists of algorithmic impact assessments, such as those recommended by AI Now (Reisman et al., 2018). These combine self-assessments of system impacts on values such as fairness and justice, conducted prior to deployment of an AI application as well as during its use; advance notice to the public about the deployment of an automated decision process and its associated impacts, inviting public comments and questions; and due process mechanisms for challenging the outcomes of decision making that are unfair or biased and that agencies have failed to address.
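One statistical check that such an audit might include is a comparison of error rates across groups. The sketch below is a minimal illustration under stated assumptions: the decision log, the group names, and the functions `false_positive_rates` and `rate_gap` are hypothetical, and the method is not drawn from Sandvig et al. (2014) or Reisman et al. (2018). It asks how often people who did not warrant a flag were flagged anyway, separately for each group.

```python
def false_positive_rates(decisions):
    """Per-group share of true negatives that the system wrongly flagged."""
    flagged, negatives = {}, {}
    for group, predicted, actual in decisions:
        if actual == 0:
            negatives[group] = negatives.get(group, 0) + 1
            flagged[group] = flagged.get(group, 0) + predicted
    return {g: flagged[g] / negatives[g] for g in negatives}


def rate_gap(rates):
    """Difference between the highest and lowest group rate (0 means parity)."""
    return max(rates.values()) - min(rates.values())


# Hypothetical audit log: (group, system decision, true outcome).
audit_log = (
    [("A", 1, 0)] * 4 + [("A", 0, 0)] * 6 + [("A", 1, 1)] * 5
    + [("B", 1, 0)] * 1 + [("B", 0, 0)] * 9 + [("B", 1, 1)] * 5
)

fpr = false_positive_rates(audit_log)
gap = rate_gap(fpr)
```

In this invented log, group A is wrongly flagged four times in ten and group B once in ten. A gap of that size would prompt closer review, though what counts as an acceptable gap is a policy question the code cannot settle.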
It is essential that a wide variety of AI stakeholders be involved in the processes of participatory AI development and testing. Such involvement should take place in contexts in which AI system designs, functions, and operations are “contestable” (Mulligan et al., 2020). Possible scenarios include interaction between software developers and domain experts at the point of development or interactions between experts and users with systems already developed, empowering domain experts to use their knowledge about training data and decision rules to shape how system decision making takes place and to play a governance and oversight role. It should also be possible to challenge particular outcomes through the use of feedback from users and others affected by system outcomes (Dhasarathy et al., 2019).
Concluding Remarks
As our comparison with the ELM demonstrates, the AI decision making of today will proceed quite differently from the human judgment processes we have historically sought to refine. But it is worth recognizing that AI deployment in government is still in its infancy; these new modes of decision making, and the problems presented by AI that this article has warned against, are only now beginning to make their appearance. We hear most often about egregious failures of AI that have harmed citizens while pursuing improvements in efficiency and effectiveness; however, we acknowledge that this is only a partial picture. One substantial limitation of this article is that we do not know how extensively AI has been deployed in government and to what extent the unfortunate examples that appear in the press and in occasional research reports are common in contrast with how often AI is used in what might be viewed as successful efforts. Extensive research documenting the development, use, and outcomes of AI implementations is not available. Herein, we have relied on the literature in which AI is critiqued and on what has been written by the software developers, AI experts, and scholars that have called the attention of the public and the research community to these issues.
Clearly, we need to know more about the ways in which AI and its associated data are being used by government at all levels. What data resources are used in current AI projects and how are they assembled and deemed fit for use? What specific analytical strategies are used in AI applications? In what policy and service-related contexts is AI deployed? Who has oversight for these projects? What governance structures are now being used? How are current projects being audited and evaluated? The answers to these kinds of questions will provide insight into government practice that can inform our efforts to assess trustworthiness.
In the meantime, it seems likely that AI uncertainties will continue. Of utmost importance is the need to ensure that democratic values will frame the deployment of AI in government, since there are powerful economic arguments and political forces driving its development. Depending on industry to translate codes of ethics into AI products for government is an unacceptable solution. Even though the technical landscape of AI may evolve in ways that resolve or mitigate current issues, it may also generate entirely new causes for concern. Thus, public managers must take active roles in creating the conditions for trustworthy AI using values that have historically sustained trust in government. In this endeavor, there is a role for new and supportive research agendas.
A cornerstone of computer and information science education for many years, data management is admittedly expensive and now, due to the new demands of AI, requires an enterprise approach. This may be a daunting challenge in a status quo characterized by resource limitations (Harrison et al., 2019). What barriers limit the ability to systematically manage data within and across complex government organizations? What technical advances and social arrangements might contribute to innovation in data management?
Enterprise data management coupled with new demands for data literacy on the part of public managers and other government employees can help to ensure that data are appropriately fit for the uses to which they will be put (Harrison et al., 2019). But we need to have a thorough understanding of the components of data literacy training that are most relevant to AI challenges.
Beyond issues related to data, it is clear that more general governance structures for AI development must be focused on identifying potential norms, structures, mechanisms, and processes for decision making about AI in government organizations (Dafoe, 2018; Gasser & Almeida, 2017). The appropriate goal for creating trustworthy AI should be to minimize uncertainty and risk. What governance strategies are best equipped to manage the risks of AI development and designate “trustworthy” AI advances? Who should be entitled to make what kinds of decisions and by what mechanisms? Who needs to participate, and who needs to be informed? How is information about AI systems best transmitted to the public? These are questions to which there are no definitive answers at this time. Indeed, it behooves digital government researchers to undertake programs of inquiry that explore such issues. The efficacy of algorithmic accountability, audits, and impact statements must be explored. Frameworks for participative AI development must be developed and tested.
Finally, we know that AI systems will change workplaces, education, and government. Thus, it is important to inquire about AI’s long-range effects on government (Crawford & Calo, 2016). Poorly planned AI projects producing inaccurate results will erode trust in government. But very little is known about the long-term consequences of automation as it currently exists and the effects of AI in the future on the relationships between citizens and government. Rare studies of automated welfare decision making find that it creates new roles and relationships between citizens and public authorities (e.g., Wihlborg et al., 2016). We know virtually nothing about the human consequences of interacting with AI-driven decision support systems and service providers in government, but clearly, research must address this topic since these consequences also bear on questions related to how government can earn and sustain the trust of its citizens.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
