Abstract
Our third article on the history of evaluation use affirms its importance in evaluation practice and related literature. It first highlights the centrality of use in the field’s professionalizing documents, extant theories, and the persistence of continuing research. Next, it discusses the challenge of evaluation theories in general, including the prevalence of prescriptive theories, and provides criteria for a good descriptive use theory that even the most detailed use theory does not currently meet. The third section reviews existing use “theories,” nine from Alkin’s evaluation theory tree, two from the literature, and two existing influence theories. This article concludes with a discussion of research on evaluation use past and present, including the effects of competing definitions, along with thoughts for future inquiry.
Preface
This article, the third in a series documenting the historical record of the concept of evaluation use, details the centrality of use in current practice and examines theories of and research on evaluation use and influence. Our purpose in compiling the history of this evolution is to help evaluators better understand the research grounding of evaluation use, making clear the potential role of specific practices, the related effects of organizational context, and how best to study them.
To review, the first article in the series traced the evolution of this increasingly pervasive concept, documenting two developmental streams (educational testing and measurement and the social sciences), detailing competing perspectives about the prevalence of use, and presenting its traditional categories in the literature. The second article began by exploring definitions of use and its opposite, misuse, then described an expanded concept labeled evaluation influence and created a “pattern theory” by summarizing research-grounded factors related to increased use.
As noted in Article 2 (Alkin & King, 2017, p. 447), the distinctive characteristic of a pattern theory “is not a set of laws that promise prediction, but rather a series of ‘tendency statements’” (Lincoln & Guba, 2004, p. 227) that have meaning when applied together. The pattern theory of evaluation use suggests that evaluation use results when certain types of users (i.e., those committed to use) interact with certain types of evaluators (i.e., those committed to fostering use) to do evaluation activities in a certain way (e.g., using appropriate and credible methods) in certain contexts (i.e., environments where potential users can take action based on the evaluation process/results). Such a “tendency statement” makes no guarantees, but the commonality of the pattern, supported by research and hinting at a realist framing of “what works for whom in what settings,” highlights the general tenets of what the field has learned over roughly five decades of research on evaluation use.
The Centrality of Use in Evaluation Practice
Time has been kind to the concept of evaluation use. Solid evidence now exists detailing its prominent role and widespread acceptance in the field. Whereas historically the field began with a laser focus on methodology, the idea of use is now integral to the way that most practicing evaluators think about evaluation. Evidence of the centrality of use comes from three sources: (1) its prominence in the field’s professionalizing documents, (2) its distinct role in numerous evaluation theories, and (3) researchers’ continuing focus on this topic.
First, evaluation use has clearly established a visible presence in the evaluation field’s professionalizing documents. The Program Evaluation Standards (Yarbrough, Shulha, Hopson, & Caruthers, 2010), the newly approved draft of the American Evaluation Association (AEA, 2018a) Evaluator Competencies, and the proposed revision of the AEA’s (2018b) Guiding Principles are the oldest and the newest such documents in the field, and, in each, evaluation use/utility plays a role. The Program Evaluation Standards (Yarbrough et al., 2010), originally published in 1981 and revised in 1994 and 2011, place utility first of the initial four domains. Utility refers to the extent to which use is possible. Utility “…describes when and how evaluation worth is created, for example, when evaluations contribute to stakeholders’ learning, inform decisions, improve understanding, lead to improvements, or provide information for accountability judgments” (Yarbrough et al., 2010, p. xxviii).
The eight utility standards reflect two categories from the pattern theory of research-based factors related to increased evaluation use discussed above: (a) evaluator factors, including evaluator credibility, attention to stakeholders (engagement), and concern for consequences and influence and (b) evaluation factors, including meaningful processes (that contain negotiated purposes and explicit values) and products (relevant information) and timely and appropriate communicating and reporting. When the joint committee working on the third edition debated a major reordering of domains that would have put accuracy as the lead domain, Stufflebeam (as cited in Patton, 2012) wrote in an impassioned letter to the joint committee members: The new sequencing of the categories of standards is illogical and a counterproductive break with both the JC Standards’ historic rationale and the rationale’s position in mainstream evaluation thinking and literature…The re-sequencing of categories of standards ignores the historic case for the original sequencing of categories of standards, as Utility, Feasibility, Propriety, and Accuracy. Originally, that sequencing was recommended by Lee Cronbach and his Stanford U. colleagues…Given the scarcity of resources for evaluation, studies should be conducted only if they will be used. (p. 389)
Second, more recently and following 3 years of extensive deliberation, AEA’s Competencies Task Force placed a competency related to evaluation use in each domain of the recently board-approved competencies: professional practice, methods, context, planning and management, and interpersonal. Table 1 documents that all five domains of the AEA’s competencies highlight the role of evaluation use in competent evaluation practice, signifying its centrality to good practice. The pattern theory’s evaluator factor concerning “dedication and commitment to facilitating and stimulating use” aligns strongly with all five of these competency domains.
Table 1. The Use Competencies in the Draft American Evaluation Association Evaluator Competencies.
In addition, other nonuse competencies align with the research-based factors that comprise the pattern theory of evaluation use. The evaluator factor related to developing rapport and a good working relationship with users, for example, aligns with Competencies 5.3 (“Uses appropriate social skills to build trust and enhance interaction for evaluation practice”) and 5.7 (“Facilitates constructive and culturally responsive interaction throughout the evaluation”). A second evaluator factor, addressing the way that evaluators involve or engage potential users along with the user factor concerning their meaningful involvement in evaluation processes, aligns with Competency 5.2 (“Values and fosters constructive interpersonal relations foundational for professional practice and evaluation use”). Similarly, the appropriateness of the methods employed and their credibility with potential users aligns with Competency 2.7 (“Designs sound, credible, and feasible studies that address evaluation purposes and questions”), communication quality aligns with two competencies (4.10, “Communicates evaluation processes and results in appropriate, timely, and effective ways,” and 5.6, “Communicates in meaningful ways throughout the evaluation…”), and concerns about larger organization issues align with three competencies (3.2, “Respects and responds to the uniqueness of the context,” 3.3, “Addresses systems and complexity within the context,” and 3.9, “Considers both specific and broader contexts of the program”). These direct connections make obvious the centrality of evaluation use considerations in competent evaluation practice detailed in the AEA Evaluator Competencies.
Third, while earlier versions of the Guiding Principles of the AEA (2018b) made minimal reference to an evaluator’s role in fostering evaluation use, one section of the recent revision implicitly attends to issues of use. The heading of Section E of the proposed revision, “Common Good and Equity,” reads: “Evaluators strive to contribute to the common good and advancement of an equitable and just society.” Two principles in particular highlight activities related to potential use or, more likely, to avoiding misuse: “E3. Identify and make efforts to address the evaluation’s potential risks of exacerbating historic disadvantage or inequity. E5. Mitigate the bias and potential power imbalances that can occur as a result of the evaluation’s context.” (AEA, 2018b)
Because potential users are often people with power to make decisions that affect others, these revised guiding principles suggest that evaluators need to attend to the use or misuse of evaluation that could lead to negative consequences in terms of social justice.
Taken together, these professional documents provide the first type of evidence of the centrality of evaluation use. The second is provided by the distinct role that use plays in extant evaluation theories. A more detailed discussion of these theories follows below, but unmistakably, the concept of use is an integral part of each of the theories placed on the use branch of the evaluation theory tree (Alkin, 2013); this is, after all, why they were placed on that branch. All of these theories share a common commitment to the evaluator’s role in facilitating and stimulating use and actively engaging potential users in the evaluation process. Indeed, Patton (1978) coined the term personal factor, highlighting the critical role of intended users in fostering evaluation use; Alkin has written of the evaluator personal factor, which includes a strong commitment to the evaluator’s role in attaining use (Alkin, Daillak, & White, 1979; Alkin & Vo, 2018), and King and Stevahn (2013) coined the term interpersonal factor, emphasizing the need for evaluators to nurture meaningful interactions with evaluation clients and participants en route to subsequent use. Alkin (2013) notes that Chelimsky and Wholey, given their practice in the federal policy-making arena, stress the importance of political sensitivity and credibility and, along with Alkin and Preskill, carefully attend to the characteristics of programs and the nature of the organizations in which evaluations take place.
While a commitment to fostering use is necessarily true for theorists placed on the use branch, concern for the use of evaluation is also integral to theories placed on the remaining two branches. Consider one example from each nonuse branch to make the case: Huey Chen from the methods branch and Donna Mertens from the values branch. Chen’s theory, placed on the methods branch owing to its theory-driven orientation, also reflects a sincere commitment to use. The second edition of his textbook, Practical Program Evaluation (Chen, 2015), explicitly describes how to work actively with stakeholders on a program’s scope and action plan and how to implement a contextually appropriate evaluation explicitly framed to foster the use of the results. His content throughout suggests the importance of interpersonal processes to engage evaluation participants (e.g., participatory evaluation, facilitation, and consensus building). There is a clear commitment to what he calls stakeholder credibility, that is, making the evaluation believable and useful, in addition to scientific credibility, again reflecting a commitment to fostering use. Located on Alkin’s values branch, the transformative theory of Mertens (2013), whose “theoretical strands include…feminist theories, critical theory, critical race theory, disability rights theory, indigenous rights theory, queer theory, and deafness rights theory” (p. 229), engages the evaluation process to directly target social injustice and inequities. Evaluation use is a necessary component of her approach; without use of both the evaluation process and its outcomes, society will remain unchanged and injustice will continue. Mertens’ approach requires that evaluators actively engage members of diverse communities throughout the evaluation process both to understand multiple perspectives and to ensure the appropriateness of methods and their perceived credibility with potential users.
The third source of evidence of the centrality of use comes from the fact that scholars routinely continue to study diverse aspects of evaluation use, including different geographic contexts, programmatic subject areas, and sectors. Further, as will be discussed in more detail below, it is evident that the concept of evaluation use continues to evolve as scholars attend to how evaluators can increase their clients’ and stakeholders’ focus on what to do both as the evaluation takes place and after it is completed.
Theories of Evaluation Use and Influence Over Time
If there is now little question of the importance of evaluation use in the field, a question does arise as to what extent this centrality is reflected in evaluation theory. The following section begins with a comment on evaluation theorizing in general, presents a set of criteria for meaningful theories of evaluation use, and applies these to utilization-focused evaluation (UFE), one of the best described prescriptive theories. It then describes numerous theories and models of evaluation use and influence.
A Comment on the General State of Evaluation Theorizing
While the state of evaluation theorizing in general is not the focus of this article, to discuss theories of evaluation use and influence first requires taking a step back to describe the current state of evaluation theorizing as we see it. The point can be made quickly: Whether called a field of practice, a discipline, a transdiscipline, or a profession, program evaluation has yet to develop a singular, overarching, unifying theory or competing theories as is common in traditional social science fields. As Stufflebeam and Coryn (2014) put it, evaluation scholars have not by and large to date engaged in the systematic pursuit of validated social science theory: Although evaluation theorists have advanced creative and influential models and approaches for conducting program evaluations, these constructions have not been accompanied by a substantial amount of related empirical research. Consequently, no vast body of evidence exists on the functioning of different evaluation approaches…[T]he program evaluation field lacks a sufficient body of research and steadily improving theories flowing from an ongoing process of rigorous, empirically grounded theory development. (p. 46)
Evaluation scholars (e.g., Alkin, 2013; Stufflebeam & Coryn, 2014) continue to make a clear distinction between evaluation theory and evaluation models or approaches; however, given the lack of validated theories, the distinction blurs in practice. As Alkin (2013) writes, “…[I]n the strictest sense, what we will refer to as ‘evaluation theories’ do not fully qualify for that status. Nevertheless, we intentionally refer to them in that way to reflect their most common current usage…” (p. 4). This usage expands the definition of theory by dividing it into “two general types of models”: (a) a prescriptive model, the most common type, is a set of rules, prescriptions, prohibitions, and guiding frameworks that specify what a good or proper evaluation is and how evaluation should be done…and (b) a descriptive model is a set of statements and generalizations that describes, predicts, or explains evaluation activities—such a model is designed to offer an empirical theory. (Alkin, 2013, pp. 4–5)
Criteria for Meaningful Theories of Evaluation Use—And How They Are Not Applicable
A good evaluator understands the value of explicit criteria in a meaningful evaluation, and, thankfully, authors have proposed two sets of such criteria, one that is applicable for analyzing both prescriptive and descriptive theories, identifying the “critical aspects of the value of theory to practice” (Miller, 2010), and one for a good descriptive evaluation theory overall (Shadish, Cook, & Leviton, 1991). Miller (2010, pp. 391–396) presents five criteria that speak directly to the value of evaluation theory to practice: operational specificity, range of application, feasibility in practice, discernible impact, and reproducibility. Shadish, Cook, and Leviton (1991) propose broader criteria for a “good theory for social program evaluation,” including one section specific to evaluation use (see Table 2).
Table 2. A Summary of the Use Component.
Source. Shadish et al. (1991, Table 2.4, p. 53).
Table 3 matches Miller’s criteria with the elements of Shadish, Cook, and Leviton’s good evaluation use theory. The match is not entirely one-to-one. For example, Shadish, Cook, and Leviton focus on negative aspects of discernible impact and do not include reproducibility in their list at all, although it is implied by their focus on descriptive theory. Nevertheless, putting the two sets of criteria together and applying them to evaluation use yields a framework for examining examples of use theory over time:
Operational specificity: Explicit details are given about how to foster evaluation use for studies in specific settings.
Range of application: Explicit description is provided of where the theory is likely to increase use and where it is not likely to succeed.
Feasibility in practice: Practitioners can easily and routinely conduct the activities.
Discernible impact: The prescribed activities do, in fact, lead to increased use.
Reproducibility: Different practitioners can reproduce the same outcomes (i.e., use) at different times and places.
Table 3. Criteria for Reviewing Evaluation Use Theories Based on General Criteria From Miller (2010) and Use Theory Criteria From Shadish et al. (1991).
Unfortunately, owing to the lack of formal descriptive theories of evaluation use, to apply these criteria is essentially to ensure that the extant “theories” will fail to meet them. Consider UFE, a prescriptive theory that is the subject of five lengthy books: four editions of Utilization-focused Evaluation (Patton, 1978, 1986, 1997, 2008) and a 440-page primer, Essentials of Utilization-focused Evaluation (Patton, 2012). Surely, given the amount of content available on UFE, if any theory of use could meet the criteria proving its value to practice, it might be this one. An analysis of the 17 UFE steps listed in Essentials of Utilization-focused Evaluation suggests that UFE does fairly well on four of the criteria:
Operational specificity: Patton (2012) identifies the 17 steps of UFE, providing explicit details, step-by-step, about how to plan and implement evaluations that are likely to foster use in specific settings. He makes it clear that, given the context-oriented nature of any evaluation, there can be no guarantees.
Feasibility in practice: Results of a survey of U.S. members of the AEA (Fleischer & Christie, 2009) document the extent to which practitioners reportedly attend to issues of use, suggesting that evaluators are able to routinely engage in utilization-focused activities (e.g., planning for use at the beginning of a study, identifying and prioritizing intended uses and intended users, involving stakeholders in the evaluation process).
Reproducibility: Novices learning about UFE often lament the fact that they lack the intellectual and interpersonal skills of Michael Quinn Patton. “I’ll never be as good as he is,” they moan. “His skill set can’t be reproduced!” But what exactly would it mean for different UFE practitioners to produce the same outcomes (i.e., use) at different times and places? It seems possible to us that different UFE proponents could effectively engage in the practice at different times and in different contexts and successfully foster use. The challenge for this criterion may well rest in what we mean by “use” and how we measure it.
Range of application: In his UFE writing, Patton clarifies settings where use is likely and where it is not likely to succeed. The first step of UFE, “assess and build program and organizational readiness for utilization-focused evaluation” (Patton, 2012, p. 15), addresses this directly. If a program or its organization is not ready for evaluation, the implication is that use is unlikely to occur. Step 3, “identify, organize, and engage primary intended users” (PIUs; Patton, 2012, p. 61), extends this to the research-based notion of the personal factor, that is, the importance of engaging people who actually are interested in and care about the evaluation process and its results. We should note, however, that Patton’s theory does not make a distinction among types of evaluation (it applies to all), nor does it make a distinction as to whether the theory is equally valid for large-scale versus small-scale programs, calling into question its range of applicability.
If UFE mostly meets four criteria for a useful theory of use, however, it certainly falls short on the remaining criterion: discernible impact. To say with confidence that prescribed UFE activities consistently (i.e., always) lead to increased use is to ignore the contextual nature of evaluation settings where, for example, the elimination of a PIU, UFE’s Achilles’ heel (Patton, 2008, pp. 566–567), can scuttle even the most masterful UFE study. As noted previously, there simply are no guarantees. And if this most detailed of prescriptive theories cannot meet all five criteria for a meaningful theory of evaluation use, then other less detailed theories will surely fail to meet them.
Having a set of criteria for evaluating use theories, however, is surely helpful even if it reaffirms the fact that the field is a long way from having an empirically grounded descriptive theory of evaluation use. On the one hand, at least scholars know what theory is needed ultimately. On the other hand, however, absent such a descriptive theory (or theories), evaluation practitioners may continue to adopt and adapt prescriptive theories with which they are familiar without knowing exactly what to do to increase the likelihood of potential use. In the meantime, given the absence of a descriptive use theory and perhaps to stimulate its eventual creation, it makes sense to briefly trace the evolution of prescriptive use theories to identify their distinctive attributes.
Evaluation Use Theories
In the spirit of historical documentation, rather than analyzing use theories by applying stringent criteria, we will instead first describe their evolution, explicate what distinguishes them, and finally examine two formal theories of use.
Prescriptive use theories
Alkin (2013) places nine theorists on the use branch in the following order: Daniel Stufflebeam, Joseph Wholey, Eleanor Chelimsky, Marvin Alkin, Michael Quinn Patton, David Fetterman, Hallie Preskill, Jean King, and J. Bradley Cousins. As one of the first theorists to frame the evaluation process to foster use, Stufflebeam is placed on the branch itself near the base of where it branches off from the trunk, with Patton, whose writings since the late 1970s have influenced generations of evaluators, farther out on the branch. Both are featured as checklists on the Western Michigan University’s Evaluation Center website (https://wmich.edu/evaluation/checklists). The remaining seven theorists each occupy a leaf on the use branch. Table 4 provides a brief summary of these theorists’ writings over time, highlighting why they belong on the use branch. What features do they share? They all make evaluation use central to the design and implementation of evaluations, paying careful attention to context, situation dynamics, and engaging potential users in hopes of fostering various kinds of use, both of its process and results.
Table 4. Evaluation Use Theories From the Evaluation Theory Tree.
Source. Alkin (2013).
Formal models of evaluation use
In addition to the nine prescriptive theories described in Table 4, scholars have proposed two theoretical models that explicitly address evaluation use: (a) a theoretical model of evaluation utilization (Johnson, 1998) and (b) the ecological model of evaluation use (Ottoson & Martinez, 2010).
First, about 20 years ago, Johnson (1998) reviewed both what he called implicit and explicit “evaluation utilization process models” to propose a unifying theoretical model of evaluation utilization, much like an outcomes chain for the use process. “In an implicit process-model, variable ordering or process is implied but is not directly depicted by the evaluator” (Johnson, 1998, p. 95). To put these models together, Johnson had to make multiple assumptions and add his own thinking since the work is based on the authors’ writing but not confirmed. In this way, he created implicit models for seven theorists: Donald Campbell, Michael Scriven, Carol Weiss (two versions), Joseph Wholey, Robert Stake, Lee J. Cronbach, and Peter Rossi. Next, he presented nine additional process models, that is, explicit models that were “constructed by researchers and generally, appear in articles and books” (p. 97), to highlight the theoretical contributions of 10 authors, including Jennifer Greene, Marvin Alkin, and Michael Quinn Patton.
Johnson then took the 16 models he created, added the concept of organizational learning, which did not appear in any of them, and compiled a single unified theoretical model of evaluation utilization “intended to apply to any evaluation” (p. 106), surely a serious claim (Figure 1). The model has multiple components, presented as variables suitable for research: The external and internal environments/contexts of the evaluation, which give inputs to and receive outputs from the chain of logic; background variables, including organizational, individual, and evaluator characteristics; interactional variables, comprising evaluation participation, dissemination, and politics; explicit utilization variables, divided into three categories, that is, “a multidimensional conceptualization of the outcome variable” (p. 103), including cognitive use and behavioral use and adding organizational learning, a variable not present in any of the process models, but mentioned in the literature. Johnson’s model suggests that “…evaluation use occurs through an open system of interrelated background, interaction and use variables operating in an internal environment situated in an external environment…” (p. 106). He emphasizes that it is not a static model. Rather, it is a “‘model in action’…based on the assumption that the utilization process needs to be viewed as a dynamic and open, complex system…” (p. 107). Table 5 compares Johnson’s variables with the pattern theory of evaluation use.

Figure 1. A theoretical model of evaluation utilization (Figure 19 in Johnson, 1998, p. 104).
Table 5. A Comparison of the Pattern Theory of Use and Johnson’s Theoretical Model of Evaluation Utilization.
Whereas Johnson (1998) created his model by integrating 16 self-created “process models” of evaluation utilization, Ottoson and Martinez (2010) based the second model, their ecological model, on the results of a single case study of use, a limited source of data for a generic model. They interviewed a total of 23 informants—“staff, evaluators, grantees, program developers, consultants and other stakeholders” (p. 6)—from the 4-year evaluation of an R. W. Johnson Foundation-funded program entitled Active for Life: Increasing Physical Activity Levels in Adults Age 50 and Older. As they put it, Linear models just do not tell the interactive story we found. Like other ecological models, this one proposes multiple “eco-systems” or contexts of evaluation use in this case study, with multi-directional and multi-layered influences. (p. 7)

Figure 2. The ecological model of evaluation use (Ottoson & Martinez, 2010, p. 8).
Taken together, the two models present different images of the use process. They minimally fall into the descriptive theory category: One builds on researcher-created models based on other theorists’ writings; the other is based on case study data from a single study. Grounded in internal and external environments/contexts, Johnson’s linear model, with the obvious limitations of such thinking given the complexities of social programs, moves purposefully from background variables to interactional variables then to utilization variables. Eschewing a linear approach and, again, grounded in data from a single case study, Ottoson and Martinez, by contrast, nest evaluation in its layers of surroundings, moving from the center outward.
But while taking different approaches, these models share commonalities. Each points to multiple and complex possibilities for use and to the systemic and multivaried nature of the use process, including the role of politics and interpersonal dynamics. Each also makes it clear that evaluation use can occur in different ways and over time. Either potentially offers a framework for systematic research on the topic.
Theories of Evaluation Influence
At the turn of the 21st century, Kirkhart (2000) proposed a new term, evaluation influence, defined as “the capacity or power of persons or things to produce effects on others by intangible or indirect means” (p. 7, emphasis added). Debate over this shift in the early 2000s led to an upsurge of discussion of the consequences of evaluation. No historical summary of the evaluation use literature, therefore, would be complete without discussing the two published “theories” of influence: (a) the integrated theory of evaluation influence (Kirkhart, 2000) and (b) the schematic theory of evaluation influence (Henry & Mark, 2003; Mark & Henry, 2004). It is interesting to note that, in contrast to the models of evaluation use that were labeled models, these authors labeled their work theories. If, as Bacharach (1989) writes, “The goal of theory is to diminish the complexity of the empirical world on the basis of explanation and prediction” (p. 513), these are not traditional social science theories; they are not grounded in empirical research. Calling them theories instead applies the field’s tradition of equating “theory” with models or approaches (Alkin, 2013).
Integrated theory of evaluation influence
Kirkhart (2000) argued that to understand how evaluation actually changes society, researchers should extend the narrow framing of use by adding a broader-based construct. She suggested the term influence as an addition to use: The term influence…is broader than use, creating a framework with which to examine effects that are multidirectional, incremental, unintentional, and noninstrumental, alongside those that are unidirectional, episodic, intended, and instrumental (which are well represented by the term use). (Kirkhart, 2000, p. 7, emphasis in original)
Alkin and Taut (2003) expanded Kirkhart’s model by adding the element of awareness, purposefully focusing attention on what evaluators can be aware of and do something about, that is, intended and unintended use that is immediate and end-of-cycle. They called these instances evaluation use. Any impacts that are unintended and of which the evaluator is unaware, they wrote, appear “not as essential to the evaluation profession as the impacts that are of a conscious…nature, in the eyes of the users, and hopefully of the evaluator as well” (p. 9). They referred to the instances that are long-term and beyond the control and the purview of the evaluator as influence.
Schematic theory of evaluation influence
Building on Kirkhart’s argument, Henry and Mark published “Beyond Use: Understanding Evaluation’s Influence on Attitudes and Actions” in 2003 and, with the author order reversed, “The Mechanisms and Outcomes of Evaluation Influence” the following year. In the first article, they joined Kirkhart in moving beyond the concept of use, writing that “…neither the change processes through which evaluation affects attitudes, beliefs, and actions, nor the interim outcomes that lie between the evaluation and its ultimate goal—social betterment—have been sufficiently developed” (Henry & Mark, 2003, p. 293). They sought to identify these change processes and outcomes by expanding the scope of the field’s scholarly content: Fortunately, the framework need not be developed de novo. Social science provides both theories of change that are relevant and research on specific outcomes that are similar to those that can be expected to appear in chains of outcomes through which evaluation could lead to social betterment. (Henry & Mark, 2003, p. 296)
In their second article, Mark and Henry further developed their ideas by presenting a model that crosses four types of processes/outcomes (general influence, cognitive and affective, motivational, and behavioral) with three levels of analysis (individual, interpersonal, and collective) to categorize selected alternative influence mechanisms, many of which were included in the first article. Building on Cousins (2003), they presented a refined graphic of a theory of evaluation influence that integrated their ideas (see Figure 3). It is a functional logic model for program evaluation and includes inputs, activities, outputs, general mechanisms, intermediate and long-term outcomes in three areas (cognitive and affective, motivational, and behavioral), and contingencies in the environment, all culminating in social betterment. Again, with no explicit definition of influence, they argued that this “schematic theory” would enable researchers to identify and study specific influence pathways and make “concrete predictions about the general relations between different components of the logic model of evaluation” (Mark & Henry, 2004, p. 47).

Figure 3. The schematic theory of evaluation influence (Mark & Henry, 2004, p. 46).
At the time these influence theories were presented, they seemed to provoke the field to think more broadly about the potential bandwidth of evaluation use. Kirkhart’s theory sought to identify the long-term effects of program evaluation and their sources; Henry and Mark’s theory, grounded in realist thinking, sought to identify mechanisms from a variety of disciplines (e.g., behavioral and social psychology, management science, and political science) that could identify pathways to explain the ultimate impact of evaluation. Miller’s criteria for a good evaluation theory (i.e., operational specificity, range of application, feasibility in practice, discernible impact, and reproducibility) make clear how influence genuinely differs from use. By determining the causal pathways that lead to observable effects, influence theory surely focuses on impact but little on the explicit evaluation practice that might lead to that impact. As Alkin and Taut (2003) noted, because influence is an intangible and indirect process, there is no way to specify details of what evaluators should do to increase it.
Influence theory, finally, is more about studying evaluation than about changing its practice. It may help make sense of how evaluations effect change over time, but only when there is a significant body of research on influence might the implications for evaluation practice (i.e., what to do in specific settings to increase the power of evaluation) become evident. That research base on evaluation influence is not yet available, in part because the term influence means different things to different people. As Herbert (2014, pp. 388–389) writes, “Despite the prominence of evaluation influence in the literature, there is slow progress toward a persuasive body of literature.”
To review, this discussion of evaluation use/influence theory has consisted of three parts. The first reviewed 14 prescriptive theories, each of which makes evaluation use a critical attribute. The second presented two published theories of evaluation use. The third presented two extant theories of evaluation influence. What, now, is the current status of evaluation use theory? In one sense, even after 50 years, evaluation use theory is in its infancy. Speaking about evaluation theory in general, Alkin and Vo (2018) describe the situation succinctly: “There simply is not a sufficient body of knowledge about what happens in an evaluation to be able to predict with certainty what would happen when an evaluation is employed in a particular way” (p. 297). Evaluation use theory is encompassed in that assessment. But in another way, while there is surely a long way to go before scholars develop a meaningful descriptive use theory, we believe that the various “theories” presented here—based on practical experience and to some extent upon research—are helpful beginnings because they focus attention on what we know to be critical attributes related to evaluation use.
Scanning Research on Evaluation Use: Past and Present
Regardless of the state of use theorizing, the fact that researchers have studied this topic since the 1970s is an indicator of its centrality to evaluation practice. The second article in this series (Alkin & King, 2017) provided a history of this research, identifying factors shown to affect evaluation use. In this section, we will discuss a critique of this past research, then, given that this is a historical piece, briefly discuss present and future research. But first, a comment on competing understandings of the term use.
In contrast to the Inuit people who require multiple words to describe different types of snow, our field has stuck with one term—use—to cover multiple possibilities. Almost 30 years ago, in Debates on Evaluation, Alkin (1990) discussed a disagreement, long since resolved, between Weiss (1988a, 1988b) and Patton (1988), noting that a key part of their tension resulted from a difference in focus. Weiss worked in complex policy environments where evaluation results were one piece of information available to decision makers. She thought there had been only “indifferent success in making evaluation the basis for discussions” (Alkin, 1990, p. 225); decisions accreted, and, if evaluation did not lead directly to decisions, it often led instead to decision makers’ enlightenment.
By contrast, Patton worked with PIUs in settings where decision makers with a sense of intended uses might directly apply evaluation results, and he could easily provide many examples of the instrumental use of evaluations. The distinction at that time may have been between so-called academic evaluators and client-centered evaluators, but the crux of the disagreement stemmed from different understandings of what use meant and how someone would know it when they saw it. The ultimate resolution was coming to understand that “evaluation use is not a question of either enlightenment or instrumental use, but rather both/and” (King & Stevahn, 2013, p. 54). Such rival understandings of evaluation use have complicated research efforts, as has the relatively recent addition of the concept of influence. The definition for evaluation use proposed by Alkin and King (2017) sought to create consensus on a definition, one step toward clarifying a path forward with a common understanding. It remains to be seen whether such a stipulation is useful.
Past Research
As noted, Alkin and King (2017) reviewed research on evaluation use factors, including three major compilations (Cousins & Leithwood, 1986; Johnson et al., 2009; Shulha & Cousins, 1997). Although researchers have conducted multiple studies on the topic over the years, Brandon and Singh (2009) raised questions about the quality of the work: As a body of evidence for a scientific understanding of the use of evaluation findings…the results of the studies on use are currently of questionable quality…Standing alone as a body of results about evaluation use…the findings of the studies examined here do not as a whole have sufficient scientific credibility. (p. 135)
Their critique applied two criteria: (1) “the balance of the types of methods and the implications of this balance for making conclusions about use” and (2) content-related validity (Brandon & Singh, 2009, p. 125, emphasis in original). The 52 studies used four methods—“surveys, quasi-experimental simulations, case studies, and narrative reflections” (Brandon & Singh, 2009, p. 133)—and, because 69% were grounded in education, only that area (i.e., the use of education evaluations) met the first criterion. The second criterion “addresses the quality of the methods—that is, the extent to which they showed evidence of content-related validity” (Brandon & Singh, 2009, p. 133), and, taken as a whole, the 52 studies failed: “We found little discussion of content validity issues in the quantitative studies or parallel information in the qualitative studies” (Brandon & Singh, 2009, p. 133).
Nevertheless, Brandon and Singh (2009) conclude that thoughtful evaluators, aware of the limitations, might well apply the research results in ongoing efforts to improve evaluation use in their practice. “This approach implies that utility trumps accuracy…” (Brandon & Singh, 2009, p. 134).
Current Research Activity
Critique notwithstanding, people continue to study evaluation use, and a search for articles on evaluation use research will generate an ever-expanding list. Table 6 gives examples of studies that scholars have conducted since the Johnson et al. (2009) research compilation, grouped by place, organizational setting, content, and methods. The list is not meant to be exhaustive, but illustrative, and the evidence is clear. Researchers in countries around the world persist in studying questions about evaluation use, and they do so in organizational settings that range from large to small and in programs from a number of different subject areas. Collectively, authors continue to use tried-and-true methods (e.g., reflective case narratives and surveys) and have added at least two new methods in their studies.
Table 6. Examples of Evaluation Use Research Studies (2009–Present).
Because the present article is not a formal review of current literature, consider just four articles published since the Johnson et al. (2009) compilation that highlight the range of issues scholars are tackling. They represent research from Canada, Denmark, Israel, and the United States. Contandriopoulos and Brousselle (2012) apply knowledge exchange concepts (polarization and cost sharing) to contrast evaluation models (utilization-focused, realist, empowerment, and democratic evaluation) and to highlight the relationship among context, choice of model, and use of results. In his study, Højlund (2014) uses institutional theory in a policy environment, concluding that “…we need to focus more on the organizational context of evaluation and less on the evaluation and its immediate conditioning factors” (p. 38). Examining the use process in a single educational organization, by contrast, Neuman, Shahor, Shina, Sarid, and Saar (2013) develop a “local theory” about evaluation use specific to that organization and discuss how such theories might help increase use. Sturges (2015) takes a critical stance to present a case study that documents “evaluation’s complicity in helping to maintain power asymmetries” (p. 462), offering suggestions to alleviate or at least address such problems.
Where Do We Go From Here?
At the beginning of this article (the final of three), we explained that our purpose in compiling a history of evaluation use was to document and clarify its evolution and, in so doing, suggest the potential of certain practices and contextual considerations for scholars and practitioners. We begin this conclusion by summarizing four ideas that emerged from the review: (1) Evaluation use research has a 50-year history; the concept of evaluation influence has existed for roughly 20 years but has not yet generated a sizable research literature. (2) The research-based pattern theory of use proposed in our second article identified four sets of key factors to study, those related to users, evaluators, the evaluation process, and its context. (3) Scholars continue to conduct research on evaluation use around the world using different framing theories and methods with different types of people in a variety of different settings. (4) At this time, the question of what explicit theory might help evaluators improve practice and increase the appropriate use of the evaluation process and its results remains just that: a question.
Writing about good evaluation theory almost 30 years ago, Shadish et al. (1991) proposed that detailing an evaluation practice component would be most important because “…evaluators have to practice in a context where leisurely reflection about theoretical alternatives must yield to action within constraints” (p. 37). Ideally, such a practice theory would identify contingencies, including if/then statements, that is, if an evaluator is in this situation in this context, then here is the best action to achieve the outcome of evaluation use, as defined now in Alkin and King (2017). Based on the initial 50 years of research, it is likely that the development and validation of such a contingency theory, although extremely desirable, may well be the holy grail of use research, especially within the limits of available funding.
How, then, might the field advance thinking on use and influence? Let us begin with two ideas to lay groundwork for a path forward. First, in considering context both small and large, it is important to note how radically times have changed since the 1970s when scholars published the first research on evaluation utilization. It is an understatement to note that the world has changed significantly in the roughly 50 years since people began studying evaluation use, and future research needs to attend to the ways the scene has changed and the impact these changes have had on acts of use and influence themselves and also on how to study these concepts. Consider these points of context:
Evaluation is a growth industry internationally. The continuing expansion of voluntary organizations of professional evaluators across the globe documents the development and expansion of evaluation activities. There is a wider acceptance of evaluation and expectations that programs will be evaluated and that something should happen as a result. The growing numbers of decision makers with access to evaluation processes and results in organizations ranging from small nonprofits to large multinational corporations increase the possibilities of use, misuse, and influence as social change efforts become increasingly interconnected and complex.
While evaluation remains an important means of accountability, it has increasingly taken on an additional role as a learning activity. Organizational learning, mainstreaming evaluation, and evaluation capacity building (ECB) reflect the possibilities of “do it yourself” evaluation that is integrated into ongoing program functions with or without support from a professional evaluator. Program staff and administrators can learn evaluation processes and create systems for data collection, analysis, interpretation, and use over time. New roles for evaluators, for example, include teaching workshops on data collection and interpretation, facilitating data parties, and developing innovative report formats that engage potential users in making sense of evaluation results.
In these 50 years and reflected clearly in the recent draft revision of AEA’s Guiding Principles, the field has acknowledged the critical importance of diversity and inclusion and of evaluators’ need to focus on issues of power and equity. This is directly related to the broader acceptance of multiple approaches to knowledge creation, including those of indigenous peoples, with Northern Enlightenment approaches contrasting dramatically with those of the global South. The issue of “whose truth for use by whom” and the means for generating it can raise difficult questions for evaluators, especially when they do not reflect the cultural background of program participants. What exactly does it mean for evaluation “to speak truth to power”?
Perhaps the most visible change in the past 50 years, however, has been in the development of technology that significantly affects how evaluators work. Personal computers, laptops, the Internet, cell phones and their associated apps, Google and other search engines, and almost daily innovations have radically changed how people (evaluators and their clients alike) create and access information. Examples are numerous. Evaluators and potential users now have easy online access to evaluation (and research) studies in different forms. Evaluation reports are available on agency and organization websites rather than in the gray literature that formerly required potential users to write to ask for hard copies. If decision makers around the world have access to technology and the electricity to run it, they have 24/7 access to evaluation information, including reports from multiple funding sources, and they can research how to conduct evaluations, compile studies on similar projects, and even identify evaluators to hire. Evaluators and program leaders can reuse the standardized tools developed for one evaluation that are available online. There are evaluation blogs and webinars and websites from professional evaluation associations. There are networks of users on the evaluation of specific topics (e.g., science education, disaster relief, and environmental change).
Finally, of grave concern for the practice of evaluation (and an area where we may well feel helpless about what to do individually and collectively) is the recent emergence of an era of “fake news,” where experts are distrusted, the scientific method is no longer necessarily valued, and some political leaders reject and even disparage the use of meaningful data. We are reminded anew that the Latin root for the word fact is “to make” (cf. “factory”) as people create facts to support a position or fit their desired outcomes.
These contextual changes—the rapid expansion of evaluation, its newfound potential for learning, explicit attention to issues of diversity and equity, technological developments, and the emergence of fake news—occurred over the many years during which evaluation use research took place. We believe it is important to acknowledge current environmental conditions in hopes that scholars will take them more explicitly into account as they plan future studies.
Second, if it is worth reflecting on the effects of an evolving context, it is equally valuable to step back and look at the overarching framing of research on evaluation use. One framework for doing so is to apply Gowin’s Vee heuristic (Novak & Gowin, 1984). Gowin developed the Vee to “illustrate the conceptual and methodological elements that interact in the process of knowledge construction…” (Novak & Gowin, 1984, p. 3). At the point of the Vee are the events or objects of interest to the research, in this case, instances of evaluation use or influence. The two legs of the Vee document the methods and the concepts that will guide the inquiry. Ideally, the methods for creating records (data) will clearly match the concepts, so that, taken together, they succeed in answering the research’s focus question (which in Gowin’s system is written in the center of the Vee).
Applied in brief given the limitations of space, each of the three parts of the Vee furthers this discussion:
Events of interest: Given the many changes in context just described, it is highly likely that the events of interest around evaluation use/influence have changed since the 1970s, and researchers should attend to this. Below, we will discuss how Alkin’s concept of context-sensitive evaluation may be a viable way to focus future research (Alkin & Vo, 2018).
Methods: As discussed in the section on continuing research, scholars are employing new methods for the study of use/influence, and this methodological innovation should continue as it could well prove useful.
Concepts: Critics may blame the construct of evaluation use for less than definitive findings after 50 years. Even with the more recent addition of the concept of influence, the lack of definitive answers to questions of use may suggest a causal relation between a construct that is ill-defined and diffuse and the lack of conclusive research results. We would argue, however, that it is not a question of the construct being too broad. Rather, we believe that the problem may stem from the fact that people are missing the distinction between what the literature says is evaluation use and what is evaluation influence, including a lack of specificity of these concepts’ outcomes. What exactly does evaluation use or evaluation influence, a type of use, look like in different settings? Given a specific type of organizational setting, how would you know either if you observed it? This was the reason we defined evaluation use in Alkin and King (2017). If scholars can agree on a definition (ours or someone else’s) and explicit outcomes of evaluation use and create validated instruments for measuring it across various locales, studies would have common metrics for comparison across contexts.
To summarize, we believe that future research should pay close attention to the evolving contexts of evaluation use and of the need for a common definition and outcomes.
We also want to suggest a family of evaluation use theories. Borrowing from the structure of scientific classification, we propose a unifying family of theories with three context-specific species (King, 2011). Species 1 involves use in a single setting (e.g., one school district or one nonprofit). Much of the existing research, which focuses on use within individual organizations, belongs to this species (e.g., Alkin et al., 1979; King & Pechman, 1984). Species 2 involves use across multiple sites, for example, within a network of organizations or across all sites that share a common funding source (e.g., Toal, Johnson, King, & Lawrenz, 2008). Species 3 is much broader in scope, involving use across time and space; it is evaluation influence, accommodating the intangible and indirect effects of the evaluation process and its results (e.g., Rebolloso, Baltasar, & Canton, 2005; Oliver, 2008). This use family is grounded in an ecological model of contexts ranging from small to large. One key caveat is the need to consider program size in each species. Evaluation use research must acknowledge the effects that the size of a program has, regardless of context, as larger programs are unavoidably more complicated and/or complex, affecting the likelihood and potential of both use and misuse.
Building in part on the framing proposed in Henry and Mark (2003) and Mark and Henry (2004), we further suggest that the field consider a realist evaluation approach for future studies, that is, identifying what works for whom in what specific contexts (Pawson, 2008; Pawson & Tilley, 1997, 2004). Realist evaluation attends to the complexity of the multiple systems within which programs work by documenting the multiple possibilities of specific contexts (C), focusing on the mechanisms (M) at work in the setting, and measuring the outcomes (O), in this case instances of use (including what has been called influence), hence context-specific CMOs. If evaluation use scholars consistently develop CMOs in their studies across species, especially using common outcome measures, then the important features of use contexts and of the mechanisms that result in use in them may finally make the content of a unifying theory evident.
At the end of Alkin and Vo (2018), his revised introductory textbook, Alkin borrows a term from Miller (2010) and presents the “theoretical signature” of a descriptive theory he calls context-sensitive evaluation that is grounded in research on evaluation use. It indicates context (C), applicable situations, in two ways—type of evaluation (formative) and size of program (local, small scale); it identifies mechanisms (M), operational activities that the evaluator should do in order for the action to be classified as a context-sensitive evaluation; and it specifies an outcome (O), namely, use. Studying context-sensitive evaluation by compiling multiple CMOs may make it possible to develop formal research-grounded descriptive theories of evaluation use. Such research could certainly build on what we have learned from existing research. Variables to be considered include those that Kirkhart’s theory highlights (time, source, and intention) plus Alkin and Taut’s term awareness, the three levels (individual, interpersonal, and collective, i.e., public and private organizations) that Henry and Mark name, multiple subject areas (public health, social justice, education, etc.), the social context and technological advances connecting people, and so on. We believe that describing and documenting context in careful detail is imperative.
Finally, following our review of the history of evaluation use research and thoughts on how to structure its future, we want to suggest two broad topics that in our opinion represent fertile ground for additional research:
The engagement and evaluative education of potential users: Research reviews have pointed to the importance of engaging people in evaluation processes, but multiple questions remain. What exactly does involvement or engagement mean in relation to use, and how might you measure it? Who exactly needs to be involved? What is the role of those with decision-making authority or those who champion evaluation? In addition, the growing literature on ECB suggests that evaluators may require different competencies to actively involve stakeholders in evaluations over time and teach them the skills of evaluative thinking. A central question would be: To what extent is use built into organizations with a high level of evaluation capacity? What is the role of both internal and external evaluators in creating systems that foster routine use?
Evaluator competence and commitment to fostering use: Factors related to an evaluator’s explicit skills at developing and sustaining their clients’ commitment to use represent a second area ripe for study. Such research could focus on a number of specific evaluator attributes and competencies we know to be important, for example, a full personal commitment to use, high-level interpersonal skills, facilitation and instructional skills, and an ability to develop trust and meaningful relationships in culturally appropriate ways.
This final section has included a critique of past research, a quick overview of current research, and thoughts on future research directions. Brandon and Singh (2009) wrote that the results of existing research might help thoughtful evaluation researchers to frame better studies, knowing what variables to consider and focusing more on validity issues. We agree. Even though after 50 years research on evaluation use may appear to remain in its initial stages, the pattern theory suggests undeniably that it has provided valuable insights. We would argue that the lack of common understandings to build upon and the complexity of the context have hampered this research, but that now, with this empirical grounding, greater progress can occur.
Conclusion
The earliest discussions of evaluation use held a symbolic mirror up to evaluation practice, and some saw a disappointing image. If decision makers were not using evaluation results, why spend the money to conduct program evaluations? A broadened definition and multiple research studies clarified the use picture as people came to see it as a multifaceted process dependent on many factors, some within an evaluator’s control and others related to the settings where evaluations took place. The three articles in this series (Alkin & King, 2016, 2017; current article) have recorded what we now know about evaluation use. What is the current status of evaluation use? Researchers continue to study this critically important topic, and some are applying concepts from other disciplines, expanding the theoretical frames available for understanding this complex process. We are increasingly aware of the importance of context and the likely need to develop context-specific descriptive theories. Knowing this, we conclude our historical review with hope for the future.
Acknowledgment
The authors sincerely thank the anonymous reviewer for suggesting that they discuss the changing context of evaluation use.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
