Abstract
The evaluation theory tree typology reflects the following three components of evaluation practice: (a) methods, (b) use, and (c) valuing. The purpose of this study was to explore how evaluation practice is conceived as reflected in articles published in the American Journal of Evaluation (AJE) and Evaluation, a journal supported by the European Evaluation Society. A key finding from this international comparison suggests that evaluation practice, as reflected in both AJE and Evaluation, emphasizes methods over use and valuing. This article concludes by drawing on Peter Dahler-Larsen’s discussion of evaluation societies, among other sources, to examine the audit society, which might account for the trends in our findings. EvalPartners, a global community of evaluators, has declared 2015 as the international year of evaluation. These findings regarding cross-continental trends in evaluation are relevant for engaging in a global dialogue on evaluation practice.
EvalPartners has declared 2015 as the international year of evaluation (EvalYear). EvalPartners (http://www.mymande.org/evalpartners) is a global community of evaluators, consisting of over 50 organizations such as the American Evaluation Association (AEA) and the European Evaluation Society (EES), working in partnership to strengthen evaluation capacities. The aim of the EvalYear initiative is to engage international, national, and regional evaluation actors and other relevant stakeholders in a global dialogue to advance evidence-based evaluation practice and policy.
To contribute to this global dialogue about evaluation, we worked across continents (two American authors, one European author, and midway through the project an American graduate student) to explore how the articles published in the American Journal of Evaluation (AJE) and Evaluation, a journal supported by the EES, represent evaluation practice on each continent. Although evaluation practice has been explored in North American journals (Christie & Fleischer, 2010), the purpose of the current study was to explore empirically the similarities and differences between evaluation practice in North America and Europe, as evident within their respective professional communities. Our work was guided by the following research questions: Are there differences or similarities in the understandings of evaluation practice in published evaluation articles in Europe and North America? If so, what might account for such differences or similarities? And, are journal articles in the two continents balanced in their attention to three central aspects of evaluation theory—methods, use, and valuing (Christie & Alkin, 2013)?
To address these questions, in this article, we first review prior international comparison studies of evaluation practice, demonstrating the limited current research that exists on such comparisons and the convergence on methods-centric evaluation practices that has occurred across the United States and Europe. Next, we discuss our understandings of evaluation practice, which recognize contextual influences, and the theoretical framework that guided our study (i.e., the evaluation theory tree). Then, we describe our methodology and outline key findings. One key finding suggests that methods or methodological rigor is a central focus of evaluation practice in AJE and Evaluation. To interpret the privileging of methods identified in both journals, we use Peter Dahler-Larsen’s (2012) notion of evaluation societies to contextualize the current methods-centric focus in evaluation practice. To conclude, we briefly discuss the possible implications of our study for a global dialogue on evaluation.
International Comparison Studies of Evaluation
Cross-national comparisons of evaluation have been offered for more than 20 years (Gray, Jenkins, & Segsworth, 1993; Mayne, Bemelmans-Videc, Hudson, & Conner, 1992; Rieper & Toulemonde, 1997; Rist, 1990, 1995). Many of these early studies concentrated on policy evaluation practices, that is, the phases of “policy formulation, implementation and accountability” (Rist, 1995, p. xvii). Given the focus on policy evaluation, the cultural complexity of contexts, and the differences among the studies themselves, it was difficult to draw any general conclusions about evaluation practices across countries. In contrast to the prior studies, with scholarship from The International Atlas of Evaluation, we began to see efforts on systematic comparisons of evaluation practice across countries (Furubo, Rist, & Sandahl, 2002). This work examined evaluation practice in 21 countries in five continents as well as international evaluation organizations that highly influenced trends in evaluation practice. To explain observed patterns, the editors drew on concepts of evaluation culture, including its maturity and diffusion. They used nine criteria to identify whether a country’s evaluation culture was characterized as mature and how evaluation ideas and practices spread across national borders and policy systems (Furubo et al., 2002, p. 10). The criteria included the following: (1) Evaluation takes place in many policy domains, (2) Supply of domestic evaluators in different disciplines, (3) National discourse concerning evaluation, (4) Professional organizations, (5) Degree of institutionalization—government, (6) Degree of institutionalization—parliament, (7) Pluralism of institutions or evaluators performing evaluations within each policy domain, (8) Evaluation within the Supreme Audit Institution, and (9) Proportion of outcome evaluations in relation to output and process evaluations.
These criteria aim to characterize evaluation as a practice and everyday work in multiple policy domains formed by discourse, institutionalizations, and professionalization.
Despite the divergence of some countries, Furubo, Rist, and Sandahl (2002) suggested that findings of mature evaluation cultures in many European countries were in part due to the fact that these countries belonged to what is termed “the pioneers” and “the first and second wave countries” (p. 21) of evaluation. That is, in these countries where educational and social welfare reforms dominated the political agenda in the 1960s and 1970s, there was a spillover of evaluation thinking to other public sector disciplines (i.e., mental health, environment, defense, housing, and urban development), which facilitated evaluation diffusion and utilization. Also, as a possible consequence of their European Union membership, these countries were characterized as having a strong external pressure for evaluation. North American countries (i.e., the United States and Canada) were also characterized as having mature evaluation cultures, but with weak external pressures (i.e., lack of influence from international organizations such as World Bank or United Nations) and strong internal pressures (i.e., political and administrative cultures in federal and state governments) for evaluation.
To compare the 2002 findings to current evaluation practice, Jacob, Speer, and Furubo (2012) replicated the Atlas study 10 years later with 19 of the original 21 countries. In addition to the indicators used in the original study, the 2012 study included a survey of evaluation experts. Based on this investigation, trends indicated that evaluation cultures matured over the last decade, since 15 countries showed increases on their indicators of maturity. The three indicators that changed most radically were (1) the supply of evaluators from various disciplines, (2) the pluralism of institutions and evaluators, and (3) the proportion of impact-oriented evaluations (i.e., evaluations centered on outcomes). According to the authors, the increases in these indicators, in particular the increase in impact evaluation, have eroded major differences that were noted across many of the same countries a decade ago. As a result, evaluation practices across the countries studied were strikingly similar. To some extent, this is an indicator that a broader, international discourse of evaluation has evolved across the countries studied.
Jacob and colleagues (2012) made three key points related to the increase in impact evaluation. First, they suggested that the increase has roots in the historical development of performance audits. Most notably, countries that developed audit praxis in the 1970s and 1980s, such as the United States, the Netherlands, Canada, and Sweden, now have audit institutions that play a critical role in the production of evaluation. Although audit institutions did not impact all countries in the same way, audit institutions have had and continue to have a major influence on evaluation practice. Second, the focus on audits or methods-centric evaluation is advanced by the increased demand for randomized controlled trials (RCTs) or evidence-based policy and research, as promoted by organizations such as the American What Works Clearinghouse, the Nordic Cochrane Collaboration, SFI-Campbell Collaboration, and the Evidence for Policy and Practice Information and Coordinating Centre in the United Kingdom. And third, there has been an increase in the number of impact evaluations in disciplines such as criminal justice, education, and social policy, which Jacob and colleagues (2012) attributed to funding for academics to carry out impact evaluations. In general, both North America and Europe have experienced a shift to an evaluation practice emphasizing the use of particular methods (RCTs) and impact evaluations as an evidence-based evaluation approach to inform decision making.
In summary, for several decades, scholars have wrestled with cross-country comparisons due to the variability in local contexts. Most recently, Jacob and colleagues (2012) have found empirical indications that support recent scholarship on how methods-focused evaluation has a dominant position in both European and North American evaluation practices (Dahler-Larsen, 2009, 2012; Donaldson, Christie, & Mark, 2014; Peterson & Vestman, 2007; Vedung, 2010). This rise in methods-focused evaluation suggests that a broader international context, rather than a local or national context, has influenced evaluation practice. In the next section, we examine what constitutes evaluation practice and further explore the contexts within which evaluation is practiced.
Understandings of Evaluation Practice
When evaluators use the term evaluation practice, they typically refer to the everyday work of doing evaluation, such as dealing with stakeholders, developing an evaluation plan, collecting evidence, communicating findings, and so on (Shadish, Cook, & Leviton, 1991; Stern, 2006). Based on evaluation theory literature and our dialogues as a research team, we recognized that American and continental understandings of evaluation practice varied. Most notably, evaluation theorists have situated evaluation practice within broad and varied contexts. Stern (2006), an editor of Evaluation, the journal of the EES, stated that “evaluation as practice is an ‘open system,’ shaped by many particular societal, institutional and global contexts” (p. 293). This view of evaluation practice considers the contextual factors (i.e., globalization and public policy) that shape what constitutes evaluation and how it is accomplished. Similarly, Schwandt (2009) and Dahler-Larsen (2012) described the notion of an “evaluation imaginary.” The evaluation imaginary in a given epoch reflects the views and assumptions undergirding evaluation, which are themselves undergirded by broader views, norms, and values in society (Dahler-Larsen, 2012). From this perspective, context reflects “the way we think about, conduct, and value the practice(s) of evaluation amid a world of other social agents—for example, clients, stakeholders, the general public, ( … ) who desire objective, external judgments of the value of social and educational plans and accomplishments is both animated by (and sustains) the evaluation imaginary” (Schwandt, 2009, p. 22). Put simply, contextual influences, or an evaluation imaginary, shape evaluators’ understandings of their everyday work doing evaluation.
Stern (2006) further articulated various common conceptions of evaluation practice, given contextual influences. These included evaluation practice as technical practice, management practice, professional practice, the practice of judgment, engaged practice, dialogic/argumentative practice, and social practice. For the purposes of this article, what is critical to understand is that each of these conceptions of evaluation practice coexists and is rooted in contextual realities.
Stern (2006) viewed the most prominent conception of evaluation as a technical practice. This conception of evaluation is akin to Schön’s (1983) technical rationality or how “professional activity consists in instrumental problem solving made rigorous by the application of scientific theory and technique” (p. 21). In evaluation practice, technical rationality is most evident in an emphasis on strong methods and the excellence of an evaluator’s toolkit. Schwandt (2002) also noted this form of evaluation practice. According to Schwandt, a technical view of evaluation practice reflects a modernist/naturalistic understanding that pays attention to procedural rationality and the methodological production of knowledge about objects. In contrast to a modernist view, a humanistic/hermeneutic perspective emphasizes human experiences and suggests that evaluation practice is local and involves meaning making. This humanistic understanding overlaps with Stern’s (2006) characterization of evaluation practice as engaged, dialogical/argumentative, and social.
Another crucial understanding of evaluation practice is evident in the scholarship of Shadish, Cook, and Leviton (1991). Although they acknowledge societal and institutional contexts that shape evaluation practice, these scholars also suggest that evaluation practice is informed by theory. Three types of theories have been cited in the literature to describe major approaches to evaluation practice. These theories include knowledge construction (methodology, methods, and kinds of knowledge most worth constructing), values (descriptive or prescriptive approaches to addressing stakeholder values), and use (how to facilitate the use of an evaluation for social problem solving). These three theories align with the more recent evaluation theory tree work by Christie and Alkin (2013), which reflects the scholars who have contributed to theories on evaluation practice.
Overall, our review of the literature suggests that evaluation practice is in large part conceptualized in relation to the contextual factors that influence the everyday work of evaluation. Various influences on evaluation practice include evaluation theories (i.e., methods, use, and valuing) and societal, institutional, and global contexts. We utilized the evaluation theory tree as a theoretical framework for this study to investigate the extent to which these theoretical approaches to evaluation practice (methods, use, and valuing) might manifest themselves within communities of professional practice, as evident in journal articles. Although we found that we were able to characterize the influence of evaluation theory on evaluation practice, we struggled to characterize the influence of societal, institutional, and global contexts on evaluation practice, as evident in journal articles. For this reason, we will return to these influences in our discussion of the findings as we interpret the patterns of how the discourses from the various evaluation theories manifested themselves within the two professional communities. In what follows, we provide more details about the evaluation theory tree and how we used it to guide the study.
The Evaluation Theory Tree
Christie and Alkin’s (2013) Evaluation Theory Tree acknowledges and builds on previous evaluation theory literature (Shadish et al., 1991), reflecting three major areas of theorizing related to evaluation practice, namely, (a) use, (b) valuing, and (c) methods. Although this typology developed within the American context, Carden and Alkin (2012) also argued that it is relevant to the global evaluation community. The use branch draws on the early work of Daniel Stufflebeam and Joseph Wholey (Christie & Alkin, 2013) and refers to the ways in which those who contracted the evaluation work utilize evaluative data. Currently, use has broadened to include the ways in which various stakeholders use evaluative data. The second branch, valuing, is fundamentally influenced by the work of Scriven (1967) and Guba and Lincoln (1989). This branch suggests that a substantial part of the evaluator’s role is to make judgments related to the merit or worth of the object being evaluated; therefore, evaluation itself is considered a value-laden enterprise (Mabry, 2010; Scriven, 2003). Further, many evaluators seek to describe stakeholder values and then use them, in addition to other criteria, to judge the merit of a program (Shadish et al., 1991; Stake, 2004). For instance, values are included in the “descriptions of program experiences and in the judgments of its meaningfulness and consequence in the contexts being studied” (Hall, Ahn, & Greene, 2012, p. 196). The third branch, methods, relates to what Shadish et al. (1991) termed knowledge construction. Anchored in the work of Donald Campbell, this branch emphasizes applying methods in the “most rigorous manner possible given the constraints of the evaluation” (Carden & Alkin, 2012, p. 104).
For the purposes of the current study, we suggest that published articles represent evaluation as a professional practice (Stern, 2006). Since we assumed that various contexts shape evaluation practice, published articles, whether on theory or practice, represent meaningful reflections of evaluation activities and discourse constructed within, and influenced by, communities of evaluation practitioners. Our essential argument is that what appears in evaluation journals is—at least in some significant way—a reflection of an aspect of what evaluation practice actually is, given its location (i.e., Europe, North America, etc.) or evaluation imaginary. We also acknowledge that there is much evaluation practice that does not find its way into journals, which is a limitation researchers of other studies on evaluation practice have acknowledged (e.g., Christie & Fleischer, 2010).
Methods
Journal Sampling Procedures
The AJE and Evaluation journals were selected because they are sponsored by the main evaluation associations on each continent, the AEA and the EES, respectively. Both of these associations are also involved in EvalPartners. Because we wanted a recent description of evaluation practice, all articles published between 2008 and 2011 were downloaded from the journal websites. During this time period, Robin Lynn Miller (2008 and 2009) and Thomas Schwandt (2010 and 2011) served as editors of AJE and Elliot Stern served as editor of Evaluation. 1 Because editors influence which articles are published, this sample ensured a balance of editorships in both journals. Items that were not full-length, featured articles were excluded, such as editorials, book reviews, memorials, methods notes, teaching evaluation, exemplars, ethical challenges, and so on. Although excluding these items might have excluded articles that represented the distinctiveness of the professional communities, it ensured that only full-length articles were included from both journals. Thus, similarities and differences across journals could be attributed to the professional communities rather than the format of the articles. Given our understanding of evaluation practice, we included both empirical and theoretical articles in the sample. Ultimately, AJE and Evaluation had 78 and 93 full-length, featured articles, respectively, for a total sample of 171 articles from this time period.
Review Guide and Content Analysis
To review each article systematically, we developed an article review guide based on our initial assumptions and constructs noted in Christie and Alkin’s (2013) evaluation theory tree. We created a draft guide to review a small (n = 30) random sample of articles. Next, we compared and discussed our initial codes and then revised the draft of the review guide. For example, an article characteristic that we commonly had discrepancies in coding was whether the article was primarily empirical or theoretical. For this reason, early on in the process, we added an option for both. We repeated this process several times, observing emergent patterns we saw in preliminary coding efforts and revisiting literature relevant to understanding evaluation practice. After this iterative process, two expert reviewers who have experience with evaluation in international contexts reviewed and commented on the draft guide, resulting in additional revisions of the guide. In addition to our preliminary coding efforts, the review of studies helped to ensure greater consistency across the coding conducted by all four team members. The final criteria used to code terms in each journal article are included in Appendix A. The final review guide is included in Appendix B.
The final guide began with several closed-ended questions that describe characteristics of the article, such as whether the article was empirical, theoretical, or both; methods used in the article (qualitative, quantitative, and mixed); substantive area/discipline of the article (e.g., health care, social work, and education); and institutional affiliation of the authors (e.g., university, independent consultant, government organization). Next, the guide included key terms from the evaluation theory tree typology (Christie & Alkin, 2013) representing use (i.e., evaluation use(s), use of evaluation, user(s), and utilization), valuing (i.e., values, valuing, and value), and methods (i.e., method(s), methodology, and methodological). This section formed the basis of our content analysis (Stemler, 2001), in which we recorded the number of times each term was used and a description of how the terms were used in the context of the article. If a term was used repeatedly in an article, we counted each occurrence.
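The term counting described above can be approximated with a short script. This is a minimal sketch only: the article does not describe an automated procedure, and the regular expressions below are illustrative assumptions about how the listed term variants might be matched, not the study's coding rules. The names `TERM_PATTERNS` and `count_terms` are hypothetical.

```python
import re

# Term variants follow the review guide; the patterns themselves are an
# assumption about how those variants could be matched automatically.
TERM_PATTERNS = {
    "use": re.compile(
        r"\b(?:evaluation\s+uses?|use\s+of\s+evaluation|users?|utilization)\b",
        re.IGNORECASE,
    ),
    "valuing": re.compile(r"\b(?:values?|valuing)\b", re.IGNORECASE),
    "methods": re.compile(
        r"\b(?:methods?|methodology|methodological)\b", re.IGNORECASE
    ),
}


def count_terms(text):
    """Count every occurrence of each concept's terms in one article's text."""
    return {
        concept: len(pattern.findall(text))
        for concept, pattern in TERM_PATTERNS.items()
    }
```

A count of this kind captures only frequency; the description of how each term was used in context, which the review guide also required, still depends on a human reader.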
Each team member was randomly assigned one-quarter of the articles to code. The team routinely discussed questions and issues during the coding process through regularly scheduled meetings and a shared coding document. Also, each reviewer flagged articles that were complicated to code; a second reviewer then coded these articles, and discrepancies were discussed among the whole team. We also reviewed the frequency of codes among reviewers in the midst of coding to discuss any major apparent discrepancies. To assess interrater reliability at the conclusion of coding, all four raters coded an additional 10% random sample of articles. We calculated intraclass correlations (ICCs), analogous to interclass coefficients, for all of the codes and categories. We did not utilize the Kappa statistic, because the variables contained categorical and continuous data, and we had four rather than two raters (Shrout & Fleiss, 1979). ICCs ranged from .23 to .92, which are considered to be in “fair” to “almost perfect” agreement (Landis & Koch, 1977; see notes of Table 1). The codes most critical to addressing our research questions had among the highest ICCs. Specifically, the ICC for the term methods was .90, for use was .92, and for valuing was .78. Although we achieved these high levels of agreement on methods and valuing immediately, our initial agreement on the term use was lower. As a result, we discussed these differences, and two raters recoded all of their articles based on these discussions. The subsequent ICC is what we have reported.
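To make the reliability computation concrete, the sketch below computes one of the ICC forms defined by Shrout and Fleiss (1979) and maps a coefficient onto the agreement bands cited above. The article does not state which ICC form was used; ICC(2,1) (two-way random effects, single rater) is assumed here purely for illustration, and the function names are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater (assumed form).

    `ratings` is a list of rows, one row per rated article and one column
    per rater; every rater must have rated every article.
    """
    n = len(ratings)      # number of rated articles (targets)
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                 # between-targets mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def agreement_label(value):
    """Map a coefficient to the Landis and Koch (1977) agreement bands."""
    if value < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if value <= upper:
            return label
    return "almost perfect"
```

Under these bands, the reported coefficients of .90 and .92 fall in the "almost perfect" range and .78 in the "substantial" range. The bands were originally proposed for the kappa statistic; applying them to ICCs follows the article's own usage.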
Table 1. Characteristics of Articles Published in American Journal of Evaluation (AJE) and Evaluation From 2008 to 2011.
Note. (a) Fair interrater reliability; intraclass correlation (ICC) is between .21 and .40. (b) This count includes only articles found to be “Empirical” in the article description (Question 1). (c) Substantial interrater reliability; ICC is between .61 and .80. (d) Articles may have more than one author and from different affiliations. Therefore, the total numbers in each subcategory will not add up to the number of articles from each journal. (e) Moderate interrater reliability; ICC is between .41 and .60. (f) That is, independent consultant, private firm, and government organization.
During the content analysis, we considered alternative approaches for representing the analysis. Most notably, because we found that North American authors published in Evaluation and European authors published in AJE (see Table 1), we explored whether it might be most appropriate to represent the analysis based on the authors’ continent rather than the journal. Excluding 15 articles that had authors from continents other than North America or Europe, we found that 18 of 77 articles by North American author(s) were published in Evaluation and 13 of 79 articles by European author(s) were published in AJE. We replicated the analysis based on authors’ continent rather than journal and found trends similar to those in the analysis by journal. For this reason, we maintained the comparison between journals, given the original purpose of this study.
Intercontinental Comparison Findings
Based on our content analysis using the evaluation theory tree concepts (i.e., methods, use, and valuing), methods dominated evaluation practice in AJE and Evaluation. Although these patterns highlight the similarity between evaluation practices across continents, how methods were represented in AJE and Evaluation had nuanced differences, which we explored further with an in-depth analysis on a subset of articles. Most notably, AJE articles more commonly understood methods in relation to use and Evaluation articles more commonly understood methods in relation to valuing or exclusively as methods.
Article Characteristics
As indicated by Table 1, the articles reviewed in both AJE and Evaluation were primarily empirical rather than theoretical (64% and 48%, respectively). The majority of AJE authors were from North America (76%), and authors of Evaluation articles were mainly from Europe (71%). Authors in both AJE and Evaluation were primarily affiliated with a university (91% and 79%, respectively). In contrast to these similarities, education was the dominant substantive area/discipline in AJE (32%), whereas health care, international development, and governance were fairly evenly represented in Evaluation (13%, 14%, and 16%, respectively). What we found most notable was that the methodological choice reported by the authors differed between journals. For example, AJE had more articles that used quantitative data collection methods, followed by mixed methods and qualitative methods (32%, 18%, and 14%, respectively), while the authors of Evaluation used more qualitative data collection methods, followed by mixed methods and quantitative methods (26%, 16%, and 7%, respectively).
Although we cannot interpret the contrast in data collection and substantive areas/disciplines trends across the journals with certainty, we suggest some plausible explanations. First, the difference between AJE and Evaluation in substantive areas in which evaluation practice occurs is consistent with the historical development of evaluation on each continent. In the United States, evaluation practice developed primarily in response to the War on Poverty in the 1960s and the passage of the Elementary and Secondary Education Act, which explains the predominance of education articles (Heberger, Christie, & Alkin, 2010; Shadish et al., 1991). In contrast, European social programming is government funded or funded by the European Union, and almost all programming requires evaluation, which resulted in the spread of evaluation to a greater variety of substantive areas (Peterson & Vestman, 2007; Vedung, 2010). In Evaluation, the variation in data collection methods might reflect the increase of evaluators in different areas/disciplines (Jacob, Speer, & Furubo, 2012). Second, given that authors are mostly affiliated with universities, variability might exist in potential publication outlets for various disciplines (e.g., education, social welfare, public health, and criminal justice). For example, scholars doing educational evaluation might have fewer discipline-specific journals for publishing findings in the United States than in Europe.
Evaluation Theory Tree: Content Analysis
To provide an overview of the content in each journal, we report the total number of times the terms methods, use, and valuing, and their variations, were used (see Table 2). The notion of method was central in our sample of articles, with 56% of total counts in AJE (1,444 counts) and 57% (1,267 counts) in Evaluation. 2 The term use represented 32% of the counts in AJE and 25% of counts in Evaluation. Although Evaluation had a slightly higher percentage of valuing counts (18%) than AJE (13%), valuing had the lowest counts in both journals relative to methods and use, with 324 counts in AJE and 395 counts in Evaluation.
Table 2. The Number of Times the Terms (Methods, Use, and Values) Were Used in American Journal of Evaluation (AJE) and Evaluation Articles From 2008 to 2011.
Note. (a) Having “almost perfect” interrater reliability; intraclass correlation (ICC) is between .81 and 1.00. (b) Having “substantial” interrater reliability; ICC is between .61 and .80.
The concepts of use, valuing, and methods overlapped within AJE and Evaluation articles (refer to Figures 1 and 2, respectively). Each section of the figures displays the number of articles that had three or more occurrences of the respective terms. Occurrences of zero to two were considered negligible; therefore, we did not consider them to represent a particular category. For example, seven articles in AJE and five articles in Evaluation did not have more than two of any of the terms, which are represented by the numbers outside the overlapping circles. Although these diagrams provide a representation of how these concepts were interrelated, much variability existed within each area of the diagram. For example, in the overlap between methods and use in Figure 1, all of the articles had three or more occurrences of each term; but in the majority of the articles, the number of occurrences of methods was larger than that of use. Despite this oversimplification, these patterns demonstrated differences in how methods were understood in the two journals. In AJE and Evaluation, a similar proportion of articles had terms from all three concepts, 23% and 27%, respectively. In contrast, the journals had varying patterns in the remaining three areas of methods. Articles that had only terms related to methods made up 26% of AJE articles and 33% of Evaluation articles. For articles that had terms related to methods and valuing, AJE had nine articles, representing 12% of the articles, and Evaluation had 18 articles, representing 19% of the articles. For articles that had terms related to methods and use, AJE had 19 articles, representing 24% of the articles, and Evaluation had 11 articles, representing 12% of the articles. In other words, both journals had similar occurrences of the terms related to methods; but in AJE, these terms commonly coexisted with terms related to use, and in Evaluation, these terms commonly coexisted with terms related to valuing or appeared on their own.

Figure 1. The number and percentage of articles with the terms methods, use, and valuing in AJE (n = 78).

Figure 2. The number and percentage of articles with the terms methods, use, and valuing in Evaluation (n = 93).
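The rule used to place articles in the overlapping circles (three or more occurrences of a concept's terms counts as present; zero to two is negligible) can be sketched as follows. The function and variable names are hypothetical, and the input is assumed to be one dictionary of term counts per article.

```python
from collections import Counter

CONCEPTS = ("methods", "use", "valuing")
THRESHOLD = 3  # zero to two occurrences were treated as negligible


def venn_region(counts):
    """Return the concepts with at least THRESHOLD occurrences in one article.

    An empty tuple corresponds to an article outside all three circles.
    """
    return tuple(c for c in CONCEPTS if counts.get(c, 0) >= THRESHOLD)


def tally_regions(articles):
    """Count how many articles fall in each region of the diagram."""
    return Counter(venn_region(counts) for counts in articles)
```

Dividing each tally by the number of articles in the journal (78 for AJE, 93 for Evaluation) gives the percentages reported alongside the counts; for example, 19 of 78 AJE articles in the methods-and-use region is about 24%.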
In-Depth Analysis of Methods
Given that the methods count was high relative to the other terms in both journals, and the coexistence of the term with use and valuing varied across the journals, we decided to conduct an in-depth analysis to explore the notion of methods in each journal. The purpose of the in-depth analysis was to take a closer look at how the term method might take on similar and different meanings in the journals. For the in-depth analysis, we selected the articles from each journal that represented the most extreme cases, as indicated by the greatest number of occurrences of methods counts (refer to Table 3) from each section of the methods circle in Figures 1 and 2. The categories that correspond to each section of the methods circle include methods; methods and valuing; methods and use; and methods, use, and valuing. This sampling resulted in a total of eight articles, with four articles from each journal. The extreme cases provided the most explicit representations of our study phenomena to explore. Flyvbjerg (2006) also argued that researchers learn about typical cases through studying extreme cases.
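The extreme-case sampling described above amounts to picking, for each journal and each section of the methods circle, the article with the largest methods count. The sketch below illustrates that selection under stated assumptions: each article is a dictionary with hypothetical keys `id`, `journal`, `methods`, `use`, and `valuing`, and ties are broken by keeping the first article encountered, which the article does not specify.

```python
CONCEPTS = ("methods", "use", "valuing")
THRESHOLD = 3  # same presence rule as in the overlapping-circles figures


def select_extreme_cases(articles):
    """Pick the article with the most methods occurrences in each
    (journal, section of the methods circle) combination."""
    best = {}
    for article in articles:
        region = tuple(c for c in CONCEPTS if article[c] >= THRESHOLD)
        if "methods" not in region:
            continue  # only sections inside the methods circle are sampled
        key = (article["journal"], region)
        if key not in best or article["methods"] > best[key]["methods"]:
            best[key] = article
    return best
```

With two journals and four sections of the methods circle, this yields at most eight articles, matching the size of the in-depth sample.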
Table 3. In-Depth Analysis of AJE and Evaluation Articles That Had an Extreme Number of Terms in Methods.
Note. AJE = American Journal of Evaluation; NGO = nongovernmental organizations.
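The extreme-case sampling described above (one article per journal from each of the four sections of the methods circle, chosen by the highest methods count) can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields, the article identifiers, and the counts are hypothetical, and ties are resolved by keeping the first article encountered, which the source does not specify.

```python
def select_extreme_cases(articles):
    """Keep, for each (journal, category) pair, the article with the highest methods count."""
    selected = {}
    for article in articles:
        key = (article["journal"], article["category"])
        current = selected.get(key)
        # Assumption: on a tie, the first article encountered is kept.
        if current is None or article["methods"] > current["methods"]:
            selected[key] = article
    return selected


# Hypothetical records (illustrative only, not study data).
articles = [
    {"id": "A1", "journal": "AJE", "category": "methods", "methods": 18},
    {"id": "A2", "journal": "AJE", "category": "methods", "methods": 41},
    {"id": "A3", "journal": "AJE", "category": "methods and use", "methods": 25},
    {"id": "E1", "journal": "Evaluation", "category": "methods", "methods": 30},
    {"id": "E2", "journal": "Evaluation", "category": "methods and valuing", "methods": 22},
    {"id": "E3", "journal": "Evaluation", "category": "methods and valuing", "methods": 35},
]

for (journal, category), article in select_extreme_cases(articles).items():
    print(f"{journal} / {category}: {article['id']} ({article['methods']} methods terms)")
```

With four categories and two journals, a complete data set would yield eight selected articles, matching the sample size reported in the text.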
After reading each article multiple times, the authors developed narratives or memos for each article. Memos were useful for reflection on the purpose of the articles and to glean analytical understanding of how the authors utilized the terms (Maxwell, 2013). For example, a memo crafted from an AJE article (Brandon & Singh, 2009) illustrating the methods, use, and valuing category (refer to Table 3) stated: These researchers conducted “meta-research” drawing from five previous literature reviews and meta-analyses of studies regarding evaluation use. They concluded that the methods utilized in these original reviews and meta-analyses did not use sound methods to warrant claims regarding evaluation theory on use. The majority of the occurrences of methods relate to the “research methods” or “types of methods” (i.e., survey, case study, simulation, narrative reflection) used in the original studies, and the extent to which these methods produce strong methodological warrants. Thus, the article was about the “strength” and “soundness” of methods.
For further analysis, narratives from the memos, including the number of times the terms were used, were then organized in a table to make comparisons across all eight articles.
Our in-depth analysis revealed that both the AJE and Evaluation articles that contained a high representation of the terms related to methods focused on evaluating various techniques and provided information related to their effectiveness given the evaluation question. Of the two articles that contained a high representation of the terms related to both methods and valuing, the AJE article used terms related to methods to discuss a particular approach to evaluation (i.e., cross-disciplinary evaluation). This article referred to valuing when suggesting that all disciplines are based on a set of assumptions and values. The Evaluation article in this category also used the terms related to methods to discuss an approach to evaluation. This approach was targeted at nongovernmental organizations (NGOs) and incorporated realistic evaluation (i.e., an approach focused on specific contextual factors connected to program outcomes) with attention to the values that guide organizational decision making and actions. The sample AJE article that contained a substantial representation of terms related to both methods and use examined research on the utilization of program evaluation findings, focusing on the overall methodological strength of the studies. The authors suggested that the methods utilized in the studies did not meet their criteria for sound methods and thus lacked credibility. The Evaluation article used terms related to methods and use to introduce a new methodology designed to strengthen and improve linkages between planning and program evaluation, and to recommend strategies for increasing the utilization of evaluation findings.
The AJE article that contained a representation of the terms related to methods, use, and valuing used these terms to understand evaluation practice through a simulation study that asked evaluators to indicate their methodological (i.e., qualitative, mixed methods, and quantitative) and use (i.e., high, medium, and low concern for evaluation use) preferences, as well as how they might design an evaluation of a proposed school-based program. The study demonstrated that evaluators’ preferences were significantly related to their experiences (i.e., methodological training and theoretical orientation) and how they perceived the influence of evaluator and stakeholder values on the evaluation process. The Evaluation article that represented this category focused on approaches to stakeholder participation in evaluation. The terms related to methods primarily referred to tools that can be used to best carry out participatory approaches. The terms methods and use were linked in this article, as the authors suggested that the participatory methods outlined enhance the utilization of evaluation findings. Finally, the article used terms related to valuing when suggesting that particular values are inherent in evaluation practice and thus present in participatory approaches. The summary of the in-depth analysis demonstrates that the ways in which the evaluation theory tree terms were actually used across the AJE and Evaluation journals were similar within the four areas of the methods circle in Figures 1 and 2, even though the frequency of the articles in the four areas varied across journals.
Summary
Both journals leave the evaluation theory tree with a strong branch of methods, while branches of use and valuing are much weaker. More specifically, it is in the leaves growing from and covering the space between the three main branches that we find interesting differences between the contours of a North American and a European Evaluation Theory Tree. Both journals have a substantial number of articles with the combination of the terms valuing, use, and methods, but AJE has a stronger focus on use than Evaluation. In contrast, the focus on valuing is slightly stronger in Evaluation compared to AJE.
If we look across what we have presented so far on article characteristics, content analysis, and in-depth analysis, two pictures of evaluation practice as reflected in journal articles can be drawn. Evaluation practice in AJE, and thus in North America, is primarily concerned with empirical matters (in contrast to theoretical ones), with a preference for quantitative data collection methods. The substantive area and discipline of educational evaluation has a dominant position compared to the areas of health care, social work, governance, and criminal justice. This picture of disciplines involved in evaluation practice is in line with findings from Heberger, Christie, and Alkin (2010). Authors situated within this practice primarily originated from North America and had institutional affiliations with universities (compared to independent consulting, private firms, governance institutions, etc.). When these authors published in AJE, they represented an evaluation practice concerned with evaluation methods, in particular with evaluation techniques and the value of specific evaluation approaches. Furthermore, they paid attention to the use of evaluation findings and utilization of program evaluation. The in-depth analysis indicated a critical view on the credibility of program evaluation utilization research and evaluators’ preferences for methods regarding use.
Evaluation practice as reflected in Evaluation is primarily concerned with empirical matters, but to a lesser degree compared to AJE, as almost one third of Evaluation articles were found to be primarily theoretical. Also, in contrast to AJE, authors in Evaluation have a preference for qualitative data collection methods. Furthermore, evaluation practice in Evaluation is not dominated by one specific substantive area or discipline. Instead, the areas of health care, international development, and governance are equally represented, together with articles addressing multiple areas or no specific area. This leaves a much more heterogeneous picture of evaluation practice compared to the practice reflected in AJE, where educational evaluation holds a dominant position. Authors situated within the practice reflected in Evaluation primarily originated from Europe and were institutionally affiliated with universities. Authors publishing in Evaluation represented an evaluation practice concerned with evaluation methodologies, in particular with evaluation techniques and the value of specific evaluation approaches. This concern was similar to that of the evaluation practice reflected in AJE. Also, authors publishing in Evaluation paid attention to the use of evaluation findings and utilization of program evaluation, but from a normative point of view. Thus, the in-depth analysis indicated a practice concerned with methodologies aimed at improvement of evaluation utilization and with the value of increased stakeholder participation in evaluation.
Several theorists and researchers support our finding that evaluation methodology has gained dominance across continents in recent years as the evaluation field seeks to meet demands for credible evidence (Dahler-Larsen, 2009, 2012; Donaldson, Christie, & Mark, 2014; Peterson & Vestman, 2007; Vedung, 2010). Taking Peter Dahler-Larsen’s (2012) conceptualization of evaluation practice as situated within a local context of meaning making, we will use the final part of the discussion to reflect on the current dominance of methods.
Evaluation Societies
A major finding of the current intercontinental comparison study was that evaluation practice, as reflected in journals (AJE and Evaluation), was concerned with methods, followed by use, and then valuing. As described earlier, the concept of methods, according to the evaluation theory tree, relates to designing and implementing methods in the most “rigorous manner possible” (Carden & Alkin, 2012, p. 104). The following discussion provides a possible interpretation of why both continents appear to emphasize methods for constructing knowledge more than utilization of evaluation findings or values engagement within the evaluation context.
We begin our interpretation by describing Dahler-Larsen’s (2012) notion of the evaluation society because his discussion helps to support and further contextualize the connection between methodological rigor and outcomes-focused evaluation practice. Dahler-Larsen argued that the following three sociohistorical changes in society influenced evaluation practice: modernity, reflexive modernity, and the audit society. Each of these societies describes a particular context or “evaluation imaginary,” which is defined as the purpose and meaning of particular forms of evaluation in the light of the society in which they unfold (Schwandt, 2009). We now detail the societies that have shaped evaluation practice as argued by Dahler-Larsen (2012).
The first society, modernity, advocated using evaluation procedures with defined indicators to predict, order, and structure reality. However, modernity did not deliver the value-neutral process desired to draw clear conclusions. Thus, the reflexive modernity society began to unfold. In this society, evaluation practice embraced the complexity of the world and a plurality of perspectives. Although the reflexive modernity society succeeded in engaging stakeholder values, it did less well at producing localized actionable knowledge or knowledge that could be used to contribute to the larger society. Put simply, it produced a “large number of models with inconsistent perspectives, unpredictable processes, and limited utilization” (Dahler-Larsen, 2012, p. 161). Given the limitations of reflexive modernity, Dahler-Larsen argued that society returned to the procedural mode. However, in addition to the focus on quantification, a more aggressive goal surfaced: to control organizations and promote risk management procedures. Drawing on the work of Power (1997), Dahler-Larsen (2012) termed this third society the audit society. In this society, evaluations are increasingly in demand to provide quantifiable information in order to (1) set standards, (2) ensure compliance, and (3) make decisions.
Understanding the shortcomings of modernity and reflexive modernity, as explained by Dahler-Larsen (2012), illuminates how the audit society came to be the dominant society. Evidence of the audit society’s ascendancy is also found in the evaluation practice literature. For example, researchers observed the privileging of randomized controlled trials as the “gold standard” for providing the “secure” or credible knowledge needed to make decisions (Donaldson, Christie, & Mark, 2014; Schwandt, 2009). This current discourse, focusing on the best methods to generate credible evidence, reflects a major controversy about evaluation practice (Teddlie & Tashakkori, 2003). Many scholars today are concerned about the heavy emphasis placed on methods and evidence-based approaches. These scholars have even warned evaluators, mixed methods researchers, and the like about the dangers of methods-centric approaches because these approaches often eschew philosophical discussions, ethics, or social justice (Denscombe, 2008; Denzin, 2012; Lipscomb, 2008; Rist, 1995).
If it is agreed that we are indeed living in an audit society, as suggested by the scholarship, then the low percentage of articles that reflect a focus on valuing could be viewed as an indication of the intensity of the audit society’s influence on evaluation practice. Recall that valuing approaches to evaluation emphasize culture and context and use stakeholder values, in addition to other criteria, to judge the merit of a program (Shadish et al., 1991; Stake, 2004). And, as Dahler-Larsen (2012) pointed out, gathering different perspectives and utilizing a diversity of context-specific evaluation approaches make it difficult to (1) control the evaluation process, (2) ensure credible evidence for decision making, and (3) manage risks within the organization or program. Therefore, values engagement is incongruent with the audit society’s goals or evidence-based evaluation practice. As a result, the audit society has aggressively advanced the use of cookie-cutter strategies, efficiency, and nonhuman forms of evaluation, a phenomenon known as the “McDonaldization” of evaluation practice (Ritzer, 1996, p. 579, as cited in Dahler-Larsen, 2012, p. 180). Overall, Dahler-Larsen’s (2012) notions about the audit society, the discourse on, and practice of, evidence-based evaluation, as well as the findings of the Jacob and colleagues (2012) intercontinental comparison study mentioned earlier situate the context of our findings and provide plausible explanations for why methods dominate evaluation practice in AJE and the Evaluation journal.
Implications for a Global Dialogue
Because 2015 is the International Year of Evaluation, we wanted to offer potential questions for dialogue within our professional communities that are rooted in the trends of this study. Rist (1995) stated that methodology should not be given prominence when judging the field of policy evaluation. Instead, he suggested a perspective that accounts for methods, use, and valuing:
Rather, the judgment should be made on the degree to which policy evaluation has demonstrated the consequences of present and past policy initiatives, clarified present policy choices, informed decision makers as to the costs and benefits of different options, and re-framed the debates on pressing national problems, be they housing, education, or health care, to name but three. (Rist, 1995, p. xiii)
Based on the findings of this study, a lack of emphasis on use and valuing remains. Political contexts of evaluation practice perpetuate these methods-centric approaches (Dahler-Larsen & Schwandt, 2012). Similar to Dahler-Larsen and Schwandt (2012), we assume that “ … evaluators (and evaluations) do not simply identify and respond to contextual factors, but by virtue of their actions are always constructing, relating to, engaging in, and taking part in some reconstruction of the context in which they operate” (p. 84). As global, professional communities of practice, how are we responding to the contexts within which we work? Based on the findings from this study, we offer the following questions for conversation: As we advocate for national and local evaluation practices, what will the nature of those evaluation practices be? As a professional community, what understandings of evaluation practice do we want to maintain and uphold? Do we have a responsibility to uphold the evaluation roots of use and valuing in the midst of the audit society? How might we go about enacting these understandings in practice? If we do want to counter methods-centric approaches to evaluation practice, how are we able to do that in current economic, political, and social contexts?
We do not have answers to these questions per se; rather, from our experiences, we recognize the importance of dialogue among multiple perspectives regarding evaluators’ responsibility in shaping evaluation practice and maintaining the roots of evaluation practice.
Limitations
First, this study relied on journal articles to capture the professional communities across continents. Although the empirical investigation of these articles provided valuable information, solely focusing on journal articles as a reflection of professional communities is a limitation. For example, our sample demonstrated that authors are most likely institutionally situated within universities; however, the majority of evaluators are practitioners with different institutional affiliations (Peterson & Vestman, 2007). Second, the evaluation practice framework we used for the content analysis was mainly restricted to North American theorists, which is a limitation because a different framework including European theorists might have yielded slightly different findings. At the same time, some European countries have been exposed to and influenced by North American evaluation theory and practice (Furubo et al., 2002; Vedung, 2010). Third, our sample also included cross-continental publications, for example, European evaluators publishing in AJE and vice versa. From a methodological perspective, this complicated drawing comparisons across continents. These limitations demonstrate that evaluation practice is complex and difficult to pin down geographically and conceptually. However, we consider our findings valuable for examining the discourse on evaluation in two prominent and well-respected journals published on different continents. We assumed that journals are the right place to look if we want to understand how discussions are focused and shaped, taking into account that many factors influence these discussions, some of which are continent-specific and some of which are not.
Conclusion
In the course of conducting this study, we have examined understandings of evaluation practice, contexts that shape practices, and theoretical constructs such as methods, use, and valuing. The conversations we had among our research team about these terms in the process of this study were enlightening. A major goal of working as a team was to ensure that we represented all of the authors’ voices. As part of this process, we constantly questioned our own perspectives and opened ourselves to criticism. Upon reflecting on our process as a team and the outcomes of our research, we have come to the conclusion that our work modestly, yet meaningfully, contributes to the global dialogue initiated by EvalPartners. First, the findings contribute to the growing empirical evidence of methods-centric evaluation practices across continents. Second, these findings demonstrate that layered under this apparently univocal methods-centric trend are indeed nuanced differences in evaluation practice. And third, our work helps evaluators reflect on historical evaluation trends, the emergent congruence of evaluation practice, and the future of evaluation practice. As we engage in this global dialogue, we invite evaluators and other interested stakeholders to recognize, question, and learn from various approaches to evaluation practice. We look forward to other critical examinations of evaluation practice that emerge as a result of engaging in these conversations.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
