Abstract
Impact evaluation and measurement are highly complex and can pose challenges for both social impact providers and funders. Measuring the impact of social interventions requires the continuous exploration and improvement of evaluation approaches and tools. This article explores the available evidence on meta-evaluation—the “evaluation of evaluations”—as an analytical tool for improving impact evaluation and analysis in practice. It presents a systematic review of 15 meta-evaluations with an impact evaluation/analysis component. These studies, taken from both the scholarly and gray literature, were analyzed thematically, yielding insights about the potential contribution of meta-evaluation in improving the methodological rigor of impact evaluation and organizational learning among practitioners. To conclude, we suggest that meta-evaluation is a viable way of examining impact evaluations used in the broader social sector, particularly market-based social interventions.
Impact evaluation and impact measurement are increasingly expected of all types of organizations that use public and private funding to drive social change. More recently, the rise of new actors designing and delivering social interventions, such as international nongovernmental organizations (NGOs), philanthropic organizations, and market-oriented players, has introduced new solutions and instruments to tackle social issues (Picciotto, 2015). Organizations in the social sector are expected to demonstrate their effectiveness, mainly as proof of legitimacy or accountability to both donors and beneficiaries (Barraket & Yousefpour, 2013; European Union & Organisation for Economic Co-operation and Development [OECD], 2015; Nguyen et al., 2015). Existing literature suggests that social impact providers, funders, and investors continue to find impact measurement challenging despite significant growth in the frameworks and tools available to support them (OECD, 2019; S. D. Phillips & Johnson, 2019). Not-for-profit organizations face particular challenges in evaluating social goals, and their evaluations often suffer from common flaws (Liket et al., 2014). Organizations that deliver social impact (including not-for-profit organizations, social enterprises, purpose-driven corporations, governments at all levels, and funders and investors such as philanthropic organizations, corporate investors, governments, and international development agencies) have joined forces in the quest to develop pragmatic and rigorous impact measurement methods.
The current evaluation literature and social impact measurement literature present mixed views about the link between evaluation and social impact measurement. On one side of the debate, evaluation and social impact measurement are depicted as separate fields. For example, Vo and Christie (2018) recognized that evaluation and social impact measurement have each developed in response to different stakeholders and social demands and that private entities in the pursuit of social change often prioritize financial return over social and environmental goals when conducting impact measurement. On the other side of the debate, the growing traction of social impact measurement is seen as opening a new chapter in the development of evaluation theory and practice. As Picciotto (2015) put it, social impact measurement is “the fifth wave of evaluation diffusion” (p. 2). Additionally, practitioners from each field have formed separate, yet overlapping, camps, with evaluation practitioners working primarily in the public, philanthropic, and not-for-profit sectors and social impact measurement practitioners working primarily with market-based entities (Reisman et al., 2015). To foster practical and viable approaches, it is necessary to engage with both evaluation practice and social impact measurement practice and to draw on shared knowledge and learning across sectors and fields.
Asking hard questions about “what works, for whom, and how” has long been a hallmark of evaluative thinking and practice. We have seen a growth in the resources available to evaluation practitioners, including evaluation frameworks, methods, standards, and tools to assess the progress and outcomes of social interventions. Measuring the impacts of social interventions is imbued with complexity, and we believe that there is a strong need to investigate current evaluation and impact measurement approaches. Meta-evaluation is a method that allows us to examine the evidence behind an evaluation approach or product and to critically analyze the merit, quality, and effectiveness of the evaluation.
The literature espouses meta-evaluation as a means for assessing the methodological rigor of evaluations and for aggregating data from multiple evaluations (Cooksy & Caracelli, 2009; Henry, 2016). In this study, we drew upon Michael Scriven’s original definition of meta-evaluation as the “evaluation of evaluations” (Scriven, 1969, as cited in Scriven, 2009, p. iii). While meta-evaluation is commonly conflated with meta-analysis, the two concepts are different. The key element that differentiates meta-evaluation is that “it evaluates the evaluations to which it refers; it does not merely summarise them” (Scriven, 2005, p. 250). In particular, a meta-evaluation assesses the merit and worth of an evaluation, while a meta-analysis performs a quantitative synthesis of empirical studies (which might include evaluative studies; Stufflebeam, 2001).
Meta-evaluation is a specific type of evaluation that can be used to systematically identify factors that must be addressed to improve the evaluation and to support decisions about the quality of the evaluation (Yarbrough et al., 2010). Typically, a meta-evaluation looks at multiple evaluations to understand broader patterns in the processes and outcomes of evaluations. The method can be tailored for various purposes. For example, internal meta-evaluations use self-administered checklists to assess accountability, formal external meta-evaluations can be commissioned to assess the quality of evaluation, and targeted meta-evaluations can be used to explore which kinds of evaluation approaches are most efficient and effective for specific purposes (Cooksy & Caracelli, 2009; Yarbrough et al., 2010). The method can thus be used to examine the utility, strengths, and limitations of approaches employed in published evaluations (Henry, 2016; Stufflebeam, 2001; Tingle et al., 2003).
We conducted a systematic literature review to examine how impact evaluations and analyses have been assessed through the analytical lens of meta-evaluation. Systematic literature reviews are typically used to process large volumes of information to map out areas of uncertainty in research and identify gaps in the literature (Petticrew & Roberts, 2006). Systematic reviews also provide a reproducible method for synthesizing empirical research and in some cases allow for generalizations to be drawn from the findings of different studies (Cooper et al., 2009). Aside from their common application in health and medicine, systematic reviews are increasingly being adopted in other areas such as management, conservation, international development, and economics (Gough et al., 2013).
As such, this study set out to undertake a systematic literature review that assesses multiple meta-evaluations. The review initially targeted both impact evaluation and social impact measurement literature in order to draw lessons from meta-evaluations in both fields. Following a systematic search of the scholarly and gray databases, we included texts that both (a) were meta-evaluations, defined as the “evaluation of evaluations” (Scriven, 2009), and (b) involved an impact evaluation or assessment component. We initially intended to include social impact measurement or impact analysis practices employed by social impact providers and investors. However, we could not trace any explicit application of meta-evaluation in the impact measurement of social enterprises or other market-oriented solutions. As a result, the meta-evaluations reviewed in this study examined impact evaluation practices primarily in a program and development evaluation context. Accordingly, our research questions are as follows: What does the available evidence tell us about the extent to which meta-evaluations assess impact evaluations? What can meta-evaluation as an instrument contribute to improving evaluation and impact analysis practices?
The rest of this article is organized as follows. After presenting our methodology for the search strategy, quality assessment of the publications, and approach to thematic analysis and synthesis, our findings are arranged according to our research questions. We describe the scope of impact evaluations and assessments sampled and discuss the key methodological features meta-evaluated, the purposes and aims of meta-evaluations, and meta-evaluation criteria used, before summarizing the recommendations from the studies for improving methodological practices. The article proceeds to discuss applications of meta-evaluation in broader contexts such as social impact analysis, leading to calls for greater dissemination and evidence synthesis of impact evaluations and social impact measurements.
Method
Search Strategy
This study follows a systematic literature review approach. Between January and June 2019 (later updated to April 2020), we conducted a comprehensive search of English-language publications available from January 1999 to March 2019 using four scholarly databases (Web of Science, EbscoHost, ProQuest, and Scopus) alongside a customized NGO search engine, 1 allowing the search to cover both peer-reviewed academic literature and gray literature. To identify any additional materials not indexed in these databases and search engines, we undertook a targeted search of 37 evaluation-related journals 2 and examined the reference lists of the articles screened. The search terms were required to appear in either the abstract or the keywords of the publication. A combination of the following search terms was used: “meta-evaluation” and “social impact” or “impact assessment” or “impact evaluation” or “impact measurement” or “impact review”. See Table 1 for the search strings used in each round of the search. Truncation symbols (*) were employed to include grammatical and spelling variants of the search terms (e.g., “metaevaluation”, “meta evaluation”, and “meta-evaluation”); additionally, the Boolean operators AND and OR were used to combine the search terms.
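The logic of the search string can be illustrated with a short sketch: synonyms within one concept are joined with OR (which widens the net), and the two concepts are joined with AND (which narrows it to records mentioning both). The helper names below are hypothetical, and the spelling variants are listed explicitly rather than via truncation because truncation syntax differs between databases; the exact strings used for each database are those reported in Table 1.

```python
def or_group(terms):
    """Quote each phrase, join the phrases with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(concept_groups):
    """AND together one OR-group per concept (hypothetical helper, for illustration only)."""
    return " AND ".join(or_group(group) for group in concept_groups)


# Spelling variants written out explicitly instead of using a database-specific wildcard.
meta_terms = ["meta-evaluation", "meta evaluation", "metaevaluation"]
impact_terms = [
    "social impact",
    "impact assessment",
    "impact evaluation",
    "impact measurement",
    "impact review",
]

query = build_query([meta_terms, impact_terms])
print(query)
```

Keeping each concept as its own OR-group makes it easy to add a variant to one concept without changing how the concepts are combined.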
Databases or Journals Searched and the Corresponding Search String Used.
The initial search in 2019 yielded 71 unique articles from the four scholarly databases. Although three of the databases (i.e., EbscoHost, Web of Science, and ProQuest) collectively index gray literature (including working papers, market research reports, industry reports, and case studies), all results were either scholarly journal articles or book chapters. In a subsequent search for gray literature, we first used Google’s web search engine. However, this yielded a surfeit of irrelevant sources. To find public reports from governments and industries, we conducted a separate, customized NGO search based on the assumption that NGOs undertake a considerable volume of program and development evaluations. Google does not recognize Boolean operators or truncations, so we modified the search terms to (1) “meta evaluation”, “impact assessment” and (2) “meta evaluation”, “impact” (including the quotation marks). The customized NGO search refined the Google search results to only those published by NGOs. This filter yielded 23 unique hits, most of which were industry reports from international charities and humanitarian organizations. Our targeted search (of evaluation-related journals and reference lists of screened articles) yielded eight new studies including scholarly articles and public reports from governments and NGOs. Dissertations, commentaries, and books were excluded from these initial search results (see our discussion on potential bias created by this decision in the Limitations subsection), while conference proceedings and book chapters were included.
The search protocol was repeated in April 2020 to capture records that were published or indexed between June 2019 and April 2020, after the original search was run. In total, the updated search incorporated 11 new results for screening—four of which were from the original four databases, seven from the targeted search of evaluation journals, and none from the custom Google search of NGO publications. A total of 113 publications were identified for screening in this study.
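The record counts reported above can be reconciled with simple arithmetic. The sketch below uses only the figures stated in the text; the dictionary keys are our own labels for the search sources.

```python
# Records per source, as reported for the original search and the April 2020 update.
initial_search = {"scholarly databases": 71, "custom NGO search": 23, "targeted search": 8}
update_2020 = {"scholarly databases": 4, "targeted search": 7, "custom NGO search": 0}

initial_total = sum(initial_search.values())
update_total = sum(update_2020.values())
records_for_screening = initial_total + update_total

print(initial_total, update_total, records_for_screening)
```

The initial search contributes 102 records and the update 11, which gives the 113 publications identified for screening.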
Screening Process
The inclusion criteria developed for screening stipulate that (a) meta-evaluation is defined as an “evaluation of evaluation(s),” which excludes meta-analyses or meta-reviews that may be labeled as “meta-evaluations”; and (b) the meta-evaluation should include some form of impact analysis that has evaluated, measured, or reported the impacts of the intervention(s) under evaluation. Based on these inclusion criteria, screening was carried out in two steps. First, researchers independently screened (1) the 71 abstracts from the four databases, (2) the 23 abstracts from the customized Google NGO search, and (3) the eight abstracts from the targeted search against the inclusion criteria. The same screening process was repeated in April 2020 for the 11 records that had been published or indexed since the original search was run. No additional publications from the updated search were added to the final sample. Second, two researchers independently conducted full-text reviews of the 40 articles identified in the first step against the same criteria. Disagreements between researchers were resolved through consensus-based discussion. The screening process is summarized in Figure 1. A total of 15 publications (meta-evaluations that evaluate or assess impact evaluations or impact analysis in some form) were assessed as eligible for inclusion. The meta-evaluations reviewed in this study are summarized in Table 2.
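The two-step screening can be expressed as a funnel in which each stage starts with the records retained by the previous one. The sketch below assumes that all 113 records passed through abstract screening before the 40 full-text reviews; the exclusion counts it prints are derived by subtraction rather than taken from Figure 1, and the `Stage` class is a hypothetical illustration.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    records_in: int
    records_retained: int

    @property
    def excluded(self) -> int:
        """Records dropped at this stage."""
        return self.records_in - self.records_retained


def check_funnel(stages):
    """Raise if a stage does not start with exactly the records the previous stage kept."""
    for prev, nxt in zip(stages, stages[1:]):
        if prev.records_retained != nxt.records_in:
            raise ValueError(
                f"{nxt.name} starts with {nxt.records_in}, expected {prev.records_retained}"
            )


stages = [
    Stage("title/abstract screening", 113, 40),
    Stage("full-text review", 40, 15),
]
check_funnel(stages)
for s in stages:
    print(f"{s.name}: {s.records_in} in, {s.excluded} excluded, {s.records_retained} retained")
```

A consistency check of this kind is a cheap way to catch transcription errors between the text and a flow diagram.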

Screening and selection process.
Meta-Evaluations With Impact Evaluation or Impact Analysis Components by Publication Type.
Quality Assessment
To assess the quality of the meta-evaluations under review, a set of criteria was developed based on guidelines under the meta-evaluation standard in the Program Evaluation Standards (second edition) developed by the Joint Committee on Standards for Educational Evaluation (1994), the checkpoints for the meta-evaluation standard in Stufflebeam’s (1999) meta-evaluation checklist, and recommendations for internal and external meta-evaluation standards in the latest edition of the Program Evaluation Standards (Yarbrough et al., 2010). The quality assessment tool contains 10 questions (Table 3). Two reviewers independently assessed the sampled meta-evaluations using the quality assessment questions and marked each question as being either “addressed,” “partially addressed,” or “not addressed”/“not clear (unable to judge).”
Quality Assessment Questions and Results for Meta-Evaluations Reviewed.
Note. n = 15. A = addressed; P = partially addressed; N = not addressed/not clear (unable to judge).
Table 3 reports the questions assessed and the results. Most of the 15 studies we analyzed clearly articulated their aims and objectives and indicated the evaluation criteria or standards applied, who the meta-evaluator (or team) was, and the process and steps undertaken to conduct the meta-evaluation. The majority also examined evaluation tools and methodological practices such as instruments, data collection, and data analysis. The resources allocated to the meta-evaluation were either inadequately addressed or overlooked by most studies, making this the weakest area identified by the quality assessment. Appropriately costing the resources needed for a meta-evaluation contributes not only to transparency but also to the accountability of the whole meta-evaluation exercise.
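A tally like the one in Table 3 can be produced by counting the A/P/N ratings per question and then flagging the question with the fewest "addressed" ratings as the weakest area. The sketch below uses hypothetical question labels and made-up placeholder ratings for four studies; it does not reproduce the actual Table 3 results.

```python
from collections import Counter

# Rating codes as defined in the note to Table 3.
RATING_LABELS = {
    "A": "addressed",
    "P": "partially addressed",
    "N": "not addressed/not clear",
}


def tally(ratings_by_question):
    """Count A/P/N ratings for each quality assessment question."""
    return {question: Counter(ratings) for question, ratings in ratings_by_question.items()}


def weakest_question(tallies):
    """Return the question with the fewest 'addressed' ratings (first one wins ties)."""
    return min(tallies, key=lambda q: tallies[q]["A"])


# Hypothetical placeholder ratings for four studies, for illustration only.
example_ratings = {
    "aims stated": ["A", "A", "A", "P"],
    "resources reported": ["N", "N", "P", "N"],
    "criteria stated": ["A", "P", "A", "A"],
}

tallies = tally(example_ratings)
for question, counts in tallies.items():
    print(question, {RATING_LABELS[code]: n for code, n in sorted(counts.items())})
print("weakest:", weakest_question(tallies))
```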
Data Analysis and Synthesis
Descriptive data were first collected across the 15 meta-evaluations, specifically the type and year of publication, type of meta-evaluation, type of evaluations evaluated, focus areas of the programs evaluated, and the number of evaluations included in each meta-evaluation. These data were then summarized as frequency counts for each of the categories above. It is worth noting that this study does not attempt to draw inferences from the 15 sampled studies to a larger population of potential meta-evaluations. Rather, our focus was to identify patterns and themes that emerged in the sample to shed light on the application and improvement of impact evaluation and analysis practices.
We adopted a “theory-driven” approach to conduct a thematic analysis of the data (Braun & Clarke, 2006). The method involves an initial step of developing a coding frame centered on the methodology or methodological features reviewed, followed by steps to identify and define emergent themes (for further details, see Braun & Clarke, 2006). First, a draft coding frame was independently tested by two researchers using a small subset of the sample, and revisions were incorporated to finalize the coding frame. Table 4 contains the final coding frame used. This process also enabled the researchers to become familiar with the coding frame and to reach consensus in applying it. Next, two researchers independently coded the 15 meta-evaluations using QSR’s NVivo software (Version 12). Disagreements between the two researchers were resolved through consensus-based discussion. Subsequently, based on the initial codes, one of the two researchers who coded the studies synthesized the qualitative data and developed analytical themes that address the research questions (Thomas & Harden, 2008).
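One way to prepare the consensus discussions described above is to list the text segments on which the two coders' code sets differ. The sketch below is a plain-Python illustration, not NVivo functionality; the segment IDs and code names are hypothetical, and the simple percent agreement it reports is an optional summary that the study itself does not report.

```python
def disagreements(coder_a, coder_b):
    """Return segment IDs whose assigned code sets differ between the two coders."""
    segments = sorted(set(coder_a) | set(coder_b))
    return [s for s in segments if set(coder_a.get(s, [])) != set(coder_b.get(s, []))]


def percent_agreement(coder_a, coder_b):
    """Share of segments (in %) on which both coders assigned identical code sets."""
    segments = set(coder_a) | set(coder_b)
    agreed = len(segments) - len(disagreements(coder_a, coder_b))
    return 100 * agreed / len(segments)


# Hypothetical coding of three text segments, for illustration only.
coder_a = {
    "S01": ["research design"],
    "S02": ["impact measures", "attribution"],
    "S03": ["data analysis"],
}
coder_b = {
    "S01": ["research design"],
    "S02": ["impact measures"],
    "S03": ["data analysis"],
}

print("discuss:", disagreements(coder_a, coder_b))
print(f"agreement: {percent_agreement(coder_a, coder_b):.1f}%")
```

Comparing code sets rather than single codes matters here because a segment can legitimately carry more than one code.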
Coding Frame Used in the Systematic Review.
Findings
Descriptive data from our sample of 15 meta-evaluations are reported in Tables 5–7. Meta-evaluations that were published in peer-reviewed scholarly journals accounted for 40% of the sample (Table 5). As shown in Table 2, the meta-evaluations varied in size, ranging from five to over 300 evaluations. On average, program reports meta-evaluated larger samples of evaluations than scholarly studies (see the estimated average size of meta-evaluations of the two groups in Table 5 and a further breakdown of the number of evaluations by publication type in Table 6). Program reports may have had access to internal or unpublished evaluations and designated funding to conduct meta-evaluations at a larger scale. Although close to half of the meta-evaluations were published before 2010 (n = 7), with the majority of these being program reports (n = 6), there is a comparatively large number of recently published scholarly articles in our sample (Table 7). Most meta-evaluations were conducted by evaluators/researchers external to the program evaluated (73%, n = 11), which accounts for all scholarly articles and half of the subsample of program reports (Table 2). Just over half of the studies evaluated programs or interventions that spanned multiple program focus areas (n = 8), and the remainder evaluated programs that focused on one or two specific program areas (Table 2). As such, this review sees meta-evaluation as a specific type of evaluation being applied to a diverse range of evaluands.
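The proportions quoted above follow directly from counts out of n = 15. The sketch below recomputes them; the journal count of 6 is inferred from the reported 40%, and the other counts are as stated in the text.

```python
SAMPLE_SIZE = 15

# Counts stated in the text (journal count inferred from 40% of 15).
reported_counts = {
    "published in peer-reviewed journals": 6,
    "external meta-evaluator": 11,
    "published before 2010": 7,
    "multiple program focus areas": 8,
}


def share(count, n=SAMPLE_SIZE):
    """Percentage of the sample, rounded to the nearest whole number."""
    return round(100 * count / n)


for label, count in reported_counts.items():
    print(f"{label}: {count}/{SAMPLE_SIZE} = {share(count)}%")
```

Seven of 15 (about 47%) is "close to half" and eight of 15 (about 53%) is "just over half", which matches the wording above.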
Type of Publication.
Note. n = 15.
a The total and average numbers of evaluations assessed in the subsample of program reports are estimates, as two meta-evaluation reports provided estimates instead of exact numbers of evaluations assessed.
Number of Evaluations Assessed.
Note. n = 15.
Year of Publication.
Note. n = 15.
This review set out to examine impact assessment and analysis practices in the context of program evaluation through the analytical lens of meta-evaluation. The rest of this section presents our key findings under each of the research questions.
First, we looked at the methodological scope of the impact evaluations sampled in the meta-evaluations. Only two papers had a single focus, one on outcome/impact evaluations (Philips & de Wet, 2017) and another on economic return studies of health promotion programs (Chapman, 2012); the remaining 13 took a multifocal approach, evaluating mixed types of evaluations. These encompassed formative and summative evaluations as well as implementation, performance, and impact evaluation reports. Of these 13 studies, four identified impact evaluation as a type of evaluation and reported the percentage of impact evaluations in their sample, which ranged from 3% to 9% (e.g., Hageboeck et al., 2013). The exception was Lam et al. (2019), in which 24% of the sampled evaluations were impact evaluations. Another paper noted that among the 50 evaluations meta-evaluated, 40% had an impact assessment component (Leveille & Chamberland, 2010).
Some studies observed performance or formative evaluations that reported outcome data, asked questions on causality or attribution, or applied impact evaluation techniques to some extent. Six of the 15 meta-evaluations (40%) investigated whether the evaluations under assessment had raised questions about attribution. Hence, on the one hand, impact evaluations and assessments might constitute a small yet growing pool of evidence as a newer type of evaluation; on the other hand, impact assessment or measurement inquiries and practices are not exclusive to impact evaluations.
Most of the meta-evaluations we reviewed took a systematic approach to the quality dimensions of evaluations. This review consequently examined the methodological features of impact evaluations or impact assessments captured through meta-evaluations. Our findings reveal a focus on research design and impact measures. Given the attention to attribution, it is not surprising to see a preference for more experimental research or evaluation designs, particularly those that use control groups and random assignment, in response to the need to verify impact. In their meta-evaluation of U.S. Agency for International Development (USAID) evaluations, Hageboeck et al. (2013) indicated that USAID’s definition of impact evaluation articulated criteria such as whether a comparison group was included and whether the evaluation had at least two data points. While quantitative evaluations are important for outcome measures, not all impacts are quantifiable. One study recognized that qualitative evaluations offer useful information on barriers and opportunities for improving program processes and outcomes (Lam et al., 2019). Only one publication in our sample focused solely on outcome/impact evaluations that use naturalistic/qualitative methods (Philips & de Wet, 2017). Its authors concluded that a lack of methodological clarity is a key contributor to perceptions that naturalistic/qualitative evaluations lack rigor. Another common design feature examined is the presence (and role) of theory or logic models in evaluations, also known as impact hypotheses. Although the results from the meta-evaluations that assessed this feature are mixed, the underlying understanding is that a theory of change or logic model should be used or referred to as a guide for evaluations (e.g., Houghton & Robertson, 2001; Weigel, 2012).
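The two design criteria attributed above to USAID’s definition (a comparison group and at least two data points) can be written as a simple screening rule of the kind a checklist-based meta-evaluation might apply. The sketch below is a hypothetical simplification: the record fields and example evaluations are invented for illustration, and the full USAID definition is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class EvaluationRecord:
    title: str
    has_comparison_group: bool
    n_data_points: int  # e.g., baseline plus endline counts as 2


def meets_impact_evaluation_criteria(ev: EvaluationRecord) -> bool:
    """Hypothetical rule: a comparison group AND at least two data points."""
    return ev.has_comparison_group and ev.n_data_points >= 2


# Invented example records, for illustration only.
examples = [
    EvaluationRecord("baseline and endline with comparison group", True, 2),
    EvaluationRecord("endline only with comparison group", True, 1),
    EvaluationRecord("three rounds, no comparison group", False, 3),
]
for ev in examples:
    print(ev.title, "->", meets_impact_evaluation_criteria(ev))
```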
Measurement quality is not typically a meta-evaluation criterion. However, outcome and impact measures varied significantly across our sample of meta-evaluations. The measurement challenge lies not only in how to measure but also in what to measure (Scott-Little et al., 2002). Many studies called for better or more appropriate impact indicators. One meta-evaluation highlighted the need for affected populations to be involved in developing appropriate impact indicators, rather than leaving indicator development to be driven by the priorities of the funding agency or government (Houghton & Robertson, 2001). This resonates with the recommendation by Lam et al. (2019) that engaging beneficiaries and stakeholders through participatory evaluation approaches when developing indicators could encourage stakeholder ownership of the evaluative process. Philips and de Wet (2017) also flagged the importance of understanding the context of an indicator, noting that such information could clarify the extent to which an indicator is measurable and therefore its suitability in specific contexts.
Quality assessment and improvement are at the heart of meta-evaluation because they help stakeholders know the extent to which they can trust an evaluation (Yarbrough et al., 2010). Table 8 reports the purpose of the meta-evaluation as stated by each study. Fourteen of the 15 studies focused on quality assessment (the assessment of the quality of evaluations), quality/performance improvement (the improvement of evaluation quality, practice, or services delivered), or both. This reflects the importance attached to the quality of practice in our sample. In addition to quality, the sample encompasses a range of other purposes, including organizational learning, tracking progress over time, and project-specific objectives (e.g., assessing gender incorporation in evaluations, contributing to strategic framework development, and assessing the empirical value of a particular model). This type of quality checking lays the foundation for the versatile application of meta-evaluation in organizational learning and capacity building. As such, meta-evaluation can be used explicitly as a quality assessment instrument, and the meta-evaluation technique can also serve as an analytical tool to explore specific topics or situations of interest.
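The "either or both" count behind Table 8 is a set-intersection test: a study is counted as quality-focused if its stated purposes overlap with the two quality purposes. The sketch below uses hypothetical study labels and purpose codes; it is not the actual coding of the 15 studies.

```python
QUALITY_PURPOSES = {"quality assessment", "quality improvement"}


def count_quality_focused(purposes_by_study):
    """Count studies whose purposes include quality assessment, quality improvement, or both."""
    return sum(1 for purposes in purposes_by_study.values() if purposes & QUALITY_PURPOSES)


# Hypothetical purpose codes for three studies, for illustration only.
example_purposes = {
    "Study A": {"quality assessment"},
    "Study B": {"quality improvement", "organizational learning"},
    "Study C": {"tracking progress over time"},
}

print(count_quality_focused(example_purposes))
```

Counting by overlap, rather than summing each purpose separately, avoids double-counting studies that pursue both quality purposes.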
Purposes and Objectives of the Meta-Evaluations.
Note. n = 15.
Assessing the methodological rigor of evaluations is a primary feature of meta-evaluations used for quality-checking purposes. Ten of the 15 publications in this review meta-evaluated the design or methodology of evaluations, or both. This reflects the underlying perception that evaluation findings are only as good as the methodologies used (Scott-Little et al., 2002) and that good quality evaluations can stimulate further use (Jacob & Desautels, 2014). Although the elements used to assess quality varied greatly, there was some commonality in the meta-evaluation criteria/standards used. Whereas scholarly meta-evaluation studies preferred existing criteria/standards, the program reports mainly relied on specifically developed criteria (see Table 9). The most common existing criteria/standards used or referred to in the sample included the OECD’s Development Assistance Committee (DAC) evaluation criteria (OECD DAC, 2019) and the Program Evaluation Standards (second and third editions).
Meta-Evaluation Criteria/Standards.
Note. n = 15.
Additionally, the sampled studies also drew on existing assessment tools for particular issues. For example, Lam et al. (2019) adapted questions from gender-related assessment tools, and Leveille and Chamberland (2010) applied assessment tools for qualitative and quantitative research to evaluate the quality and rigor of the studies they meta-evaluated. The meta-evaluations that used study/program-specific criteria had greater variation in their assessment tools. For example, Philips and de Wet (2017) developed four trustworthiness criteria; Chapman (2012) used a set of seven criteria including research design, sample size, quality of baseline delineations, quality of measurements used, and the appropriateness and replicability of interventions; and Universalia’s (2003) report developed a 21-section quality checklist. It is likely that program reports took into account the diverse perceptions of evaluation stakeholders and the different ways that meta-evaluation results can be used and reflected these in the development of meta-evaluation criteria and standards. The prevalence of program-specific criteria in the sample highlights the diversity in interpretations and the demand for quality in practice. These criteria/standards, regardless of existing or program-specific frameworks, were used in various ways by the sampled publications, including creating a rating or scoring system (Chapman, 2012; Scott-Little et al., 2002), building a checklist (Hageboeck et al., 2013), guiding analysis and synthesis (Philips & de Wet, 2017), and formatting sections or headers in reports to present findings (Houghton & Robertson, 2001; Universalia, 2003).
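A rating or scoring system of the kind described above can be sketched as an ordinal scale applied to a fixed list of criteria. The sketch below borrows five of the criteria listed for Chapman (2012) as labels, but the three-point scale and the equal weighting are our assumptions for illustration, not Chapman’s actual scoring scheme.

```python
# Assumed three-point ordinal scale (not taken from any of the reviewed studies).
SCALE = {"not met": 0, "partially met": 1, "met": 2}

# Criterion labels adapted from the list attributed to Chapman (2012); equal weights assumed.
CRITERIA = [
    "research design",
    "sample size",
    "baseline delineation",
    "measurement quality",
    "replicability of intervention",
]


def score(ratings):
    """Return (total score, proportion of the maximum possible score) for one evaluation."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    total = sum(SCALE[ratings[c]] for c in CRITERIA)
    return total, total / (max(SCALE.values()) * len(CRITERIA))


# Hypothetical ratings for one evaluation, for illustration only.
example = {
    "research design": "met",
    "sample size": "partially met",
    "baseline delineation": "not met",
    "measurement quality": "met",
    "replicability of intervention": "met",
}
print(score(example))
```

Rejecting incomplete ratings, rather than silently scoring missing criteria as zero, keeps "not rated" distinct from "not met".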
A majority of the sample also examined the methods used to collect and analyze data. Several publications noted that data analysis, particularly the link between the data collected and the analysis performed, lacked clarity and was not guided by appropriate methods, even when more data had been collected in the field. The demand for explicitness and detail in data collection and analysis methods applied to both quantitative and qualitative evaluations (Hageboeck et al., 2013; Jacob & Desautels, 2014; Philips & de Wet, 2017). In the two studies that cited a lack of data as a factor that weakened the evidentiary foundations of project findings (Goldenberg, 2001; Weigel, 2012), the missing data were baseline data. This suggests that data collection (particularly for baseline studies) must be embedded early in the design of an evaluation.
Most of the meta-evaluations we reviewed made recommendations about ways to improve methodological or quality practices. Table 10 groups these recommendations into five areas and outlines the key suggestions made in each area. It is not a surprise to see most recommendations being directed to quality dimensions of evaluations or meta-evaluations, given the central role of quality assessment under a meta-evaluative lens. Five key quality improvement recommendations emerged in the following areas: the evaluation design elements, stakeholder participation, the need to link evaluations to the owner organization’s strategies and goals, impact assessment design, and the importance of enhancing methodological clarity. The operationalization of quality-focused evaluation practice is another thematic area that drew a number of recommendations. These included encouraging organizations to take up recommended approaches and tactics to build organizational capacity, developing resources to support learning, and utilizing evaluation results to inform decision making. Many of these suggestions could benefit evaluation practices regardless of the type of evaluation involved. Others are more relevant to impact evaluation and analysis practices. These include, for example, embedding impact assessment in project cycle management; using a theory of change, logic model, or impact hypothesis as the underlying theoretical foundation; offering technical support for impact assessment tasks; and sharing impact assessment results with the wider expert and practitioner community by publishing in peer-reviewed journals.
Quality Improvement Recommendations From Meta-Evaluations.
Discussion
Just as evaluation addresses evaluative questions around the quality and value of an evaluand (Davidson, 2014), meta-evaluation assesses the quality of evaluations. This makes meta-evaluation a useful analytical tool for organizations looking to monitor and improve their evaluative practices. The fact that we could not locate meta-evaluations evaluating impact measurement of market-based solutions, such as social enterprises or impact investment, suggests that the meta-evaluation technique has yet to be taken up in contexts beyond program and development evaluations. Social sector organizations are facing methodological challenges and resource constraints that limit their technical expertise in social impact measurement and reporting (see Barraket & Yousefpour, 2013; Haski-Leventhal & Mehra, 2016). Adopting a meta-evaluative approach as part of the social impact analysis can help impact providers and funders improve the quality and rigor of impact measurement.
A relatively small number of impact evaluations were identified in the meta-evaluation sample. As discussed in the previous section, the complexity of addressing social issues and of measuring and reporting impacts calls for further exploration of current approaches and of the evidence base. The benefits of synthesizing knowledge and evidence have long been demonstrated in the natural and medical sciences (e.g., the Cochrane Collaboration), but much less so in the fields of evaluation and social impact measurement. As demand for social impact continues to grow, it becomes crucial to disseminate and synthesize the findings of impact evaluations, social impact measurements, and other methodologically demanding evaluative exercises. Meta-evaluation can serve as an effective instrument for systematically reviewing and aggregating evaluation studies.
The meta-evaluative lens adopted in our study highlights several emergent issues in impact measurement practice. All of the meta-evaluations we reviewed had evaluated impact evaluations or assessments, evaluations with impact analysis elements, or evaluations that reported on impact. However, very few provided clear definitions of outcomes, impact, and impact evaluation. Many used “impact” interchangeably with “outcomes,” and some referred to the impact criterion in the OECD DAC standards or discussed impact in the broader context of program results, achievements, and success. The lack of a clear definition of “impact” may be a key contributor to the challenges experienced in impact measurement, particularly in deciding what to measure and how to measure it. Indicator development is one process that can be used to define results (Houghton & Robertson, 2001). Similarly, the way impact is measured plays a role in defining it. For example, the USAID definition of an impact evaluation stipulates the presence of control groups and at least two data points. Clear definitions of impact are needed to improve impact measurement practice, just as methodological clarity is needed to improve evaluation design and methods, data collection, and analysis. Furthermore, only one meta-evaluation in our sample (Goldenberg, 2001) noted the effect of time on achieving short- and long-term goals, arguing that evaluators rarely distinguished between short-term achievements and long-term prospects. Evaluation reports typically included information on timing according to project cycles (such as baseline, midterm, end line, or impact). Even so, incorporating temporality into impact measures and reporting is critical, particularly when long-term benefits and sustainability-related goals are involved.
This study has several limitations. The scarcity of meta-evaluations in the public domain is a key drawback. Moreover, systematically sourcing relevant gray literature proved challenging: Our searches of academic databases yielded fewer practice-based reports than expected, and general search engines such as Google returned large volumes of irrelevant results. To mitigate this limitation, we used a customized Google search with an NGO filter to source most of our gray literature. However, limiting the search to materials published by NGOs may have biased the gray literature sample. Another potential source of bias is the exclusion of dissertations from our search. We excluded dissertations to keep our selection systematic and consistent and to reduce the publication bias that might arise from them, on the assumption that significant research findings would likely have been published. However, we acknowledge that this decision may have excluded dissertations reporting unpublished empirical studies of meta-evaluations.
We also note that meta-evaluations are named and discussed inconsistently in both the scholarly and gray literature. During screening, we found that many studies using the term “meta-evaluation” were better described as meta-analyses. Our search criteria may therefore not have captured some relevant studies because of how those studies were labeled. It is also worth noting that all studies included in this systematic review self-identified as meta-evaluations.
Conclusion
Meta-evaluations are systematic reviews of evaluations. This study examined meta-evaluations of impact evaluations and impact analysis practices. Our findings suggest that meta-evaluation is a viable approach for developing evaluative capacity in organizations, including the quality assessment and improvement of evaluative practices. Given the growing evidence base of impact evaluation and the increasing demand from the social sector for better impact analysis and reporting, meta-evaluative techniques can play a crucial role in improving the quality of impact measures. The limited application of meta-evaluation in our sample, and the fact that all of the included studies came from the program or development evaluation field, highlight the need for wider uptake of this approach in other contexts. The use of meta-evaluation techniques among new actors, such as social enterprises and other market-based actors, may lead to new developments in social impact measurement practice.
Footnotes
Acknowledgments
The authors wish to acknowledge the contribution of Dr. Erin Wilson and members of the Social Enterprise Impact Lab (SEIL) Project at the Centre for Social Impact (CSI), Swinburne University of Technology, for their input. They are grateful to the editor and the reviewers, whose valuable comments helped them sharpen and strengthen the article. They also wish to thank the Lord Mayor’s Charitable Foundation and Family Life for funding the SEIL project, of which this study was part.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the Lord Mayor’s Charitable Foundation and Family Life in Melbourne, Australia.
