Abstract
Since the beginning of the 21st century, evaluability assessments have experienced a resurgence of interest. However, little is known about how evaluability assessments have been used to improve future evaluations. In this article, we identify characteristics, challenges, and opportunities of evaluability assessments based on a scoping review of case studies published since 2008 (n = 59). We find that evaluability assessments are increasingly used for program development and evaluation planning. Several challenges are identified: the politics of evaluability, ambiguity between evaluability and evaluation, and limited consideration of gender equity and human rights. To ensure relevance, evaluability approaches must evolve in alignment with the fast-changing environment. Recommended efforts to revitalize evaluability assessment practice include the following: engaging stakeholders; clarifying what evaluability assessments entail; assessing program understandings, plausibility, and practicality; and considering cross-cutting themes. This review provides an evidence base of practical applications of evaluability assessments to support future evaluability studies and, by extension, future evaluations.
Since its conceptualization in the late 1970s (Wholey, 1979), evaluability assessment has received relatively little attention in the evaluation literature (Trevisan, 2007). However, since the first decade of the 2000s, evaluability assessment has experienced a resurgence of interest (Davies & Payne, 2015; Leviton et al., 2010; Trevisan, 2007; Trevisan & Walser, 2015; Walser & Trevisan, 2016). Applications of evaluability assessment have paralleled this growth, as evidenced by the notable publications in the gray literature (research reports, books, theses, and dissertations) that report on evaluability assessments (Davies, 2013; Peersman et al., 2015; Trevisan & Walser, 2015; Walser & Trevisan, 2016). Such efforts reflect the applied nature of evaluability assessments, as well as the interests primarily of program managers, evaluators, and other evaluation partners rather than the wider academic community.
Evaluability assessment was originally designed as a pre-evaluation activity to determine whether a program was ready for an impact evaluation (Wholey, 1979); a program is considered ready to the extent that program goals are agreed on, information needs are well-defined, evaluation data are obtainable, and intended users are willing to use evaluative information (Wholey, 2004, 2010, 2015). The United Nations Evaluation Group (UNEG, 2016b) also supports this definition by stating in their updated Norms and Standards for Evaluation: “The evaluability assessment implies verifying if there is clarity in the intent of the subject to be evaluated, sufficient data are available or collectible at a reasonable cost, and there are no major factors that will hinder an impartial evaluation process” (p. 22). In current practice, however, evaluability assessment has evolved into an approach that can be used at any point during the development and implementation of a program, as well as support process, developmental, and outcome evaluation purposes (Davies & Payne, 2015; Trevisan, 2007; Trevisan & Walser, 2015; Walser & Trevisan, 2016).
While the evolution of evaluability assessments reflects important efforts to improve the breadth of the approach, several concerns have been expressed about the practical application of an evaluability assessment. Some scholars and practitioners emphasize the misconceptions, misunderstandings, and misinterpretations of what the evaluability approach actually is and how it can be used (Trevisan & Walser, 2015; UNEG, 2016; Walser & Trevisan, 2016). Others highlight the importance of more guidance on its implementation (D’Ostie-Racine et al., 2013; Trevisan, 2007), for example, the context in which evaluability assessments are needed or how to tell when enough data are collected to make a judgment (Davies & Payne, 2015; Holvoet et al., 2018). Finally, despite the increase in applications of evaluability assessment, few syntheses of these studies exist (Davies, 2013; Trevisan, 2007; Walser & Trevisan, 2016), thus limiting our understanding of how evaluability assessment has advanced in theory and practice over time.
The objectives of our article are to (1) examine the range and nature of evaluability assessments published in the peer-reviewed and gray literature over the past decade and (2) assess reported challenges and limitations of conducting an evaluability assessment. We select the time frame of the past 10 years in order to build on the review of Trevisan (2007), which assessed evaluability assessments in the published literature over the prior 20-year period (1986–2006). We update the review of Trevisan to help identify advances and gaps in evaluability assessment research and practice. Of note, evaluability assessments are typically not published because of their limited relevance to those outside of the program, the limited time evaluators and stakeholders have to write an article or report, and the priority given to alternative forms of communicating evaluability findings. This presents challenges in identifying evaluability assessments. However, exploring the breadth and depth of evaluability assessment literature can provide an indirect but useful measure of evaluability assessment research and practice. This scoping review serves as one proxy indicator of the current state of evaluability assessment, with further empirical research required to substantiate findings.
Method
Scoping reviews are a type of knowledge synthesis that aims to identify research gaps and provide direction for future research. To identify evaluability assessment case studies, we used a scoping review approach involving a stepwise process of search, selection, extraction, and synthesis of the literature (Colquhoun et al., 2014).
Search Strategy
We searched for peer-reviewed articles on January 22, 2019, using the citation database Web of Science™ Core Collection. This database was selected as it is one of the most comprehensive search engines available for retrieving peer-reviewed research from all scientific disciplines (Hightower & Caldwell, 2010). To ensure that evaluation literature was captured, we also searched the following 11 evaluation-focused journals selected based on the term “evaluation” in the journal title (listed alphabetically): African Evaluation Journal, American Journal of Evaluation, Canadian Journal of Program Evaluation, Evaluation, Evaluation and Program Planning, Evaluation & the Health Professions, Evaluation Journal of Australasia, Evaluation Review, Journal of MultiDisciplinary Evaluation, New Directions for Evaluation, and Research Evaluation. Of note, this list is not exhaustive. We used the following search string: “evaluability assessment” OR “pre-evaluation” OR “exploratory evaluation”. Exploratory evaluation, sometimes referred to as pre-evaluation, is an idea related to evaluability assessment that “occurs before the formal, official evaluation using quantitative instruments for summative purposes” (Patton, 1987, p. 36). Evaluability assessment has also been conceptualized as a type of exploratory evaluation approach (Wholey, 2015). All citations were imported into the web-based application DistillerSR© (Evidence Partners Incorporated, Ottawa, ON, Canada) for relevance screening.
We complemented the peer-reviewed literature search with a search for gray literature available in the public domain. We entered the search term “evaluability assessment” into Google Scholar on September 16, 2019. Because the search engine ranks results by relevance to the search term entered, we reviewed the first 200 sources for inclusion and then reviewed subsequent sources in batches of 100 until the 500th source. Most relevant studies came from the first 200 sources (n = 20), followed by sources 201 to 300 (n = 3). No relevant studies were found among sources 301 to 500. While Google Scholar is a useful resource for finding gray literature, it is not able to capture all studies, particularly from sources that do not enable Google Scholar to crawl their sites. All gray literature citations were imported into Excel spreadsheets for relevance screening.
Relevance Screening
The titles and abstracts/summaries of studies were screened according to a priori inclusion criteria. First, studies had to be English-language articles, books, reports, theses, or dissertations; we excluded conference proceedings, abstracts, and briefs to ensure sufficient information was available for synthesis purposes. Second, studies had to report on a case (or cases) of evaluability assessment applied to a project, program, process, or strategy. In some situations, a full-text review was conducted in order to assess suitability. The reference lists of all relevant studies were hand-searched to identify any further studies not captured in the search.
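The two inclusion criteria above can be read as a simple filter over screened records. The following is a minimal sketch in Python, not the authors' screening procedure (which used DistillerSR and Excel); the `Record` class, its field names, and the example records are hypothetical and serve only to illustrate the logic.

```python
from dataclasses import dataclass

# Document types named in the inclusion criteria.
INCLUDED_TYPES = {"article", "book", "report", "thesis", "dissertation"}


@dataclass
class Record:
    # Field names are illustrative assumptions, not the authors' screening form.
    title: str
    language: str
    doc_type: str
    reports_ea_case: bool  # applies evaluability assessment to a project, program, process, or strategy


def is_eligible(record: Record) -> bool:
    """Apply the two a priori inclusion criteria to one record."""
    meets_format = record.language == "English" and record.doc_type in INCLUDED_TYPES
    return meets_format and record.reports_ea_case


# Hypothetical records, not studies from the review.
records = [
    Record("Hypothetical program case study", "English", "article", True),
    Record("Hypothetical conference abstract", "English", "abstract", True),
    Record("Hypothetical methods commentary", "English", "article", False),
]
eligible = [r.title for r in records if is_eligible(r)]
print(eligible)
```

In practice, the `reports_ea_case` judgment is made by a human screener, sometimes only after full-text review; the sketch simply records that judgment as a flag.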
Data Extraction and Synthesis
We developed a charting form to capture count data and descriptions of the study, evaluand, and evaluability approach (Levac et al., 2010). Study information extracted included the year of publication, study country, study format, and last name and affiliation of the first author. Descriptions of the evaluand extracted included evaluand type (i.e., project, program, process, or strategy), scale (i.e., local, regional, national), and disciplinary focus (e.g., health sciences, social sciences, education). Descriptions of the evaluability approach extracted included purpose, framework, methods, program theory, challenges, and limitations. Furthermore, in light of the global calls for the integration of social equity into all evaluations (Fletcher, 2015; Lam et al., 2019; UNEG, 2016a; United Nations, 2015), we documented whether such considerations were made in the reviewed studies. We focus on gender to examine how women, men, and gender diverse people may differentially experience programs. Importantly, gender intersects with, and cannot be separated from, race, sexuality, class, and other social identities (Crenshaw, 1991) and the systems within which people live. Achieving gender equity means closing key gaps between people of diverse genders. We used descriptive statistics to present trends and characteristics of the study, evaluand, and evaluability approach.
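As an illustration of how a charting form and the descriptive statistics might fit together, the sketch below tallies charted categories into counts and whole-number percentages. It is a hedged example under stated assumptions: the `ChartedStudy` fields are a subset chosen for brevity, the `summarize` helper is hypothetical, and the four records are toy data rather than studies from the review.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ChartedStudy:
    # A subset of the charted fields; names and values are illustrative only.
    year: int
    study_format: str
    evaluand_type: str
    scale: str


def summarize(values):
    """Return {category: (count, percent)} with percent rounded to a whole number."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: (n, round(100 * n / total)) for k, n in counts.most_common()}


# Toy records, not data from the review.
studies = [
    ChartedStudy(2010, "article", "program", "local"),
    ChartedStudy(2014, "report", "program", "national"),
    ChartedStudy(2017, "article", "program", "local"),
    ChartedStudy(2018, "thesis", "process", "local"),
]
by_type = summarize(s.evaluand_type for s in studies)
by_scale = summarize(s.scale for s in studies)
print(by_type)
print(by_scale)
```

Reporting both the count and the percentage for each category mirrors the "(%, n = )" convention used throughout the Results.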
Results
The search returned 1,009 studies; after removal of nonrelevant studies that did not meet the a priori inclusion criteria, and addition of studies from the hand-search, a total of 59 studies were included (see supplementary data for a complete list of included studies). Most studies were excluded because they did not describe a case study using evaluability assessment (Figure 1). Evaluability assessments mainly focused on programs (88%, n = 52), followed by plans (7%, n = 4), processes (3%, n = 2), and projects (2%, n = 1). The scale of efforts included local (75%, n = 44), national (19%, n = 11), and provincial/regional (7%, n = 4). The format of studies included article (61%, n = 36), report (25%, n = 15), thesis (7%, n = 4), and dissertation (7%, n = 4).

Figure 1. Flowchart of the selection of published studies on evaluability assessment.
Unsteady Growth in Evaluability Assessments Over Time
Although the number of studies is small and the year-to-year variability is high, there appears to be an overall positive trend over time, with the number of studies more than tripling in 2018 compared to 2008 (Figure 2). Based on analysis of the affiliation of the lead authors, studies were authored by academics (66%, n = 39), followed by government organizations (19%, n = 11), nongovernment organizations (8%, n = 5), and consultants (7%, n = 4). Regarding field of study, most studies came from public health (44%, n = 26), followed by education (27%, n = 16), health services (10%, n = 6), and social services (7%, n = 4); within the past 5 years, there were also studies from criminal justice (7%, n = 4) and food security (5%, n = 3).

Figure 2. Number of evaluability assessment case studies published from 2008 to 2018.
Most Evaluability Assessments Were Conducted in the North American Context
The reviewed studies were conducted in 14 countries (Figure 3). Regionally, most studies were conducted in North America (69%, n = 41), in particular the United States (54%, n = 32). Research in Africa (15%, n = 9), Europe (12%, n = 7), South America (5%, n = 3), and Oceania (2%, n = 1) is limited but emerging, with most studies (82%) published within the past 5 years. Of note, the regional counts sum to more than 59 because studies may present cases from multiple countries (e.g., Holvoet et al., 2018). The peer-reviewed literature and gray literature contributed a similar profile of studies by country; one exception is that most studies (6 of 7) published in the European context came from the gray literature.

Figure 3. Geographic distribution of evaluability assessment case studies published from 2008 to 2018.
Few Studies Considered Social Equity
Among the reviewed evaluability assessments, only three peer-reviewed articles (5%) considered social equity to some extent. In the context of a driver retraining program in Canada (Joanisse et al., 2010), demographic data of participants (e.g., age, gender, class, and ethnicity) were reported. Of note, the differential programming experiences and outcomes by social identities were not reported on. Similarly, Schröter and colleagues (2012) provided gender-disaggregated data of participants but did not explore similarities or differences in gendered perspectives on the evaluability of the program. One evaluability study in the context of development cooperation explored whether gender analysis and social differentiation were integrated into the analysis of 40 interventions in Benin, the Democratic Republic of the Congo, Rwanda, and Belgium (Holvoet et al., 2018). Considerations of these aspects were scored according to the Development Assistance Committee criteria as “weak performance” due to several reasons such as the absence of gender analysis at the outset of an intervention.
Several Purposes Exist for an Evaluability Assessment
Most studies used document reviews (81%, n = 48) and interviews (78%, n = 46) to collect data for the purposes of evaluability assessment, suggesting that evaluability assessment is primarily a qualitative approach. In some cases, focus group discussions/meetings/workshops (30%, n = 18), literature reviews (20%, n = 12), observations (20%, n = 12), and surveys (17%, n = 10) were also used. Most studies developed the program theory (79%, n = 46); among these studies, 38 used logic models, while 8 used theories of change. From the studies reviewed, it is possible to identify at least four common reasons why an evaluability assessment is undertaken:
To assess the extent to which programs are ready for evaluation. Around 44% of studies (n = 26) aimed to determine evaluation readiness. Among these studies, more than half (n = 15) deemed the evaluand to be ready (e.g., Elliott & Zajac, 2015; Hilton & Jonas, 2017; Nilson et al., 2017). A total of eight studies concluded the evaluand was not yet ready to be evaluated because of weak linkages in program theory (Pangwa, 2016), challenges with past data collection (Watts, 2016), measurability issues of indicators (McKinney, 2010), and limited development of evaluation tools (Lu et al., 2017), highlighting the need for further program development first. A few studies (n = 3) deemed the evaluand somewhat ready, explaining how certain aspects of an evaluand are evaluable (e.g., Prasad, 2017) or how certain types of evaluations (e.g., process vs. outcomes) are appropriate (e.g., Lu et al., 2017). To determine evaluability, studies provided narrative arguments (n = 22), and in a few cases, a checklist was used (n = 4).
To support evaluation planning. In total, 15 studies (25%) considered this objective as a rationale for the study (e.g., Akintobi et al., 2012; Belford et al., 2017; Schröter et al., 2012). For example, McAdams et al. (2017) reported undertaking an evaluability assessment to “inform the development of an evaluation framework” (p. 2). Willison et al. (2013) reported that funders were interested in some level of evaluation and that: “Evaluability assessment data collection must support more nuanced evaluation recommendations than Evaluate: ‘Yes or No’” (p. 7); as such, the authors sought to determine not only evaluability but also the level and type of evaluation.
To identify promising interventions for evaluation. A total of nine studies (15%) had this objective (e.g., DeGroff et al., 2015; Hester et al., 2013; Taylor et al., 2018). All of these studies referenced the systematic screening and assessment (SSA) method, a novel application of evaluability assessment (Leviton & Gutman, 2010). SSA can help not only to identify multiple programs for evaluation but also to identify shortcomings in program implementation (DeGroff et al., 2015). Such approaches were used in the context of an obesity prevention program (Barnes et al., 2010) and a food security program (Lundeen et al., 2017).
To support program development. A total of eight studies (14%) used evaluability assessment as a means to improve programming. Belford and colleagues (2017) reported conducting an evaluability assessment to “help develop programme and outcome objectives to improve programmes which they run and to assist in producing effective evaluations” (p. 1). In another example, Henson (2018) reported using the evaluability approach to explore programmatic complexities, leading to a “more precise understanding of outcome operationalization and encourages the democratization of research” (p. 3186).
In achieving these purposes, studies reported many outcomes of evaluability assessments, including the following: promotes dialogue among stakeholders and fosters a shared understanding of evaluation (Taylor et al., 2018), identifies gaps and recommendations for program improvement (Peterson et al., 2018), identifies programs most likely to produce useful results (Honeycutt et al., 2017), identifies possible approaches for evaluation (McAdams et al., 2017), encourages evaluative thinking (Lu et al., 2017), provides preliminary evidence of program success (Zint et al., 2011), improves the use of evaluations (Sanou et al., 2011), builds enthusiasm for evaluation (Joanisse et al., 2010), and builds relationships with evaluation stakeholders (Henson, 2018). The final output of an evaluability assessment may be a rich description of the program, an overview of evaluation readiness, an evaluation plan, or simply a conversation with the program staff and evaluation users about refining the program’s design, implementation, or evaluation.
A Plurality of Frameworks Exist to Guide an Evaluability Assessment
Studies referenced many different frameworks to guide assessments, particularly frameworks published in the peer-reviewed literature. Most studies adopted the revised 6-step model of Wholey (2004, 2010, 2015), the SSA model of Leviton and colleagues (2010), the original 8-step model of Wholey (1979, 1987, 1994), the 6-step model of Thurston and Potvin (2003), or the 10-step model of Smith (1989; Table 1). Several studies (n = 5) cited different frameworks published in the gray literature. A quarter of studies (25%, n = 15) did not reference a specific evaluability assessment framework; rather, they reported using common steps and/or methods of an evaluability assessment. Evaluability assessment models are often associated with a specific field, such as government (e.g., Wholey, 2015), public health (e.g., Leviton et al., 2010), social change (e.g., Thurston & Potvin, 2003), and agriculture (e.g., Smith, 1989). While there is no single accepted model, most include several core elements: identifying and engaging stakeholders, developing and assessing the program theory, planning and making recommendations for evaluation, and reporting on and using the findings.
Table 1. Common Frameworks and Associated Components Referenced in Case Studies of Evaluability Assessment From 2008 to 2018.
Identifying and engaging stakeholders is becoming an important aspect of evaluability assessment. Henson (2018) suggested: “Researchers must be transparent with stakeholders about their data collection and analytic processes and establish a collaborative and constant feedback loop that develops a shared language between researchers and stakeholders” (p. 3197); the author added that conducting an evaluability assessment before an outcome evaluation helps to create a healthy relationship with program stakeholders. Honeycutt and colleagues (2017) reflected: “Our initiative was built on lasting relationships with community organizations; it involved multiple stakeholder meetings and onsite visits to learn about the programs” (p. 460). They also emphasized the importance of involving the organization’s leadership, program staff, and partners in assessing readiness for evaluation, identifying evaluation questions, and disseminating results.
According to many frameworks, steps in an evaluability assessment are not linear but iterative, requiring the evaluability assessment team to engage with each step in a reflective way and, where necessary, repeat steps to ensure ongoing development of the program’s design, implementation, and evaluation. Adaptation and flexibility are important features; if, for instance, the program model differs from what is occurring in practice, the actions are unlikely to lead to the intended outcomes, or resources are insufficient to carry out activities, the evaluability team will need to decide whether improvements to the program are necessary before conducting an evaluation (Smith, 1989; Wholey, 2015). Changes may include reconsidering intended outcomes that appear unrealistic, focusing efforts on activities to support outcomes, allocating resources to efforts, and so on. Finally, commitment to transparency in all aspects of the evaluability study is needed; the decisions made, lessons learned, and the context in which the evaluability assessment was carried out should be well-documented.
Reported Challenges and Limitations
The most frequent challenge reported in the studies was the political nature of deciding whether a program is ready for evaluation, leading to tensions between the stakeholders involved. As explained by Laperrière and colleagues (2012): “Programming, evaluation, and evaluability assessment are grounded in social and institutional authority structures within particular socio-political systems that inevitably influence the actors involved and their practices” (p. 256). In the context of development cooperation, Holvoet and colleagues (2018) experienced resistance toward the evaluability assessment due to confusion around the notion of evaluability, the exact purpose of the study, and the implications of a poor evaluability score. To alleviate resistance, the authors provided an additional explanatory note on the context of evaluability and adjusted the initial study aim toward a more formative use of the study. Similarly, Gilchrist (2014) expressed that stakeholders may be intimidated by the concept of evaluability and emphasized the importance of communicating the purpose and focus of the study. Another potential limitation of the evaluability assessment is that stakeholders “may not feel comfortable saying anything negative about the program” (Fitzpatrick, 2015, p. 93), highlighting the importance of relationship- and trust-building.
Challenges were reported in determining the required level of depth for an evaluability assessment, understanding what should be included in an evaluability assessment without ending up doing an evaluation, and drawing the line between evaluability and evaluation (D’Ostie-Racine et al., 2013; Holvoet et al., 2018). Belford and colleagues (2017) offered that an evaluability assessment is just one step in the evaluation process and that a full evaluation needs to “build on the evaluability assessment and continue the iterative process of consulting stakeholders and refining the program theory” (p. 9). Other articles have crossed the boundary between evaluability assessment and evaluation; for example, Losby and colleagues (2015) combined merits of evaluability assessment, SSA, and effectiveness evaluation to identify and evaluate promising public health efforts. And in the context of a knowledge translation strategy, Dagenais and colleagues (2017) found that: “The present evaluability assessment has established it is possible to evaluate the [knowledge translation] strategy; however, we have also concluded that a more detailed evaluation is not needed to obtain a clear picture of the outcomes of this strategy” (p. 57).
Discussion
Due to growing interest in evaluability assessment, it is important to understand and learn from how the approach is being practiced. We systematically reviewed the evaluation literature and found 59 relevant studies published from 2008 to 2018 (amounting to around five studies per year). This number is more than double the 23 studies identified by Trevisan (2007) over the prior 20-year period (1986–2006), suggesting an overall growth in publications on evaluability assessments. However, the evidence base remains limited; indeed, a systematic review of published research on evaluation methodologies found only 20% of evaluations focused on evaluability (Galport & Galport, 2015). Several reasons may help to explain the sparse use of, and reporting on, evaluability assessments, including the lack of a clear evaluability assessment methodology (Trevisan, 2007), difficulties in distinguishing between evaluability and other evaluation activities (Holvoet et al., 2018), and the fact that many assessments go unreported (Smith, 2005).
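The comparison with Trevisan (2007) can be checked with simple arithmetic. The sketch below uses only the counts and periods stated in the text; treating 2008 to 2018 as 11 publication years and the earlier period as 20 years are assumptions about how the intervals are counted, and the variable names are illustrative.

```python
# Counts and periods as reported in the text.
current_studies = 59
current_years = 2018 - 2008 + 1  # 2008 to 2018 inclusive, assumed to be 11 publication years
prior_studies = 23               # identified by Trevisan (2007)
prior_years = 20                 # described in the text as a 20-year period

current_rate = current_studies / current_years
prior_rate = prior_studies / prior_years
count_ratio = current_studies / prior_studies

print(f"{current_rate:.1f} studies/year vs {prior_rate:.2f} studies/year")
print(f"count ratio: {count_ratio:.2f}")
```

Because the two reviews did not necessarily use the same search strategy, the rates are best read as a rough indication of growth rather than a like-for-like comparison.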
Evaluability assessments first became popular in the 1980s in the health field, after they were promoted by Joseph Wholey and colleagues at the Urban Institute in the United States (Wholey, 1979). In our review, we found that published evaluability assessments are increasingly applied in fields outside of health to include social sciences, criminal justice, and food security. Moreover, such assessments are being used beyond the North American context to include Africa, Europe, and South America, with scholars increasingly highlighting the utility of evaluability assessments in limited resource contexts (Akintobi et al., 2012; Salvatierra da Silva et al., 2016). These findings suggest that the scope of evaluability assessments has evolved in terms of disciplinary and geographic reach; however, more case studies of evaluability assessment in other disciplines and geographies are important for demonstrating its applicability, utility, and generalizability. Perhaps more cost-effective than conducting a formal evaluability assessment is to integrate the tenets of evaluability assessment into program monitoring (Peersman et al., 2015). Scholars working in the North American context can also make their work more useful by giving more attention to the contextual variables that affect their assessments; understanding the conditions that facilitate evaluability assessment processes supports those working to extend the approach to understudied locations or disciplines.
Few studies considered social equity in the evaluability assessment. In cases where considerations were made, only gender- and socially differentiated data on participants were presented, without reporting on differentiated perspectives on the evaluability of the program. We see this as an important gap in evaluability assessments. Inadequate gender and social equity analysis risks leaving unexplored the evidence of, and reasons for, differential outcomes, thus undermining evaluation quality and social justice agendas. Like United Nations Women (2015), we believe: “All evaluability assessments should examine if human rights and gender equality are integrated into an intervention, regardless of whether or not the intervention is targeting these issues” (p. 124). This means asking whether gender- and socially differentiated data are available, whether attention was given to social equity in the program theory, and whether the context (e.g., social, political, cultural) in which the program is situated is conducive to the advancement of social equity (UN Women, 2015; UNEG, 2014).
Evaluability assessment typically involves a qualitative assessment of program readiness for evaluation. Among the 26 studies that included this objective, 8 studies deemed the program not ready for an evaluation. Inadequate evaluation readiness was generally related to challenges in the areas of program theory, evaluation planning, and data availability. As such, investing in an evaluability assessment provides an opportunity for reflecting, learning, and adapting before conducting a full-scale, time- and resource-intensive evaluation. Indeed, the scope of evaluability assessment purposes includes not only assessing evaluation readiness but also identifying promising interventions, designing an evaluation plan, and developing a program. Generally speaking, these four purposes suggest two different ways of thinking about an evaluability assessment: The first two suggest that evaluability assessment might be perceived as a one-time assessment, the ultimate aim of which is to determine whether the program proceeds to an evaluation. The second two suggest that evaluability assessment might be conceived as an ongoing process of improving a program’s design, implementation, and evaluation. The latter, more nuanced aim of improving programs and evaluations is particularly important for putting in place the conditions for a sound evaluation and may or may not lead to a full evaluation.
Several indicators point to improved quality of evaluability assessments. For example, 79% of studies reported developing the program theory compared to around 50% of studies reviewed in Trevisan (2007). Of note, the SSA approach does not require the development of program theories; indeed, none of the nine studies applying the SSA approach reported developing a program theory. It is important for studies applying SSA not only to assess whether programs have a clear theory/logic model but also to develop one; doing so will contribute to middle-range theories that inform strategies for implementing programs in different contexts (Douthwaite & Hoffecker, 2017). Like other scholars (Davies & Payne, 2015; Holvoet et al., 2018; Watts, 2016), we found a plurality of methodological frameworks for evaluability assessment, with considerable overlap between the components of frameworks. Trevisan (2007) recommended that evaluability assessment studies “should be clear about the [evaluability assessment] model or framework and deliberately connect data collected with the steps in the model” (p. 300). This recommendation remains relevant, as we found that most studies were explicit about the evaluability framework used (75%) but often did not connect the data collected with the different steps in the framework. That some studies did not specify an evaluability assessment model or the steps used might suggest a flexible application of the approach, which is considered desirable for supporting adaptation to different contexts. Yet, continued reporting and reflecting on the models and steps applied is also important for understanding which aspects of the approach are used and useful.
Our review identified challenges and limitations to assessing evaluability that were not fully addressed in previous reviews. These include the politics of evaluability and blurred lines between evaluability and evaluation. The politics of evaluability is perhaps linked to the common definition of evaluability as the “extent to which an activity or a program can be evaluated in a reliable and credible fashion” (Organisation for Economic Co-operation and Development, 2010, p. 21). The reported challenges and limitations identified in our review suggest this definition might not be appropriate given the tension inherent in determining evaluability as well as confusion with the concept of evaluability. According to Wholey (2015): “Evaluability assessment answers the question of whether a program is ready for useful evaluation, not whether the program can be evaluated (any program can be evaluated)” (emphasis added, p. 90). A growing number of studies are conducting evaluability assessments primarily for formative purposes. Trevisan (2007) also found that formative purposes are common secondary objectives in evaluability assessments. These findings suggest that evaluability assessment might be better accepted if it is conceptualized, described, and applied as an activity to prepare programs for evaluation or plan for evaluation rather than to determine evaluability. Evaluators should also consider the political context for the evaluation and examine how political factors may affect use. Utilization-focused evaluation acknowledges that evaluation use is people- and context-dependent and offers a wide range of methods and processes for working with intended users to determine priority uses with attention to political considerations (Patton, 2003).
Consistent with findings from Trevisan (2007), we find a lack of clarity between evaluability and evaluation. Based on our review, evaluability assessment is considered distinct from evaluation. Trevisan highlighted the importance of emphasizing that evaluability assessment is very different from other formative research strategies (e.g., formative evaluation, process evaluation, needs assessment). We echo this recommendation and encourage evaluability assessment case studies and guidance documents on evaluability assessment to clarify the boundaries of evaluability assessment. While evaluability assessment can serve formative purposes, evaluability assessment is not evaluation; it does not, for example, serve outcome, impact, or summative evaluation purposes (Walser, 2015). It is important for stakeholders to understand this when initiating an evaluability assessment and disseminating evaluability assessment findings.
An important limitation of this review is that studies were analyzed based only on the information they presented. We note that studies themselves may not fully report on the evaluability process. To illustrate, Louw et al. (2008) state in regard to their methods: “Space does not allow for an examination of all these steps in this paper” (p. 43). As such, the authors “concentrated on the outcome evaluation component because we believe there are some interesting procedures and findings to report” (p. 43). Yet reporting on the evaluability assessment process and lessons learned is important for strengthening the body of literature on practical applications of evaluability assessment. Given that evaluability assessments tend not to be published, this review is also unable to comment on the extent of evaluability assessments conducted. To capture unpublished evaluability assessments, one reviewer for this article suggested interviewing funders and then interviewing the evaluators who conducted the evaluability assessments to further understand evaluability challenges and opportunities. Nevertheless, our review represents a proxy indicator of evaluability assessments and identifies some important challenges and limitations that, if addressed, have the potential to advance the use and usefulness of evaluability assessments.
Conclusion
This article explores how case studies of evaluability assessment have been conducted over the last decade (2008–2018), motivated by the increase in attention toward the evaluability process. When viewed as a whole, our findings reveal the variety of ways in which the evaluation community is reporting on evaluability assessments. We find, first, that the evidence base of evaluability assessment practice is likely small but growing in terms of geography and sector. Second, considerations of social equity in evaluability assessments are not yet adequate, presenting opportunities for further engagement in these areas. Finally, while a plurality of frameworks is available to guide evaluability assessments, there exist challenges and limitations relating to the politics of evaluability and how evaluability differs from evaluation. To address these shortcomings, we offer six recommendations to complement existing frameworks and guide researchers, evaluators, and partners in implementing evaluability assessments.
Plan stakeholder engagement. Frameworks (e.g., Thurston & Potvin, 2003; Wholey, 2015) and case studies (e.g., Henson, 2018; Honeycutt et al., 2017) highlight the importance of identifying which evaluation users will be involved and at which points. The implications of leaving out evaluation users or engaging them in a less participatory manner should also be discussed (Thurston & Potvin, 2003). Stakeholder involvement in an evaluability assessment might include stakeholders not only as data sources but also as participants in the research process (Walser, 2015). It is also important to consider why, in what role, and by what forms of engagement stakeholders will be involved.
Discuss the evaluability study with stakeholders. Evaluation users can have varying understandings, expectations, and concerns regarding the study (D’Ostie-Racine et al., 2013; Holvoet et al., 2018). Through early discussions surrounding the purposes, processes, strengths, and limitations of the evaluability study, evaluation users develop a shared understanding of the study. Such discussions also facilitate trust, relationship building, and participation.
Establish a shared understanding of the program. Before assessing the program, it is important to develop a shared understanding of it. Developing and documenting the program theory (e.g., via a Theory of Change process) can facilitate this shared understanding (Leviton et al., 2010; Smith, 1989; Thurston & Potvin, 2003; Wholey, 2015). This process requires engaging stakeholders (as appropriate) in reflecting on the program and on whether the program theory matches what happens in practice.
Assess plausibility of the program. Plausibility is concerned with whether a program is expected to achieve its intended outcomes in theory (Rog, 2010; Wholey, 2015). Plausibility asks questions such as: to what extent does a logical flow exist between activities and outcomes? Is the level of activities commensurate with expected outcomes? Is the time frame realistic? Evidence to support such assessments may include empirical studies from the published literature, experiences of stakeholders, and social science theory.
Assess practicality of the program. Practicality refers to whether outcomes are possible given the context and appropriateness of the project design (Davies, 2013). It asks questions such as: Is the context (e.g., social, political, environmental) conducive to programming efforts? Are adequate resources (e.g., time, funds, competencies) available to support the achievement of intended outcomes?
Consider cross-cutting themes of the program. Cross-cutting themes are important additional areas that intersect with the program, such as gender equity, environmental sustainability, climate change, and capacity development (United Nations Women, 2015; United Nations Development Programme, 2010; United Nations, 2015). The assessment might consider these aspects by asking whether gender- and socially differentiated data are available, whether attention was given to social equity in the program theory, and whether the context (e.g., social, political, cultural) in which the program is situated is conducive to the advancement of social equity.
As emphasized by UNEG: “An assessment of evaluability should be undertaken as an initial step to increase the likelihood that an evaluation will provide timely and credible information for decision-making” (p. 22). The growing interest in evaluability assessments makes it crucial to research and apply frameworks for quality assurance of evaluability assessments. Moreover, future research exploring the perspectives of evaluators, funders, and evaluation commissioners on evaluability assessments would further illuminate the barriers and facilitators to their use. Evaluability assessment can be a critical first step in preparing future evaluations and preparing us to address key issues of our time. Evaluability assessments deserve wider application, particularly in contexts of early evaluation planning and program development.
Supplemental Material
Supplemental Material, sj-docx-1-aje-10.1177_1098214020936769 for The Use of Evaluability Assessments in Improving Future Evaluations: A Scoping Review of 10 Years of Literature (2008–2018) by Steven Lam and Kelly Skinner in American Journal of Evaluation
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding contributions were provided by the Graduate Student Excellence Entrance Scholarship (to SL) and the Ontario Veterinary College PhD Scholarship (to SL).