Abstract
The evaluation models described in the literature may be interpreted as prescriptive and uniform approaches to practice but, in the real world, practitioners are likely to blend aspects of different models to achieve multiple goals. Despite the commonality of pluralistic approaches in evaluation practice, literature on theoretical integration is sparse. This article guides readers through a theoretically integrative evaluation design process and explicates how different theories informed design decisions. The process integrates program theory–driven and utilization-focused evaluation with evaluability assessment and eclectically draws on principles, methods, and tools from other models. This integrative approach to evaluation aims to increase process use for intended users through shared decision-making, organizational learning, and capacity building while simultaneously producing a robust and relevant evaluation design suited to stakeholder needs and the evaluation context. The authors describe the process utilizing a case example to contribute to the literature on theoretically integrative evaluation practice.
Introduction
This article outlines a theoretically integrative and staged evaluation design process grounded in program theory-driven and utilization-focused evaluation, and situated within a broader evaluability assessment framework. Aligning with a critical realist perspective, the approach also draws on principles, methods, and tools derived from seemingly discordant post-positivist and constructivist paradigms to give voice and increase process use for intended users through collaborative evaluative thinking, shared decision-making, and knowledge generation for organizational learning. The approach simultaneously aims to produce a robust and relevant evaluation design that enables empirical assessment of the program theory as part of outcome-focused evaluation. We offer the process to readers as one approach to evaluation design that we have found useful in achieving the abovementioned aims and describe the process as used with a case example to contribute to the scarce literature on theoretically integrative evaluation practice.
The evaluation theory to practice gap
The evaluation theory to practice gap has been a long-standing concern of evaluation theorists. For some time, methods-driven studies and practice examples that only loosely linked to evaluation theory dominated the field. This prompted calls for more examples of how to enact evaluation theory in the real world (Christie, 2003; Donaldson, 2007; Gargani, 2013; Shadish, 1998; Weiss, 1997). Schwandt (2014) argues that the ill-structured and imprecise nature of practical evaluation problems and the way in which they demand a response in situ makes the application of theory in practice a complicated endeavor. Evaluation theory refers to a wide range of “concepts, insights, explanations and tools that professional practitioners can use as heuristics, tools to ‘think with’” (Schwandt, 2014: 234). It is derived from bodies of knowledge on approaches to framing, implementing, and using evaluations, as well as program theory; political, philosophical, and social psychological theory; and methodological theory that inform different evaluation approaches. Thus, the breadth and looseness of what we conceptualize as evaluation theory further muddies the practice waters. The many evaluation models advanced by theorists can also obfuscate the complexities associated with applying theory in practice; models may be construed as uniform prescriptions for evaluation practice from which practitioners should choose on the basis of purpose, context, and values. In reality, few evaluators appear to practice in accordance with one particular theory or approach (Christie, 2003). Instead, many are theoretically eclectic (Bledsoe and Graham, 2005; Christie, 2003; Gargani, 2013), using what they feel are the most suitable “think tools” drawn from multiple theories to respond to the situation at hand.
Theoretical pluralism in evaluation practice
The observation that using multiple evaluation approaches is common practice but not oft discussed in the evaluation literature prompted Bledsoe and Graham (2005) to publish a rare example of a case that integrated empowerment, theory-driven, consumer-oriented, and inclusive evaluation approaches. The authors argued that their pluralistic approach enabled them to conduct an evaluation that was more useful and robust than it would have been had they focused on an approach arising from a single theory. Questioning whether “proper use” of an evaluation theory requires purist adherence, Gargani (2013) cautions that we have little evidence upon which to argue that theoretical “mash ups” (Gargani, 2013: 83) are problematic in evaluation practice and suggests that the field could benefit from learning more about practitioner experiences of using pluralistic approaches. Although not without its critics, the popularity of theoretically integrative and eclectic practice in psychotherapy has stimulated a rich body of research over the past 40 years to inform psychotherapeutic practice (Norcross, 2005; Norcross et al., 2005). Given its ubiquity in evaluation practice, it is time for greater research attention to theoretically integrative and eclectic approaches in our transdisciplinary field. We aim to contribute to this literature by outlining an example of our own theoretically integrative evaluation practice.
This article begins with a discussion of the theoretical foundations of our evaluation design approach, including the rationale for theoretical integration. We then describe each stage of the process, which is structured as staged workshops. We outline the aims and activities as well as the principles, methods, and tools used from different evaluation models as they relate to each stage. To provide a concrete understanding of our approach, we weave in a description of its application using an illustrative case. However, as our focus in this article is on the evaluation design process, we omit details of the case study outputs and findings. We conclude with consideration of the value and limitations of using this integrative and eclectic approach.
The theoretical foundations of our integrative approach
Within the contemporary field of psychotherapy, theoretical integration is distinguished from eclecticism. The integrationist stays cognizant of the theoretical origins of the strategies used and purposefully blends different theoretical approaches, whereas the eclectic therapist is more pragmatic in orientation and uses a combination of techniques or tools to address client needs without taking into account the theoretical perspectives from which they are derived. Nevertheless, both integrative and eclectic therapists argue that strict adherence to a single approach has limitations. Clients are better served when therapists draw from different schools of thought to provide support that more adequately addresses client needs (Norcross, 2005).
The approach we present here is both integrative and eclectic as we argue that evaluation clients are better served through a combination of theories, techniques, and tools. In terms of integration, the lead evaluator (first author) designed the process predominantly as an integration of program theory-driven (Donaldson, 2003, 2007) and utilization-focused evaluation (Patton, 2012) because she was interested in using program theory to provide evaluative direction and in enhancing use by intended users. Because our process and evaluability assessment share a focus on evaluation planning, in which program theory development is a key feature (Thurston and Potvin, 2003; Trevisan and Walser, 2015; Wholey, 1987, 2010), we also situate the process within a broader evaluability assessment framework. As all three approaches emphasize stakeholder involvement, evaluation questions derived from stakeholder input, and method neutrality, we found that they dovetail easily and provide a coherent integrated approach. The eclectic aspects of our approach stem from our use of principles, tools, and techniques from perspectives not central to our own theoretical orientations and without great attention to their theoretical origins. We elaborate on each of these facets below.
The program theory–driven evaluation thread
As the essential feature of theory-driven evaluation, program theory development enables examination of how programs produce outcomes (Astbury and Leeuw, 2010; Donaldson, 2007; Rogers, 2007; Weiss, 1997). Often, outcome evaluations are designed to assess stated objectives (commonly derived from funder interests) that are unrealistic or unlikely to occur until the distant future. Using program theory to guide relevant and realistic evaluation questions and associated measures can improve the sensitivity of the evaluation design and reduce the chance of obtaining null effects due to failure to account for other extraneous sources of variance (Cook, 2000; Donaldson, 2003, 2007). Moreover, by attempting to unearth and empirically investigate common mechanisms and moderators of change in social programming, theory-driven approaches can contribute to cumulative knowledge development and thereby enhance use through a broader enlightenment function (Weiss, 1998). In addition to producing relevant and sensitive evaluation designs for our clients, as academic evaluators, we are interested in investigating the hows, whys, and whens of program effectiveness in a manner that can inform practice beyond a single localized program. These interests are inherently aligned with a theory-driven approach. Accordingly, many of the techniques and tools incorporated throughout the process come from a program theory–driven evaluation tradition (see Table 1).
Table 1. Overview of workshop stages illustrating a theoretically eclectic approach.
In agreement with criticisms that the simplistic linear accounts of inputs, activities, outputs, and outcomes often seen in logic modeling do little with respect to providing an explanatory account of how programs produce change, the process presented in this article incorporates the theory of change approach advanced by Weiss (1997) and Rogers (2007). Their approach captures aspects of programmatic and implementation theory. The programmatic theory specifies the program mechanisms presumed to drive change (i.e. mediators). The implementation theory focuses on whether the program was implemented as intended and thus includes moderators that influence implementation fidelity (Weiss, 1997). In contrast to theory-driven approaches that are initially derived from existing social science theory and have been criticized for poor feasibility and relevance to grassroots program development, our approach aligns strongly with Donaldson’s (2003, 2007) program theory-driven evaluation science. In his approach, stakeholder views of how the program produces change are prioritized and other sources of information, such as social science theory and research, can contribute to assessing the plausibility of the proposed theory. A refined program theory helps with the identification and prioritization of relevant questions to be answered through robust, systematic research, whether quantitative, qualitative, or mixed (Donaldson, 2007).
The utilization-focused evaluation thread
While utilization-focused evaluation often involves program theory development, its distinguishing feature is its focus on enhancing use by intended users at all stages of the process (Patton, 2012). Evaluation use is increased when those who have the “personal factor” (Patton, 2012: 62; that is, a genuine personal interest in the process and the findings) are involved. Accordingly, these individuals become the primary intended users who help focus and design the evaluation in a manner that is situationally responsive and addresses relevant information needs. Once these users are identified, utilization-focused evaluators work with them to better understand the evaluation context, determine priorities, focus evaluation questions, design a theory of change, decide on appropriate methods, interpret findings, and develop accessible dissemination strategies and outputs. In addition to increasing instrumental use, a corollary of this level of intended user involvement is process use: the learning gained from being engaged in the evaluation process and the changes that occur in individuals, programs, and organizations as a result (Patton, 2012).
The strong utilization-focused flavor of our approach stemmed from drivers within our evaluation context. There is a critical need for evaluation capacity building within the social and community sector in New Zealand; yet, managers, program developers, and practitioners who are working in these sectors are significantly over-stretched with respect to time and resources (Bullen, Deane, Meissel and Bhatnagar, 2019). At the same time, these individuals recognize the importance of evaluation and are interested in developing their understanding of effective evaluation practice; thus, they have the personal factor. They are also the connectors to other upstream and downstream stakeholders, but our experience working in this space indicates that their limited time and resources substantially restrict their ability to assist us with involving a diverse and representative range of stakeholders in the evaluation process, as would be expected with a deliberative democratic evaluation approach, for instance (see House and Howe, 2000).
Being situationally responsive in our context has therefore led us to focus our attention on a small group of intended users, typically consisting of senior organizational leaders, program developers, practitioners, and, if possible, direct beneficiaries. We also focus on “intelligent involvement” (Davidson and Chianca, 2016), bringing them in when and as needed based on their expertise. We do so in a way intended to build process use, in accordance with a utilization-focused approach (Patton, 2012).
Evaluability assessment as the broad framework
Evaluability assessment shares a great deal with theory-driven and utilization-focused evaluation. Program logic or theory development is a central feature as is the involvement of stakeholders to increase the evaluation’s utility. What distinguishes evaluability assessment from other approaches is its emphasis on evaluation planning. Through careful planning with stakeholders, potential problems are diagnosed in advance to prevent misuse of time and resources in future evaluative activities (Thurston and Potvin, 2003; Trevisan and Walser, 2015; Wholey, 1987). Wholey (1987) explains that an evaluation’s feasibility and utility are undermined when the program is poorly defined, there is a lack of testable assumptions regarding program theory links, stakeholders have not agreed on evaluation priorities or intended uses, or there is resistance to act on evaluation information. Evaluability assessment addresses these issues directly by involving stakeholders in program theory development, reviewing its implementation, establishing priorities, and agreeing on how to proceed in the face of the information gathered through the process. By guiding evaluative decisions and future directions, evaluability assessment increases an evaluation’s relevance and feasibility, and, therefore, the likelihood that evaluation findings will be used for program improvement (Wholey, 1987).
Although evaluability assessment has previously been framed as a pre-evaluation activity geared to determine if a program is ready for outcome measurement or testing of program theory links (Wholey, 1987), contemporary evaluators argue that evaluability assessment has earned its rightful place as an evaluation approach in and of itself. As a feedback process designed to immediately inform program improvement, evaluability assessment serves an important formative evaluation function and has evolved into an approach that can be usefully applied at all program stages (Thurston and Potvin, 2003; Trevisan and Walser, 2015).
We situate the process described in this article within an overarching evaluability assessment framework because, as a design process, it is inherently about planning, and through the design process we surface tensions, discrepancies, and anomalies that could impact feasibility and utility. This process stops short of theory testing and instead provides a platform from which further program development or evaluation can proceed. As with evaluability assessment, decisions about how to proceed follow interrogation of the program context, theory of change, supporting theoretical or empirical research, and methodological considerations with intended users, during which potential problems can be diagnosed.
The eclectic aspects
As signaled earlier, we have also borrowed a number of principles, techniques, and tools from other approaches that we have found useful for managing interpersonal dynamics, increasing evaluative thinking, and improving the evaluation design. However, we have done so in a way that is disconnected from their evaluation models, making our approach eclectic as well as integrative. For instance, we incorporate strategies from Fetterman’s (2003) empowerment evaluation because he offers valuable strategies for facilitating group dynamics and thinking critically about evaluative judgments, but we have strayed from the overarching principles of empowerment evaluation. With empowerment evaluation, the evaluator takes the role of “critical friend” and facilitator that supports stakeholders to direct and implement all stages of the evaluation (Fetterman, 2003). We have not fully embraced empowerment evaluation as a model in part because of the evaluation context we operate in as detailed above. A resource-constrained climate can create a paradox with respect to the traditional approach to empowerment evaluation—extensive involvement can become disempowering because it does not align with stakeholder needs (Deane and Harré, 2016). In addition, critics of empowerment evaluation emphasize that relinquishing all control to stakeholders jeopardizes the credibility of the evaluative conclusions as these likely reflect self-interested biases (Stufflebeam, 1994); thus, in some stages of our approach, the evaluation team works independently.
Readers may also recognize the synergy of the program theory-driven evaluation science aspects of our approach with core tenets of realistic evaluation (Pawson and Tilley, 1997), given that a primary focus is to develop a program theory that provides an explanatory account of how a program works by uncovering the program mechanisms of change and surfacing the contextual conditions that influence the likelihood that the mechanisms will work as expected. We also incorporate a direct logic analysis which, like realistic evaluation (Pawson and Tilley, 1997), comes from a post-positivist perspective that does not fit well with constructivist methodology because scientific evidence is prioritized over the experiences of those involved in the program (Brousselle and Champagne, 2011). Our approach differs in this respect and eclectically draws on fourth-generation evaluation principles that are founded on constructivism. We argue for a need to value stakeholder experiences alongside existing established evidence and to carefully consider the reason for major discrepancies between the two when they arise. We do not presume it is possible to generate empirical evidence from a value-free position. In alignment with a constructivist approach, responsive focusing is central to our evaluation design approach. A foundational principle of fourth-generation evaluation (Guba and Lincoln, 1989), responsive focusing involves the evaluator orchestrating a collaborative process that enables deep consideration and negotiation of different stakeholders’ values and perspectives.
We agree with Preskill and Torres (2000) that evaluation approaches that incorporate collaborative reflection and dialogue are more likely to produce transformational organizational and individual learning (i.e. process use) than approaches in which collaborative processes are circumvented or eschewed. Our aim is to capitalize on the wide-ranging knowledge of the various individuals in attendance to produce a useful and credible evaluation design; thus, the workshop sessions in this process are interactive. However, we also endorse the need to consider the power dynamics that exist between stakeholders and to be transparent about the perspectives that are privileged and marginalized within the evaluation process. Accordingly, decisions on how to proceed in the face of strong divergence across perspectives (whether across stakeholders or with the evidence base) should be made in consultation with stakeholders but in consideration of the power dynamics at play, and grounded in a strong rationale. If bringing diverse stakeholder groups together for this purpose is seen to be too contentious or may exacerbate power differences across stakeholder groups, we suggest delivering parallel processes with the different groups, synthesizing the feedback independently, and disseminating the collective findings to all those involved for further consideration, as seen in fourth-generation evaluation (Guba and Lincoln, 1989).
Our epistemological position
The seemingly discordant and eclectic bridging of post-positivist with constructivist features reflects the position of the first author as a critical realist in designing this process. Critical realists assert that an objective reality exists outside human perceptions. While this reality can never be determined with full accuracy, we can approximate it by identifying common mechanisms and conditioning influences that produce patterned effects. Although acknowledged to fit within a post-positivist position, the critical realist also appreciates that there may be commonalities in the perceptions and experiences of this reality across individuals, but these are likely to differ vastly due to varied social positions. This gives rise to different socially constructed realities that are equally important to understand (Houston, 2010), particularly in the deeply political arena of evaluation practice. In this sense, collaborative interpretation of the evaluand only ever reflects a form of situated realism wherein aspects change according to the context and the players involved. We revisit the implications of this epistemological positioning after an explanation of each stage of the process as implemented with a case example.
Overview of the staged process with a case example
In addition to illustrating the theoretical integration and technical eclecticism of the process, Table 1 presents an overview of the workshop stages with the central guiding questions, session activities, and aims associated with each stage, which we describe in detail below in relation to the case example. In keeping with a responsive focus, we have designed the process to be staged and flexible. An initial consultation with the intended users should determine at which stage to begin to ensure best use of their time and to avoid potential replication of evaluative activities that have already been conducted. The stages can be delivered within a short but intensive period or stretched out over several months to accommodate needs and schedules.
The case described here involved a partnership with the Auckland-based Great Potentials Foundation to design an evaluation for their MATES Junior (Mentoring and Tutoring Education Scheme) mentoring program. MATES Junior targets middle school students (11–12 years of age) at risk of underachievement and disengagement from education at a critical period of school transition. The program matches students with university student mentors who provide support and guidance as they prepare to transition from middle to high school.
Stage 1: Prioritizing evaluation stakeholders and identifying their information needs
The purpose of Stage 1 is to make decisions that will allow the intended users to focus the scope of the evaluation and tailor it to their priority evaluation stakeholders’ information needs. On one hand, all evaluations are constrained by time and resources that force decisions about certain courses of action, which inevitably preclude addressing other important but less pressing issues. Different stakeholders may have different evaluation interests; therefore, prior to making these decisions, we need to assess the big picture of who (individuals or organizations) influences and who is influenced by the program and how a focus on one group’s priorities may detract from another stakeholder group’s needs. On the other hand, due to the interrelationships within the evaluation system, it is possible to focus priorities on questions that will address multiple stakeholder needs simultaneously if multiple stakeholder groups share similar evaluation questions and interests. Evaluators need an understanding of the evaluation context to facilitate well-informed decisions about how to focus the evaluation. Understanding this context with respect to stakeholder relationships and priorities is a feature of utilization-focused evaluation (Patton, 2012), evaluability assessment (Thurston and Potvin, 2003; Trevisan and Walser, 2015), and program theory development (Gugiu and Rodríguez-Campos, 2007). Organizational decision-makers and individuals closely connected to evaluation stakeholders are best placed to contribute at this stage.
In the case of MATES Junior, three senior leaders within the organization participated. The decision to restrict participation was largely due to time constraints and difficulty in finding a mutually convenient time for a large group. The first workshop stage consisted of two interactive sessions designed to identify some of the key design characteristics needed for a relevant evaluation of the program.
Session 1: Mapping and prioritizing evaluation stakeholder relationships
Session 1 focused on mapping evaluation stakeholder relationships. Creating a diagram of stakeholder relationships enables workshop participants to analyze existing relationships and make informed decisions about where to direct attention based on the importance of maintaining or strengthening some relationships over others, while also considering any potential effects these decisions may have on the other relationships. This also allows participants to make decisions about the scope and type of evaluation evidence needed based on the priority evaluation stakeholders’ information needs.
We first asked the participants to brainstorm a list of any stakeholders currently or potentially interested in an evaluation of MATES Junior. Next, they considered if stakeholder relationships were primary or peripheral, strong or weak, and positive or negative. We then created a stakeholder map depicting relationship categories and relative positioning by relationship strength.
Drawing on Fetterman’s (2003) strategies for conducting democratic, group-based decisions, once we created the stakeholder map, each participant voted on their priority evaluation stakeholders. Their votes clearly pointed to the desire for an outcome evaluation to satisfy the information needs of current and future program funders as a first priority.
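The mapping and voting steps described above could be recorded in a simple data structure. The following is only a minimal sketch under stated assumptions: the `Stakeholder` class, the `tally_votes` helper, the attribute labels, and all example data are hypothetical illustrations and are not drawn from the MATES Junior workshop records or from Fetterman's (2003) materials.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Stakeholder:
    """One entry on a (hypothetical) stakeholder map."""
    name: str
    category: str  # "primary" or "peripheral"
    strength: str  # "strong" or "weak"
    valence: str   # "positive" or "negative"


def tally_votes(ballots):
    """Count votes per stakeholder and rank from most to fewest votes."""
    counts = Counter(name for ballot in ballots for name in ballot)
    # Sort by descending vote count, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical example data for illustration only.
stakeholder_map = [
    Stakeholder("funders", "primary", "strong", "positive"),
    Stakeholder("schools", "primary", "strong", "positive"),
    Stakeholder("mentors", "peripheral", "weak", "positive"),
]

# Each participant's ballot lists the stakeholders they would prioritize.
ballots = [
    ["funders", "schools"],
    ["funders", "mentors"],
    ["funders", "schools"],
]

ranking = tally_votes(ballots)
for name, votes in ranking:
    print(f"{name}: {votes}")
```

A tally of this kind only summarizes the vote; the facilitated discussion about why participants prioritize particular stakeholders remains the substantive part of the session.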
Session 2: Identifying the evaluation audiences’ information needs
We then organized the evaluation stakeholder groups receiving votes by priority. The workshop participants then listed the evaluation questions thought to be of interest to each group, the type of evidence each group would find the most convincing (e.g. statistics, personal stories), and their preferred dissemination formats (e.g. written reports, photos). This allowed the participants to assess if a future evaluation would be able to meet the needs of multiple stakeholders simultaneously.
Stage 2: Collaborative development of the theory of change components
Stage 2 focuses on deconstructing the workshop participants’ tacit assumptions about how the program produces positive change, then collectively reconstructing an explicit program theory of change that incorporates their varied perspectives through discussion and debate. Those with intimate knowledge about how the program operates should be involved in Stage 2. To obtain a holistic picture of the program, the workshop participants should include individuals who hold diverse views of the program.
We identified potential participants for the Stage 2 workshop in consultation with two decision-makers from Great Potentials and the MATES Junior program who had participated in the Stage 1 workshop. The Stage 2 participants included three current staff members, two former mentors, and one individual who had been a teacher and principal at two of the MATES Junior middle school sites. Stage 2 consisted of six interactive sessions as outlined below.
Session 1: The program vision
This session is based on the first step of Fetterman’s (2003) empowerment evaluation process. Fetterman suggests that facilitating open discussion about the program vision with various stakeholders provides a refreshing look at the underlying unifying goal from different perspectives and aids consideration of divergent views. Having a clear understanding of the long-term goals, as a collective unit, is critical to achieving these goals. This session informs the development of the program theory because it provides the destination from which to map program processes and related outcomes. During the session, we prompted the participants to imagine and describe a future in which the program would no longer be needed and share their reflections as a group.
Session 2: The antecedent condition and the program participant profile
The vision provides one bookend to the program theory. To provide the other—the place from which the program started—the next session focuses on identifying the program’s antecedent condition. In their three-step approach to constructing program logic models, Renger and Titcomb (2002) argue that to develop sound program logic (or theory), stakeholders must begin by making explicit the underlying rationale of the problem the program is designed to address, the antecedent condition. Thus, in Session 2, we focused on elucidating what prompted the decision to develop MATES Junior. Collaboratively, the group developed an antecedent condition statement.
Although many programs have a stated target group, discussion about the antecedent condition helps to flesh out additional characteristics of the participant profile and facilitates reflection about whether the processes and goals of the program are connected to the right people and appropriate to their needs. After clarifying the antecedent condition, we prompted the participants to describe the typical MATES Junior mentee and to consider their descriptions in light of the stated target group, as communicated in program documentation, and the newly developed antecedent condition.
Session 3: Tracing the roots of the antecedent condition
Gugiu and Rodríguez-Campos (2007) and Renger and Titcomb (2002) suggest using a process tracing strategy that entails asking participants to more fully describe the program’s antecedent condition by continually asking why the situation came about, until a backward process outlining the causal factors that led to the antecedent condition is depicted. In Session 5, we revisit this picture to identify links between causal factors and program processes, further clarifying the program’s presumed scope of impact.
In Session 3, participants engaged in paired discussions about the factors they felt contribute to the antecedent condition. The participants provided a significant list of contributing factors focused on the personal characteristics of mentees but also named external influences they felt impact these characteristics.
Session 4: Identifying the critical processes and outcomes
To illuminate the purpose of the activities and their intended results, in Session 4, we asked participants to consider the critical program processes (i.e. the mechanisms that underpin program activities and drive the program toward desired outcomes). Working in pairs, participants identified the most important program processes and provided a justification for why each process was critical in relation to associated outcomes. The participant discussion highlighted the complexity of MATES Junior; multiple processes contributed to each outcome and each process fed into multiple outcomes. The session encouraged participants to think critically about the intentionality of the program activities.
Session 5: Making connections and establishing the program scope
In this session, we help to identify how critical program processes and outcomes are linked to the antecedent condition, and assess the level of consensus among participants regarding these aspects. A group discussion during this session resulted in agreement on a set of factors contributing to the antecedent condition, primarily focused on the individual young person, which participants felt MATES Junior directly impacts.
Session 6: Identifying key moderating influences and potential unintended effects
Moderating influences are variables that affect the relationship between an independent and a dependent variable (Baron and Kenny, 1986). In program evaluation, moderators are factors that influence relationships between program processes and outcomes. Identifying and accounting for these factors allows evaluators to establish a more nuanced understanding of program effects, and moderators are included in Weiss’ (1997) and Rogers’ (2007) theory of change approaches. In Session 6, participants consider factors that may obstruct or facilitate program success, including individual, programmatic, or external influences.
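As a minimal illustration of the statistical sense of moderation referred to above, moderation is commonly expressed as an interaction term in a regression model. The symbols here are our own illustrative assumptions rather than notation from the case: X stands for exposure to a program process, M for a candidate moderator, and Y for an outcome.

```latex
Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 (X \times M) + \varepsilon
```

Under this sketch, a non-zero interaction coefficient \beta_3 would suggest that the relationship between the program process and the outcome differs across levels of the moderator. It is one conventional formulation, not a specification of how moderators would be tested for MATES Junior.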
The final discussion of Stage 2 revolves around potential unintended program effects. One critique of theory-driven evaluation is that by focusing on what the program intends to do, stakeholders may lose sight of any unanticipated outcomes (Donaldson, 2003). Thus, we believe consideration of unanticipated effects, particularly any potential for harm, is integral to a robust and ethical theory-driven process. Because the theory of change represents program intentions, it is not necessary to incorporate unanticipated effects in the visual overview. However, we do encourage the inclusion of caveats pertaining to any potential iatrogenic effects in the accompanying summary so that attention can be directed to mitigating harm. Although this discussion was challenging for the MATES Junior participants, openly and honestly addressing the possibility of unintended harm toward a vulnerable population is essential.
Stage 3: The preliminary theory of change and logic analysis
Stage 3 consists of two parts: deeper analysis of the Stage 2 workshop data to synthesize a coherent theory of change and conducting the direct logic analysis to assess the surface legitimacy of the theory. Recognizing that evaluators possess expertise in data and literature synthesis, systematic analysis, and evaluative decision-making, the evaluator (or evaluators) moves into the role of expert and generally works independently in Stage 3. While it is possible for the evaluator to coach stakeholders to lead this stage in a manner that reduces biased decisions (as promoted in empowerment evaluation, Fetterman, 2003), as stated above, our experiences suggest that few stakeholders have the capacity to engage in this time-intensive process. Furthermore, evaluator independence at this stage is likely to reduce criticisms regarding the credibility of an entirely stakeholder-led process.
Part 1: Thematic analysis of stage 2 data and theory of change synthesis
The evaluator analyzes the data produced during the Stage 2 workshop more thoroughly and produces a coherent visual depiction of the theory of change, accompanied by a textual summary explaining each component and the interrelationships between components. The evaluator then feeds back the synthesized information to the workshop participants for member-checking and may make further modifications based on stakeholder feedback. This provides an opportunity for those who may not have been comfortable voicing their opinions within the collective conversation to offer their views. We advise stakeholders that the theory of change will never be perfect nor will it capture every important detail. Rather, it provides an overview of the core considerations for the program, and is a living document that should be revisited, if and when needed, as the evaluand evolves.
Following Braun and Clarke’s (2006) guidelines for thematic analysis, the second author analyzed the materials and audio-recorded Stage 2 discussions at a deeper level than was possible in the moment during the workshop. Analysis involved reading and re-reading session outputs, reviewing workshop notes, and listening to the recording of the workshop; labeling data excerpts with codes representing initial ideas; organizing codes in combinations that best presented themes in the data; and reviewing the themes across the full data corpus to ensure they accurately represented the data.
Focusing on interrelationships between the program theory components discussed during the workshop, she refined the program theory themes and organized them into a coherent theory of change that visually depicts the antecedent condition, participant profile, critical program processes and outcomes arising from these, as well as key moderators. After producing an explanatory summary that distilled the details of each theoretical component and the interrelationships between them to accompany the visual depiction, we disseminated the preliminary theory of change to the six workshop participants for verification regarding its representativeness of the session discussions.
Part 2: Direct logic analysis
Another critique of theory-driven evaluation is that stakeholder assumptions of the program theory are often false (Astbury and Leeuw, 2010; Brousselle and Champagne, 2011; Rogers, 2007). Thus, in Stage 3, the evaluator also engages in a verification process. Since it is generally not possible to empirically test all theoretical links concurrently or in a short amount of time (Weiss, 2000), we encourage initial verification of the program theory’s surface legitimacy. This is done by assessing alignment of the theory components with program documents, any previous research or evaluation reports, and if possible, observations of the program in action. The evaluator then conducts a direct logic analysis (Brousselle and Champagne, 2011).
Logic analysis evaluates the legitimacy of a stakeholder-derived program theory by assessing the proposed theoretical links against scientific evidence or expert opinion. Since stakeholder assumptions of how a program produces change may be misguided, Brousselle and Champagne (2011) argue that stakeholder assumptions should be validated using the relevant scientific evidence base before expending further resources on analysis of the program’s effects. A thorough review of existing evidence or consultation with experts has the added benefit of providing information on how the program may be improved. In contrast to a reverse logic analysis, which involves identifying a range of alternative intervention approaches to produce the desired outcomes, a direct logic analysis identifies whether the processes and conditions of the program as designed are likely to produce the intended effects.
In accordance with Donaldson’s (2007) program theory-driven evaluation science, in assessing the plausibility of the program theory, this process encourages drawing from a wide range of evidence—both qualitative and quantitative derived from various methods—from the relevant discipline(s) and any best practices for the field. Convergence across the theory of change information sources (the stakeholder-driven theory, program documents, existing evaluations, and the logic analysis findings) as well as any notable discrepancies should be recorded to prompt further critical reflection at Stage 4 of the process.
For this case, the logic analysis focused on whether each component of the program theory aligned with evidence on youth mentoring, and particularly school-based mentoring. Where evidence specific to youth mentoring was lacking, we drew on the positive youth development literature. Overall, the logic analysis demonstrated that the MATES Junior theory of change aligns strongly with current youth mentoring literature in that the model comprises well-established components supported by evidence. The logic analysis also highlighted gaps in the theory of change, including omission of some established best practices. In line with the problem diagnosis and decision-making focus of evaluability assessment (Thurston and Potvin, 2003; Trevisan and Walser, 2015; Wholey, 1987, 2010), we noted these gaps and raised them with participants during the next stage. However, the degree of coherence, clarity, and agreement across the participating stakeholders in Stage 2, combined with the logic analysis findings, led us to confirm that the theory of change was sound and that progression to an outcome evaluation (as desired by the priority evaluation stakeholders identified in Stage 1) would be reasonable.
Stage 4: Dissemination and planning for the future
At Stage 4, stakeholders come back together to discuss the information synthesized from the previous workshop stages. The degree of discrepancy across all sources should be the primary basis for making evaluability assessment conclusions but the evaluator should give stakeholders an opportunity to respond to their observations. Discrepancies highlighted through the logic analysis should be carefully considered by the participants before proceeding with decision-making about whether program theory testing or outcome evaluation is warranted. During this stage, the participants should also revisit the primary evaluation stakeholders’ information needs and any discrepancies between these, the theory of change, and the logic analysis results. For instance, we discuss the potential consequences of evaluating an outcome that is not relevant to the theory of change but is desired by a priority evaluation stakeholder group. The discussion is helpful with respect to identifying areas for further improvement.
If there is broad alignment across the theory of change data sources and the logic analysis suggests that the theory is sound, the evaluator makes a recommendation to progress to theory testing or another form of outcome-focused evaluation. In line with the second step of Donaldson’s (2007) program theory–driven evaluation science approach and with evaluator support, the intended users should prioritize evaluation questions and make methodological decisions based on the theory of change, the primary evaluation stakeholders’ information needs and preferences, and the practical constraints of the organizational environment. The third step of Donaldson’s (2007) approach—testing the validity of the hypothesized links by means of robust but feasible empirical research—is not included in the process but this step can be easily advanced with the established foundation.
Four MATES Junior stakeholders attended the Stage 4 workshop, all of whom had attended at least one previous workshop. Three of the four were program decision-makers. Stage 4 consisted of the four workshop sessions we describe below.
Session 1: Dissemination, debate, and theory of change amendments
We first presented the preliminary theory of change constructed from the Stage 2 workshop, accompanied by an outline of the logic analysis and highlighted points of convergence and divergence. We emphasized that the theory of change is a living document and limited in its representativeness of all stakeholder perspectives, thus should evolve as perceptions of the program change, or to incorporate missing perspectives. In this session, we wanted to draw attention to and discuss potential changes to the theory of change. We highlighted areas of best practice to consider for inclusion in the theory. Presenting this information enabled the participants to acknowledge that they incorporate most of these aspects in current program practice and that the omissions were inadvertent. The discussion also allowed the participants to consider which of the best practices should be explicitly referenced in the theory of change and which were important but need not be, and to request further changes that would help clarify the most salient features of the MATES Junior program. The participants suggested changes to increase the specificity of several outcomes and moderators. This session highlighted the value of an iterative and dialogic approach where such checks and balances can be included.
Session 2: Comparing Stage 1 and Stage 2 findings
We then prompted the participants to reflect on the findings from Stage 1 (i.e. the priority evaluation stakeholders’ information needs) and Stage 2 (i.e. the program theory as described by the intended users) to facilitate decisions about evaluation priorities and needs. Purposely omitting Stage 1 findings from the Stage 2 discussion helps minimize the influence of the needs of external stakeholders on the theory of change. Our discussion with MATES Junior participants in this Stage 4 session focused on the priority stakeholder groups identified in Stage 1 and their primary interest in academic outcomes supported by quantitative evidence, as this illuminated a discrepancy with the range of outcomes and the individualized nature of MATES Junior illustrated in the theory of change. This discussion highlighted that an evaluation geared to surface individual growth trajectories may be better suited to MATES Junior.
Session 3: Prioritizing outcomes, processes, and moderators
During this session and informed by the discussion in the previous session, we again drew on Fetterman’s (2003) group decision-making strategy and asked the participants to vote on the program theory components each felt should be prioritized for evaluation. They each marked the components they wanted to endorse, including components added during Session 1 of this stage.
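The endorsement vote described above can be tallied in many ways. The following is a minimal sketch only, assuming each participant marks a set of components and that a simple count is used to rank them. The participant labels, component names, and the `min_votes` threshold are hypothetical and are not drawn from the MATES Junior workshop or from Fetterman's (2003) strategy.

```python
from collections import Counter

# Hypothetical ballots: each participant marks the theory of change
# components they endorse. All names here are illustrative only.
ballots = {
    "participant_a": {"outcome: self-confidence", "process: goal setting"},
    "participant_b": {"outcome: self-confidence", "moderator: attendance"},
    "participant_c": {"outcome: self-confidence", "process: goal setting"},
    "participant_d": {"process: goal setting", "outcome: academic engagement"},
}


def tally_endorsements(ballots):
    """Count how many participants endorsed each component."""
    counts = Counter()
    for marked in ballots.values():
        # Each ballot is a set, so a participant counts once per component.
        counts.update(marked)
    return counts


def prioritize(ballots, min_votes):
    """Return (component, votes) pairs with at least min_votes endorsements.

    Results are ordered by votes (descending), then alphabetically.
    The min_votes cut-off is an assumption made for this sketch.
    """
    counts = tally_endorsements(ballots)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, votes) for name, votes in ranked if votes >= min_votes]


priorities = prioritize(ballots, min_votes=2)
```

In practice, the cut-off for what counts as a priority would more likely be settled through group discussion than by a fixed threshold. The tally simply makes the distribution of endorsements visible to support that discussion.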
Session 4: Specifying indicators and timeframes
The purpose of this session was to identify measurable indicators of the components endorsed through the voting process, because the priority stakeholders were interested in quantitative evidence. We asked the participants to describe what would show achievement of an outcome, or the effect of a program process or moderator, in a measurable way. We also asked participants at what point during or after the program they would expect measurable effects to show.
Process wrap-up: Identifying benchmarks for program success
To conclude the process, we encouraged these stakeholders to consider the benchmarks associated with different degrees of program success. As advocated by Davidson and Chianca (2016), for an evaluation to truly be evaluative, conclusions need to move beyond the “what” to the “so what” of evaluative judgments. As a useful next step for MATES Junior, we suggested developing evaluation rubrics (King et al., 2013) based on the theory of change and the priorities identified in Stage 4. We also used the discussion to raise concerns about particular design limitations (e.g. resources and capacity required to implement a robust experimental or quasi-experimental outcome evaluation). Time constraints prevented us from progressing decision-making beyond these general recommendations for next steps. A full report outlining the process and findings, including the outputs generated from each session, was later circulated to all stakeholders involved in the process.
Discussion
Addressing a gap in the evaluation literature, our aim was to describe a theoretically integrative evaluation design process. As illustrated above, this process integrates program theory-driven and utilization-focused evaluation within an evaluability assessment framework and eclectically draws on principles, methods, and tools from other evaluation models. Using a case example, we described how this integrative approach to evaluation unfolds across a series of staged interactive workshops. This is an approach that we have found works well to address our aims of designing an evaluation that is robust, responsive, and simultaneously increases process use for intended users.
Indeed, the feedback we obtained from our participating intended users emphasized that this process was valuable and illuminating. They valued hearing different stakeholders’ perspectives of the program and viewing the program with a more analytical lens. From an evaluator perspective, the process gave us a deeper understanding of the evaluand and the evaluation context, which was critical for facilitating the intended users’ evaluative decision-making. It also created a strong foundation from which to design a credible, relevant, and sensitive outcome evaluation.
Limitations
We must, however, acknowledge limitations associated with the case example we described here. Importantly, these are primarily to do with logistical constraints rather than limitations associated with the theoretically integrative and eclectic features of the process. For instance, time and resource constraints meant our involvement ended when we had only begun to make decisions about a future outcome evaluation. We are unsure as to the influence the process had on the client’s or other stakeholders’ ongoing work and thus we are unclear about its instrumental utility.
This article also only provides a single example of the many trajectories this process can take. For MATES Junior, the interest, as determined in Stage 1, was to focus on outcomes and quantitative evidence of program effects. Because the evaluability assessment did not raise any concerns with respect to progressing an outcome evaluation, we moved in this direction in Stage 4 with a focus on measurable indicators of change. Had the evaluation stakeholders’ interests and information needs been different and had the evaluability assessment revealed a need for further theory refinement or program re-design, we would have taken a different direction in Stage 4; one that could have been more developmental or qualitative in nature.
There are also several implementation challenges to consider. The hectic schedules and extensive workloads of some stakeholders mean that commitment to a participatory evaluation process remains a challenge. Despite the considerable effort given to accommodate stakeholders’ schedules, it is difficult to find a time suitable for a wide range of stakeholders to come together for any length of time for the purposes of evaluation. Thus, it pays to consider how the process could be further expedited to better accommodate different stakeholder groups. Examples that might expedite the process include circulating questions in advance to prime ideas and expedite the interactive process (as suggested by one of our participants) and ensuring stakeholders begin the process at the stage that best meets their needs (e.g. starting at Stage 2 if time and resources are restricted and the priority evaluation stakeholder and their information needs are clear).
The first author has struggled, at times, to convince stakeholders of the value of engaging in this process when they are time-constrained. Many express a need for a quick outcome evaluation to satisfy funder requirements. Pressuring stakeholders to engage in a participatory process risks the “empowerment evaluation paradox” (Deane and Harré, 2016), whereby our desire to authentically involve stakeholders can result in feelings of disempowerment if there is a lack of responsiveness to stakeholders’ other needs. At the same time, when done well, theory-driven evaluation and evaluability assessment can save costs and time, and lead to better evaluation design (Donaldson, 2003; Wholey, 2010). Consequently, other ethical tensions need to be navigated when evaluators are pressured to rush evaluation work.
It is also important to note that, because Stage 2 and 3 of the process formed the basis of a postgraduate research qualification for the second author (Dutton, 2014) and the first author was interested in obtaining feedback on the utility of the process from the MATES Junior participants, the workshops were offered at no cost to the organization. Organizational uptake is likely to differ when costs are associated.
We again draw attention to the need to consider who is and is not represented in the process. In the case of MATES Junior, the decision-making power was held by the participants who were invited to and could attend the workshops. The ultimate power was thus held by the client, who directed decisions about who was invited; therefore, the views represented in this case inherently leaned toward their values and perspectives. Pragmatic considerations of availability and timeframe largely drove their decisions and while four different stakeholder groups were represented at Stage 2, the views of important stakeholders with minority status were still obscured (e.g. the youth mentees and their family members, who are predominantly economically disadvantaged). This need not be the case and we encourage future users to ensure the inclusivity of diverse perspectives. Walls, Deane and O’Connor (2016) describe another case where a modification of this process was used to involve marginalized youth experiencing serious mental health challenges in evaluation co-design, including strategies used to facilitate a sense of safety and authentic engagement of the young people using arts-based methods.
Intentional inclusion of a broader representation of stakeholders, particularly those in minority positions, more closely mirrors fourth-generation (Guba and Lincoln, 1989) and deliberative-democratic (House and Howe, 2000) approaches to evaluation. We suggest that deeper integration of these theoretical perspectives would further enhance the process, if pragmatics allow. This brings us to considering the implications of grounding our approach in a critical realist epistemology. In our experience, there are few drawbacks to theoretical integration, particularly when approaches are synergistic with respect to the orienting research philosophy. However, we appreciate that there may be more consternation regarding the eclectic blending of features arising from polemic paradigmatic traditions for those who are not critical realists. In that case, we contend that the staged nature of the approach makes it easy to adapt to fit with other epistemological beliefs. As signaled earlier, one could guide stakeholders to drive the entire process including the analysis stages, as one would expect with an empowerment evaluation approach. Positivist aspects (e.g. the logic analysis) could also be omitted if taking a constructivist approach.
Conclusion
In outlining this evaluation design process with a case program and including our personal rationale for theoretical integration and eclecticism at each stage of the process, we endeavored to provide an example of integrative evaluation practice. We offer this account in response to recent recognition that, as a field, “we have a dire need to understand how theories can be deconstructed and combined most advantageously” (Gargani, 2013: 83). While some may not agree that approaches evolving from perspectives at opposing paradigmatic ends should be combined (e.g. Kushner, 2002), the complexity of evaluation practice necessitates flexible and agile responses that can meet multiple goals and address diverse needs. Having a broad theoretical knowledge base and methodological toolkit upon which to draw enables this. In line with our argument, the process outlined here supports process use by facilitating evaluative thinking and decision-making and, in this way, builds stakeholders’ evaluation capacity. Furthermore, the process produces a platform from which to advance relevant and robust theory testing and outcome evaluation. We thus offer the process to others who are interested in embracing a theoretically integrative approach to evaluation design.
Footnotes
Acknowledgements
The researchers express their deep thanks for, and acknowledgment of, the Great Potentials Foundation for their partnership in this research. In addition, the authors give thanks to the individual participants for their commitment and contributions to the process, and the three reviewers who provided useful feedback on an earlier version of this manuscript.
Declaration of conflicting interests
K.L.D. and P.B. have both been involved in supporting the Great Potentials Foundation and the MATES Junior program as academic teaching staff involved in the service-learning course that provides training for some MATES Junior mentors. H.D. was a student in the service-learning course and mentor in the MATES Junior program prior to engaging with this research project. However, the focus of this article is on the process and not on the findings of the evaluability assessment. The purpose of including MATES Junior as a case example is to illustrate the process.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by a University of Auckland Māori and Pacific Graduate Scholarship awarded to H.D.
