Including Diverse Stakeholder Voices in Youth Character Program Evaluation

Abstract

Although experts agree that including diverse stakeholders enhances the quality and equity of evaluation design and implementation, diverse voices are often omitted. Because such omission is particularly antithetical to the principles of youth character development, evaluations of these programs should strive to include voices from various social, economic, community, and demographic perspectives. One innovative national evaluation capacity building initiative, the Partnerships for Advancing Character Program Evaluation (PACE) project, paired practitioners from youth programs in community-based organizations with evaluation professionals to enhance stakeholders’ roles in evaluation. PACE promoted stakeholder identification and inclusion through group exercises, partnership work, and coaching sessions. Using a mixed methods design with interviews, retrospective pretest–posttest surveys, and observational data, we triangulated data addressing diverse stakeholders in the evaluation process, diverse perspectives on program performance, and the connection of diverse input to evaluation design. Postprogram findings indicate that participants included more varied and diverse stakeholder perspectives in all three areas. Implications for programs and evaluations are discussed.

Keywords

evaluation design, evaluation capacity building, evaluation use, stakeholder inclusion, diversity

According to the American Evaluation Association (AEA) Guiding Principles for Evaluators (2018), evaluators should promote equity and justice. Notably, the principles encourage evaluators to be wary of “exacerbating historic disadvantage or inequity” (AEA Guiding Principles for Evaluators, 2018). One strategy to integrate this approach is to include diverse stakeholders throughout the evaluation process. Experts agree that one critical element of high-quality evaluation is the inclusion of stakeholders at various points in the program evaluation process, including during evaluation planning, implementation, and utilization (e.g., Bryson et al., 2011). Engaging multiple perspectives from an array of representative stakeholders (Freeman, 1984; Greene, 1987) can benefit myriad aspects of evaluation, including program planning, evaluation design and implementation, buy-in, credibility, and use of results (Crane, 2018; Greene, 1987; Johnson et al., 2009; Jones & Wicks, 1999). Moreover, welcoming different stakeholder voices can illuminate underutilized perspectives from social, economic, demographic, and community lenses, including those that have been historically disadvantaged. Diverse stakeholders can surface the social and political systems within which programs operate as well as the cultural influences, norms, and values held by their organizational and individual allies (Bryson et al., 2011).

Despite the importance of including diverse stakeholder interests, needs, and priorities, diverse stakeholders are often excluded from the evaluation process, ultimately threatening both evaluation quality and the opportunity for equity. Moreover, exclusive approaches can privilege certain stakeholder perspectives and silence others; this can entrench the hierarchical power of the evaluator’s voice and may further disenfranchise communities or reinforce narratives of oppression. From a practical standpoint, failing to include diverse stakeholders can also lead to inaccuracies, insensitivities, and insufficient information for program knowledge and improvement (Bryson et al., 2011). Without linking stakeholder input to evaluation designs, investigations might not ask resonant evaluation questions, include the right participants, or apply the most appropriate methods (Archibald et al., 2018). Further, without linking stakeholder input to program performance, it may be unclear whether programs are meeting the needs or expectations of their constituents and community (Buckley et al., 2015).

For youth character development programs in particular, whose principles theoretically encourage youth voice, equity, and ethics, program evaluations should reflect such values. For example, because youth are the target population in these programs, welcoming youth as stakeholders can ensure that their satisfaction, needs, and concerns are considered (Urban, 2008); moreover, such inclusion may provide opportunities for leadership, empowerment, and other experiences connected to character exemplars. If the inclusion of diverse stakeholder perspectives is critical to program evaluation, especially for youth character programs, one might expect it to be commonplace. However, because the vast majority of these programs have not been formally evaluated (Roth et al., 1998), both their overall effectiveness and the extent to which their evaluations include diverse stakeholder voices remain unclear.

Rethinking Stakeholder Inclusion Through the PACE Project

One innovative national evaluation capacity building (ECB) initiative, the Partnerships for Advancing Character Program Evaluation (PACE) project, aimed to address these and other challenges in youth character program evaluation. The goal of the PACE project was to teach practitioners from character development programs to evaluate, improve, and potentially seek additional funding for their programs. PACE aimed to move beyond evaluation skill-building activities and equip program practitioners (PPs), the staff who run and implement programs serving youth, families, and communities, with the tools to address the more foundational conditions that drive evaluation work forward. Evaluators were matched with practitioners from youth-serving character programs across the United States, with the goal of fostering true partnerships between evaluators and character PPs. Over 15 months, PACE sought to build evaluation capacity among participants from youth programs in community-based organizations and their partnering evaluation professionals so that programs could examine their own effectiveness (see https://www.montclair.edu/ryte-institute/pace-project/). The PACE project addressed the evaluation process as a whole, covering all components of an evaluation from planning to utilization, but devoted considerable time to the often underattended early stage of evaluation: planning. The PACE project included two types of participants: PPs who worked at youth-serving character development programs and evaluation capacity builders (ECBers), evaluators dedicated to expanding evaluation capacity. The PACE project recruited 8 ECBers and 16 character development programs and their PP teams (one program lost funding and had to withdraw from PACE, leaving a final set of 15 programs).

To make authentic change in youth character program evaluation, particularly as it relates to stakeholder inclusion, voices from both evaluators and practitioners matter; accordingly, PACE project content was largely taught to PPs and ECBers together in the same sessions. ECBers and PPs participated in four types of activities: in-person workshops and the PACE culminating conference, webinars, evaluation partnership work, and consultation and coaching with a PACE lead facilitator. One element of PACE’s multifaceted approach was attention to stakeholders through group exercises, partnership work, and coaching sessions. Some of the project’s activities offered PPs strategies and practices to enhance the role and diversity of stakeholders in evaluation designs. At PACE’s conclusion, PPs had completed program evaluation profiles that included a program description, mission statement, program context, program assumptions, pathway model (theory of change), stakeholder map, life cycle analysis, evaluation purpose statement, and evaluation questions. A subset of 13 programs also completed a full evaluation plan, including an evaluation design, sample, measures, analysis plan, and timeline.

This article examines the inclusion of diverse stakeholders in evaluation practices in youth character programs. Evaluation findings from three subsets of data from the PACE project are shared, integrating qualitative and quantitative methods. The data address three different aspects of diverse stakeholder inclusion: stakeholder inclusion in the evaluation process, perspectives on program performance, and stakeholder input to the evaluation design. Specifically, inclusion in the evaluation process entails soliciting stakeholder input during evaluation planning, implementation, and utilization. Considering diverse perspectives on program performance refers to, for example, including stakeholder voices in addressing the extent to which the program demonstrates meaningful impact. Finally, linking diverse stakeholder input to evaluation design includes considering which aspects of the program’s theory of change are most important to key stakeholders. Evaluation often needs to be responsive to multiple stakeholders, such as program participants, staff, and funders, and these stakeholder groups often do not share the same needs or interests regarding all aspects of the program. Including stakeholder input in the evaluation design means deliberately ensuring that the design will satisfy multiple stakeholder needs and interests.

Using ECB to Promote Stakeholder Inclusion, Assess Program Performance, and Enhance Design

During the last two decades, a number of ECB definitions, models, and approaches have appeared in the evaluation literature. The most commonly cited definition of ECB is offered by Stockdill et al. (2002): “ECB is the intentional work to continuously create and sustain overall organizational processes that make quality evaluation and its uses routine” (p. 14). A more recent definition, based on a research synthesis of the ECB literature, describes ECB as “an intentional process to increase individual motivation, knowledge, and skills, and to enhance a group or organization’s ability to conduct or use evaluation” (Labin et al., 2012, p. 308). ECB practitioners and scholars agree that the ultimate purpose of ECB is to improve program outcomes (Labin, 2014; Nelson et al., 2019; Suarez-Balcazar & Taylor-Ritzler, 2014; Wandersman, 2014). In the PACE project, ECB was taught using elements of relational systems evaluation (RSE), including evaluative thinking and evolutionary evaluation. RSE is a theoretically grounded framework that situates programs within an evolutionary and ecological context and works through evaluator–practitioner partnerships to integrate diverse sources of expertise and build evaluation capacity (Urban et al., 2014). RSE is operationalized using the Systems Evaluation Protocol (SEP), which is designed to guide evaluators and program managers in the planning, implementation, and utilization of an evaluation of virtually any type of program or intervention (Trochim et al., 2012; Urban et al., 2014; Urban & Trochim, 2009). The SEP is complemented by the Netway, an online cyberinfrastructure that facilitates the development of SEP products (e.g., logic models, pathway models, stakeholder analyses).
Key elements of the SEP include stakeholder analysis that involves identifying stakeholders in the broader system within which a program is situated, completing knowledge flow diagrams that ask practitioners to articulate the flow of information to and from stakeholders, and identifying stakeholder priorities in the pathway model (theory of change). RSE also emphasizes evaluative thinking and the potential for organizational change that results in a culture of evaluation (Archibald et al., 2018; Buckley et al., 2015).

ECB can address each of the three areas of the study’s focus through different activities. First, ECB can promote the inclusion of diverse stakeholders in the evaluation process. One common aspect of ECB is determining the roles and perspectives of stakeholders in proposed evaluation plans. Evaluators cannot determine precisely what is and is not part of a program without taking the perspective of key stakeholders, which requires identifying stakeholders; their level of connection to the program; and their respective interests, values, and expectations of the program to be assessed (Patton, 1997). In PACE, including stakeholder perspectives was encouraged through activities such as stakeholder mapping and analysis, coaching, and input from a partnering ECBer. Second, ECB can demonstrate the benefit of considering stakeholder perspectives on program performance, for instance, by illustrating causal links between activities and potential changes in participant knowledge, attitudes, and/or behavior. When practitioners understand stakeholder perspectives on program need and impact, they can generate appropriate theories of change and assess targeted areas. The PACE project did this through structured lectures, hands-on modeling, and workshop activities, as well as real-time work between PPs and ECBers. Finally, ECB can support the application of stakeholder input to the evaluation design. During the PACE project, coaching sessions between PPs and ECBers, as well as coaching sessions with PPs, their ECBer, and the PACE lead facilitator, addressed the importance of, and methods for, linking stakeholder input to the evaluation design, ensuring that various stakeholder perspectives were central rather than peripheral to the evaluation work. Stakeholder input was a factor at every step of the process: the design of evaluation questions, the creation of the evaluation plan, and the selection of appropriate evaluation tools and methods.

Current Study

The current study examined three research questions:

Research Question 1: Did the PACE project change participant perspectives on the inclusion of diverse stakeholders in the evaluation process?

Research Question 2: Did the PACE project change participant perspectives on the inclusion of diverse stakeholders in assessing program performance?

Research Question 3: Did the PACE project change participant perspectives on the inclusion of diverse stakeholders in the evaluation design?

Method

Participants

The data in the current manuscript are a subset of the larger PACE study (see Chauveron et al., 2021), in which participants were recruited and provided consent. PACE participants were recruited through a national request for proposals process. PPs were selected based on the extent to which their program focused on character development, the need for ECB, the extent of support for evaluation from program leadership, and the extent to which their motivation for participation aligned with PACE goals. Evaluators were selected based on prior evaluation experience, knowledge of social science research methods, and interest in ECB. Ultimately, up to two staff members from each of 16 youth-serving character development programs, chosen from the 30 programs that applied from across the country, were selected to participate in PACE; these 31 PPs represented 12 states. Eight ECBers from seven states were also selected from 32 applicants. Two cohorts were created, each with eight programs, and each ECBer was partnered with one program from each cohort. Each of the two cohorts thus included one or two representatives from eight organizations, and the same eight ECBers participated in activities with both cohorts. Over the 15-month project, participants met twice for multiday sessions with their cohorts and once for a multiday event including all participants simultaneously. The first in-person meeting was a 3.5-day intensive workshop with content that addressed ECB, youth character frameworks, evaluative thinking, and stakeholder mapping and analysis. The stakeholder activities comprised approximately 8 hr of training, during which each program’s PP pair completed a program stakeholder map (see https://www.evaluationnetway.com for examples of stakeholder maps) that received input from ECBers and the PACE facilitators. Later, the map was linked to the creation of a program pathway model, a visual representation of how each organization’s program activities link to outcomes (Urban & Trochim, 2009).
PPs were partnered with an ECBer for the duration of the project; the ECBer helped them refine the pathway model based on feedback from their organization and from two to three coaching sessions with a PACE facilitator. During 6 hr of a second day-and-a-half training for each cohort and the ECBers (held in person for Cohort 1 and through an online meeting platform for Cohort 2), content addressed linking the pathway model to evaluation questions and design. PPs then met virtually with their ECBers at least quarterly, and often more frequently, for the duration of PACE to create an evaluation design for their organization. The PP/ECBer teams also met quarterly with one of the PACE lead facilitators. To conclude the program, all PPs and ECBers jointly participated in a 2-day culminating conference to share their work, network, and take part in a keynote talk and panel sessions given by fellow PACE participants.

All ECBers completed the program; however, seven of the original PPs were unable to complete the project due to staff turnover or budget cuts (job or program elimination) at their respective organizations. Three of those were replaced by new hires, each of whom completed the preinterview and presurvey before joining PACE. One PP who participated in the culminating conference was unavailable for posttest interviews or surveys. Demographics were not collected on applications in an effort to avoid bias; rather, they were collected later in the process, at which point data were available for 23 of the 34 participants (PPs and ECBers) who completed the PACE project and participated in both pre- and posttest data collection. Participants were primarily female (81.8% female, 18.2% male) and between 25 and 74 years of age, with most between 35 and 44 years. Participants identified as Hispanic or Latino/a/x (9.1%), Black or African American (13.6%), White (95.5%), and multiracial (9.1%; participants could select multiple options). Respondents had either a college (38.1%) or graduate (61.9%) degree, and most had worked for their organization for 1–3 years (30%) or 4–6 years (40%). An additional 30% had been at their organization for 7 or more years, and just 4.8% had been in the position for less than 1 year. Based on self-reports, one PP indicated entering PACE with no evaluation knowledge; the rest had very limited (68.8%) or somewhat strong (31.2%) knowledge. ECBers reported entering PACE with somewhat strong (20.0%), strong (60.0%), or very robust (2.0%) knowledge (18% skipped this question).

Procedure and Design

Each type of data was collected and analyzed separately, and the results and corresponding insights were integrated during interpretation. A traditional pretest–posttest design was used for the interviews, a pretest–posttest design with an added retrospective pretest was used for the survey measure, and a posttest-only assessment was used for the observational measure. (A pretest–posttest design with a retrospective pretest uses a survey that includes questions addressing items before the program [the pretest] and after it [the posttest] on the same instrument, as explained in detail below.) Interview and survey data were collected both at baseline and after PACE participation: ECBers completed interviews, and PPs completed both surveys and interviews, before PACE began and again after the culminating conference. The observational measure was administered only at the culminating conference.

Interviews

Interviews were used to understand the complexity of PP and ECBer experiences within the contexts, relationships, and structures that shape participant, staff, and evaluator experiences in youth character programs. A semi-structured interview protocol was administered to both PPs and ECBers. Doing so helped honor diverse participant voices, a strength of using qualitative methods in evaluation work (Greene et al., 2001). All PACE participants received a consent form through a SurveyMonkey link and were invited to schedule interview appointments. Interviews typically lasted 45–60 min. Researchers administered the phone-based protocol before the first PACE workshop and again after the culminating conference. The data include a total of 34 matched pre–post interviews from 8 ECBers and 26 PPs.

Survey

In a retrospective pretest–posttest design, both pre- and postmeasures are administered at a single time point. To assess change, the self-assessment instrument includes questions about perceptions before and after the program content is delivered, but both sets of questions are asked at one time point, at the program’s conclusion. The instrument directs respondents to think back to their perception of each item before the program (in other words, retrospectively) and then to consider their conceptualization after program participation. Some research suggests that such designs may capture self-assessed change better than traditional pretest–posttest assessments (Geldhof et al., 2018; Skeff et al., 1992). The approach was developed to reduce the threats to internal validity produced by self-assessments (Howard, 1980; Howard, Dailey & Gulanick, 1979) while also being sensitive to potential “response shift bias,” a change in the internal standard respondents use to answer items from pre- to posttest because of their newfound understanding of the concept assessed (Howard, 1980; Howard, Ralph et al., 1979). In fact, such bias is more likely to occur when trainings address complex topics (Rockwell & Kohn, 1989). Response shift bias may vary within the same program, appearing on some items rather than all (Pratt et al., 2000) or for some participants (Manthei, 1997). Some suggest that the retrospective pretest–posttest should supplement a traditional pretest–posttest design in order to detect response shift biases (e.g., Howard, Ralph et al., 1979); others have suggested the tool can be used in lieu of a traditional pretest–posttest design (Allen & Nimon, 2007; Lamb & Tschillard, 2003). In this study, we used the supplemental approach, which allowed us to account for intervention change using both a traditional and a bias-adjusted baseline as points of comparison for the posttest.
Thus, in the PACE project, we have three points of survey data: a traditional pretest response, a retrospective pretest response, and a traditional posttest response. A total of 26 PPs responded to the baseline and retrospective pretest–posttest surveys. Of those, 25 PPs provided responses at all time points to the item addressing stakeholders; thus, the final sample of survey respondents is 25.
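To make the three survey points concrete, the following minimal sketch shows the change estimates that the supplemental design makes available for each respondent: traditional change (posttest minus traditional pretest), bias-adjusted change (posttest minus retrospective pretest), and the response shift (retrospective pretest minus traditional pretest). It assumes each answer to the stakeholder item is stored as an integer on the 1–4 scale; the class, function, and field names and the example values are hypothetical illustrations, not PACE data or the study’s analysis code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SurveyResponse:
    """One respondent's answers to the single stakeholder item (hypothetical 1-4 coding)."""
    pretest: int        # traditional pretest, collected before PACE began
    retro_pretest: int  # retrospective "before" rating, answered at program end
    posttest: int       # "after" rating, answered at program end

    def traditional_change(self) -> int:
        return self.posttest - self.pretest

    def adjusted_change(self) -> int:
        # Change relative to the respondent's recalibrated "before" rating.
        return self.posttest - self.retro_pretest

    def response_shift(self) -> int:
        # Negative values: the respondent rated their "before" state lower in hindsight.
        return self.retro_pretest - self.pretest


def summarize(responses: list[SurveyResponse]) -> dict[str, float]:
    """Mean of each change estimate across respondents."""
    return {
        "traditional_change": mean(r.traditional_change() for r in responses),
        "adjusted_change": mean(r.adjusted_change() for r in responses),
        "response_shift": mean(r.response_shift() for r in responses),
    }


# Hypothetical illustration only; these are not PACE data.
example = [SurveyResponse(3, 2, 4), SurveyResponse(2, 2, 3), SurveyResponse(3, 1, 3)]
print(summarize(example))
```

A negative mean response shift, as in this made-up example, is the pattern the response shift literature anticipates when participants come to understand a concept better and judge their earlier practice more critically.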

Observation

Observations are an effective tool for capturing the difference between what participants claim to do in self-reports (on tools like surveys) and what they actually do (Patton, 2012). Thus, overt observations were conducted. At the project’s culminating conference, PPs gave a brief presentation of a poster they created explaining their program’s application of PACE skills and concepts. In that 10-min presentation, each organization’s PPs described their application of PACE concepts, including ECB, evaluative thinking (Archibald et al., 2018; Buckley et al., 2015), the SEP (Trochim et al., 2012; Urban et al., 2011; Urban et al., 2014; Urban & Trochim, 2009), evolutionary evaluation (Urban et al., 2014), and youth character development (see https://sites.google.com/site/paceproject2017 for more information). The information presented at the culminating conference pertained directly to the design of their organization’s evaluation. Two raters observed each presentation and independently completed an observational rubric.

Measures

The study measures addressed different aspects of including perspectives from diverse stakeholders; specifically, interviews captured stakeholder inclusion in the evaluation process, surveys addressed perspectives on program performance, and the observation focused on connecting perspectives to evaluation design.

Diverse stakeholder inclusion in the evaluation process

The interview measured participants’ knowledge of ECB and of the components of high-quality evaluation addressed in the PACE training. The PACE team developed the interview questions before the program began, including items specific to each participant role (PP or ECBer) as well as items asked of all participants. For instance, questions for PPs included, “How would you describe the role of an evaluator working with program staff?” Sample questions for ECBers included, “How would you describe the role of program staff working with an evaluator?” Sample questions for both participant types included, “What are the elements of high-quality evaluation?” For both PPs and ECBers, the protocol included broadly framed questions about stakeholders, with follow-up probes, to capture the full range of participants’ views of who stakeholders are. Data were coded using an axial coding approach to relate data on stakeholders from the interviews, along with a priori codes for different stakeholder categories and themes. The process was led by two trained coders from the PACE research team, and data were categorized by theme. In addition, stakeholder inclusion was considered within the context of each program; from those data, it became apparent that the data were best categorized by emergent themes, including role and representation.

Diverse stakeholder perspectives on program performance

On the survey, the Evaluation Capacity Assessment Instrument (ECAI) captured inclusion of stakeholders as well as other aspects of individual and organization-level evaluation practices (Taylor-Ritzler et al., 2013). For the current study, only one ECAI item was used because it specifically addressed stakeholders: “My program gathers information from diverse stakeholders to gauge how well the program is doing.” The item measured agreement on a 4-point Likert-type scale ranging from strongly disagree (1) to strongly agree (4). The ECAI was administered in two ways: first, as a traditional pretest completed before the PACE program began to capture baseline data, and second, as a retrospective pretest–posttest survey to assess pre–post changes and capture any response shifts that emerged from PPs as they gained familiarity and understanding of evaluation concepts and approaches (Rohs, 1999).

Connecting stakeholder input to the evaluation design

The research team aimed to understand how participants communicated the connection between stakeholder interests and perspectives and their evaluation design; this was assessed via an observational measure. Two PACE raters simultaneously listened to each PP team’s culminating conference presentation and then independently rated their explanation of how stakeholder input connected to their evaluation design. The single-item observational tool asked the following question: “How much did the PPs’ thinking about their program’s key stakeholders factor into the decision about which evaluation questions to focus on?” Probes were used to determine how conceptualizations of key stakeholders informed choices regarding the selection of evaluation questions. Ratings were scored on a 5-point Likert-type scale where 1 = demonstrates little or no recognition of stakeholder priorities or interests in their choice; 2–4 = can articulate the connection between stakeholder priorities or interests and their choice (at low, medium, and high levels of detail, respectively) BUT does not provide a detailed explanation; and 5 = gives a well-articulated and detailed explanation of how consideration of stakeholder priorities and/or interests connected to their choice.
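The rubric above, together with the counting step described under Analysis, can be represented as a simple lookup and a tally. The sketch below is a hypothetical illustration: the anchor wording is paraphrased from the scale description, and the example ratings are invented, not the study’s results.

```python
from collections import Counter

# Anchors of the single-item observational rubric (paraphrased from the text).
RUBRIC = {
    1: "little or no recognition of stakeholder priorities or interests",
    2: "articulates the connection, low level of detail",
    3: "articulates the connection, medium level of detail",
    4: "articulates the connection, high level of detail",
    5: "well-articulated, detailed explanation of the connection",
}


def tally(ratings: list[int]) -> dict[int, int]:
    """Count how many presentations received each rubric score (1-5)."""
    invalid = [r for r in ratings if r not in RUBRIC]
    if invalid:
        raise ValueError(f"ratings outside the 1-5 scale: {invalid}")
    counts = Counter(ratings)
    # Include zero counts so every scale point appears in the summary.
    return {score: counts.get(score, 0) for score in sorted(RUBRIC)}


# Hypothetical consensus ratings for illustration; not the study's results.
example_ratings = [5, 4, 4, 3, 5, 2, 4]
print(tally(example_ratings))
```

Reporting zero counts explicitly keeps the full distribution visible, which matters for a descriptive summary of a small sample such as 15 organizations.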

Interrater reliability was assessed by comparing the two raters’ scores for the 15 organizations present at the event (one organization’s representatives did not attend because funding for their program was cut). The raters’ scores differed for four organizations; however, the intraclass correlation between the two raters was .913, 95% CI [.742, .971], F(14) = 11.543, p < .001, and κ = .659, 95% CI [.526, .792], p < .001, indicating that the raters’ judgments were reliable and strongly aligned. The differences were small on the two items where they existed and were resolved through conversations between raters until alignment was achieved on all items.
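The two agreement statistics can be computed as in the sketch below. The text does not name the ICC model or the kappa variant used, so this sketch assumes a two-way mixed, consistency, average-measures ICC, often written ICC(3,k), and unweighted Cohen’s kappa. For that ICC form, ICC = 1 − 1/F, where F = MS_rows / MS_error; because 1 − 1/11.543 ≈ .913, the reported ICC and F are consistent with this form, although that is an inference rather than something the text states. The function names and example ratings are hypothetical.

```python
from collections import Counter


def icc3k(scores: list[tuple[float, ...]]) -> tuple[float, float]:
    """Two-way mixed, consistency, average-measures ICC(3,k).

    scores holds one tuple per rated target (here, an organization), with one
    entry per rater. Returns (icc, f_statistic), where F = MS_rows / MS_error.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if ms_rows <= 0 or ms_error <= 0:
        raise ValueError("ICC is undefined without target and residual variance")
    f_stat = ms_rows / ms_error
    return 1 - 1 / f_stat, f_stat


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters' categorical scores."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal distribution of scores.
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings from two raters for five organizations; not PACE data.
rater_a = [2, 3, 4, 5, 4]
rater_b = [2, 3, 5, 5, 3]
icc, f = icc3k(list(zip(rater_a, rater_b)))
kappa = cohens_kappa(rater_a, rater_b)
print(f"ICC(3,k) = {icc:.3f}, F({len(rater_a) - 1}) = {f:.3f}, kappa = {kappa:.3f}")
```

Unweighted kappa treats every disagreement as equally serious; for an ordinal 1–5 rubric, a weighted kappa that gives partial credit to near misses is a common alternative, which may partly explain why the reported kappa is lower than the ICC.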

Analysis

First, we assessed participants’ thoughts about diverse stakeholder inclusion in the evaluation process as measured in the interviews. We coded the interview data in an iterative process with three trained coders over two rounds: first using a theory-driven approach and then using an ad hoc approach. Second, we assessed diverse stakeholder perspectives on program performance as measured by the ECAI item. To do so, we analyzed survey responses with paired-samples t tests comparing the baseline and retrospective pretest, the baseline and posttest, and the retrospective pretest and posttest to determine whether change occurred after participation in PACE. Third, we investigated participants’ ability to connect stakeholder input with their evaluation design by counting the number of scores that fell into each rating on the observation tool scale.
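The three paired comparisons described above can be sketched with the Python standard library alone. The sketch computes each paired-samples t statistic (mean within-person difference divided by its standard error) and its degrees of freedom; the function names and the five-respondent ratings are hypothetical, not PACE data. A p-value would then come from the t distribution with n − 1 degrees of freedom; for example, SciPy’s `scipy.stats.ttest_rel` returns both the statistic and the p-value in one call.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic (after minus before) and its degrees of freedom."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(after, before)]
    sd = stdev(diffs)  # sample SD of the within-person differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


def compare_three_points(pre, retro, post) -> dict[str, tuple[float, int]]:
    """The three comparisons named in the Analysis section."""
    return {
        "baseline vs. retrospective pretest": paired_t(pre, retro),
        "baseline vs. posttest": paired_t(pre, post),
        "retrospective pretest vs. posttest": paired_t(retro, post),
    }


# Hypothetical 1-4 ratings for five respondents; not PACE data.
pre = [3, 2, 3, 2, 3]
retro = [2, 2, 1, 2, 2]
post = [4, 3, 3, 4, 3]
for label, (t, df) in compare_three_points(pre, retro, post).items():
    print(f"{label}: t({df}) = {t:.2f}")
```

A paired test is the appropriate choice here because all three ratings come from the same respondents, so each comparison uses within-person differences rather than treating the time points as independent groups.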

Results

The findings indicate that, overall, participants changed their conceptualization of how much their program incorporates stakeholder perspectives in the evaluation process, in assessing program performance, and in guiding evaluation design. We found change in stakeholder inclusion from before to after participation in PACE through the lens of each research question.

Research Question 1

The findings for Research Question 1 indicated that the PACE project changed participant perspectives on the inclusion of stakeholders in the evaluation process. The qualitative interview data allowed us to examine further whether and where participants saw stakeholder perspectives fitting into the program evaluation process. The results show that PPs and ECBers both left PACE with more interest in including stakeholders in the evaluation process and with broader conceptualizations of who they considered to be stakeholders, as described below.

First, the findings indicate that both PPs and ECBers thought about and valued stakeholders in programs and evaluations differently after PACE. Specifically, 10 PPs and 3 ECBers went from not mentioning stakeholders at all in their ideal or current practice of program evaluation at pretest to mentioning stakeholders at posttest. Another 15 deepened their responses from pretest to posttest, showing a connection between stakeholder inclusion and the evaluation planning process. For example, one PP said: “…high quality evaluation has to have stakeholder buy-in.” One ECBer echoed the sentiment by noting that “evaluation planning starts with including all your stakeholders.” Another ECBer explained that after PACE, they had an expanded perception of high-quality evaluation and the relationship between its different aspects, saying, “In general I tend to think of high quality evaluation in the context we deal with where the context and purpose and the stakeholders and resources and methodology and the ultimate uses are all aligned.”

In addition, PPs and ECBers placed more value on using stakeholder input. One PP said:

Including stakeholders in the process; pulling in various people—beneficiaries who’ve been involved. I guess you know all of that, but it should be valuable to the people who it’s intended for. So, it should have some use or some value; it should be provided back to people.

Six ECBers agreed, as one explained:

…it’s about identifying who all the stakeholders are, but then it’s thinking about who we are, where is the volatility, where is the greatest concern or greatest challenge…. What could we learn from various stakeholders?

Next, from pretest to posttest, most participants showed a more developed concept of who their program stakeholders are, in terms of both role and representation. Nearly all participants indicated that after PACE they thought about who their stakeholders are more extensively.

Stakeholder role

In terms of role, all but two participants mentioned funders as program stakeholders at pretest, but many gave no or limited responses regarding others. At posttest, however, all but five participants mentioned more primary and secondary stakeholders than at pretest, including youth, families, various levels of program staff, community partners and associated staff (e.g., classroom teachers for school-based programs), community members, and the wider youth development field. Notably, four PPs realized after PACE that they had excluded or underutilized youth voice in their previous evaluations and were now rectifying that. A few PPs mentioned including youth in evaluations in some way in the distant past, and one or two had included families, teachers, or program staff, but none had included any of those voices in recent evaluation work. Moreover, very few had ever included more than one primary stakeholder group in their organization’s program evaluations at the same time.

Many PACE participants broadened their ideas about the pool of possible primary and secondary stakeholders to include in their evaluations, expanding post-PACE to include different and more diverse voices than in the past. As one ECBer illustrated: “I think initially my perception was that the [program] founder was the critical stakeholder. And through this process, that turned out to absolutely not be the case.” Nearly all participants also recognized the need to intentionally include other diverse voices previously excluded from their work, as one interview excerpt illustrates:

Participant: …considering all the parties involved, all the stakeholders we didn’t even think about before PACE.

Interviewer: Who are some of the stakeholders you didn’t think about before PACE?

Participant: Well, sometimes we…, say, people in other youth serving organizations, other people in the administration, the parents of the kids, so just some simple things but then some bigger ones, too, just how it affects the community at large. Sometimes we didn’t consider all of those people.

Stakeholder representation

In terms of representation, PACE fostered the inclusion of a wider range of social, economic, community, and demographic perspectives from stakeholders. Because PPs served programs across the country, their communities included a wide range of diverse stakeholders. At pretest, very few participants mentioned including stakeholder voices from underutilized or historically disadvantaged perspectives, but after PACE, nearly half did. For instance, a PP from a program for homeless youth in the Northeast indicated that after PACE, the program was intentionally engaging lesbian, gay, bisexual, transgender, and queer/questioning youth and youth of color in its evaluation planning, since youth with those experiences made up a large portion of those it served. The organization’s earlier hesitancy to intentionally engage such representatives had been largely logistical: given the transient nature of its participants, program staff thought it would be too challenging to include them and instead assumed that staff could serve as proxies for their perspectives. After PACE, however, the PP changed her mind and indicated that the program would make an intentional engagement effort to capture the diversity of the clients it serves and the community in which it operates.

One PP shared how considering different perspectives could inform program decision-making:

I think by doing the stakeholder analysis and checking in on different viewpoints of what the program is doing well, checking on the goals that each of the stakeholders believe the program should be achieving and how effectively they’re meeting those goals and then making decisions.

In some cases, PPs recognized that using representative community members as stakeholders could provide a richer view of their programs. One PP stated:

I think pulling more of our staff and stakeholders together to look at programs.

Asking the questions of how they’re doing, what stage they’re in, where we’re going with it. More of these questions from a multi-perspective point of view is something we have [recently] done and will continue to do.

Moreover, at posttest, nearly all PPs considered stakeholders a key part of comprehensive evaluation and results utilization, a considerable increase over the pretest. One PP illustrated this point:

…in order for us to really get a good picture of our program I feel like we need to be able to have some results for all those different groups and, not only that, but we would want to get information from the other stakeholders that was mentioned before to get a full picture of a student’s character development; we can’t be the only people who are seeing it. So, getting information from parents and classroom teachers and community members and things like that to see what changes they are noticing, I think, also would provide a more complete picture [that’s] more comprehensive.

Some participants explained that including stakeholder views about how results will be used would aid evaluation design and application. One PP put it this way:

Making sure you know who your audience is, so thinking about utilization right from the beginning so that you can get information that’s gonna be useful, and that means also thinking about who your stakeholders would be.

An ECBer also said: “…identifying the stakeholders and thinking about their evaluation needs and thinking about ways you can report, I think that’s really useful.” Five PPs also discussed sharing information with various stakeholders; for example, one PP explained:

…after the evaluation is done it’s important to share that information with stakeholders—internal, external, various stakeholder groups—and get feedback from them about what the evaluation means to them, what it says to them.

Including diverse stakeholders may have implications for program design and funding as well as for the evaluation, as was the case for one program in PACE. One PP indicated that the processes and strategies taught in PACE helped his program understand participants’ unmet needs. The organization, located in the Midwest, served youth in need and court-involved youth through volunteerism. Through PACE, the program uncovered the need for more workforce readiness training for participants, which it then added to the program; as a result, it found more funders willing to support the program.

Three PPs did not directly discuss stakeholders at either time point, and two showed no change between pretest and posttest interviews because they began and ended the program with high levels of inclusion of stakeholder perspectives; notably, their survey responses did capture increases. Thus, using both quantitative and qualitative approaches allowed us to develop a more complete understanding of participants’ perceptions of the role of stakeholders.

Research Question 2

Next, the findings for Research Question 2 revealed that the PACE project changed participants’ perspectives on the inclusion of stakeholders in assessing program performance. On the survey, paired samples t tests showed no significant difference on the ECAI measure between the pretest taken at baseline and the retrospective pretest question asked after the program, suggesting that response shift bias was not present. A significant difference was found between the retrospective pretest (M = 2.64, SD = .81) and the posttest (M = 3.20, SD = .71), t(24) = 4.58, p < .001; similarly, a significant difference was found between the baseline pretest (M = 2.44, SD = .77) and the posttest (M = 3.20, SD = .71), t(24) = 3.41, p < .01. Thus, the traditional pretest–posttest design produced results similar to those of the retrospective pretest–posttest design. Both approaches show that PPs agreed more strongly after PACE than before it that their programs collect information from diverse stakeholders. The descriptive statistics offer additional insight: at the baseline pretest, 52% of respondents agreed or strongly agreed that their program gathers information from diverse stakeholders to gauge how well the program is doing, compared with 60% at the retrospective pretest and 92% at the posttest. Thus, a shift in organizational inclusion of diverse stakeholders appears to have occurred between the start and end of PACE (see Figure 1).
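The comparisons above rest on paired samples t tests and on the share of respondents who agreed or strongly agreed with the ECAI item. As an illustration only, and not the authors’ analysis code, the Python sketch below computes a paired t statistic from difference scores and the percentage of ratings at or above a cutoff. The function names, the assumed 4-point coding (3 = agree, 4 = strongly agree), and the sample ratings are hypothetical, since item-level data are not reported here.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for a paired samples t test of after - before.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the difference scores
    and sd uses the n - 1 denominator; df = n - 1.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("difference scores have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def percent_agree(ratings, cutoff=3):
    """Percentage of ratings at or above the cutoff (assumed: 3 = agree, 4 = strongly agree)."""
    return 100 * sum(r >= cutoff for r in ratings) / len(ratings)


if __name__ == "__main__":
    # Hypothetical ratings for four respondents, for illustration only.
    pre = [1, 2, 2, 3]
    post = [2, 4, 2, 4]
    t, df = paired_t(pre, post)
    print(f"t({df}) = {t:.2f}")  # prints t(3) = 2.45
    # 13 of 25 hypothetical respondents at "agree" or above.
    print(percent_agree([3] * 13 + [2] * 12))  # prints 52.0
```

With the 25 respondents implied by t(24), the reported 52%, 60%, and 92% correspond to 13, 15, and 23 PPs, respectively. A computed |t| would be compared with the two-tailed critical value for df = 24, which is about 2.064 at α = .05.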

Figure 1.

Comparison of baseline (pretest), retrospective pretest and posttest responses to the question “My program gathers information from diverse stakeholders to gauge how well the program is doing.”

Additionally, when examining individual scores on the same ECAI item, 10 PPs rated their organizations the same in terms of inclusiveness at both the baseline and the retrospective pretest 1 year later. Interestingly, the other 15 PPs shifted their responses from baseline to retrospective pretest. Specifically, nine scored higher on the retrospective pretest, indicating that their practice had actually been more inclusive than they estimated at baseline. Notably, six scored their organizations lower on the retrospective pretest than at baseline, reflecting what they had learned about working with stakeholders.
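The individual-level comparison above amounts to tallying, for each respondent, whether the retrospective pretest rating equals, exceeds, or falls below the baseline rating. A minimal Python sketch follows; the function name is hypothetical, and the ratings are made up solely to mirror the reported counts (10 unchanged, 9 higher, 6 lower).

```python
def classify_shifts(baseline, retrospective):
    """Count respondents whose retrospective pretest rating is the same as,
    higher than, or lower than their baseline rating."""
    if len(baseline) != len(retrospective):
        raise ValueError("ratings must be paired by respondent")
    counts = {"same": 0, "higher": 0, "lower": 0}
    for base, retro in zip(baseline, retrospective):
        if retro == base:
            counts["same"] += 1
        elif retro > base:
            counts["higher"] += 1  # practice judged more inclusive in hindsight
        else:
            counts["lower"] += 1   # baseline self-rating judged too high in hindsight
    return counts


if __name__ == "__main__":
    # Made-up ratings for 25 hypothetical respondents, arranged to mirror the reported pattern.
    baseline = [2] * 10 + [2] * 9 + [3] * 6
    retrospective = [2] * 10 + [3] * 9 + [2] * 6
    print(classify_shifts(baseline, retrospective))
    # prints {'same': 10, 'higher': 9, 'lower': 6}
```

Keeping this tally separate from the group-level t test matters because offsetting upward and downward shifts (here 9 and 6) can leave the group means close together even when many individuals revised their ratings.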

Research Question 3

Finally, to address Research Question 3 about stakeholder input to the evaluation design, we used observational scores on PP explanations of how stakeholder input was connected to evaluation questions. Observational scores from both raters were averaged. Results indicate that representatives from all organizations included stakeholders in the process of developing their evaluation questions, and most could explain how with some level of detail. Representatives from two organizations could not make connections between their stakeholder input and evaluation questions; representatives from eight could articulate the connection between stakeholder priorities or interests and their choice of questions (three at low, three at medium, and two at high levels of detail); and representatives from five organizations offered well-articulated, detailed explanations of how consideration of stakeholder priorities and/or interests connected to their questions. One possible reason for these limited explanations of the connection between stakeholder input and organizational evaluation questions is that the project exercises did not give in-depth attention to the link between stakeholder input and evaluation implementation, focusing instead on connections to evaluation design.
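The averaging step can be sketched as follows. This is a hypothetical illustration: the study does not report its rubric, so the 0–4 scale (0 = no connection, 1 = low, 2 = medium, 3 = high, 4 = well-articulated and detailed), the half-up rounding rule, the function names, and the organization labels are all assumptions.

```python
import math
from collections import Counter

# Hypothetical 0-4 rubric; the study's actual scoring scheme is an assumption here.
LEVELS = {0: "no connection", 1: "low", 2: "medium", 3: "high", 4: "well-articulated"}


def average_ratings(ratings):
    """Average the two raters' scores for each organization."""
    return {org: (r1 + r2) / 2 for org, (r1, r2) in ratings.items()}


def tally_levels(averages):
    """Round each average half-up to a rubric level and count organizations per level."""
    counts = Counter()
    for score in averages.values():
        level = min(4, max(0, math.floor(score + 0.5)))  # half-up, clamped to 0-4
        counts[LEVELS[level]] += 1
    return counts


if __name__ == "__main__":
    # Made-up scores from two raters for four hypothetical organizations.
    ratings = {"Org A": (0, 0), "Org B": (1, 2), "Org C": (3, 4), "Org D": (4, 4)}
    averages = average_ratings(ratings)
    print(averages)
    print(tally_levels(averages))
```

Python’s built-in round() uses round-half-to-even (round(2.5) == 2), which is why the sketch rounds half-up explicitly; whichever rule is used is worth stating, because averages of two raters’ integer scores often land on .5.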

Conclusion

In an effort to promote equity and justice in high-quality evaluations of youth character programs, this study examined the inclusion of diverse stakeholders in the evaluation process, perspectives on program performance, and stakeholder input to the evaluation design. The study builds on evidence showing that a central requirement of high-quality evaluation is the inclusion of various stakeholder perspectives throughout the program evaluation process (Bryson et al., 2011; Crane, 2018; Freeman, 1984; Greene, 1987; Johnson et al., 2009; Jones & Wicks, 1999).

By design, youth character development programs should incorporate youth’s satisfaction, needs, and concerns (Urban, 2008); this study revealed that because so many youth character programs were not involving youth in their evaluation planning, their key constituency was inadvertently being demoted or even silenced, while funders were largely privileged. While it is critical to consider the funder perspective when designing evaluations, it is also important for the recipients of programs to have a voice. This is particularly true for youth-serving, equity-minded programs. The exclusion of youth voice threatens to further disenfranchise communities or even reproduce oppression, consequences that the AEA indicates professional evaluation should avoid. Similarly, the perspectives of other primary and secondary stakeholders such as families, program staff, community partners and their staff (e.g., classroom teachers for school-based programs), community members, and the wider youth development field have been underrepresented in evaluation. Accordingly, stakeholder representation, including social, economic, demographic, and community viewpoints excluded in the past, was a key component to consider. The results indicate that including such diverse stakeholders may have implications for program design and funding opportunities as well as for the evaluation.

To support equity and justice, youth character programs and their evaluations must engage diverse stakeholders; the results of the current study show that practitioners and evaluators alike support this aspiration and see its benefit to their work. With some training, participants linked input from different stakeholders to their youth character program evaluations, with the potential to generate reflective findings useful for program growth and development.

The findings indicate that many PPs in PACE did not initially consider the inclusion and utilization of stakeholder perspectives for program betterment or evaluation. They did, however, show increased awareness, interest, and commitment to doing so after PACE. PPs were able to shift their thinking about stakeholders—who should be included in terms of both role and representation, what value their participation brings, and in what part of the evaluation process they should engage—after participation in PACE. Future ECB initiatives should address the consequences of omitting key stakeholders from program evaluations. They should also include attention to linking stakeholder perspectives to the implementation of the evaluation plan.

As was true for PPs, ECBers gained a richer understanding of the connection between diverse stakeholder perspectives and the evaluation process. Although their practices around stakeholder input varied before PACE, all ECBers deepened their existing knowledge and developed new knowledge about the relationship between stakeholder input and high-quality evaluation. Like the PPs, the ECBers expanded their ideas of who program stakeholders should and could be, enhanced their knowledge of what diverse voices can add to evaluation, and indicated an interest in adjusting their practice to be more inclusive of diverse stakeholders.

In some cases where respondents did not speak directly about diverse stakeholders in the interview, the survey results captured their knowledge gains. Thus, the use of mixed methods was critical to learning about the process of enhanced stakeholder participation in evaluation. In addition, while the survey findings showed promising changes, the depth of PACE participant learning was only fully understood when supplemented with interview data. As such, we encourage the use of mixed methods designs for future professional development projects addressing diverse stakeholder involvement. We also encourage triangulating measures, as we did with a survey, interviews, and observation, to capture the range of knowledge and application of stakeholder inclusion in youth character programs.

In addition, our study found that using a retrospective pretest–posttest design was as beneficial as using a traditional pretest–posttest design. Paired samples t tests showed no significant difference between the pretest taken at baseline and the retrospective pretest question asked after the program, and the significant change from the retrospective pretest to the posttest was similar to that from the baseline pretest to the posttest. Accordingly, while some users of the retrospective pretest–posttest design have suggested it is best used as a supplement to a traditional pretest–posttest design (e.g., Howard, Ralph et al., 1979), the current study findings also support using the former in lieu of the latter (e.g., Allen & Nimon, 2007).

There are limitations of the study that should be noted. While this study focuses on diverse stakeholders, the demographic data of PACE participants were incomplete. In an attempt to reduce bias, the PACE applications did not include demographic information, and when it was sought later, about a third of the participants did not supply it. This limited some of the analyses possible within the current study, which is problematic since diversity is a primary focus. Moreover, the PACE participants who did offer demographic data included many White respondents, with some Black and multiracial respondents, and no data were collected about other diverse experiences. Also, our measure of stakeholder perspectives on program performance was a single survey item.

In sum, the PACE project advanced the knowledge, skills, and future practices of PPs and evaluators while encouraging equity and justice in evaluation through the inclusion of diverse stakeholders. With similar efforts, evaluations of youth character development programs can assess programmatic effectiveness by including the voices, needs, and interests of diverse stakeholders. The findings show that PACE participants were empowered to generate high-quality evaluations that incorporate equity and justice. Specifically, because PACE promoted the link between diverse stakeholder input and evaluation, participants will be better able going forward to develop evaluations that resonate with multiple stakeholders (Archibald et al., 2018). Additionally, by linking stakeholder input to program performance, PACE participants are equipped to determine whether their programs meet constituent and community needs or expectations (Buckley et al., 2015). Such involvement also positions PACE participants to improve the quality of their evaluations by reducing the inaccuracies and insensitivities that prevent the generation of program knowledge and improvement (Bryson et al., 2011). Ultimately, more high-quality formal evaluations of youth character programs will reduce the lack of information about their impacts and effectiveness (Roth et al., 1998), helping ensure that more youth, families, and communities benefit. Moreover, this study aims to inspire evaluators, researchers, and practitioners to question and improve the equity and inclusiveness of their own practices. Doing so will bring us collectively closer to the goals established by the AEA’s guiding principles and, more importantly, produce better, more equitable work.

Footnotes

Acknowledgment

We thank the John Templeton Foundation (Grant #60483) for supporting the PACE project implementation and evaluation.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: John Templeton Foundation Grant #60483 supported the PACE Project implementation and evaluation.

ORCID iD

Lisa M. Chauveron

References

Allen, J. M., & Nimon (2007). Retrospective pre-test: A practical technique for professional development evaluation. Journal of Industrial Teacher Education, 44(3), 27–42.

American Evaluation Association. (2018, August). American Evaluation Association guiding principles for evaluators. https://www.eval.org/p/cm/ld/fid=51

Archibald, Sharrock, Buckley, & Young (2018). Every practitioner a “knowledge worker”: Promoting evaluative thinking to enhance learning and adaptive management in international development. New Directions for Evaluation, 158, 73–91. https://doi.org/10.1002/ev.20323

Bryson, J. M., Patton, M. Q., & Bowman, R. A. (2011). Working with evaluation stakeholders: A rationale, step-wise approach and toolkit. Evaluation and Program Planning, 34(1), 1–12. https://doi.org/10.1016/j.evalprogplan.2010.07.001

Buckley, Archibald, Hargraves, & Trochim, W. M. (2015). Defining and teaching evaluative thinking: Insights from research on critical thinking. American Journal of Evaluation, 36(3), 375–388. https://doi.org/10.1177/1098214015581706

Chauveron, L. M., Urban, J. B., Linver, M. R., Samtani, Buckley, & Hargraves (2021). Promoting evaluation in youth character development through enhanced evaluation capacity: Empirical findings from the PACE Project. New Directions for Evaluation, 169, 79–95.

Crane (2018). Revisiting who, when, and why stakeholders matter: Trust and stakeholder connectedness. Business & Society. https://doi.org/10.1177/0007650318756983

Freeman, R. E. (1984). Strategic management: A stakeholder approach. Pitman Publishing.

Geldhof, G. J., Warner, D. A., Finders, J. K., Thogmartin, A. A., Clark, & Longway, K. A. (2018). Revisiting the utility of retrospective pre-post designs: The need for mixed-method pilot data. Evaluation and Program Planning, 70, 83–89. https://doi.org/10.1016/j.evalprogplan.2018.05.002

Greene, J. C. (1987). Stakeholder participation in evaluation design: Is it worth the effort? Evaluation and Program Planning, 10(4), 379–394. https://doi.org/10.1016/0149-7189(87)90010-3

Greene, J. C., Benjamin, & Goodyear (2001). The merits of mixing methods in evaluation. Evaluation, 7(1), 25–44. https://doi.org/10.1177/13563890122209504

Howard, G. S. (1980). Response-shift bias. Evaluation Review, 4(1), 93–106. https://doi.org/10.1177/0193841x8000400105

Howard, G. S., Dailey, P. R., & Gulanick, N. A. (1979). The feasibility of informed pre-tests in attenuating response-shift bias. Applied Psychological Measurement, 3(4), 481–494. https://doi.org/10.1177/014662167900300406

Howard, G. S., Ralph, K. M., Gulanick, N. A., Maxwell, S. E., Nance, D. W., & Gerber, S. K. (1979). Internal invalidity in pre-test-post-test self-report evaluations and a re-evaluation of retrospective pretests. Applied Psychological Measurement, 3(1), 1–23. https://doi.org/10.1177/014662167900300101

Johnson, Greenseid, L. O., Toal, S. A., King, J. A., Lawrenz, & Volkov (2009). Research on evaluation use. American Journal of Evaluation, 30(3), 377–410. https://doi.org/10.1177/1098214009341660

Jones, T. M., & Wicks, A. C. (1999). Convergent stakeholder theory. Academy of Management Review, 24(2), 206–221. https://doi.org/10.5465/AMR.1999.1893929

Labin, S. N. (2014). Developing common measures in evaluation capacity building: An iterative science and practice process. American Journal of Evaluation, 35(1), 107–115. https://doi.org/10.1177/1098214013499965

Labin, S. N., Duffy, J. L., Meyers, D. C., Wandersman, & Lesesne, C. A. (2012). A research synthesis of the evaluation capacity building literature. American Journal of Evaluation, 33(3), 307–338. https://doi.org/10.1177/1098214011434608

Lamb & Tschillard (2003). An underutilized design in applied research: The retrospective pretest. Atlanta, GA.

Manthei, R. J. (1997). The response-shift bias in a counsellor education programme. British Journal of Guidance and Counselling, 25(2), 229–237. https://doi.org/10.1080/03069889708253804

Nelson, A. G., King, J. A., Lawrenz, Reich, Bequette, Pattison, Kollmann, E. K., Illes, Cohn, Iacovelli, Cardiel, C. L. B., Ostgaard, Goss, Beyer, Causey, Sinkey, & Francisco (2019). Using a complex adaptive systems perspective to illuminate the concept of evaluation capacity building in a network. American Journal of Evaluation, 40(2), 214–230. https://doi.org/10.1177/1098214018773877

Patton, M. Q. (1997). Utilization-focused evaluation: The new century text (3rd ed.). Sage.

Patton, M. Q. (2012). Qualitative evaluation and research methods (2nd ed.). Sage.

Pratt, C. C., McGuigan, W. M., & Katzev, A. R. (2000). Measuring program outcomes: Using retrospective pre-test methodology. American Journal of Evaluation, 21(3), 341–349. https://doi.org/10.1177/109821400002100305

Rockwell, S. K., & Kohn (1989). Post-then-pre evaluation. Journal of Extension, 27(2), 19–21.

Rohs, F. R. (1999). Response shift bias: A problem in evaluating leadership development with self-report pre-test-post-test measures. Journal of Agricultural Education, 40(4), 28–37. https://doi.org/10.5032/jae.1999.04028

Roth, Brooks-Gunn, Murray, & Foster (1998). Promoting healthy adolescents: Synthesis of youth development program evaluations. Journal of Research on Adolescence, 8(4), 423–459. https://doi.org/10.1207/s15327795jra0804_2

Skeff, K. M., Stratos, G. A., & Bergen, M. R. (1992). Evaluation of a medical faculty development program: A comparison of traditional pre/post and retrospective pre/post self-assessment ratings. Evaluation & the Health Professions, 15(3), 350–366. https://doi.org/10.1177/016327879201500307

Stockdill, S. H., Baizerman, & Compton, D. W. (2002). Toward a definition of the ECB process: A conversation with the ECB literature. New Directions for Evaluation, 2002(93), 7–26. https://doi.org/10.1002/ev.39

Suarez-Balcazar & Taylor-Ritzler (2014). Moving from science to practice in evaluation capacity building. American Journal of Evaluation, 35(1), 95–99. https://doi.org/10.1177/109821401349944

Taylor-Ritzler, Suarez-Balcazar, Garcia-Iriarte, Henry, D. B., & Balcazar, F. E. (2013). Understanding and measuring evaluation capacity: A model and instrument validation study. American Journal of Evaluation, 34(2), 190–206. https://doi.org/10.1177/109821401247142

Trochim, Urban, J. B., Hargraves, Hebbard, Buckley, Archibald, Johnson, & Burgermaster (2012). The guide to the Systems Evaluation Protocol. Cornell Digital Print Services.

Urban, J. B. (2008). Components of youth development programs: The voices of youth-serving policymakers, practitioners, researchers, and adolescents. Applied Developmental Science, 12(3), 128–139. https://doi.org/10.1080/10888690802199400

Urban, J. B., Hargraves, & Trochim, W. M. (2014). Evolutionary evaluation: Implications for evaluators, researchers, practitioners, funders, and the evidence-based program mandate. Evaluation and Program Planning, 45, 127–139. https://doi.org/10.1016/j.evalprogplan.2014.03.011

Urban, J. B., & Trochim, W. M. (2009). The role of evaluation in research-practice integration: Working toward the “golden spike.” American Journal of Evaluation, 30(4), 538–553. https://doi.org/10.1177/1098214009348327

Wandersman (2014). Moving forward with the science and practice of evaluation capacity building (ECB): The why, how, what, and outcomes of ECB. American Journal of Evaluation, 35(1), 87–89. https://doi.org/10.1177/1098214013503895