Evaluator and Program Manager Perceptions of Evaluation Capacity and Evaluation Practice

Abstract

The evaluation community has demonstrated an increased emphasis and interest in evaluation capacity building in recent years. A need currently exists to better understand how to measure evaluation capacity and its potential outcomes. In this study, we distributed an online questionnaire to managers and evaluation points of contact working in grantee programs funded by four large federal public health programs. The goal of the research was to investigate the extent to which assessments of evaluation capacity and evaluation practice are similar or different for individuals representing the same program. The research findings revealed both similarities and differences within matched respondent pairs, indicating that whom one asks to rate evaluation capacity in an organization matters.

Keywords

evaluation capacity, evaluation practice, evaluation capacity building, research on evaluation

In 2014, Preskill challenged evaluation scholars to address “the hard stuff” of evaluation capacity building (ECB). As part of this call to action, she encouraged the community to invest in studying ECB activities especially in light of recent reviews that indicate only half of all ECB activities are evaluated (Labin, Duffy, Meyers, Wandersman, & Lesesne, 2012). Other authors, in particular, Suarez-Balcazar and Taylor-Ritzler (2014), offer similar calls to action and request that the evaluation community provide a better understanding of the conditions under which ECB occurs and that they examine the effects of ECB. At the heart of these calls to action is the science and art of measuring evaluation capacity within organizations, a topic that has been largely unexplored to date in the empirical literature on evaluation.

When measuring organizational evaluation capacity with a survey instrument, perhaps the most common data collection method used in the literature, one nontrivial decision point is the selection of who will complete the instrument on behalf of the organization. To date, scholars vary in their choices regarding instrument administration. For example, in a study examining a model of evaluation capacity and demand in the Danish Government, Nielsen, Lemire, and Skov (2011) administered their Evaluation Capacity Index to one or more evaluators per municipality. Taylor-Ritzler, Suarez-Balcazar, Garcia-Iriarte, Henry, and Balcazar (2013) requested responses to their evaluation capacity assessment instrument from one individual within each Chicago-area nonprofit organization included in their study. The respondents answering on behalf of each organization varied and included executive directors/administrators, managers, and service workers/clinical staff.

Decisions about the number and type of respondents who receive an organizational evaluation capacity assessment instrument are based on a variety of factors, including but not limited to the study or project purpose, organizational context, and feasibility. But how might these decisions affect the picture we ultimately paint regarding organizational evaluation capacity? In this article, we present findings from analyses of data collected through an online questionnaire about organizational evaluation capacity and evaluation practice to address the question, “To what extent are assessments of evaluation capacity and evaluation practice similar or different for individuals within the same program?” Given the limited literature on measurement in ECB, the analysis presented is primarily exploratory, and the findings are intended to provide the evaluation community (both researchers and ECB practitioners) with empirical evidence to guide future measurement endeavors. In brief, our findings suggest that whom one asks to rate evaluation capacity in an organization matters.

Studying the Measurement of Evaluation Capacity

As part of a larger study finalized in 2012 (Fierro, 2012) that was designed to clarify the constructs of evaluation capacity and the intended consequences of building capacity, we performed a thorough review of the literature published between 1960 and 2010 on ECB in the United States. This literature describes evaluation capacity, the approaches used to build evaluation capacity within organizations, and the factors that affect capacity building efforts. It says little about the potential consequences of building evaluation capacity within organizations.

Working with the extant literature on ECB, the first step in the current study was to identify the constructs of evaluation capacity by reviewing, inventorying, and coding frameworks of evaluation capacity or ECB (which included existing data collection instruments for measuring evaluation capacity). Frameworks included those published by Milstein and Cotton (2000), Preskill and Boyle (2008), and Boyle, Lemaire, and Rist (1999). Data collection instruments inventoried included the Pan Canadian Survey (Cousins et al., 2008), the Readiness for Organizational Learning and Evaluation (Preskill & Torres, 2000), the Evaluation Capacity Assessment Instrument (Taylor-Ritzler, Suarez-Balcazar, Garcia-Iriarte, Henry, & Balcazar, 2013), and the ECB Checklist (Volkov & King, 2007).

Because there was limited information in the extant literature about the consequences of evaluation capacity in organizations, the second step in the research process involved developing a data-driven set of constructs to describe the type of evaluation practice that would emerge in an organization that has effective evaluation capacity. Seven individuals representing a comprehensive sample (Goetz & LeCompte, 1984) of ECB scholars were asked to participate in semistructured telephone interviews. Individuals eligible for inclusion were those who published one or more articles on evaluation capacity or ECB within the past decade that pertained to theoretical or conceptual ideas on these topics in a well-recognized evaluation journal (i.e., American Journal of Evaluation, Canadian Journal of Program Evaluation, Evaluation and Program Planning, and New Directions for Evaluation). Four scholars agreed to participate. Interview recordings were transcribed and manually coded using deductive and inductive coding techniques.

We operationalized the evaluation capacity and evaluation practice constructs derived from the document review and interviews into a draft survey instrument that was tailored to the public health context (see Table 1 for general descriptions of the operationalization of each construct). The questions developed for this survey were newly created; items from existing instruments were not used verbatim. Next, as recommended by Ryan, Gannon-Slater, and Culbertson (2012), hybrid cognitive interviewing procedures were used to detect issues related to the clarity and interpretation of instructions, questions, and response options as well as with the questionnaire layout and ordering of questions (Willis, 2005). The findings from two rounds of cognitive interviews were used to refine the questionnaire. The questionnaire was then administered to managers and evaluation points of contact in four public health programs funded by a large, federal public health agency. For the purposes of this study, an organization was defined as the specific state, territorial, local, or tribal public health programs funded by the federal public health agency.

Table 1.

Description of Evaluation Capacity and Evaluation Practice Constructs as Operationalized.

Construct	Description as Operationalized
Evaluation Capacity
Access to information about evaluation	Staff have the ability to access information about best practices in evaluation or findings from evaluations of similar programs.
Collective learning opportunities	Structure that is in place, which facilitates the sharing of, and learning about evaluations and their findings: a routinely scheduled meeting during which program and evaluation staff convenes to discuss evaluations conducted to date.
Framework	An existing framework that guides the conduct of the program’s evaluation activities.
Memory/repository of evaluations	A centralized, electronic repository (e.g., a shared drive) where findings from past evaluations of the program are stored.
Opportunities for training on evaluation	Staff have access to professional development about program evaluation internal or external to their organization and have participated.
Policies/procedures supportive of evaluation	General procedures or policies within the organization that support evaluation: an expectation that new hires receive an orientation on program evaluation, an expectation that managers and staff (all) are involved in program evaluation, leadership encourages staff to attend professional development in evaluation, and staff have program evaluation training written into their professional development plans.
Resources	Means for supporting evaluation are present: monetary funds, people (internal evaluation team/group, 1/2 time internal evaluator, and external evaluator), external consultants, information technology for data collection, qualitative and quantitative data analysis software, existing data about program, and time for staff to engage in evaluation.
Supervisor(s) engages in and uses evaluation	Supervisor(s) has actively engaged in program evaluation and used evaluation findings to make decisions.
Supervisor(s) supportive of evaluation	Supervisor(s) openly expresses importance of program evaluation and encourages staff to participate in program evaluation and to use findings in their decision-making.
Staffs’ collective attitudes toward evaluation	General attitude of staff regarding evaluation: belief that it provides useful information, that it should be an important program component, that it can add value to the program’s work, and that it can help deliver a better program.
Staffs’ collective knowledge and skills	Number of individuals within a program that have the ability to actively contribute to performing an evaluation by engaging stakeholders, developing logic models, developing evaluation questions, selecting an appropriate evaluation design, determining what data collection methods to use, designing qualitative and quantitative data collection strategies, analyzing qualitative or quantitative data, designing performance metrics, synthesizing/interpreting findings, and communicating and reporting results.
Evaluation Practice
Conduct	Program evaluation performed by internal or external evaluator during the past year.
Share	Internal program staff received interim or final results from evaluation activities conducted on the program 1 or more times during the past year. Internal program staff who were not involved in an evaluation can readily access findings (interim or final) from evaluation activities performed in the past 2 years.
Learn	Frequency with which internal program staff learned something new from the program’s evaluation activities performed over the past year.
Use	Indications of the extent to which instrumental use of evaluation findings as well as process use occurred among program staff in the past year.
Motivation	Reasons for performing evaluation relate to internal program needs for information rather than external requirements or accountability.
Diffuse responsibility	Many, if not all, staff are actively engaged in program evaluation.
Frequency	How often internal program staff or external contractors worked on program evaluation in the past year.
Mainstream/embedded	Evaluation activities are seamlessly integrated into the program activities. Evaluations are ongoing, staff ask questions about whether activities will be evaluated and how, and evaluation is included on agendas of program meetings.

Participants

Programs from the federal public health agency of interest were eligible for inclusion if they (1) had staff who were active in program evaluation (evidenced by their participation in the American Evaluation Association (AEA) annual conference or the AEA/Centers for Disease Control and Prevention (CDC) Summer Evaluation Institute or their membership in a team, branch, or division focused on program evaluation); (2) funded states or territories to implement a public health program that included a requirement for evaluation; and (3) provided technical assistance or guidance on program evaluation to their funded entities. To identify candidates for inclusion, we searched abstracts in the 2011 AEA conference program for presenters from the federal public health agency. A total of eight programs were identified as high priority based upon the extent to which the abstracts mentioned ECB or related activities.

Four programs from the federal public health agency agreed to participate in the study. These programs provided funds for over 200 grantee programs of specific interest to this study, and they assisted in efforts to obtain contact information for a manager and an evaluation point of contact for each program. Two hundred and eighty-nine questionnaires were distributed to individuals, representing 107 pairs. One hundred and sixty-two individuals completed a survey (56%) on behalf of 119 grantee programs. We received completed questionnaires from 43 respondent pairs (40%)—one individual who provided managerial support and another who served as the evaluation point of contact.

Overall, respondents fairly evenly represented evaluation (n = 65, 40%) and managerial perspectives (n = 83, 51%), with some representing both (n = 14, 9%). As shown in Table 2, the majority worked for government agencies (n = 145, 90%) and were not new to working with the grantee programs of interest: only 10% (n = 16) had done so for less than 1 year. Most had master’s degrees (n = 96, 58%) or 4-year college degrees (n = 28, 17%). These individuals most often had degrees in public health (n = 67, 44%) with a concentration in epidemiology (n = 18, 30%) or social and behavioral sciences (n = 17, 28%). Respondents had also received degrees in fields such as nursing, public administration, business, education, and psychology, among others. Most reported “very good” or “good” levels of knowledge about evaluation (n = 115, 71%), though 86% (n = 140) reported spending 50% or less of their time on these activities during a typical workweek.

Table 2.

Respondent Characteristics.

Respondent Characteristic	Frequency (%)
Role (N = 162)
Program director	56 (35)
Program manager/coordinator	52 (32)
Internal evaluator	42 (26)
Data manager	27 (17)
Other	23 (14)
External evaluator	10 (6)
Length of time worked for/with program (N = 162)
Less than 1 year	16 (10)
1–5 years	68 (42)
More than 5 years	78 (48)
Percentage of time spent on evaluation (workweek; N = 162)
Less than 25%	99 (61)
25–50%	41 (25)
51–75%	9 (6)
More than 75%	13 (8)
Type of organization for which they work (N = 162)
Government agency	145 (90)
Academic institution	7 (4)
Other	6 (4)
Nonprofit organization	3 (2)
Private sector	1 (0.6)
Educational attainment (N = 162)
Some college	2 (1)
2-year college degree	2 (1)
4-year college degree	28 (17)
Master’s degree	96 (58)
Doctoral degree	20 (12)
Professional degree (JD, MD)	10 (6)
Other	4 (4)
Years of experience with evaluation (N = 159)
Less than 1 year	17 (11)
2–5 years	64 (40)
6–10 years	40 (25)
More than 10 years	38 (24)
Self-rated knowledge of evaluation (N = 161)
Very good	40 (25)
Good	75 (47)
Fair	40 (25)
Poor	5 (3)
Very poor	1 (0.6)

Grantee programs for which at least one individual completed a questionnaire represented all but five states in the United States, five territories or Pacific Islands, five American Indian/Alaska Native organizations, and two major metropolitan cities. The number of staff within programs ranged from 0 to more than 21. The mean number of program staff was 6 (SD = 4.4), with between two and five staff members being most common (n = 68, 57%).

Procedures

The four federal programs worked with us to obtain contact information for grantees so that we could invite them to participate. The manner in which this contact information was provided differed among the four programs: Two federal programs provided access to a complete listing of their grantee evaluation points of contact and program managers, one federal program referred us to a third-party committee that provided a listing of grantee program evaluation points of contact, and the last federal program requested permission from their grantee program managers for a researcher to contact them. In the last two cases, after contact with an evaluation point of contact or program manager was made, we requested participation from the grantees’ respective counterpart.

Invitations to participate in the online questionnaire were disseminated after the federal program provided the respondent list and distributed an advance e-mail to grantees. Each respondent was provided a unique link to the online questionnaire that allowed the researchers to track his or her progress and completion. All respondents were assigned a unique identification code as well as a “pair code”—a variable that allowed the researcher to match individuals from the same program during data analysis. The majority of individuals invited to participate had at least 1.5 weeks to complete the survey. Between one and two reminders were distributed to each potential respondent.
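
A minimal sketch of how a pair code can be used to match respondents from the same program during analysis follows. The field names (`respondent_id`, `pair_code`, `role`) and the role labels are hypothetical and are not taken from the study's data files. The sketch also assumes that each record carries exactly one role, which would not hold for respondents who represented both perspectives.

```python
from collections import defaultdict


def match_pairs(respondents):
    """Group respondent records by pair code and keep only complete pairs.

    Returns a dict mapping pair_code -> (evaluator_record, manager_record).
    Programs with only one respondent are dropped.
    """
    by_pair = defaultdict(dict)
    for record in respondents:
        by_pair[record["pair_code"]][record["role"]] = record

    return {
        code: (members["evaluator"], members["manager"])
        for code, members in by_pair.items()
        if "evaluator" in members and "manager" in members
    }


# Hypothetical records, for illustration only.
sample = [
    {"respondent_id": "R001", "pair_code": "P01", "role": "evaluator"},
    {"respondent_id": "R002", "pair_code": "P01", "role": "manager"},
    {"respondent_id": "R003", "pair_code": "P02", "role": "manager"},
]
pairs = match_pairs(sample)  # only P01 has both roles
```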

Analytic Procedures

We assigned values to the available responses for each questionnaire item based on our synthesis of the document review and the interviews with ECB scholars. For questions designed to capture information about evaluation capacity, we assigned higher values to responses that are indicative of greater evaluation capacity. For example, respondents indicating that all of their staff members have access to information about practices associated with conducting program evaluation were assigned a value of 2, whereas those indicating some or none of their staff have access were assigned a value of 1 or 0, respectively. Similarly, we assigned higher values to responses that were in greater alignment with the vision scholars expressed about what evaluation practice might look like in an organization with good evaluation capacity.

In some instances, a value was assigned while taking into account a referent value. For example, in the case of questions relating to instrumental use, the baseline rate with which programmatic activities (e.g., making changes to the program) occurred had to be accounted for to properly interpret how frequently one might anticipate program staff using evaluation findings to inform the programmatic activity. Responses of “don’t know” or “not applicable” were recoded as missing values. Additional details regarding the assignment of scores to items can be found in Fierro (2012).
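
The basic value assignment and missing-value recoding described above can be sketched as follows. This is an illustration under stated assumptions, not the scoring code used in the study: the names `NONE_SOME_ALL`, `MISSING_RESPONSES`, and `score_response` are hypothetical, and the referent-value adjustment used for some items is not modeled here.

```python
from typing import Dict, Optional

# Value map for a None/Some/All item, following the example in the text:
# higher values indicate greater evaluation capacity.
NONE_SOME_ALL: Dict[str, int] = {"None": 0, "Some": 1, "All": 2}

# Responses recoded as missing ("don't know" and "not applicable").
MISSING_RESPONSES = {"DK", "NA"}


def score_response(response: str, value_map: Dict[str, int]) -> Optional[int]:
    """Return the numeric value for a response, or None if it is missing."""
    if response in MISSING_RESPONSES:
        return None
    return value_map[response]


all_staff = score_response("All", NONE_SOME_ALL)    # 2
some_staff = score_response("Some", NONE_SOME_ALL)  # 1
unknown = score_response("DK", NONE_SOME_ALL)       # None (missing)
```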

Instances of agreement or disagreement between members of a pair were identified by subtracting the value associated with the manager’s response for a single questionnaire item from the evaluator’s response on that same item. For the majority of questions, agreement was considered complete agreement with a difference score of 0 on the item. However, some questions included a fairly long ordinal response scale (five or more response options available), where a difference of 1 point on the scale may not represent a meaningful difference in an assessment of evaluation capacity or practice (e.g., a 1-point difference on a 7-point Likert-type scale). Three questions capturing information about evaluation practice had qualitative descriptors for each value on an ordinal scale (i.e., very easy/easy/difficult/very difficult/impossible/don’t know; always/usually/about ½ the time/seldom/never). In these instances, we allowed for a 1-point discrepancy in the paired responses as long as those differences were not opposite to each other—for example, if members of a pair responded “very easy” and “easy,” we considered them to be in agreement, whereas if they responded “easy” and “difficult,” they disagreed.
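
One way to express these agreement rules in code is sketched below. The function name, the `tolerance` and `polarity` parameters, and the numeric coding of the ease scale are assumptions made for illustration; the text does not specify how the "opposite" check was implemented, so the polarity map is only one plausible reading.

```python
from typing import Dict, Optional


def classify_pair(
    evaluator: Optional[int],
    manager: Optional[int],
    tolerance: int = 0,
    polarity: Optional[Dict[int, str]] = None,
) -> Optional[str]:
    """Classify one pair's responses to one item.

    The difference score is the evaluator's value minus the manager's value.
    A pair is concordant when the absolute difference is within `tolerance`
    and, if a polarity map is given, both values share the same polarity.
    """
    if evaluator is None or manager is None:
        return None  # a missing response excludes the pair for this item

    diff = evaluator - manager
    agree = abs(diff) <= tolerance
    if agree and diff != 0 and polarity is not None:
        agree = polarity[evaluator] == polarity[manager]

    if agree:
        return "concordant"
    return "evaluator_more" if diff > 0 else "evaluator_less"


# Hypothetical coding: very easy=4, easy=3, difficult=2, very difficult=1, impossible=0.
EASE_POLARITY = {4: "easy", 3: "easy", 2: "difficult", 1: "difficult", 0: "difficult"}

same_side = classify_pair(4, 3, tolerance=1, polarity=EASE_POLARITY)      # very easy vs. easy
opposite_side = classify_pair(3, 2, tolerance=1, polarity=EASE_POLARITY)  # easy vs. difficult
strict = classify_pair(1, 2)                                              # exact match required
```

With this coding, "very easy" and "easy" count as agreement, while "easy" and "difficult" do not, which matches the example given in the text.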

Findings

Across questions about evaluation capacity and evaluation practice, the mean percentage of pairs in which the evaluation point of contact and manager provided the same response was 57%. The range of pairs in agreement was very broad—with a minimum of 19% of pairs agreeing on the answer and a maximum of 98% of pairs agreeing. Most frequently, the percentage of pairs agreeing on the response to a question ranged between 41% and 70% (n = 48, 65% of questions).

When examined by the type of question (evaluation capacity or evaluation practice), the distribution of agreement between members of pairs differed. On average, more pairs agreed on the response for questions designed to capture information about evaluation capacity (M = 62%, SD = 12). For most evaluation capacity items, between 51% and 80% of pairs agreed on the response to the question (n = 34, 78% of questions). Across the 30 evaluation practice questions included in this analysis, the mean percentage of pairs in agreement was 49% (SD = 19), with a minimum of 19% and a maximum of 98%. For most evaluation practice items, the percentage of pairs in agreement fell between 21% and 70% and was fairly evenly distributed across that range (n = 25, 83% of the questions).

Evaluation Capacity Concordance/Discordance

Table 3 provides information for the frequency with which pairs agreed on each question related to evaluation capacity and the direction of disagreement when it occurred. Responses to questions under four of the constructs of evaluation capacity exhibited frequent agreement between members of pairs (i.e., higher than the mean value of 62%). Pairs frequently agreed about whether there was a routinely scheduled meeting for discussing evaluations (n = 27, 66%); whether there was an existing framework to guide the conduct of evaluation activities (n = 27, 68%); whether a centralized, electronic repository was available to store findings from past evaluations (n = 29, 76%); and on the collective attitudes of program staff toward evaluation (78–83% of pairs agreeing).
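
The per-item percentages cited above are the number of concordant pairs divided by the number of pairs with usable responses to that item. The short sketch below recomputes three of them from the counts reported in Table 3. The function and variable names are hypothetical, and the three-item mean is illustrative only; it is not the mean reported across all evaluation capacity items.

```python
from statistics import mean


def percent_agreement(n_concordant: int, n_pairs: int) -> float:
    """Percentage of pairs whose two members gave concordant responses."""
    return 100.0 * n_concordant / n_pairs


# (concordant pairs, pairs with usable responses), taken from Table 3.
items = {
    "routine meeting": (27, 41),
    "framework": (27, 40),
    "repository": (29, 38),
}

rates = {name: percent_agreement(c, n) for name, (c, n) in items.items()}
mean_rate = mean(list(rates.values()))  # mean across these three items only
```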

Table 3.

Frequency of Concordance and Discordance—Evaluation Capacity Items.

Construct and Associated Questionnaire Item(s)	Concordance, Frequency (%)	Discordance: Evaluator Less EC, Frequency (%)	Discordance: Evaluator More EC, Frequency (%)
Access to information about evaluation
Staff members have access to information about best practices associated with conducting program evaluation. (None/Some/All/DK/NA; N = 39)	16 (41)	13 (33)	10 (26)
Staff members have access to information about findings from program evaluations of activities (e.g., interventions, services, or policies) similar to the ones our program conducts. (None/Some/All/DK/NA; N = 40)	16 (40)	13 (33)	11 (28)
Collective learning
Program currently has a routinely scheduled meeting during which program and evaluation staff convene to discuss evaluations conducted to date. (Yes/No/DK; N = 41)	27 (66)	10 (24)	4 (10)
Framework
An existing framework that guides the conduct of your program’s evaluation activities. (Yes/No/DK; N = 40)	27 (68)	4 (10)	9 (23)
Memory/repository of evaluation
A centralized, electronic repository (e.g., a shared drive), where findings from past evaluations of the program are stored. (Yes/No/DK; N = 38)	29 (76)	6 (16)	3 (8)
Opportunities for training on evaluation
Staff members have access to professional development about program evaluation that is offered by program colleagues within our organization. (None/Some/All/DK/NA; N = 32)	18 (56)	6 (19)	8 (25)
Staff members have access to professional development about program evaluation that is offered outside of our organization. (None/Some/All/DK/NA; N = 37)	20 (54)	14 (38)	3 (8)
Staff members have participated in professional development about program evaluation in the past 12 months (e.g., preconference training sessions, webinars, internal brown bags specific to evaluation, or similar). (None/Some/All/DK/NA; N = 39)	20 (51)	12 (31)	7 (18)
Policies/procedures supportive of evaluation
Program currently has an expectation that all new hires will receive an orientation to program evaluation. (Yes/No/DK; N = 38)	20 (53)	15 (40)	3 (8)
Program currently has expectation that all program personnel (managers and staff) are involved in program evaluation. (Yes/No/DK; N = 38; Question 5g)	25 (66)	12 (32)	1 (3)
Staff members are encouraged by leadership to attend professional development sessions about program evaluation. (None/Some/All/DK/NA; N = 39)	17 (44)	16 (41)	6 (15)
Program evaluation training is included in staff members’ written professional development plans. (None/Some/All/DK/NA; N = 32)	18 (56)	6 (19)	8 (25)
Resources
Dedicated funds to support program evaluation. (Yes/No/DK; N = 36)	26 (72)	6 (17)	4 (11)
An internal team or group that is responsible for program evaluation. (Yes/No/DK; N = 42)	30 (71)	8 (19)	4 (10)
At least one internal staff position dedicated to program evaluation a minimum of half-time. (Yes/No/DK; N = 42)	29 (69)	7 (17)	6 (14)
An evaluation contractor who is external to the program. (Yes/No/DK; N = 42)	36 (86)	4 (10)	2 (5)
One or more individuals with whom your program consults with about program evaluation (do not include individuals you counted in Questions 5c or 5d). (Yes/No/DK; N = 41)	21 (51)	10 (24)	10 (24)
Information technology to support the collection of evaluation data. (Yes/No/DK; N = 38)	34 (90)	1 (3)	3 (8)
Which of the following types of data analysis software does your program currently have access to (if any)? (Qual/Quant/Both/Neither/DK; N = 38)	22 (58)	7 (18)	9 (24)
Data about the activities that our program engages in. (None/Some/A lot/DK/NA; N = 41)	27 (66)	9 (22)	5 (12)
Data about the individuals who are directly reached through our program. (None/Some/A lot/DK/NA; N = 41)	25 (61)	10 (24)	6 (15)
Data about populations affected by system-level interventions conducted or affected by our program. (None/Some/A lot/DK/NA; N = 35)	16 (46)	14 (40)	5 (14)
Staff members are provided with time to engage in program evaluation. (None/Some/All/DK/NA; N = 42)	27 (64)	11 (26)	4 (10)
Supervisor(s) engages in and uses evaluation^a
The supervisor(s) is/are actively involved in program evaluation. (Yes/No/DK; None/Some/All/DK; N = 27)	15 (56)	10 (37)	2 (7)
In the past 12 months, the supervisor(s) has/have used data from program evaluation to inform their decision-making process. (Yes/No/DK; None/Some/All/DK; N = 25)	16 (64)	7 (28)	2 (8)
Supervisor(s) is/are supportive of evaluation^a
In the past 12 months, the supervisor(s) has/have openly expressed the importance of program evaluation. (Yes/No/DK; None/Some/All/DK; N = 29)	20 (69)	7 (24)	2 (7)
The supervisor(s) encourage(s) their staff to participate in planning or conducting program evaluation. (Yes/No/DK; None/Some/All/DK; N = 29)	18 (62)	9 (31)	2 (7)
The supervisor(s) encourage(s) staff to use program evaluation findings when making decisions about the program. (Yes/No/DK; None/Some/All/DK; N = 28)	15 (54)	10 (36)	3 (11)
Attitudes toward evaluation (collective)
Please indicate the extent to which internal program staff hold the following beliefs about program evaluation….^b
When conducted well, evaluation can provide useful information for our program. (Not at all/To a very great extent—7-point scale; N = 41)	33 (81)	6 (15)	2 (5)
Evaluation should be a very important component of our program. (Not at all/To a very great extent—7-point scale; N = 41)	32 (78)	5 (12)	4 (10)
Evaluation has the potential to add value to the work we do. (Not at all/To a very great extent—7-point scale; N = 40)	31 (78)	7 (18)	2 (5)
Evaluation has the potential to help us deliver a better program. (Not at all/To a very great extent—7-point scale; N = 41)	34 (83)	5 (12)	2 (5)
Knowledge and skills (collective)
Imagine that your program is about to engage in a program evaluation that will be presented at a professional conference next year. An outside entity is organizing an internal evaluation team comprised of your internal program staff. This outside entity is, however, unfamiliar with the evaluation skill set of your internal program staff. Based upon your understanding of your internal program staff’s skills, please indicate how many of these individuals have the necessary experience to actively contribute to the program evaluation in the following ways. Please include yourself in your responses below (unless you are an external evaluator of this program)
Engage stakeholders in an evaluation (0/1/2–5/>5/DK; N = 40)	27 (68)	7 (18)	6 (15)
Develop a program logic model (0/1/2–5/>5/DK; N = 40)	23 (58)	10 (25)	7 (18)
Develop evaluation questions (0/1/2–5/>5/DK; N = 40)	30 (75)	4 (10)	6 (15)
Select an appropriate evaluation design (e.g., experimental, quasi-experimental, and nonexperimental/observational; 0/1/2–5/>5/DK; N = 39)	21 (54)	8 (21)	10 (26)
Determine what data collection methods to use (0/1/2–5/>5/DK; N = 41)	24 (59)	6 (15)	11 (27)
Design quantitative data collection procedures/instruments (0/1/2–5/>5/DK; N = 41)	21 (51)	9 (22)	11 (27)
Design qualitative data collection procedures/protocols (0/1/2–5/>5/DK; N = 40)	21 (53)	10 (25)	9 (23)
Analyze qualitative data (0/1/2–5/>5/DK; N = 38)	17 (45)	8 (21)	13 (34)
Analyze quantitative data (0/1/2–5/>5/DK; N = 39)	17 (44)	10 (26)	12 (31)
Design performance measures (0/1/2–5/>5/DK; N = 41)	27 (66)	5 (12)	9 (22)
Synthesize/interpret findings (0/1/2–5/>5/DK; N = 41)	23 (56)	9 (22)	9 (22)
Communicate/report evaluation findings (0/1/2–5/>5/DK; N = 41)	26 (63)	8 (20)	7 (17)

Note. EC = evaluation capacity; DK = don’t know; NA = not applicable.

^aTwo questions—one that accommodated contexts in which there was one supervisor and one that accommodated contexts with more than one supervisor—were combined for this analysis.

^bAgreement is defined as no difference or a difference of 1 point on a 7-point Likert-type scale.

The percentage of pairs agreeing was low for all questions contributing to two constructs of evaluation capacity—access to information about evaluation and opportunities for training on evaluation. Members of pairs frequently disagreed about the extent to which staff had access to information about best practices associated with conducting program evaluation (n = 23, 59%) and the extent to which they have access to findings from program evaluations of similar activities to the type their program conducts (n = 24, 60%). They also tended to disagree frequently about the extent to which staff members have access to professional development about program evaluation within (n = 14, 44%) and outside of their organization (n = 17, 46%).

For several constructs of evaluation capacity, the frequency with which pairs agreed on the response to specific questions under each construct varied. Pairs frequently agreed for most questions related to the resources their program has to perform evaluation. High levels of disagreement occurred with respect to whether the program consults with external sources on program evaluation (n = 20, 49%), the types of data analysis software the program currently has access to (n = 16, 42%), and whether they have data about the populations affected by system-level interventions performed by the program (n = 19, 54%). Pairs typically disagreed for the majority of questions designed to assess the extent to which the organization has policies and procedures supportive of evaluation, the collective knowledge and skills of the program staff, and the extent to which supervisors are supportive of evaluation and are engaged in and use evaluation.

When disagreements occurred, the evaluation point of contact was more likely to provide a lower rating of evaluation capacity than the program manager. Specifically, evaluation points of contact provided lower ratings of evaluation capacity than program managers for 30 out of the 44 items (68%). Also, evaluation points of contact frequently indicated lower evaluation capacity than the program manager on items related to policies and procedures associated with supporting evaluation capacity, the ability for staff members to access external program evaluation professional development, availability of data about populations affected by system-level interventions related to the program, and supervisor(s) active involvement in program evaluation.
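The three outcomes tabulated for each item (concordance, discordance with the evaluator rating lower, and discordance with the evaluator rating higher) can be illustrated with a short sketch. The Python below is illustrative only. It assumes that responses are coded as integers on which a larger value means more evaluation capacity, that "don't know" and "not applicable" answers are coded as None and excluded from the comparison, and that a 1-point difference counts as agreement on the 7-point attitude items (Table 3, footnote b). The names classify_pair and tally_item and the example responses are hypothetical and are not taken from the study.

```python
from collections import Counter


def classify_pair(evaluator, manager, tolerance=0):
    """Label one matched pair's responses to a single item.

    `tolerance` is the largest absolute difference still counted as
    agreement (assumed to be 1 for the 7-point attitude items).
    """
    diff = evaluator - manager
    if abs(diff) <= tolerance:
        return "concordant"
    return "evaluator_lower" if diff < 0 else "evaluator_higher"


def tally_item(pairs, tolerance=0):
    """Count outcomes for one item, skipping pairs with a missing response."""
    usable = [(e, m) for e, m in pairs if e is not None and m is not None]
    counts = Counter(classify_pair(e, m, tolerance) for e, m in usable)
    return {"N": len(usable), **counts}


# Hypothetical (evaluator, manager) responses to one 7-point item; not study data.
example = [(6, 7), (3, 6), (7, 4), (5, None), (2, 2)]
print(tally_item(example, tolerance=1))
```

In this sketch the pair with a missing response drops out of N, which mirrors how the number of pairs contributing to each item in the tables varies from item to item.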

Evaluation Practice Concordance/Discordance

Table 4 provides information for the frequency with which pairs agreed on each question related to evaluation practice. Pairs agreed most frequently (n = 40, 98%) for the question associated with one construct—conduct. This is not surprising since we expected all programs enrolled in the study to be engaged in some form of program evaluation. Low levels of agreement for constructs operationalized into 1 item occurred for the frequency with which staff learn something new from evaluation (n = 6, 21%), the extent to which responsibility for evaluation is diffused throughout the staff (n = 17, 40%), and how frequently internal staff or external consultants perform evaluation for the program (n = 16, 39%).

Table 4.

Frequency of Concordance and Discordance—Evaluation Practice Items.

Construct and Associated Questionnaire Item(s)	Concordance, Frequency (%)	Discordance: Evaluator Less EP, Frequency (%)	Discordance: Evaluator More EP, Frequency (%)
Conduct
Internal program staff or external contractors worked on program evaluation? (>Never/Never/DK; N = 41)	40 (98)	1 (2)	0 (0)
Share
Internal program staff received interim or final results from evaluation activities conducted on your program? (Weekly/Monthly/Quarterly/Twice a year/Annually/Never/DK; N = 40)	12 (30)	18 (45)	10 (25)
In general, how easy or difficult do you think it is for internal program staff who do not actively participate in program evaluation to directly obtain information about findings from your program’s evaluation activities that have taken place over the past 2 years (please consider interim or final results)? (Very easy/Easy/Difficult/Very difficult/Impossible/DK; N = 38)^a	29 (76)	6 (16)	3 (8)
Learn
Internal program staff learned something new from your program’s evaluation activities? (Weekly/Monthly/Quarterly/Twice a year/Annually/Never/DK; N = 28)	6 (21)	10 (36)	12 (43)
Use
In the past 12 months, has the internal program staff been engaged in planning or conducting program evaluation? (Yes–all/Yes–some/No; N = 43)	29 (67)	7 (16)	7 (16)
After engaging in, planning, or conducting program evaluation over the past 12 months, how (if at all) did the internal program staff’s understanding, abilities, and attitudes change? (Increased/Remained the same/Decreased)
Understanding of program evaluation (N = 37)	19 (51)	12 (32)	6 (16)
Ability to plan program evaluation (N = 37)	18 (49)	15 (41)	4 (11)
Ability to conduct program evaluation (N = 37)	19 (51)	14 (38)	4 (11)
Positive attitude toward program evaluation (N = 37)	20 (54)	10 (27)	7 (19)
In the past 12 months, with what frequency has the internal program staff performed each of the following activities? (Weekly/Monthly/Quarterly/Twice a year/Annually/Never/DK)
Answered questions about the program (N = 35)	14 (40)	12 (34)	9 (26)
Made changes to the program (N = 32)	8 (25)	14 (44)	10 (31)
Applied for new funding (not continuation funds; N = 29)	12 (41)	11 (38)	6 (21)
Provided evidence for why this program should receive continued funding (N = 31)	10 (32)	10 (32)	11 (36)
In the past 12 months, with what frequency has the internal program staff used findings (interim or final) specifically from your program’s evaluation activities to inform each of the following activities? (Weekly/Monthly/Quarterly/Twice a year/Annually/Never/Don’t know/NA—no evaluation results available in past 12 months)
Answered questions about the program (N = 32)	6 (19)	16 (50)	10 (31)
Made changes to the program (N = 29)	7 (24)	13 (45)	9 (31)
Applied for new funding (not continuation funds; N = 27)	11 (41)	11 (41)	5 (19)
Provided evidence for why this program should receive continued funding (N = 29)	12 (41)	10 (35)	7 (24)
In the past 12 months, has your internal program staff used program evaluation findings (interim or final) to inform or support programmatic activities other than those listed in [previous question]? (Yes/No/DK; N = 19)	7 (37)	5 (26)	7 (37)
Diffuse responsibility
Imagine that an individual was hired onto your internal program staff last week. They ask you who among your internal program staff they should speak with about program evaluation. Which of the following best represents the type of response you are most likely to provide? (No one/One person/A few people/Anyone; N = 43)	17 (40)	23 (54)	3 (7)
Frequency
Internal program staff or external contractors worked on program evaluation? (Weekly/Monthly/Quarterly/Twice a year/Annually/Never/DK; N = 41)	16 (39)	14 (34)	11 (27)
Mainstream/embedded
When new programmatic activities are discussed, how often does at least one internal program staff member ask about whether or not the activity will be evaluated? (Always/Usually/About half the time/Seldom/Never; N = 42)^a	16 (38)	17 (40)	9 (21)
In the past 12 months, how often has program evaluation been included on the agenda of your internal program meetings (e.g., staff meetings)? (Always/Usually/About half the time/Seldom/Never; N = 41)^a	25 (61)	8 (20)	8 (20)
In your opinion, which of the following statements best represents how program evaluation is conducted within the context of your program? (Program evaluation is conducted: Continuously/Frequently/Occasionally/Never; N = 42)	23 (55)	10 (24)	9 (21)
Motivation
Which of the following statements best describes the amount of program evaluation your program currently performs relative to what your funders require? Please base your response on your general understanding of the funding requirements. (We do: More than required/Amount required/Less than required/DK; N = 40)	19 (48)	12 (30)	9 (23)
In general, how influential are each of the following factors currently in your program’s decision to conduct (or fund) program evaluation? (Not/Somewhat/Very/DK)
Be accountable to entities that fund the program (N = 42)	36 (86)	2 (5)	4 (10)
Learn about how to improve the program (N = 41)	31 (76)	8 (20)	2 (5)
Produce information that will be helpful in acquiring additional funding (N = 41)	25 (61)	7 (17)	9 (22)
Respond to health department requests for information about the program (N = 41)	18 (44)	14 (34)	9 (22)
Understand the outcomes/impact of our program for our own knowledge (N = 42)	27 (64)	13 (31)	2 (5)
Understand the outcomes/impact of our program for our funder (N = 42)	29 (69)	5 (12)	8 (19)

Note. EP = evaluation practice; DK = don’t know; NA = not applicable.

^aAgreement is defined as no difference or a difference of 1 point on these ordinal scales when the qualitative response is not in opposition (e.g., easy/difficult, always/usually).

The percentage of pairs in agreement varied on the questions comprising the constructs of mainstream/embedded, motivation, and use. Pairs had fairly high agreement (relative to the mean pair agreement of 49%) for two out of the three questions comprising mainstream/embedded—one relating to the frequency with which program evaluation is included on internal meeting agendas (n = 25, 61%) and another relating to the regularity of evaluation activities in the context of the program (n = 23, 55%). Pairs frequently disagreed, however, with respect to the frequency with which internal program staff ask about whether an aspect of the program will be evaluated (n = 26, 62%).
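The 49% benchmark used above is consistent with an unweighted mean of the concordance percentages across the 30 items in Table 4. The following sketch reproduces that arithmetic. That the benchmark was computed in exactly this way is an assumption, and the short item labels are paraphrases introduced here for illustration.

```python
from statistics import mean

# Concordance percentages for the 30 Table 4 items, in table order.
concordance_pct = [
    98,                  # conduct
    30, 76,              # share
    21,                  # learn
    67, 51, 49, 51, 54,  # use: engagement and changes in staff
    40, 25, 41, 32,      # use: activities performed
    19, 24, 41, 41, 37,  # use: findings used
    40,                  # diffuse responsibility
    39,                  # frequency
    38, 61, 55,          # mainstream/embedded
    48, 86, 76, 61, 44, 64, 69,  # motivation
]

mean_agreement = round(mean(concordance_pct))

# The three mainstream/embedded items compared against the mean.
mainstream = {
    "staff ask whether activity will be evaluated": 38,
    "evaluation on meeting agendas": 61,
    "how regularly evaluation is conducted": 55,
}
above_mean = [item for item, pct in mainstream.items() if pct > mean_agreement]

print(mean_agreement)  # 49
print(above_mean)
```

Two of the three mainstream/embedded items fall above the mean, matching the pattern described in the paragraph above.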

Pairs frequently agreed on many of the questions that suggest the source of motivation for performing evaluation within the program. Over 49% of pairs agreed on five of the seven questions comprising this construct. Lower levels of agreement were seen when members of a pair were asked about whether their program performs more, less, or the amount of evaluation required by their funders (n = 19, 48%) as well as about the level of influence responding to health department requests about program information has on performing (or funding) evaluation (n = 18, 44%).

Extensive disagreement among pairs was seen across the majority of items comprising the construct of evaluation use. The frequency of pair disagreement was highest for questions contributing to instrumental use—including questions that asked about the frequency with which several common activities that could benefit from program evaluation findings were conducted by the program in the past 12 months (59–75% of pairs disagreeing on response) and the frequency with which evaluation findings were used for these activities in the past 12 months (59–81% of pairs disagreeing on response). Pairs were more likely to agree on items relating to process use.
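The disagreement ranges quoted for the instrumental use items follow directly from the Table 4 counts, since every responding pair that is not concordant is discordant. A minimal sketch of that calculation is shown here. The dictionary and function names are hypothetical, and the item labels are shortened from the table.

```python
# (pairs responding, concordant pairs) copied from Table 4.
activities_performed = {
    "Answered questions about the program": (35, 14),
    "Made changes to the program": (32, 8),
    "Applied for new funding": (29, 12),
    "Provided evidence for continued funding": (31, 10),
}
findings_used = {
    "Answered questions about the program": (32, 6),
    "Made changes to the program": (29, 7),
    "Applied for new funding": (27, 11),
    "Provided evidence for continued funding": (29, 12),
}


def percent_disagreeing(n_pairs, n_concordant):
    """Share of responding pairs whose answers were discordant."""
    return round(100 * (n_pairs - n_concordant) / n_pairs)


def disagreement_range(items):
    """Lowest and highest disagreement percentage across a set of items."""
    rates = [percent_disagreeing(n, c) for n, c in items.values()]
    return min(rates), max(rates)


print(disagreement_range(activities_performed))  # (59, 75)
print(disagreement_range(findings_used))         # (59, 81)
```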

Similar to what was found with evaluation capacity, when disagreements occurred, the evaluation point of contact was more likely to provide a lower rating of evaluation practice than the program manager. The evaluation point of contact provided a lower rating for 73% (n = 22) of the questions on evaluation practice. Specifically, the evaluation point of contact was more likely than the program manager to indicate that program staff did not increase in their ability to plan or conduct program evaluation after engaging in evaluation activities in the past 12 months, that the responsibility for evaluation was diffused across fewer program staff, and that understanding the outcomes/impact of their program for their own knowledge was not as influential in the program’s decision to conduct or fund evaluation.
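The count of 22 items (73%) can be recovered from Table 4 by comparing, item by item, the number of discordant pairs in which the evaluator gave the lower rating with the number in which the evaluator gave the higher rating. The sketch below assumes this is how the direction of discordance was tallied. The variable names are hypothetical.

```python
# (evaluator less EP, evaluator more EP) discordant-pair counts
# for the 30 Table 4 items, in table order.
discordance = [
    (1, 0),
    (18, 10), (6, 3),
    (10, 12),
    (7, 7), (12, 6), (15, 4), (14, 4), (10, 7),
    (12, 9), (14, 10), (11, 6), (10, 11),
    (16, 10), (13, 9), (11, 5), (10, 7), (5, 7),
    (23, 3),
    (14, 11),
    (17, 9), (8, 8), (10, 9),
    (12, 9), (2, 4), (8, 2), (7, 9), (14, 9), (13, 2), (5, 8),
]

# Items on which the evaluator was more often the lower rater, and vice versa.
lower = sum(1 for less, more in discordance if less > more)
higher = sum(1 for less, more in discordance if more > less)
tied = len(discordance) - lower - higher
pct_lower = round(100 * lower / len(discordance))

print(lower, higher, tied, pct_lower)  # 22 6 2 73
```

Under this reading, the remaining eight items split into six on which the evaluator more often gave the higher rating and two on which the two directions were tied.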

Other Patterns in Discordance

One pattern that emerged in the data that deserves mention is a low response rate for pairs on questions associated with some constructs. In the case of evaluation capacity, two constructs related to supervisors (engage in and use evaluation, supportive of evaluation) had low response rates—the number of pairs contributing data to the difference calculation ranged from 25 to 29 (58–67% of pairs from whom we received surveys). The low response rate for these questions primarily stems from natural skip patterns in the survey. In order for a respondent to receive these questions, they first had to note that there was at least one supervisor in the program—there were three instances where the evaluation point of contact and four instances where the program manager indicated there were no supervisors. Additionally, members of four pairs did not agree on a screener question regarding the number of supervisors in the program and therefore received separate questions that could not be compared (one specific to programs with one supervisor, another specific to programs with more than one supervisor).

Low response rates for pairs also occurred for the evaluation practice construct of use, specifically questions related to instrumental use. The primary reason for this low response rate is the high frequency with which the evaluation point of contact for the program indicated that they did not know the response to the question at hand. We posed several questions to indicate whether instrumental use occurred: four questions about the frequency with which several common programmatic activities were performed over the past 12 months (i.e., answered questions about the program, made changes to the program, applied for new funds, and provided evidence for continuing funds) and four questions about the frequency with which results from program evaluations were used for each of these activities in the past 12 months. The evaluation point of contact frequently responded that they did not know whether the program performed the activities listed; the frequency of “don’t know” responses ranged from 14% (n = 6) to 22% (n = 9). In contrast, between 2% (n = 1) and 7% (n = 3) of program managers indicated they did not know the response to these same questions. Respondents who indicated that they were external evaluators more frequently responded “don’t know” than those who noted they were internal evaluators.

We saw similar patterns for questions asking about the frequency with which evaluation findings (interim or final) were used to inform these programmatic actions—“don’t know” or “not applicable” responses from evaluation points of contact ranged from 20% (n = 8) to 49% (n = 20). Program managers much less frequently responded that they did not know the response—3% (n = 1) to 12% (n = 5) selected this response for one of the associated questions. Program managers never selected the response “not applicable.” Similar to reports regarding baseline activities, respondents who indicated they were external evaluators more frequently responded “don’t know” regarding instrumental use than those who noted they were internal evaluators.

Does Individual Perspective Matter?

This study provides a glimpse into the potential outcomes of administering a single questionnaire designed to assess evaluation capacity (and potentially related practice) to individuals who hold different roles in an organization. Our findings indicate that who is asked to assess organizational evaluation capacity and practice matters. There were several constructs for which respondent pairs had fairly high levels of agreement, typically with respect to evaluation capacity. However, no single item produced agreement across all pairs and, in general, when agreement was found, it was with 50–70% of the sample. Furthermore, when pairs did not agree on items, evaluation points of contact typically provided less favorable ratings of evaluation capacity and practice than their managerial counterparts.

There are several potential explanations for our study findings. One explanation relates to the extent to which a given item is directly observable by most individuals within an organization. For example, we saw relatively high levels of agreement for several items related to the construct of resources: funds to support evaluation, presence of individuals who perform evaluations (internal or external), and available information technology to support data collection. Many individuals within an organization, particularly the program manager and evaluation point of contact, are likely able to observe whether each of these items exists. Alternatively, we might anticipate witnessing discordance when the respondents in a pair have different day-to-day experience within an organization. For instance, a program manager may be more likely than an evaluator to know whether instrumental use occurred, particularly whether evaluative findings were used to apply for new funding. Such will almost certainly be the case in instances where evaluators are not directly involved in program planning and implementation efforts.

Another factor that may help explain our findings, particularly the less favorable ratings provided for items by evaluation points of contact compared to program managers, relates to differing expectations of program managers and evaluation points of contact about what constitutes a sufficient level of capacity. For some items, a large majority of discordant pairs followed a pattern such as what was found with the item, “Program currently has expectation that all program personnel (managers and staff) are involved in program evaluation (yes/no/don’t know).” In this case, discordance between the evaluation point of contact and manager may be explained by what each respondent counts as a sufficient “expectation” for involvement; an evaluator, particularly one who employs a highly participatory approach, may have higher expectations than a program manager. In addition, these expectations may be the result of differing individual levels of evaluation capacity between the evaluation points of contact and program manager.

Potential Implications for Assessing Evaluation Capacity and Practice

Our findings suggest that methods used to assess organizational evaluation capacity and practice stand to benefit greatly from the triangulation of respondents within an organization and the triangulation of data collection methods. ECB practitioners or researchers might organize dialogues among respondents who have completed an assessment instrument on behalf of the same organization. These exchanges would help elucidate the reasons for discordance and concordance between respondents and as such may highlight more accurately where evaluation capacity and practice is strong and where opportunities for improvements (including at the individual level) exist. In addition, these discussions could serve as an ECB strategy, as participants are likely to learn more about evaluative terminology and what is necessary to support evaluation practice within an organization (i.e., capacity) and arrive at a more representative and accurate vision of what evaluation capacity and practice could look like in their organization.

ECB practice and scholarship may also benefit from employing a mixed methods approach to data collection. Such efforts might include coupling survey administration with direct observations of program activities or document reviews—allowing for the corroboration or expansion of findings. Ideas about obtaining input from a heterogeneous mix of stakeholders and the use of mixed methods are not unfamiliar to evaluators (Alkin, 2012; Shadish, Cook, & Leviton, 1991). However, such principles have not been fully incorporated into how we assess evaluation capacity and the intended consequences within organizations.

Conclusions

Given the regularity with which evaluators now engage in ECB (Fleischer, Christie, & LaVelle, 2008; Fleischer & Christie, 2009; Manning, Bachrach, Tiedemann, McPherson, & Goodman, 2008) and the call to increase evaluations of ECB interventions (Preskill, 2014), it is important for the evaluation community to consider ways to improve how we measure evaluation capacity and the intended outcomes of having this capacity. The directionality of discordance we identified in our study (with evaluation points of contact typically providing less favorable ratings of evaluation capacity and practice than their managerial counterparts when discordance occurred) is consistent with findings from a previous study performed by Cousins et al. (2008). Our collective findings point to a potential for systematically under- or overestimating evaluation capacity depending upon whom one asks to provide the assessment for an organization. Such systematic error can produce a body of ECB research whose studies are not necessarily comparable, or whose puzzling patterns simply reflect how each researcher elected to measure capacity. This suggests a strong call to action for improving the measurement of organizational evaluation capacity-related constructs.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Alkin, M. C. (2012). Evaluation roots: A wider perspective of theorists’ views and influences (2nd ed.). Thousand Oaks, CA: Sage.

Boyle, Lemaire, & Rist, R. C. (1999). Introduction: Building evaluation capacity. In Boyle & Lemaire (Eds.), Building effective evaluation capacity: Lessons from practice (pp. 1–19). New Brunswick, NJ: Transaction.

Cousins, J. B., Elliott, Amo, Bourgeois, Chouinard, Goh, S. C., & Lahey (2008). Organizational capacity to do and use evaluation: Results of a pan-Canadian survey of evaluators. Canadian Journal of Program Evaluation, 23, 1–35.

Fierro, L. A. (2012). Clarifying the Connections: Evaluation Capacity and Intended Outcomes (Doctoral dissertation). Claremont Graduate University, California.

Fleischer, D. N., & Christie, C. A. (2009). Evaluation use: Results from a survey of U.S. American Evaluation Association members. American Journal of Evaluation, 30(2), 158–175.

Fleischer, D. N., Christie, C. A., & LaVelle, K. B. (2008). Perceptions of evaluation capacity building in the United States: A descriptive study of American Evaluation Association members. Canadian Journal of Program Evaluation, 23(3), 37–60.

Goetz, J. P., & LeCompte, M. D. (1984). Ethnography and qualitative design in educational research. Orlando, FL: Academic Press.

Labin, S. N., Duffy, J. L., Meyers, D. C., Wandersman, & Lesesne, C. A. (2012). A research synthesis of the evaluation capacity building literature. American Journal of Evaluation, 33, 307–338.

Manning, Bachrach, Tiedemann, McPherson, M. E., & Goodman, I. F. (2008). American Evaluation Association internal scan report to the membership. Fairhaven, MA: Goodman Research Group. Retrieved from www.eval.org

Milstein & Cotton (2000). Defining concepts for the presidential strand on building evaluation capacity. Retrieved from www.eval.org

Nielsen, S. B., Lemire, & Skov (2011). Measuring evaluation capacity—Results and implications of a Danish study. American Journal of Evaluation, 32, 324–344.

Preskill (2014). Now for the hard stuff: Next steps in ECB research and practice. American Journal of Evaluation, 35, 116–119.

Preskill & Boyle (2008). A multidisciplinary model of evaluation capacity building. American Journal of Evaluation, 29, 443–459.

Preskill & Torres (2000). Readiness for organizational learning and evaluation (ROLE). In Russ-Eft & Preskill (Eds.), Evaluation in organizations (2nd ed., pp. 491–504). Boston, MA: Perseus Books.

Ryan, Gannon-Slater, & Culbertson, M. J. (2012). Improving survey methods with cognitive interviews in small- and medium-scale evaluations. American Journal of Evaluation, 33, 414–430.

Shadish, W. R., Cook, T. D., & Leviton, L. C. (1991). Foundations of program evaluation. Newbury Park, CA: Sage.

Suarez-Balcazar & Taylor-Ritzler (2014). Moving from science to practice in evaluation capacity building. American Journal of Evaluation, 35, 95–99.

Taylor-Ritzler, Suarez-Balcazar, Garcia-Iriarte, Henry, D. B., & Balcazar, F. E. (2013). Understanding and measuring evaluation capacity: A model and instrument validation study. American Journal of Evaluation, 34, 190–206.

Volkov & King, J. A. (2007). A checklist for building organizational evaluation capacity. Retrieved from http://www.wmich.edu/evalctr/checklists/

Willis, G. B. (2005). Cognitive interviewing: A tool for improving questionnaire design. Thousand Oaks, CA: Sage.