Evaluation Policy in a Nonprofit Foundation

Abstract

This study explores the relationship between evaluation policies and evaluation practice. Through document analysis, interviews, and a multiple case study, the research examined the explicit and implicit policies overarching the evaluation work commissioned by the Robert Wood Johnson Foundation (RWJF) and explored how these policies are implemented in the field. This examination of evaluation policies at RWJF has pointed out some significant strengths, including emphasis on the importance of evaluation; collegiality in defining, formulating, and monitoring evaluations; using a variety of evaluation products to communicate results; and use of evaluation advisory committees to strengthen evaluation approaches. However, these policies have evolved somewhat haphazardly over time. Consequently, some written policies are absent or inadequate and some policies are followed with less consistency than others. The findings point to the importance of a comprehensive and integrated set of evaluation policies grounded in intended outcomes and the need for additional studies on this topic.

Keywords

evaluation policy, nonprofit foundations, evaluation goals, evaluation administration

Nonprofits have made strides to become more systematic in describing the effects of what they do, but the public still has very little reliable information that speaks to the results of the programs and interventions (Flynn & Hodgkinson, 2001; Liket, Rey-Garcia, & Maas, 2014). Specifically, foundations are criticized—largely by trustees and executives—for the lack of useful evaluation results (Behrens & Kelly, 2008). These criticisms are partially magnified because of the role foundations play as brokers of funds to social services providers. Porter and Kramer (1999) argued that to compensate for their position as expensive “middlemen,” foundations should create benefit in ways that extend beyond their purchasing power by improving, among other things, the performance of grant recipients and the state of knowledge and practice.

To accomplish these things, foundations need reliable information about how and why the programs that they fund work to effect change—information gathered through evaluations. Evaluation policies concerning preferred methods, levels of stakeholder involvement, available resources, and information about project management will affect the type and quality of data received (Trochim, 2009). Such policies might be by design or unintentional; they may be explicitly articulated or implicitly understood. Even when invisible, evaluation policies both “enable and constrain the potential contributions evaluations can make” (Mark, Cooksey, & Trochim, 2009, p. 3). They can communicate what an organization values and influence the level of resources devoted to evaluation efforts, theoretical predispositions toward evaluation, and how evaluation findings are used (Trochim, 2009).

Despite these assertions that evaluation policy is a critical issue facing the field, there have been limited empirical investigations of these issues to date. The present study provides an in-depth, field-based exploration of evaluation policies, including their content and implementation. Using record analysis, interviews, and case studies, this qualitative study explored the evaluation policies of a particular nonprofit organization to examine how they are implemented in real-world settings. The study was guided by the following research questions:

What are the evaluation policies of the Robert Wood Johnson Foundation (RWJF)? How are they developed and communicated to key stakeholders? How are they interpreted?

What is the extent to which evaluations are implemented as described by the Foundation’s policies? What are the barriers to and supports for successful implementation?

Evaluation Policy Context

Evaluation policy is an important matter facing the field of evaluation today (Trochim, 2009). It encompasses all other aspects of evaluation practice—topics of debate among practitioners and scholars including methods, stakeholders, use, dissemination, and so on. Moreover, evaluation policies have direct effects on what social programs are funded and their perceived value (Datta, 2009). Regardless of how intentionally policies are articulated, every organization that engages in evaluation has evaluation policies, and thus all evaluations are in some way influenced by those policies (Trochim, 2009). But, all too often, too little attention is expressly paid to developing the policies that guide evaluation efforts (Mark et al., 2009).

Evaluation policy has been defined as “any rule or principle that a group or organization uses to guide its decisions and actions when doing evaluation” (Trochim, 2009, p. 16). There has been some consideration in the literature of the specific components that might make up an evaluation policy. Mark, Cooksey, and Trochim (2009) noted that methods are the most obvious candidate for elaboration, but evaluation policy can describe myriad evaluation features. The American Evaluation Association (AEA)’s Evaluation Policy Task Force outlined seven areas of evaluation policy: definitions, requirements, methods, human resources, budgets, implementation, and ethics (AEA, 2007). Trochim’s (2009) evaluation policy framework includes eight complementary, yet slightly different elements. Specifically, he noted that evaluation policy can describe (1) desired outcomes; (2) who is involved in evaluations, when, and under what circumstances; (3) the intention of the organization as building capacity to conduct evaluations; (4) the allocation of resources (financial and otherwise) and oversight; (5) varying responsibilities of participants; (6) procedures; (7) how results will be communicated and acted upon; and (8) how organizations might periodically assess the quality of their evaluations.

These delineations provide useful categories for thinking about the different facets of evaluation policy and practice, but they fail to account for how these components might interact with and influence one another. Using a “systems lens” facilitates broader consideration of how evaluation policies might be constructed and leveraged to inform evaluation practice. Meadows (2008) describes a system as an “interconnected set of elements that is coherently organized in a way that achieves something” (p. 11). This definition captures two important ideas about evaluation policy: (1) It notes the interconnectedness of the various elements and (2) it surfaces the idea that these elements work together with the aim of accomplishing something. In short, a systems approach to evaluation policies encourages a consideration of why organizations prefer and prioritize certain evaluation practices. Constructing evaluation policies as a system geared toward reaching particular outcomes allows for a deeper examination of how evaluation policies influence evaluation practice, and ultimately, how organizations might more deliberately develop evaluation policies.

While the field of evaluation has engaged in considerable speculation about the importance of evaluation policy as it relates to evaluation practice, few studies with empirical evidence exist to support these ideas. Importantly, the AEA (2007) and Trochim (2009) frameworks were both generated from a theoretical rather than empirical standpoint. While some studies have broadly examined the influence of evaluation policies on evaluation practice (e.g., Christie & Fierro, 2012; Summa & Toulemonde, 2002), to date, there has not been an intensive study about how evaluation policies are operationalized in an organization. This research investigated evaluation policy as it is described and enacted in situ. Specifically, this study examined the creation and enactment of evaluation policy at an organization that has made a strong commitment to evaluation and has developed and articulated an evaluation policy. The research provides an empirical complement to the theoretical work that has been done thus far about evaluation policy.

This study sheds light on the ways in which evaluation policies interact with one another and relate to evaluation activities. Furthermore, the research explores the ways in which evaluation policies contribute to organizational learning and the practice of evaluation more generally. Finally, the article offers recommendations supporting the development of evaluation policies as a system rather than a set of discrete tasks and rules.

Research Method

Study Setting

This study was conducted at the RWJF, which has been in operation for more than 40 years, and is the largest philanthropy dedicated solely to the nation’s health. RWJF has long been committed to evaluation efforts and is considered a leader among its peer organizations (Hall, 2004). The Foundation dedicates approximately 20% of its granted dollars to research and evaluation efforts. The Foundation’s tasks are divided into departments, some of which primarily function along program lines (e.g., research and evaluation and communications) and others that provide a broader base of general support (e.g., human resources and information technology). The primary unit of focus for the current study is the Research and Evaluation (R&E) department.¹

RWJF supported this study partially due to its alignment with the Foundation’s organizational philosophy and orientation toward learning. The Foundation, and the Research, Evaluation, and Learning leaders in particular, have an interest in how results from this study might inform and shape future evaluation practice. Furthermore, RWJF has a long-standing tradition of sharing and learning from evaluation results regardless of whether or not they reflect positively on the organization’s funded programs. The present study was no exception. The leadership of RWJF offered valuable feedback on the study design and assisted by providing all requested documentation, making introductions, and scheduling interviews with study participants.

Phase I: Policy Identification Through Records Analysis and Interviews

To understand the RWJF’s evaluation policies, the investigation draws upon a wide array of internal and public records, including reports, meeting minutes, anthologies or retrospective reports, framing documents, videos, and interviews. Other sources of evidence include publications in academic and practitioner journals, press coverage, and external studies about Foundation practices. Thirteen of the sixteen staff members in R&E participated in individual semi-structured interviews to illuminate informally held evaluation policies. Five Foundation leadership members, including the chief executive officer, chief of staff, and the associate vice presidents responsible for the two major strategic initiatives of the Foundation, also participated in the interviews. Participants had been at the Foundation from just below 1 year to nearly 17 years. Interviews ranged from 30 to 60 min. A second round of interviews revealed recurring patterns of themes, thus meeting the criteria of adequacy (Fossey, Harvey, McDermott, & Davidson, 2002).

Phase II: Policy Implementation in Three Case Studies

To explore the field implementation of evaluation policies, three RWJF-funded evaluations were selected for close examination.² The Foundation staff assisted the researchers in the purposeful sampling (Patton, 2002) of these cases to represent a range of complexity and perceived success. Researchers and Foundation staff selected these cases based on consensus concerning strengths and challenges and the availability of information. The researchers explored each case through a combination of document analysis and interviews (Yin, 2009). The Foundation maintains an internal program information and management system, which tracks funded programs and the records produced under its auspices. Documents analyzed in the case studies—including requests for proposals (RFPs), responses to those requests, contracts or agreements, interim and final reports, and published products—were primarily drawn from this system. Key players in each evaluation participated in interviews, including Foundation program management staff, R&E staff responsible for overseeing the evaluations, and external evaluators contracted by the Foundation. Some Foundation leaders and program operators also participated. With few exceptions, case study interviews were 60 min in length and were conducted via telephone. Table 1 presents an overall summary of the cases. Each case is also described in detail.

Table 1.

Comparison of Case Study Elements.

Case	Smooth	Challenging	Complex
Program description	School-based program intended to affect student-level outcomes through inclusive play	Child-care program designed to reduce risk factors promoting drug use through strengthening communities	Building community-wide multistakeholder partnerships to improve patient health and health care
Program start date	1996	1992	2006
Evaluation time period	2004–present	2001–2005	2006–present
Percentage of program funds spent on evaluation	6	32	12
Evaluator selection method	Implementation: sole sourced; impact: invited bid	Invited bid, vetted through evaluation advisory committee	Sole sourced
Implementation evaluation	Compare program implementation across school sites	Document model and implementation processes to inform scale-up	Implementation and impact evaluations are concurrent and ongoing and examine the extent to which communities report quality and the effects that reporting has on consumer decision-making
Impact evaluation	Randomized controlled trial	Quasi-experimental design	(Combined with implementation evaluation)
Findings	Program resulted in positive changes for students and schools	Inconclusive, could not demonstrate clear effects	In process

Case 1: Smooth case

Fun and Games is a school-based, inclusive play sports program designed to increase student engagement and improve school climate. The program received several million dollars from the Foundation to replicate in four cities over 3 years. At the time, Fun and Games had little experience with evaluation, and the program leadership was skeptical of its value. Early on, the Foundation did not attach an evaluation to the project, and the program relied heavily on satisfaction surveys and anecdotal evidence. After the initial 3-year period, the Foundation decided to expand the program to 27 cities through a considerable grant, and it was at this point that it became a significant enough investment to warrant an evaluation.

The Foundation initially conducted a 1-year, mixed-methods implementation study to assess and compare procedures at 12 schools. Once interim findings were reported, the Foundation commissioned a randomized controlled trial (RCT) designed to measure program effectiveness. The R&E program officer worked closely with the implementation evaluator to develop the RFP, which included a requirement that selected evaluators partner with the research center involved in the implementation evaluation. The implementation evaluator helped to determine which firm would conduct the impact study.

The RCT impact assessment included teacher and student surveys, physical activity trackers, and school records. The implementation evaluator submitted a list of anticipated deliverables for the project, including policy briefs and final reports to the Foundation, study participants, and publicly available data repositories. The project is ongoing and has continued to expand. Information from evaluations has been shared via the web, and at the time of data collection, several journal articles were being prepared for publication.

Case 2: Challenging case

The Drug Free Families program sought to prevent substance abuse by strengthening at-risk communities and reducing risk factors through early childhood centers. The program began as a pilot in five geographically diverse sites. Families received case management, peer mentoring, and counseling. Community efforts included advocacy, forums, and community action projects; neighborhood revitalization; and agreements with substance abuse organizations to treat referrals from the program. A pilot evaluation documented the development of the model and the implementation process, with an eye toward broader implementation. Interview participants indicated that the program was struggling to define its activities and develop a concrete model, and in hindsight, the program was not ready for a large-scale roll out or an impact study. Nevertheless, despite some reservations, the Foundation proceeded with the demonstration phase and attendant evaluation.

An evaluation advisory panel oversaw both the program and evaluation designs. The program director, who had a strong research background, also contributed to the evaluation design. The advisory panel members believed that having a more uniform intervention in place across sites would allow for a more rigorous investigation into the program’s impacts. The evaluator from the implementation evaluation offered recommendations to inform the demonstration program and evaluation. Program implementation varied widely from site to site. Moreover, the national political context surrounding the child-care centers changed drastically (including a shift in focus and a significant decrease in funding), and site staff lacked experience with, and capacity for, developing large-scale programs.

Fourteen of the fifteen demonstration sites participated in the impact evaluation. They were matched to comparison sites to facilitate a quasi-experimental design. Unfortunately, program activities were not well aligned to the stated goals and proved difficult to measure. The ambitious scope of the program and the multiple and complex contextual factors that influenced implementation and the potential for impact further complicated matters. Program personnel were concerned that the evaluation model was too “academic,” and that it failed to capture the program’s “real-world impacts.” Evaluators (including the advisory panel) felt that the research design was rigorous and appropriate, but that the inconsistencies across implementation “made it difficult for the evaluators to evaluate the conceptual model uniformly.”

Even though implementation varied widely, most sites were able to execute various strategies with some degree of success, and the program contributed to establishing community partnerships between target organizations and participating child-care centers. On measures of impact, however, there was little or no evidence that treatment sites were more able than comparison sites to bring about changes to families. The program closed after the demonstration phase.

Case 3: Complex case

The Holistic Health program is one of the Foundation’s largest programs and evaluations currently in operation. It “builds multistakeholder partnerships—among insurers, providers, purchasers, and consumers—for the purpose of improving patient health and health care.” The program was initially in 4 markets and is now in 16. The Foundation scaled up after less than 1 year and asked communities to submit proposals for the expansion. The evaluator explained, “[The Foundation] basically said, ‘we are going to scale this up’ before there was ever an opportunity to even learn from the pilots.” Implementation sites and program operators had varying prior experience with evaluation.

Unlike in the other two cases, one evaluation team was commissioned for the duration of the program. Implementation and impact studies were integrated, and the evaluation continually evolved to match the changing and expanding program. The primary evaluator was a university faculty member who was directly identified by the Foundation before any of the individual sites were funded, and ideas for how the program would operate were still nascent. Because the program was so large, the evaluator assembled a team of experts; as many as 40 people have worked in some capacity on the evaluation. Program and evaluation activities are ongoing. The evaluation team has published multiple issue briefs and approximately 60 peer-reviewed articles.

Analytic Procedures

Data analysis occurred in iterative phases during and after data collection. Two existing frameworks informed the coding scheme: the AEA’s definition of evaluation policy and Trochim’s (2009) evaluation policy wheel (the primary components of both are described above). These frameworks identified and defined the key concepts of evaluation policy. The coding scheme was organized according to steps and components commonly found in typical evaluations, from contracting and planning to data collection and analysis, and reporting findings. Four raters assisted in piloting the framework through a preliminary examination of the Foundation’s policy records. The raters were assigned different documents to code. At in-person meetings over a 6-week period, the framework was refined and finalized by deleting, adding, and/or more specifically defining individual codes (Dillman, 2014).

Analysis of records and interviews began with attribute coding (Saldaña, 2009) to note basic descriptive information. The process began with a predetermined “start list” of codes (Miles & Huberman, 1994), which were “revised, modified, deleted, or expanded to include new codes” (Saldaña, 2009, p. 144) as needed. Using Dedoose Version 6.1.18 software, the researchers recorded information about record attributes and then selected excerpts and tagged them with provisional codes. Next, elaborative coding was employed to analyze the data contextually and further develop the theory. This coding approach is appropriate for qualitative studies that build on previous investigations to “support, strengthen, modify, or disconfirm the findings” (Saldaña, 2009, p. 229).

The theoretical propositions that guided the research questions framed the case study analysis (Yin, 2009). The first step was descriptive, presenting a comprehensive account of the relevant features of these cases. Theoretical analysis followed, applying the general coding categories to excerpts from case study documents and interviews. This allowed for a cross-case synthesis according to different evaluation policy domains and facilitated a robust discussion of the implementation of various Foundation evaluation policies.

Findings: The Evaluation Policies of the RWJF

The first set of findings describes the policies that the Foundation has articulated. The RWJF uses instructional videos and documents to communicate its evaluation policies to its employees, evaluators, grantees, and the general public. These sources include explicit policy information, but have various primary foci, including operations, reporting, and stakeholders. It is also common for the Foundation to articulate its evaluation policies more informally. Thus, the findings in this section are drawn from formal records (videos and documents) and also from interviews with Foundation staff.

Evaluation Goals and Assumptions

Formal records and discussions reveal that the Foundation’s primary concern is the use of its evaluation findings, whether in practice or policy. Foundation representatives described the importance of assessing the strengths and weaknesses of the Foundation’s strategies, understanding the impact of its work, spreading effective approaches, ensuring its own credibility, and promoting social change. The R&E staff assume evaluation is a challenging activity that encounters many obstacles as the work unfolds, including the political nature of all evaluation activities. The Foundation also assumes that stakeholder involvement (particularly in the development of evaluation questions) leads to more meaningful results that will be implemented with less resistance.

Foundation discussion of evaluation policy went far beyond these goals and assumptions, however, to address more practical aspects of the work, including decisions about what/when to evaluate, selection of evaluators, design and execution of the evaluation, administration, stakeholders, and reporting of findings. These areas are discussed in turn.

Deciding What and When to Evaluate

The Foundation is strategic about what types of evaluations should be scheduled and when. Implementation evaluations are desirable in early stages of program development. Once a program is mature, the Foundation undertakes outcome evaluations. When multiple mature programs with similar objectives have been operating for a while, the Foundation may conduct a cross-initiative evaluation to enhance understanding of particular ideas or approaches to certain problems.

Earlier in the Foundation’s history, staff considered evaluation as separate from program activities, and evaluations were funded to see if a program “worked” as activities were drawing to a close. A program officer explained that this model “had a certain elegance to it because you get a lot of benefit from hindsight.” By the time results from those evaluations came in, however, it was often too late for them to be actionable, and the Foundation’s priorities and objectives might have changed in the intervening time. More recently, evaluation activities begin in concert with the start of program operations, and evaluators act more as collaborators with the program operators. As a program officer shared, “Once [evaluators] discover something, shouldn’t they share that with the program that we’re trying to make better?” Even though Foundation staff agreed this is a positive change, there are some challenges related to the fact that programs frequently evolve in the early part of their existence.

Evaluations are sometimes tied to program budgets (e.g., any program with an operating budget over US$400,000 would receive an evaluation) or to the connection between the program and the strategic objectives of the Foundation. The Foundation makes every effort to inform programs in careful detail about the expectation that they must fully participate in research and evaluation activities in order to secure Foundation funding.

Selecting and Hiring Evaluators

The Foundation’s written policies address logistical aspects of selecting evaluators, including how RFPs should be developed and the criteria upon which they will be evaluated. The policies also describe the importance of realistic timetables and other project management issues. The Foundation currently dedicates 20% of grant dollars to research and evaluation. While not an explicit policy, this figure is quoted in the framing document that describes the Foundation’s research and evaluation efforts. According to interviews with Foundation staff, budgets for evaluation are determined on a team-by-team basis whether through a top-down or bottom-up approach.

The Foundation assigns one of its R&E program officers to each project being evaluated, and this individual selects an independent evaluator to enhance external credibility and avoid conflicts of interest. R&E program officers identify evaluators either through an invited bid process or by direct contact. There is no clear written guidance about when to use each method; however, there seems to be an implicit preference for the invited bid process. Beyond having a track record for balancing on-time completion of deliverables, there was little discussion of evaluator characteristics in the records. Foundation leadership expressed a concern that the currently available pool lacks diversity in terms of ethnicity, socioeconomic status, and so on. One individual noted it is important for evaluators to match the population they are evaluating; relatedly, with a limited pool, ideas and evaluation approaches are also less diverse. To partially address this concern, at the time of this study the Foundation was identifying and vetting firms for inclusion in an evaluator database, envisioned as a resource for Foundation R&E officers to consult when inviting evaluators to bid on projects.

RWJF evaluations are collaborations among evaluators, Foundation program officers, and Foundation R&E officers. Several program officers noted this provides richer data and better communication, which results in greater program staff buy-in and support of evaluation findings. Previously, there had been a “firewall” between the evaluators and the program. This led to several challenges, however, and so the current collaborative model was introduced.

Designing and Conducting the Evaluation

Codes pertaining to design and conduct of evaluations were the most represented in the overall framework. The specific categories nested within this idea provide a chronological walk through the process of conducting an evaluation—from understanding the program to design, data collection, and analysis.

The Foundation places great value in logic models—visual depictions of program inputs, activities, outcomes (short-, medium-, and long-term), and the logical connections between those elements. The Foundation believes stakeholders should be involved in logic model development to increase clarity about the program and reach a common understanding to inform the evaluation questions: “Good questions, when they are thoughtful and well informed given the range of perspectives that went into developing them, are more likely to yield findings that are useful, relevant, and credible.” R&E program officers generally initiate the evaluation design process, often developing research questions before identifying an evaluator to do the work.

The Foundation views the randomized controlled trial (RCT) as the most credible type of outcome study, although it recognizes that contextual factors may prevent appropriate use of this design. In all cases, however, the Foundation maintains that the approach should be rigorous and geared toward reducing bias. Some interview participants indicated specific methodological preferences based on their training and noted that the intended audience for the evaluation findings might also influence design choices and methods. Not surprisingly, the Foundation prefers claims that are well supported by evidence, conclusions that fit the analysis, and descriptions that include enough information for readers to draw inferences. In short, the Foundation promotes evaluations with findings that are constructive, impartial, useful, relevant, and credible.

Administration

The Foundation expects that proposals will include “detailed information regarding how the project will be organized [and] which employee is responsible for assuring adherence to project schedules, monitoring expenditures, and addressing delays.” Evaluators work with Foundation staff to determine research questions and design but are ultimately responsible for setting the mission and vision of the project. Evaluators are expected to be able to “see both the forest and the trees.”

The Foundation encourages program personnel to play an active part in the rollout of evaluation activities. Programs are expected to provide evaluators with data to support the evaluation and, in some cases, to assist with primary data collection. Program officers do not interfere with evaluation activities once the work is underway, particularly as conclusions are being reached and findings are being shared. The Foundation assumes that more frequent and higher quality communication between program personnel and evaluators creates a smoother and more effective process. RWJF wants to promote reciprocal relationships between the evaluator and the program characterized by trust and mutual respect through an equitable balance of power, timely feedback about the program, formalized work plans and communication protocols, and clear procedures for resolving misunderstandings.

Stakeholders

The Foundation emphasizes giving stakeholders decision-making power in evaluations to promote their eventual use of evaluation findings. Stakeholders might be program managers or developers, or those who are in some other way responsible for the initiative’s success. The Foundation believes that the process of selecting stakeholders to participate in the evaluation should be attentive to the attitudes, beliefs, and knowledge of individuals. Stakeholders ideally provide buy-in and support for evaluation activities and have great interest in the issues being examined. The Foundation values diverse perspectives and cultural, religious, ethnic, and geographical backgrounds. It recognizes that those with more direct experience are better informants about program operations, and their involvement can strengthen and solidify a common program understanding among all relevant groups.

Reporting Findings

The RWJF engages in several types of reporting at the program level and annually across programs. Its policies indicate to whom and how particular aspects of the evaluation should be shared. Of primary importance is “providing high-quality, objective information.” Results should document successes and challenges, measure the magnitude of these effects, and assess what the effects mean for the organization and other parties. Audiences are internal (program officers, management, board of trustees, and key program stakeholders) and external (public and private decision-makers, academics, researchers working in similar areas, the general public, and stakeholders in other programs).

The Foundation stipulates that reports should be formatted in an understandable way, including an executive summary, a brief and clear body, and an appendix that includes the more technical details. The Foundation also seeks to share findings in peer-reviewed journals and through the Foundation website. For larger Foundation objectives, reporting should speak to the health of program development, services to grantees, and the Foundation’s impact on its areas of interest. For program-level reporting, the Foundation provides a specific and detailed outline of an ideal report. At the conclusion of each program and evaluation, a “Program Results Report,” compiled by Foundation officers not directly involved in the particular program or its evaluation, describes what happened in the program and what was found out about it. These reports are shared publicly on the Foundation website.

Overall Evaluation Policies

These findings illuminate specific policy preferences held by the Foundation, but the degree to which they are articulated as such varies across categories. Often, when more consistently articulated policies emerge, there is an explanation underlying why particular choices are preferred. For example, a high degree of stakeholder involvement is preferred because of the assumed connection between involving stakeholders and the ultimate use of evaluation findings. The Foundation prefers more rigorous evaluation designs because of a preference for publishing findings in academic journals. Articulation seems to be strengthened when strong reasoning is provided along with the policies, suggesting that individual explicit articulation of policies might not be as important as consideration of policies as an integrated system grounded in the intended outcomes.

Findings: The Implementation of Evaluation Policy

The case studies have a high degree of consistency in implementation of evaluation policy goal statements. There is, however, less alignment concerning operational policies, which are implemented on a case-by-case basis and are largely informal and unwritten. Evaluation policies are implemented differently across the three cases. As a program officer summarized, “I think the policy is to try to use the best design for the situation.”

That programs interpret policies differently points to an alternate understanding of the utility of evaluation policies. Rather than a specific articulation of rules and guidelines, the Foundation might consider evaluation policies as a system based on guiding principles. Such a system might enhance the utility of evaluation policies without making them too restrictive.

Implementation of Evaluation Policy Goals

The evaluation policy goals reflected in the case studies reveal the Foundation’s emphasis on use of findings. For example, Holistic Health represented a substantial investment of Foundation assets, resulting in significant efforts to use evaluation findings to inform the field. Even though the pilot and implementation study phase of the program was truncated, there were opportunities to use evaluation findings to support program improvements. The evaluator explained, “[The program has] made programmatic changes … I think our work has contributed to their thinking and the program changed as a result.”

Likewise, findings from the initial implementation study for Fun and Games, the smooth case, were used in a variety of ways to inform both specific improvements in program delivery and the strategy used to scale up the program. Part of the reason that this model was so successful was that program leadership was receptive to the information. The evaluator shared: “[The program] didn’t want to hear that everything is great and they’re doing an awesome job, but they really wanted to hear what was working.” The program was also able to use evaluation findings to aid its marketing, funding, and expansion efforts. The evaluator explained—and program personnel confirmed—“There is a lot of collateral impact from that evaluation.”

Drug Free Families proved to be a more challenging example. The context surrounding the program suggested there was an opportunity to directly influence federal policy and appropriations through demonstration of an effective program, and the desire to capitalize on this moment strongly influenced the design of the evaluation. Ultimately, the program director felt the evaluator’s interests in using the findings to influence high-level policy decisions were at odds with the Foundation’s more practical interests. The ability to capture more nuanced findings that suggested positive program impact was sacrificed. The findings could not support the larger political aim, and the program director’s interests for what the evaluation could accomplish limited opportunities for the program and Foundation to learn from the evaluation.

If we were, instead, to consider evaluation policies as a system, the evaluation policy goals would anchor the development of the remaining guiding principles for evaluation practice. The RWJF is a model in this area: the Foundation’s commitment to what it hopes to achieve through evaluation is consistently understood and implemented by staff. The Foundation could use these widely understood goals as the basis for developing an evaluation policy system, consisting of interrelated components that detail various aspects of evaluation practice. The evaluation policy system would then describe how the RWJF’s preferred evaluation practices are intended to work together to achieve the stated goals.

Implementation of Policies on Deciding What and When to Evaluate in Practice

As noted earlier, it is now Foundation policy to begin evaluations with the start of program activities. This worked well for Fun and Games. An initial implementation study was commissioned to inform the program as it moved into the demonstration phase; as this occurred, an impact study was commissioned, informed by the earlier work. The Foundation required that the implementation evaluators be included as partners in the impact study to ensure a smooth transition across the two efforts. In the more complex case of Holistic Health, the program scaled up much faster than originally planned, which meant the evaluation had to be modified significantly to account for programmatic changes. Although ultimately everyone was very satisfied with the process, it was not as smooth as in the Fun and Games case.

The evaluation of the Drug Free Families program occurred prior to the modification in the policy, and thus, the evaluation was commissioned after program activities were underway. The Foundation first funded an implementation study when the program was in its pilot phase at five sites. An impact study followed, once the program expanded to 15 sites during the demonstration phase. The implementation evaluation results had very little bearing on the program scaling efforts or the impact evaluation that followed. The order of implementation study followed by impact study did technically follow the Foundation’s policy, but it did not play out successfully in the field.

This examination of the implementation of policies related to deciding what and when to evaluate affords insight into what happens as evaluation policies change. The case studies show the evolution of the timing of evaluations related to the launch of a program—a decision that was again rooted firmly in an underlying philosophy. This philosophy—that evaluations are better suited to facilitate program improvements when they start as programs are initially implemented—serves as a guiding principle for the decisions about evaluation that follow.

Implementation of Policies on Funding and Selecting Evaluators

Foundation research and evaluation framing documents indicate that 20% of overall spending should be devoted to research and evaluation efforts (although the division of monies between these two activities is not specified). In the smooth case, Fun and Games, evaluation costs represented 6% of the overall budget. In the challenging case, Drug Free Families, 32% of total costs went to evaluation, and in the large and complex case, Holistic Health, evaluation costs made up 12% of the overall budget. These percentages do not tell the complete story, however, as they represent portions of program budgets that vary considerably. Considering the amount spent in actual dollars across the three evaluations reveals that nearly 4½ times more resources were expended on the complex case than on the challenging case and nearly 13 times more than on the smooth case.

Fun and Games, the smooth case, had two separate evaluators. The evaluator for the implementation study was sole sourced by the R&E program officer based on expertise and experience. The impact study was contracted through an invited bid, and RFP development was led by the R&E program officer in collaboration with program personnel and the evaluator from the implementation study. The selected evaluators were required to work with the research center that conducted the implementation study. A team member explained, “Because we were so familiar with the program already, [the Foundation] wanted to keep us in there.” According to the R&E officer, this involvement meant the evaluator “would be comfortable immediately so you wouldn’t have all kinds of problems.” Once the evaluator was selected, the Foundation and the evaluator agreed on a “précis” that described the scope of work, evaluation activities, project deliverables, and budget.

The more complex Holistic Health evaluation was sole sourced. When the Foundation was doing some advance work to introduce the program, the evaluator was informally consulted on program design and, at one point in this process, was asked to write an evaluation proposal. Over the life of the program and the evaluation, there were three separate contracts. Each was followed with a précis following the same basic format described above.

For Drug Free Families, the evaluation advisory panel oversaw the selection of the evaluator. They interviewed two teams, and the process was described as both formal and participatory, involving the advisory committee and the program director. According to one Foundation leader, “Everything about this program was elaborate … It was a big investment and it was something that everybody cared about.” Of the three cases, this was the most rigorous selection process with the greatest number of proposals (10–12) solicited. This case most closely matched the Foundation’s stated preferred method for contracting an evaluator. The evaluation also ended up being the most problematic of the three investigated in this study, however, in part because, as the program director explained, the evaluator was not the right fit. Furthermore, of the three programs, Drug Free Families dedicated the greatest proportion of its resources to evaluation, while the evaluation with the smallest relative budget was the smooth case. Although it is not possible to generalize based on only three instances, this does suggest that, in some cases, evaluation funding level might not be an important determinant of evaluation quality.

Selecting evaluators, however, is an area that might benefit from a higher-level, intentional systems approach to developing evaluation policies. Enforcing specific rules that dictate the exact process for selecting evaluators would prove too cumbersome. Rather, the Foundation could focus its energy on making decisions about the conditions in which it desires particular approaches to selecting evaluators as well as the ultimate goals for the particular evaluation. This could bring more consistency to the selection of evaluators and potentially allow for more diversity in the pool of people and organizations that conduct evaluations.

Implementation of Policies on Designing and Conducting Evaluations

The design of the evaluation for Drug Free Families was challenging from the outset, in part because the ultimate desired outcomes (less substance abuse) could not be immediately ascertained. The program’s interventions were directed at very young children as a means of preventing substance abuse later in their adolescent and adult years. As one member of the Foundation leadership put it, “That’s sort of an evaluator’s nightmare: The ultimate effects are in flux.” Drawing on input from the program office and the advisory panel, the evaluation instead sought to measure intervening indicators at the family and community levels.

The most rigorous aspect of the Drug Free Families evaluation was a quasi-experimental impact study that used matched communities to measure process and impact. According to the evaluator, “We couldn’t do random assignment, but we wanted to have the strongest design in order to be able to have fairly strong conclusions at the end.” Foundation program and R&E officers, the chair of the evaluation advisory panel, other panel members, the evaluator from the implementation study, and the impact evaluators were all involved in the design development, which was described as a “negotiation.” The evaluator explained, “there was a lot of evolution of it before the design was finalized.” Ultimately, even though the results were disappointing, everyone except the program director considered the evaluation design to be strong. (The program director felt that the design was not sensitive enough to capture change.)

Fun and Games was initially evaluated through an implementation study described as “pretty open ended” and “qualitative.” The 1-year study was designed collaboratively among evaluators and program staff. The evaluator explained, “If your work is going to be useful, you have to find out from the partners what will be helpful to them.” The results informed the design of the impact study, which maintained a focus on program processes and added aspects of outcomes. The impact study design was finalized after the evaluator was selected. The Foundation, implementation evaluator, and program personnel were all involved in this process. A Foundation officer explained, “I think that’s a great lesson learned for how to think about structuring evaluations …. It’s not an evaluation designing the program; it’s an evaluation working alongside of the program strategy being developed.” According to the program director, “the goals were to document the outcomes using a randomized design so that we could communicate to the world about our impact.” Structures were in place to support the RCT design and to allow schools to be matched to one another prior to randomization. The design was further influenced by the desire to publish results in a national database. To qualify for the database, research needed to take place in 32 schools (vs. the original 20). After a presentation to Foundation staff, the R&E program officer was authorized to increase the evaluation budget to include the larger number of schools.

The evaluation of the Holistic Health program utilized a quasi-experimental design (often employed when randomization is not possible or appropriate). A specific challenge in this case was that the initiative was still evolving. The evaluator explained:

It’s trying to make academic and scientific sense of the messy real-world realistic evaluation …. This isn’t an experiment. Nothing is randomized here. So we’re trying to … [do] it in a way that is scientifically credible and will meet the merits of peer review.

The evaluation was designed collaboratively by Foundation staff, the program office, and evaluators but was also informed by the evaluator’s own goals: “The goal was to design a study that would have scientific merit and allow us to publish.” The early introduction of the evaluator facilitated participation in the early discussions about what the program was supposed to accomplish, allowing for easier adaptation to changing circumstances, such as when the Foundation decided to scale up the project in under a year.

Designing and executing evaluations is an area in which overly specific policies might prove to be more of a hindrance than a help to the Foundation’s evaluation efforts. Instead, a set of guiding principles that outline decisions that need to be made and the criteria that should be considered when making those decisions would better serve the Foundation’s efforts. These criteria could be rooted in some of the other policy areas, including goals and stakeholder involvement, to further strengthen the evaluation policies as a system.

Implementation of Policies on Collaborating and Communicating

In the Holistic Health program, a Foundation leadership member described the collaboration as “a constant engagement of people at the Foundation, between the people who are managing the program and the people who are evaluating it.” Fun and Games also benefitted greatly from collaboration largely because of the relationships between the implementation and impact evaluations and the ongoing participation of the program director in both evaluations. Relationships were characterized by respect and a focus on common goals.

Collaboration efforts were generally supported by ongoing communication typically brokered by the Foundation. All of the evaluations had some sort of preestablished plan about how and when various parties would communicate about evaluation process and progress. In all cases, there was generally more contact in the beginning when evaluations were being designed and at the end when devising a reporting strategy. In the Drug Free Families program, the points of contact increased in response to some of the difficulties experienced, but the collaboration fell apart as results from the evaluation were disappointing to both the Foundation and the program. Some evaluators across cases indicated that this level of communication was unusual compared to their other evaluation experiences. One noted,

I was used to the sort of prior grants where you get the grant, and then you’d go do your thing. I didn’t want anybody being able to influence that at the Foundation. But after going through it, it was critical because we would need information on program changes.

An example of how policies operate implicitly regardless of their explicit articulation is evident in that the Foundation clearly promotes regular and ongoing communication between evaluators, program operators, and R&E officers. Even though this area of evaluation is functioning consistently at the Foundation, articulating guiding principles about the expected frequency and content of communications might still prove beneficial. As an example, evaluators bidding on the contract would have more information that could help them develop their scope and budget for the work with greater accuracy.

Implementation of Evaluation Policies on Involving Stakeholders

The emphasis on specific strategies for stakeholder involvement in Foundation records was not mirrored in the case studies. Rather, involvement was determined based on specific program needs and the context at hand. In the Holistic Health program, for instance, R&E program officers identified the Foundation, the program communities, communities doing similar work, researchers in health systems, federal policy makers, and those trying to improve the quality of health systems as stakeholders. The evaluator largely echoed this list. The relationships between the evaluator and the intervention communities evolved naturally, as there was frequent and close contact over an extended period of time. The evaluator explained how the communities were engaged in the study, specifically through “sharing our research findings, and meeting requests that [the communities] have for information.” The evaluation team worked with stakeholders to share findings in speeches or at board meetings and consulted them for “feedback on different aspects of the evaluation and its design.” The R&E program officer said the evaluator had good relationships with Foundation staff and the national program office, and that developing these relationships was critical because it provided a means for collecting data. The program officer “worked really closely with the evaluator and his folks and tried very hard to facilitate direct contact between the program folks and the evaluator.”

In the Fun and Games case, the two evaluators identified similar lists of stakeholders, including the program, the Foundation, schools, and policy makers. In the implementation study, the program was considered the primary stakeholder, whereas in the impact study, it was the Foundation. The R&E officer assisted in building relationships between groups early in the process. The nature of relationships and extent of involvement with other stakeholder groups were less clear. In some cases, the evaluator relied on school administrators for access to data, and evaluators provided district-level reports when interest warranted it. Relationships seemed to unfold as needed rather than as a result of a comprehensive stakeholder engagement strategy.

Many of the same broad categories of stakeholders were named in the Drug Free Families program, but there was a more explicit and direct focus on the federal government because the program was trying to influence and shape public policy. The relationship between the program director and the evaluators was contentious. Despite being directly involved in the selection of the evaluation team, the program director felt that the evaluation was not meeting the program’s needs. The program director had very specific goals in mind related to advocating for the program to the federal government. Despite oversight and involvement from an advisory panel that mediated some of the conflict, and the evaluators’ concerted efforts, the relationship issues were never resolved.

In considering involving stakeholders, we see an example of where evaluation policies are very explicit, yet do not lead to consistent implementation. However, inconsistent implementation does not suggest a failure of the policy. Involving stakeholders in evaluation activities is a strength of the Foundation, as seen in all of these cases. The inconsistent implementation instead suggests that policies as they are articulated might not be flexible enough to tolerate necessary modifications in stakeholder involvement across evaluation efforts. This again points to a need for guiding principles and a system rather than overly codified evaluation policies.

Evaluation Policy Implementation for Reporting Findings

The information gathered through the case studies suggests that many reporting decisions lie with the evaluators. In proposals, evaluators are asked to outline their deliverables. The examples consulted described expected deliverables in broad and vague terms, although they did demonstrate that the Foundation prioritizes the publication of results in academic or peer-reviewed journals. Other types of reporting included interim and annual reports, policy briefings, presentations, and reports tailored to communities that received the intervention.

In Holistic Health, the primary method of reporting has been peer-reviewed journal articles. The evaluator estimated that they have produced approximately 60 publications. Not all are related to evaluation findings per se, but the articles build an evidence base around a strategy, which is a Foundation priority. The evaluation team has also submitted interim and annual reports to the Foundation and prepared issue briefs available on the web. They have made presentations to treatment communities and produced customized community-specific summaries. The R&E officer noted the evaluation team decides where to publish their findings.

In the smooth case, Fun and Games, the evaluator described the deliverables as “an interim, final report, and three issue briefs, multiple presentations, and a research article for a journal.” The evaluator reports findings simultaneously to the program and the Foundation. Additionally, several school districts have requested specific reports from the impact study. Much of the reporting has been geared toward policy makers, however. A Foundation leadership member noted, “we’ll have briefings with policy makers on capitol hill so that they know of the program.” In keeping with the overall feel of the project, the process of reporting findings has been collaborative. The evaluators made initial decisions about the content and structure of policy briefs, and the Foundation helped make them more publicly accessible in terms of length, appearance, and language. The program director explained that this was not without challenges: “Evaluators want to speak very carefully and conservatively about what the findings mean, and we [the program] want to speak very liberally and broadly.” The Foundation R&E officer has played an important role in helping the program and the evaluators determine language that met the needs of both parties.

The overall difficulties experienced in Drug Free Families extended to the reporting phase as well. Each year, the evaluators submitted an annual report. The chief product available that summarizes this program is the Program Results Report, which is on the Foundation website. There were far fewer products submitted for peer review than in the other two evaluations. When the final evaluation report was submitted, the evaluators offered to meet with the program office, but that meeting was never held due to exigent circumstances.

Reporting is another area where RWJF has been more intentionally explicit. However, while the expected reporting format for the final report due to the Foundation is clearly outlined, instructions are more implicit when it comes to dissemination beyond the Foundation. More guidelines about Foundation expectations might assist evaluators with their planning. Furthermore, upfront guidelines and conversations about dissemination activities would surface considerations of audience for eventual findings, which might influence evaluation design issues.

This examination of evaluation policies at RWJF has pointed out some significant strengths that include: a clear emphasis on the importance of evaluation; building evaluation into the major social improvement initiatives; very strong collegiality in defining, formulating, and monitoring evaluations; using a variety of evaluation products (reports, briefings, and point papers) to get evaluation results to users; formal and enlightened report writing guidelines; and use of evaluation advisory committees to strengthen evaluation approaches and methods. On the other hand, these policies have evolved piecemeal over time rather than as a result of a deliberate evaluation policy strategy. As a consequence, some written policies are absent or inadequate, and some policies are followed with less consistency than others. This might be an opportune time for the Foundation to reflect and engage in discussions about the trade-offs of formality, uniformity, and flexibility. An approach to building evaluation policies as a system with guiding principles rooted in why certain activities are more desirable than others would facilitate this conversation.

Evaluation Policies as a System

This study has implications for evaluation policy across several levels: (1) the ways individual policies affect evaluation practice, (2) the implications policies have at an organizational level, and (3) evaluation policies as a system.

Examining individual policies offers an opportunity to explore how those policies affect evaluation practice. For example, the Foundation’s policy of selecting evaluators through sole sourcing or RFPs sent to preselected evaluators or firms means they deal with a limited pool of evaluators. Again, this practice makes sense: as one program officer put it, “this isn’t amateur hour,” and this approach minimizes the risk that the evaluators will be unqualified. In the most severe cases, an un- or underqualified evaluator may result in evaluation malpractice and wasted investment in very high stakes settings. Furthermore, the processes involved in running completely open bids are time consuming and resource prohibitive (in both time and staff).

However, these concerns should be weighed against the consequences of drawing from a limited evaluator pool—that it may result in a lack of diversity in ideas. Moreover, research design ideas and evaluation questions almost always originate with the Foundation R&E officer. In spite of back-and-forth between the evaluator and the program to finalize the questions, there was no evidence in the case study data that questions ever changed dramatically. Thus, a similar group of people asks a similar set of questions, just across different contexts. Not only does this stand in contrast to the Foundation’s articulated appreciation for diversity, but a potential to explore innovative ideas and approaches is missed. The challenge here is developing policies that support a balance between ensuring high quality and resource efficiency in high-stakes evaluations and allowing enough flexibility to support diversity and innovation.

In considering the evaluation policies at an organizational level, we discover implications that speak to how organizational functioning is shaped through evaluation policies. Tremendous consistency runs across the Foundation’s documents and personnel when it comes to describing evaluation policy goals; however, evaluation operational decisions are decentralized. Each R&E program officer was able to make evaluation activity decisions based on his or her interests and objectives and the particular context at hand; they were not directly guided by a policy, which explains in part the variation across the three case studies. This approach makes sense for the Foundation, which hires experts with extensive training and experience. Rather than prescribing evaluation activities in a top-down, cookie-cutter fashion, the Foundation trusts that these experts will draw on their expertise to inform the best design and activities for a given situation. Overcodification might limit evaluators’ ability to be responsive to particular conditions. The result, however, is that when learning happens about evaluation practice, it happens in silos. While improving practice based on reflection is important, it can be challenging from an organizational perspective. The Foundation does not seem to have a mechanism to support learning about evaluation policies and practice in a top-down fashion or a way to benefit from shared learning.

The Foundation does have structures in place to support learning in other arenas. For example, R&E staff review findings from evaluations and survey key informants to prepare annual reports designed to help the Foundation learn about its progress toward strategic goals. These reports are made public and discussed internally at the Foundation. By contrast, much of the experiential knowledge gained from participating in research and evaluation lies with a single individual and is typically only shared informally.

Another way to examine evaluation policy implications is to consider the policies as a system: a set of connected points that make up a more complex whole and work together toward specific objectives (Meadows, 2008). In this way, the policies themselves might not be the most important unit of analysis. Rather, the connections within policies and between policies and practice are important places to focus attention. Thinking of policies as a system has implications for both their development and the ways in which they bear on evaluation practice. To construct policies from a system perspective, rather than generating a series of isolated rules and guidelines, individual policies should be grounded in their desired outcomes.

This study offers several insights into this potential systems approach for developing evaluation policies. Overall, whereas the Foundation was very consistent in its language around evaluation policy goals, the operational policies were inconsistently implemented; goals and operations were two disconnected facets of an evaluation policy rather than a unified system. For example, the Foundation had significant and carefully detailed policies about stakeholder involvement, but those were inconsistently followed in practice. Instead, program officers and evaluators valued stakeholder involvement but facilitated it in different ways depending on the situation. A systems approach to evaluation policy would be tolerant of these variations, but be grounded in why stakeholders should be involved. For example, a why statement about evaluation policies governing stakeholder involvement might read, “evaluation findings are more likely to be used when stakeholders are involved in the evaluation process.” In an evaluation policy system, this statement describes the desired outcome (use of evaluation findings) that would guide and connect the operational policies around involving stakeholders. Even though involving stakeholders in evaluation practice is widely regarded as a good idea on its own merits, connecting an explicit stakeholder involvement policy to a why statement with an explicit goal points to how their involvement might be realized. Specifically, evaluators could decide how to involve stakeholders by considering which forms of involvement would best promote use of findings. In a systems approach to building and enacting evaluation policy, a key aspect of implementing and enforcing it would be a constant interrogation of the connections between policy goals and operational decisions.

The smooth case study offers another example of how this policy system might work. In this case, the way that the implementation and impact evaluations worked together profoundly and positively affected the evaluation process. Rather than writing isolated rules that suggest closely coordinating implementation and impact evaluations as a best practice, there could be more explicit policies about how implementation and impact evaluations should work together, rooted in the why: that impact evaluations are strengthened when they are attentive to implementation issues. Thus, the implementation of a related policy would raise the question of how and in what ways this particular evaluation could be strengthened through closer connections across implementation and impact studies. The questioning of the connections between the goals and the actions taking place to fulfill them would serve as the mechanism that promotes the health of the evaluation policy system.

Finally, an evaluation policy system might have prevented the challenging case from ever happening. In this instance, there could be a basic policy goal detailing the why, that is, a reason for ensuring that programs are evaluable: evaluations can only produce accurate and actionable results when the evaluand is ready for rigorous study. In the challenging case, if the program officer and/or evaluator had constantly and carefully considered how they might ensure the evaluand’s readiness, they might have realized the practical impossibilities much sooner and been able to stop the study or take a different evaluation approach entirely (examining processes and implementation in more depth rather than impacts). However, this may be an oversimplification; in some cases, the political context may be too powerful a force for thoughtful evaluation policies to counteract. The challenging case is a reminder that even the most careful and well-thought-out evaluation policies exist within a larger context that bears on policy implementation.

Implications and Directions for Future Research

This study scratches the surface of our understanding of how evaluation policy affects practice. Although individuals who are hired to conduct evaluations will have, at best, limited ability to shape the policies that guide their work, an awareness of how those policies function and influence practice is helpful. For example, understanding an organization’s evaluation policy goals may help evaluators better respond to RFPs. Beyond this immediate implication, however, are broader ramifications for the practice and study of evaluation.

Constructing evaluation policies as a system within the organization could support organizational learning around evaluation practice. For example, an internal meeting at the conclusion of a project could provide a forum for sharing lessons that can then be synthesized and shared organization-wide. Explicit policies about this type of information sharing would help ensure that it takes place.

Future research could explore how specific areas of evaluation practice interact with explicit evaluation policies. It would also be worthwhile to explore how evaluation policy can be codified while allowing flexibility that supports necessary adaptations in the field. Likewise, a simulation study in which evaluators are assigned to separate conditions—one where the evaluation is guided by very explicit policies over certain areas and the other in which fewer rules are imposed—and then asked to design evaluations could reveal how policies affect evaluation design. Furthermore, this type of study could reveal the effects of policies on program, organizational, and/or stakeholder engagement in evaluation in addition to the perceived quality and usefulness of evaluation practice and how evaluation results might eventually be used. The work presented here also points to more applied areas of future research: A similar study in a different nonprofit organization might help to replicate these results, and attention could also be turned to federal or state bodies that commission evaluations to build understanding of what evaluation policies mean for practical work.

Limitations

All types of inquiry have limitations. First, the study approach taken is potentially subject to researcher biases and particular ways of viewing the world. To address this, systematic records of all data collection efforts and sources were kept and different types of data sources were used within each phase of the study to triangulate findings. The analysis was also informed by previous scholarship, and the conclusions are linked to extant theories. Second, the three selected cases present only a snapshot of how policies were enacted within a particular context, and it is challenging to generalize these findings to the practices of the entire Foundation. Likewise, RWJF is a unique organization, so these findings cannot be directly applied to other settings. Nevertheless, the conclusions offer insight into how, broadly speaking, evaluation policies might help influence evaluation practice. Third, Foundation staff provided access to internal documents and interview participants, leaving the study open to questions about whether these data might be biased. This potential limitation was initially addressed through the researcher establishing criteria for case selection that included examples of both positive and negative evaluation experiences. The Foundation supported these criteria and facilitated selection of the challenging case, which was not a reflection of the Foundation’s best efforts, as acknowledged by individuals within and outside of the Foundation. Additionally, one of the case studies concluded several years ago and some participants did not remember specifics. Thus, interview responses were corroborated by document analysis. Finally, the researcher presented both intermediate and final study findings to representatives at the Foundation to member-check results. The Foundation representatives offered additional insight into interpretations that are reflected in the study’s conclusions.

Conclusion

Program evaluation plays a significant role in aiding our understanding of the effectiveness of interventions. Evaluations operate under the auspices of evaluation policies that shape aspects of evaluation design, including research questions, data collection and analysis procedures, and reporting of findings. This study suggests that evaluation policies affect how evaluators and foundation staff do their work, but that this effect differs across programs and settings. Developing evaluation policies as a system might allow for both appropriate guidance and sufficient flexibility in their implementation. Designing organizational evaluation policies that are internally integrated and grounded in their intended outcomes could have great potential to increase the usefulness of evaluation work in these settings.

Footnotes

Acknowledgments

We thank the Robert Wood Johnson Foundation for the opportunity to conduct this research and specifically Laura Leviton and Denise Herrera for their support, guidance, and comments that greatly improved the article. We also thank the reviewers of this article who offered very thoughtful insights about the work, which resulted in a much improved article.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

American Evaluation Association. (2007). AEA evaluation policy task force charge. Retrieved from http://www.eval.org/EPTF.charge.asp

Behrens, T. R., & Kelly (2008). Paying the piper: Foundation evaluation capacity calls the tune. New Directions for Evaluation, 2008, 37–50. doi:10.1002/ev.267

Christie, C. A., & Fierro, L. A. (2012). Evaluation policy to implementation: An examination of scientifically based research in practice. Studies in Educational Evaluation, 38, 65–72. doi:10.1016/j.stueduc.2012.05.003

Datta (2009). Golden is the sand: Memory and hope in evaluation policy and evaluation practice. New Directions for Evaluation, 2009, 33–50.

Dedoose (Version 6.1.18). (2015). Web application for managing, analyzing, and presenting qualitative and mixed methods research data. Los Angeles, CA: Sociocultural Research Consultants, LLC. Retrieved from www.dedoose.com

Dillman, L. M. (2014). Alignment between intention and implementation: A case study of the Robert Wood Johnson Foundation’s evaluation policies (Unpublished doctoral dissertation). University of California, Los Angeles, CA.

Flynn, & Hodgkinson, V. A. (2001). Measuring the contributions of the nonprofit sector. In Flynn, & Hodgkinson, V. A. (Eds.), Measuring the impact of the nonprofit sector (pp. 3–16). New York, NY: Kluwer Academic/Plenum.

Fossey, Harvey, McDermott, & Davidson (2002). Understanding and evaluating qualitative research. Australian and New Zealand Journal of Psychiatry, 36, 717–732.

Hall, P. D. (2004). A historical perspective on evaluation in foundations. In Braverman, M. T., Constantine, N. A., & Slater, J. K. (Eds.), Foundations and evaluation: Contexts and practices for effective philanthropy (pp. 27–50). San Francisco, CA: Jossey-Bass.

Liket, K. C., Rey-Garcia, & Maas, K. E. H. (2014). Why aren’t evaluations working and what to do about it: A framework for negotiating meaningful evaluation in nonprofits. American Journal of Evaluation, 35, 171–188.

Mark, M. M., Cooksy, L. J., & Trochim, W. M. K. (2009). Evaluation policy: An introduction and overview. New Directions for Evaluation, 2009, 3–11.

Meadows, D. H. (2008). Thinking in systems. White River Junction, VT: Chelsea Green.

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis. Thousand Oaks, CA: Sage.

Patton, M. Q. (2002). Qualitative research & evaluation methods. Thousand Oaks, CA: Sage.

Porter, M. E., & Kramer, M. R. (1999). Philanthropy’s new agenda: Creating value. Harvard Business Review, 77, 121–130.

Saldaña (2009). The coding manual for qualitative researchers. Thousand Oaks, CA: Sage.

Summa, & Toulemonde (2002). Evaluation in the European Union: Addressing complexity and ambiguity. In Furubo, J. E., Rist, R. C., & Sandahl (Eds.), International atlas of evaluation (pp. 407–425). New Brunswick and London: Transaction.

Trochim, W. M. K. (2009). Evaluation policy and evaluation practice. New Directions for Evaluation, 2009, 13–32.

Yin, R. K. (2009). Case study research: Design and methods. Thousand Oaks, CA: Sage.