Nonprofit Organizations and the Evaluation of Social Impact: A Research Program to Advance Theory and Practice

Abstract

This article proposes a research program with two goals: (a) to support nonprofit leaders to productively engage evaluation and (b) to advance a meso-level theory of nonprofit evaluation that recognizes the diverse ways nonprofits contribute to social change. Such a research program is timely, as evaluation becomes increasingly institutionalized in the sector in ways that constrain nonprofit leaders from engaging productively with evaluation to advance their social impact. This research program brings existing nonprofit scholarship into conversation with evaluation scholarship and puts forward a research agenda organized around the practical dilemmas facing nonprofit leaders as they answer four key evaluation questions: what to evaluate, for what purpose, using which criteria, and with what evidence and methods. By anchoring a research program around these four questions, we seek to reopen the possibilities for how scholars can support nonprofit leaders in engaging evaluation to enhance their social impact.

Keywords

nonprofit evaluation, nonprofit social impact, performance measurement

Introduction

Over the last several decades, nonprofit leaders have struggled to answer a persistent and pressing question: How can they best evaluate whether their organization is making a difference? As we show in this article, a substantial body of nonprofit scholarship has emerged on this topic, offering a powerful set of explanations for why nonprofits struggle to evaluate their social impact. What is less clear from this scholarship is how we, as nonprofit researchers, can respond to such challenges: How might our research support nonprofit leaders to productively engage evaluation in ways that enhance their social impact and contribute to social equity? Such a question seems particularly salient now as we see a narrow approach to evaluation becoming institutionalized in the sector, one that focuses on quantifiable results from the implementation of program or project interventions, favors methods such as cost–benefit analysis and randomized control trials (RCTs) for assessing impact, and often prioritizes ends over means in valuations of social change. Moreover, as the demand for evidence continues to grow, new efforts at standardization are being put in place within certain subfields, raising additional concerns about the ability of nonprofits to be responsive to communities.

In this article, we set out a research program that aims to motivate studies where the findings not only have the potential to support nonprofit leaders, their staff, and communities to fruitfully engage in evaluation but also to move toward a meso-level theory of nonprofit evaluation, one that better aligns with the diverse ways nonprofits seek to make a difference in communities. To accomplish this task, we bring nonprofit scholarship into conversation with an equally expansive scholarship on evaluation to organize a research program around the practical dilemmas facing nonprofit leaders as they address four key questions: What to evaluate? For what purpose? Using which criteria? And with what evidence and methods? The evaluation scholarship has considered each of these questions in depth, showing alternative ways of answering each but has not taken an organizational lens to these questions and certainly not a nonprofit lens.¹ By combining these two bodies of scholarship, we seek to motivate empirical studies integrated around a common set of questions and grounded in nonprofit practice—with the potential to advance a meso-level theory of nonprofit evaluation that opens up diverse possibilities for how nonprofit leaders can engage evaluation to support their social impact.

Our article is organized as follows. First, we provide an overview of the primary streams of nonprofit scholarship on the topic of social impact evaluation: organizational effectiveness, organizational accountability, and institutional environments. We then turn to our proposed research program, which is organized around the four core evaluation questions noted above. For each question, we start by identifying the dominant or more institutionalized answer evident in nonprofit practice today. We point to studies by nonprofit researchers that reveal why nonprofits struggle to evaluate their social impact in these ways. We then introduce the reader to the evaluation scholarship, summarizing examples of key insights that have emerged from this literature around each question. We build on these two bodies of scholarship to conclude each section with suggestions for future research.

Before proceeding, we want to define the key terms in our analysis and how we employ them. By evaluation we mean the systematic gathering of information about an entity to determine its merit or worth, inform decision-making, and improve social impact (Patton, 1997; Weiss, 1998). Evaluation is more than a one-off assessment; it involves a set of practices that can include the ongoing collection of performance data by organizations to inform decision-making, and less frequent in-depth assessment of a program’s implementation or an organization’s impact. We use the term social impact to refer broadly to the “difference made” by nonprofits as mission-driven organizations seeking to make a change in the world. Central to our inquiry is the idea of “practice dilemmas.” A practice dilemma is an adaptive challenge that has no easy, fixed answers and instead requires thoughtful engagement and critical inquiry on a regular basis (Heifetz, 1998; Schwandt, 2000, 2015). Dilemmas are distinct from other types of challenges that can be addressed through technocratic solutions or with an infusion of resources such as capital or expertise; by definition, dilemmas do not have clear solutions (Heifetz & Laurie, 2001). By highlighting the evaluation practice dilemmas facing nonprofit leaders and organizing a research agenda around them, we hope to build a stronger bridge to practice, one that recognizes the diversity in the sector and supports greater pluralism in approach. Finally, we use the terms participants and communities interchangeably to refer to individuals, families, and communities who are the intended beneficiaries of nonprofit initiatives, recognizing that any term inadequately conveys respect for these individuals, their agency, and their central role in social change. For readers who may be less familiar with evaluation terms, we have included an appendix of terms and definitions.

Nonprofit Scholarship: Central Lines of Inquiry

Scholarship on nonprofit organizations approaches questions about the evaluation of social impact from several vantage points. We organize this literature into three overlapping but distinct streams of research: organizational effectiveness, organizational accountability, and institutional environments. Our goal here is not to provide a comprehensive review but rather to bring together what are sometimes viewed as unrelated lines of inquiry—to show how they constitute a larger historical arc of research relevant to evaluating nonprofit social impact. Our discussion is summarized in Table 1 below.

Table 1.

Nonprofit Research Streams on the Evaluation of Social Impact.

	Organizational effectiveness	Organizational accountability	Institutional environments
Key Questions	How can we know if NPOs are achieving their goals? • What criteria should be used to evaluate effectiveness? • What are the principal measures/indicators of effectiveness related to those criteria? • What implicit criteria and competing values are evident in the process of evaluation?	How does accountability shape evaluation in NPOs? • For what are nonprofits accountable? • To whom are they accountable? • How is accountability operationalized (upward and downward)? • How do NPOs respond to multiple demands for accountability from diverse constituencies? • What are field-level (rather than organization-level) responses to accountability demands?	What are the institutional forces shaping evaluation in nonprofits? • What are the cultural and societal expectations/norms about performance and evaluation in which nonprofits are embedded? • What field-level forces shape the extent to which nonprofits adopt evaluative practices? What explains this adoption/non-adoption? • How isomorphic and symbolic are these practices? What are the means of diffusion?
Key theoretical concern(s)	How can the multidimensional nature of effectiveness be theorized and evaluated?	How does evaluation shape power relationships in a nonprofit’s accountability environment?	How can evaluation be understood as a culturally and socially structured set of practices?
Key theoretical influences	Organizational sociology; Management theories	Principal-agent theory; Social constructivism	Institutional theory; Resource-dependence
Examples of key findings	• Goal attainment/outcomes are difficult to use as criteria to evaluate organizational effectiveness. • Stakeholders use different criteria, some objective, some subjective. • Comparing performance across organizations is complicated by diverse and complex objectives.	• Nonprofits experience accountability demands from multiple stakeholders but have difficulty prioritizing among competing demands. • Power differentials mean that evaluative practices that enable upward accountability to funders are better developed than practices of downward accountability to beneficiaries. • Accountability for short-term results, driven by funders, risks undermining longer-term work of learning and relationship-building with communities.	• Evaluative practices are often used for symbolic purposes, to signal that what the organization is doing is “appropriate” or “legitimate,” but have little influence on practice. • Practices of evaluation and strategic planning are becoming increasingly common among nonprofits, to show that the organization is serious, professional, and cares about evidence (regardless of whether the evaluation informs practice or not), especially in heavily resource-dependent contexts.

Note. NPOs = nonprofit organizations.

Organizational Effectiveness

In the 1970s and 1980s, as the number of nonprofits grew, scholars started paying attention to nonprofits as a distinct organizational form, separate from firms and government, with unique challenges in measuring organizational effectiveness. Early research on effectiveness was heavily shaped by scholars of organizational sociology, industrial psychology, and administrative sciences. For example, an influential edited volume on New Perspectives on Organizational Effectiveness (Goodman & Pennings, 1977) tapped a number of pioneering organizational scholars to offer their perspectives on the challenges of assessing effectiveness across various types of organizations. A foundational textbook in organizational sociology (Scott, 1977, 1992) drew on this prior work to identify three basic types of indicators for judging organizational effectiveness—outcomes, processes, and structures. But early on Scott (1992) pointed to challenges with outcome indicators: “outcomes present serious problems of interpretation” such as inadequate knowledge of cause and effect, time required to observe results, and environmental characteristics beyond the control of the organization (p. 354).

These ideas were picked up in The Nonprofit Sector: A Research Handbook (Powell, 1987). This edited volume was one of the first research handbooks devoted to nonprofit organizations, and it included a chapter focused specifically on the distinctive challenges of measuring nonprofit performance (Kanter & Summers, 1987). Later scholars further probed whether the challenges of effectiveness differed in nonprofit versus for-profit organizations, pointing to two key differences. The first is that for-profit actors tend to focus on financial measures of performance, which are generally easier to assess in both the short and long term, while nonprofits typically see financial performance as an input rather than an outcome (Kaplan, 2001; Speckbacher, 2003). The second key difference is that nonprofits face multiple constituencies (such as funders, beneficiaries, communities, and government) in assessing their effectiveness, with each often viewing effectiveness using different criteria (Kanter & Brinkerhoff, 1981; Kanter & Summers, 1987).

A review of empirical studies on the topic between 1977 and 1997 (Forbes, 1998) discussed three principal measures of effectiveness: (a) goal attainment; (b) resource attainment, in terms of enabling organizational survival; and (c) multidimensional approaches arising from diverse criteria and values that organizational stakeholders use to assess the merits of the organization. This review further noted that the way in which effectiveness is assessed, that is, the actual process of assessment, often introduces competing values and implicit criteria, or what other scholars have since framed as the political aspects of evaluation (Tassie et al., 1996) and the social construction of nonprofit effectiveness (Herman & Renz, 1999, 2008). Subsequent research has developed multidimensional constructs for effectiveness that include both goal attainment and organizational survival (Sowa et al., 2004), examined the determinants of effectiveness in both research and practice (Liket & Maas, 2015), and proposed a multiple constituency theory of program performance (Campbell & Lambright, 2016).

Research on organizational effectiveness has also been influenced by business scholarship, especially from the fields of accounting and strategy (Anthony & Young, 2004; Merchant & Otley, 2006; Oster, 1995). The balanced scorecard, in particular, has attracted attention among nonprofits for its multidimensional approach to measuring performance along four dimensions or “perspectives”: financial/donors, customers/stakeholders, internal processes/quality, and learning/improvement processes (Kaplan, 2001). Nonprofit scholars note this approach has been helpful in offering nonfinancial measures of performance and in linking internal processes to goals but does little to address how to assess goal attainment given the nonlinear nature of cause and effect, and how to resolve competing performance criteria used by multiple constituencies (Speckbacher, 2003).

Finally, a related challenge in this literature lies in establishing measures of effectiveness that can be compared across organizations, given the complexity and diversity of nonprofit work (Ebrahim & Rangan, 2014; Stone & Cutcher-Gershenfeld, 2001). The lack of reliable and comparable measures, based on shared agreements about what criteria and measures are important, may be one reason why the nonprofit sector does not have a robust infrastructure to provide systematic data on organizational performance akin to private sector ratings agencies and industry analysts (Prakash & Gugerty, 2010). Even nonprofit managers themselves appear to hold very different views of organizational effectiveness, with some focused on outcome accountability and others on measures of organizational efficiency (Mitchell, 2013). The question of what constitutes nonprofit effectiveness is central to a core concern of evaluation: what to evaluate? We return to this issue in the next section.

Organizational Accountability

Nonprofit scholarship in the late 1990s and early 2000s responded to the growing demands for accountability in the sector, both in international development and in the United States. The former was perhaps best marked by the publication of a seminal article on accountability and performance in the flagship journal World Development (Edwards & Hulme, 1996) and further developed into a widely read edited book in the same year. At about the same time in the United States, the journal Nonprofit Management and Leadership devoted a special issue to the topic of accountability in 1995. Two trends in practice further heightened the interest of scholars in accountability. The first was growth in public sector contracting to nonprofits resulting from state retrenchment and the emergence of a “new public management” discourse on performance (Krauskopf & Chen, 2010; D. H. Smith, 1999; S. R. Smith & Lipsky, 1993). The second trend was a series of highly visible scandals that contributed to an erosion of public confidence in nonprofit organizations (Bebbington & Riddell, 1997; Fisher, 1998; Gibelman & Gelman, 2001; Young et al., 1996).

The diverse scholarship on accountability converged on three key questions, both empirical and normative: for what is an organization accountable, to whom is an organization primarily accountable (given competing demands), and how is accountability operationalized in practice? Scholars began to differentiate among “upward” accountability demands of funders (such as foundations, private investors, government agencies, and individual donors), “downward” accountability to clients, beneficiaries, and communities, and “internal” accountability to their own staff and boards (Edwards & Hulme, 1996; Kearns, 1996; Lindenberg & Bryant, 2001; Najam, 1996; Oster, 1995). The emerging research further sought to grapple with the distinctive features of nonprofit accountability given that nonprofits have no owners akin to shareholders (Hansmann, 1996) and face public scrutiny in exchange for tax exemption (Fremont-Smith, 2004).

Not surprisingly, researchers found that the most powerful accountability claims came from funders rather than beneficiaries or clients because funders could threaten to withhold funding whereas clients often did not have such an exit option or sanctioning mechanism (Ebrahim, 2003a; Hirschman, 1970a). Principal-agent perspectives suggested that mechanisms of upward accountability would be better developed than downward accountability, on the grounds that nonprofits essentially act as agents for their funder-principals (Prakash & Gugerty, 2010). Other research showed this principal-agent explanation to be too limited (Benjamin, 2010) and explored “mutual” and “plurilateral” accountability mechanisms among organizations working together in a network or under conditions of interdependence (Brown, 2007; Macdonald, 2007).

A flurry of empirical work looked at the reporting relationships between nonprofits and their funders, showing not only the negotiated and political nature of accounting for results (Benjamin, 2008a; Cutt & Murray, 2000; Tassie et al., 1996) but also how demands for accountability transferred the risk of delivering impact from funders to nonprofits, sometimes leading nonprofits to overstate their results or compromise relationship goals, such as community building (Benjamin, 2008b; Campbell, 2002). This work revealed how accountability demands are shaped by relationships of power among actors (Dubnick & Justice, 2004; Ebrahim & Weisband, 2007). Other research uncovered the consequences of such relationships, documenting how nonprofits sought to separate external reporting from internal learning, creating an “accountability myopia” focused on short-term results at the expense of long-term learning (Ebrahim, 2003b, 2005). The rise in self-regulation and accreditation regimes worldwide underlined the pressure on nonprofits and nongovernmental organizations (NGOs) to demonstrate accountability to external stakeholders while also attempting to stave off regulatory action by governments (Breen et al., 2019; Gugerty, 2009; Gugerty & Prakash, 2010).

Recent work on international NGOs has highlighted the irreconcilability of these competing accountability demands. As NGOs scale and build their capabilities for influencing global policy agendas, they can lose their abilities to stay connected and accountable to local actors (Balboa, 2018). Conversely, those that stay focused on accountability in grassroots relationships have difficulty building global capabilities and influence. More troubling, NGOs that succeed in building substantial authority in global politics can fall into an “authority trap” whereby they soften their activism to focus on incremental rather than radical change (Stroup & Wong, 2017). These studies highlight the relational rather than absolute nature of accountability, suggesting that the traditional bases of legitimacy and effectiveness that have enabled the global power of NGOs are now being eroded to the point of making them irrelevant (Mitchell et al., 2020).

This brief discussion reveals two orientations to accountability questions: a positivist or rationalist approach and a social constructivist approach. Rationalist perspectives suggest that evaluation efforts, guided by expert evaluators, can be used to find objective measures of performance, create a basis for learning, and hold organizations to account. Social constructivist perspectives, however, suggest that measures are rarely objective, as they are the result of relationships of (unequal) power among stakeholders. Both perspectives thus highlight different practice dilemmas that arise from accountability claims—with rationalist perspectives emphasizing the challenge of measurability and standardization, while social constructivists point to the dilemmas of negotiating among competing, or even incommensurable, demands for accountability. We revisit these perspectives when we consider the central dilemmas in the second half of this article.

Institutional Environments

The institutional environments literature, which has grown in prominence over the past two decades, documents a growing isomorphism in the nonprofit sector, such as the widespread adoption of evaluative tools and business management practices. The conceptual foundations of this work can be found in institutional theory, informed especially by DiMaggio and Powell’s (1983) seminal article on institutional isomorphism, alongside scholarship on how organizational environments shape the diffusion and adoption of managerial practices and the symbolic uses of information (Feldman & March, 1988; Meyer & Rowan, 1977).

Despite its long roots in organizational theory, research on institutional environments did not become a dominant stream in nonprofit scholarship until the early 2000s. The resulting research has shown that nonprofit organizations measure social impact not necessarily for purposes of assessing their own performance but for establishing social legitimacy within their organizational environments—often adopting short-term and easily quantifiable metrics over more ambiguous or complex measures of social change (Hwang & Powell, 2009) and decoupling measurement and evaluation policy from practice (Bromley & Powell, 2012). Measurement systems thus serve not simply as rational instruments of assessment but as political and contested means of social and cultural legitimation, especially in resource-dependent contexts (Pfeffer & Salancik, 1978).

Researchers have further argued that such use of measurement is part of a deeper structural transformation of the nonprofit sector characterized by marketization and managerialism, given the ascendance of business practices across society (Eikenberry & Kluver, 2004; Maier et al., 2016; Mair & Hehenberger, 2014; Powell et al., 2005). For example, scholars have shown a growing shift in nonprofits toward the hiring of professional managers and the adoption of formalized managerial practices such as strategic planning, independent financial auditing, and quantitative evaluation and performance measurement (Bromley & Meyer, 2017; Tuckman & Chang, 2006).

Many scholars have sought to better understand nonprofits’ responses to these institutional pressures by examining variations in the adoption of evaluation practices (e.g., Barman & MacIndoe, 2012; Campos et al., 2011; Carman, 2007; Carman & Fredricks, 2010; Carman et al., 2008; Hoefer, 2000; Kang et al., 2012; Marshall & Suárez, 2014). A central finding of this research is that the growing adoption of the instruments and tools of social impact measurement and evaluation—such as theories of change, logic models and frameworks, and experimental methods of evaluation—may be a result of externally generated pressures for legitimacy and are less related to efforts to improve practice (please see the appendix for definitions of key terms). Yet the nonprofit scholarship also suggests that adoption and use depend on having adequate organizational capacity, and that managers may have some agency in how and why they adopt evaluative practices, as we explore in the section below (Benjamin & Campbell, 2020).

Together, these three lines of nonprofit scholarship highlight how challenges in defining effectiveness, negotiating demands for accountability, and responding to external institutional pressures shape how nonprofits engage with the assessment of social impact. At the heart of each stream of nonprofit literature lies a common emphasis on how to define and evaluate social impact but with somewhat different emphases (effectiveness, accountability, institutionalization) that reflect the concerns of the times. Rather than trying to reconcile the differences among these literatures, our approach has been to illustrate their common concern with evaluation and social impact. We now turn to developing a research agenda that might open up space for greater agency among nonprofits seeking to improve the social impacts of their organizations.

A Future Research Program

The nonprofit scholarship summarized above offers a set of powerful explanations for why many nonprofits face challenges in evaluating the social impact of their work. What is less clear from reviewing this literature is what we, as nonprofit scholars, might do: How might our research support nonprofits to engage in evaluation fruitfully in light of these challenges? We propose a research program that aims to develop a more applied and meso-level theory of evaluation for nonprofits, one that better aligns with how nonprofits work and the diverse ways they can contribute to social change and equity.

We believe such a research program is urgently needed because a narrow approach to evaluation is becoming increasingly institutionalized in the sector. This narrow approach focuses almost exclusively on interventions at the program or project level, prioritizes intended or predefined outcomes above other types of criteria, and holds evidence that can be readily quantified or even monetized as more credible. We see this approach enacted in the spread of evidence-based policies, value-for-money criteria, quantification of benefits, a predilection toward randomized control trials, and other trends that are becoming institutionalized in the nonprofit field. Although these approaches can make important contributions in terms of comparability and scale, they reflect narrow approaches to assessing value.

We believe that the complex nature of social change requires greater pluralism in approaches to social impact, a point echoed by practitioners and scholars alike. Our proposed research program thus seeks to support alternative ways of approaching evaluation in the sector, centering on the key practice dilemmas faced by nonprofit leaders. To help accomplish our goal, we turn to the scholarship on evaluation. We focus primarily on the scholarly field of program evaluation, which informs the professional field of evaluators working in the nonprofit sector (e.g., Alkin, 2013; Dahler-Larsen, 2012; Schwandt, 2015; Shadish et al., 1991; Thomas & Campbell, 2020; Weiss, 1998). Program evaluation scholarship has developed a more pluralistic set of approaches to evaluation practice that increasingly recognizes how standard evaluation approaches, seen as objective or neutral, can in fact represent dominant interests, specifically those that are White, Western, colonial, or from the global north (e.g., Caldwell & Bledsoe, 2019; Cavino, 2013; Chilisa et al., 2016; Chouinard, 2016; Hood, 2004; Hopson, 2009; House, 2017; Kirkhart, 2010; LaFrance & Nichols, 2008; Madison, 2007; Stanfield, 1999; Thomas et al., 2018). We also draw inspiration from the sociology of valuation, which considers the assumptions that inform evaluative processes endemic in social life (Barman, 2015; Beljean, n.d.; Boltanski & Thévenot, 2006; Lamont, 2012).²

This scholarship has theorized and debated four core evaluation questions (Shadish et al., 1991):

What is being evaluated? Identifying what to evaluate requires clarifying the unit of analysis for evaluation and drawing boundaries around what is included and excluded. For nonprofits, this requires not only considering the agent(s) of change—such as an anti-poverty program or project, an organization or network, the community or participants—but also what requires changing and the relationship between the two.

What is the purpose of evaluation? Evaluations are intended to be used to support some decision or action. In the nonprofit sector, this could involve making a final judgment that results in cutting a program or renewing a grant, or improving practices, such as providing training to staff or changing the way a program is designed. Evaluation can also be used to encourage deliberation among stakeholders, reaffirm a community’s self-determination, or elevate community voice.

What criteria should be used in an evaluation to judge merit or worth? The central point of evaluation is to make some judgment about the entity being evaluated. This requires identifying and selecting evaluative criteria, and the values that inform them, and then applying these criteria to the relevant evidence. For nonprofits, standard criteria often include some measure of intended program outcomes, but other criteria could include greater community leadership or enhanced dignity among participants.

What evidence is credible and what methods are needed to gather that evidence? An evaluation typically assesses the performance of the entity against these criteria, using evidence and methods that are viewed as legitimate by key stakeholder groups. In the nonprofit sector, this evidence may be gathered formally with recognized social science methods, including community-engaged research methods, or culled from existing information systems.

Organizing our research agenda around these four evaluation questions brings us closer to the practical dilemmas facing nonprofit leaders as they evaluate their social impact—because these questions must be answered in any evaluation, whether nonprofits explicitly address them or not. Making these questions explicit in an evaluation can provide nonprofit leaders with better traction and agency in their work while also giving nonprofit scholars a starting point for a more practice-oriented research agenda that supports pluralism and equity in the sector.

For each evaluation question, we first identify the central dilemma nonprofit leaders face as they attempt to answer this question. We point to how the institutionalization of a narrower response to the question hinders nonprofit leaders’ abilities to address the associated dilemma thoughtfully. We return to the nonprofit literature here to elaborate on the challenges nonprofits face in answering this question. We then turn to the evaluation scholarship. This scholarship is evolving but historically has been organized into four domains, reflected in the four core questions above and evident in introductory texts to the program evaluation field (e.g., Schwandt, 2015; Shadish et al., 1991; Weiss, 1998). Again, our purpose is to draw on key insights, rather than offer a systematic review. Together, these two bodies of scholarship lay the groundwork for a research program that seeks to meet two intimately connected goals: advancing a meso-level theory of nonprofit evaluation and supporting nonprofit leaders to productively engage evaluation in diverse and more equitable ways. We summarize this discussion in Table 2 below.

Table 2.

Toward an Integrative Nonprofit-Evaluation Research Agenda.

Evaluation question	What to evaluate?	For what purpose?	Using which criteria?	With what evidence and methods?
Core Dilemma & Example Questions	Defining Unit of Analysis • Should we evaluate a project, program, or policy? Or mission and strategy? • How does our organization contribute to network or system-level goals? • Should we evaluate the means as well as the ends—such as quality of relationships with communities?	Reconciling Competing Uses/Demands • How do we avoid goal displacement or overclaiming results when responding to funder requirements? • How can we address the needs of diverse users with competing needs? • Can we develop evidence systems that meet multiple needs of learning, accountability, and empowerment?	Identifying and Prioritizing with Multiple Criteria • What criteria are important? Who decides? • How should we weigh those criteria? • How do we include other criteria, not only the criteria of our funders?	Generating Diverse Kinds of Credibility of Evidence • What is good or credible evidence? What are appropriate methods for gathering and analyzing evidence? • What if we don’t have the capacity for experimental evaluations, or if they’re just not right for us? • What about other types of evidence and methods?
Institutionalized Response	Program or project	Accountability to funders	Intended program outcomes	Evidence from experimental studies
Key Insights from Nonprofit Literature	• Historical focus on the organization (not the program or project) as the unit of analysis. • Measure beyond project/program, such as organizational capacity, community/participant satisfaction, and contribution to collective goals. • Communities are not passive recipients of projects/programs but are coproducers.	• Upward accountability prioritized over downward to participants or inward to staff. • Limited capacity and requirements to produce multiple streams of data for funders mean nonprofits struggle to produce and control their own data. • Evaluation often used symbolically and viewed as disconnected from nonprofits’ “real” work.	• Goal attainment/outcomes are difficult to use as criteria to evaluate organizational effectiveness. • Stakeholders use different criteria, some objective, some subjective, although funders have the most influence in defining the criteria and deciding what is acceptable.	• Experimental methods are expensive, take too long to yield timely information, are less relevant for mid-course correction, and are not suitable for all types of social interventions. • Methods are often used for symbolic purposes: to signal legitimate behavior and for upward accountability.
Insights from Evaluation Literature	• Historical focus has been on the intervention—the program, project or policy—as the unit of analysis. • Theory-based evaluation opens up the black box of programs to better understand how to create social change. • Program outcomes can be emergent, defined in collaboration with participants. • Program outcomes depend not just on technical aspects (instrumental) but on the social interactions among staff, communities, and evaluators (relational).	• Use is not guaranteed but must be facilitated. • Use requires attending to the needs of distinct audiences. • Use is not only instrumental, but includes enlightenment and symbolic use. • Use of evaluation is better conceptualized as influence, where the process of evaluation as well as the results can affect those involved in both intended and unintended ways.	• Evaluation criteria rest on values about what is good. • Values are omnipresent in social programming, including in the organizational and political system in which the program is created and administered, in stakeholders’ commitments, and in evaluation methodologies and purpose. • Valuing is the process of arriving at a final judgment. This includes identifying and surfacing criteria, ranking criteria, and synthesizing evidence related to each criterion into a final judgment. • Different approaches are possible for each step in valuation.	• Diversity and cultural responsiveness in methods and evidence provide a more holistic understanding of what is being evaluated. • There is a need to sort out the strengths and weaknesses of different methods for different purposes; no method is unbiased. • Diverse methods of evaluation exist, including interpretivist, qualitative, and contribution-based approaches; multiplism is necessary to strengthen findings. • Credibility is subject to interpretation and judgment by relevant reference groups.
Research Agenda	How can “strategy” help open up the unit of analysis from the program to the organization? What is the “right fit” between organizational strategy and needs for evidence, to deliver social impact? How do we define the unit of analysis to recognize the central role, contribution, and agency of communities in social impact? How do we define the unit of analysis to recognize the effect of the organization—not only its programs—on participants? How do these organizational experiences vary depending on the organizational design and what are the implications for social impact?	How can we develop better understanding of evaluation use as an organizational practice in nonprofits? How can we center the information needs of nonprofit constituents—especially communities/participants and staff—in evaluative processes? How can theory support the agency of nonprofit leaders to build learning routines and practices?	What valuing processes do different stakeholder groups use to judge the worth of a nonprofit organization? How can values implicit within nonprofits be made explicit and used to deepen critical inquiry about social impact? How can the expressive purposes of nonprofits be incorporated into understanding social impact? How do standards institutionalize criteria in ways that enable or constrain critical inquiry about social impact?	How do different types of evidence help us understand nonprofit social impact? What does credibility in evidence look like from the experience of diverse nonprofit constituents? How can different methods (experimental, interpretivist, etc.) be chosen and developed to better align with organizational decision-making needs?

What to Evaluate?

The first evaluation question—what is to be evaluated?—appears deceptively simple. In fact, this question may not even be explicitly considered by nonprofit leaders because the answer is predetermined: evaluate specific programs or projects for funders. This focus on the program and project as the dominant unit of analysis is reinforced through evaluation tools and handbooks on logic models and theories of change intended to help nonprofits specify the central components of a program and its expected results (e.g., Knowlton & Phillips, 2012; USAID, 2022; W.K. Kellogg Foundation, 2004).

Nonprofit scholarship, however, has historically been concerned with the organization (rather than the project or program) as the primary unit of analysis. As discussed earlier, nonprofit scholars recognize both the multidimensional nature of organizational effectiveness and the need to consider more than the program when assessing the social impact of nonprofits, including organizational goals, organizational mission and strategy, financial measures, the assessments of multiple constituencies, and the contribution to larger coalition and system goals (e.g., Bryan, 2019; Ebrahim, 2019; Lecy et al., 2012; Sowa et al., 2004; Speckbacher, 2003). Other research focuses specifically on the relationship between nonprofit organizations and those individuals, families, and communities who are the intended direct beneficiaries of the organization. This body of scholarship calls attention to the diverse ways these participants engage with nonprofits as organizations, not simply as recipients of programs, and how this engagement affects participant experience and social impact (e.g., Benjamin, 2012, 2021a; Benjamin & Campbell, 2015; Knowlton & Phillips, 2012). These literatures show that the project or program is not always the most appropriate unit of analysis for evaluation, as it restricts our understanding of how nonprofits as organizations contribute to social impact. A core dilemma facing nonprofit leaders when considering what to evaluate is thus how to define the unit of analysis.

Although evaluation scholarship has principally been concerned with the program as the primary unit of analysis, it offers some nuance and depth for informing nonprofit evaluation. We discuss three developments in this scholarship relevant for our purposes. First, evaluators developed a more nuanced and complex understanding of programs. Early evaluation scholars focused on developing evaluation designs that could isolate the causal relationship between a program and measured results, viewing programs as simple instrumental interventions. But by the late 1970s, the challenges of implementation and the larger social and political context in which programs unfold spurred evaluators to open up the “black box” of programs—to better understand their internal structure, external constraints, as well as the recursive relationship between programs and social change (Shadish et al., 1991, p. 38). This included attention to how a program was implemented and whether fidelity to the model was maintained, what is sometimes referred to as process evaluation (Harachi et al., 1999; Mowbray et al., 2003; Stufflebeam, 1983). Relatedly, theory-based evaluations sought to specify the “theory of change” or the causal logic underlying program interventions, something familiar to many nonprofits today (Chen, 1990, 2005a, 2005b; Chen & Rossi, 1983; Meyer et al., 2021; Rogers, 2007, 2008; Weiss, 1998).³ Here scholars have drawn on realist philosophy to consider not simply whether something works but, as Pawson and Tilley (2005) explain, “What works for whom, in what circumstances, in what respects and how?” (p. 363; see also Pawson & Tilley, 1997).

Second, evaluation theorists recognized that desired outcomes are emergent in many settings, defined in collaboration with participants and communities and are not determined a priori. For example, neighborhood revitalization efforts involve working with residents to identify core concerns. When programs require partnering with participants and communities to define outcomes, evaluating fidelity to a predetermined program model is misplaced (Patton, 2011, 2016; Rogers, 2008). And third, evaluation approaches started to recognize that program outcomes depend not just on the technical aspects of program implementation but on the quality of the relationships in the organizational setting, including those among staff, between staff and communities, and among community members themselves (Abma, 2006; Visse & Abma, 2018). Although some of these interactions between staff and participants may be specified in an intervention protocol and thus studied in an evaluation, scholars have suggested that interactions extend beyond the intervention, as discussed below.

What are the implications of these two bodies of scholarship for future nonprofit research on the question of “what to evaluate”? The evaluation scholarship offers a more expansive and complex understanding of what goes into specifying a program and its effects, compared with a typical logic model familiar to many nonprofit leaders. Yet the characteristics of nonprofit organizations, including their diverse structures, their leadership, and organizing challenges, are typically not the focus of this scholarship. We believe research attentive to organizations can support nonprofit leaders and avoid the trap of only evaluating isolated programs and projects or of using simplistic assumptions about how they contribute to social change and equity. We offer two principal lines of future research centered at the organizational level.

First, to develop a meso-level theory of nonprofit evaluation, one that takes organizations seriously, we need to shift our focus from the program to organizational strategy. Nonprofit organizations are not simply a blank canvas on which programs and projects are executed but have overarching theories and assumptions about how programs fit together, get implemented, and respond to their environments. These are central concerns of strategy (Bryson, 2016; Oster, 1995).⁴ Indeed strategy is a familiar term to many nonprofit leaders, whether they are engaged in advocacy, social movements, or human services. Recent research identifies several distinct types of social change strategies—niche, integrated, emergent, and ecosystem—that are contingent on the organization’s knowledge about cause and effect and its degree of control over desired outcomes (Ebrahim, 2019). This work suggests that the appropriate unit of analysis, and the organization’s evaluation approach more broadly, depends on the organization’s strategy, thereby offering considerable agency to managers in determining “what to evaluate.” Other research suggests that choices about strategy shape operational capacities for measurement (Moore, 2013) and collaboration with other actors (Balboa, 2018) that managers need to consider in evaluation. New research is needed that can help us better theorize the relationships between strategy, evaluation (particularly units of analysis), and capacity-building.

Second, nonprofit scholars are uniquely positioned to conduct research that also considers how answering the unit of analysis question may vary depending on the nonprofit–community relationship. For example, we know that communities and participants are central actors in achieving social change, taking steps inside and outside the organization to achieve their desired outcomes. How might our evaluation approaches support equity, for example, by expanding the unit of analysis beyond the organization to consider how desired outcomes are co-defined and co-produced by communities (Benjamin, 2021a; Benjamin & Campbell, 2015; Bovaird & Loeffler, 2012; Chilisa et al., 2016; Ostrom, 1996)? This includes a critical examination of how nonprofits might contribute to or stymie community-desired outcomes. Relatedly, defining the unit of analysis at the organizational level requires documenting how the organization—and not simply its programs—shapes participants’ experiences in ways that matter for social impact and ultimately for social equity (Benjamin, 2021b; Kushner, 2000). This question is even more salient because the nonprofit form includes diverse organizational structures, from highly bureaucratic to collectivist. These diverse structures allow for different forms of engagement and authority on the part of participants, which in turn can affect the norms and values of the organization in ways that matter for participants’ experience and thus social impact (Benjamin, 2021b; Chen et al., 2013). Such experiences can include direct involvement on the board or an advisory group (which is often not well captured in program-focused evaluation), but they can also include less tangible experiences of organizational culture such as service interactions and ongoing relationships with nonprofit staff (Benjamin, 2022). We need research that examines how the organization, including its governance and culture, affects participants’ experiences in ways that matter for social equity and impact, and how this might vary depending on their engagement with the organization.

For What Purpose?

How evaluation results will be used is a central question in evaluation given that evaluations are typically undertaken to generate knowledge that informs decisions. But using evaluative data to inform decisions requires that data are matched to the types of questions and decisions that need to be made. Because nonprofits have numerous external and internal stakeholders who require different types of evaluative data, deciding whose decisions are to be informed requires articulating and mediating among uses for multiple audiences. This core dilemma—of addressing competing demands for use—is particularly challenging because the accountability demands of funders often take priority, further institutionalizing dominant perspectives and making it difficult for nonprofits to consider the full spectrum of information from which other constituents might benefit.

Several nonprofit studies document the consequences of using evaluation to meet funder accountability requirements and also point to other reasons nonprofits may not use the data they collect. These reasons include limited capacity, inability to control the data they collect, and inadequate technology (Benjamin et al., 2017; Hoefer, 2000). For example, the nonprofit literature on accountability discussed earlier showed how goal displacement or overclaiming of results might be a natural consequence of using evaluation to meet funder demands. The organizational effectiveness literature shows how the ambiguity inherent in defining nonprofit effectiveness is often resolved in favor of meeting funder requirements, resulting in evidence that is neither useful for organizational-level decision-making nor for learning (Bryan et al., 2020; Carman & Fredericks, 2010). Consequently, evaluative data needed by internal audiences for learning and improvement are often not available (Gugerty & Karlan, 2014). As a result of these forces, managers and staff may ultimately see evaluation as symbolic and separate from their “real” work (Buckmaster, 1999; Mitchell & Berlan, 2016; Riddell, 1999).

Use has been a central concern in evaluation scholarship, in part because evaluation results seem to be used so little, at least not directly (Dahler-Larsen, 2012). Early evaluation theorists assumed that evaluation results would inform decisions about program continuation or expansion (Shadish et al., 1991). But these naive assumptions about instrumental use confronted the stark reality that evaluation results were not being used as intended. Disappointment with this limited role helped to generate a theory that described: (a) the various audiences and types of evaluation use; (b) the time frames in which use occurs; and (c) how the use can explicitly be facilitated (Shadish et al., 1991, p. 53).

On this first point, evaluation scholars set out to identify the potential users of evaluations and their specific information needs, identifying how evaluation might influence a wide range of audiences that included policymakers, funders, managers, the policy-shaping community as well as program beneficiaries and communities (Greene, 2013; Kirkhart, 2000). The idea of evaluation “influence” helped to expand conceptions of use beyond immediate instrumental decision-making. For example, evaluation could shift the ways in which stakeholders conceptualized or thought about an issue (Weiss, 1973). Evaluation findings could also be used by stakeholders to enhance the legitimacy of a particular organization, program, or practice (Schwandt, 2015). Moreover, a number of evaluation approaches have been developed to advance social justice and equity, recognizing that all evaluations advance certain perspectives and interests and so the priority should be on those perspectives and interests with the least power (Greene, 1997). Here instrumental, conceptual, and legitimacy use could be critically redefined in light of larger equity goals. On the second point above, scholars also realized that different types of use might unfold over time. Examining use over longer time periods illuminated the ways in which evaluation created unintended as well as intended consequences that would not be visible in the short term. Longer time horizons also called attention to how the very act of participating in evaluation changed the understanding of program participants (Kirkhart, 2000).

Finally, evaluation scholars recognized that use required active facilitation (Schwandt, 2015). One stream of evaluation scholarship focused on policy influence and uptake of ideas, studying how and when generally available research evidence and evidence-based practices were incorporated into organizational practice and intervention design (Carswell et al., 2021; Hardwick et al., 2015). Another stream turned the lens to the needs of program managers for data that could be used for program improvement (Patton, 1997; Wholey, 1981). This included parallel trends in international development focused on “management by objectives” and “managing for results” to increase the use of performance data by managers with the hope of improving the effectiveness of international aid (Martinez & Cooper, 2020; Rossi et al., 1982). Facilitating use by managers also requires specific knowledge of and attention to incentives and rewards embedded in organizations (Behn, 2014; Wholey, 1981).

How can nonprofit and evaluation scholarship help to better theorize about evaluation use in a way that might guide nonprofit practice and vice versa? While several studies have examined how nonprofits use evaluation data, we focus on three possible lines of inquiry.

First, we need to better understand what “use” means across the wide range of organizations in the nonprofit sector. How, and in what ways, do evaluative data and their use get discussed in service delivery nonprofits, advocacy organizations, or community organizations? How do these discussions influence actual use by staff, managers, leaders, and beneficiaries? How do discussions and actions vary across types of organizations and stakeholders? Many studies report that some nonprofit leaders do make consistent use of evaluation information (Innonet, 2016; LeRoux & Wright, 2010), while others report that nonprofits are “drowning in data” and are either minimally using the data they do have (Benjamin et al., 2017; Snibbe, 2006) or are using them largely for symbolic purposes of compliance. Studies of the uptake of evidence-based policy and practices also suggest that, when evidence or evaluation results do not reflect the expertise and knowledge of clients and staff or are not co-produced by them, they are less likely to be used (Carswell et al., 2021; Hardwick et al., 2015). This suggests that attention to equity and diverse perspectives is a core component of facilitating use.

We also need a more systematic way of investigating the unintended consequences of evaluation use. The goal displacement resulting from trying to achieve certain narrow targets is well documented, but the evaluation process also has consequences. The very act of participating in evaluation can shift cognitive understandings of programs, affect stakeholders’ views of merit and worth, and alter dynamics and perceptions of power and privilege (Kirkhart, 2000; Schwandt, 2015; VanderPlaat, 1995). Here we might ask: How are beneficiaries affected by the evaluation process? How does it affect staff? Not only in terms of their workload, which is well documented (Benjamin et al., 2017; Kim et al., 2019; Snibbe, 2006), but also in how they engage with communities or how they think about communities? Who owns the data that are produced, who gets to use these data, and who gets to tell the story about the data? (See Cavino, 2013; Chambers, 1999; Stanfield, 1999). Critical approaches to evaluation research have highlighted the potentially extractive nature of evaluation (e.g., Cavino, 2013; Center for Evaluation Innovation, Institute for Foundation and Donor Learning, Dorothy A Johnson Center for Philanthropy, & Luminare Group, 2017; Chilisa et al., 2016; Chouinard, 2016; Tuck & Yang, 2014) such that communities of color, of indigenous peoples, and of people with disabilities, have long called for “nothing about us without us” (Charlton, 1998).

Research also needs to further explore the conflicting purposes between accountability and organizational learning. Empirical work on constraints to learning, and how they might be overcome, remains limited, with many nonprofit leaders reporting they feel underprepared to take on this responsibility (Mitchell & Berlan, 2016). Extant scholarship often misses the fact that organizational capacity for evaluation is distinct from the capacity to use the results of these efforts. Creating an organizational culture that values meaningful evidence is critical to data use and to better understanding and avoiding negative unintended consequences (Bryan et al., 2020; Cousins et al., 2014; Taylor-Ritzler et al., 2013). This is the type of critical thinking that evaluative practice can support when narrow conceptions of instrumental use are set aside in favor of reflection, learning, and inclusion of diverse perspectives.

Using Which Criteria?

Identifying criteria for judging merit or worth is one of the central tasks in evaluation. Because criteria are informed by values about what is good or worthy, judgment requires being attentive to those underlying values. One central dilemma facing nonprofit leaders is how to identify and prioritize those values and the related criteria used by stakeholders to judge their organizations. Doing so is even more challenging because the outcomes movement, driven largely by funders, has institutionalized the idea that achieving intended program outcomes is the most legitimate criterion for judging effectiveness (Brest, 2020). And because funders have an outsized influence in defining these outcomes, as noted above, their criteria and values are often dominant in determining nonprofit worth or merit (e.g., Benjamin, 2008a; Cutt & Murray, 2000; Ebrahim, 2005; Mitchell et al., 2020). This emphasis on intended outcomes makes it harder to consider other criteria, including those that emerge in partnership with communities.⁵

Nonprofit research further elaborates on the challenges and limitations of using goal attainment measures, such as intended program outcomes, as the primary criterion for judging nonprofits (e.g., Campbell, 2002; Ebrahim, 2019). For example, humanitarian organizations operating in crisis contexts must focus on the delivery of short-term outputs such as food, water, temporary shelter, and medical services, without necessarily aiming to achieve longer term outcomes directly. Moreover, some nonprofits cocreate their outcomes with partner organizations as well as with participants and communities themselves, making it difficult to define these outcomes at the outset because they emerge while working in partnership (e.g., Benjamin & Campbell, 2015). Relatedly, measurable program outcomes may miss expressive roles of the sector (e.g., Knutsen & Brower, 2010; S. R. Smith, 2010) and using standardized criteria risks undermining nonprofit innovation and experimentation (Hwang & Powell, 2009; Phillips & Carlan, 2018). Finally, using program outcomes as the main criterion can miss more subtle social processes within these organizations that redress or reinforce inequity (Benjamin, 2022).

Informed by diverse disciplinary training, evaluation scholars take an expansive view of judging merit or worth, recognizing that values “are omnipresent in social programming” (Shadish et al., 1991, p. 47). Values are embedded in the organizational and political system in which the program is created and administered, as well as in stakeholders’ commitment to the importance of the problem and its solution and in the evaluation itself, including its methodologies and particular purposes (e.g., Chambers, 1994; Greene, 2013; House, 1980; Schwandt, 2015). For example, evaluation scholars call attention to how White supremacy, systemic racism, and colonialism have shaped evaluation and suggest ways to address this (e.g., Bowman, 2020; Caldwell & Bledsoe, 2019; Cavino, 2013; Chouinard, 2016; Dean-Coffey, 2018; LaFrance & Nichols, 2008; Stanfield, 1999; Thomas & Campbell, 2020; Thomas et al., 2018). Given that values implicitly or explicitly inform the criteria used to judge programs, this body of scholarship suggests ways in which evaluation can make those values, and their implications, explicit and thereby open to analysis.

To start, evaluation scholarship outlines the decisions required to reach a final assessment or judgment about a program: (a) generating a set of possible criteria and deciding which criteria are relevant (e.g., outputs, outcomes, equity, efficiency, cultural relevance, responsiveness to participants); (b) deciding what benchmark needs to be met on each criterion (e.g., is 60% of participants agreeing that the nonprofit is responsive considered acceptable or is 85%?); (c) deciding how to synthesize the evidence related to multiple criteria (e.g., weighting or ranking system, deliberation and consensus, holistic); and (d) deciding who should make these decisions and how (e.g., outside evaluator, nonprofit managers, funders, other stakeholders; Schwandt, 2015, p. 49).⁶ As we note below, in nonprofit evaluation, this valuing process is often implicit, with the result that these distinct decisions are never discussed.

Evaluation scholars give consideration to each of these steps. For example, scholars point to the problems that result from using intended program outcomes as a criterion. Intended outcomes tend to reflect the perspectives and interests of decision makers rather than participants and often fail to account for unintended consequences of programs (Abma et al., 2020; Kushner, 2000; Madison, 1992; Mathison, 2005; Scriven, 1991). Some scholars suggest the need to focus on the experience of program stakeholders to understand quality, arguing that looking at intended outcomes tells us little about what is actually going on in a program (Stake, 2004, p. 89). Other evaluation scholars give attention to processes for generating and weighing criteria (e.g., House & Howe, 2000). For example, a prescriptive approach elevates one ethical value, such as equity or social justice, while a more descriptive approach describes and considers all stakeholders’ values without elevating one over another (Shadish et al., 1991, pp. 47–49).⁷

Bringing together the nonprofit and evaluation scholarship thus points to the need for explicit attention to values: values of stakeholders, values implicit in social change efforts, and values in organizations themselves. We see several possible lines of inquiry that could guide research on the “valuing of nonprofits.” Given space constraints, we focus here only on three possibilities.

First, nonprofit scholars could theorize more deeply on a non-instrumental understanding of social impact. An instrumental view treats nonprofit organizations as a means to other ends, that is, nonprofits are valuable if they produce a certain number of affordable housing units (Frumkin, 2002; Kramer, 1987). A non-instrumental view recognizes that the process or approach nonprofits take to working with communities can create different kinds of desired outcomes, including those where community leadership is recognized and supported to demand actions by government and other institutions that reflect their concerns (e.g., Dodge & Ospina, 2016; Mosley, 2011; D. D. H. Smith, 1999; S. R. Smith, 2010). Relatedly, nonprofit researchers could further probe not only how values—such as dignity, market logic, or white normativity—infuse organizations (Chen et al., 2013) but also how they shape strategy and social impact (Barman, 2015; Doan & Knight, 2020; Feit, 2019). With a non-instrumental view of nonprofit social impact, researchers might explore conflicts between expressive and instrumental purposes. We already know that too much emphasis on short-term instrumental results can undermine expressive work such as building grassroots community leadership (Benjamin, 2008b), but we could start to examine how other expressive work, such as engaging volunteers, affects the experience of those individuals, families, and communities that are intended to benefit directly from nonprofit initiatives (Horvath, 2020).

Second, nonprofit research can offer a clearer picture of how stakeholders approach questions of valuing, again building on previous work (e.g., Herman & Renz, 1999). Such research requires not only an understanding of how stakeholder values inform criteria but also how stakeholders rank these criteria and how they synthesize evidence to come to a final assessment. Making this process more explicit would enable nonprofit scholars to understand how different stakeholder groups, such as funders and participants or communities, approach this process. For example, prior research has shown that nonprofit leaders can have different priorities for neighborhood development than residents (Kissane & Gingerich, 2004). Other research suggests that nonprofit leaders strategically use the priorities of communities to negotiate criteria with funders (e.g., Ospina et al., 2002). More recent studies find that, as relationships between nonprofits and their funders evolve, funders sometimes relax their criteria in favor of those preferred by the nonprofit (Lall, 2019).

Third, nonprofit scholars could examine the valuing process embedded in sector-level standards. A wide spectrum of standards currently exists, including voluntary codes of conduct (Gugerty, 2009; Kunugi & Schweitz, 1999); “club” standards required for membership in a selective group or association (Gugerty & Prakash, 2010); auditable standards, common in financial accounting but increasingly being adopted for assessing environmental, social, and governance (ESG) behavior (Barman, 2015; Lall, 2017); and standards for shared output and outcome metrics by industry or sector (McCreless et al., 2014). How do these standards condition the criteria viewed as valid? Could standards be used to expand the range of evidence considered valid? If standards are understood as establishing a shared basis for judging value or worth, scholarship might examine the process of standard creation, the content they embody, whose values they represent, and how conflicting views and values get surfaced and addressed.

With What Evidence and Methods?

Finally, we turn to the fourth key question of evaluation: What evidence is needed to ascertain that nonprofits are “making a difference”? What are the appropriate methods for gathering and analyzing evidence? At the heart of these questions lies the enduring dilemma of how to establish credible evidence through evaluation. The institutionalized response, that the most credible evidence is that which proves nonprofit initiatives caused measurable results, rewards the use of practices that have been evaluated through experimental methods such as RCTs (Mosley et al., 2019). The rise of evidence registries (such as the Cochrane Library, the Campbell Collaboration, and 3ie) further institutionalizes these methods (Prewitt et al., 2012). Although many nonprofits do not have the scale or resources to undertake experimental studies, the institutionalization of RCTs as a so-called “gold standard” shapes expectations about what constitutes credible evidence (Center for Global Development, 2006; Eyben et al., 2015).⁸

This institutionalized view, however, is increasingly being challenged by nonprofit practitioners and scholars who seek to identify methods that are better suited to diverse organizational realities and who argue that experimental methods of impact evaluation are expensive to conduct, take too long to yield timely information, are not suitable for every type of social intervention, are not very helpful for mid-course correction, and ignore racialized power dynamics (Chambers et al., 2009; Dichter et al., 2016; Khagram et al., 2009; Mosley et al., 2019; Trelstad, 2008; Whittle, 2013). Research has also shown that nonprofit leaders lack the capacity to undertake such “rigorous” evaluation of program outcomes. Many organizations lack expertise and resources to invest in evaluating impact (Carman, 2007; Carman & Fredericks, 2010), are not attempting any kind of causal attribution (Hoefer, 2000), and may lack the evaluation “culture” to support evaluation efforts (Mitchell & Berlan, 2016). In short, managers need support in finding the “right fit” between their goals and needs for evidence (Gugerty & Karlan, 2018).

The evaluation literature has arrived at a more pluralistic view on the generation of credible evidence, noting that “all methods are not equally good for all tasks, so the task is to sort out the strengths and weaknesses of methods for different purposes” and further cautioning that “no method is routinely feasible and unbiased, so no study is ever free of flaws” (Shadish et al., 1991, p. 42).⁹ Although early evaluation scholars developed quasi-experimental alternatives to randomized control trials (such as matched comparison and regression discontinuity), these methods retained a focus on attribution. Approaches that emphasize causal attribution have been critiqued on the grounds that their positivist underpinnings lead them to oversimplify causality and fail to account adequately for political, social, and institutional context (Chambers et al., 2009; Khagram et al., 2009; Pawson, 2013; Pawson & Tilley, 1997; Virtanen & Uusikylä, 2004). Qualitative approaches to evaluation emerged throughout the 1980s and 1990s, drawing from theoretical traditions including interpretivism, hermeneutics and social constructivism (Schwandt, 2000). These approaches took human action as inherently meaningful, something that must be understood from the actor’s point of view and interpreted in the context in which it is undertaken. This led to a range of more inductive approaches to understanding complex social change processes (e.g., Glaser, 1998; Guba & Lincoln, 1989).

More recently, the evaluation field has devoted increasing attention to methodologies for assessing “contribution” rather than “attribution” (Ebrahim, 2019, pp. 229–236; Kane et al., 2021; Raynor et al., 2021). Contribution-based methodologies are more appropriate for examining social change in complex systems where it is not feasible to create an experiment with a control group, to sufficiently isolate causal mechanisms, or to establish an observable counterfactual (Lemire et al., 2012; Mayne, 2001, 2011, 2012; Rogers, 2007, 2009). A range of such methods has emerged over the years, including contribution analysis, process tracing, outcome harvesting, and outcome mapping (Beach & Pedersen, 2013; Befani & Mayne, 2014; Bennett, 2010; Bennett & Checkel, 2015; Davies & Dart, 2005; Earl et al., 2001; Wilson-Grau & Britt, 2012). This growth in attention to multiple methodologies is part of a larger trend in the evaluation literature toward recognition of and respect for diverse approaches, what Cook presciently called “multiplism,” to strengthen findings (Cook, 1985; Greene et al., 2001).

While the field of evaluation has generated relatively pluralistic approaches to the credibility of evidence, some evaluation scholars note that the field still tends to privilege western, colonial, and White-dominant perspectives to the exclusion of knowledge and perspectives of Black, indigenous, and scholars of color from around the globe (Chouinard, 2016; Shanker, 2019). Evaluation scholarship cautions that the credibility of evidence “is subject to interpretation by the relevant reference group charged with making that judgment” (Schwandt, 2015, p. 73). But who is the relevant reference group: professional evaluators, funders, organizational leaders, frontline staff, participants, communities, or some combination? Or, as the renowned development studies scholar Robert Chambers (1999) often asked, “Whose reality counts?” This question in the evaluation field has spawned a movement to reconsider and reclaim the Afro-centric and African American roots of evaluation (Chilisa et al., 2016; Hood & Hopson, 2008) and to develop alternative approaches, including multicultural and culturally relevant evaluation (Bledsoe & Donaldson, 2014; Kirkhart, 2010) as well as ones that employ indigenous-centered methodologies (Bowman, 2020; Cavino, 2013; LaFrance & Nichols, 2008; L. T. Smith, 2012).

What do these developments in evidence and methodology suggest for nonprofit research? We identify at least three broad directions for nonprofit scholars. First, scholars can take seriously Shadish et al.’s (1991, p. 42) suggestion “to sort out the strengths and weaknesses of methods for different purposes” with particular attention to the nonprofit organizational context. What methods of evaluation do nonprofits currently use, and why? How can methods be chosen and developed to better align with organizational strategy and recognize the diverse ways nonprofits partner with communities? The evaluation literature suggests that methods aligned with cultural and organizational norms are more likely to feed into decision-making. Recent scholarship provides frameworks and principles to guide nonprofit managers in identifying the best fit among measurement methods (Gugerty & Karlan, 2018) and sorting methods and tools by different stages of funder decision-making (Ebrahim, 2019, pp. 208–240). These efforts just scratch the surface of the growing range of evaluative methods and innovations, indicating a need for nonprofit scholars to examine diverse methods and purposes.

Second, there remains a dearth of quality research on the nature of evidence and methods that reflect the knowledge and perspectives of beneficiaries and communities, particularly marginalized or underrepresented groups including Black, indigenous and communities of color as well as persons with disabilities (Mertens et al., 1994). What does “credibility” in evidence look like when the lived experiences of beneficiaries and communities tell a different story than the causal inferences drawn from expert-driven evaluations? What constitutes culturally-relevant rigor and validity? What can be learned from participatory methodologies in creating credible evidence based on the experiences of frontline staff and clients? Recent research using in-depth qualitative methods uncovers the “hidden work” of social change, substantiating a belief by nonprofit staff that dominant methods do not adequately capture what they do or the difference they make (Benjamin, 2012, 2022; Benjamin & Campbell, 2015; Rayner & Bonnici, 2021). This work provides examples of how participants’ definition of, and pathway to, significant change does not always align with the stated outcomes of the organization. In exploring this line of inquiry, nonprofit researchers might examine related evaluation scholarship that centers on the experience of participants (Abma et al., 2020; Center for Evaluation Innovation, Institute for Foundation and Donor Learning, Dorothy A Johnson Center for Philanthropy, & Luminare Group, 2017; Cousins & Whitmore, 1998; Fetterman, 2005; VanderPlaat, 1995), widens the understanding of validity, and provides more open and responsive methodologies that recognize diverse ways of knowing and alternative forms of evidence (e.g., Cavino, 2013; Griffith & Montrosse-Moorhead, 2014; House, 1980; Thomas & Campbell, 2020). Recent efforts around beneficiary feedback aim to better understand the experiences of beneficiaries, using methodologies such as “lean data” and “constituent voice” (Dichter et al., 2016; Twersky et al., 2013), but there has been little scholarly study of the effects of such methods or of the basis upon which they establish credible evidence.

Third, and perhaps the least studied, is the range of new methodologies, based on big data and artificial intelligence, for making sense of large, complex data. These approaches do not rely on traditional scientific methods of hypothesis testing for drawing causal inferences. Instead, their strength lies in uncovering relationships through the detection of patterns in large data sets. Their potential value rests in helping us to both describe and understand systems, such as those involving climate change and racial justice, that cannot be broken down into a simple set of causal relationships. Their practical value for social change resides in developing more accurate classifications of change agent populations such as nonprofit organizations (Santamarina et al., 2021) and in identifying levers for intervening in complex systems, despite only a poor understanding of cause and effect (see Burns & Worsley, 2015; Meadows, 2008; Miller & Page, 2007; Siegenfeld & Bar-Yam, 2020). We know little about how to deploy such methods in ways that are feasible for nonprofit leaders, or about the risks involved in using them, particularly their potential to reify past patterns of bias.

Conclusion: Toward a Meso-Level Theory of Nonprofit Evaluation

In this article, we set out to bring together the scholarship on evaluation with the literature on nonprofit organizations. Our aim was to propose a research program that we believe will lay the groundwork for a meso-level theory of nonprofit evaluation and galvanize research that supports nonprofit leaders in productively engaging evaluation to advance their abilities to contribute to social change. This is an applied agenda for research and theory-building, responsive to the practical dilemmas facing nonprofit leaders and their organizations as they seek to generate social impact. Our research agenda is explicitly and normatively motivated by a desire to enable greater agency of nonprofit leaders, staff, and communities to pursue evaluation that better meets their needs and ultimately results in greater social impact for communities.

We have only just begun mapping the contours of such a research program. The four key questions that form the center of evaluative practice—what to evaluate, for what purpose, using which criteria, and with what evidence and methods—provide the basic scaffolding for the research program necessary to develop this body of theory. Each component of this scaffold (our research agenda) is grounded in the practice dilemmas that nonprofit leaders must address in deploying evaluation: defining the unit of analysis, addressing competing uses and demands, working with multiple criteria, and establishing the credibility of evidence. We hope that our effort to explicate these dilemmas, and the questions for future research that we pose, will open up new ways of seeing, valuing, and using evaluation as a plural set of approaches and mindsets in the service of social change and equity.

In building on this scaffolding, our research program offers several analytical supports toward constructing a meso-level theory of nonprofit evaluation. First, our research program is centered on the organization, rather than the program or project. Instead of one overarching approach to organizational evaluation, we envision approaches that are attentive to the specific needs of different types of nonprofit organizations. Second, a nonprofit theory of evaluation takes seriously the values-based core of nonprofits, inquiring not just about the instrumental results that organizations might produce, but also about the expressive values embodied in how they engage constituents and thus shape the experience of dignity and status of individuals and communities. Third, such a theory recognizes and values knowledge that goes beyond evaluation experts to center the knowledges of participants, as well as that of managers and staff. If nonprofit organizations can be understood as sites and vehicles of community-building, identity creation, and agency in society, then we need an epistemology of evaluation that goes beyond the expertise of the evaluator. And finally, a theory of nonprofit evaluation requires explicit attention to issues of use: how data and information are produced, how and by whom they are used, and who ultimately owns, controls, and benefits from evaluative knowledge.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Lehn M. Benjamin

Author Biographies

Lehn M. Benjamin is an Associate Professor of Philanthropic Studies at the Lilly Family School of Philanthropy and an Affiliate Faculty at the Paul H. O’Neill School of Public and Environmental Affairs at Indiana University, Indianapolis.

Alnoor Ebrahim is a Professor of Management, The Fletcher School and The Tisch College of Civic Life, Tufts University. He is author of the award-winning book, Measuring Social Change: Performance and Accountability in a Complex World.

Mary Kay Gugerty is the Nancy Bell Evans Professor of Nonprofit Management & Philanthropy and Associate Dean for Teaching & Learning at the Evans School of Public Policy & Governance at the University of Washington. She is the co-author of the award-winning book, The Goldilocks Challenge: Right-Fit Evidence for the Social Sector.

References

Abma, T. A. (2006). The social relations of evaluation. In Shaw, Greene, & Mark (Eds.), The SAGE handbook of evaluation (pp. 185–199). SAGE.

Abma, T. A., Visse, Hanberger, Simons, & Greene, J. C. (2020). Enriching evaluation practice through care ethics. Evaluation, 26(2), 131–146.

Alkin, M. C. (2013). Evaluation roots: Tracing theorists’ views and influences (2nd ed.). SAGE.

American Evaluation Association. (2011). American Evaluation Association statement on cultural competence in evaluation. https://www.eval.org/About/Competencies-Standards/Cutural-Competence-Statement

Anthony, R. N., & Young, D. W. (2004). Financial accounting and financial management. In Herman, R. D. (Ed.), The Jossey-Bass handbook of nonprofit leadership and management (2nd ed., pp. 466–512). Jossey-Bass.

Balboa, C. M. (2018). The paradox of scale: How NGOs build, maintain, and lose authority in environmental governance. The MIT Press.

Banerjee, A. V., Amsden, A. H., Bates, R. H., Bhagwati, J. N., Deaton, & Stern (2007). Making aid work. The MIT Press.

Banerjee, A. V., & Duflo (2009). The experimental approach to development economics. Annual Review of Economics, 1, 151–178.

Barman (2015). Of principle and principal: Value plurality in the market of impact investing. Valuation Studies, 3(1), 9–44.

Barman & MacIndoe (2012). Institutional pressures and organizational capacity: The case of outcome measurement. Sociological Forum, 27(1), 70–93.

Beach & Pedersen, R. B. (2013). Process-tracing methods: Foundations and guidelines (1st ed.). University of Michigan Press.

Bebbington & Riddell (1997). Heavy hands, hidden hands, holding hands?: Donors, intermediary NGOs and civil society organisations. In Hulme & Edwards (Eds.), NGOs, states and donors: Too close for comfort? (pp. 107–127). St. Martin’s Press with Save the Children.

Befani & Mayne (2014). Process tracing and contribution analysis: A combined approach to generative causal inference for impact evaluation. IDS Bulletin, 45(6), 17–36.

Behn, R. D. (2014). The PerformanceStat potential: A leadership strategy for producing results. Brookings Institution Press.

Beljean (n.d.). Five key questions for a sociology of evaluation: A review essay (Working Paper).

Benjamin, L. M. (2008a). Account space: How accountability requirements shape nonprofit practice. Nonprofit & Voluntary Sector Quarterly, 37(2), 201–223.

Benjamin, L. M. (2008b). Bearing more risk for results: Performance accountability and nonprofit relational work. Administration & Society, 39(8), 959–983.

Benjamin, L. M. (2010). Funders as principals: Performance measurement as philanthropic relationships. Nonprofit Management & Leadership, 20(4), 383–403.

Benjamin, L. M. (2012). Nonprofit organizations and outcome measurement: From tracking program activities to focusing on frontline work. American Journal of Evaluation, 33(3), 431–447.

Benjamin, L. M. (2021a). Bringing beneficiaries more centrally into nonprofit management and research. Nonprofit and Voluntary Sector Quarterly, 50(1), 5–26.

Benjamin, L. M. (2021b). Beyond programs: Toward a fuller picture of beneficiaries in nonprofit evaluation. In Dahler-Larsen (Ed.), A research agenda for evaluation (pp. 81–103). Edward Elgar Publishing.

Benjamin, L. M. (2022). Status processes in nonprofit organizations and the consequences for inequality. Special issue on status and inequality. Russell Sage Journal, 210–227. https://doi.org/10.7758/RSF.2022.8.7.11

Benjamin, L. M., & Campbell, D. A. (2020). Evaluation and performance measurement. In Anheier, H. K., & Toepler (Eds.), Routledge companion to nonprofit management (pp. 197–212). Routledge.

Benjamin, L. M., & Campbell, D. C. (2015). Nonprofit performance: Accounting for the agency of clients. Nonprofit and Voluntary Sector Quarterly, 44(5), 988–1006.

Benjamin, L. M., Voida, & Bopp (2017). Policy fields, data systems, and the performance of nonprofit human service organizations. Human Service Organizations: Management, Leadership & Governance, 42(2), 185–204.

Bennett (2010). Process tracing and causal inference. In Brady, H. E., & Collier (Eds.), Rethinking social inquiry: Diverse tools, shared standards (2nd ed., pp. 207–220). Rowman and Littlefield.

Bennett & Checkel, J. T. (Eds.). (2015). Process tracing. Cambridge University Press.

Bledsoe & Donaldson, S. I. (2014). Culturally responsive theory-driven evaluation. In Hood, Hopson, & Frierson (Eds.), Continuing the journey to reposition culture and cultural context in evaluation theory (pp. 3–28). Information Age Publishing.

Boltanski & Thévenot (2006). On justification: Economies of worth (English trans.). Princeton University Press.

Bovaird & Loeffler (2012). From engagement to co-production: The contribution of users and communities to outcomes and public value. Voluntas: International Journal of Voluntary and Nonprofit Organizations, 23(4), 1119–1138.

Bowman (Waapalaneexkweew, Mohican/Lunaape). (2020). Nation-to-nation in evaluation: Utilizing an indigenous evaluation model to frame systems and government evaluations. New Directions for Evaluation, 166, 101–118.

Breen, O. B., Dunn, & Sidel (2019). Riding the regulatory wave: Reflections on recent explorations of the statutory and nonstatutory nonprofit regulatory cycles in 16 jurisdictions. Nonprofit and Voluntary Sector Quarterly, 48(4), 691–715.

Brest (2020). The outcomes movement in philanthropy and the nonprofit sector. In Powell, W. W., & Bromley (Eds.), The nonprofit sector: A research handbook (3rd ed., pp. 381–408). Stanford University Press.

Bromley & Meyer, J. W. (2017). “They are all organizations”: The cultural roots of blurring between the nonprofit, business, and government sectors. Administration & Society, 49(7), 939–966.

Bromley & Powell, W. W. (2012). From smoke and mirrors to walking the talk: Decoupling in the contemporary world. Academy of Management Annals, 6(1), 483–530.

Brown, L. D. (2007). Multiparty social action and mutual accountability. In Ebrahim & Weisband (Eds.), Global accountabilities: Participation, pluralism, and public ethics (pp. 87–111). Cambridge University Press.

Brown, W. A. (2014). Strategic management in nonprofit organizations. Jones & Bartlett Publishers.

Brown, W. A. (2016). Strategic management. In Renz, D. O. (Ed.), The Jossey-Bass handbook of nonprofit leadership and management (4th ed., pp. 217–239). John Wiley & Sons.

Bryan, T. K. (2019). Toward a contingency model for the relationship between capacity and effectiveness in nonprofit organizations. Nonprofit and Voluntary Sector Quarterly, 48(4), 885–897.

Bryan, T. K., Robicheau, R. W., & L’Esperance, G. E. (2020). Conducting and utilizing evaluation for multiple accountabilities: A study of nonprofit evaluation capacities. Nonprofit Management and Leadership, 31(3), 547–569.

Bryson, J. M. (2016). Strategic planning and the strategy change cycle. In Renz, D. O. (Ed.), The Jossey-Bass handbook of nonprofit leadership and management (4th ed., pp. 240–273). John Wiley & Sons.

Buckmaster (1999). Associations between outcome measurement, accountability and learning for non-profit organisations. International Journal of Public Sector Management, 12(2), 186–197.

Burns & Worsley (2015). Navigating complexity in international development: Facilitating sustainable change at scale. Practical Action Publishing.

Caldwell, L. D., & Bledsoe, K. L. (2019). Can social justice live in a house of structural racism? A question for the field of evaluation. American Journal of Evaluation, 40(1), 6–18.

Campbell (2002). Outcomes assessment and the paradox of nonprofit accountability. Nonprofit Management and Leadership, 12(3), 243–259.

Campbell, D. A., & Lambright, K. T. (2016). Program performance and multiple constituency theory. Nonprofit and Voluntary Sector Quarterly, 45(1), 150–171.

Campos, Andion, Serva, Rossetto, & Assumpção (2011). Performance evaluation in non-governmental organizations (NGOs): An analysis of evaluation models and their applications in Brazil. Voluntas: International Journal of Voluntary and Nonprofit Organizations, 22(2), 238–258.

Carswell, Kothari, & Peter (2021). Reflections on effective services: the art of evidence-based programming. Voluntary Sector Review, 12(2), 277–288. https://doi.org/10.1332/204080520X15893044346921

Carman, J. G. (2007). Evaluation practice among community-based organizations: Research into the reality. American Journal of Evaluation, 28(1), 60–75.

Carman, J. G., & Fredericks, K. A. (2010). Evaluation capacity and nonprofit organizations: Is the glass half-empty or half-full? American Journal of Evaluation, 31(1), 84–104.

Carman, J. G., Fredericks, K. A., & Introcaso (2008). Government and accountability: Paving the way for nonprofits and evaluation. New Directions for Evaluation, 2008(119), 5–12.

Cavino, H. M. (2013). Across the colonial divide: Conversations about evaluation in Indigenous contexts. American Journal of Evaluation, 34(3), 339–355.

Center for Evaluation Innovation, Institute for Foundation and Donor Learning Dorothy A Johnson Center for Philanthropy, & Luminare Group. (2017). Equitable evaluation framing paper (p. 9). Equitable Evaluation Initiative. https://www.equitableeval.org/blog-main/2017/7/17/equitable-evaluation-framing-paper

Center for Global Development. (2006). When will we ever learn? Improving lives through impact evaluation (The Evaluation Gap Working Group, p. 95). https://www.cgdev.org/publication/when-will-we-ever-learn-improving-lives-through-impact-evaluation

Chambers (1994). All power deceives. IDS Bulletin, 25(2), 14–26.

Chambers (1999). Whose reality counts?: Putting the first last (Illustrated ed.). Intermediate Technology Publications.

Chambers, Karlan, Ravallion, & Rogers (2009). Designing impact evaluations: Different perspectives (Working Paper 4; p. 38). International Initiative for Impact Evaluation (3ie). https://www.3ieimpact.org/evidence-hub/publications/working-papers/designing-impact-evaluations-different-perspectives

Charlton, J. I. (1998). Nothing about us without us. University of California Press.

Chen, H. T. (1990). Theory-driven evaluations. SAGE.

Chen, H. T. (2005a). Practical program evaluation: Assessing and improving planning, implementation, and effectiveness. SAGE.

Chen, H. T. (2005b). Program theory. In Mathison (Ed.), Encyclopedia of evaluation (pp. 340–342). SAGE.

Chen, H. T., & Rossi, P. H. (1983). Evaluating with sense: The theory-driven approach. Evaluation Review, 7(3), 283–302.

Chen, K. K., Lune, & Queen, E. L. (2013). How values shape and are shaped by nonprofit and voluntary organizations: The current state of the field. Nonprofit and Voluntary Sector Quarterly, 42(5), 856–885.

Chilisa, Major, T. E., Gaotlhobogwe, & Mokgolodi (2016). Decolonizing and indigenizing evaluation practice in Africa: Toward African relational evaluation approaches. Canadian Journal of Program Evaluation, 30(3), 313–328. https://doi.org/10.3138/cjpe.30.3.05

Chouinard, J. A. (2016). Introduction: Decolonizing international development evaluation. Canadian Journal of Program Evaluation, 30(3), 237–247. https://doi.org/10.3138/cjpe.30.3.01

Cook, T. D. (1985). Postpositivist critical multiplism (Chapter X). In Shotland, R. L., & Mark, M. M. (Eds.), Social science and social policy (pp. 21–62). SAGE.

Cousins, J. B., Goh, S. C., Elliott, C. J., & Bourgeois (2014). Framing the capacity to do and use evaluation. New Directions for Evaluation, 2014(141), 7–23.

Cousins, J. B., & Whitmore (1998). Framing participatory evaluation. New Directions for Evaluation, 1998(80), 5–23.

Cutt & Murray, V. V. (2000). Accountability and effectiveness evaluation in nonprofit organizations (Vol. 2). Routledge.

Dahler-Larsen (2012). The evaluation society. Stanford Business Books.

Davies & Dart (2005). The “Most Significant Change” (MSC) technique: A guide to its use (p. 104). https://www.mande.co.uk/wp-content/uploads/2005/MSCGuide.pdf

Dean-Coffey (2018). What’s race got to do with it? Equity and philanthropic evaluation practice. American Journal of Evaluation, 39(4), 527–542.

Dichter, Adams, & Ebrahim (2016). The power of lean data. Stanford Social Innovation Review, 36–41.

DiMaggio, P. J., & Powell (1983). The iron cage revisited: Institutional isomorphism and collective rationality in organizational fields. American Sociological Review, 48, 147–160.

Doan, D. H., & Knight (2020). Measuring what matters. https://globalfundcommunityfoundations.org/wp-content/uploads/2020/10/MeasuringWhatMatters.pdf

Dodge & Ospina (2016). Nonprofits as “schools of democracy”: A comparative case study of two environmental organizations. Nonprofit and Voluntary Sector Quarterly, 45, 478–499.

Dubnick, M. J., & Justice, J. B. (2004). Accounting for accountability [Paper presentation]. Annual meeting of the American Political Science Association, Chicago, IL, United States. http://mjdubnick.dubnick.net/papersrw/2004/dubjusacctg2004.pdf

Earl, Carden, & Smutylo (2001). Outcome mapping: Building learning and reflection into development programs. IDRC. https://www.idrc.ca/en/book/outcome-mapping-building-learning-and-reflection-development-programs

Ebrahim (2003a). Making sense of accountability: Conceptual perspectives for northern and southern nonprofits. Nonprofit Management and Leadership, 14(2), 191–212.

Ebrahim (2003b). Accountability in practice: Mechanisms for NGOs. World Development, 31(5), 813–829.

Ebrahim (2005). Accountability myopia: Losing sight of organizational learning. Nonprofit and Voluntary Sector Quarterly, 34(1), 56–87.

Ebrahim (2019). Measuring social change: Performance and accountability in a complex world. Stanford Business Books.

Ebrahim & Rangan, V. K. (2014). What impact? A Framework for measuring the scale and scope of social performance. California Management Review, 56(3), 118–141.

Ebrahim & Weisband (2007). Global accountabilities: Participation, pluralism, and public ethics. Cambridge University Press.

Edwards & Hulme (1996). Too close for comfort? The impact of official aid on nongovernmental organizations. World Development, 24(6), 961–973.

Eikenberry, A. M., & Kluver, J. D. (2004). The marketization of the nonprofit sector: Civil society at risk? Public Administration Review, 64(2), 132–140.

Eyben, Guijt, Roche, & Shutt (2015). The politics of evidence and results in international development: Playing the game to change the rules? Practical Action Publishing.

Feit (2019). Addressing racial bias in nonprofit human resources (Chapter 6). In Eikenberry, A. M., Mirabella, R. M., & Sandberg (Eds.), Reframing nonprofit organizations: Democracy, inclusion, and social change (1st ed., pp. 66–78). Melvin & Leigh Publishers.

89.

Feldman

M. S.

March

J. G.

(1988). Information in organizations as signal and symbol. In March

J. G.

(Ed.), Decisions and organizations (1st ed., pp. 410–428). Basil Blackwell.

90.

Fetterman

D. M.

(2005). Empowerment evaluation. In Mathison

(Ed.), Encyclopedia of evaluation (pp. 125–129). SAGE.

91.

Fisher

(1998). Nongovernments: NGOs and the political development of the third world. Kumarian Press.

92.

Forbes

D. P.

(1998). Measuring the unmeasurable: Empirical studies of nonprofit organization effectiveness from 1977 to 1997. Nonprofit and Voluntary Sector Quarterly, 27(2), 183–202.

93.

Fremont-Smith

M. R.

(2004). Governing nonprofit organizations: Federal and state law and regulation. Belknap Press of Harvard University Press.

94.

Frumkin

(2002). On being nonprofit: A conceptual and policy primer. Harvard University Press.

95.

Furubo

J. E.

Rist

R. C.

Sandahl

(Eds.). (2002). International atlas of evaluation (1st ed.). Transaction Publishers.

96.

Gibelman

Gelman

S. R.

(2001). Very public scandals: Nongovernmental organizations in trouble. Voluntas: International Journal of Voluntary and Nonprofit Organizations, 12(1), 49–66.

97.

Glaser

B. G.

(1998). Doing grounded theory: Issues & discussion. Sociology Press.

98.

Goodman

P. S.

Pennings

J. M.

(1977). New perspectives on organizational effectiveness (1st ed.). Jossey-Bass.

99.

Greene

J. C.

(1997). Evaluation as advocacy. Evaluation Practice, 18(1), 125–135.

100.

Greene

J. C.

(2013). Making the world a better place through evaluation (Chapter 16). In Alkin

M. C.

(Ed.), Evaluation roots: A wider perspective of theorists’ views and influence (2nd ed., pp. 208–217). SAGE.

101.

Greene

J. C.

Benjamin

Goodyear

(2001). The merits of mixing methods in evaluation. Evaluation, 7(1), 25–44.

102.

Griffith

J. C.

Montrosse-Moorhead

(2014). The value in validity. New Directions for Evaluation, 142, 17–30.

103.

Guba

E. G.

Lincoln

Y. S.

(1989). Fourth generation evaluation. SAGE.

104.

Gugerty

M. K.

(2009). Signaling virtue: Voluntary accountability programs among nonprofit organizations. Policy Sciences, 42(3), 243–273.

105.

Gugerty

M. K.

Karlan

(2014). Measuring impact isn’t for everyone (April 2, 2014). Stanford Social Innovation Review. https://ssir.org/articles/entry/measuring_impact_isnt_for_everyone

106.

Gugerty

M. K.

Karlan

(2018). The Goldilocks challenge: Right-fit evidence for the social sector. Oxford University Press.

107.

Gugerty

M. K.

Prakash

(Eds.). (2010). Voluntary regulation of NGOs and nonprofits: An accountability club framework. Cambridge University Press.

108.

Hansmann

(1996). The ownership of enterprise. Harvard University Press.

109.

Hardwick

Anderson

Cooper

(2015). How do third sector organisations use research and other knowledge? A systematic scoping review. Implementation Science, 10, 84. https://doi.org/10.1186/s13012-015-0265-6

110.

Harachi

T. W.

Abbott

R. D.

Catalano

R. F.

Haggerty

K. P.

Fleming

C. B.

(1999). Opening the black box: Using process evaluation measures to assess implementation and theory building. American Journal of Community Psychology, 27(5), 711–731.

111.

Heifetz

R. A.

(1998). Leadership without easy answers (1st ed.). Harvard University Press.

112.

Heifetz

R. A.

Laurie

D. L.

(2001, December). The work of leadership. Harvard Business Review. https://hbr.org/2001/12/the-work-of-leadership

113.

Herman

R. D.

Renz

D. O.

(1999). Theses on nonprofit organizational effectiveness. Nonprofit and Voluntary Sector Quarterly, 28(2), 107–126.

114.

Herman

R. D.

Renz

D. O.

(2008). Advancing nonprofit organizational effectiveness research and theory: Nine theses. Nonprofit Management and Leadership, 18(4), 399–415.

115.

Hirschman

A. O.

(1970). Exit, voice, and loyalty: Responses to decline in firms, organizations, and states. Harvard University Press.

116.

Hoefer

(2000). Accountability in action? Program evaluation in nonprofit human service agencies. Nonprofit Management and Leadership, 11(2), 167–177.

117.

Hood

(2004). A journey to understand the role of culture in program evaluation: Snapshots and personal reflections of one African American evaluator. New Directions for Evaluation, 102, 21–37.

118.

Hood

Hopson

R. K.

(2008). Evaluation roots reconsidered: Asa Hilliard, a fallen hero in the “Nobody Knows My Name” project, and African educational excellence. Review of Educational Research, 78(3), 410–426.

119.

Hopson

R. K.

(2009). Reclaiming knowledge at the margins: Culturally responsive evaluation in the current evaluation moment. In Ryan

K. E.

Cousins

J. B.

(Eds.), The SAGE international handbook of educational evaluation (pp. 429–446). SAGE.

120.

Horvath

(2020). The transformative potential of experience: Learning, group dynamics, and the development of civic virtue in a mobile soup kitchen. Voluntas: International Journal of Voluntary and Nonprofit Organizations, 31(5), 981–994.

121.

House

E. R.

(1980). Evaluating with validity. SAGE.

122.

House

E. R.

(2017). Evaluation and the framing of race. American Journal of Evaluation, 38(2), 167–189.

123.

House

E. R.

Howe

K. R.

(2000). Deliberative democratic evaluation. New Directions for Evaluation, 85, 3–12.

124.

Hwang

Powell

W. W.

(2009). The rationalization of charity: The influences of professionalism in the nonprofit sector. Administrative Science Quarterly, 54(2), 268–298.

125.

Innonet. (2016). State of evaluation: Evaluation practice and capacity in the nonprofit sector. Innovation Network. https://www.innonet.org/media/2016-State_of_Evaluation.pdf

126.

Kane

Levine

Orians

Reinelt

(2021). Contribution analysis: A promising method for assessing advocacy’s impact. New Directions for Evaluation, 171, 45–57.

127.

Kang

Anderson

S. G.

Finnegan

(2012). The evaluation practices of US international NGOs. Development in Practice, 22(3), 317–333.

128.

Kanter

R. M.

Brinkerhoff

(1981). Organizational performance: Recent developments in measurement. Annual Review of Sociology, 7(1), 321–349.

129.

Kanter

R. M.

Summers

D. V.

(1987). Doing well while doing good: Dilemmas of performance measurement in nonprofit organizations and the need for a multiple constituency approach. In Powell

(Ed.), The nonprofit sector: A research handbook (1st ed., pp. 154–166). Yale University Press.

130.

Kaplan

R. S.

(2001). Strategic performance measurement and management in nonprofit organizations. Nonprofit Management and Leadership, 11(3), 353–370.

131.

Kearns

K. P.

(1996). Managing for accountability: Preserving the public trust in public and nonprofit organizations (1st ed.). Jossey-Bass.

132.

Khagram

Thomas

Lucero

Mathes

(2009). Evidence for development effectiveness. Journal of Development Effectiveness, 1(3), 247–270.

133.

Kim

Charles

Pettijohn

(2019). Challenges in the use of performance data in management: Results of a national survey of human service nonprofit organizations. Public Performance & Management Review, 42(5), 1085–1111.

134.

Kirkhart

K. E.

(2000). Reconceptualizing evaluation use: An integrated theory of influence. New Directions for Evaluation, 88, 5–23.

135.

Kirkhart

K. E.

(2010). Eyes on the prize: Multicultural validity and evaluation theory. American Journal of Evaluation, 31(3), 400–413.

136.

Kissane

R. J.

Gingerich

(2004). Do you see what I see? Nonprofit and resident perceptions of urban neighborhood problems. Nonprofit and Voluntary Sector Quarterly, 33(2), 311–333.

137.

Knowlton

L. W.

Phillips

C. C.

(2012). Creating program logic models. In The logic model guidebook: Better strategies for great results (2nd ed., pp. 35–48). SAGE.

138.

Knutsen

W. L.

Brower

R. S.

(2010). Managing expressive and instrumental accountabilities in nonprofit and voluntary organizations: A qualitative investigation. Nonprofit and Voluntary Sector Quarterly, 39(4), 588–610.

139.

Kramer

R. M.

(1987). Voluntary agencies and the personal social services. In Powell

W. W.

(Ed.), The nonprofit sector: A research handbook (1st ed., pp. 244–257). Yale University Press.

140.

Krauskopf

Chen

(2010). Administering services and managing contracts: The dual role of government human services officials. Journal of Policy Analysis and Management, 29(3), 625–628.

141.

Kunugi

Schweitz

(1999). Codes of conduct for partnership governance: Text and commentaries. United Nations University.

142.

Kushner

(2000). Personalizing evaluation. SAGE.

143.

LaFrance

Nichols

(2008). Reframing evaluation: Defining an indigenous evaluation framework. Canadian Journal of Program Evaluation, 23(2), 13–31.

144.

Lall

(2017). Measuring to improve versus measuring to prove: Understanding the adoption of social performance measurement practices in nascent social enterprises. Voluntas: International Journal of Voluntary and Nonprofit Organizations, 28(6), 2633–2657.

145.

Lall

(2019). From legitimacy to learning—How impact measurement perceptions and practices evolve in social enterprise—Social finance organization relationships. Voluntas: International Journal of Voluntary and Nonprofit Organizations, 30(3), 562–577.

146.

Lamont

(2012). Toward a comparative sociology of valuation and evaluation. Annual Review of Sociology, 38(1), 201–221.

147.

Lecy

J. D.

Schmitz

H. P.

Swedlund

(2012). Non-governmental and not-for-profit organizational effectiveness: A modern synthesis. Voluntas: International Journal of Voluntary and Nonprofit Organizations, 23, 434–457. https://doi.org/10.1007/s11266-011-9204-6

148.

Lemire

S. T.

Nielsen

S. B.

Dybdal

(2012). Making contribution analysis work: A practical framework for handling influencing factors and alternative explanations. Evaluation, 18(3), 294–309.

149.

LeRoux

Wright

N. S.

(2010). Does performance measurement improve strategic decision making? Findings from a national survey of nonprofit social service agencies. Nonprofit and Voluntary Sector Quarterly, 39(4), 571–587.

150.

Liket

K. C.

Maas

(2015). Nonprofit organizational effectiveness: Analysis of best practices. Nonprofit and Voluntary Sector Quarterly, 44(2), 268–296.

151.

Lindenberg

Bryant

(2001). Going global: Transforming relief and development NGOs. Kumarian Press.

152.

Mowbray

C. T.

Holter

M. C.

Teague

G. B.

Bybee

(2003). Fidelity criteria: Development, measurement, and validation. American Journal of Evaluation, 24(3), 315–340. https://doi.org/10.1177/109821400302400303

153.

Macdonald

(2007). Public accountability within transnational supply chains: A global agenda for empowering southern workers? In Ebrahim

Weisband

(Eds.), Global accountabilities: Participation, pluralism, and public ethics (pp. 1–23). Cambridge University Press.

154.

Madison

A. M.

(2007). New directions for evaluation coverage of cultural issues and issues of significance to underrepresented groups. New Directions for Evaluation, 114, 107–114.

155.

Madison

A. M.

(1992). Primary inclusion of culturally diverse minority program participants in the evaluation process. New Directions for Program Evaluation, 53, 35–43.

156.

Maier

Meyer

Steinbereithner

(2016). Nonprofit organizations becoming business-like: A systematic review. Nonprofit and Voluntary Sector Quarterly, 45(1), 64–86.

157.

Mair

Hehenberger

(2014). Front-stage and backstage convening: The transition from opposition to mutualistic coexistence in organizational philanthropy. Academy of Management Journal, 57(4), 1174–1200.

158.

Marshall

J. H.

Suárez

(2014). The flow of management practices: An analysis of NGO monitoring and evaluation dynamics. Nonprofit and Voluntary Sector Quarterly, 43(6), 1033–1051.

159.

Martinez

D. E.

Cooper

D. J.

(2020). Seeing through the logical framework. Voluntas: International Journal of Voluntary and Nonprofit Organizations, 31, 1239–1253.

160.

Mathison

(Ed.). (2005). Goal-free evaluation. In Encyclopedia of evaluation (p. 171). SAGE.

161.

Mayne

(2001). Addressing attribution through contribution analysis: Using performance measures sensibly. Canadian Journal of Program Evaluation, 16(1), 1–24.

162.

Mayne

(2011). Contribution analysis: Addressing cause and effect. In Forss

Marra

Schwartz

(Eds.), Evaluating the complex: Attribution, contribution and beyond (1st ed., pp. 53–96). Routledge.

163.

Mayne

(2012). Contribution analysis: Coming of age? Evaluation, 18(3), 270–280.

164.

McCreless

Fonzi

C. J.

Edens

Lall

(2014). Metrics 3.0: A new vision for shared metrics. Stanford Social Innovation Review. https://ssir.org/articles/entry/metrics_3.0_a_new_vision_for_shared_metrics

165.

Meadows

D. H.

(2008). Thinking in systems: A primer (Wright, Ed.). Sustainability Institute and Chelsea Green Publishing.

166.

Merchant

K. A.

Otley

D. T.

(2006). A review of the literature on control and accountability. In Chapman

C. S.

Hopwood

A. G.

Shields

M. D.

(Eds.), Handbook of management accounting research (pp. 785–802). Elsevier.

167.

Mertens

D. M.

Farley

Madison

Singleton

(1994). Diverse voices in evaluation practice: Feminists, minorities, and persons with disabilities. Evaluation Practice, 15(2), 123–129.

168.

Meyer

J. W.

Rowan

(1977). Institutionalized organizations: Formal structure as myth and ceremony. American Journal of Sociology, 83(2), 340–363.

169.

Meyer

M. L.

Louder

C. N.

Nicolas

(2021). Creating with, not for people: Theory of change and logic models for culturally responsive community based intervention. American Journal of Evaluation, 43(3), 378–393.

170.

Miller

J. H.

Page

S. E.

(2007). Complex adaptive systems: An introduction to computational models of social life. Princeton University Press.

171.

Mintzberg

(1978). Patterns in strategy formation. Management Science, 24(9), 934–948.

172.

Mintzberg

Waters

J. A.

(1985). Of strategies, deliberate and emergent. Strategic Management Journal, 6(3), 257–272.

173.

Mitchell

G. E.

(2013). The construct of organizational effectiveness: Perspectives from leaders of international nonprofits in the United States. Nonprofit and Voluntary Sector Quarterly, 42(2), 324–345.

174.

Mitchell

G. E.

Berlan

(2016). Evaluation and evaluative rigor in the nonprofit sector. Nonprofit Management and Leadership, 27(2), 237–250.

175.

Mitchell

G. E.

Schmitz

H. P.

Vijfeijken

T. B.

(2020). Between power and irrelevance: The future of transnational NGOs. Oxford University Press.

176.

Moore

M. H.

(2013). Recognizing public value (Illustrated ed.). Harvard University Press.

177.

Mosley

J. E.

(2011). Institutionalization, privatization, and political opportunity: What tactical choices reveal about the policy advocacy of human service nonprofits. Nonprofit and Voluntary Sector Quarterly, 40(3), 435–457.

178.

Mosley

J. E.

Marwell

N. P.

Ybarra

(2019). How the “what works” movement is failing human service organizations, and what social work can do to fix it. Human Service Organizations: Management, Leadership, & Governance, 43(4), 326–335.

179.

Najam

(1996). NGO accountability: A conceptual framework. Development Policy Review, 14(4), 339–354.

180.

Ospina

Diaz

O’Sullivan

J. F.

(2002). Negotiating accountability: Managerial lessons from identity-based nonprofit organizations. Nonprofit and Voluntary Sector Quarterly, 31(1), 5–31.

181.

Oster

S. M.

(1995). Strategic management for nonprofit organizations: Theory and cases (1st ed.). Oxford University Press.

182.

Ostrom

(1996). Crossing the great divide: Coproduction, synergy, and development. World Development, 24(6), 1073–1087.

183.

Patton

M. Q.

(1997). Utilization-focused evaluation: The new century text (3rd ed.). SAGE.

184.

Patton

M. Q.

(2011). Developmental evaluation: Applying complexity concepts to enhance innovation and use. The Guilford Press.

185.

Patton

M. Q.

(2016). The state of the art and practice of developmental evaluation: Answers to common and recurring questions. In Patton

M. Q.

McKegg

Wehipeihana

(Eds.), Developmental evaluation exemplars: Principles in practice (Reprint ed., pp. 1–24). The Guilford Press.

186.

Pawson

(2013). The science of evaluation: A realist manifesto (1st ed.). SAGE.

187.

Pawson

Tilley

(1997). Realistic evaluation. SAGE.

188.

Pawson

Tilley

(2005). Realistic evaluation. In Mathison

(Ed.), Encyclopedia of evaluation (pp. 362–367). SAGE.

189.

Pfeffer

Salancik

G. R.

(1978). The external control of organizations: A resource dependence perspective. Harper & Row.

190.

Phillips

Carlan

(2018). On impact: Emerging challenges of evaluation for Canada’s nonprofit sector. In Seel

(Ed.), Management of nonprofit and charitable organizations in Canada (4th ed., pp. 345–379). Toronto: LexisNexis.

191.

Porter

M. E.

(1980). Competitive strategy: Techniques for analyzing industries and competitors. Free Press.

192.

Powell

W. W.

(1987). The nonprofit sector: A research handbook (1st ed.). Yale University Press.

193.

Powell

W. W.

Gammal

D. L.

Simard

(2005). Close encounters: The circulation and reception of managerial practices in the San Francisco Bay Area nonprofit community. In Czarniawska

Sevón

(Eds.), Global ideas: How ideas, objects and practices travel in a global economy (pp. 233–258). Liber & Copenhagen Business School Press.

194.

Prakash

Gugerty

M. K.

(2010). Trust but verify? Voluntary regulation programs in the nonprofit sector. Regulation & Governance, 4(1), 22–47.

195.

Prewitt

Schwandt

T. A.

Straf

M. L.

(2012). Using science as evidence in public policy. National Academies Press.

196.

Rangan

V. K.

(2004). Lofty missions, down-to-earth plans. Harvard Business Review, 82(3), 112–119.

197.

Rayner

Bonnici

(2021). The systems work of social change: How to harness connection, context, and power to cultivate deep and enduring change. Oxford University Press.

198.

Raynor

Coffman

Stachowiak

(2021). An introduction to policy advocacy evaluation: The concepts, history, and literature of the field. New Directions for Evaluation, 171, 11–18.

199.

Riddell

(1999). Evaluating NGO development interventions. In Lewis

(Ed.), International perspectives on voluntary action: Reshaping the third sector (pp. 222–241). Earthscan.

200.

Rogers

P. J.

(2007). Theory-based evaluation: Reflections ten years on. New Directions for Evaluation, 114, 63–67.

201.

Rogers

P. J.

(2008). Using programme theory to evaluate complicated and complex aspects of interventions. Evaluation, 14(1), 29–48.

202.

Rogers

P. J.

(2009). Matching impact evaluation design to the nature of the intervention and the purpose of the evaluation (Working Paper 4; Designing impact evaluations: Different perspectives). 3ie (International Initiative for Impact Evaluation). https://www.3ieimpact.org/evidence-hub/publications/working-papers/designing-impact-evaluations-different-perspectives

203.

Rossi

P. H.

Freeman

H. E.

Rosenbaum

(1982). Evaluation: A systematic approach (2nd ed.). SAGE.

204.

Stufflebeam

D. L.

(1983). The CIPP model for program evaluation. In Evaluation models (Evaluation in Education and Human Services, Vol. 6). Springer. https://doi.org/10.1007/978-94-009-6669-7_7

205.

Santamarina

F. J.

Lecy

J. D.

Van Holm

E. J.

(2021). How to code a million missions: Developing bespoke nonprofit activity codes using machine learning algorithms. Voluntas: International Journal of Voluntary and Nonprofit Organizations. Advance online publication. https://doi.org/10.1007/s11266-021-00420-z

206.

Sawhill

J. C.

Williamson

(2001). Mission impossible?: Measuring success in nonprofit organizations. Nonprofit Management and Leadership, 11(3), 371–386.

207.

Schwandt

T. A.

(2000). Three epistemological stances for qualitative inquiry: Interpretivism, hermeneutics and social constructivism (Chapter 7). In Denzin

N. K.

Lincoln

Y. S.

(Eds.), Handbook of qualitative research (2nd ed., pp. 189–213). SAGE.

208.

Schwandt

T. A.

(2015). Evaluation foundations revisited: Cultivating a life of the mind for practice. Stanford University Press.

209.

Scott

W. R.

(1977). Effectiveness of organizational effectiveness studies. In Goodman

P. S.

Pennings

J. M.

(Eds.), New perspectives on organizational effectiveness. Jossey-Bass.

210.

Scott

W. R.

(1992). Organizations: Rational, natural, and open systems (3rd ed.). Prentice Hall.

211.

Scriven

(1991). Prose and cons about goal-free evaluation. Evaluation Practice, 12(1), 55–76.

212.

Shadish

W. R.

Cook

T. D.

Leviton

L. C.

(1991). Foundations of program evaluation: Theories and practice. SAGE.

213.

Shanker

(2019). Definitional tension: The construction of race in and through evaluation. https://hdl.handle.net/11299/211799

214.

Siegenfeld

A. F.

Bar-Yam

(2020). An introduction to complex systems science and its applications. Complexity, 2020, 16.

215.

Smith

D. H.

(1999). The effective grassroots association, part one. Nonprofit Management and Leadership, 9(4), 443–456.

216.

Smith

L. T.

(2012). Decolonizing methodologies: Research and indigenous peoples (2nd ed.). Zed Books.

217.

Smith

S. R.

(2010). Nonprofits and public administration: Reconciling performance management and citizen engagement. American Review of Public Administration, 40(2), 129–152.

218.

Smith

S. R.

Lipsky

(1993). Nonprofits for hire: The welfare state in the age of contracting. Harvard University Press.

219.

Snibbe

A. C.

(2006). Drowning in data. Stanford Social Innovation Review, 4(3), 39–45. https://ssir.org/articles/entry/drowning_in_data

220.

Sowa

J. E.

Selden

S. C.

Sandfort

J. R.

(2004). No longer unmeasurable? A multidimensional integrated model of nonprofit organizational effectiveness. Nonprofit and Voluntary Sector Quarterly, 33(4), 711–728.

221.

Speckbacher

(2003). The economics of performance management in nonprofit organizations. Nonprofit Management and Leadership, 13(3), 267–281.

222.

Speckbacher

(2013). The use of incentives in nonprofit organizations. Nonprofit and Voluntary Sector Quarterly, 42(5), 1006–1025.

223.

Stake

R. E.

(2004). Standards-based and responsive evaluation. SAGE.

224.

Stanfield

J. H.

II. (1999). Slipping through the front door: Relevant social scientific evaluation in the people of color century. American Journal of Evaluation, 20(3), 415–431.

225.

Stockmann

Meyer

(Eds.). (2016). The future of evaluation: Global trends, new challenges, shared perspectives (pp. 228–237). Palgrave Macmillan.

226.

Stone

M. M.

Cutcher-Gershenfeld

(2001). Challenges of measuring performance in nonprofit organizations. In Flynn

Hodgkinson

V. A.

(Eds.), Measuring the impact of the nonprofit sector (1st ed., pp. 33–57). Kluwer Academic/Plenum Publishers.

227.

Stroup

S. S.

Wong

W. H.

(2017). The authority trap: Strategic choices of international NGOs (1st ed.). Cornell University Press.

228.

Tassie

Murray

Cutt

Bragg

(1996). Rationality and politics: What really goes on when funders evaluate the performance of fundees? Nonprofit and Voluntary Sector Quarterly, 25(3), 347–363.

229.

Taylor-Ritzler

Suarez-Balcazar

Garcia-Iriarte

Henry

D. B.

Balcazar

F. E.

(2013). Understanding and measuring evaluation capacity: A model and instrument validation study. American Journal of Evaluation, 34(2), 190–206.

230.

Thomas

V. G.

Campbell

P. B.

(2020). Evaluation in today’s world: Respecting diversity, improving quality and promoting usability. SAGE.

231.

Thomas

V. G.

Madison

Rockcliffe

DeLaine

Lowe

S. M.

(2018). Racism, social programming and evaluation: Where do we go from here? American Journal of Evaluation, 39(4), 514–526.

232.

Trelstad

(2008). Simple measures for social enterprise. Innovations: Technology, Governance, Globalization, 3(3), 105–118.

233.

Tuck

Yang

K. W.

(2014). R-words: Refusing research. In Paris

Winn

M. T.

(Eds.), Humanizing research: Decolonizing qualitative inquiry with youth and communities (pp. 223–248). SAGE.

234.

Tuckman

H. P.

Chang

C. F.

(2006). Commercial activity, technological change, and nonprofit mission. In Powell

W. W.

Steinberg

(Eds.), The nonprofit sector: A research handbook (2nd ed., pp. 629–644). Yale University Press.

235.

Twersky

Buchanan

Threlfall

(2013). Listening to those who matter most, the beneficiaries. Stanford Social Innovation Review, 11(2), 40–45.

236.

USAID. (2022). ADS chapter 201: Program cycle operational policy. United States Agency for International Development (USAID). Retrieved from https://www.usaid.gov/ads/policy/200/201

237.

Vanderplaat

(1995). Beyond technique: Issues in evaluating for empowerment. Evaluation, 1(1), 81–96.

238.

Virtanen

Uusikylä

(2004). Exploring the missing links between cause and effect: A conceptual framework for understanding micro–macro conversions in programme evaluation. Evaluation, 10(1), 77–91.

239.

Visse

Abma

T. A.

(Eds.). (2018). Evaluation for a caring society (1st ed.). Information Age Publishing.

240.

Weiss

C. H.

(1973). The politics of impact measurement. Policy Studies Journal, 1(3), 179–183.

241.

Weiss

C. H.

(1998). Evaluation: Methods for studying programs and policies (2nd ed.). Prentice Hall.

242.

Whittle

(2013). How feedback loops can improve aid (and maybe governance). Center for Global Development. https://www.cgdev.org/publication/how-feedback-loops-can-improve-aid-and-maybe-governance

243.

Wholey

J. S.

(1981). Using evaluation to improve program performance. In Levine

R. A.

Martina

Hellstern

G. M.

Wollman

(Eds.), Evaluation research and practice: Comparative and international perspectives (1st ed., pp. 55–69). SAGE.

244.

Wilson-Grau

Britt

(2012). Outcome harvesting. Ford Foundation. https://www.outcomemapping.ca/resource/outcome-harvesting

245.

W.K. Kellogg Foundation. (2004). Logic model development guide (p. 71). https://www.wkkf.org:443/resource-directory/resources/2004/01/logic-model-development-guide

246.

Young

D. R.

Bania

Bailey

(1996). Structure and accountability: A study of national nonprofit associations. Nonprofit Management and Leadership, 6(4), 347–365.