Abstract

The market is best served when each organization can measure its social impact in the way that is most meaningful and insightful to its aim and operations…. [It is possible to] achieve comparability by focusing on the analytical skills needed to compare social impacts without mandating a rigid set of required metrics. The premise is that efficient capital markets demand analysts who are capable of interpreting and comparing apples and oranges. Why? Because they understand fruit. (Ruff & Olsen, 2016, para. 3; emphasis in original)
Impact investors are keen to create measurement standards. They seek a set of uniform metrics that spans locations, organizations, and fields of work. But this is a misguided quest. Of the many, many standards that have been proposed in our society, most have faded into oblivion or ended up competing with an ever-increasing number of similar standards (Timmermans & Epstein, 2010). One thing that successful standards have in common is a well-struck balance between uniformity (one size fits all) and relevance (customization to specific needs; Ruff, 2013; Timmermans & Epstein, 2010). A successful impact measurement standard must therefore balance the metrics that investors want (uniformity) against the carefully crafted causal assessments that evaluators produce (relevance). In a recent article (Ruff & Olsen, 2016) and a blog post (Olsen & Ruff, 2017), we proposed a potential solution to this challenging balancing act. The solution has three elements, which are outlined in the three sections below. The linchpin is a cadre of skilled analysts who know how to make sense of impact reports after the measurements are taken and the findings released. In many ways, evaluators already have the skills needed to be impact analysts. Stepping into that position requires accepting some new relationships with data and new roles in the field.
Typically, evaluators conduct impact assessments that are custom-designed to the specific evaluand (which is usually a single program or organization) and for which they have broad discretion when selecting the indicators, measures, methods, frameworks, and so on to be used. The resulting variability makes it difficult for investors, and others, to compare results across organizations or programs and to assess the collective impact of a portfolio, sector, or system. Also, these types of evaluations are typically rigid (i.e., the intervention cannot change during the study period) and slow to produce findings. This makes evaluations cumbersome, even counterproductive, for organization managers. Rigorous evaluations can also be expensive, making organizations reluctant to repeat them regularly. Taken together, the sorts of evaluations that can establish “impact”—namely, causal claims—do not deliver the ongoing and comparable impact information that the impact investment community seeks.
From the investor’s point of view, the problem is that impact evaluation is too varied and too complex. It is not surprising, then, that investors seek to make impact measurement uniform. They want fixed sets of metrics analogous to the nutrition labels on food products. This, they believe, will both simplify impact measurement and eliminate variation. But fixed sets of metrics have been tried before and failed to gain traction. Evaluators know why: Uniform measures have insufficient relevance to the particular requirements of a given organization or program. Uniformity diminishes the meaningfulness of the impact account and reduces managers’ ability to learn from it. The very act of making a system uniform creates one that no longer accurately measures impact. The challenge, therefore, is to create measures that somehow combine uniformity and relevance: standards that are flexible enough for specific purposes, yet comparable enough to allow for both portfolio-level and sector-level analysis. Our proposed three-part solution is an impact measurement standard that harnesses operational data to enable causal inferences, uses constructs with “bounded flexibility,” and engages a cadre of impact analysts capable of interpreting reports.
The third part is the most crucial: Without the analysts, none of the other elements work. In the sections below, we describe each element in more detail.
Harness Operational Data
Impact evaluation refers to the identification of causation (Howell & Yemane, 2016; Mohr, 1999; Pawson & Tilley, 1997). Within the pages of the American Journal of Evaluation, the question of what constitutes a causal claim has been much debated. Some insist that there must be a counterfactual (Howell & Yemane, 2016; Reichardt, 2011) and also, ideally, a comparable pre- and posttest “design, control group, instrument development and testing, and random sample selection” (Bamberger, 2004, p. 5). Others argue that reasonable inferences can be drawn in several other ways, such as identifying causal mechanisms (Pawson & Tilley, 1997; Paz-Ybarnagaray & Douthwaite, 2017), engaging with theory (Chen, 1990; Hansen, Klejnstrup, & Andersen, 2013; Weiss, 1997), and eliminating all other alternative explanations (Mohr, 1999).
Informed by these strategies for reasonable inference, thoughtfully chosen operational data (that is, data collected in the normal course of running a business or delivering a program) can be harnessed to tell a reasonable impact story. Examples of such data include wages paid (to workers of a certain profile), sales (in a certain region), number of items sold (to customers of a specific demographic), gallons of water used in production, or number of clients served. Data related to certain outputs (for instance, mosquito nets provided for beds) can also be tracked against certain outcomes (reduced incidence of disease). Operational data are similar to what evaluators call monitoring data, in that they are collected frequently; in some cases, “operational” and “monitoring” might describe the same data, but the two are not necessarily identical. Operational data are collected as a consequence of operating the organization rather than for the specific purpose of assessing impact. The advantage of harnessing operational data is that they are affordable, continuous, and (by definition) well aligned with the organization’s activities.
But for operational data to measure impact, careful consideration must be given to causal mechanisms, theory, and alternative explanations. This is where evaluators come in. As we mentioned above, engaging evaluators as impact analysts will require them to take on new relationships and roles in the field. One example is a greater focus on integrating evaluation expertise within the accounting and information systems that organizations use. Harnessing operational data for impact measurement requires that these systems be developed and installed with causal mechanisms, theories, and alternative explanations in mind. Achieving this requires shifting the focus of evaluation upstream, from the single organization that uses the software to the software developer.
As an example, consider WeeSchool, an online parenting platform designed to improve the school readiness of children in low-income communities of Colorado. The app seeks to “level the playing field, starting from birth” by selling learning resources (music, games, toys, etc.) that enable parents to understand, support, and monitor their children’s development. Operational data, both from sales and from the app itself, can be assembled to tell an impact story. This might include assessing the reach of the intervention: Who uses it, where, and when? The impact story could also involve comparing the developmental milestones reported on the app (What developmental gains has a child made?) with milestones drawn from other public data sets (What developmental gains might a child of this demographic profile be expected to make?). It is crucial that impact questions be anticipated before the app and business systems are built, so that the operational data can be used to make reasonable causal claims.
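To make this kind of comparison concrete, the following Python sketch computes two quantities from app-level records: reach (how many children are recorded in each region) and the average gap between the milestone gains reported on the app and the gains a public benchmark would lead one to expect for a child's demographic profile. The record fields, region labels, and benchmark table are hypothetical assumptions; WeeSchool's actual data model is not described here. A positive gap is at most suggestive: turning it into a reasonable causal claim still requires attention to mechanisms, theory, and alternative explanations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ChildRecord:
    """One child's record, assembled from hypothetical app and sales data."""
    profile: str            # demographic profile label used to look up a benchmark
    region: str             # where the app is used
    milestones_gained: int  # milestones reported on the app during the period


def reach_by_region(records: list[ChildRecord]) -> dict[str, int]:
    """Reach of the intervention: number of children recorded in each region."""
    return dict(Counter(r.region for r in records))


def mean_gain_vs_benchmark(records: list[ChildRecord],
                           expected_gain_by_profile: dict[str, float]) -> float:
    """Average of (observed gain - expected gain for the child's profile).

    The expected gains stand in for figures drawn from a public data set
    (hypothetical here). A positive value is consistent with, but does not
    establish, an effect of the app.
    """
    if not records:
        raise ValueError("no records to compare")
    gaps = [r.milestones_gained - expected_gain_by_profile[r.profile]
            for r in records]
    return sum(gaps) / len(gaps)
```

Because the benchmark is keyed by demographic profile, the same records can be re-examined against a different public data set without changing how the operational data are collected.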
Some evaluators already do this work, and many nonprofits have monitoring technology in place that can fulfill these data requirements. But for impact investors, and for the business enterprises they engage with to effect change, much work still remains to be done. This is where evaluators are most needed.
Constructs With Bounded Flexibility
The second element of our proposed solution is to create impact measurement and reporting standards with “bounded flexibility.” Bounded flexibility is a middle ground between “anything goes” and “only one right way” (Ruff, 2013; Ruff & Olsen, 2016). This approach creates comparability by focusing on the commonality of the construct itself rather than on differences in the indicators used to define and measure it. Any given construct (for instance, “new jobs” or “improved livelihoods”) can be defined, counted, and measured in many ways. Using bounded flexibility, organizations choose the definitions, counts, and measures that are most relevant to them from a prescribed (bounded) set of options. The bounded nature of the options reduces the variation in measures by eliminating unreasonable or irresponsible approaches. It does not impose uniformity across different contexts; it merely corrals similar organizations operating in similar contexts around similar metrics. And it allows aggregation and comparison across contexts at the level of the construct.
Bounded flexibility can be illustrated using the UN Sustainable Development Goals (SDGs). SDG indicator 1.2.2 (proportion of men, women, and children of all ages living in poverty in all its dimensions according to national definitions) permits variation in indicators of poverty. This flexible approach means that although different countries define and measure poverty differently, it is still possible to aggregate the number of people globally who are living in poverty as defined by their own country. The measure could be further bounded by placing some restrictions on allowable national approaches. By contrast, consider “youth” in SDG indicator 8.6.1 (proportion of youth [aged 15–24 years] not in education, employment, or training). The construct of “youth” refers to people who are between childhood and adulthood. The SDG indicator imposes a uniform measure of youth. Canada’s youth employment program, however, defines youth as extending up to age 30. The higher age limit reflects its national context, such as norms around education and age of marriage. Bounded flexibility would allow reasonable local definitions of “youth” while still permitting global comparisons of joblessness among people who are “between childhood and adulthood.”
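A minimal Python sketch of bounded flexibility, using the “youth” example. It assumes hypothetical bounds: the lower age is fixed at 15, and each reporter may choose an upper age anywhere from 24 (the SDG 8.6.1 limit) to 30 (the Canadian limit). Reports whose definitions fall outside the bounds are set aside rather than aggregated, accepted reports are pooled at the level of the construct, and each reporter's chosen definition is kept alongside the result so that an analyst can see the sources of variation. The bounds, class names, and field names are illustrative assumptions, not part of any SDG or national standard.

```python
from dataclasses import dataclass

# Hypothetical bounds for the construct "youth" (between childhood and adulthood).
LOWER_AGE_RANGE = (15, 15)  # lower age fixed at 15 in this sketch
UPPER_AGE_RANGE = (24, 30)  # any upper age from 24 to 30 is allowed


@dataclass(frozen=True)
class YouthDefinition:
    lower_age: int
    upper_age: int


@dataclass
class Report:
    reporter: str
    definition: YouthDefinition
    youth_neet: int   # youth not in education, employment, or training
    youth_total: int  # all youth under the reporter's own definition


def within_bounds(d: YouthDefinition) -> bool:
    """True if a locally chosen definition falls inside the prescribed bounds."""
    lo_min, lo_max = LOWER_AGE_RANGE
    up_min, up_max = UPPER_AGE_RANGE
    return lo_min <= d.lower_age <= lo_max and up_min <= d.upper_age <= up_max


def aggregate_neet(reports: list[Report]) -> dict:
    """Pool in-bounds reports at the construct level; keep definitions for analysts."""
    accepted = [r for r in reports if within_bounds(r.definition)]
    rejected = [r.reporter for r in reports if not within_bounds(r.definition)]
    neet = sum(r.youth_neet for r in accepted)
    total = sum(r.youth_total for r in accepted)
    return {
        "neet_share": neet / total if total else None,
        "definitions": {r.reporter: r.definition for r in accepted},
        "rejected": rejected,
    }
```

The design choice is to record variation rather than erase it: the pooled share is comparable at the level of the construct, while the retained definitions let an analyst judge whether a reporter using an upper age of 30 should be compared directly with one using 24.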
The more flexible standards are, the more relevant they can be to a wide variety of organizations and contexts. But variation makes comparisons complex. Skilled readers, whom we call analysts, are needed to make sense of the reported data in order to draw nuanced comparisons and conclusions. Analysts need information and skills. Bounded flexibility only works if organizations disclose the details of their methods, so that analysts can see the sources of variation. Analysts also need the expertise to make sense of differences in evaluands, time frames, definitions, and the reliability of instruments. Social Value International, an organization that supports standards for social and environmental value, offers a course in this form of impact analysis (http://socialvalueus.org/skilled-impact-analysis-certificate-framework). Evaluators who are accustomed to meta-analysis already have the required technical skills. Bounded flexibility requires large numbers of impact analysts to help impact investors, and others, understand what the numbers mean. Evaluators are well suited to fill this need.
Let’s return to our earlier example of WeeSchool. Gary Community Investments (GCI), which invested in the program, wants to measure the impact of its portfolio, including whether the app has actually improved school readiness among low-income children. The data provided by WeeSchool, together with the principles of bounded flexibility, allow WeeSchool to measure school readiness in one way (the benefit to children) and GCI to measure it in another (the benefit to its portfolio). Each organization analyzes WeeSchool’s performance and extracts from it the information most relevant to its own purposes. If GCI wanted to compare WeeSchool with other investment opportunities, its evaluator could draw nuanced conclusions from the available data.
A Cadre of Analysts Skilled at Interpreting Impact Reports
The WeeSchool example demonstrates how impact analysis supports comparisons at the portfolio level. The same role can be scaled up or down the capital supply chain: Analysts can also interpret impact at the organizational level and at the level of industries, sectors, and even of entire countries. In this capacity, analysts can enable commensuration of impact without imposing a single moral standard. This is perhaps the most exciting and transformative role that evaluators can step into as impact analysts.
Impact investors seek commensuration of impact evaluations so that the impacts of diverse organizations can be ranked. However, this commensuration is entangled with moral values (Espeland, 2001; Espeland & Stevens, 1998, 2008), particularly for impact-driven activities, which are always enactments of a particular ideology (Mabry, 2002). Any standard that renders impacts commensurate does so by standardizing valuation, which is necessarily rooted, to some degree, in moral judgment. Rather than developing a single standard for the commensuration of impacts, evaluators acting as skilled impact analysts can bring a diversity of evaluative judgments to the public discourse, an ability that Clark, Emerson, and Thornley (2014) refer to as multilingual leadership.
With a cadre of impact analysts to call on, impact investors could work with analysts whose commensuration processes align with their own moral values. Because society rarely agrees on precisely what “good” means, no single method will appeal to everyone. A diversity of evaluative judgments in public discourse enables a measurement system that accommodates a plurality of values.
For example, GCI has invested in WeeSchool because it views school readiness as a path out of poverty in its home state. From that perspective, it values WeeSchool highly. But other investors might view the app through a gender lens, for instance, and find it wanting; or they might be more concerned with global poverty than with poverty in Colorado; or they might focus on a different issue entirely. Such investors are unlikely to value WeeSchool as highly as GCI does. In our view, social capital markets are best served when this diversity is allowed to flourish, and that means separating the commensuration of impacts from the accounts of what happened.
Conclusion and Future Outlook
We have suggested three features of a common approach to impact measurement: harness operational data, use constructs with bounded flexibility, and develop a cadre of analysts who are skilled at interpreting reports. The analysts are the most crucial of these. Evaluators are well suited to step into these roles, but it will require them to take on new relationships with data and new roles in the field. Perhaps as a result of the importance of the analyst role, new efforts are now being made to better define it and to cultivate the skill set.
Today, numerous actors in the financial services world, including the Impact Management Project, the World Economic Forum, the Global Impact Investing Network, and two of the major global credentialing bodies for financial services providers (the CFA Institute and the Chartered Alternative Investment Analyst Association), are either developing and delivering curricula to train investors in impact investing or exploring the best way to do so. Key considerations in these trainings include what a given investment’s impact is and how to relate it to a given investor’s priorities. As resources such as these come online to accelerate investor education about how to engage for impact, wider recognition of the crucial role that skilled impact analysts play will be key to escaping the Catch-22 of impact metrics that must be both relevant and commensurable.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
