Abstract
It is a much-lamented fact that research with the potential to inform or influence education policy instead remains policy inert. There are many reasons for this frustrating state of affairs, including a lack of strategic thinking on the part of researchers on how to successfully accomplish outreach—as opposed to communication with peers (in-reach). Another, and a principal focus of this article, is the failure of researchers to appreciate the power of employing compelling narratives to bring their findings to the attention of policymakers and other stakeholders. Accordingly, this article presents some examples of narratives specifically designed for outreach and discusses some of their features. It also considers the challenges in gaining traction with counternarratives once a particular narrative has achieved currency. Researchers should also be mindful of the tenor of the times, with experts now often viewed with skepticism, if not downright hostility. In some quarters, excessive reliance on technocrats is even seen as a threat to democratic governance. The article concludes with some recommendations on how to appropriately enhance the role of research in education policymaking.
1. Introduction
This article is addressed principally to early- and mid-career researchers with aspirations to have some influence/impact on education policy. The reality is that most of the research generated, whatever the discipline, is “policy-inert”—buried in reports, journals, or handbooks that never reach policymakers. Those findings that do reach policymakers are often viewed as neither relevant nor timely. Further, even if the research is potentially relevant, it is rarely communicated in a manner that makes it accessible to the target audience (Conaway, 2013). This last observation is one impetus for the present article. I contend that most researchers do not understand the level of sustained, strategic thinking required to have a real chance at influence/impact. In particular, they do not appreciate the power of embedding their key findings, or policy analysis, in a compelling narrative. Failing to do so usually condemns the work to obscurity—at least from a policy perspective. Crafting such narratives requires training and practice, both of which are in short supply. Although the focus of the article is principally on how researchers can enhance utilization of their work and insights, it is appropriate to acknowledge that policymakers and knowledge brokers also have a responsibility to seek out relevant research.
I draw here principally upon my own experiences and readings in this realm to address a number of the issues involved. 1 I have four goals: (1) to briefly explicate the power of narratives and to offer some illustrative examples, (2) to describe, by means of two case studies, the challenges in offering “counternarratives” to those narratives that already occupy the high ground in policy, (3) to discuss some of the headwinds researchers face in having their expertise valued by different stakeholders, and (4) to offer some recommendations for how to enhance the likelihood of research influence/impact.
The article is organized as follows: The next two sections provide some background on the topic of expertise and on the power of narratives in communicating to different audiences. The next section presents three examples of outreach (i.e., to nonpeer groups), and this is followed by two examples of the difficulties in challenging an ascendant narrative with an alternative (i.e., a counternarrative). The next section describes some of the headwinds researchers may confront in seeking to have influence/impact. Some are due to the general tenor of the times and others are specific to the context. The final section looks forward and offers some recommendations on how to enhance the role of research in education policymaking.
2. Some Background
Education policy, in common with all social policy, is a contested domain of practice. This is due, in part, to the range of stakeholders involved, each bringing their own values, perspectives, and interests to the table. Moreover, policy formulation and enactment are complex processes entailing not only political, economic, and pragmatic considerations but also factors specific to the issue at hand (Stone, 2002). Note that I follow Orland (2009) in distinguishing educational policy from educational practice: The former [policy] refers to laws, regulations, requirements, and operating procedures that govern the general manner in which educational services are provided, while the latter [practice] concerns the specific behaviors of teachers and others who directly interact with students. (p. 114)
By the term education research, I denote research on topics related to education that follows general scientific principles, as adapted and applied in different settings, and that has implications for education, broadly conceived (Shavelson & Towne, 2002). 2 Some, but not all, products of education research have the potential to inform education policy. Examples are empirically grounded findings, theoretical reformulations of a particular area, or policy analyses that employ a variety of sources and methodologies. Some products are expressly intended to advocate for a particular policy, employing warrants based on education research. In what follows, I will use the term “policy research” to refer to the full panoply of such products. Implicit in the term is that there is, in fact, an intention to contribute to policy deliberations. 3
Empirical studies are, perhaps, the most common forms of policy research. In order to achieve some level of recognition among policymakers, an empirical study must have some claim to generalizability (e.g., it reveals some systematic patterns of broad interest) and yet, at the same time, must be seen among policy implementers as relevant to their particular context (e.g., How do these findings apply to my school?). Given the difficulties in conducting field research, satisfying both sets of demands can be challenging. Nonetheless, there is considerable evidence that research has contributed to education policy (Hanushek, 2015; Shavelson, 1988).
That said, research evidence is only one factor (and appropriately so) in policy formulation and enactment. 4 Figure 1 captures many of these other factors (C. Weiss, 2007). Note that the values and judgments of decision makers are another such factor. In fact, they deserve to be highlighted, for they are not only important aspects of democratic governance but also justifiable mediators between policy research evidence and policy decisions (Messick, 1987). It is by means of values and judgments that policymakers consider and weigh multiple factors in coming to a decision. In the end, that decision may run counter to the body of evidence, seize upon specific studies that deviate from expert consensus, or reinterpret studies in order to satisfy other demands (Shepard, 2015). That is the reality, for better or worse.

Figure 1. Factors influencing policy decision-making. Source: C. Weiss (2007, p. 286).
Although some of the factors represented in the figure are generally beyond the control of researchers, many are not. Foremost is the quality of the research, judged according to professional norms. Whether based on quantitative, qualitative, or mixed methods, the interpretations of findings, as well as the inferences derived therefrom, should be respectful of the carrying capacity of the data; that is, any limitations related to data quality, study design, implementation, and analysis (Braun, 2021). Policymakers, or at least their staff, are generally attentive to research quality (Conaway, 2013; Orland, 2009). Secondly, as Orland (2009) pointed out, researchers and policymakers travel in different “orbits” and it is the responsibility of the former to appreciate, and take into account, the circumstances of the latter. Researchers should be mindful of policymakers’ needs, particularly with respect to relevance, interpretability, and timeliness.
Relevance is the degree to which the research product is germane to current policy debates or informs forthcoming policy decisions. Importantly, relevance is in the “eye of the beholder”; that is, researchers should make every effort to explain how their study findings may be applicable to local situations—even if they are framed in national or even international contexts. At the same time, they should be explicit about the limitations of the study (or the literature in general) with respect to the context of the issue at hand.
Interpretability is the extent to which policymakers and other stakeholders are not only able to understand the research findings (or the research argument) and their implications for policy but also to appreciate the limitations of the research with respect to the issues at hand. 5
Relevance and interpretability must be complemented by timeliness. If the policy debate has ended and decisions have been made, then there is little to be done except to prepare for the next cycle. Policymakers and others also labor under the burden of finding the sweet spot in time. As Grant (https://wtgrantfoundation.org/building-the-policy-wave-the-power-of-data-based-storytelling) put it: John Kingdon likens policy advocates to surfers waiting in the ocean for the right wave at the right time to ride their boards to shore…. Like surfers looking for that perfect wave, advocates and policymakers await the timely confluence of public energy and politics to bring those policies to shore.
Unfortunately, there are ample historical precedents for concerns regarding excessive deference to experts and their prescriptions. One is the trope of “experts for hire,” as the extended technical controversies over the health effects of smoking and environmental chemicals attest. Disagreements among experts are sometimes interpreted as an invitation to just “choose your favored side” and, sometimes, as evidence that experts “really don’t know what they are talking about.” For example, how can policymakers, education leaders, and teachers make sense of the never-ending reading wars, including the recent backtracking of a leading proponent of whole language instruction (https://edsource.org/2022/a-movement-rises-to-change-the-teaching-of-reading/675989)? At the same time, well-founded and well-intentioned statements related to the uncertainties attendant to research findings are sometimes interpreted as a signal of irrelevance.
Another challenge is that studies that are prized by the academy, such as randomized controlled trials (RCTs), are often narrowly focused, typically do not investigate mechanisms, and, in addition, lack the generalizability that would make the results a suitable basis for policy decisions. Nonetheless, the policy implications of research findings are sometimes extended well beyond what the data support or are developed without due regard to relevant contextual factors. A prominent example is Project STAR, an RCT conducted in Tennessee in the late 1980s to investigate the impact of class size reductions on student test performance in early grades. Although there has been considerable debate regarding the findings, it is generally agreed that, on average, smaller classes resulted in a substantively meaningful impact in kindergarten and first grade, especially for minority and low-income children (Krueger, 1999). At the time, the findings were widely publicized and other states (e.g., California) decided to implement the intervention, apparently neglecting to appreciate that class size reductions would necessitate both more classrooms and more teachers—with the commensurate increase in costs. One outcome was that poorer districts not only had to bring in trailers to accommodate the extra classes but also lost their best teachers to wealthier districts and were forced to replace them with less qualified personnel. The failure to recognize the distinction between strong internal validity and weak external validity in a policy context led to substantial and costly disruptions to the education system with unwanted consequences for many students (Justman, 2018). Further, as Justman points out, there appeared to be little consideration of research-based alternatives to class size reduction that could have reduced test score gaps at lower or equal cost.
Finally, widespread coverage of the “replication crisis” in various domains has fueled skepticism of scientific findings (Lehrer, 2010; Piper, 2020). More recently, concerns regarding reliance on AI-based systems for decision making (e.g., bank loans, parole judgments), as well as for purposes of identification (e.g., facial recognition), have affected the public’s and policymakers’ attitudes toward expertise (Hill, 2022). Admittedly, these instances are far removed from education policy. Nonetheless, they provide a backdrop for current attempts to bring research evidence to bear on education policy issues.
3. Communication and the Power of Narrative
The first principle of effective communication is to be clear about the composition and interests of the target audience. At a broad level, the question is whether the focus is on “in-reach” or “outreach.” By the former, I mean an audience that comprises mainly technical peers and, by the latter, an audience of all other stakeholders, typically possessing different levels of technical sophistication and a range of background knowledge of the issues. Graduate training and academic norms reward excelling at in-reach. Experience suggests that success at in-reach does little to prepare one for success at outreach; indeed, it can prove to be a barrier to successful communication with other audiences (Conaway, 2013). The difficulty is that the norms of academic communication prize detailed descriptions of methodology, comprehensive data presentations, and extended discussion of both the robustness of the results and the limitations of the study. Such content does not attract (or interest) most members of a policy audience. Consequently, it takes intentional effort and practice to become comfortable with, if not adept at, effective outreach communication.
With regard to outreach, there are at least four main purposes. The first is to inform stakeholders of research findings and their likely implications. The second is to illuminate a particular policy domain (e.g., by providing a critical review of the literature or offering a novel perspective on the domain, or both). The third is to advocate for a specific course of action or policy, and the fourth is to offer evidence to counter an existing policy or policy proposal. Often enough, purposes become conflated, as when an informative article shades into advocacy without explicitly signaling the shift or acknowledging the different levels of support appropriate to the two purposes. 8
Psychologists, psychiatrists, politicians, and religious leaders all appreciate the power of narrative (or storytelling). Human beings appear to be hard-wired to respond to stories, and well-crafted stories can have an enormous impact on listeners or readers (Bruner, 1990; Frankl, 2006). As Dery (2021) phrased it, “Facts matter…. [B]ut they become meaningful when they’re inlaid in the mosaic of narrative.” According to one distinguished educational psychologist (Berliner, 1992, p. 143), lack of research impact is due to a failure to tell compelling stories: In addition to understanding, prediction, and control, the goal of educational psychology is to influence practice. This requires attention to the fact that people have a preference for stories, meaningful narratives about other people and the problems they face in everyday life. It is argued that the findings and concepts of our discipline are not seen as possessing verisimilitude and rarely influence educational policy and practice because they are not well contextualized for educators. To do better, we must learn to tell stories about our research that focus on real teachers and students in ordinary school settings.
4. Three Examples
It was only relatively late in my career that I undertook serious outreach. In years prior, I had published some tutorials on technical matters (e.g., Braun, 1989) that were for in-reach rather than outreach. Fortunately, during my years at the Educational Testing Service (1979–2006), I garnered extensive experience in addressing client groups, as well as general audiences, on technical and policy issues related to high-stakes assessment. Over time, I developed a sense of how best to communicate to such groups. However, as I learned subsequently, effective in-person communication differs from effective writing. In some ways, the former offers more latitude, while the latter, in the absence of personal contact, demands a different type of coherent narrative structure.
My first major attempt at outreach concerned the use of value-added modeling for teacher and school accountability (Braun, 2005). A discussion of this monograph appears in the following section. In this section, I describe three subsequent efforts at outreach through written communications. The first example deals with a particular problem in educational measurement; namely, the issue of test score interpretation as it plays out in countless homes, schools, and offices (Braun & Mislevy, 2005). The next two examples address more general social problems that had a significant educational component. In each case, the intent was to bring to the attention of a broad audience the nature and extent of a serious problem, the implications of not addressing the problem, and a few suggestions on how to make some progress. The narrative strategy was to present a simple framework that could be used to structure an argument, organize the relevant evidence, and serve as the basis for describing a strategy to address the problem.
4.1 Intuitive Test Theory
In the first effort, Mislevy and I were motivated by the persistent mischaracterizations of tests (e.g., all Grade 4 math tests are basically the same) and the interpretations of reported test scores (e.g., test scores above 90% must be an “A”). We noted that by virtue of their school experiences, most adults have developed their own (intuitive) test theory that they then apply whenever they have to deal with test-related issues—in analogy with the intuitive theory of physics (aka Aristotelian physics) that each of us develops by dint of living in the physical world. However, just as intuitive physics is a poor theory for getting a rocket to the moon (Newtonian physics works better), so too intuitive test theory is a weak basis for understanding tests and the evidence they generate (modern test theory works better).
Unfortunately, the stories that people tell themselves (i.e., “personal” narratives) are highly resistant to change, and much mischief has been the result. Perhaps the most obvious example is the mistaken notion that tests with the same title (e.g., a test of Grade 4 mathematics) are effectively measuring the same skills and, therefore, the scores generated can be interpreted in the same way and have similar implications. Thus, the choice of a test, say for a school district, can comfortably be made on other grounds—such as cost. A further implication is that it is a relatively simple matter to relate the scores on one test to scores on another. Consequently, states that permit local districts to choose from among an approved list of tests can still achieve state-level comparability on outcomes and, especially, for accountability.
In targeting an audience of educators, we first explained how scientific test theory differed from intuitive test theory. In brief, we characterized scientific test theory as a “special kind of evidentiary argument…[it’s] about reasoning from a handful of things that students say, do, or make, to more broadly cast inferences about what they know…or are apt to do in the future.” Thus, we implicitly contrasted the systematic and statistical foundations of test theory with the informal theories individuals develop as a result of their experiences or those of others with whom they interact.
Without resorting to technical jargon, we then addressed seven of the most common misconceptions, offering explanations and some resolutions employing the tenets of scientific test theory. For example, two related misconceptions we discussed were: “A test is a test is a test” and “Any two tests that measure the same thing can be made interchangeable with a little ‘equating’ magic.” We explained that a test (e.g., of Grade 4 mathematics) can be conceptualized in very different ways and these differences will then be reflected in the content of the test and, sometimes (though not always), in the types of item formats employed. Accordingly, relative student performance can vary considerably from one test to the other. Unfortunately, if one holds the first misconception, the second seems to follow logically. In fact, these misconceptions led the Clinton Administration to ask the National Research Council to investigate the possibility of linking the different state tests in mathematics and reading/language arts to the NAEP scale, thereby facilitating the reporting of all student end-of-year test scores on a common, national scale. The Council produced two reports that demonstrated that although constructing such a linkage was technically possible, it would not withstand rigorous scrutiny (Feuer et al., 1999; Koretz et al., 1999).
We concluded with the following: In most issues that involve technical considerations, experts are consulted, and their perspectives become part of the policy debate. They don’t make the decisions, and they shouldn’t. In any social setting, there are more considerations than purely technical ones. But policy options should be restricted to those that are in accord with basic principles and broadly held standards of practice—the analogs of Newton’s laws of motion. 10 (p. 497)
4.2 America’s Perfect Storm
My next effort, in collaboration with colleagues at ETS and Northeastern University, concerned the long-term impact of three trends in American society (Kirsch et al., 2007). The target audience was primarily state-level education policymakers for both K–12 and higher education. The goal was to convey to a broad audience a clear warning of the danger in not focusing directly on reducing skill gaps.
The three trends were (1) large differences in the distributions of literacy and numeracy skills by gender and by race/ethnicity, (2) demographic trends in the proportions of the U.S. adult population categorized by both gender and race/ethnicity, and (3) changing labor market skill requirements. These three trends constituted the framework for the narrative and it was the confluence of these trends that we characterized as “America’s Perfect Storm.”
To illustrate the implications, we used Census Bureau population projections to 2030 to show that if skill gaps were not substantially reduced, then aggregate skill distributions would be strongly shifted to the left relative to the current distribution, although the labor market increasingly demands skill distributions shifted to the right (see Figure 2). 11 As part of the argument, we presented data showing how labor market success, defined in terms of type of employment and wages, depended both on educational attainment and on literacy skills. The narrative strategy was to avoid technical language and employ a linear, narrative structure to describe each trend, accompanied by simple, color-coded tables and charts. This was followed by a short synthesis of the argument for the “perfect storm” and recommendations for moving forward. Additional statistical material was relegated to an appendix.

Figure 2. Distributions of literacy for U.S. adults: 1992 (actual) and 2030 (projected).
The narrative text of the report comprised 20 pages, laid out in a reader-friendly manner. It was accompanied by a 7-minute video that conveyed the main features of each trend and highlighted the key implications of the confluence of these trends. Thousands of copies of the report and the video were distributed. In addition, over a period of 18 months, the authors made more than 80 presentations to a variety of audiences (local, regional, and national)—often due to an invitation from someone who had seen either the report or the video. Although we cannot point to direct impact, the project certainly served an “enlightenment” function in public discourse (C. H. Weiss, 1977).
4.3 Opportunity in America
The last example is taken from another collaboration with ETS colleagues. The impetus here was the broad-based evidence that societal and economic inequities were increasing and, in particular, that there were large gaps in the opportunities available to children, depending on both demography and geography. With substantial support from ETS, we embarked on a multiyear project that resulted in multiple products. The intended audience was policymakers and knowledge brokers at the state and national levels. 12
One publication, an e-book titled “Choosing Our Future,” incorporated graphics with voice-overs, as well as short film clips to make the story more powerful (Kirsch et al., 2016). We also repeatedly revised the text in response to reviews by nontechnical readers in order to enhance readability; that is, we made a concerted attempt to communicate to a broad audience using carefully crafted language, along with simple tables and charts. In addition, we coproduced a documentary film, also called “Choosing Our Future,” that enriched the basic narrative with longer film clips of interviews with a number of individuals around the country (visual anecdotes) who offered perspectives on opportunity in their communities. 13
The e-book details how early differences in opportunities to develop a rich skill profile concatenate over the developmental span from birth to young adulthood, resulting in increasingly large differences in human capital at the threshold of adulthood and entry into the labor market. Those differences, in turn, are strongly associated with differences in a range of adult outcomes, including labor market success, family formation, and civic participation. Even more problematic is the transmission of (dis)advantage to the next generation with grave implications for the promise of shared prosperity and the health of the democratic polity. Unfortunately, experience suggests that reform policies that do not explicitly target gaps in achievement are unlikely to have much impact (Braun et al., 2010).
At this point, a fair question is whether these publications had any influence or impact. The simple answer is that we have no way of offering a quantitative answer. All had quite wide distribution and elicited numerous comments and responses. In that the primary intent was to “enlighten,” they did succeed in stimulating conversation on each of the topics. That said, these types of publications rarely bring recognition or direct rewards—which, unfortunately, can be a disincentive for faculty working toward promotion.
In general, policy research that serves an enlightenment function should be seen as a contribution to the common good. Nonetheless, a natural question that arises is whether it is possible for policy researchers to determine the direct impact of their work. Typically, drawing causal connections between a body of work and policy adoption is very challenging, if not impossible. The adoption of a major policy initiative is almost always a complex, highly nonlinear process, involving many stakeholders. Occasionally, it is possible to link a researcher to policy change. A prominent example is offered by the efforts of Robert Balfanz to identify “high school drop-out factories” and to institute effective reform measures (Balfanz & Legters, 2004).
5. Narratives and Counternarratives
Falsehood flies, and the truth comes limping after it; so that when men come to be undeceived, it is too late; the jest is over, and the tale has had its effect. (Jonathan Swift) 14
5.1 Value-Added Models
Following the 1989 Charlottesville Summit called by then-president George H. W. Bush, there was an increased interest in measuring growth in student learning and reporting the results to the public. Familiar choices, such as gain scores, had many well-known technical difficulties. William (Bill) Sanders had a solution.
Sanders, whose mother was a teacher, had a long-standing interest in public education. A well-respected applied statistician at the University of Tennessee (Knoxville), he proposed adapting linear mixed models, developed for use in agricultural studies, to the educational context. His proposal was supported by extensive, highly technical analyses that offered support for the contention that these methods were able to isolate the contributions teachers made, over the course of a school year, to their students’ learning, net of their previous test scores and other factors contributing to test performance. These estimates of teachers’ contributions were called teachers’ “value-added” estimates and the regression models employed were termed “value-added models” (VAM; Ballou et al., 2004; Sanders & Horn, 1998).
In the early 1990s, Sanders was invited to help draft legislation incorporating the newly named Tennessee Value-Added Assessment System (TVAAS) into the work of the Tennessee Department of Education. 15 The intent was to have school districts employ the methodology to estimate schools’ and teachers’ value-added. Use of the tool was entirely voluntary and no stakes were attached to the value-added scores that were generated.
Early on, the TVAAS was evaluated by the Office of Accountability in the Tennessee State Comptroller of the Treasury (Baker & Xu, 1995). The report discussed a number of issues that anticipated concerns subsequently raised in the peer-reviewed literature. These included, among others, year-to-year volatility, lack of transparency and the need for an independent evaluation of the model and its implementation. Sanders was given the opportunity to respond. He did, in detail and, in the years to follow, continued to conduct extensive empirical research in support of TVAAS and its successor, the Educator Value-Added Assessment System (EVAAS). EVAAS spread steadily and, with the passage of No Child Left Behind, many states shifted to high-stakes uses of VAM scores for accountability. Despite his initial objections to such applications, Sanders acquiesced and provided ongoing support for the states from his new home at the SAS Institute. 16
VAM is an outstanding example of how a researcher can have impact on policy. That impact had much to do with Sanders’ ability to craft compelling narratives for nontechnical audiences, effectively translating technical issues into lay language. In fact, he was a charismatic speaker whose presentations, seasoned with a “down-home” delivery, had the character of revival meetings. By choosing strategic partners, such as the SAS Institute and Battelle for Kids, he was able to effectively leverage his efforts and establish VAM as one of the two most widely used methods for measuring student growth on state summative assessments.
As noted above, early on there were questions raised regarding the appropriateness of employing VAM scores for high-stakes accountability. A RAND monograph provided an early description and review of VAM, addressing issues such as the volatility and systematic bias in VAM estimates (McCaffrey et al., 2003). 17 I entered the fray with a somewhat different focus (Braun, 2005). The VAM Primer was targeted at a nontechnical audience. It had three goals. The first was to explain the logic of VAM in terms accessible to lay audiences. The second, building on the first, was to clarify the implicit causal assumptions supporting the use of VAM scores for accountability. The third was to examine the plausibility of those assumptions in the context of public schooling. The overall intent was to make the case that the high-stakes use of VAM was much more problematic than its proponents’ rhetoric suggested.
The VAM Primer was widely distributed and, I believe, attracted a fair amount of attention, especially among educators. 18 Over the next decade, I continued to publish on VAM, made many presentations to a variety of audiences, and worked with the National Education Association to oppose the use of VAM scores as the principal quantitative indicator of teacher efficacy. Regrettably, I never made a concerted effort to join forces with other researchers to embark on a campaign to convince legislators and policymakers to reconsider VAM use. To my knowledge, such a broad-based policy strategy was never undertaken.
Over that same decade, there were robust exchanges between the Sanders faction and many in the educational measurement, econometric, and statistical communities. For the latter, the widespread adoption of VAM scores for teacher accountability that followed the Race to the Top initiative was particularly concerning because it further identified teacher efficacy with a flawed indicator. Nonetheless, it was only with the passage of the Every Student Succeeds Act in 2015, and the elimination of federal teacher accountability requirements, that the use of VAM began to wane. Thus, sadly, I have to conclude that the various counternarratives developed by researchers had little direct impact on policy.
5.2 International Large-Scale Assessments
Messick (1987) argued that large-scale assessment surveys were essentially a type of policy research and should be judged on the basis of their policy utility. Further, he maintained that the data generated were intended to inform policy decisions, not to determine them. Although his focus was on the U.S. National Assessment of Educational Progress (NAEP), the issues are germane to international large-scale assessments (ILSAs). Over the last 20 years, data generated by ILSAs (e.g., TIMSS, PIRLS, PISA) have become more salient in education policy discussions around the globe (Braun & Kirsch, in press; Kirsch & Braun, 2020). 19 In fact, participation in ILSAs can bring many benefits. Countries can (1) compare levels of achievement overall and by subgroup, (2) compare patterns of relationships between cognitive scores and a range of demographic/contextual factors, and (3) benefit from the many secondary analyses that are conducted in the years following release of the data.
However, when ILSA country-level results are first made public, most prominent is a table that lists the countries in rank order of their average scores. Over time, these so-called league tables have assumed outsized importance in the minds of the public and policymakers both. Much of the credit for the importance attached to these league tables is due to Andreas Schleicher, who heads the Education Directorate within the Organization for Economic Cooperation and Development (OECD). 20
Schleicher, who founded PISA in 2001, has astutely leveraged the strengths of the OECD to promote PISA. An indefatigable global traveler spreading the PISA gospel, Schleicher argues that making the effort to understand and then emulate the policies of countries at the top of the rankings, or those whose ranks have significantly increased from earlier cycles, will be well rewarded. To this end, he has crafted a variety of compelling narratives that are effectively delivered in person or through other channels. He also has enlisted strategic partners. In the United States, for example, he has teamed with the National Center on Education and the Economy.
The counternarrative has two main themes. The first focuses on the warrants for making causal connections between ILSA rankings and education system efficacy. The key assumption here is that differences in ranks are directly related to differences in the efficacies of the respective education systems. In practice, however, even moderate differences in ranks may correspond to rather small differences in the corresponding score distributions. Further, justifying the assumption requires discounting the many other factors that can influence a country’s performance on the ILSA, even though many of those factors have been well established in the literature. 21 See also Feuer (2012).
Finally, there are well-known limitations on making credible causal claims on the basis of cross-sectional studies. This point is illustrated by Table 1 (Singer & Braun, n.d.). It displays six possible uses of ILSA results and, for each, the appropriateness of that use given the strengths and limitations of such data. 22 Note the sixth use at the bottom: To explore causal relationships between contextual factors (demographic, social, economic, and educational variables) and student achievement. Of course, it is this use that is at the heart of the direct policy prescriptions touted by Schleicher and others. It is labeled dangerously difficult.
Table 1. Purposes of School-Based International Assessments
The second theme concerns the notion that the policies of “high flyers” offer a generalizable prescription for success. In point of fact, emulation is never simple, as Schleicher himself has acknowledged (Schleicher, 2018). Earlier, Wiseman (2013) argued that, to the extent that ILSAs exert influence on educational policies, the outcome is not necessarily some form of strict policy convergence. Instead, the result may be an “isomorph” of the “model” system, whereby countries extract some key features of the model (e.g., curriculum) but then modify and/or adapt them to better fit the country’s traditions and culture. Thus, despite exhortations in support of emulation, the path from ILSA results to policy responses is neither straightforward nor predictable (Pons, 2017).
A report of two workshops conducted under the aegis of the National Academy of Education offers a relatively brief but comprehensive review of these and other issues, for the most part avoiding technical analyses (Singer et al., 2018). Subsequent publications presented a synopsis of the issues to general academic audiences (Braun & Singer, 2019; Singer & Braun, 2018). The counternarrative proved more challenging to craft and to communicate because it unavoidably incorporated additional complexity. The challenge here, as in other contexts, is to find settings where these concerns can be brought to light and to develop effective counternarratives appropriate to the audiences. With regard to the latter, specific examples of how misuse of rankings can lead to misguided policies that harm specific student groups or certain regions might prove useful.
6. Headwinds
One can argue that the impact policy research has had on education policy is not commensurate with the wealth of information and insights that have been generated over the years. Thus, in principle, there should be multiple opportunities to expand both influence and impact. As noted earlier, however, researchers face significant headwinds in attempts to influence policy. In this section, I first present two examples where experts appear to have failed to appreciate the broader context in which policy decisions were made and, as a result, met with determined opposition. Then follows an elucidation of some general concerns regarding the role of experts and expertise in public affairs.
The two examples illustrate the worry that, over time, experts form an elite that is increasingly isolated from the rest of society. Almost 100 years ago, John Dewey articulated this concern: “Experts are inevitably so far removed from common interests as to become a class with private interests and private knowledge” (quoted in Cassam, 2021). This isolation can result in ignoring the legitimate interests and values of the different stakeholders who are (or will be) affected by a particular policy. 23 Without a concerted effort to consult a wide range of stakeholders with respect to both material interests and values, as well as taking account of the politics that may be at play, policy prescriptions can have significant, unintended, negative consequences. 24 This is the basis of many criticisms of policies enacted by national governments, as well as those by supra-national organizations, such as the World Bank, the International Monetary Fund, and various agencies of the European Union. 25,26
An example from England makes the point. In 2020, the Office of Qualifications and Examinations Regulation (Ofqual) attempted to compensate for the lack of end-of-year standardized assessments by using statistical models (employing prior years’ data) to moderate teacher-assigned grades in order to achieve some rough comparability across schools. 27 By contrast, in the absence of the usual centralized examinations, parents regarded teacher-assigned grades as a fair measure of their children’s academic accomplishments. They were not amused by the impact of mysterious (to them) statistical adjustments that seemed not to have anything to do with their children’s class work, while affecting their future schooling opportunities, such as admission to more selective schools. The resulting parental outcry forced the government to back down, with a black eye for the experts!
What had transpired was a conflict between different conceptions of fairness (i.e., broad relational fairness vs. intuitive notions of local fairness). It is quite likely that Ofqual staff were not in the habit of consulting with parents or the public-at-large regarding proposed analyses or score adjustments. Nonetheless, given the scale of the challenge posed by a lack of end-of-year data, conducting a few focus groups around England might have provided fair warning of the likely parental response. In view of its actions during the course of the incident, it is unlikely, even with such evidence in hand, that Ofqual would have decided to change its plans! This incident underscores the point that although policymakers should certainly consider technically preferred solutions, they are often better placed than experts to take into account other factors (e.g., complexity and likely public acceptance) when coming to a policy decision.
An earlier example, on our own shores, reinforces the point. In this case, the issue concerned the “flagging” of test scores obtained on postsecondary admissions tests (e.g., ACT, SAT, GMAT). The term refers to the practice of adding a notation to test score reports that the scores were obtained under nonstandard (accommodated) conditions; that is, there was an intentional breach of the typical standard conditions of administration. 28 Proponents of flagging maintained that this practice was necessary for fairness, in that score users were made aware that the scores had been obtained under nonstandard conditions. Most experts at the testing agencies predicted dire consequences if flagging were eliminated. For example, it might encourage gaming of the system, whereby many students would seek unneeded accommodations in order to gain some advantage. Opponents noted that (1) a student needed extensive documentation to obtain an accommodation, (2) the exact nature of the accommodation was determined by the testing agency and conformed to specific guidelines, and (3) the flag could serve as a rationale for discriminating against these students in the admissions process. This last point was central to their argument. So again, there were contrasting views of fairness, with each side having a reasonable claim to the validity of test score interpretation and use. This clash culminated with the advocates for students with disabilities succeeding in eliminating flagging. Although research evidence certainly played an important role in the debate, ultimately business and political considerations tipped the balance. 29 In the end, the elimination of flagging did not result in a crisis of test score validity.
These examples illustrate that educational measurement is not just about the science and technology of testing. Indeed, as Messick (1994) asserted: “[B]asic assessment issues as validity, reliability, comparability, and fairness need to be uniformly addressed for all assessments because they are not just measurement principles, they are social values that have meaning and force outside of measurement wherever evaluative judgments and decisions are made” (p. 13).
More recently, in his 2020 NCME Presidential Address, Sireci (2021) called attention to the public’s loss of faith in educational testing. He argued that this is understandable in light of the deleterious impact testing has had on marginalized communities. He referred to the historic association of tests with the eugenics movement, as well as the use of tests to restrict opportunity both in the past and in the present. 30 Further, he castigated the measurement profession for not adhering to its own standards of practice, especially with regard to amassing appropriate validity evidence for intended test uses, as well as its resistance to change by clinging to outmoded conventions of standardization and comparability. Not surprisingly, as he notes, there are many who view tests and testing experts with some suspicion.
Moreover, methodologists of all stripes have been called to account by the “Quantcrit” movement, which applies the lens of critical race theory (CRT) to data analysis and interpretation, as well as to test construction and test interpretation (Randall, 2021). Interestingly, nearly 60 years ago, Kaplan (1964) argued that data were as much a product of an interpretive process as interpretation was the product of data. In his advice to methodologists, he anticipated many of the recommendations advanced by advocates of Quantcrit. The difference is that the latter highlight the need to reexamine long-held assumptions, conventional patterns of thought, and disciplinary traditions in light of such essential values as equity and social justice, with due regard to the pernicious effects of hundreds of years of systemic racism. 31
Somewhat different concerns surface when considering the political implications resulting from governments allocating substantial authority to experts, especially when there is minimal oversight and direct accountability (Bertsou & Caramani, 2020). Building indirectly on Dewey’s warning (quoted above), contributors ask the following question: Should technical elites operating within agencies insulated from short-term political considerations be seen as a useful corrective to dysfunctional or hyper-partisan politics, or as much a threat to democratic governance as unbridled populism (Caramani, 2020)? 32 If both views contain a modicum of truth, then the challenge experts face is to forge a path leading to a balance sheet favoring benefits over costs, recognizing, of course, that stakeholders will differ in the values they assign to various benefits and costs!
7. Looking Forward
This article argues for the importance of policy researchers learning how to craft narratives that their target audiences find interesting and relevant, if not compelling. As the Berliner quote indicated, this will often require weaving into the narrative stories to which readers can easily relate. To accomplish this requires specialized skills that take time to develop. In his well-known treatment of the education of researchers, Levine makes no mention of skills in communication (Levine, 2007). Unfortunately, as far as I know, the current situation has hardly improved since then, and surely merits sustained attention. As practitioners, teachers, and mentors, we have an obligation not only to cultivate our own skills in this regard but also to convey to our students the importance to their careers of developing communication skills for both in-reach and outreach. Of course, with respect to outreach, there is an opportunity cost to devoting time and effort to such activities in place of more traditional journal publications and seeking grant support. More painful, perhaps, is the reality that crafting compelling narratives almost always involves a loss of nuance that can invite misreading and misinterpretation.
In the current climate, however, the quest for influence/impact requires more than learning to write and speak effectively. Viewing our work through a CRT-type lens brings the realization that identity helps to shape expertise (i.e., expertise is never neutral). We should, therefore, reexamine long-held assumptions, conventional patterns of thought, and disciplinary traditions in light of such essential values as equity and social justice. We can begin with serious introspection, asking how our backgrounds, education, and values affect how we carry out our work, including such activities as:
• constructing or adapting a theoretical framework,
• designing and building assessments,
• posing research questions,
• interpreting and communicating empirical results,
• conducting and reporting policy evaluations,
• arriving at policy descriptions and limitations, and
• investigating and presenting policy implications.
As policy researchers, we have to think carefully about how to craft narratives that fairly present research evidence and its implications for decisions, while taking due account of audience diversity, acknowledging the limitations of the different, relevant research studies (perhaps even conflicting results), and, finally, avoiding the conflation of evidence interpretation with (implicit) value judgments (Hanushek, 2015).
There are also strategic aspects of the quest for influence/impact. First, policy researchers must appreciate the sociopolitical context in which policy decisions are being made and, accordingly, the different ways in which research can contribute to the discussions (Dunlop & Radaelli, 2020). Second, they should understand the social ecology of research use by policymakers; that is, the different and interlocking networks they rely on for information and advice. Finally, they must plan on making a substantial investment in time and effort to build trust through long-term relationships, both directly with policymakers and with the “knowledge brokers” who often play an important intermediary role in bringing research to the table. 33
When contemplating a policy research action agenda, researchers must be prepared to answer questions such as (1) What are the policy target(s)? (2) Are the goals to critique, to enlighten, or to advocate? (3) Who are the decision makers? (4) Who are the stakeholders? (5) Who are potential partners? (6) What is the time horizon? and (7) What are the channels of communication?
With respect to this last point, the infrastructure of communication, especially that of social media, is constantly changing (metastasizing?), with enhancements to existing platforms and new platforms appearing almost monthly! Although it is likely that there will continue to be heavy use of current channels, storytelling will evolve to take advantage of the affordances that become available and, thereby, develop and refine more effective ways to accomplish outreach. With their expertise on offer, education policy researchers will have to adapt to this new environment if they are to enhance their influence/impact. Whether they can do so while respecting changing norms of professional practice remains an open and critical question, perhaps deserving research scrutiny of its own.
Acknowledgments
I benefited from advice and comments from Michael Feuer, Maya Komakhidze, Larry Ludlow, Bob Mislevy, and Rich Shavelson. The suggestions of the referee were incorporated in the final version of the article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
