Abstract
Experimental and quasi-experimental designs are used in educational research to establish causality and develop effective practices. These research designs rely on a counterfactual model that, in simple form, calls for a comparison between a treatment group and a control group. Developers of educational practices often assume that the population from which control groups are drawn is unchanging in its behavior or performance. This is not always the case. Populations and study samples can change over time—sometimes dramatically so. We illustrate this important point by presenting data from 5 randomized control trials of the efficacy of Kindergarten Peer-Assisted Learning Strategies, a supplemental, peer-mediated reading program. The studies were conducted across 9 years and involved 2,591 students. Findings demonstrate a dramatic increase in the performance of control students over time, and suggest the need for a more nuanced understanding of the counterfactual model and its role in establishing evidence-based practices.
He that will not apply new remedies, must expect new evils; for time is the greatest innovator.
Medical guidelines adopted in the early 2000s recommended that doctors prescribe marine-derived omega-3 polyunsaturated fatty acids (i.e., fish oil supplements) for the prevention of heart attack, stroke, and cardiac death (Kris-Etherton, Harris, & Lawrence, 2002; Van der Werf et al., 2008). In part, the guidelines were based on findings from three randomized control trials (RCTs) in which researchers demonstrated decreased cardiac events and mortality for patients who received the omega-3 supplement (Burr, Sweetnam, & Fehily, 1997; GISSI-Prevenzione Investigators, 1999; Singh et al., 1997). However, on September 11, 2012, more than a decade later, people across the world learned that the authors of a just-released meta-analysis concluded that “[our] findings do not justify the use of omega-3 as a structured intervention in everyday clinical practice” (Rizos, Ntzani, Bika, Kostapanos, & Elisaf, 2012, p. 1032). An important and obvious implication was that the very large sums of money spent on fish oil, nearly $1 billion annually in the United States alone (Hawthorne, 2012), were apparently being wasted.
This scientific about-face left many wondering what had happened. A closer examination of results from the meta-analysis revealed an unmistakable pattern: Studies conducted between 1999 and 2006 demonstrated that omega-3 reduced the relative risk for all-cause mortality (Figure 1); studies conducted after 2006 did not. Some suggested this pattern was due to an increase in sample size and associated improvements in the reliability of treatment effects in the later studies (Humphrey, 2013). Others speculated whether the post-2006 study participants were more likely to be taking statin drugs or eating more fish (see Boyles, 2012; Gann, 2012). These last two explanations reflect recognition that populations may collectively shift their behavior over time, which, in this instance, may have increased the health of many participants, thereby decreasing the relative benefit of omega-3. In short, results from the omega-3 meta-analytic study and ensuing commentary promoted the notion that time can exert its own effect on phenomena of interest to scientists.

Figure 1. Cumulative meta-analysis of omega-3 supplements for all-cause mortality
Evidence-Based Practices
The field of medicine has a long history of basing clinical practice on direct observation and experimentation (Kennedy, 2004). However, it was only two decades ago that a formalized model of evidence-based medicine was first disseminated (Evidence-Based Medicine Working Group, 1992). Evidence-based medicine, “the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients” (Sackett, Rosenberg, Muir Gray, Haynes, & Richardson, 1996, p. 71), represented a transformation in health care. Its implicit and pivotal assertion was that data are superior to authority and tradition (Patterson, 2002). Notwithstanding related concerns about a devaluation of doctors’ clinical experience and judgment (see Feinstein & Horwitz, 1997; Sackett et al., 1996), the practice of medicine was changed fundamentally. Doctors were now expected to base their treatment decisions on scientific evidence.
Education has lagged behind medicine in the use of “rigorous research designs … to generate sound evidence” (Boruch & Rui, 2008, p. 41). However, an important change occurred with passage of the No Child Left Behind Act (NCLB, 2001). Policymakers intent on improving academic outcomes for U.S. students mandated that practitioners base their instructional methods on scientific research. The Education Sciences Reform Act (ESRA, 2002) furthered the evidence-based-practice movement by creating the Institute of Education Sciences (IES), which was tasked with conducting and otherwise supporting scientifically valid research activities. The reauthorization of the Individuals With Disabilities Education Act (IDEA, 2004) extended the reach of the NCLB mandate by requiring that services for children and youth with disabilities also be “based on peer-reviewed research to the extent practicable” (Sec. 614(d)(1)(A)(i)(IV)). Together, the three legislative acts (NCLB, ESRA, and IDEA) have begun to transform education as a field. According to Russell Whitehurst (2003), the first director of IES, education researchers would now be responsible for conducting rigorous research to determine “what works, for whom, and under what circumstances” (p. 6); teachers and policymakers would be expected to “want to know what the research says before making an important [practice-related] decision” (p. 12).
The Counterfactual
When Boruch and Rui (2008), Whitehurst (2003), and others speak about how “rigorous research” will provide a solid footing for educational practices, they are referring to experimental or quasi-experimental designs that are the primary means of establishing causality in medicine and the social sciences. They allow causality to be inferred, in part, by controlling for well-known threats to internal validity (e.g., history, maturation; Shadish, Cook, & Campbell, 2002). The designs are based on a counterfactual model described by 18th-century philosopher David Hume (Lewis, 1973). The counterfactual is an unobservable event that is estimated to evaluate the effect of an experimental intervention. “We observe what did happen when people received a treatment … [and use a control group to estimate] what would have happened to those same people if they simultaneously had not received treatment” (Shadish et al., 2002, p. 5).
There are at least two important points here. First, in a counterfactual framework, treatment effects are understood in relative, not absolute, terms. If the treatment group is superior to the control group, it is not typically because more of its members have achieved a functionally important performance criterion. Rather, it is because its average score is reliably different from that of controls. Second, many supporters of this framework assume that the counterfactual represents an unchanging benchmark: a level of performance on some valued outcome that remains constant both during and beyond the study. We have reason to question this assumption.
Our skepticism is based on results from five related RCTs. Each was conducted in a different year in a 9-year span in one school district to evaluate the effects of Kindergarten Peer-Assisted Learning Strategies (K-PALS; e.g., Fuchs, Fuchs, Thompson, Al Otaiba, Yen, Yang, et al., 2001), a supplemental reading program. Retrospective analyses of the time-series data from these RCTs suggested that a changed (i.e., strengthened) counterfactual decreased the relative value of the K-PALS program. Some might point to this as evidence of a general wrongheadedness of the counterfactual model and traditional scientific inquiry. For our part, the data highlight the importance of considering time and place and the possibility of a changing counterfactual when interpreting experimental and quasi-experimental research, thereby leading to more nuanced understandings of education science.
In the following, we briefly describe methodological features of and findings from the five RCTs. We then explain in more detail our retrospective analyses of these data, and we discuss how specific changes in context may have affected our findings. Finally, we offer recommendations for conducting education research and identifying evidence-based practices in a changing world.
Five K-PALS Studies Across Nine Years
As indicated, K-PALS is a supplemental, peer-mediated reading program (see Fuchs & Fuchs, 2005; McMaster, Fuchs, & Fuchs, 2007). Based on an earlier peer-tutoring program, Classwide Peer Tutoring (Delquadri, Greenwood, Whorton, Carta, & Hall, 1986), K-PALS was designed to intensify students’ practice of important beginning reading skills (e.g., phonological awareness, letter-sound recognition, and decoding) and to facilitate instructional differentiation in a whole-class setting. See Fuchs, Fuchs, Thompson, Al Otaiba, Yen, Yang, et al. (2001) and Fuchs, Fuchs, Thompson, Al Otaiba, Yen, McMaster, et al. (2001) for more information.
Our five RCTs were conducted in Nashville, Tennessee. In each study, classroom teachers were assigned randomly to treatment or control conditions. Project staff worked with treatment teachers in prestudy workshops to prepare them to implement K-PALS with all their students. Staff also coached the teachers during study implementation to improve their fidelity of implementation. Fidelity data were collected at multiple points on teachers and students; students were assessed on appropriate reading-related outcomes; and analyses were conducted to evaluate the relative benefit of the K-PALS program. The RCTs in the 1990s (1997,¹ 1998, 1999) explored the efficacy of K-PALS (e.g., Fuchs, Fuchs, Thompson, Al Otaiba, Yen, Yang, et al., 2001; Fuchs et al., 2002). The RCTs in the 2000s (2004, 2005) were conducted as part of an effectiveness, or scaling-up, evaluation (Fuchs et al., 2010; McMaster et al., 2010). Although there were subtle differences in the research questions guiding the efficacy and effectiveness studies,² the basic nature of the independent variable (i.e., K-PALS) and its implementation did not change across the five investigations.
The RCTs in the 1990s demonstrated that, in contrast to comparable controls, low- and average-achieving students in the K-PALS program achieved statistically significant and educationally important improvements across a variety of early reading measures (Fuchs, Fuchs, Thompson, Svenson, et al., 2001). Based in part on these findings, Best Evidence Encyclopedia considers K-PALS to have “strong evidence of effectiveness” and What Works Clearinghouse deems its effect on alphabetics as “potentially positive.” The K-PALS research team anticipated similar outcomes from the RCTs conducted in the 2000s. However, this was not to be. In 2005, data indicated that K-PALS students outperformed controls on only one outcome measure.
Reanalysis of the K-PALS Studies
The research team was puzzled and disappointed (more accurately, shocked and depressed) by these results. How, we asked, could this have happened? Had our evidence-based practice flopped? We revisited the data from the five RCTs to better understand our findings, initially without a clear plan for how to proceed. From our original spreadsheets, we created multiple ways of depicting the data to look critically at the reading performance of each of the study groups in every year on all the reading measures we had administered. It eventually appeared to us that, whereas K-PALS students had maintained or improved their absolute level of reading performance over time, control students seemed to have strengthened their reading skills more dramatically, thereby decreasing the relative benefit of our K-PALS intervention.
We formally tested these impressions in two ways. First, we examined each year’s data independently (i.e., analyzing data separately for each of the five RCTs). Our aim was to ensure consistency of analyses to facilitate year-by-year comparisons. Second, we conducted an across-years analysis (i.e., analyzing a combined data set from all 5 years) so we could more directly explore the possibility of an interaction between time and intervention effects. Before discussing these analyses, we briefly describe the common measures we used in the RCTs.
Measures
There were five early literacy measures. For Rapid Letter Sounds, students had 1 minute to say the correct sound for as many randomly ordered letters as possible. Scores ranged from 0 to 111. Segmenting asked students to say the sounds in words they heard. For example, the tester said “dog” and the student segmented the word into three sounds: /d/ /o/ /g/. Students were given 1 minute and they earned 1 point for each correctly segmented sound. Scores ranged from 0 to 56. Word Identification (Woodcock, 1998) required students to read single words. This measure consisted of a list of 100 words ordered by difficulty. It was discontinued after six consecutive errors, and students received 1 point for each correctly pronounced word. Scores ranged from 0 to 73. Word Attack (Woodcock, 1998) evaluated students’ ability to decode 45 nonsense words (e.g., “dee”) ordered from easiest to most difficult. The test was discontinued after six consecutive errors, and students received 1 point for each correctly pronounced nonsense word. Scores ranged from 0 to 41. For Reading Fluency, students read aloud two end-of-kindergarten–level reading passages. The average number of words read correctly in 1 minute was the score. Scores ranged from 0 to 147.
During each of the five RCTs, students in the treatment and control conditions were tested on the first four measures just mentioned immediately before treatment implementation and 16 weeks later, right after treatment completion. Reading fluency was measured at posttreatment in 1999, 2004, and 2005 only. For more on these measures, see Fuchs, Fuchs, Thompson, Al Otaiba, Yen, Yang, et al. (2001) and Fuchs et al. (2002).
Year-by-year analysis
We examined findings from each RCT using a common set of data analytic methods. Table 1 displays demographic information on the children participating in the RCTs. Chi-square tests indicated that samples differed on Title I status, race, and English language learner (ELL) status across the years. A greater proportion of students in 2004 and 2005 were non-White, received Title I support, and were ELLs. Thus, although the independent variable (K-PALS) did not change across the RCTs, the population became more racially, economically, and culturally diverse. In subsequent analyses, we controlled for these demographic differences.
Table 1. Demographic Data by Year and Study Group
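The mechanics of the chi-square comparisons described above can be sketched in a few lines of Python. This is an illustration only: the function name and the counts are hypothetical, not the study's data or the authors' code, and the sketch stops at the test statistic and its degrees of freedom (a p value would still have to be read from a chi-square distribution).

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts, e.g. one row per study year
    and one column per category (such as ELL / not ELL).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if year and category were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts, NOT the study's data: rows are two study years,
# columns are (ELL, not ELL).
hypothetical_counts = [
    [10, 20],
    [20, 10],
]
stat, df = chi_square_statistic(hypothetical_counts)
print(f"chi-square = {stat:.2f}, df = {df}")
```

A larger statistic relative to its degrees of freedom indicates that the category proportions differ more across years than independence would predict.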
Next, we conducted a series of multilevel mixed effects linear regressions to evaluate differences between K-PALS and control students on each measure within each year. In these models, the posttreatment score was the dependent variable. The pretreatment score as well as Title I, race, and ELL status were covariates. K-PALS status was included to evaluate the efficacy of the intervention. Teacher and school effects were allowed to vary at random to account for the clustering of students within classrooms and schools. No teacher or school-level predictors were included in the analyses. Models were fit by maximum likelihood, using the “xtmixed” command in Stata/IC 10.1 for Macintosh (Stata Corporation, 2009).
Table 2 presents means and standard deviations of growth (posttreatment minus pretreatment) for K-PALS and control students on each reading measure for each year (1997, 1998, 1999, 2004, 2005). Figures 2–6 show K-PALS and control students’ average pre- and posttreatment raw score performances for each of the 5 years in which we conducted the RCTs. Each figure represents performance on one of our five reading measures (i.e., Figure 2 displays data on Rapid Letter Sounds, Figure 3 on Segmenting, and so forth). The figures also indicate whether the relative between-group contrast in a given year was statistically significant (based on the multilevel, mixed effects, linear regressions) and they show the estimated effect size in parentheses (Hedges’ g; IES, 2011). Positive effect sizes favor the K-PALS group.
Table 2. Growth (Posttreatment – Pretreatment) by Year and Condition
Note. Reading Fluency was measured posttreatment only in 1999, 2004, and 2005.
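For readers unfamiliar with Hedges' g, the sketch below shows the standard textbook form of the statistic: the difference in group means divided by the pooled standard deviation, multiplied by a small-sample correction. It is an illustration under stated assumptions, not the authors' computation; the function name and input values are hypothetical, and the procedure cited in the text (IES, 2011) may differ in detail, for example by using covariate-adjusted means.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with the small-sample correction.

    Positive values favor the treatment group.
    """
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Small-sample correction factor; approaches 1 as the samples grow.
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return correction * (mean_t - mean_c) / pooled_sd


# Hypothetical values, NOT taken from Table 2.
g = hedges_g(mean_t=10.0, mean_c=8.0, sd_t=4.0, sd_c=4.0, n_t=50, n_c=50)
print(round(g, 3))
```

Because the statistic is a ratio of a between-group difference to within-group spread, it can shrink when the control mean rises even if the treatment mean also rises.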

Figure 2. Rapid Letter Sounds
Figure 3. Segmenting
Figure 4. Word Identification
Figure 5. Word Attack
Figure 6. Reading Fluency
Results from 1997 and 1998 illustrate reliably greater pre-to-posttreatment change for K-PALS students on three and two (of four) measures, respectively (see Figures 2–6). In 1999 and 2004, K-PALS students performed more strongly on all five reading-related measures. Results for 2005, however, indicated that K-PALS students outperformed controls on only Rapid Letter Sounds. Data in Figures 2–6 also show that K-PALS students in 2004 and 2005 made substantially greater gains on most reading measures compared to K-PALS students in 1997 and 1998. However, the relative value of K-PALS lessened over time because the performance of control students increased markedly.
Across-years analysis
To more directly explore the effects of time, we ran a series of multilevel regression models in which data from the five RCTs were combined into one data set. These models accounted for students nested within teachers and teachers nested within schools. Separate models were fit for Rapid Letter Sounds, Segmenting, Word Identification, and Word Attack. (Reading Fluency was not modeled because the corresponding measure was not administered in all years.) The models included covariates (i.e., pretreatment, race, Title I, and ELL status) and main effects of time and treatment. Time was entered as a student-level covariate due to the cross-sectional nature of the data collection, and it was centered at 1997, our first year of data collection, so that the treatment coefficient represents the treatment effect in that year. The interaction of time and treatment was also entered in the model to explore whether the treatment effect became larger or smaller over time. Equation 1, included in Appendix A, depicts the general form of the multilevel model. Three categories of race (Black, White, and Hispanic/Other) were represented with dummy variables (Black serving as the comparison group), and the pretreatment variable was centered. In each model, a quadratic term for time was tested to account for deceleration effects. Because the data were skewed and zero-heavy, we estimated effects using generalized linear mixed models with a zero-inflated negative binomial distribution (Maghimbeigi, Eshraghian, Mohammad, & McArdle, 2008; see data analysis note included in Appendix A for rationale).
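As a reading aid, the predictors just described can be arranged into one plausible form for the count portion of such a model. This is a sketch only and is not Equation 1 from Appendix A, which remains the authoritative specification. The log link, the subscripts (student i, teacher j, school k), the variable labels, and the omission of the zero-inflation component are all assumptions made for illustration.

```latex
\begin{aligned}
\log \mu_{ijk} ={}& \beta_0
  + \beta_1\,\mathrm{Pre}_{ijk}
  + \beta_2\,\mathrm{White}_{ijk}
  + \beta_3\,\mathrm{HispOther}_{ijk}
  + \beta_4\,\mathrm{TitleI}_{ijk}
  + \beta_5\,\mathrm{ELL}_{ijk} \\
 &+ \beta_6\,\mathrm{KPALS}_{jk}
  + \beta_7\,\mathrm{Time}_{ijk}
  + \beta_8\,\mathrm{Time}_{ijk}^{2}
  + \beta_9\,(\mathrm{KPALS}_{jk} \times \mathrm{Time}_{ijk})
  + u_{jk} + v_{k}, \\
\mathrm{Time}_{ijk} ={}& \mathrm{Year}_{ijk} - 1997, \qquad
u_{jk} \sim N(0, \sigma_u^2), \qquad
v_{k} \sim N(0, \sigma_v^2).
\end{aligned}
```

In this notation, \beta_6 is the treatment contrast in 1997 (where Time = 0), and \beta_9 captures how that contrast changes with each additional year.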
Results from the four models (one for each reading measure administered pre- and posttreatment) are presented in Table 3. The coefficient for the K-PALS variable was positive and statistically significant (p < .001) for all outcomes. This indicates that K-PALS students in 1997 earned higher posttreatment scores than controls, accounting for student demographic characteristics and pretreatment scores. The time variable (Time) was also positive and statistically significant for all outcomes, suggesting that, overall, posttreatment reading scores were increasing as a function of time. We obtained a significant quadratic term (Time²) for only one measure: Segmenting. The positive slope and negative quadratic associated with this measure suggest that performance was generally increasing over time, albeit at a diminishing rate. Controlling for the other effects in the models, there was a significant negative interaction between K-PALS and Time for two measures, Word Identification and Word Attack. This suggests that the positive treatment effect of K-PALS relative to control was reliably decreasing across time.
Table 3. Results From Zero-Inflated Negative Binomial Multilevel Model (N = 2,587)
Note. RLS = Rapid Letter Sounds; SEG = Segmenting; WID = Word Identification; WAT = Word Attack.
*p < .05. **p < .01. ***p < .001.
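To make the interpretation of a negative treatment-by-time interaction concrete, the following Python sketch shows how the treatment advantage would shrink across the study years under a log-link count model. The coefficients are hypothetical values chosen for illustration, not the estimates in Table 3, and the log link and function name are assumptions. Under those assumptions, the ratio of expected treatment to control scores in a given year is the exponential of the treatment coefficient plus the interaction coefficient times years since 1997.

```python
import math


def treatment_rate_ratio(b_treatment, b_interaction, years_since_1997):
    """Treatment-to-control ratio of expected scores under a log-link model.

    With time centered at 1997, the treatment contrast on the log scale in a
    given year is b_treatment + b_interaction * years_since_1997.
    """
    return math.exp(b_treatment + b_interaction * years_since_1997)


# Hypothetical coefficients, NOT the estimates reported in Table 3.
B_TREATMENT = 0.40     # positive main effect: treatment group ahead in 1997
B_INTERACTION = -0.04  # negative interaction: the advantage shrinks each year

for year in (1997, 1998, 1999, 2004, 2005):
    ratio = treatment_rate_ratio(B_TREATMENT, B_INTERACTION, year - 1997)
    print(year, round(ratio, 2))
```

A ratio above 1 favors the treatment group; with these illustrative values the ratio declines steadily toward 1 as the years pass, which is the pattern a significant negative interaction describes.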
Contexts Change
National policy and federal legislation
Findings from our retrospective analyses indicated that K-PALS and control students became increasingly similar on the reading measures. Moreover, the increasing similarity (or the shrinking difference) between the two groups was not because K-PALS students achieved less than they had in prior years. In fact, average growth for the sample was greater on all four measures for K-PALS students in 2005 compared to those in 1997 and 1998 (see Table 2). Rather, the disappearing difference between treatment and control groups was because controls improved their reading skills much more than they had in previous years. This prompts the question, “How come?” Remember that the first three RCTs were conducted in the 1990s and the last two in the 2000s. What, we thought, might have occurred between 1999 and 2004 to affect the controls so dramatically? The obvious answer, it seemed to us, was the changed landscape of early grade reading instruction that was heavily influenced by the National Reading Panel (2000) and resultant legislation of Reading First, the cornerstone of NCLB (2001). The National Reading Panel identified components of instruction validated by scientific evidence, and the Reading First legislation mandated the provision of this instruction to Kindergarten through third grade students in Reading First schools. The legislation’s aim was to ensure that all children would be proficient readers by the end of third grade. Between 2003 and 2008, Reading First provided $6 billion to approximately 6,000 schools across 54 states and territories (Gamse et al., 2011).
District-level reforms
Informal discussions between project staff and teachers participating in the 2005 study indicated that the purpose and nature of Nashville’s Kindergartens changed dramatically between 2001 and 2005. Everyone with whom we spoke identified the change agent as a no-nonsense Chief Instructional Officer, recently hired by the district. In a 2-hour interview in February 2007, this administrator made clear that her priority was to fundamentally reform Kindergarten classrooms based on the National Reading Panel findings. She explained that prior to 2001, when she was hired, there was no reading curriculum or formal reading instruction in Kindergarten classrooms (an assertion confirmed by us in interviews with Kindergarten teachers).
Shortly after assuming her position, the Chief Instructional Officer introduced a Kindergarten reading initiative that required teachers to use systematic, explicit reading instruction. The teachers were trained to implement instructional components recommended by the National Reading Panel: concepts of print, alphabetic knowledge, letter names and sounds, sight words, phonics, and phonemic awareness. In 2001, 15 elementary schools participated in the district’s Kindergarten reading initiative; by 2002, 40 elementary schools were involved. The following year, all elementary schools, including seven that received Reading First funds, were part of the effort. During this time, the Chief Instructional Officer made certain that building principals were also knowledgeable about early reading; that the district’s reading standards were revised to specify skills Kindergarteners should master in their first year of school; and that data were collected in Kindergarten classes and used to guide instruction and to hold teachers accountable. As she said, “[although] weighing the pig doesn’t make it grow, what gets measured gets done.” By all accounts, the Chief Instructional Officer was a force to be reckoned with.
Of course, we do not really know that the Chief Instructional Officer, Reading First, or other district initiatives were responsible for the changed content of Kindergarten reading instruction because our direct knowledge of exactly what happened across the large school district is limited. That said, we believe it likely that the Chief Instructional Officer’s reading initiative played an important role in changing not only the nature of kindergarten reading instruction but expectations for reading proficiency. Regardless of whether we can convincingly specify the causes of this change, we are confident that the K-PALS implementations in 2004 and 2005 were occurring in a very different context than its implementations in 1997 through 1999. The changed context—that is, the introduction of formal reading instruction in all Kindergarten classes—raised the bar in terms of what it would take to get statistically significant findings favoring K-PALS in an experimental evaluation.
Implications for Education Researchers
Rethinking the Counterfactual Model
In education research, intervention effects are typically explored by comparing the performance of participants in the intervention to an estimate of what their performance would be absent the intervention. In the parlance of many researchers, this estimate is the “counterfactual.” Its operationalization is the control or comparison group. In principle, the counterfactual is pivotally important to research and practice because it is an index of where we are, a departure point for where we want to be. It is necessary for determining whether a new curriculum or instructional program is an improvement over current practice and whether it should be regarded as a “best” (or maybe, “better”) practice. Implicit is the belief that the counterfactual shares features of all valid and useful benchmarks: It signifies something valued; it has been calibrated accurately; and if not permanent, it is at least stable.
In reality, counterfactuals change—sometimes dramatically so. Confused and disappointed by findings that our K-PALS program, implemented in 2005 in Nashville, produced no greater reading improvement among participants than those achieved by controls, we revisited data nearly a decade old—specifically, three RCTs conducted from 1997 to 1999 on the efficacy of the same program in the same district. We found that (a) K-PALS students and controls became increasingly similar over the years on reading outcomes and (b) this increasing similarity between the groups was not because K-PALS students achieved less than they had in prior years. Indeed, average growth on all reading measures was greater for K-PALS students in 2005 than for those in the program in 1997 and 1998. Rather, the disappearing difference between treatment and control groups was likely because controls had improved their reading skills much more than they had in previous years. Interviews with district personnel, most notably the Chief Instructional Officer, suggested that widespread implementation of Reading First, beginning in the early 2000s, was responsible.
Although this story with its detailed documentation and analysis may be unique, its message is not. Many have made the point that contexts are mutable. Shadish et al. (2002) wrote that what we know is likely to change as each experiment is “conducted at a particular point in time that rapidly becomes history” (p. 19). This nuanced understanding of the counterfactual suggests that practices are not by nature simply evidence-based or not. Instead, application of the label “evidence-based” requires a subtle understanding of the underlying evidence and the relative nature of the counterfactual comparison. In a changing world, the counterfactual of the past may not accurately represent the counterfactual of the present or future. In fact, if we do our jobs well as education researchers, critical features of our efficacious interventions will likely be adopted and integrated into control settings, thereby changing them. This “interference” may decrease our ability to reproduce experimental findings by diminishing the difference between experimental and control conditions (Hernán & VanderWeele, 2011; Schwartz, Gatto, & Campbell, 2011).
And whereas we have described how the instructional context in the same school district can change across time, Coyne et al.’s (2013) varied replication of another supplemental early literacy program illustrated how much instructional contexts can vary from one place to another in the same time frame. For both Coyne et al. and us, differences in context seemed to influence school practice, student achievement, and estimates of the value of our respective instructional programs. As a reflection or representation of context, the counterfactual can exert powerful effects on research and development and on how we think about evidence-based practices. Yet much of the research community seems unimpressed by the possibility. Their apparent lack of interest is mirrored in the convention of referring to the counterfactual as “business as usual,” as if the phrase denotes something transcendent of space and time.
Reasons for Rejecting the Model
If the K-PALS data are taken at face value—if we agree that these data illustrate the changing nature of the counterfactual—then how should we think about “rigorous research” and “evidence-based practices,” concepts vigorously promoted in recent federal legislation and by government agencies? That is, what are the implications of an inconstant counterfactual for these mainstay notions?
Some readers, no doubt, will point to the introduction of formal reading instruction in Nashville’s kindergartens and claim that our knowledge of the particular was necessary to the storytelling. In other words, to make sense of the variation in program effects across time, we had to place or situate the K-PALS implementation in a larger and more complex context than “experimental versus control.” We had to understand the interplay between federal and district policies, how the Metro-Nashville Public Schools’ policy changed, and how it was communicated to and implemented by practitioners at the building level.
Elmore (1996), Darling-Hammond (1996), Richardson (1996), and others have expressed a similar view in arguing for situated rather than standardized school-based interventions. Standardized interventions, like K-PALS, we are told, are created for the “typical” teacher who is as illusory as the mythic student at the exact mean of a distribution. Critics of standardized interventions say that, contrary to conventional wisdom, instructional strategies and curricula must be designed and evaluated with an individual teacher in mind, not with “the teacher” as an abstraction or composite. When a new practice works for one teacher, it may not work for another teacher down the hall in the same school who might require a different practice to accomplish the same objective; what works in 2014 (as we write) may not work for the same teacher with a different class of students the following year. According to Elmore, Darling-Hammond, Richardson, and others, this is the best (if not the only) way to conduct research, disseminate knowledge, and improve practice: school by school, teacher by teacher. Thus, some are skeptical of standard (decontextualized) evidence-based practices as well as scaling up and state or district mandates because each purportedly violates the importance of the particular.
So one likely reaction to the K-PALS story is to say that it dramatizes why the counterfactual should be viewed as something evolving, devolving, always dynamically connected to the particulars of time and place. In short, for some, the K-PALS story is a cautionary tale, an object lesson about why one should view the counterfactual model with skepticism, if not reject it outright.
How to Strengthen the Model
As intervention researchers who have long embraced the counterfactual model and its epistemological underpinnings, we do not share this reaction. Yet we have learned from our K-PALS experience. We suggest to others who share our perspective on science that they become as knowledgeable about the counterfactual as about their own treatment group(s) and that they think about such knowledge in at least two ways. The first concerns the content of the treatment. If the treatment is a first-grade reading program, addressing phonological awareness, word recognition, decoding, and reading fluency in equal measure, what of the counterfactual? What content is taught during reading and language arts in control classrooms? A second kind of knowledge is about the instructional process: for example, its intensity (e.g., duration and frequency), its quality (e.g., its clarity of presentation, its pacing, and the enthusiasm with which it is delivered), and whether the children are engaged. Whereas the first of these considerations addresses the “what” of instruction, the second focuses on the “how.”
Researchers should also recognize that comparing the what and how of instruction across treatment and control conditions will likely differ along a continuum: complete overlap at one end, nonoverlap at the other. Many such comparisons, no doubt, will find a middle space but, in the late 1990s, K-PALS represented virtually the only reading instruction in Nashville’s kindergartens. As a result, our RCTs in those years could be fairly described as classic, or conventionally understood, treatment versus no-treatment control comparisons. There was little or no overlap between the two. By 2005, early reading instruction was occurring in both K-PALS and control classrooms. The counterfactual was no longer a no-treatment control but a comparison group. There was now at least modest overlap. Because control kindergartens offered reading instruction, an important question for us should have been “where precisely is the content overlap between the two groups?” And “what are the similarities and differences between them regarding how instruction is delivered?”
Cordray and colleagues (Cordray & Pion, 2006; Nelson, Cordray, Hulleman, & Sommer, 2012) have described a procedure for exploring such questions. Following a similar approach, Vaughn et al. (2013) identified instructional content and processes that distinguished treatment from comparison conditions. This kind of effort can lead to more fine-grain analyses of between-group differences and perhaps to greater clarity about active ingredients—or what more precisely contributes to the superiority of one group over the other. This extended focus will likely require additional resources. Funding agencies might consider adjusting their funding formulas to allow for sustained effort to identify essential components of efficacious interventions.
Becoming more knowledgeable about a successful program’s essential components may also extend researchers’ understandings of how to bridge the notorious gap between research and practice. Fuchs and associates (see Fuchs et al., 2010; McMaster et al., 2010) demonstrated the value of such an understanding in an IES-supported effectiveness (scaling-up) evaluation of a reading program (PALS) for students in Grades 2–5, which involved 41 schools, 116 teachers, in three states. This was the same effectiveness study in which we collected already-presented K-PALS data in 2004 and 2005. In 2006, or Year 1 of the effectiveness study in grades 2-5, Fuchs and colleagues first trained the PALS teachers in these grades to implement the program with fidelity. In the following year, the same teachers were given a choice. They could implement PALS for a second year without change, or they could implement only a part of it—the part the researchers believed was most important. The teachers were told that if they chose the second option (implementing only part of the PALS program), they would be encouraged to customize the intervention by making adaptations or adding activities to better meet the needs of their students. Furthermore, they were told that the researchers would assist them in preparing adapted or supplemental materials as long as they complied with certain parameters such as implementing the program’s essential components and maintaining the frequency, duration, and total number of PALS sessions.
Students of teachers choosing the second option showed statistically significantly greater progress across the school year than students of teachers who chose the first option and controls (ES = 0.25 to 0.60 across reading measures and study group comparisons). Although this complex effectiveness study in grades 2-5 requires greater explanation than we have opportunity to provide, it illustrates at minimum the possibility of combining “top-down,” researcher-developed instruction with “bottom-up,” practitioner-inspired contributions. Such a possibility can be facilitated by the researchers’ capacity to understand active and not-so-active components of their programs.
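For readers less familiar with effect sizes such as those reported above (ES = 0.25 to 0.60), the following is a minimal sketch of one common definition: a standardized mean difference that divides the difference between two group means by the pooled standard deviation. The formula choice, the function name, and the scores are illustrative assumptions; the original study reports do not specify here which effect size computation was used, and the numbers below are not study data.

```python
import math
from statistics import mean, stdev


def standardized_mean_difference(treatment, control):
    """Return (mean_t - mean_c) / pooled SD for two lists of scores.

    Hypothetical helper: one common effect size definition,
    not necessarily the one used in the studies discussed.
    """
    n_t, n_c = len(treatment), len(control)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


# Made-up gain scores for illustration only.
treatment_scores = [2, 4, 6]
control_scores = [1, 3, 5]
es = standardized_mean_difference(treatment_scores, control_scores)
print(round(es, 2))  # 0.5
```

Under this definition, an effect size of 0.5 means the treatment group's mean sits half a pooled standard deviation above the comparison group's mean. The same treatment can therefore yield a smaller effect size simply because the comparison group's mean has risen.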
Researchers and Best Practices
Intervention researchers who have an interest in the learning and teaching process work hard to develop efficacious instructional programs, curricula, and materials. If they succeed in demonstrating the value of their programs and products, it is typically assumed that those programs and products will stand the test of time. As we have seen, this belief may be illusory. An instructional program may be relatively strong at Point A but not at Point B. How should the researcher-developer respond to her Point B data? Despair would be understandable but nonadaptive. Rather, we suggest that in such a circumstance she start thinking like a successful entrepreneur. By definition, the entrepreneur knows her products well. She has technical knowledge about how they are built. She has conducted focus groups to understand who likes them and who doesn’t and why. And she has similar knowledge about her competitors’ products because she appreciates that she and they are competing against each other in the same marketplace and playing a zero-sum game: One person’s sale is another’s lost sale. The researcher-developer who fails to “beat” the counterfactual must review many things to move forward. These include rethinking the nature of the instructional program and how it was implemented. But such a review must also include an attempt to understand the counterfactual: how it and the researcher’s instructional program are alike and different, and how the researcher might modify her program with this information in mind.
In other words, intervention researchers should recognize that they are in competition with the counterfactual, and their programs will be valued or not on this basis. A well-known and convincing way of beating the competition is by showing that the experimental group outperformed the control (or comparison) group on a valued outcome like academic achievement. But there are other ways of outperforming the counterfactual. Consider a situation in which children in experimental and control groups make equally impressive gains. But instruction in the experimental condition takes less time to conduct, or teachers view it as simpler and more satisfying to deliver, or it is less expensive to buy. A program’s efficiency, popularity, and cost can affect its sustainability. So if experimentals and controls perform equally well on desired academic outcomes, but the experimental treatment is more likely to be sustained, then the researcher-developer arguably has a leg up on her competition.
From this perspective, a “business-as-usual” conception of the counterfactual is unhelpful. Its implicit message is “if you’ve seen one control group, you’ve seen them all.” It discourages researchers from studying the counterfactual to learn how and why their treatment beat (or failed to beat) it. Recognizing that we necessarily conduct our work in a particular time and place can strengthen us and our enterprise. With such knowledge we are more likely to bridge research and practice and prove our value as members of the larger educational community.
