Abstract
After the adoption of No Child Left Behind (NCLB), a host of anecdotal evidence suggested that NCLB diminished students’ school engagement—a multidimensional construct that describes students’ active involvement and commitment to school and encompasses students’ thoughts, behaviors, and feelings about school. Using data from repeated cross-sections of the Children of the National Longitudinal Survey of Youth, this study draws on methodological innovations from research linking NCLB to academic outcomes to explore this possibility. Findings are suggestive of an immediate NCLB-based increase in engagement that diminished and ultimately became negative over time. Because engagement predicts both achievement and socio-emotional well-being, researchers and policymakers should work to ensure that the Every Student Succeeds Act facilitates accountability systems that promote engagement.
A robust literature links students’ school engagement to a host of positive educational and behavioral outcomes, including academic achievement, educational attainment, mental health and well-being, and the absence of risk behaviors (e.g., Fredricks, Blumenfeld, & Paris, 2004; Markowitz, 2017a; Upadyaya & Salmela-Aro, 2013; Wang & Degol, 2014). School engagement—a measure of students’ involvement with and commitment to school—is the result of students’ responses to school characteristics including the school’s social and emotional climate, discipline policies, and teacher-student relationships. Indeed, experimental evidence shows that interventions designed to improve school climate improved students’ engagement and outcomes across both academic and socio-emotional domains (e.g., Battistich, Solomon, Kim, Watson, & Schaps, 1995; Battistich, Solomon, Watson, & Schaps, 1997; Solomon, Watson, Battistich, Schaps, & Delucchi, 1996), suggesting that educational policy designed to shift school climate may have a powerful impact on students’ school engagement. Despite this potential—and the breadth of outcomes associated with school engagement—school engagement has historically been ignored in many state and federal policy initiatives (Cohen, McCabe, Michelli, & Pickeral, 2009; Fredricks et al., 2004; Osterman, 2000; Wang & Fredricks, 2014), including the recently reauthorized federal education policy No Child Left Behind (NCLB). Indeed, prior to NCLB, educational scholars criticized consequential accountability policies as likely to diminish students’ engagement (e.g., Deci, Vallerand, Pelletier, & Ryan, 1991; Osterman, 2000), and post-NCLB popular media routinely claimed that NCLB turned students into “test taking robots” (e.g., Kirp, 2015). Despite the wide-reaching implications of this hypothesis, the relationship between NCLB and school engagement has yet to be examined rigorously. This paper will do so.
Many scholars have argued that NCLB’s emphasis on high-stakes testing and consequential accountability encouraged teachers to narrow curricula, spend less time getting to know students, and use more didactic pedagogy, ultimately diminishing both student engagement and student outcomes (Au, 2007; Cochran-Smith & Lytle, 2006; Diamond, 2007; Osterman, 2000; Plank & Condliffe, 2013). However, it is also possible that NCLB enhanced student engagement. Students may have responded favorably to an increased emphasis on academics, as suggested by research on academic press (Ma, 2003). Moreover, students’ academic competence is tightly linked to engagement (L. H. Anderman, 2003; Appleton, Christenson, & Furlong, 2008); insofar as NCLB improved achievement, it may have also increased engagement. The present study is an exploratory analysis that draws on methodological innovations from research linking NCLB and academic outcomes (Dee & Jacob, 2011; Wong, Cook, & Steiner, 2015) to adjudicate between these competing hypotheses using repeated cross-sections of student-reported engagement data drawn from the Maternal and Child Supplement to the National Longitudinal Survey of Youth, the only data in which this analysis is possible.
This analysis will provide a rigorous test of claims that NCLB dramatically reduced students’ engagement (e.g., B. D. Jones, 2007; Meier & Wood, 2004; Nichols & Berliner, 2007) and build on a growing body of policy analysis aimed at exploring the impact of educational policies on important but nonacademic outcomes. Additionally, this analysis will inform states as they implement the recently authorized Every Student Succeeds Act (ESSA; 2015), which allows states’ accountability systems to include measures of school engagement as indicators of school success. For states hoping to leverage school engagement as an indicator, it is important to understand whether and how accountability systems employed under NCLB influenced engagement.
Results across two of three estimated treatment-comparison contrasts are consistent with an immediate NCLB-based increase in self-reported school engagement that diminished over time, ultimately resulting in a negative association between NCLB and engagement by 2010. Though this analysis cannot identify the mechanisms behind this pattern, it provides suggestive evidence that the increased threat of NCLB-based sanctions over time may have been negatively associated with student engagement.
This study provides a rigorous test of the hypothesis that NCLB influenced students’ school engagement as well as important preliminary evidence that distal education policies can impact students’ engagement with school. The present findings are particularly important as states implement ESSA and modify the testing and accountability procedures put into place by NCLB. Although ESSA specifically requires a nonacademic indicator, little research has explored the effects of consequential accountability in general, and NCLB in particular, on students’ nonacademic outcomes, especially along the dimensions suggested by ESSA (e.g., student engagement, educator engagement, school climate, and postsecondary readiness). Thus, these findings can inform states seeking to use engagement as a nonacademic indicator and motivate future research on the nonacademic outcomes of accountability systems (e.g., Holbein & Ladd, 2017; Whitney & Candelaria, 2017).
School Engagement and Student Outcomes
School engagement refers broadly to a student’s investment in or commitment to school. It is conceptualized and measured as an inherently multidimensional construct encompassing students’ behaviors, thoughts, and feelings about school (Fredricks et al., 2004; Reschly & Christenson, 2012; Wang & Degol, 2014; Wang, Fredricks, Ye, Hofkens, & Linn, 2017). Behavioral engagement measures school participation and may include measures of attendance and participation in extracurricular, school-based activities. Cognitive engagement reflects mental investment in school. It can be measured with indicators of school relevance and endorsements of the long-run benefits of school as well as measures of academic effort and thoughtfulness. Emotional engagement includes students’ affective responses to school, peers, and teachers, and is defined as students’ sense of belongingness in or connectedness to school as well as their enjoyment of school. These dimensions are meaningfully linked within individuals and together represent a holistic sense of students’ investment in school.
Developmental theories such as stage-environment fit theory (Eccles et al., 1993; Wigfield, Eccles, Schiefele, Roeser, & Davis-Kean, 2006) and self-determination theory (Deci et al., 1991; Deci & Ryan, 1994; Ryan & Deci, 2009) assert that the social and emotional climate of a school engenders student motivation, which is acted out through engagement. Engaged students internalize the goals, values, and skills promoted by schools, leading to student effort and use of school-based supports in the pursuit of healthy developmental goals, resulting in enhanced academic and socio-emotional outcomes. Conversely, disengaged students become disillusioned with the tasks and requirements of school, do not take advantage of school supports for academic and socio-emotional growth, and may leave school altogether.
Evidence for the importance of engagement comes from an extensive body of literature linking school engagement with a variety of student outcomes, both academic and socio-emotional (Fredricks et al., 2004; Wang & Degol, 2014). Engagement has been linked to both effort in school and academic achievement (L. H. Anderman, 2003; Appleton et al., 2008; Goodenow, 1993; Thompson, Iachan, Overpeck, Ross, & Gross, 2006; Wang & Holcombe, 2010). For example, Wang and Holcombe (2010) found that students’ school environments were linked to their academic achievement in adolescence through their school engagement. Engagement is also related to fewer suspensions (McNeely, 2005) and school completion (e.g., Appleton et al., 2008; Archambault, Janosz, Morizot, & Pagani, 2009; Wang & Peck, 2013). For example, Wang and Peck (2013) demonstrated that students with low levels of school engagement were 19 percentage points more likely to drop out of school than their peers with higher levels of engagement. Socio-emotionally, decreases in engagement have been linked with increases in substance use, delinquent behavior (Hirschfield & Gasper, 2011; Li et al., 2011; Li & Lerner, 2013; Wang & Fredricks, 2014; Wang & Peck, 2013), and depressive symptoms (Li & Lerner, 2013; Wang & Peck, 2013) among students aged 10 to 17 across diverse racial and ethnic groups. For example, Hirschfield and Gasper (2011) found that engagement was associated with lower levels of delinquent behavior both in and out of school among early adolescents, a finding replicated by Li and colleagues (2011). Similarly, Henry, Knight, and Thornberry (2012) demonstrated that the absence of engagement was a robust predictor of high school dropout, substance use, and delinquent behavior.
School Features and School Engagement
Research indicates that although engagement is driven in part by student characteristics, school characteristics also influence students’ engagement, suggesting that engagement can be altered by educational policy. Indeed, previous experimental and observational research has demonstrated school-level variability in student engagement and identified features of schools linked to students’ engagement (e.g., E. M. Anderman, 2002; Battistich et al., 1995; Fredricks et al., 2004; Koth, Bradshaw, & Leaf, 2008; McNeely, Nonnemaker, & Blum, 2002; Payne, 2008). For example, Solomon, Battistich, and colleagues report that an intervention designed to improve the socio-emotional climate of a school by promoting cooperative learning, developmentally informed discipline policies, interpersonal helping, and prosocial behavior increased students’ engagement with school (Battistich et al., 1997; Solomon et al., 1996). Similarly, McNeely and colleagues (2002) report that students have higher levels of engagement in schools with fewer total students, responsive discipline policies, and better classroom management using a nationally representative sample. Using the same data, McNeely and Falci (2004) demonstrated that the beneficial influence of emotional engagement with school was largely driven by students’ reported relationships with their teachers. These findings align with conceptual work highlighting the importance of students feeling autonomous and responded to (e.g., Deci & Ryan, 1994; Ryan & Deci, 2009). Taken together, this literature suggests that schools that respond to students’ needs and interests, discipline students in developmentally appropriate ways, build positive relationships between students and school personnel, and promote a positive social and emotional milieu can effectively foster students’ engagement.
School Engagement and No Child Left Behind
Despite evidence suggesting that school engagement is both important for student development and influenced by the school context, school engagement is not accounted for in many state and federal policy initiatives (Cohen et al., 2009; Fredricks et al., 2004; Osterman, 2000; Wang & Fredricks, 2014), including the controversial federal education policy, No Child Left Behind. Signed into law in January of 2002 and effective through 2015, NCLB required that all public school children in states that accepted federal dollars become proficient in math and reading by 2014. It held schools accountable to this goal by testing whether schools made Adequate Yearly Progress (AYP), as determined by the state, toward 100% proficiency each year. Schools administered a state-created, standards-based assessment each spring, and schools that failed to meet AYP were required by law to take specific corrective actions, including informing parents of schools’ AYP status, offering students transfer to schools meeting AYP, and offering supplemental services. Sanctions became increasingly stringent if schools continued to fail, culminating in a complete restructuring and restaffing of the school.
NCLB emerged from a growing accountability movement in education that had several critics in the 1990s and early 2000s. Scholars asserted that strong accountability provisions, particularly those tied to sanctions as in NCLB, were likely to undermine student engagement through several mechanisms. First, consequential accountability policies could incentivize teachers to focus exclusively on tested material rather than content that may be interesting and motivating to students (Dee, Jacob, & Schwartz, 2013; Griffith & Scharmann, 2008; Hannaway & Hamilton, 2008; McMurrer, 2007; Pederson, 2007). Students may disengage from content that seems relevant only in the context of a test. Second, teachers may teach this content in ways that reduce student interaction and stifle critical thinking, such as lecture-based and test preparation pedagogies specifically tied to how questions are asked on the high-stakes assessments (Au, 2007; Diamond, 2007). This shift in pedagogy may interfere not only with students’ interest in academic content but also with their emotional engagement with school. Classroom strategies that reduce opportunities for students’ autonomy and peer interactions have been previously shown to adversely affect students’ bonding with their teachers and each other as well as their investment in school more generally (e.g., Connell & Wellborn, 1991; Downer, Rimm-Kaufman, & Pianta, 2007; McNeely et al., 2002). Third, a focus on test material may incentivize teachers to spend less time getting to know students in order to maximize instructional time. Teacher relationships are a core component of students’ emotional connection to school; the absence of these relationships is likely to decrease engagement (e.g., Murray, 2009; Roorda, Koomen, Spilt, & Oort, 2011; Wang & Holcombe, 2010; Woolley & Bowen, 2007).
Finally, the stress engendered by strict accountability may reduce teacher responsivity and increase negative mood—stress that is likely to intensify as the probability of sanctions increases. Teachers may be more likely to view and react to disruptions as threatening to their job security rather than normative youth behavior. This stress and frustration may alter disciplinary strategies and reduce teachers’ ability to bond with students, decreasing students’ emotional bond with school, their sense of safety and belonging at school, and ultimately their engagement (Deci et al., 1991; Fredricks et al., 2004; McNeely et al., 2002; Osterman, 2000).
It is also possible that NCLB had a positive impact on students’ engagement. Students may have responded favorably to the heightened focus on academics and higher academic expectations engendered by accountability policies. High academic expectations may communicate to students that they are important and valued by their teachers, potentially enhancing emotional engagement with school. For example, literature exploring the impact of academic press, the term used to describe school climates that emphasize academic excellence (Goddard, Sweetland, & Hoy, 2000), suggests that an academic focus facilitates bonding with school, a key component of emotional engagement with school (Ma, 2003). Similarly, Dee and Jacob (2011, described in more detail in the following) report a positive NCLB effect on teachers’ report that absenteeism, tardiness, and apathy are not a problem in their school, providing some evidence that teachers perceived enhanced indicators of behavioral engagement post-NCLB. Additionally, academic achievement is one of the strongest predictors of student engagement. Previous work examining the relationship between accountability policies and academic achievement found a positive relationship between accountability and academic achievement (Carnoy & Loeb, 2002; Hanushek & Raymond, 2005; Jacob, 2005; Neal & Schanzenbach, 2010). Insofar as NCLB was able to enhance student achievement, as has been demonstrated in several evaluations (Dee & Jacob, 2011; Wong et al., 2015), it may have also boosted engagement.
Prior work examining NCLB impacts has not been able to adjudicate among these hypotheses. Because of the national scope of the law, it is difficult to identify an appropriate comparison group, and evaluations of the law have been few. Only one national analysis has examined whether NCLB impacted students’ enjoyment of learning. Reback, Rockoff, and Schwartz (2014) used a sample of fifth graders in the 2003–2004 school year to demonstrate that accountability pressure from NCLB, operationalized as whether or not students’ schools were close to the AYP margin, does not adversely affect students’ enjoyment of math and reading as reported by students. Using the same empirical strategy, Whitney and Candelaria (2017) find no influence of NCLB on students’ social and emotional skills, including their perceived interest and competence in academic subjects. However, these studies stand in contrast to a larger body of work that asserts that NCLB, and in particular the testing and accountability components, have increased student anxiety and reduced students’ enjoyment of and engagement in school (e.g., B. D. Jones, 2007; M. G. Jones, Jones, & Hargrove, 2003; Meier & Wood, 2004; Nichols & Berliner, 2007). Moreover, some studies have presented mixed findings. For example, using a rigorous quantitative strategy in a large sample of students in North Carolina, Holbein and Ladd (2017) find that while accountability pressure decreases student absences and tardiness, it also increases student misbehavior.
Research on proposed mechanisms is also inconclusive. The most rigorous designs used to assess NCLB have exclusively examined student achievement. These studies provide strong evidence that NCLB enhanced student mathematics achievement but little evidence that NCLB improved reading achievement (Dee & Jacob, 2011; Wong et al., 2009, 2015). However, scholars have also used teacher and administrator surveys, teacher interviews, and classroom observations to provide evidence that NCLB reduced teacher autonomy, motivation, and morale (Cochran-Smith & Lytle, 2006; Finnegan & Gross, 2007; M. G. Jones et al., 2003); reduced time spent on nontested content and subjects (Dee et al., 2013; Griffith & Scharmann, 2008; Hannaway & Hamilton, 2008; M. G. Jones et al., 2003; McMurrer, 2007; Pederson, 2007); promoted the use of lecture-based and test preparation pedagogies (Au, 2007; Diamond, 2007); and reduced the overall level of instructional support in classrooms (Plank & Condliffe, 2013). Again, most of these studies were local in scope, and several do not include a comparison group.
In sum, the current literature cannot identify whether NCLB has impacted students’ engagement with school, a substantial gap given the potential for this construct to impact important short- and long-term outcomes for youth. Moreover, the literature on the broader impact of NCLB remains contentious and does not permit a strong hypothesis regarding the impact of the law on students’ school engagement. It may be that NCLB’s strict accountability and singular focus on high-stakes test performance reduced student engagement with school, as widely hypothesized; however, it may also be that the academic focus created by high-stakes testing and consequential accountability encouraged schools to attend more closely to students’ needs as learners, ultimately facilitating both academic achievement and student engagement.
Estimating NCLB Impacts
Much of the debate over NCLB has been fueled in part by the difficulty of evaluating a complex federal program that simultaneously impacted every public school in the nation; as a result, relatively few studies have evaluated the overall impact of NCLB. Moreover, evidence suggests that NCLB’s deference to states in terms of curriculum standards, test development, and accountability schedules has resulted in uneven implementation across states, with some states only loosely implementing NCLB (Davidson, Reback, Rockoff, & Schwartz, 2013; Steifel, Schwartz, & Chellman, 2007) or postponing their implementation altogether (e.g., “backloading” AYP targets; Chudowsky & Chudowsky, 2008). Indeed, Davidson et al. (2013) report that AYP failure rates varied from 1% to 80% across states. Taken together, the lack of a clear comparison group and the wide variation in treatment implementation have limited research examining NCLB impacts.
The most rigorous evaluations of NCLB to date have addressed these challenges by estimating comparative interrupted time series (CITS) models (e.g., Dee & Jacob, 2011; Lee, 2006; Wong et al., 2009, 2015). CITS models build on a traditional interrupted time series (ITS) model, in which the dependent variable of interest (engagement) is plotted over time. If there is an interruption, or change in the time trend, after the introduction of the law (NCLB in 2002), this change could be due to the influence of the policy. This interruption could occur immediately and would appear as a sudden change in the height of the line, or it could occur cumulatively, appearing as a change in the line’s slope after the introduction of the law (after 2002). In an ITS model, however, unmeasured third variables are likely to influence the time trend and may be correlated with the introduction of the law, resulting in a biased estimate. In a CITS model, a comparison group is introduced that should be influenced by any potential confounds but is not influenced by the treatment of interest. If there is a change in the relationship between the treatment and comparison groups after the introduction of NCLB, this is interpreted as the result of the policy. If the change in the comparison group after the introduction of the policy accurately reflects what would have happened to the treatment group, this differential change can be interpreted as a causal effect; if there are systematic differences between the treatment and comparison group that vary over time, then the estimate may still be biased.
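The logic of a CITS model can be made concrete with a small sketch. The Python code below is illustrative only: it assumes a generic two-group specification with a pre-policy trend, a level shift, and a post-policy slope change, fit by ordinary least squares to synthetic, noise-free group means. The coefficient values, the centering of time at 2002, and the function names (`solve`, `ols`, `design_row`) are assumptions made for illustration; they are not the model, data, or estimates of the present study. The coding of 2004 as the first post-NCLB timepoint follows the biennial survey schedule.

```python
# Illustrative CITS sketch on synthetic, noise-free data (not the study's data or model).

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def design_row(year, treated):
    t = float(year - 2002)                 # time centered at the policy year (assumption)
    post = 1.0 if year >= 2004 else 0.0    # 2004 is the first post-NCLB timepoint
    since = post * (year - 2004)           # years elapsed since the first post timepoint
    base = [1.0, t, post, since]
    # Interacting every term with the treatment indicator yields the
    # treated-minus-comparison differences in level, trend, jump, and slope change.
    return base + [treated * v for v in base]

# Arbitrary "true" coefficients used only to generate the synthetic series.
TRUE = [0.0, 0.01, 0.05, 0.00,     # comparison group: level, trend, jump, slope change
        0.1, 0.00, 0.20, -0.05]    # treated-minus-comparison differences

years = range(1988, 2012, 2)       # biennial timepoints, 1988 through 2010
rows = [design_row(yr, g) for g in (0.0, 1.0) for yr in years]
y = [sum(b * v for b, v in zip(TRUE, r)) for r in rows]

beta = ols(rows, y)
immediate_effect, slope_effect = beta[6], beta[7]
print(round(immediate_effect, 4), round(slope_effect, 4))
```

In this setup, the coefficient on the treatment-by-post interaction corresponds to the sudden change in the height of the line, and the coefficient on the treatment-by-time-since-policy interaction corresponds to the change in slope. Both are measured relative to the comparison group, which is what distinguishes CITS from a single-group ITS.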
Three previous studies have used a CITS design to explore the national impact of NCLB on academic achievement. These studies took advantage of between-state differences in NCLB’s implementation by creating synthetic comparison groups, or groups of states that were subject to the broader American policy and educational context in the years before and after NCLB’s enactment in 2002 but were arguably not impacted by NCLB itself. For example, Lee (2006) and Dee and Jacob (2011) contrasted states with and without consequential accountability (CQA) policies prior to the implementation of NCLB. These authors took advantage of the 30 states that had implemented some form of CQA in the 1990s, prior to NCLB (see Table 1). These authors reasoned that the thrust of NCLB was the introduction of federally mandated consequences for poorly performing schools; if states already applied consequences to failing schools, then NCLB should have a much smaller impact on these states. Put another way, if CQA does impact student engagement, this impact should be larger after 2002 in states experiencing CQA for the first time. This impact could occur immediately, shown in a change in the intercept for treated versus comparison group states, or over time, shown in a change in the post-NCLB slope for the treated versus comparison group.
Table 1. States With Consequential Accountability Prior to No Child Left Behind
Source. Dee and Jacob (2011).
Similarly, Wong and colleagues (2009) contrasted states that implemented NCLB with varying levels of rigor. They identified these states by contrasting proficiency rates on state tests with the National Assessment of Educational Progress (NAEP) assessments to demonstrate substantial variation in the difficulty of states’ assessments. Specifically, states with high standards (HS) were defined as those with more similar pass rates on the state-created achievement tests as compared to NAEP, whereas states that have high pass rates on state-created tests relative to NAEP (and thus much higher pass rates overall than the HS states) were classified as low standards (LS) states (the 25 remaining states were classified as “medium standards,” see Table 2). These differences in proficiency standards resulted in differential AYP failure rates, sanctions and corrective actions, and ultimately school restructuring (Wong et al., 2009). Indeed, Wong et al. demonstrated that states in the HS group were more likely to institute new curriculum, solicit outside expert advice, replace school staff, and extend the school day than the LS states. Based on these observed differences, Wong et al. conducted a CITS analysis comparing high standards states (treatment group) and low standards states (comparison group). This contrast estimates the impact of the full suite of NCLB reforms rather than consequential accountability only, though it likely underestimates the true impact of NCLB given that all states did implement NCLB to some extent. As with the timing of CQA contrast, the HS-LS contrast could impact student engagement immediately, as a shift in engagement in the first post-NCLB timepoint (2004), or could reflect cumulative changes, as a differential change in the post-NCLB engagement trajectories for high versus low standards states.
Table 2. States by NCLB Implementation Standards and Timing of Consequential Accountability
Note. Data from Wong, Cook, and Steiner’s (2009) classification of states into low and high standards implementers of NCLB and Dee and Jacob’s (2011) classification of early and late adopters of consequential accountability. VT and NY not included in high and low standards analyses. HI and ND included in neither the HS-LS nor timing of CQA contrasts due to insufficient observations. NCLB = No Child Left Behind; CQA = consequential accountability; HS = high standards; LS = low standards.
Finally, Wong et al. (2009, 2015) also estimated a public versus private school contrast in which public school students were compared to private school students across all states. Because private schools were largely not subject to NCLB, this contrast is a straightforward comparison of students who did and did not experience NCLB-based changes to their schools. Like the HS-LS contrast, this comparison estimates the impact of the full scope of the NCLB legislation, and impacts could occur immediately or over time; however, estimated NCLB effects based on this contrast should be interpreted in light of the substantial heterogeneity in NCLB’s implementation (e.g., Davidson et al., 2013; Steifel et al., 2007).
Present Study
The present study is an exploratory analysis that replicates the two state-based contrasts, along with a public versus private school contrast, to estimate the association between the implementation of NCLB and students’ engagement with school. By using three contrasts (the public versus private contrast, the high versus low standards contrast [Wong et al., 2009, 2015], and the timing of consequential accountability contrast [Dee & Jacob, 2011]), the present study is able to explore the mechanisms by which NCLB impacted student outcomes. Specifically, although the public versus private and HS-LS contrasts from Wong and colleagues (2009, 2015) examine the full scope of NCLB, the Dee and Jacob (2011) CQA contrast is only able to examine the NCLB impact derived from the addition of accountability policy. By comparing these contrasts, the present study aims to identify whether NCLB as a whole impacted engagement and what role CQA specifically had in changing students’ engagement in the post-NCLB era. The present study will thus inform states as they implement ESSA and may alter their current accountability systems. Given the diverse student outcomes associated with school engagement, estimating the relationship between NCLB and engagement remains an important step in making ESSA an effective reauthorization.
Method
Data
Data are drawn from the Maternal and Child Supplement to the National Longitudinal Survey of Youth (NLSY; for more information, see National Longitudinal Surveys, 2017). The NLSY is a nationally representative survey of 12,686 youth who were ages 14 to 21 when interviewed in 1979. These participants were interviewed annually from 1979 to 1994 and biennially thereafter. The original NLSY data focused on labor market behavior and as such collected extensive information about education, cognitive and noncognitive skills, assets and income, crime and substance use, health, and family life (e.g., marital status, fertility, child care); thus, the NLSY79 contained rich information on mothers’ pre- and postnatal contexts and care, providing a unique opportunity to link maternal information to children’s developmental outcomes. Thus, beginning in 1986, the NLSY began following the children of the original NLSY79 female respondents to assess their health, development, and well-being. Like their parents, this sample of children was interviewed annually until 1994 and biennially thereafter. Youth responded to a “child survey” until they were 14, which included demographic information and a 10-item measure of their school engagement. Because the NLSY has been administering this child survey since 1986, the data can be organized into a repeated cross-section, making it usable for an interrupted time series analysis. Indeed, the NLSY are the only existing data in which a repeated cross-section of self-reported school engagement can be constructed spanning several timepoints both before and after the implementation of NCLB. This is particularly important because despite national conversation regarding NCLB and students’ engagement—and the specific inclusion of engagement as a key construct in ESSA—there has been almost no research exploring the impact of NCLB on engagement that includes an appropriate counterfactual. The NLSY data present the opportunity for such an analysis.
Sample
The sample for this study is drawn from the children of the female NLSY79 respondents and consists of youth aged 10 to 14 who completed the engagement scale between 1988 and 2010. The total unique student level N is 11,512 across states and years; each year, the student N ranges from 539 to 1,733. The sample is restricted to students who have a state identifier and valid information on public versus private school attendance and responded to at least 6 items from the engagement survey. Because some students provided more than one observation between ages 10 and 14, the total number of observations is 14,794.
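The inclusion rules just described can be expressed as a simple filter. The sketch below is hypothetical: the record layout, field names, and item values are invented for illustration and do not correspond to actual NLSY variable names or response scales. It applies the stated criteria (ages 10 to 14, survey years 1988 to 2010, a state identifier, valid school type, and at least 6 of the 10 engagement items answered) and then counts observations and unique students separately.

```python
# Hypothetical record layout; field names are illustrative, not NLSY variable names.
def is_eligible(record, min_items=6):
    """Apply the inclusion rules described in the Sample section to one observation."""
    answered = sum(1 for item in record["items"] if item is not None)
    return (
        10 <= record["age"] <= 14
        and 1988 <= record["year"] <= 2010
        and record["state"] is not None
        and record["school_type"] in ("public", "private")
        and answered >= min_items
    )

# Made-up observations; None marks a missing item response.
observations = [
    {"child_id": 1, "year": 2004, "age": 10, "state": "OH", "school_type": "public",
     "items": [3, 4, 2, 4, 3, 3, None, 4, 2, 3]},
    {"child_id": 1, "year": 2006, "age": 12, "state": "OH", "school_type": "public",
     "items": [3, 3, 3, 4, 2, 4, 3, 3, 2, 2]},
    {"child_id": 2, "year": 2006, "age": 13, "state": None, "school_type": "private",
     "items": [4] * 10},                              # excluded: no state identifier
    {"child_id": 3, "year": 2006, "age": 11, "state": "TX", "school_type": "public",
     "items": [2, 3, 2, 3, 2] + [None] * 5},          # excluded: only 5 items answered
    {"child_id": 4, "year": 2006, "age": 14, "state": "TX", "school_type": "private",
     "items": [2] * 10},
]

kept = [obs for obs in observations if is_eligible(obs)]
n_observations = len(kept)                            # student-year observations
n_students = len({obs["child_id"] for obs in kept})   # unique students
print(n_observations, n_students)
```

Because a student can appear at more than one survey wave between ages 10 and 14, the observation count exceeds the unique-student count, as in the reported totals of 14,794 observations from 11,512 students.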
Though this sample is national in scope, it is not nationally representative. The data are representative only of the children of American females aged 14 to 21 in 1979, not of American students from 1988 to 2010. Three implications of this sampling feature are important for this analysis. First, while all states are represented, samples are nonrandom within states and families sort into states for unknown reasons; thus, within-state samples are likely not representative of all students in a given state. Second, sample sizes vary across states, with some states represented by very few observations and some not contributing to every timepoint. Of the 600 state-year cells, 82 are empty, and 33 have one observation and thus do not contribute to the analysis. Finally, sample sizes are time sensitive. Specifically, because all mothers of students in the sample span six years in age, there are fewer observations in the later years of the time series as mothers become older (mothers are roughly 52 years old in 2010), and sample demographics change as well (see Tables 3 and 4).
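The cell counts reported above can be checked with simple arithmetic. The sketch below assumes that the 600 state-year cells are 50 states crossed with the 12 biennial survey years from 1988 to 2010; the text does not state this decomposition, although it is consistent with the reported total. The function `tally_cells` and the toy observations are hypothetical and show how such a tally might be produced from a list of (state, year) pairs.

```python
from collections import Counter
from itertools import product

def tally_cells(observations, states, years):
    """Count empty, single-observation, and remaining state-year cells."""
    counts = Counter(observations)          # missing cells count as 0
    cells = list(product(states, years))
    empty = sum(1 for cell in cells if counts[cell] == 0)
    single = sum(1 for cell in cells if counts[cell] == 1)
    usable = len(cells) - empty - single
    return empty, single, usable

# Arithmetic from the reported figures (50 x 12 is an assumed decomposition).
survey_years = list(range(1988, 2012, 2))   # 1988, 1990, ..., 2010
total_cells = 50 * len(survey_years)        # 600
contributing = total_cells - 82 - 33        # 485 cells with two or more observations

# Toy example with made-up (state, year) observations.
toy = [("AL", 1988), ("AL", 1988), ("CA", 1988)]
toy_result = tally_cells(toy, ["AL", "CA"], [1988, 1990])
print(total_cells, contributing, toy_result)
```

By this arithmetic, 485 of the 600 cells would contain at least two observations, before any further restrictions applied in the sensitivity analyses.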
Table 3. Number of Observations by Contrast and Year, Student Engagement
Note. Data are drawn from the Maternal and Child Supplement to the National Longitudinal Survey of Youth. The Wong et al. (2015) column represents the HS-LS contrast, where HS indicates a high standards state, MS indicates medium, and LS indicates low. Two states are omitted from the HS-LS contrast because they cannot be categorized as high, medium, or low standards. The Dee and Jacob (2011) column represents the consequential accountability (CQA) contrast, where treatment states are states in which CQA begins with adoption of No Child Left Behind, and comparison states are states that had prior CQA.
Table 4. Sample Characteristics by Year
Note. Data are drawn from the Maternal and Child Supplement to the National Longitudinal Survey of Youth. Table shows the percent of the sample in each year that falls into female, white, black, Hispanic, and firstborn categories. INR = income-to-needs ratio.
To address these sampling issues, the present study includes a first stage in the analysis. In this first stage, I regress all observations on demographic characteristics (gender, age, race, income-to-needs ratio, and firstborn status, characteristics that have been linked to students’ engagement and may have changed as mothers aged) and use the residuals from this regression as students’ engagement scores. In this way, I remove the influence of these demographic characteristics from student scores and thus equate the states and years in these background attributes. Moreover, all regressions are weighted by the precision of the state-year or type-year observation (see Analytic Strategy) to down-weight the influence of states with a particularly small number of observations with divergent engagement scores, that is, the states in which I have less confidence about the accuracy of the estimate. Additionally, several sensitivity tests are performed. These tests vary the requirements for inclusion in the sample and assess how robust the conclusions are to weighting decisions. Specifically, the minimum number of individuals contributing to each state-year observation is varied to increase the reliability of the individual state observations, raw engagement scores (as opposed to residualized) are used as the dependent variable, unweighted analyses are presented such that each state contributes equally to the estimates, students contributing duplicate observations are removed from the sample, and the time series is shortened so that it ends in 2008 (prior to the year with the smallest sample).
Measures
Student Engagement
The child supplement to the NLSY includes demographic information as well as 10 items assessing students’ emotional, cognitive, and behavioral engagement with school, in line with other scales assessing engagement with school (e.g., Li & Lerner, 2011). In the NLSY, emotional engagement with school was measured with 7 items assessing teachers’ involvement with students, students’ liking of school, ease of making friends, and perceptions of safety, which map onto items used in the widely used scale from the National Longitudinal Study of Adolescent Health (Markowitz, 2017a, 2017b; McNeely & Falci, 2004; Resnick et al., 1997). Sample items include “how satisfied are you with your school,” “most teachers help with personal problems,” “it is easy to make friends,” and “I don’t feel safe at this school” (reverse-coded). Cognitive engagement was measured using 2 items that indicated interest in classroom content: “my schoolwork requires me to think” and “at this school, a person has the freedom to learn what interests him/her.” Finally, an ordinal measure of truancy was used as a proxy for behavioral engagement (for a full list of items and additional information, see Appendix A in the online version of the journal).
These items were combined to make two versions of an engagement scale. In the first, all 10 items were averaged to create an engagement scale ranging from 0 to 3. This scale has been used in previous research (Markowitz, 2017a) and includes items similar to those in other published work assessing school engagement (e.g., Chase, Hilliard, Geldhof, Warren, & Lerner, 2014; Wang & Eccles, 2011). Moreover, it aligns with conceptual arguments about the importance of viewing engagement as an inherently multidimensional construct (e.g., Fredricks et al., 2004; Reschly & Christenson, 2012; Wang et al., 2017) and has acceptable internal reliability (α = .70). Second, a subset of 4 items that focused specifically on aspects of the school environment that were particularly likely to be influenced by NCLB (“most of my classes are boring,” “teachers know their subjects well,” “schoolwork requires me to think,” and “a person has the freedom to learn what interests him/her”) were averaged into an academic engagement scale (α = .36). This scale allows for a test of whether findings persist in a scale tailored to NCLB’s theory of change, in particular, increasing teacher qualifications, academic rigor, and instructional time on key academic areas.
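The scale construction and reliability statistic described above can be illustrated with a minimal Python sketch. It assumes that skipped items are coded as None, that a score is the mean of the answered items, and that at least 6 answered items are required (the threshold comes from the sample restriction; applying it inside the scoring function is an assumption). The function names and the example responses are hypothetical and are not drawn from the NLSY.

```python
from statistics import mean, variance

def scale_score(items, min_answered=6):
    """Mean of the answered items (None marks a skipped item).

    Returns None when fewer than `min_answered` items were answered.
    Averaging over answered items only is an assumption of this sketch.
    """
    answered = [x for x in items if x is not None]
    if len(answered) < min_answered:
        return None
    return mean(answered)

def cronbach_alpha(rows):
    """Cronbach's alpha from complete response vectors, one per student."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses on the 0-3 metric, for illustration only.
student = [3, 2, 3, None, 2, 3, 1, None, 2, 2]
score = scale_score(student)  # mean of the 8 answered items
alpha = cronbach_alpha([[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 2, 3]])
```

Alpha rises with the number of items and with their intercorrelation, which is one reason a 4-item subscale can show much lower reliability than the 10-item scale.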
Demographic Information
Demographic information was used to identify youth’s NCLB treatment status and as controls in first-stage models (discussed in the following). Treatment status was identified using NLSY information on students’ state of residence at each year of data collection (with year recoded such that it is centered at 2002) and maternal report of whether the child was enrolled in public, private, religious, or home school. This item was recoded into a dummy variable where 1 indicates enrollment in public school and 0 otherwise.
First-stage models included a small set of covariates that have been previously linked to students’ school engagement, including an indicator for child gender (1 if the child was a female, 0 otherwise), child age at the time of assessment (in years), a series of dummy variables indicating child race (Black and Hispanic, with White as the omitted category), a measure of the child’s family’s income-to-needs ratio (INR), and a dummy variable indicating whether or not the child was a firstborn. These variables were chosen both because of their links to engagement and because the sample likely changes on these dimensions over time. Because of the way the data are constructed (i.e., drawing the children of mothers who were aged 14–21 in 1979 at whatever point in time they are 10–14 years old), as time passes, sample mothers are older and the students are on average Whiter, wealthier, and less likely to be a firstborn (see Table 4 for the distribution of these traits over time). The first-stage models (discussed in the Analytic Strategy section in the following) include these covariates to reduce the impact of these sample changes.
Covariates
Estimates from CITS models may be biased if there are systematic differences between the treatment and comparison group that vary over time. State-level covariates are used to account for time-varying, state-level factors that may impact students’ engagement but are not the result of NCLB. For this reason, I include state-level student-teacher ratio and state-level poverty rate, drawn from the Common Core of Data and the Bureau of Labor Statistics, respectively, in models harnessing between-state variation. In doing so, I ensure that any covariation between these state-level factors, the treatment and comparison groups, and the introduction of NCLB does not bias the estimated NCLB effect.
Analytic Strategy
As noted previously, the present study uses variability in NCLB timing and implementation to assess NCLB impacts on engagement using a comparative interrupted time series (CITS) design. In a CITS specification, regression lines are fit separately for each group (treated and untreated) by time period (before and after NCLB, i.e., 1988–2002 and 2004–2010). If the treatment and comparison time trends change in different ways after the introduction of NCLB (e.g., a differential change in either the height of the line or the slope compared to the pre-NCLB trend), this change is consistent with an NCLB-based impact on school engagement. It is important to look at the combined influence of immediate (intercept) and cumulative (slope) changes particularly because the changes of NCLB were cumulative by design: schools worked toward a long-term goal and were increasingly exposed to the potential for sanctions; this increasing probability of sanctions is both a key mechanism of the law and potentially an important driver of any NCLB-based impacts on students’ engagement. As such, NCLB could have both immediate (intercept) and cumulative (slope) effects.
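The logic of the CITS comparison can be sketched in a few lines of Python. The sketch below fits a separate least-squares line to each group in each period and then differences the pre-to-post changes across groups. It is a simplified illustration under stated assumptions: the data are synthetic, year is centered at 2002, and the fit is unweighted with no covariates or fixed effects, so it is not the estimation used in this study. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def cits_contrast(treated, comparison):
    """Differential intercept and slope change (treated minus comparison).

    Each argument maps centered year (year - 2002) to mean engagement.
    Intercepts are evaluated at centered year 0, that is, at 2002.
    """
    changes = []
    for series in (treated, comparison):
        pre = [(t, y) for t, y in series.items() if t <= 0]
        post = [(t, y) for t, y in series.items() if t > 0]
        pre_int, pre_slope = fit_line(*zip(*pre))
        post_int, post_slope = fit_line(*zip(*post))
        changes.append((post_int - pre_int, post_slope - pre_slope))
    (int_t, slope_t), (int_c, slope_c) = changes
    return int_t - int_c, slope_t - slope_c

# Synthetic biennial series, 1988-2002 (pre) and 2004-2010 (post).
pre_years = range(-14, 1, 2)
post_years = range(2, 9, 2)
comparison = {t: 1.0 + 0.01 * t for t in [*pre_years, *post_years]}
treated = {t: 1.2 + 0.01 * t for t in pre_years}
treated.update({t: 1.27 - 0.01 * t for t in post_years})

d_intercept, d_slope = cits_contrast(treated, comparison)
effect_2010 = d_intercept + d_slope * 8  # eight years after 2002
```

In this synthetic example the comparison group does not change after 2002, so the differential terms simply recover the shift built into the treated series: an intercept change of .07 and a slope change of –.02.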
Three contrasts were used to explore associations between NCLB and school engagement. The first contrast compares public schools to private schools, which largely do not receive Title I dollars and therefore were not subject to NCLB. In this contrast, an NCLB effect would be observed if there was a change in the post-NCLB (i.e., 2004–2010) engagement trajectory of public school students relative to that of private school students. By comparing public and private schools, the overall impact of NCLB is tested, but this contrast does not allow for state variability in NCLB’s implementation.
The second contrast replicates the high versus low standards contrast introduced by Wong et al. (2009). As discussed previously, Wong et al. capitalize on state variability in the implementation of NCLB to create high, medium, and low standards groups (see Table 2) based on differences between NAEP pass rates and state-created test pass rates. HS states had more difficult state exams and thus had on average more schools failing AYP and took more NCLB-mandated corrective actions. Conversely, LS states had easier state-created achievement tests and thus had a low proportion of schools failing AYP and took fewer corrective actions (Wong et al., 2009). In this way, schools in LS states faced lower accountability pressure than those in HS states and, because of the lack of corrective actions, essentially implemented NCLB less strongly. In this contrast, an NCLB effect would be observed if the post-NCLB change in the HS states was more extreme than the change in the LS states. Because the HS states are distinguishable from the LS states in terms of both difficulty of tests and AYP schedules, both post-NCLB intercept (differential changes in the level of engagement from 2002 to 2004) and slope (differential changes in the 2004 through 2010 slopes) changes could be observed. As noted previously, this HS-LS contrast tests the full suite of NCLB-based reforms but likely understates the true impact of NCLB because the comparison group is made of states that at least partially implemented NCLB.
The third contrast is based on the timing of consequential accountability. This specification replicates the dichotomous Dee and Jacob (2011) specification, comparing states that had some form of CQA prior to NCLB with those that did not implement CQA until the passing of NCLB. In this contrast, the treated group experienced CQA for the first time with NCLB, and all others are treated as comparison observations. This contrast explores the impact of CQA as mandated by NCLB rather than the full suite of NCLB-based reforms. As noted previously, this contrast is predicated on the idea that changes after NCLB’s implementation will be larger in states that had not previously experienced CQA. Put another way, NCLB-based changes in the school context will be larger in states with no prior experiences with CQA. Thus, like the HS-LS contrast, it is likely to somewhat underestimate the impact of NCLB because states that had prior CQA should still be experiencing NCLB-based changes in their accountability systems (e.g., Fuhrman, 1999; Goertz & Duffy, 2001). These changes may be immediate (intercept, 2004) changes or cumulative (slope, 2004–2010) changes as sanctions based on accountability accumulate. Because this analysis is predicated on the idea that prior CQA policies may have already impacted students’ engagement, explicitly modeling pre-NCLB trends in engagement with respect to the timing of the introduction of CQA policy may be of interest. To explore this question, a sensitivity test is conducted in which the heterogeneity in timing of the introduction of CQA policy within the comparison group is explicitly modeled by coding the treatment indicator as the number of years a state did not have CQA (Dee & Jacob, 2011). These models more flexibly estimate the impact of each additional year of CQA as well as the NCLB effect. Findings are similar (see Sensitivity Analyses section in the following); however, because of the small number of observations available in the present data, the dichotomous indicator (treatment states are those without prior CQA, comparison states are those with prior CQA) remains the preferred specification.
Models
To model the public versus private school contrast, I estimate the following:
In this model, Yjt represents type-year school engagement means. β6 represents the change in intercept after the introduction of NCLB for public as compared to private schools, and β7 represents the differential slope change. Thus, the immediate NCLB effect is estimated by β6 + β7, and the effect of NCLB in 2010 (the last year of data in this analysis) is estimated by β6 + (β7 × 8).
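The estimating equation itself does not appear in this version of the text. As a hedged sketch only, a fully interacted CITS specification that is consistent with the coefficient labels described above (β6 on the Public × NCLB term and β7 on the Public × Year × NCLB term) could be written as follows. The ordering of β0 through β5, the variable names, and the absence of additional terms are assumptions rather than details taken from the original model.

```latex
\begin{aligned}
Y_{jt} ={}& \beta_0 + \beta_1 \mathrm{Year}_t + \beta_2 \mathrm{NCLB}_t + \beta_3 \mathrm{Public}_j
          + \beta_4 (\mathrm{Year}_t \times \mathrm{NCLB}_t)
          + \beta_5 (\mathrm{Public}_j \times \mathrm{Year}_t) \\
         &+ \beta_6 (\mathrm{Public}_j \times \mathrm{NCLB}_t)
          + \beta_7 (\mathrm{Public}_j \times \mathrm{Year}_t \times \mathrm{NCLB}_t)
          + \varepsilon_{jt}
\end{aligned}
```

Under this sketch, with Year centered at 2002 and NCLB equal to 1 in post-NCLB years, the differential effect for public schools at a given post-NCLB year is β6 + (β7 × Year), which gives β6 + (β7 × 8) in 2010.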
To model the high versus low standards contrast, I estimate the following:
In this model, Yst represents state-year school engagement means. β5 represents the change in intercept after the introduction of NCLB for HS as opposed to LS states, and β7 represents the differential slope change. Thus, the immediate NCLB effect is estimated by β5 + β7, and the effect of NCLB in 2010 is estimated by β5 + (β7 × 8). Note that this model estimates separate slopes and intercepts for the low, medium, and high standards states; high versus low standards contrasts are presented in the text, following Wong et al. (2015). This specification also includes state and year fixed effects, δs and γt, respectively.
Finally, to model the timing of consequential accountability contrast, I estimate the following:
In this model, Yst represents state-year school engagement means. β5 represents the change in intercept after the introduction of NCLB for states without prior CQA as opposed to those that already had CQA, and β6 represents the differential slope change. Thus, the immediate NCLB effect is estimated by β5 + β6, and the effect of NCLB in 2010 is estimated by β5 + (β6 × 8). This specification also includes state fixed effects, δs.
Preprocessing, Weighting, and Standard Errors
As noted previously, these analyses used data drawn from the NLSY, which, unlike the data used in other CITS-based research on NCLB, is not a nationally representative sample. Preprocessing and weighting strategies were used to address this data limitation. First, to create the state-year (or type-year in the public vs. private contrast) means that served as the dependent variable, all (N = 14,794) school engagement scores were regressed on the demographic variables listed in the following, and each observation’s residual from this regression was used as their engagement score. Essentially, this process was used to take each student’s engagement score and create a new score that reflected whether the student had high, low, or average school engagement net of demographic factors. For example, if on average females scored a half-point higher on the engagement scales, then by capturing the residual from a regression of engagement on gender, the preprocessing technique would remove the half-point advantage from female scores, equating them, on average, to their male peers. The model was estimated as shown in the following:
Standard errors were clustered by maternal ID. The mean of each group of individual student state-year (or type-year) residuals was then calculated to be used as the state-year (and type-year) observations in the final models. Results were not sensitive to which covariates were included in the first stage, and a sensitivity test in which models were run without this residualizing process was conducted.
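The residualizing step can be illustrated with a short Python sketch. It regresses scores on covariates by ordinary least squares, keeps the residuals, and averages them within state-year cells. The data are hypothetical and use a single female indicator to mirror the half-point example in the text; the function names are illustrative, and the sketch does not compute clustered standard errors.

```python
from collections import defaultdict
from statistics import mean

def ols_residuals(X, y):
    """Residuals from regressing y on the columns of X plus an intercept."""
    rows = [[1.0, *x] for x in X]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for idx in range(k):
            if idx != col:
                factor = A[idx][col] / A[col][col]
                A[idx] = [a - factor * b for a, b in zip(A[idx], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return [yi - fi for yi, fi in zip(y, fitted)]

def cell_means(keys, values):
    """Average values within each (state, year) cell."""
    cells = defaultdict(list)
    for key, value in zip(keys, values):
        cells[key].append(value)
    return {key: mean(vals) for key, vals in cells.items()}

# Hypothetical data: females (1) average 2.5, males (0) average 2.0.
female = [[1], [1], [0], [0]]
scores = [2.7, 2.3, 2.2, 1.8]
cells = [("CA", 1988), ("NY", 1988), ("CA", 1988), ("NY", 1988)]

residuals = ols_residuals(female, scores)
state_year_means = cell_means(cells, residuals)
```

After residualizing, the female and male residuals each average zero, so the half-point gap no longer contributes to differences between cells.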
Second, all analyses were weighted using the precision of the estimated state or type mean, defined as the inverse of the squared standard error of the state (or type) mean. That is, the standard errors of the state-year observations are averaged by state; the estimated precision is the inverse of the squared standard error of this calculated mean. Thus, states with more precisely estimated state-year means contributed more to the estimated NCLB effects than states with more variable means (typically those states with fewer observations). In this way, the analysis down-weights states that may have artificially high or low engagement scores based on having a small number of observations or states with students with highly disparate engagement scores, in which we can have less confidence. However, as noted previously, sensitivity checks were conducted to explore whether results were sensitive to the use of these weights or a strategy that instead dropped state-year observations made up of a small number of observations. Finally, standard errors were adjusted using Stata’s cluster option.
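One reading of this weighting scheme can be sketched in Python. The sketch assumes that the standard error of each state-year mean is the sample standard deviation divided by the square root of the cell size, that these standard errors are averaged within state, and that the weight is the inverse of the square of that average. This interpretation, the function name, and the example data are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def precision_weights(cells):
    """Map {(state, year): [scores]} to {state: precision weight}.

    Cells with a single observation have no standard error and are skipped.
    A cell with zero variance would need separate handling (not shown).
    """
    ses_by_state = defaultdict(list)
    for (state, _year), scores in cells.items():
        if len(scores) < 2:
            continue
        ses_by_state[state].append(stdev(scores) / sqrt(len(scores)))
    return {state: 1 / mean(ses) ** 2 for state, ses in ses_by_state.items()}

# Hypothetical residualized scores for two states.
example = {
    ("A", 1988): [0.0, 1.0, 2.0, 3.0],
    ("A", 1990): [0.0, 1.0, 2.0, 3.0],
    ("B", 1988): [0.0, 2.0],
    ("B", 1990): [5.0],  # single observation: skipped
}
weights = precision_weights(example)
```

In this example, state A receives a larger weight than state B because its cells contain more observations and therefore yield smaller standard errors.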
Results
Table 5 reports findings from the public versus private, high versus low standards, and timing of CQA contrasts for both the full engagement scale and the academic engagement scale. Each panel presents the differential intercept and differential slope for the contrast as well as the estimated impact of NCLB immediately (in 2004), in 2007, and in 2010. The first row, the intercept shift, presents the estimated change in school engagement for the treated group at the first timepoint after the introduction of NCLB (in 2004) relative to the comparison group (e.g., for public vs. private school students, HS vs. LS states, and states with no prior CQA vs. states that had prior CQA). Thus, it does not represent the absolute change in engagement from 2002 to 2004 but rather how different this change was for the treatment versus comparison group. The second row in each panel provides an estimate of the differential slope after the introduction of NCLB. This term represents the difference in the engagement trajectory of the treatment versus comparison group from 2004 to 2010; thus, a negative slope can be interpreted as a decreasing trajectory relative to the comparison group, and a positive slope can be interpreted as an increasing trajectory relative to the comparison group. The last three rows represent the estimated NCLB effect in 2004, 2007, and 2010 using the estimates from the CITS model, calculated as described previously.
Table 5. Associations Between NCLB Adoption and Engagement From 1988 to 2010, Full Engagement and Academic Engagement Scales
Note. Data are drawn from the Maternal and Child Supplement to the National Longitudinal Survey of Youth. Models using high and low standards and timing of accountability include state-level student-teacher ratio and proportion of families in poverty. All models are weighted by the precision of the type-year or state-year observations used to generate each state-year (or type-year) mean. The standard deviation of the residualized student-level scores is .42 for the full engagement measure and .55 for the academic engagement measure across all contrasts. NCLB = No Child Left Behind; CQA = consequential accountability; HS = high standards; LS = low standards.
Panel 1 displays the estimates for the public versus private school contrasts. This model revealed a positive intercept shift for public school students relative to private school students (b = .05, p = .28) but a negative, near zero slope term (b = –.00, p = .82) for the full engagement scale. Neither of these estimates reached statistical significance, however, suggesting that across all states, NCLB was not associated with changes in engagement for public school students relative to private school students (see Figure 1). Estimates were very similar in the academic engagement model, though a slightly larger (but still nonsignificant) negative slope led to a negative estimated NCLB effect in 2010. This estimate, however, was very small in magnitude (d = –.02) and nonsignificant.

Figure 1. School engagement in public and private schools from 1988 through 2010.
Panel 2 displays CITS results for the HS-LS contrast. Across both the full engagement and academic engagement scales, this panel reveals statistically significant intercept and slope terms. Specifically, for the full engagement scale, immediately following the introduction of NCLB, students in HS states reported an increase in engagement that was on average .07 points higher than students in LS states (.17 of a standard deviation, p < .01). However, the slope term was also statistically significant and negative (b = –.02, p < .01), suggesting that this initial boost decreased over time by roughly .05 of a standard deviation each year. Thus, by 2010, the NCLB effect was negative (see Figure 2) and roughly a fifth of a standard deviation in size (p < .05). For the academic engagement scale, estimates followed the same pattern but were smaller in magnitude; thus, the 2010 NCLB effect was about half the magnitude (d = –.080) and not conventionally significant (p = .229).
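The full-scale HS-LS figures reported above can be checked with a short worked calculation. The sketch below combines the rounded intercept (.07) and slope (–.02) estimates over the eight years from 2002 to 2010 and converts the result to standard deviation units using the student-level standard deviation of .42. Because the inputs are rounded to two decimals, the results are approximate; the function name is hypothetical.

```python
def nclb_effect(intercept_shift, slope_shift, years, sd):
    """Model-implied effect in raw scale points and in SD units."""
    raw = intercept_shift + slope_shift * years
    return raw, raw / sd

# Rounded HS-LS estimates for the full engagement scale.
raw_2010, d_2010 = nclb_effect(0.07, -0.02, years=8, sd=0.42)
intercept_in_sd = 0.07 / 0.42   # about .17 of a standard deviation
slope_in_sd = -0.02 / 0.42      # about -.05 of a standard deviation per year
```

The implied 2010 effect is about –.09 scale points, or roughly –.21 of a standard deviation, in line with the description of an effect about a fifth of a standard deviation in size.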

Figure 2. School engagement in high and low standards states from 1988 through 2010.
Panel 3 displays the differential intercept and slope for states that adopted CQA for the first time with the implementation of NCLB as compared to states that adopted CQA prior to NCLB. A slightly different pattern emerged for this contrast. For the full engagement scale, this panel reveals a difference in post-NCLB intercepts that was similar in magnitude but not statistically significant (b = .05, p = .15) and a statistically significant negative slope term (b = –.02, p < .05) for the treatment relative to control states, equivalent to –.05 of a standard deviation each year. Thus, these models again suggest that the average level of student engagement in states that implemented CQA for the first time with the passage of NCLB decreased relative to students in states that had previously adopted CQA. By 2010, the NCLB impact on students’ engagement in states that adopted CQA as mandated by NCLB for the first time in 2002 was .24 of a standard deviation lower than the impact for students in states that had previously adopted CQA (p < .05; see Figure 3). Findings were similar for the academic engagement scale; however, in these analyses, the intercept term was substantially larger and statistically significant (d = .22, p < .05) such that the CQA findings mimicked those of the HS-LS contrast, that is, an initial boost in engagement followed by a decline that led to a statistically significant negative NCLB effect by 2010.

Figure 3. School engagement in states with and without consequential accountability (CQA) policies prior to No Child Left Behind from 1988 through 2010.
In sum, analyses suggest that the NCLB impact on student engagement depends on which NCLB mechanism formed the basis of the contrast. While the public versus private school contrast revealed no NCLB effect, both the HS-LS and CQA contrasts suggest an immediate positive but increasingly negative NCLB effect on engagement over time, a pattern that was consistent across the full engagement and academic engagement scales.
Sensitivity Analyses
Sensitivity tests were conducted to explore the impact of sample limitations on the present findings. All sensitivity tests were conducted on the full engagement scale because the full engagement scale both reflects the conceptual understanding of engagement in the broader school engagement literature and has substantially higher internal reliability (academic engagement sensitivity analyses available on request).
As noted previously, a key limitation of the NLSY data is that although it is national in scope, it is small and not nationally representative. Specifically, some states are represented by a very limited number of observations (see Table 3 for contrast-year Ns), and some states do not contribute to every year. In the HS-LS contrast, the average sample size per state-year observation is 27 (31 for the high standards treatment group, 37 for the low standards control group, and 22 for the medium standards group); in the CQA contrast, the average sample size per state-year observation is 28 (20 for the treatment group, 32 for the control group).
States with a small number of observations may not accurately reflect state-level engagement, introducing error into the models. The preferred specification used preprocessed, weighted data to account for this limitation; however, sensitivity analyses were also conducted in which states with a small number of observations were removed from the sample entirely to test the robustness of the findings to sampling. Results from these analyses are presented in Appendix B, Table B1 (in the online version of the journal). These analyses suggest that the pattern of results is largely consistent across the alterations to the sample, with some small changes in the magnitude of the effect sizes; moreover, the intercept term for the timing of CQA contrast becomes statistically significant and similar in magnitude to the intercept shift in the HS-LS contrast once states with fewer than 20 observations are dropped from the model. However, the increasingly stringent cutoffs reduce the sample substantially. For example, requiring states to have at least 20 observations reduced the number of states in both treatment and control groups by half; in the CQA analysis, the number of treatment states contributing to post-NCLB estimates was reduced from 14 to 6. Thus, these estimates may be, in part, identifying heterogeneity in the pattern of NCLB’s impacts among larger compared to smaller states.
A second way the main analysis dealt with the small, nonrepresentative sample was to run a first-stage model in which raw engagement scores were regressed on a host of demographic factors; the residuals from this regression were captured and aggregated by state and year to create the dependent variable. The intent of this preprocessing was to reduce the influence of demographic factors, which were driven by the particular characteristics of the NLSY sample. To explore whether the results were sensitive to the use of this strategy, analyses were run using the raw engagement score as the dependent variable and are presented in Table B2 (in the online version of the journal). Estimates are consistent with those of the residualized models. Findings are nearly identical for both the public versus private and CQA contrasts, and slope and intercept estimates were similar in magnitude and significance for the HS-LS contrast; however, the estimated NCLB effect in 2010 did not reach statistical significance in the raw HS-LS models.
The main analyses also weighted all regressions by the precision of the state-year (or type-year) means to increase the weight of more precise data points. However, it is also possible that by weighting the data in this way, large states overcontribute to the estimate. If NCLB-based processes differ in large versus small states, this weighting strategy could lead to incorrect estimates. Appendix Table B3 (in the online version of the journal) presents weighted versus unweighted analyses. As expected, for the HS-LS contrast, these analyses revealed a similar pattern of estimates across the weighted and unweighted models but much larger standard errors in the unweighted models, in which imprecise data points are weighted equally with more precise data points. Ultimately, this change led to a nonsignificant though still negative NCLB effect in 2010. In the timing of CQA contrast, however, estimates for both the intercept and slope shifts were much smaller, and the standard errors doubled, resulting in no statistically significant estimates, a substantial change from the main analysis.
Recall that the structure of the NLSY data allows the same individual to contribute multiple data points to the analysis. For example, although the unique student N is equal to 11,512, there were over 14,000 engagement scores in the first-stage model. Table B4 (in the online version of the journal) presents analyses in which only the latest observation for each student is retained such that no individual contributes more than one engagement score to the analysis. These findings are largely the same as the main analyses with the exception of a slightly larger intercept term and smaller slope term in the HS-LS contrast, resulting in a nonsignificant NCLB effect in 2010 rather than the negative effect estimated in the preferred model. Given that just 577 students contributed to both pre- and post-NCLB time trends, this difference is likely due to the lack of precision engendered by losing over 1,000 observations and several state-year observations. This loss of observations is particularly likely to matter for the HS-LS contrast, in which the contrast of interest focuses on a quarter of the states rather than about half the states, as in the timing of CQA contrast, or all public school students, as in the public versus private contrast. As expected, the loss of power only minimally alters the estimates for the other two contrasts.
Additionally, recall that the structure of the NLSY results in a decrease in the number of observations in the later years of the analysis as the mothers of the students in the sample are all roughly the same age. To assess whether the changes in the data at the latest timepoint in the series impacted the estimated time trends, analyses were conducted with the 2010 observations removed. Results from this analysis are presented in Table B5 (in the online version of the journal) and demonstrate that the estimated NCLB impacts are very similar across public versus private and CQA models with and without the 2010 timepoint. However, there are meaningful changes to the estimated HS-LS contrast. Specifically, the removal of 2010 substantially reduces both the intercept and slope terms such that both are near zero and nonsignificant; moreover, the NCLB impact by 2007 is now small in magnitude and not statistically distinguishable from zero (see Figure 2).
Finally, to assess the incremental impact of introducing some type of consequential accountability system prior to NCLB, the timing of CQA contrast was run with the treatment indicator coded as number of years a state did not have CQA. By using a continuous measure of CQA in addition to the NCLB-specific CQA indicator (i.e., CQA implemented after 2002), this analysis is able to compare the impact of CQA prior to NCLB to the impact of NCLB-introduced CQA. In these models, the coefficient on the Treatment × Year variable represents the incremental impact of waiting to implement CQA for an additional year on student engagement, the coefficient on the Treatment × NCLB variable represents the impact of implementing NCLB-mandated CQA in 2002, and the coefficient on the Treatment × Year × NCLB variable represents the impact of NCLB-based CQA in the years following NCLB. Results from this model are presented in Table B6 (in the online version of the journal). First, results from the model estimating the NCLB impact in 2007 for a state that did not have prior CQA compared to a state that implemented CQA in 1997 are quite similar to what was estimated in the dichotomous specification (b = –.045 in the dichotomous specification, b = –.039 in the continuous specification). Second, the estimated coefficient on the interaction between treatment and year (i.e., the impact of waiting to implement CQA for comparison states) is small and nonsignificant (b = .00, p = .122). The coefficient on the interaction between treatment and post-NCLB is positive (b = .02, p < .01). This coefficient indicates the benefit of waiting to introduce CQA until 2002. Taken together, these findings suggest both that CQA was largely not associated with engagement in the comparison states prior to 2002 and that CQA as implemented under NCLB was more strongly related to engagement than pre-NCLB CQA provisions.
Discussion
Prior to the introduction of No Child Left Behind, a subset of educational scholars voiced concern that consequential accountability policies would erode students’ engagement with school and thereby undermine students’ academic achievement and social skill development (e.g., Deci et al., 1991; Osterman, 2000), a concern that continued to be voiced throughout NCLB’s tenure (e.g., M. G. Jones, 2007; Kirp, 2015). However, this concern clashed with literatures associating academic press with engagement (Ma, 2003), demonstrating a positive association between NCLB and mathematics achievement (Dee & Jacob, 2011; Wong et al., 2015), and suggesting that NCLB did not detrimentally impact students’ enjoyment of school (Reback et al., 2014) or teachers’ perceptions of behavioral engagement (Dee & Jacob, 2011). The present study examined directly whether NCLB impacted students’ self-reported school engagement using a comparative interrupted time series design, the most rigorous design that has been used to assess NCLB to date, using the only national data set able to support such an analysis.
This study provides some suggestive evidence that federal education policy may be able to impact students’ engagement with school. First, across two measures of engagement, the public versus private comparison, which contrasted public school students to private school students, found no statistically significant NCLB effect. This is notable as the public versus private contrast is arguably the purest contrast; public school students were treated by NCLB, and private school students were not. However, both previous CITS studies (Dee & Jacob, 2011; Wong et al., 2009, 2015) and other theoretical papers (e.g., Davidson et al., 2013; Steifel et al., 2007) highlight the variability in NCLB’s implementation, suggesting that between-state analyses are warranted.
In both of the between-state contrasts—the high versus low standards and timing of consequential accountability models—the present study found a small, immediate increase in school engagement relative to the comparison group (though this was not conventionally significant for the CQA contrast) followed by diminished engagement in treatment relative to comparison states over time. This pattern was consistent across the full engagement and academic engagement scales. In both contrasts, by 2007, the estimated NCLB impact was negative (though nonsignificant at conventional levels for the HS-LS contrast), and by 2010, the estimated effect was negative and significant, equal to between a fifth and a quarter of a standard deviation (p < .05). The HS-LS and CQA models were robust to analyses that dropped small states, used raw as opposed to residualized dependent variables, and had similar patterns (though not significant for CQA) for the unweighted analyses. However, the HS-LS models were sensitive to both removing students who were observed more than once and dropping the 2010 wave. This last sensitivity is particularly notable given that the slope coefficient drives the negative NCLB impacts. As such, the HS-LS findings in particular should be interpreted with caution.
One way to interpret this pattern is that some of the early changes made by the states, such as the development of streamlined standards, curricula, and tests; provision of support to struggling schools; and increased instructional time (U.S. Department of Education, 2007; Wong et al., 2009), may have boosted engagement but that over time, accountability pressure, specifically the increased likelihood of falling into sanctions, may have eroded school engagement. This interpretation is consistent with previously conducted local studies demonstrating decreases in student engagement in response to high-stakes testing and accountability systems (e.g., M. G. Jones et al., 2003; Nichols & Berliner, 2007). Indeed, the presence of negative slope effects rather than intercept changes drives the findings of this study, and the increasingly negative association between NCLB and engagement parallels the increase in NCLB-based sanctions over time. In 2006, 29% of schools were classified as failing under NCLB, a percentage that rose to nearly 40% by 2010 (Usher, 2012). Though the present study cannot identify which NCLB mechanisms impacted engagement, previous research suggests that narrowed curricula, reduced instructional support and autonomy in the classroom, and increased teacher anxiety (e.g., Au, 2007; Diamond, 2007; Finnegan & Gross, 2007; Griffith & Scharmann, 2008; Hannaway & Hamilton, 2008; McMurrer, 2007; Pederson, 2007; Plank & Condliffe, 2013) may have played a role.
These findings are important to understanding the potential impact of educational policy for several reasons. First, despite the negative association between NCLB and students’ engagement, it is meaningful that these analyses demonstrated that distal educational policy (for example, state and federal policy) may have the power to shape students’ engagement. It is also notable that across both between-state contrasts, there was an immediate boost in student engagement. NCLB was a complex, multifaceted law that impacted students’ experiences in diverse ways. This evidence suggests that some of NCLB’s provisions did promote student engagement; however, as noted previously, it cannot identify which levers were most beneficial. For practitioners and policymakers to make the most of the flexibility ushered in with ESSA, it is critical that research identify the key policy levers for building student engagement and that changes to NCLB-era accountability systems do not discard these valuable provisions.
It is also relevant to educational policy that the introduction of consequential accountability under NCLB seemed to be more detrimental for students’ engagement than prior instantiations of consequential accountability (see Table B6 in the online version of the journal). Prior to NCLB, consequential accountability systems were created at the state level and were more flexible than NCLB, designed with state budgetary constraints in mind, emphasized school improvement plans, and used incentives as well as sanctions (e.g., Fuhrman, 1999; Goertz & Duffy, 2001). These more tailored plans may have been differently associated with student engagement because they were less stressful for teachers and administrators, involved the provision of assistance from state education agencies, and may have led to less school failure than NCLB’s consequential accountability. Moreover, it is plausible that the widespread concern over the law among teachers’ unions and in the public discourse (e.g., Gerson, 2007; Kirp, 2015; Ravitch, 2010; Saad, 2012) may have heightened teacher and administrator sensitivity to accountability pressure under NCLB, particularly in states that had no previous experience with consequential accountability systems to reassure them. With the introduction of the Every Student Succeeds Act and the excitement among educators at the end of NCLB (e.g., Walker, 2015; Weingarten, 2015), states may be able to use NCLB’s systems as a base to transition to tailored, supportive consequential accountability systems and reduce some of the anxiety in schools and among educators, potentially increasing engagement while maintaining the academic emphasis introduced by NCLB.
Finally, it is important to interpret these findings in light of previous work highlighting the positive impact of NCLB on academic achievement (Dee & Jacob, 2011; Wong et al., 2009, 2015) and evidence indicating that NCLB did not have negative impacts on students’ interest in or enjoyment of school (Reback et al., 2014; Whitney & Candelaria, 2017) or teachers’ perceptions of students’ behavioral engagement (Dee & Jacob, 2011). First, these previous studies did not assess students’ own reports of engagement, nor did they assess the full, multidimensional engagement construct, which may account for the diverging findings.
Second, these studies used samples of varying student ages. While previous research on academic outcomes has found consistent positive impacts of NCLB on fourth-grade mathematics achievement, findings for eighth graders are much more mixed, suggesting that NCLB impacts on engagement may also vary by age. Students in both Whitney and Candelaria’s (2017) and Reback et al.’s (2014) studies were in fifth grade; teachers reporting on engagement in the Dee and Jacob study were representative across all student ages. The sample in the present study included youth aged 10 to 14 (roughly fourth through eighth grade) and is thus not directly comparable to the samples used in previous research; comparisons across these studies should be made with caution.
Third, it is possible that NCLB had varying impacts on the dimensions of engagement, which may have in turn led to differential associations with engagement-linked outcomes. For example, NCLB may have enhanced behavioral engagement but decreased emotional engagement. Indeed, many of the early NCLB-based changes—extended instructional time, tailored curricula, and specific skill-based assessments—would be consistent with climates of academic press that put a strong focus on students’ behavioral engagement and academic outcomes (Goddard et al., 2000) but not necessarily their emotional engagement. Moreover, in a recent analysis, Holbein and Ladd (2017) find that although the accountability pressure engendered by NCLB improved students’ attendance, it also increased students’ misbehavior, highlighting the potential for nuance in how students reacted to NCLB. NCLB may have made students more likely to do what they were supposed to do academically—which could improve achievement—but may have also negatively influenced how students felt or thought about school. It is possible that the narrowing of curricula and focus on test preparation brought on by NCLB could have improved achievement by creating classrooms that were focused on the skills necessary to do well on high-stakes tests but simultaneously decreased students’ enjoyment of and sense of belonging in school. Notably, the full engagement scale used in the present study drew heavily from items typically used to assess emotional engagement, and these items seem to drive the present findings (see Table B9 in the online version of the journal), suggesting that students’ liking of school and perceptions of school’s emotional support may have been more dramatically influenced by NCLB than other components of engagement. 
Future research should probe NCLB’s effects on outcomes most commonly linked to emotional engagement in particular, including substance use, delinquent behavior, depressive symptoms, and high school completion. Devising accountability systems that maintain a motivation for positive academic change and a focus on students’ skills while minimizing teacher anxiety and disengaging pedagogical shifts—school features that are more strongly linked to emotional engagement—should be a key goal for researchers, policymakers, and practitioners moving forward.
Limitations
Though this study uses national data and a rigorous analysis strategy, it has important limitations. Most notably, the sample sizes in the present data are small and not nationally representative; instead, they are representative of the children of an age cohort of women (rather than representative of American students from 1988 to 2010). This presents challenges for several reasons. For one, the data are not guaranteed to be representative of each state selected. Both the nonrepresentative sample and the small state-year sample sizes contribute uniquely to the potential for error in each state-year estimate. In addition, because these are the children of an age cohort, the demographic composition of the sample changes over time as mothers age. Thus, it is possible that the reported findings are an artifact of individual characteristics of the youth in the sample who represent each state rather than an NCLB effect. This concern is somewhat mitigated by the lack of systematic differences between treatment and control groups over time (see Tables B7, B8 in the online version of the journal); the use of students’ residual scores, which removed the influence of characteristics such as race and gender; and the findings’ robustness to weighting and sample selection strategies (Tables B2 and B3 in the online version of the journal). Moreover, the lack of precision in each individual state’s data point should increase error in the models rather than introduce bias. That is, the small sample sizes should largely increase standard errors and reduce the ability to detect patterns in the data. If the findings were due to error, it is not clear why they would be so consistent. Nonetheless, this analysis ultimately presents the effect of NCLB on engagement in a specific sample that is not nationally representative, and the analysis remains underpowered and thus error-prone. These features of the data are the fundamental limitation of the analysis and influence both the internal and external validity of the study.
Second, it is important to highlight again the high level of variability in the implementation of NCLB not only between states but also within states. Local education agencies may have been more or less active, in terms of supports or sanctions, even within states that were coded at the state level as high or low standards. Given the nonrepresentative nature of the sample, if students were disproportionately drawn from districts whose implementation contrasted with the state-level high or low standards code, estimates would be biased toward zero. That is, if students were disproportionately coded as high standards when they were actually drawn from a low-implementing district, this should attenuate any estimated NCLB-based impact. The sample limitations of the present data preclude a test that disentangles district-level NCLB implementation effects; however, this is an important future direction for research because it could more clearly identify the features of NCLB that were most strongly linked to engagement.
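The attenuation argument can be illustrated with simple arithmetic. The sketch below assumes a deliberately simplified, hypothetical scenario: some share of the students coded as high standards actually experienced comparison-like conditions, all comparison students are coded correctly, and misclassified students have the comparison-group mean. Under those assumptions, the expected observed gap is the true gap scaled by the correctly classified share. The function name and all numbers are illustrative and are not drawn from the study’s data.

```python
def observed_gap(true_effect, misclassified_share):
    """Expected treatment-comparison gap under one-sided misclassification.

    Assumes comparison students are coded correctly and that misclassified
    "treated" students have the comparison-group mean (a simplification).
    """
    if not 0.0 <= misclassified_share <= 1.0:
        raise ValueError("misclassified_share must be between 0 and 1")
    return (1.0 - misclassified_share) * true_effect


# Hypothetical example: a true effect of -0.20 SD with one quarter of the
# "treated" students actually drawn from low-implementing districts.
print(round(observed_gap(-0.20, 0.25), 3))
```

In this simplified setting the observed gap always has the same sign as the true effect but is smaller in magnitude, which is what bias toward zero means here.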
Third, there are several other threats to internal validity in the present study. Families included in this study were not randomly assigned to either public or private school or their state. To the extent that the introduction of NCLB differentially shifted parents’ decisions about their child’s educational context, the estimate could be biased upward; that is, parents with children who were more likely to be engaged may also have been more likely to shift to comparison states or private schools after the policy’s introduction. Additionally, though the CITS strategy is designed to capture other historical events that may influence the observed time trend in engagement in the comparison group, it is possible that historical events differ across treatment and comparison states. Though the present study includes time-varying covariates to address this threat, there may be other omitted time-varying variables that were correlated with treatment status and influenced students’ engagement.
Finally, the limited sample size of the present data precludes subgroup analyses. It is likely that NCLB’s impacts vary based on student and school characteristics, such as student age, the school’s proportion of students receiving free or reduced-price lunch, and other indicators; however, the present analysis cannot disentangle such subgroup differences. The present analysis combines responses from 10- to 14-year-old students and averages engagement impacts across elementary and middle schools. Previous research found different patterns of NCLB effects on academic outcomes for fourth and eighth graders, groups that are combined in the present sample. If NCLB’s impacts on engagement varied across these groups in the way its impacts on academic outcomes did, the present study may obscure important variation across age groups. There are important differences between elementary and middle school contexts, particularly with regard to developmental supports for student engagement (e.g., Eccles, 1999; Eccles et al., 1993). As such, these estimates may mask differences in student responses to NCLB in elementary versus middle schools. Future research should disentangle how different student subgroups respond to accountability policy.
Conclusions and Implications
No Child Left Behind ushered in a new era of accountability in the American educational system. Though NCLB enhanced students’ math achievement, in line with the major goal of the legislation, the present study presents evidence consistent with the hypothesis that some of the features of NCLB’s implementation may have had unintended consequences for students’ engagement with school. Patterns across a variety of specifications and sensitivity analyses suggest that although students’ engagement with school did seem to benefit from some of the early NCLB-based changes to schools, it declined over time. Though the present study cannot identify the mechanisms that led to this decline, both previous research and sensitivity analyses from the present study suggest that future research should consider how accountability pressure impacts teachers’ decisions in the classroom regarding curriculum and pedagogy and, in particular, student-teacher relationships. This study is one of the first to provide evidence that distal, federal policy changes may have the power to impact students’ subjective experiences with school; future research should consider the mechanisms by which this happens and continue to rigorously explore the impact of other educational policies on nonacademic outcomes such as school engagement.
Though the Every Student Succeeds Act changes many of NCLB’s provisions, the core focus on accountability as a way to improve student achievement remains. As state and local policymakers work to implement ESSA, they have the opportunity to make substantial changes to their testing and accountability systems, including adding indicators of student engagement as a measure of school success. Such changes provide states with the opportunity to reduce any negative influences of high-stakes testing while simultaneously supporting other indicators of student development. The present analysis provides some evidence that it is important that states do so. Insofar as engagement is vital to students’ growth and development, states and districts must ensure that accountability systems preserve the aspects of NCLB and features of schools that facilitate engagement. Measuring student perceptions of their experiences in school and focusing on how school-level accountability filters down to classroom experiences will provide important information that states can use to guide the design and implementation of their accountability policies. Future research should continue to identify what features of schools are associated with school engagement and how the educational policy context helps or hinders schools in their efforts to build relationships with students and support their development.
Footnotes
This research was supported through an American Psychological Foundation Elizabeth Munsterberg Koppitz Award and a Society for Research in Child Development Dissertation Funding Award. This research was conducted with restricted access to Bureau of Labor Statistics (BLS) data. The views expressed here do not necessarily reflect the views of the BLS.
