Student Assessment Opt Out and the Impact on Value-Added Measures of Teacher Quality

Abstract

Student assessment nonparticipation (or opt out) has increased substantially in K-12 schools in states across the country. This increase in opt out has the potential to affect achievement and growth (or value-added) measures used for educator and institutional accountability. In this simulation study, we investigated the extent to which value-added measures of teacher quality are affected by varying degrees of opt out, as well as by various types of nonrandom opt out. Results show that the magnitude of opt out and the choice of classification scheme have a greater impact on value-added estimates than the type of opt-out pattern simulated in this study. Specifically, root mean square differences in value-added estimates increased as the magnitude of opt out increased. In addition, teacher effectiveness classification agreement decreased as opt-out magnitude increased. One type of opt out, in which the highest achieving students in the highest achieving classrooms opted out, had a larger impact on stability than the other types of opt out.

Keywords

opt out, value-added, teacher evaluation

Introduction

Students choosing not to participate in annual summative assessments (hereafter “opting out”) is a relatively new phenomenon in United States K-12 education, with substantial increases in some states and districts over the past several years. Student participation in assessments has important implications: the scores are used for a host of instructional and accountability purposes, including student grade promotion or graduation decisions and the accountability of teachers, schools, districts, and states.

Much of the literature related to opting out of assessments is based on recent news reports or press releases about which students were expected to participate in assessments and who ultimately did. Reports about the characteristics of students who opt out differ across localities: some states or districts, such as Oregon, found that wealthier, higher achieving students opt out, whereas New York education officials reported that lower achieving students in relatively wealthy districts were slightly more likely to opt out.

Pizmony-Levy and Green Saraisky (2016) surveyed approximately 1,600 opt-out activists in early 2016 to better understand parents’ motivations for supporting the opt-out movement and for allowing their children to opt out. The average income of respondents was $125,000, compared with a median of $53,000 for U.S. households. In addition, 97% reported having completed postsecondary education, and almost 60% reported having a graduate degree. Finally, 45% of respondents reported that they were teachers or educators, and another 16% reported having teachers or educators in their circle of friends.

Most activists reported positively about their own schools, either their child’s or the ones where they work. Sixty-eight percent responded that they would give their own school an A or a B, which is more positive than the rating given by the U.S. general public, whereas 51% gave schools in their community the same grades. The authors hypothesized that this could reflect one or both of two situations: activist respondents are wealthier and have access to what most would consider better quality schools in their neighborhoods, and/or they reject the current popular notion that schools in the United States are failing.

According to Pizmony-Levy and Green Saraisky (2016), 44% of educators who took the survey reported that they did not support the use of test scores in teacher evaluation. Thirty-two percent reported that standardized tests force teachers to teach to the test, 22% reported that standardized tests take away valuable instructional time, and 25.8% did not support the implementation of the Common Core State Standards. They also found that 63.3% of respondents reported opting all of their children out of a state assessment, and 11.2% opted some of their children out. Most of the parents also reported that they were likely to opt their children out in the future (82.8% very likely; 9.3% likely).

Unverified media reports compiled by the National Center for Fair and Open Testing (hereafter FairTest; 2015) indicated that at least 14 states had nonnegligible numbers of students who chose to opt out in 2014-2015, ranging from approximately 4,600 in Pennsylvania to 240,000 in New York. FairTest, along with other organizations, had been advocating for students to opt out of the newly implemented national consortium assessments, partly because these assessments are perceived as more difficult than previous ones, leaving students and teachers uncertain about how they will perform (Clark, 2015). Opting out of a state assessment can have implications for many within the education community who rely on assessment results for decision-making purposes.

Student opt out is likely to affect accountability measures based on test scores, such as achievement measures used for school accountability or value-added measures used in teacher evaluations. Value-added measures purport to represent the extent to which students in a classroom grew over an academic year. In reality, they are the result of conditional status change calculations, which represent the extent to which a student and/or classroom changed position within the distribution of similar students or teachers (Castellano & Ho, 2013). This change is then attributed to teachers for evaluation purposes. The comparative nature of these value-added models means that student opt out can influence both the accuracy and the stability of teacher evaluation measures, depending on several factors, such as the magnitude of opt out and whether opt-out patterns are nonrandom. Random opt out in large numbers could increase the standard errors of any measure created with student assessments, because fewer students are included in the calculations. Value-added measures typically include only the students in a teacher’s classroom for that year, which is roughly 30 or fewer students in a given elementary classroom.
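As a rough illustration of the standard-error point, the Python sketch below treats a classroom estimate as the mean of independent student scores with the student-level standard deviation of 35 scale score points used later in this study, so its standard error is $35/\sqrt{n}$. This is a simplifying assumption: an actual value-added estimate also conditions on prior scores, so the numbers show only the direction and approximate size of the effect. The function names are illustrative.

```python
import math

STUDENT_SD = 35.0  # student-level SD of scale scores (value used in the Method section)


def classroom_se(n_students: float, sd: float = STUDENT_SD) -> float:
    """Standard error of a classroom mean of independent scores: sd / sqrt(n)."""
    return sd / math.sqrt(n_students)


def se_inflation(opt_out_rate: float) -> float:
    """Ratio of the SE after random opt out to the SE with full participation.

    With n * (1 - p) students remaining, the ratio is 1 / sqrt(1 - p),
    which does not depend on the class size n.
    """
    return 1.0 / math.sqrt(1.0 - opt_out_rate)


class_size = 30
for p in (0.05, 0.10, 0.20):
    remaining = class_size * (1 - p)
    print(f"opt out {p:.0%}: {remaining:.1f} students, "
          f"SE {classroom_se(remaining):.2f} vs {classroom_se(class_size):.2f} "
          f"(x{se_inflation(p):.3f})")
```

Under these assumptions the inflation factor depends only on the opt-out rate: random opt out of 20% raises the standard error by a factor of $1/\sqrt{0.8} \approx 1.118$, or about 12%.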

Nonrandomness driven by student-level characteristics may affect the accuracy of estimates for teachers whose classrooms have large concentrations of students with a given characteristic. For instance, teacher value-added measures could be biased up or down if all English learners chose to opt out of the assessment, depending on how systematically their performance as a group differs from that of other students taking the assessment. Nonrandomness driven by classroom-, school-, or district-level characteristics could affect measures created for each level as well. For instance, value-added estimates could be biased if all higher achieving students concentrated in certain classrooms opted out of the assessment, because this essentially removes the upper end of the distribution of test takers. This would affect not only these teachers but also, likely, teachers with no students choosing to opt out, because of the comparative nature of the value-added methodology.
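The comparative mechanism can be seen in a deliberately simplified sketch. It assumes, for illustration only, that a teacher’s estimate is the classroom mean of hypothetical residual gains minus the student-weighted grand mean; the function name `relative_effects` and all values are hypothetical, and the model actually estimated in this study (Equation 9 in the Method section) is more complex.

```python
import numpy as np

# Hypothetical residual gains (current score minus the score predicted from
# prior achievement) for three classrooms; values are illustrative only.
classrooms = {
    "A": np.array([10.0, 8.0, 6.0, -2.0]),  # high-achieving classroom
    "B": np.array([0.0, 1.0, -1.0, 0.0]),
    "C": np.array([-3.0, -4.0, -2.0, -3.0]),
}


def relative_effects(groups):
    """Classroom mean residual minus the student-weighted grand mean.

    A simplified stand-in for the comparative centering in value-added models.
    """
    all_scores = np.concatenate(list(groups.values()))
    grand_mean = all_scores.mean()
    return {name: scores.mean() - grand_mean for name, scores in groups.items()}


before = relative_effects(classrooms)

# The two highest scorers in classroom A opt out; nobody in B or C does.
after_groups = dict(classrooms)
after_groups["A"] = np.sort(classrooms["A"])[:-2]
after = relative_effects(after_groups)

for name in classrooms:
    print(name, round(before[name], 3), "->", round(after[name], 3))
```

Removing the two highest gains in classroom A lowers the grand mean from about 0.83 to −0.8, so the estimates for classrooms B and C each rise by about 1.63 points even though none of their students opted out.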

In this study, we investigate the impact of nonrandom opt out on student achievement-based value-added measures used for evaluating teachers. Using simulated data generated from empirical data from one state, we vary the magnitude of opt out in teachers’ classrooms, as well as in the overall sample, to determine the impact on the stability of teacher evaluation measures. We then vary the nonrandomness of opt out by relating it to prior achievement in classrooms to determine the impact on teacher evaluation measures for teachers with and without opt-out students. We chose to simulate opt out as a function of prior achievement because early media reports suggested such a relationship, though the pattern varied by locale. Finally, we also vary the interaction between the magnitude of opt out and nonrandomness.

Assessment Participation

Under Section 1111(2)(I)(ii) of the 2001 No Child Left Behind Act (NCLB), the United States Department of Education (USDE) required that 95% of eligible students participate in the Grades 3 to 8 assessments for English Language Arts (ELA) and Math at the aggregate and subgroup levels. This means that 95% of all eligible students were required to participate, as well as 95% within each of the federally protected subgroups, such as English language learners or students with disabilities. Overall and subgroup participation rates were calculated at the school, district, and state levels. Not included in participation rate calculations were the approximately 1% of students with the most significant cognitive disabilities, as long as these students took the state’s alternate assessment. As noted in policy guidance from the USDE in 2013, these participation requirements have historically been enforced by the USDE, with states providing regular retake opportunities for absent students (USDE, 2013). The Institute of Education Sciences (IES) found in 2007 that fewer than 1% of schools missed their accountability targets because of the participation rate requirement.

State education agencies that miss the 95% participation rate requirement can face sanctions such as a formal request to comply, a cease-and-desist order, or the withholding or suspension of Title I funds, which are meant to support low-income students (Camera, 2015). States that do not receive Title I dollars are not necessarily required to meet a 95% participation rate, which makes Title I funding a major policy lever for USDE. Under the Every Student Succeeds Act (ESSA), passed in 2015, states are now required to factor low participation rates into school-level accountability ratings but have some discretion over how they do so (Ujifusa, 2015b). Perhaps in preparation for this change, the USDE sent letters to 12 states to ensure that they had a plan to address low participation rates in the assessment at the state, district, or subgroup level (Klein, 2015).

Opt Out in the United States

According to Bennett’s (2016) research across states, the greatest proportion of opt out took place in New York, where it was about 20% in both ELA and Math. Rhode Island, Colorado, and Maine all had opt-out rates above the 5% threshold implied by the 95% participation requirement set by USDE under NCLB. Bennett reports that refusal rates in high schools were also much higher than in the elementary grades. In Washington state, the 11th-grade refusal rate was 49% in ELA and 53% in Math, whereas across all grades the rates were 2% and 3%, respectively. He also reports that high school refusal rates were the primary reason states were put on alert by USDE for low participation rates.

Oregon Department of Education officials reported that approximately 5% of students opted out in 2014-2015, most of whom were nondisabled White students who traditionally perform well on the assessment (Hammond, 2015). After the 2015 assessment administration ended, Governor Kate Brown urged districts to work with parents to stress the importance of assessments and the potential implications of low participation rates, while at the same time signing a bill requiring districts to notify parents twice a year of their right not to participate in the state assessment (Ujifusa, 2015a). This bill also created two school rating systems, one of which penalizes schools for low participation rates and one of which does not.

In Delaware, 10% of high school juniors statewide did not participate in the assessment in the 2014-2015 school year (Albright, 2016). A bill designed to allow students to opt out of the state assessment was vetoed by the governor, even after gaining support from the Delaware Teachers’ Union and the State House of Representatives. In New Jersey, where the percentage of students opting out was reportedly just under 10%, a bill was introduced in the state legislature that would allow parents to provide written notice to the school that their child would not sit for the assessment (Walker, 2015). The bill, however, was not considered when the senate acted on other legislation related to the state’s participation in the Partnership for Assessment of Readiness for College and Careers (PARCC) assessments (Clark, 2015).

In 2013, as opt-out momentum grew across the state, New York State Education Department (NYSED) officials issued guidance to superintendents and principals of all public schools stating that no statute or regulation specifically allowed students to opt out of the assessment (Katz, 2013). In the guidance, NYSED officials stated that taking state assessments is considered part of the “course of study” and that opting out could negatively affect a school’s or district’s accountability standing.

According to media reports cited by NYSED, the percentage of students statewide choosing to opt out of the New York state assessment was at its highest level ever in 2014-2015, at approximately 20%, with estimates as high as 90% in some districts on Long Island and in the eastern part of the state (NYSED, 2015). This represented approximately 240,000 fewer students taking the Grades 3 to 8 assessments in ELA and Math. According to NYSED, opt-out students statewide were more likely to come from average- or low-need districts and were more likely to receive scores in the lowest two achievement levels in ELA or Math (NYSED, 2015). Opt out at the high school level in New York was not reported on or studied, because students need to complete a series of Regents exams to graduate, and the test-taking pattern varies by district, school, and even student.

American Institutes for Research (AIR, 2016) compared value-added results from a complete model (i.e., the 2013-2014 model with complete data) with an incomplete opt-out model that excluded students who did not participate in 2014-2015, and found very similar results. The R-squared was approximately 0.7 across grades, with differences no larger than 0.01 between the complete and incomplete models. At the student level, the root mean square of the difference between the two models’ predictions was never larger than 0.5, or one half of one scale score point. For students with growth percentiles, the student-level correlation of growth percentiles between the complete and incomplete models was 0.999.

AIR (2016) also calculated teacher mean growth percentiles (MGPs) in 2013-2014 with and without students who did not participate in 2014-2015 and found that the two sets of MGPs correlated at about 0.98, suggesting a strong linear relationship. They concluded that the relationship between the change in a teacher’s MGP and classroom characteristics was not large and/or systematic for most characteristics. Teachers with large positive changes in MGP tended to have lower proportions of economically disadvantaged students. Similarly, large positive changes in MGP were also related to lower nonparticipation rates.

According to AIR, 82% of teachers were expected to receive the same state classification rating under both models, 3.7% were expected to move up one rating category, 4.3% were expected to move down one rating category, and about 0.1% were expected to move up or down by two rating categories. Almost 3% of teachers were expected to move from the top two categories in the complete model to the bottom two in the incomplete model (without opt-out students).

Classification agreement between the complete and incomplete model effectiveness ratings across all teachers was high, at 93%. This is higher than the 82% expected agreement AIR reported above. Agreement was lower, however, for teachers in categories of consequence: only 80% of teachers in the bottom two rating categories in the complete model remained in the same category in the incomplete model. Similarly, 90% of those in the top two categories in the complete model did not change categories in the incomplete model.
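Agreement statistics of this kind can be computed from two vectors of ratings. The sketch below assumes a four-category rating scale coded 0 (lowest) to 3 (highest) and uses ten hypothetical teachers; it is not AIR’s code, and the helper names are illustrative.

```python
import numpy as np


def agreement_rate(complete, incomplete):
    """Share of teachers given the same rating by the complete and incomplete models."""
    complete, incomplete = np.asarray(complete), np.asarray(incomplete)
    return float(np.mean(complete == incomplete))


def agreement_within(complete, incomplete, categories):
    """Agreement among teachers whose complete-model rating is in `categories`."""
    complete, incomplete = np.asarray(complete), np.asarray(incomplete)
    keep = np.isin(complete, categories)
    return float(np.mean(complete[keep] == incomplete[keep]))


def transition_table(complete, incomplete, n_categories):
    """Counts by complete-model rating (rows) and incomplete-model rating (columns)."""
    table = np.zeros((n_categories, n_categories), dtype=int)
    np.add.at(table, (np.asarray(complete), np.asarray(incomplete)), 1)
    return table


# Hypothetical ratings for ten teachers on a four-category scale (0 = lowest).
complete = [0, 0, 1, 1, 2, 2, 3, 3, 1, 2]
incomplete = [0, 1, 1, 1, 2, 3, 3, 3, 0, 2]

overall = agreement_rate(complete, incomplete)
bottom_two = agreement_within(complete, incomplete, [0, 1])
top_two = agreement_within(complete, incomplete, [2, 3])
table = transition_table(complete, incomplete, 4)
top_to_bottom = table[2:, :2].sum()  # moves from the top two to the bottom two categories
```

In this toy example overall agreement is 0.7, agreement is 0.6 in the bottom two categories and 0.8 in the top two, and no teacher moves from the top two to the bottom two categories.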

In their analysis of 28 districts in New York, Rice, Marland, and Meyer (2016) found that lower achieving students (based on prior achievement scores) in higher achieving districts were more likely to opt out. In addition, the types of students who were more likely to opt out varied across districts: higher achieving students were more likely to opt out in some districts, and lower achieving students in others. NYSED, like many state education agencies, uses assessment scores for a host of accountability purposes, including status and growth measures.

Opt Out Regulations and Legislation

Thirty states and the District of Columbia currently have specific legislation that would compel students to take state assessments (Croft & Lee, 2016). An additional five states allow for exemptions; in Oregon, for example, students can exercise a religious exemption, one of several exemptions available to students in different states. Other exemptions include physical disability, medical reasons, or emergencies. Five states do not have explicit opt-out policies but allow districts or schools to create their own, eight states have no formal policy and do not promote opt out by notifying parents about their rights, and two states allow for opt outs and notify parents. In the wave of 2015-2016 legislation spurred by the increase in opt out, most of which was not successful, bills were typically crafted to outline the opt-out process, including parent notification and how parents could request an exemption. For the most part, guidance to parents cites Section 1111 of NCLB, which states that students should take part in the state assessment.

Introduction to the Present Study

Although some important research in the area of opt out has been done, it is still unclear whether specific types and magnitudes of opt out have effects on teacher evaluation. Thus, the purpose of the current study is to consider the extent to which teacher value-added estimates are affected by the magnitude of random and nonrandom opt-out patterns, as well as by the relationship between opt out and prior achievement. The specific questions addressed were the following:

1. What is the impact of opt out on value-added measures of teaching effectiveness?

a. How do different magnitudes of opt out within a teacher’s classroom affect value-added measures?

b. How do varying degrees of relationship between opt-out patterns and prior achievement affect value-added measures?

2. What is the impact of opt out on teacher effectiveness classifications?

Method

A simulation study was conducted to examine the amount of bias introduced into value-added estimates under various opt-out conditions and to determine the extent to which opt out affected the classification of teacher effectiveness. The details of the simulation are described next. In brief, observed scale scores were simulated to represent students’ test scores on a typical statewide assessment for four grades, hereafter referred to as Grades 3, 4, 5, and 6. The probability of opting out was simulated using parameter estimates from empirical data. Students were then identified for deselection from the analysis in one of three ways: randomly, based on their probability of opting out of the assessment, or based on their prior achievement. Grade 6 value-added estimates were calculated in separate models using Grades 3, 4, and 5 as conditioning years, and a shrinkage estimator was applied to account for the small numbers of students included in the estimates.

The data were generated using a multivariate sampling approach from Castellano and Ho (2015) to produce the nested structure observed in real data, that is, students nested within classrooms. To simulate realistic data, the parameters in this simulation were based on empirical analysis of real test data from one state. Correlations and root mean square differences (RMSDs) between complete and incomplete value-added estimates were investigated across conditions to better understand the extent to which error is a function of opt out. Value-added estimates were classified using two state classification approaches to determine rates of agreement between complete and incomplete value-added estimates across replications for every classroom.
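The two stability criteria can be sketched as follows, assuming complete and incomplete estimates are aligned by teacher. Averaging the per-replication statistics over replications is an aggregation choice made for illustration, not a description of the study’s exact procedure, and the function names are hypothetical.

```python
import numpy as np


def rmsd(complete, incomplete):
    """Root mean square difference between two sets of teacher estimates."""
    diff = np.asarray(complete, dtype=float) - np.asarray(incomplete, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def correlation(complete, incomplete):
    """Pearson correlation between two sets of teacher estimates."""
    return float(np.corrcoef(complete, incomplete)[0, 1])


def summarize_replications(complete_reps, incomplete_reps):
    """Mean RMSD and mean correlation over replications (one row per replication)."""
    pairs = list(zip(complete_reps, incomplete_reps))
    return (float(np.mean([rmsd(c, i) for c, i in pairs])),
            float(np.mean([correlation(c, i) for c, i in pairs])))


complete = np.array([1.0, -0.5, 2.0, 0.0])
shifted = complete + 5.0  # every estimate moves by the same amount
print(correlation(complete, shifted), rmsd(complete, shifted))
```

The shifted example shows why both criteria are useful: a uniform shift of 5 points leaves the correlation at 1 but yields an RMSD of 5, and such shifts are plausible given the comparative centering of value-added estimates.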

Empirical Data

Four years of empirical assessment and demographic data for 32,722 students in Grades 3 to 8 in 122 schools in 28 districts in one state were made available to generate parameter estimates for this simulation study. The included school years spanned 2011-2012 to 2014-2015. Because these data represent a subset of the state, we compared several generating parameter estimates with publicly available data from the state’s website. Because we generated sixth-grade Math scale scores, we restricted the subsample data to sixth-grade students, which resulted in 8,023 students (Table 1), approximately 4% of the state’s total sixth-grade population.

Table 1.

Descriptive Statistics for Sixth-Grade Math in Sample and Statewide.

	Sample	Statewide
Number of opt-out students	1,563	47,177
Number of participating students	6,460	141,167
Total enrolled	8,023	188,344
Proportion opting out	0.19	0.25
SD of opting out	0.4	n/a

Note. SD = standard deviation; n/a = not applicable.
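The 0.4 standard deviation in Table 1 is consistent with the standard deviation of a 0/1 indicator, $\sqrt{p(1-p)}$, evaluated at the sample opt-out proportion. The short check below recomputes it from the table counts, using the population form of the standard deviation (the sample form differs negligibly at this sample size); the helper name is illustrative.

```python
import math


def indicator_sd(n_opt_out: int, n_total: int) -> float:
    """Population SD of a 0/1 opt-out indicator: sqrt(p * (1 - p))."""
    p = n_opt_out / n_total
    return math.sqrt(p * (1 - p))


n_opt_out, n_participating = 1563, 6460   # sample counts from Table 1
n_total = n_opt_out + n_participating     # 8,023 enrolled
p = n_opt_out / n_total
sd = indicator_sd(n_opt_out, n_total)
share_of_state = n_total / 188344         # sample share of statewide enrollment
print(f"p = {p:.3f}, SD = {sd:.3f}, share of state = {share_of_state:.3f}")
```

This prints `p = 0.195, SD = 0.396, share of state = 0.043`, matching the rounded 0.19 and 0.4 in Table 1 and the roughly 4% share reported in the text.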

Identifying Students as Opt Out in Empirical Data

As part of the requirements under the evaluation process in this state, each district must report to the state teacher–student linkage information that identifies a teacher of record for every student. Also included is the number of minutes a student is in a teacher’s classroom during the course of the year. The state uses this information when calculating growth measures for use in evaluation and returns to each district a data file indicating why each student was or was not included in growth calculations.

Using the empirical data provided by the state, students were considered opt outs if they were linked to teachers for the entire year but had no valid current year test score; that is, the student was in a tested grade and linked to a teacher but had no valid test score for the same year. Students who did not meet minimum enrollment and attendance duration requirements were dropped from the analysis, unless they also had no valid current year test score, in which case they were also considered opt outs. This approach likely inflates the percentage of opt out slightly, because it includes students who could have received a medical exemption from the state. All students with valid test scores were considered test-takers for the purposes of the analyses.
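The classification rules above can be expressed as a small decision function. The record fields (`tested_grade`, `linked_to_teacher`, `meets_enrollment_rules`, `has_valid_score`) are hypothetical stand-ins for whatever the state’s linkage file actually contains, and treating students with valid scores who miss the enrollment rules as excluded is our reading of the text.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Hypothetical fields; the state's actual file layout is not described here.
    student_id: str
    tested_grade: bool            # enrolled in a tested grade
    linked_to_teacher: bool       # appears in the teacher-student linkage file
    meets_enrollment_rules: bool  # meets minimum enrollment and attendance duration
    has_valid_score: bool         # has a valid current-year test score


def classify(rec: StudentRecord) -> str:
    """Return 'opt_out', 'test_taker', or 'excluded' under the rules described above."""
    if not (rec.tested_grade and rec.linked_to_teacher):
        return "excluded"
    if not rec.has_valid_score:
        # No valid score: counted as opt out whether or not enrollment rules are met.
        return "opt_out"
    if not rec.meets_enrollment_rules:
        # Valid score but short enrollment: dropped from the analysis (our reading).
        return "excluded"
    return "test_taker"
```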

Data Generation

Generating Observed Scale Scores

Scale scores were generated using the multivariate normal sampling approach of Castellano and Ho (2015), with parameter estimates from empirical data: within- and between-classroom deviations were sampled from multivariate normal distributions and summed to create student-level observed scores. This sampling allowed observed scale scores to be generated for each student in a classroom with the addition of a common classroom effect. The multivariate sampling procedure begins with Equation (1):

\begin{pmatrix} Y_{g}^{B} \\ X_{1g}^{B} \\ X_{2g}^{B} \\ X_{3g}^{B} \\ O_{g}^{B} \end{pmatrix} \equiv \begin{pmatrix} Y_{g}^{B} \\ X_{g}^{B} \\ O_{g}^{B} \end{pmatrix} \sim N_{5} (μ, Σ^{B}),

(1)

where $μ$ is a 5 × 1 vector containing the average scale score across all students in the generated data for each year, as well as the mean probability that a student will opt out of the assessment. $Σ^{B}$ is the 5 × 5 variance–covariance matrix for the average classroom scale scores and opt out. $Y_{g}^{B}$ is the current year classroom deviation from the average score, $X_{g}^{B}$ contains the classroom deviations for each of the three prior years, and $O_{g}^{B}$ is the classroom deviation for opting out. For scale scores, $μ$ was set to 310 for each year to mirror the average in one state; this is reasonable because average scale scores in a state tend to remain relatively stable across years when tests are not vertically scaled and have within-grade scales. The mean probability of opting out was varied across three conditions: 5%, 10%, and 20%. The covariances between opting out and scale scores were set to zero for all years except the immediate prior year.

In Equation 2, we have the multivariate sampling procedure for within-class deviations:

\begin{pmatrix} Y_{ig}^{W} \\ X_{1ig}^{W} \\ X_{2ig}^{W} \\ X_{3ig}^{W} \\ O_{ig}^{W} \end{pmatrix} \equiv \begin{pmatrix} Y_{ig}^{W} \\ X_{ig}^{W} \\ O_{ig}^{W} \end{pmatrix} \sim N_{5} (0, Σ^{W}),

(2)

where 0 is a 5 × 1 column vector of zeros, the mean of the within-classroom deviations for each of the four generated years of scale scores and for the probability of opting out, and $Σ^{W}$ is the 5 × 5 variance–covariance matrix for generating student-level deviations from the classroom mean. $Y_{ig}^{W}$ is the within-classroom deviation for the current year, $X_{ig}^{W}$ contains the deviations for each of the prior years, and $O_{ig}^{W}$ is the within-classroom deviation for opting out. Current and prior year scale scores and the probability of opting out were then generated using Equations (3), (4), and (5):

Y_{ig} = Y_{g}^{B} + Y_{ig}^{W},

(3)

X_{ig} = X_{g}^{B} + X_{ig}^{W},

(4)

O_{ig} = O_{g}^{B} + O_{ig}^{W},

(5)

where $Y_{ig}$ is the current year test score for student $i$ in classroom $g$, the sum of the between- and within-group deviations $Y_{g}^{B}$ and $Y_{ig}^{W}$; $X_{ig}$ is a matrix containing the three prior scale scores, with dimensions $N \times J$, where $N$ is the number of students and $J$ is the number of prior scores; and $O_{ig}$ is the probability that a student will opt out of the state assessment in the current year.

Data generation required the use of student- and teacher-level correlations of scale scores across years, the student-level standard deviation of scale scores, and the intraclass correlation (ICC) observed in real data, which is the proportion of the variance attributed to classroom-level differences in scale scores. Intertemporal correlations of scale scores at the student level were set to 0.85 between adjacent years, 0.83 for scores with a 2-year lag (i.e., current with 2 years prior, 1 year prior with 3 years prior), and 0.75 for scores with a 3-year lag (i.e., current with 3 years prior).

For generating the probability that a student opts out, we used the student-level correlation between the dichotomous opt-out indicator from the empirical data and the immediate prior year scale score, the correlation between average prior achievement and the percent of students opting out in a classroom, the student-level standard deviation of the dichotomous opt-out indicator, and the ICC of opting out. The dimensions of the correlation matrix that includes scale scores and the probability of opting out, denoted as $R$, are 5 × 5. Correlations between all years of scale scores and opting out were set to zero, with the exception of the correlation between opting out and the immediate prior year.

Using the administrative data, correlations across years at the teacher level were set to 0.90 for adjacent years, 0.85 for scores with a 2-year lag, and 0.80 for scores with a 3-year lag. The correlation between average prior achievement and the percent of students opting out in a classroom was set to −0.05. These correlations are expressed in a 5 × 5 matrix denoted as $R^{B}$. Student-level standard deviations were set to 35 for scale scores, held constant across years, and to 0.4 for opting out; both are expressed in a diagonal matrix, $D$. The intraclass correlations, expressed in the diagonal 5 × 5 matrix $ω$, were set to 0.22 for scale scores (held constant across years) and to 0.13 for opting out.

First, the total variance–covariance matrix, $Σ$, which includes within- and between-classroom differences, was calculated from the student-level standard deviations in $D$ and the intertemporal correlations in $R$ using Equation (6):

Σ = DRD .

(6)

Next, the between-classroom variance–covariance matrix, $Σ^{B}$, was calculated using the intraclass correlations in $ω$, the student-level standard deviations in $D$, and the classroom-level intertemporal correlations in $R^{B}$ in Equation (7):

Σ^{B} = (\sqrt{ω} D) R^{B} (\sqrt{ω} D) .

(7)

The difference between these two matrices, computed in Equation (8), is the within-classroom variance–covariance matrix $Σ^{W}$ used to generate the student-level deviations from the classroom average (Equation 2):

Σ^{W} = Σ - Σ^{B} .

(8)
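Equations (1) to (8) can be sketched in Python with NumPy using the generating values reported above. One input is not reported in this section, the student-level correlation between the opt-out indicator and the immediate prior year score, so `R_OPT_PRIOR_STUDENT` is a hypothetical placeholder; all function and variable names are illustrative. Because $O_{ig}$ is drawn from a normal distribution, generated values can fall outside [0, 1] and are best read as an opt-out propensity used for ranking.

```python
import numpy as np

# Variable order is (Y, X1, X2, X3, O); X1 is the immediate prior year.
SCORE_SD, OPT_SD = 35.0, 0.4      # student-level SDs (diagonal of D)
ICC_SCORE, ICC_OPT = 0.22, 0.13   # intraclass correlations (diagonal of omega)
R_OPT_PRIOR_CLASS = -0.05         # classroom-level correlation reported in the text
R_OPT_PRIOR_STUDENT = -0.10       # HYPOTHETICAL placeholder: value not reported here


def corr_matrix(lag1, lag2, lag3, r_opt_prior):
    """5 x 5 correlation matrix; O correlates only with the immediate prior year X1."""
    return np.array([
        [1.0,  lag1, lag2, lag3, 0.0],
        [lag1, 1.0,  lag1, lag2, r_opt_prior],
        [lag2, lag1, 1.0,  lag1, 0.0],
        [lag3, lag2, lag1, 1.0,  0.0],
        [0.0,  r_opt_prior, 0.0, 0.0, 1.0],
    ])


R = corr_matrix(0.85, 0.83, 0.75, R_OPT_PRIOR_STUDENT)  # student level
R_B = corr_matrix(0.90, 0.85, 0.80, R_OPT_PRIOR_CLASS)  # classroom level
D = np.diag([SCORE_SD] * 4 + [OPT_SD])
omega = np.diag([ICC_SCORE] * 4 + [ICC_OPT])

sigma = D @ R @ D                  # Eq. (6): total covariance
scale = np.sqrt(omega) @ D         # sqrt(omega) D, still diagonal
sigma_b = scale @ R_B @ scale      # Eq. (7): between-classroom covariance
sigma_w = sigma - sigma_b          # Eq. (8): within-classroom covariance

# A usable within-classroom covariance must be positive definite.
if np.linalg.eigvalsh(sigma_w).min() <= 0:
    raise ValueError("sigma_w is not positive definite; check the generating parameters")


def simulate_scores(class_sizes, opt_out_mean, rng):
    """Draw classroom deviations (Eq. 1) and student deviations (Eq. 2), then sum (Eqs. 3-5).

    Returns a classroom index per student and an (N, 5) array with columns Y, X1, X2, X3, O.
    """
    mu = np.array([310.0] * 4 + [opt_out_mean])
    between = rng.multivariate_normal(mu, sigma_b, size=len(class_sizes))
    classroom = np.repeat(np.arange(len(class_sizes)), class_sizes)
    within = rng.multivariate_normal(np.zeros(5), sigma_w, size=classroom.size)
    return classroom, between[classroom] + within


rng = np.random.default_rng(2016)
classroom, data = simulate_scores(np.full(1000, 30), opt_out_mean=0.10, rng=rng)
```

The positive-definiteness check matters: if the classroom-level correlations or ICCs are too large relative to the student-level correlations, Equation (8) can produce an invalid covariance matrix, and the sampler would then warn or return meaningless draws.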

Estimating Value-Added Measures of Teacher Quality

To estimate value-added measures of teacher quality, we used a common method, also used by Guarino, Reckase, Stacy, and Wooldridge (2014), that estimates a teacher effect through dichotomous indicators for each of the 1,000 teachers. The model is parameterized as follows:

A_{ig} = λ_{1} A_{i, g - 1} + λ_{2} A_{i, g - 2} + λ_{3} A_{i, g - 3} + E_{ig} β + e_{ig}

(9)

where $A_{ig}$ is the current year test score for student $i$; $λ_{1}$, $λ_{2}$, and $λ_{3}$ are slope parameters; $A_{i, g - 1}$, $A_{i, g - 2}$, and $A_{i, g - 3}$ are prior year scale scores; $E_{ig}$ are indicator variables for specific teachers; $β$ are teacher estimates of effectiveness (fixed effects); and $e_{ig}$ is the student-level error term. This model was used because teacher fixed effects are relatively straightforward to calculate and the model yields teacher-level standard errors. Variations of this model are also used in practice, for example in Los Angeles Unified School District and Hillsborough County Schools in Florida. We also applied a shrinkage estimator that accounts for the reliability of the value-added estimates, so that we could investigate the extent to which shrinkage can help mitigate changes in classification compared with unshrunk estimates. The shrinkage adjustment is the ratio of true variance in teacher value-added to total variance, which is then applied to value-added estimates and standard errors. Univariate shrinkage is implemented using the following equation:

{\tilde{α}}_{j} = (\frac{ω^{2}}{ω^{2} + σ_{j}^{2}}) {\hat{α}}_{j},

(10)

where ${\hat{α}}_{j}$ is the unshrunk value-added for teacher $j$ (the element of $β$ in Equation 9 for that teacher), $σ_{j}^{2}$ is the squared standard error of ${\hat{α}}_{j}$, $ω^{2}$ is the variance of the true teacher effects $α_{j}$ (not to be confused with the ICC matrix $ω$ used in data generation), and ${\tilde{α}}_{j}$ is the shrunk value-added. The estimate of $σ_{j}^{2}$ is produced by the value-added regression. The variance $ω^{2}$ is estimated as the variance of the ${\hat{α}}_{j}$ across teachers minus the mean of the squared standard error estimates ${\hat{σ}}_{j}^{2}$:

{\hat{ω}}^{2} = Var [{\hat{α}}_{j}] - Mean [{\hat{σ}}_{j}^{2}] .

(11)
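A sketch of Equations (9) to (11) with NumPy follows. It fits the fixed-effects regression by ordinary least squares with one indicator per teacher and no separate intercept, so the indicators absorb the intercept; it then centers the teacher coefficients before shrinkage, floors $\hat{ω}^{2}$ at zero, and uses the sample variance (ddof=1) in Equation (11). Those choices are our assumptions, not details given in the text, and the function names are hypothetical. The squared standard errors are those of the uncentered coefficients, a close approximation when there are many teachers. At the full simulation size (30,000 students and 1,000 teachers) a sparse or within-classroom demeaning implementation would be more memory-efficient than the dense indicator matrix used here.

```python
import numpy as np


def teacher_fixed_effects(y, priors, teacher_ids):
    """OLS fit of Eq. (9): y on prior scores plus one indicator per teacher.

    Returns teacher labels, centered teacher effects, and squared standard errors.
    """
    labels, idx = np.unique(teacher_ids, return_inverse=True)
    n = y.shape[0]
    E = np.zeros((n, labels.size))
    E[np.arange(n), idx] = 1.0                  # teacher indicator variables
    X = np.hstack([priors, E])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - X.shape[1])       # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)           # OLS covariance of coefficients
    k = priors.shape[1]
    alpha_hat = coef[k:] - coef[k:].mean()      # center teacher effects (assumption)
    se2 = np.diag(cov)[k:]
    return labels, alpha_hat, se2


def shrink(alpha_hat, se2):
    """Univariate shrinkage, Eqs. (10)-(11); returns shrunk effects and shrunk SEs."""
    omega2 = max(np.var(alpha_hat, ddof=1) - np.mean(se2), 0.0)  # Eq. (11), floored at 0
    weight = omega2 / (omega2 + se2)                             # reliability per teacher
    return weight * alpha_hat, weight * np.sqrt(se2)


# Small synthetic check with known teacher effects (values are illustrative).
rng = np.random.default_rng(7)
n_t, per = 40, 25
tid = np.repeat(np.arange(n_t), per)
true = rng.normal(0, 8, n_t)
priors = rng.normal(310, 35, (n_t * per, 3))
y = (50 + 0.6 * priors[:, 0] + 0.2 * priors[:, 1] + 0.1 * priors[:, 2]
     + true[tid] + rng.normal(0, 20, n_t * per))
labels, a, se2 = teacher_fixed_effects(y, priors, tid)
shrunk, shrunk_se = shrink(a, se2)
```

Teachers with larger standard errors, typically those with fewer tested students after opt out, receive smaller weights and are pulled more strongly toward the mean.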

Simulation Conditions

As mentioned, the mean probability of opt out was set to 5%, 10%, and 20% to simulate realistic, varying degrees of opt out. We simulated 100 data sets for each magnitude, for a total of 300 data sets. In each data set, students were chosen for opt out (i.e., deselected from analysis) randomly (Condition 1 below), based on their generated probability of opting out (Condition 2), or based on their place in the prior achievement distribution (Conditions 3 and 4). These four conditions, applied at each magnitude of opt out, simulate possible real-life scenarios across states.

Random: Refers to students who were dropped randomly from the analysis. Including a random opt-out condition serves as a sensitivity check against which we can compare the nonrandom conditions;

Highest probability: Refers to the condition where students were dropped based on the probability of opting out, which was predicted using prior achievement from the actual state data. Students with the highest probability of opting out were selected until the desired magnitude was reached. This condition most closely mirrors real opt out because it is based on empirical data, though we recognize that in reality not all students with the highest probability may choose to opt out;

Lowest achieving: Refers to the condition where 50% of opt-out students were those with the lowest prior achievement in the top quartile of classrooms (ranked by prior achievement), and the other 50% of opt-out students were randomly selected from the other three quartiles. This was meant to simulate a situation where a high percentage of these students might feel pressured to opt out because they may be considered the lowest achievers in their high-achieving classrooms.

Highest achieving: Refers to the condition where 50% of opt-out students were those with the highest prior achievement in the top quartile of classrooms (ranked by prior achievement), and the other 50% of opt-out students were randomly selected. This was meant to simulate the highest achievers in high-achieving classrooms deciding to opt out because they do not need the “signal” a state assessment provides.
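The four selection rules can be sketched as follows. This is an illustrative Python sketch, not the study's code: the function `select_opt_outs` and the student record layout (`id`, `classroom`, `prior`, `p_opt`) are hypothetical, classrooms are ranked by their mean prior achievement, and in the two quartile-based conditions the targeted half is taken from a pooled ranking of all students in top-quartile classrooms, which is only one possible reading of the descriptions above.

```python
import random
from statistics import mean


def select_opt_outs(students, rate, condition, rng):
    """Return the set of student ids removed from the analysis (hypothetical helper).

    students: list of dicts with keys 'id', 'classroom', 'prior', 'p_opt' (assumed layout)
    rate: target share of students opting out, e.g. 0.05, 0.10, or 0.20
    condition: 'random', 'highest_probability', 'highest_achieving', or 'lowest_achieving'
    rng: a random.Random instance, so replications are reproducible
    """
    n_out = round(rate * len(students))
    if condition == "random":
        return {s["id"] for s in rng.sample(students, n_out)}
    if condition == "highest_probability":
        # Take students in descending order of predicted opt-out probability
        ranked = sorted(students, key=lambda s: s["p_opt"], reverse=True)
        return {s["id"] for s in ranked[:n_out]}
    if condition not in ("highest_achieving", "lowest_achieving"):
        raise ValueError(f"unknown condition: {condition}")

    # Rank classrooms by mean prior achievement and keep the top quartile
    by_class = {}
    for s in students:
        by_class.setdefault(s["classroom"], []).append(s)
    ordered = sorted(by_class, key=lambda c: mean(x["prior"] for x in by_class[c]))
    top = set(ordered[int(0.75 * len(ordered)):])
    in_top = [s for s in students if s["classroom"] in top]

    # Targeted half: highest (or lowest) prior achievers within top-quartile classrooms
    highest = condition == "highest_achieving"
    targeted = sorted(in_top, key=lambda s: s["prior"], reverse=highest)[: n_out // 2]
    chosen = {s["id"] for s in targeted}

    # Remaining half is random: from anyone else (highest achieving) or
    # from the other three classroom quartiles (lowest achieving)
    if highest:
        pool = [s for s in students if s["id"] not in chosen]
    else:
        pool = [s for s in students if s["classroom"] not in top]
    rest = rng.sample(pool, n_out - len(chosen))
    return chosen | {s["id"] for s in rest}
```

Passing a seeded `random.Random` keeps each replication reproducible; the pooled ranking could be swapped for a within-classroom ranking without changing the rest of the function.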

Additionally, magnitude, when applied to a condition, refers to the 5%, 10%, and 20% opt-out rates simulated for each of the four opt-out conditions.

Value-added estimates were calculated once with all students in the analysis for the 100 complete data sets, and once for each of the four conditions with students deselected from the analysis. This yields five sets of value-added estimates per data set (one complete set and four “incomplete” sets based on the opt-out simulation conditions), or 500 sets of value-added estimates across the 100 data sets.

The number of students associated with each teacher was varied to have a mean of 30 and a standard deviation of 10, which mirrors a typical elementary school classroom in the empirical data. The minimum classroom size was 1 student and the maximum was 63. However, as noted later, classrooms with fewer than 11 students were excluded from the stability analysis, because states and districts often set a minimum number of students required for a teacher to receive an effectiveness rating. Excluding classrooms with fewer than 11 students limits inferences in this study to general education settings, because inclusion classrooms typically have fewer than 10 students. The total sample size per replication was approximately 30,000 students and 1,000 teachers.
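A minimal sketch of the classroom-size step and the 11-student minimum follows. The rounded, clipped normal distribution for class sizes is an assumption (the text reports only the mean, standard deviation, and observed range), the helper names are hypothetical, and opt out is applied independently to each student as in the random condition.

```python
import random


def draw_class_sizes(n_teachers, rng, mean_size=30, sd_size=10, lo=1, hi=63):
    """Hypothetical helper: class sizes as rounded normal draws clipped to [lo, hi]."""
    return [min(hi, max(lo, round(rng.gauss(mean_size, sd_size)))) for _ in range(n_teachers)]


def share_below_minimum(sizes, opt_out_rate, rng, min_students=11):
    """Share of teachers left with fewer than min_students tested students after opt out.

    Assumption: each student opts out independently with probability opt_out_rate.
    """
    remaining = [sum(rng.random() >= opt_out_rate for _ in range(n)) for n in sizes]
    # Teachers below the minimum would receive no value-added rating
    return sum(n < min_students for n in remaining) / len(sizes)
```

With 1,000 teachers per replication, `draw_class_sizes(1000, random.Random(7))` gives roughly 30,000 students, in line with the sample size described above.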

The stability of the complete and incomplete value-added models was examined by calculating the RMSD between the incomplete and complete value-added estimates across replications, using the formula from AIR (2016) shown in Equation (12):

\mathrm{RMSD} = \sqrt{n^{-1} \sum_{k = 1}^{K} \sum_{j} {(\mathrm{VA}^{\mathrm{Incomplete}}_{jk} - \mathrm{VA}^{\mathrm{Complete}}_{jk})}^{2}}

(12)

The difference between the incomplete and complete VA estimates was calculated for each classroom $j$ in each replication $k$ (k = 1, 2, . . ., K), where K = 100, and n is the total number of classroom-replication differences. The result is on the value-added scale, which is expressed in student standard deviation units. We also calculated the RMSD between value-added estimates obtained under the missing at random condition and estimates from the other three missing not at random conditions.
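Equation (12) translates directly into code. In this sketch, estimates are stored in dictionaries keyed by hypothetical (teacher, replication) pairs, and only pairs present in both the complete and incomplete runs are compared, which is an assumption about how teachers excluded after opt out are handled.

```python
import math


def rmsd(incomplete, complete):
    """Root mean square difference over teacher-replication pairs (sketch of Equation 12).

    incomplete, complete: dicts mapping (teacher j, replication k) -> value-added estimate.
    Assumption: pairs missing from either run (e.g., excluded teachers) are skipped.
    """
    keys = [key for key in incomplete if key in complete]
    if not keys:
        raise ValueError("no teacher-replication pairs in common")
    squared = [(incomplete[key] - complete[key]) ** 2 for key in keys]
    return math.sqrt(sum(squared) / len(keys))
```

For two teacher-replication pairs with differences of 0.03 and −0.04, the RMSD is $\sqrt{(0.0009 + 0.0016)/2} \approx 0.035$.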

Last, to investigate the practical implications of opt out for value-added estimates, we classified teachers into four rating categories using the approaches of two states, New York and Florida, based on their complete and incomplete value-added estimates in each data set. The states were chosen because their approaches differ considerably in how far above or below average a teacher needs to be to receive a rating of consequence; however, both require the use of confidence intervals. We also used New York’s approach because AIR (2016) provided classification agreement statistics against which we can compare our results. The number and proportion of classifications in agreement between complete and incomplete value-added estimates were calculated across all replications and conditions.

Results

Data Generation

As a first check, we review simulation diagnostics to ensure that the data generation process performed as expected and to help understand how prior achievement is related to opt-out patterns. Table 2 lists correlations between the percent of students opting out in a classroom and the classroom’s average prior achievement. Correlations are zero in the random condition across all magnitudes, and −0.04 to −0.05 in the highest probability condition, by design. In the other two conditions, the correlations between prior achievement and percent opt out in a classroom are much higher, which, as mentioned, was intentional to create extreme scenarios.

Table 2.

Correlations Between Percent Opt Out and Prior Achievement for All Conditions.

	5%	10%	20%
Random	0.00	0.00	0.00
Highest probability	−0.04	−0.04	−0.05
Highest achieving	0.46	0.69	0.60
Lowest achieving	0.64	0.50	0.74

Table 3.

Correlations Between Complete and Incomplete Value-Added Estimates from Each Condition and Magnitude.

	Unshrunk Fixed Effects			Shrunk Fixed Effects
	5%	10%	20%	5%	10%	20%
Random	0.9962	0.9826	0.9921	0.9963	0.9923	0.9831
Highest probability	0.9958	0.9913	0.9812	0.9959	0.9819	0.9916
Highest achieving	0.9954	0.9915	0.9819	0.9956	0.9919	0.9827
Lowest achieving	0.9955	0.9914	0.9814	0.9957	0.9918	0.9824

Table 4.

RMSD for Unshrunk and Shrunk Fixed Effects for Each Condition and Magnitude.

	Unshrunk F.E.			Shrunk F.E.
	5%	10%	20%	5%	10%	20%
Random	0.019	0.028	0.042	0.016	0.023	0.035
Highest probability	0.020	0.029	0.043	0.017	0.024	0.036
Highest achieving	0.025	0.035	0.050	0.020	0.027	0.039
Lowest achieving	0.021	0.029	0.042	0.017	0.024	0.036

Note. RMSD = root mean square difference; F.E. = fixed effects.

Table 5.

Value-Added Classification Rules for Florida (FL) and New York (NY).

	Florida	New York
Highly effective	VAM score is positive and both the 68% and 95% confidence intervals are entirely positive	MGP (or VAM estimate) is 1.5 standard deviations above the mean, and the confidence interval does not include zero
Effective	VAM score is not classified as Highly Effective, Needs Improvement, or Unsatisfactory	MGP (or VAM estimate) is less than 1.5 standard deviations above the mean and more than 1 standard deviation above the mean, and the estimate can have any confidence interval
Needs improvement (FL)/developing (NY)	VAM score is negative and the 68% confidence interval is entirely negative, but the 95% confidence interval includes 0	MGP (or VAM estimate) is less than 1.5 standard deviations below the mean and greater than or equal to 1 standard deviation, and the upper limit of the confidence interval is below the mean
Unsatisfactory (FL)/ineffective (NY)	VAM score is negative and both the 68% and 95% confidence intervals are entirely negative	MGP (or VAM estimate) is 1.5 standard deviations below the mean, and the lower limit of the confidence interval is less than 0.75 standard deviations below the mean.

Note. VAM = value-added measure; MGP = mean growth percentile.
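The Florida rules in Table 5 can be expressed as a small classification function. This sketch assumes symmetric, normal-theory confidence intervals built from each teacher’s standard error; the function name is hypothetical, and the New York rules are omitted because they also depend on the mean and standard deviation of the distribution of estimates.

```python
from statistics import NormalDist

# Half-width multipliers for two-sided normal intervals (assumption: symmetric normal CIs)
Z68 = NormalDist().inv_cdf(0.84)   # about 0.994
Z95 = NormalDist().inv_cdf(0.975)  # about 1.960


def florida_rating(va, se):
    """Rate one teacher with the Florida rules in Table 5 (hypothetical helper)."""
    upper68, lower68 = va + Z68 * se, va - Z68 * se
    upper95, lower95 = va + Z95 * se, va - Z95 * se
    if va > 0 and lower68 > 0 and lower95 > 0:
        return "Highly effective"      # both intervals entirely positive
    if va < 0 and upper68 < 0 and upper95 < 0:
        return "Unsatisfactory"        # both intervals entirely negative
    if va < 0 and upper68 < 0 and lower95 <= 0 <= upper95:
        return "Needs improvement"     # 68% interval negative, 95% interval includes 0
    return "Effective"                 # all remaining cases
```

Under these assumptions, an estimate of −0.15 with a standard error of 0.10 has a 68% interval that is entirely negative but a 95% interval that includes zero, so it is rated Needs improvement.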

The standard deviation of unshrunk fixed effects (hereafter referred to as unshrunk F.E.) ranged from 0.22 to 0.25 across magnitude and opt-out conditions. For shrunk fixed effects (hereafter referred to as shrunk F.E.), the standard deviation ranged from 0.18 to 0.20. Correlations between average prior achievement and unshrunk fixed effects, as well as shrunk fixed effects, ranged between 0.07 and 0.08.

Stability of Value-Added Estimates

Correlations

As a first step toward investigating the stability of value-added estimates, we calculated Pearson correlations between the complete estimates and the incomplete estimates from each of the four opt-out conditions and three magnitude conditions, for a total of 12 correlation coefficients for each type of estimate. Table 3 shows that the correlations are generally above 0.99 for the 5% and 10% magnitude conditions. The correlations are approximately 0.98 for the 20% condition, only slightly lower than for the other two magnitudes. These results suggest that the rank ordering of the estimates is stable.

Root Mean Square Difference of Value-Added Estimates

As outlined in the Method section, we calculated the RMSD between the complete value-added estimates, with all students included, and the incomplete value-added estimates for each opt-out condition (4 conditions) in each of the three magnitude conditions. The RMSDs for the random condition serve as a baseline against which we can compare estimates from the other three conditions to determine the extent to which the simulated nonrandomness affects the estimates.

As Table 4 shows, RMSDs increase in each condition as the magnitude of opt out increases, for both unshrunk and shrunk estimates, with slightly smaller RMSDs for shrunk estimates. Across conditions, the average RMSD ranges from 0.016 to 0.025 in the 5% magnitude condition, 0.023 to 0.035 in the 10% condition, and 0.035 to 0.050 in the 20% condition. Because the standard deviation of unshrunk fixed effects ranged from 0.22 to 0.25, an RMSD of 0.05 in the 20% condition corresponds to roughly 0.20 to 0.23 of a standard deviation of teacher value-added, which is sizable. This can be interpreted to mean that a typical teacher’s value-added estimate could shift up or down by roughly a fifth of a standard deviation if 20% of students opt out.

Across opt-out conditions, RMSDs are fairly consistent, with the exception of the highest achieving condition, which shows a slight increase over the random condition. This increase of approximately 0.004 to 0.008 in RMSD across magnitudes is small, but it does represent a difference attributable to this type of nonrandomness, in which 50% of the students opting out are the highest achieving students in the highest achieving classrooms. Figure 1 shows that average prior achievement is fairly constant across the percent of opt out in all conditions except the highest achieving condition, where average prior achievement increases as percent opt out increases, by design. Figure 2 shows, for each condition, how the difference in value-added increases as the magnitude of opt out increases. The difference in value-added is fairly consistent across the percentage of opt out except in the highest probability condition, which we might expect given the simulation design: the student-level probability of opting out was calculated using the student-level correlation with prior achievement, as well as the correlation between average prior achievement in the classroom and the percent of students opting out.

Figure 1.

Average prior achievement by percent opt out by magnitude and condition (mspline smoothing, bands = 25).

Figure 2.

Difference in complete and incomplete value-added estimates by percent opt out: All conditions (mspline smoothing, bands = 25).

Classification Agreement

As a final investigation into the stability of the value-added estimates, and the one with the most direct consequences for teachers, we classified the complete and incomplete fixed effects into the teacher “effectiveness” categories used by two states, Florida and New York (see Table 5). We then calculated the percentage of teachers whose complete and incomplete ratings agreed. We discuss the results for each state in more detail next. Table 6 shows that classification agreement using Florida’s approach is similar across opt-out conditions and decreases as the magnitude of opt out increases, for both unshrunk and shrunk fixed effects estimates. For instance, in the 5% opt-out condition with unshrunk estimates, teachers are classified the same 94% of the time; that is, 94% of teachers would receive the same rating with no opt out as when approximately 5% of students opt out. When 20% of students opt out, approximately 13% to 16% of teachers would receive a different rating in Florida, depending on the type of opt out and whether shrinkage is applied.

Table 6 also shows that classification agreement is nearly identical (within about 1 percentage point) whether shrunk or unshrunk fixed effects estimates are used. Classification agreement is also slightly higher in the highest and lowest achieving conditions, which may not be an intuitive result. This increase could be due to the smaller number of teachers receiving ratings in these two conditions, a possibility we investigate in a later section. Last, classification changes were typically ±1 rating category for all conditions and magnitudes; on average, fewer than 0.1% of teachers changed by more than one category across conditions.
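Agreement statistics of the kind reported here (the share of teachers with the same rating, and the share moving by more than one category) could be computed as in the following sketch. The rating order, the function name, and the choice to compare only teachers rated in both runs are assumptions; New York’s labels (developing, ineffective) would replace the Florida labels used here.

```python
RATINGS = ["Unsatisfactory", "Needs improvement", "Effective", "Highly effective"]  # low to high


def agreement_stats(complete, incomplete, order=RATINGS):
    """Compare complete and incomplete ratings for the same teachers (hypothetical helper).

    complete, incomplete: dicts mapping a teacher id to a rating label.
    Returns (teachers compared, share with the same rating, share moving > 1 category).
    """
    # Assumption: teachers without an incomplete rating (excluded after opt out) are skipped
    common = [t for t in complete if t in incomplete]
    if not common:
        raise ValueError("no teachers rated in both runs")
    rank = {label: i for i, label in enumerate(order)}
    same = sum(complete[t] == incomplete[t] for t in common)
    far = sum(abs(rank[complete[t]] - rank[incomplete[t]]) > 1 for t in common)
    return len(common), same / len(common), far / len(common)
```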

Table 6.

Average Number and Percent of Teachers Remaining in Same Rating Category—Florida Classification System.

		Florida
		Random		Highest probability		Highest achieving		Lowest achieving
	Magnitude	n	%	n	%	n	%	n	%
Unshrunk F.E.	5%	91,535	0.94	91,262	0.94	91,531	0.94	91,715	0.94
	10%	87,827	0.91	87,409	0.90	88,146	0.91	88,217	0.91
	20%	80,626	0.85	79,898	0.85	77,625	0.87	78,307	0.86
Shrunk F.E.	5%	91,196	0.94	90,684	0.93	90,948	0.94	91,312	0.94
	10%	86,946	0.90	86,728	0.90	87,276	0.91	87,379	0.91
	20%	79,645	0.84	79,000	0.84	76,780	0.86	77,762	0.86

Note. F.E. = fixed effects.

Results using New York’s classification approach are presented in Table 7. As with the Florida approach, classification agreement is similar across opt-out conditions; however, on average, agreement is slightly higher across all conditions and magnitudes. This is likely due to New York’s more rigorous rules for what constitutes a difference from average large enough to warrant a rating of consequence. One noticeable difference from the Florida approach is that agreement is lower when shrunk estimates are used for classification than when unshrunk estimates are used: approximately 3 to 5 percentage points lower across all magnitudes and conditions. The lower agreement could be due to the increased standard errors used in creating confidence intervals and the more rigorous rules for classification into rating categories.

Table 7.

Average Number and Percent of Teachers Remaining in Same Rating Category—New York Classification System.

		New York
		Random		Highest probability		Highest achieving		Lowest achieving
	Magnitude	n	%	n	%	n	%	n	%
Unshrunk F.E.	5%	94,480	0.97	94,353	0.97	94,452	0.97	94,532	0.97
	10%	92,519	0.96	92,071	0.95	92,427	0.96	92,522	0.96
	20%	88,059	0.93	87,349	0.93	83,914	0.94	85,077	0.94
Shrunk F.E.	5%	90,776	0.93	90,640	0.93	90,788	0.94	90,786	0.93
	10%	88,552	0.91	88,373	0.91	88,665	0.92	88,773	0.92
	20%	83,679	0.88	82,885	0.88	79,828	0.89	80,896	0.89

Note. F.E. = fixed effects.

Classification Agreement by Prior Achievement Quartile

As a further investigation into classification agreement, we calculated agreement by prior achievement quartile, because we might expect teachers with higher achieving students or classrooms to receive different ratings. As a reminder, in the highest achieving condition, 50% of excluded students were the highest achieving students in high-achieving classrooms. In the lowest achieving condition, 50% of excluded students were the lowest achieving students in high-achieving classrooms. We anticipated two possible outcomes for these teachers. First, they might not receive ratings, because their classrooms were more likely to have higher rates of opt out. Second, they might receive different ratings because their relative rank in the distribution changed. Table 8 shows that teachers of higher achieving students have lower classification agreement when those students opt out. Using shrunk estimates with Florida’s approach where 20% of students opt out, about 70% of top-quartile teachers would receive the same rating in the highest and lowest achieving conditions. Using New York’s approach, agreement for top-quartile teachers increases to 81% in both conditions. Otherwise, results are fairly similar across conditions and across prior achievement quartiles.

Table 8.

Percent of Teachers Remaining in the Same Rating Category by Prior Achievement Quartile (20% Condition)—Florida and New York Classification Approaches.

	Prior achievement	Random	Highest probability	Highest achieving	Lowest achieving
Florida	Bottom quartile	0.84	0.84	0.89	0.89
	Second	0.84	0.84	0.90	0.89
	Third	0.84	0.84	0.90	0.89
	Top quartile	0.84	0.85	0.70	0.72
New York	Bottom quartile	0.89	0.88	0.92	0.91
	Second	0.88	0.88	0.91	0.91
	Third	0.88	0.88	0.91	0.91
	Top quartile	0.88	0.88	0.81	0.81
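Agreement by prior achievement quartile can be sketched by binning teachers on classroom mean prior achievement. The quartile cut points here come from Python’s `statistics.quantiles` with its default method, which may differ from the binning used in the study; the function name and inputs are hypothetical.

```python
from statistics import quantiles


def agreement_by_quartile(class_prior, complete, incomplete):
    """Share of teachers keeping the same rating within quartiles of classroom prior achievement.

    class_prior: dict mapping teacher id -> mean prior achievement of the classroom
    complete, incomplete: dicts mapping teacher id -> rating label
    Returns {quartile: share in agreement}, with 1 = bottom quartile and 4 = top quartile.
    """
    cuts = quantiles(class_prior.values(), n=4)  # three cut points
    counts = {}
    for teacher, prior in class_prior.items():
        if teacher not in complete or teacher not in incomplete:
            continue  # excluded teachers have no rating to compare
        quartile = 1 + sum(prior > c for c in cuts)
        hits, total = counts.get(quartile, (0, 0))
        counts[quartile] = (hits + (complete[teacher] == incomplete[teacher]), total + 1)
    return {q: hits / total for q, (hits, total) in sorted(counts.items())}
```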

Finally, Table 9 shows the percent of teachers who would be excluded under each opt-out and magnitude condition. Exclusion is similar (about 1%) in the 5% and 10% magnitude conditions but increases to 3% to 9% in the 20% magnitude condition. In the 20% highest achieving condition, approximately 9% of teachers do not receive a value-added estimate; in the 20% lowest achieving condition, approximately 7% do not.

Table 9.

Number and Percent of Teachers Excluded From Value-Added Estimation.

	5%		10%		20%
	n	%	n	%	n	%
Random	787	0.01	844	0.01	3,327	0.03
Highest probability	834	0.01	1,018	0.01	4,076	0.04
Highest achieving	1,041	0.01	1,331	0.01	8,642	0.09
Lowest achieving	922	0.01	1,161	0.01	7,447	0.07

Discussion

The purpose of this study was to investigate the extent to which student opt out of state assessments used for accountability affects value-added measures of teacher effectiveness. As mentioned in the Introduction, there has been a substantial increase in the number and proportion of students choosing to forgo assessments in some states, and the reasons appear to vary across locales. In New York, the State Education Department found that students who opted out were from wealthier districts and were slightly more likely to be lower achieving than students who took the assessment. Rice et al. (2016) found corroborating results in New York and added that opt-out students in the districts they studied were also slightly more likely to require special education services. In Oregon, opt-out students were reported to be wealthier, higher achieving students (Hammond, 2015).

Given the demographic trends of this phenomenon, it is fair to say that opting out is potentially nonrandom and that students who are no longer included in the test-taking population are systematically different. Accepting these facts, one could hypothesize that excluding these students from accountability measures (both achievement and growth) could affect calculations and the resulting inferences about schools and educators. This study focused specifically on the extent to which growth measures, as implemented in a value-added model and used for educator accountability, are affected by nonrandom opt-out trends of various magnitudes.

Several prominent findings contribute to the discussion about the impact of opt out on value-added estimates. The magnitude of opt out did appear to have a large impact on stability statistics: RMSDs of value-added estimates roughly doubled when opt out increased from 5% to 20% of students. As the magnitude of opt out increased, classification agreement dropped 7 to 10 percentage points using Florida’s classification approach and 3 to 5 percentage points using New York’s approach. The types of opt out that we simulated did not appear to have a strong impact on classification agreement for a majority of teachers in the study. As mentioned, classification agreement was lowest for teachers of high-achieving students in the two conditions that were meant to represent extreme examples of opt out. In the highest achieving, 20% opt-out condition, where half the students opting out were the highest achieving, classification agreement for top-quartile teachers dropped to 70% using the Florida classification approach. As mentioned, this was meant to represent an extreme example, where high-achieving students systematically choose to opt out of the assessment, which has not yet been reported in the United States. In the more realistic scenario where students were chosen to opt out based on their probability of opting out, classification agreement was consistent across prior achievement quartiles.

The use of a shrinkage estimator did not appear to have a substantial impact on value-added estimates either, with only a slight decrease in RMSD when shrinkage was applied across conditions. Somewhat counterintuitively, classification agreement decreased when shrinkage was applied under New York’s classification approach. The decrease could be due to the size of the shrinkage adjustment made to value-added point estimates or standard errors for teachers with small numbers of students.

A substantial portion of teachers (up to approximately 9%) were completely excluded from classification in the highest and lowest achieving conditions, which could also be driving changes in teacher classifications. As mentioned, the complete and incomplete value-added point estimates were highly correlated (above 0.98 for all conditions), which suggests that rank ordering was relatively similar for teachers with estimates in both scenarios. Exclusion because of opt out also represents a challenge for those designing evaluation systems if they do not have plans in place for a substitute measure when value-added cannot be calculated.

As may be expected, the classification approach employed by states and districts has a substantial impact on stability estimates. Classification agreement across opt-out conditions is lower using Florida’s approach than New York’s, which could be due to Florida’s less restrictive rules about which teachers receive a rating of consequence. New York requires point estimates to be 1.5 standard deviations above or below the mean and the confidence interval to also be significantly different from the mean. In Florida, point estimates need only be above (or below) the mean, with a confidence interval that does not include zero. Florida’s rules, as implemented, could allow small changes in a point estimate or confidence interval to result in a change in teacher classification.

Limitations

This is a simulation study, which carries some limitations regarding generalization to realistic settings. While the simulation captures many important aspects of generating nested classroom scale score data, some factors were not controlled. For instance, students in realistic settings are also affected by grade-, school-, and district-level influences, which were not included in this study. The ICCs of scale scores and percent of opt out were used to represent between-classroom differences that suggest nonrandom assignment of students to teachers, but the teacher-level ICC neglects school- and district-level differences in achievement. Including school and district effects might also create more variation in the teacher fixed effects generated in this study and should be considered in future analyses.

In addition, the empirical data used to generate the parameter estimates came from a subsample of the state and did not fully represent it. Only 37.6% of students in our sample were classified as living in poverty, compared with 51.9% statewide. This difference may affect the generated parameter estimates somewhat if students living in poverty tend to have different growth trajectories than those who do not (which is the case in other locales). That said, we based our simulation conditions on data observed from actual test administrations in a state, so there should be reasonable generalizability, and the opt-out conditions we simulated could realistically occur at any time.

Last, we created two extreme opt-out conditions because very little has been published on the types of students who opt out. We were able to obtain data from one state that allowed us to simulate a realistic scenario (highest probability), but even that was designed as a function of prior achievement only, because of data availability challenges. Additionally, the designs of the other two conditions (highest achieving and lowest achieving) were chosen by the author and may not approximate realistic scenarios. That said, the results in those two conditions were not substantially worse than those in the condition based on empirical data, which could allay some state or district concerns about their own opt-out patterns.

Conclusion

This study has several implications for accountability and teacher evaluation efforts across the United States. States put growth measures in place to hold teachers, schools, and districts accountable for improving student learning, yet some teachers may ultimately not be held accountable because too few of their students remain eligible for inclusion in value-added estimates. Opt out can therefore affect which teachers are classified as effective or not. The possibility that teachers (or their friends) may encourage students to opt out makes this a potentially troubling problem.

Furthermore, most states employ a normative classification scheme for teacher effectiveness, in which a teacher’s place in the distribution of fixed effects ultimately determines their classification. Given how opt out affects this distribution, the results provide another reason for states and districts to consider a criterion-referenced classification system. As seen in the literature, such a system requires experts to determine what qualifies as low, average, and high “growth.” A criterion-referenced classification system may be preferable to one in which teachers are more likely to change classifications because of another teacher’s (or their students’) behavior. However, criterion-referenced systems bring their own challenges, such as the ongoing need to define standards and thresholds. Additionally, state assessment designs change every several years, which could make maintaining such a system more difficult.

The results also suggest that states and districts should consider standard errors when classifying teachers into effectiveness categories. In this article, we used standard errors when classifying teachers to mitigate misclassification, and we advocate that states and districts do the same. Growth estimates appear to be relatively robust to opt out, but the same may not be true of achievement measures used for school accountability if opt out is, in fact, systematically driven by student characteristics. Finally, as noted in the literature, many parents and teachers have concerns about test-based accountability. Policymakers should continue to investigate the intended and unintended impacts of using assessments for various decisions in education to ensure that the system is achieving the desired results for students.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID ID

Joshua Marland

References

Albright. (2016, January 14). Veto override on testing opt-out fails in house. The News Journal. Retrieved from http://www.delawareonline.com/story/news/education/2016/01/14/opt-out-vote/78785656/

American Institutes for Research. (2016). 2014–15 growth model for educator evaluation (Technical Rep.). Washington, DC: Author. Retrieved from https://www.air.org/search/site/2014%E2%80%9315%20growth%20model%20for%20educator%20evaluation%3A

Bennett. (2016). Opt out: An examination of issues (ETS Research Report Series). Princeton, NJ: Educational Testing Service. doi:10.1002/ets2.12101

Camera. (2015). States seek guidance in face of “opt out” push. Education Week, 34(26), 15-17. Retrieved from http://www.edweek.org/ew/articles/2015/04/01/states-seek-guidance-in-face-of-opt-out.html

Castellano, K. E., & Ho, A. D. (2013). A practitioner’s guide to growth models (Research Rep.). Washington, DC: Council of Chief State School Officers. Retrieved from https://scholar.harvard.edu/files/andrewho/files/a_practitioners_guide_to_growth_models.pdf

Castellano, K. E., & Ho, A. D. (2015). Practical differences among aggregate-level conditional status metrics: From median student growth percentiles to value-added models. Journal of Educational and Behavioral Statistics, 40(1), 35-68.

Clark. (2015, January 30). Assembly bill would allow students to opt out of testing without penalty, sponsor says. NJ.Com. Retrieved from http://www.nj.com/politics/index.ssf/2015/01/parcc_opt-out_bill_introduced_in_nj_assembly.html

Croft & Lee. (2016, June). Survey and analysis of state opt-out and required test participation legislation (ACT Working Paper No. 2016-03). Retrieved from http://www.act.org/content/dam/act/unsecured/documents/Working-Paper-2016-03-Survey-and-Analysis-of-State-Opt-Out-and-Required-Participation.pdf

Florida Department of Education. (2016). 2015-16 Annual Legislative Report on Teacher Evaluations. Retrieved from http://www.fldoe.org/core/fileparse.php/7503/urlt/1516AnnualLegisReportTeacherEval.pdf

Guarino, Reckase, Stacy, & Wooldridge. (2014, February). A comparison of growth percentile and value-added measures of teacher performance (Working Paper No. 39). Retrieved from http://education.msu.edu/epc/publications/documents/WP39AComparisonofGrowthPercentileandValue-AddedModel.pdf

Hammond. (2015, June 9). Oregon risks losing $140 million for enabling kids to skip Common Core tests, feds warn. The Oregonian. Retrieved from http://www.oregonlive.com/education/index.ssf/2015/06/new_oregon_testing_law_could_j.html

Institute of Education Sciences. (2007). National assessment of Title I final report: Summary of key findings (Report No. NCEE 2007-4014). Washington, DC: United States Department of Education.

Katz. (2013). Information on student participation in state assessments (Government letter). Retrieved from www.p12.nysed.gov/assessment/ei/2013/student-participation.pdf

Klein. (2015, December 22). Ed. dept. to states: Even under ESSA, you need a plan for high opt-out rates [Web log post]. Retrieved from http://blogs.edweek.org/edweek/campaign-k-12/2015/12/ed_dept_to_states_under_essa_need_plan_for_opt-Outs.html

New York State Education Department. (2015). State education department releases spring 2015 grades 3-8 assessment results [Press Release]. Retrieved from http://www.nysed.gov/news/2015/state-education-department-releases-spring-2015-grades-3-8-assessment-results#_ftn2

New York State Education Department. (2017). NYS Grades 4-8 teacher growth scores: From MGP to HEDI Ratings and Scores 2016-17. Retrieved from http://www.nysed.gov/common/nysed/files/programs/state-growth-measures-toolkits/2016-17-classification-assignment-scores-teachers.pdf

The National Center for Fair and Open Testing. (2015). More than 670,000 refused tests in 2015. Retrieved from http://www.fairtest.org/more-500000-refused-tests-2015

No Child Left Behind (NCLB) Act of 2001, 20 USC § 6301 et seq.

Pizmony-Levy & Green Saraisky. (2016). Who opts out and why? Results from a national survey on opting out of standardized tests. Retrieved from https://academiccommons.columbia.edu/doi/10.7916/D8K074GW

Rice, Marland, & Meyer. (2016, March). The impact of student assessment opt out on achievement and growth metrics in New York state. Paper presented at the Association for Educational Finance and Policy annual conference, Denver, CO.

Ujifusa. (2015a, June 23). Testing opt-out bill signed by Oregon Gov. Kate Brown; Delaware next? [Web log post]. Education Week. Retrieved from http://blogs.edweek.org/edweek/state_edwatch/2015/06/testing_opt-out_bill_signed_by_oregon_gov_kate_brown_delaware_next.html

Ujifusa. (2015b, December 23). Education department asks 13 states to address low test-participation rates [Web log post]. Education Week. Retrieved from http://blogs.edweek.org/edweek/campaign-k-12/2015/12/twelve_states_asked_to_address.html

U.S. Department of Education. (2013). State and local report cards. Washington, DC: Author. Retrieved from http://www2.ed.gov/programs/titleiparta/state_local_report_card_guidance_2-08-2013.pdf

Walker. (2015). New Jersey senate passes PARCC opt-out resolution. Retrieved from https://www.heartland.org/news-opinion/news/new-jersey-senate-passes-parcc-opt-out-resolution