Abstract
Background:
To limit the influence of attrition bias in assessments of intervention effectiveness, several federal evidence reviews have established a standard for acceptable levels of sample attrition in randomized controlled trials. These evidence reviews include the What Works Clearinghouse (WWC), the Home Visiting Evidence of Effectiveness Review, and the Teen Pregnancy Prevention Evidence Review. We believe the WWC attrition standard may constitute the first use of model-based, empirically supported bounds on attrition bias in the context of a federally sponsored systematic evidence review. Meeting the WWC attrition standard (or one of the attrition standards based on the WWC standard) is now an important consideration for researchers conducting studies that could potentially be reviewed by the WWC (or other evidence reviews).
Objectives:
The purpose of this article is to explain the WWC attrition model, describe how that model is used to establish attrition bounds, and assess the sensitivity of attrition bounds to key parameter values.
Research Design:
Results are based on equations derived in the article and values generated by applying those equations to a range of parameter values.
Results:
The authors find that the attrition boundaries are more sensitive to the maximum level of bias that an evidence review is willing to tolerate than to other parameters in the attrition model.
Conclusions:
The authors conclude that the most productive refinements to existing attrition standards may be with respect to the definition of “maximum tolerable bias.”
Sample attrition has been identified as potentially the most serious flaw that can lead to biased impact estimates in randomized controlled trials (RCTs; Greenberg & Barnow, 2014). 1 In short, attrition undermines the greatest strength of the RCT—a treatment and control group that are equivalent in expectation with respect to all variables, both observed and unobserved. This is because the process by which individuals leave the study sample could be related to both treatment status and the outcomes of interest, leading to systematic differences between the individuals remaining in the treatment and control groups. Although attrition might be unrelated to outcomes and treatment status, it is rarely possible to confirm this, because the researcher typically does not control or understand the process by which individuals leave the study sample. Consequently, findings from RCTs that experience sample attrition should be regarded as less credible than findings from RCTs that experience no sample attrition. Missing outcomes data potentially leading to bias in general—not just in RCTs—is a well-known issue that has received considerable attention in the methodological literature (Little & Rubin, 2002; Puma, Olsen, Bell, & Price, 2009; Rubin, 1976).
In this article, we define sample attrition to mean outcome data that are missing for the randomized sample, regardless of the reason the data are missing. Some examples of the mechanisms by which outcome data could be missing include: (1) survey nonresponse, (2) lack of consent to data collection among the randomized sample, (3) sample units falling out of administrative data, and (4) endogenous exclusions of sample members by the researcher. However, loss of sample that occurs prior to randomization does not count as attrition.
Although survey nonresponse often does not appear to introduce bias in the context of descriptive studies (e.g., Groves, 2006; Groves & Peytcheva, 2008), we believe attrition is a greater concern in an RCT because the effects of attrition could differ between the treatment and control groups. RCTs, by their very nature, involve an effort to differentially affect the outcomes of study participants (through the interventions to which subjects are randomly assigned), not just passively measure participants’ outcomes. This attempt to differentially affect the outcomes of study participants in the treatment and control groups could also differentially affect attrition (both the rates of attrition and the characteristics of those who attrite). For example, a study that uses a lottery to randomly assign students to attend a charter school or traditional public school could affect some parents’ decisions to stay in a school district or move to a new school district. Some of the parents whose children were not assigned to the charter school might move their children out of the school district (e.g., into a private school), which could result in greater attrition in the control group if the study is unable to collect outcome data for children who leave the school district. At follow-up, this endogenous attrition could result in a sample in the treatment group that is no longer comparable to the sample in the control group.
To limit the potential for attrition bias to influence assessments of intervention effectiveness, three federally sponsored systematic evidence reviews have established a standard for acceptable levels of sample attrition in RCTs based on the attrition model we describe in this article. These evidence reviews include the What Works Clearinghouse (WWC; http://ies.ed.gov/ncee/wwc/), the Home Visiting Evidence of Effectiveness Review (HomVEE; http://homvee.acf.hhs.gov/), and the Teen Pregnancy Prevention Evidence Review (PPRER; http://www.hhs.gov/ash/oah/oah-initiatives/teen_pregnancy/db/).
The WWC attrition standard was the progenitor of the other two standards. Initially, the WWC attrition standard consisted of arbitrary cutoffs with respect to the overall attrition rate and the difference in attrition rates between the treatment and control groups. This initial standard had no empirical basis and no theoretical basis beyond the basic understanding that less attrition is likely preferable to more attrition. However, starting with version 2.0 of the WWC Procedures and Standards Handbook (WWC, 2008), the WWC attrition standard transitioned from a set of arbitrary cutoffs to a range of cutoffs based on a mathematical model of attrition bias with the choice of key parameters informed by empirical data analysis.
Although others have developed approaches to bounding bias in a variety of contexts (e.g., Lee, 2009; Manski, 1990), we believe the WWC attrition standard might be the first use of model-based, empirically supported bounds on attrition bias in the context of a federally sponsored systematic evidence review. Meeting the WWC attrition standard (or one of the attrition standards based on the WWC standard) is now required in order to receive the highest evidence rating by the WWC (or other evidence reviews).
The purpose of this article is to explain the WWC attrition model, describe how evidence reviews use the model to establish attrition bounds, and assess the sensitivity of attrition bounds to key parameter values. This article is not a broad assessment of evidence review standards (e.g., it will not discuss how researchers might be incentivized to alter their practices to meet standards). We conclude with a discussion of opportunities to refine existing attrition standards or to tailor the attrition standard to other substantive areas. For readers interested in strategies for avoiding attrition in their own studies or methods for handling attrition at the analysis stage, we suggest consulting the literature on those topics (e.g., Brueton et al., 2014; Deke & Puma, 2013; Puma et al., 2009; Zweben, Fucito, & O’Malley, 2009).
Origins of the WWC Attrition Standard
Pre-2008, the WWC’s attrition standard consisted of fixed cutoffs that varied by topic area. Principal investigators for each topic area selected those cutoffs. For example, the elementary school mathematics topic area required that studies have an overall attrition rate no greater than 20% and a difference in attrition rates between the treatment and control groups no greater than 7% (WWC, 2007), whereas the dropout prevention topic area required that studies have an overall attrition rate no greater than 30% and a difference in attrition rates between the treatment and control groups no greater than 5% (WWC, 2006). The best justification for these cutoffs might have been their similarity to those that the Office of Management and Budget (OMB) and the National Center for Education Statistics (NCES) used for survey response rates.
The OMB had established a response rate target of 80% for federal data collection efforts. OMB guides federal agencies on collecting statistical information, including response rates. In a 2006 OMB memo, the administrator of the Office of Information and Regulatory Affairs states that surveys with an anticipated response rate less than 80% should have plans to evaluate nonresponse bias and a clear justification as to why the response rate is adequate. The OMB does not require plans for addressing nonresponse bias in cases where the response rate is greater than 80%.
The NCES’s (2002) Statistical Standards had established a response rate target of 85%. According to the standard, “any survey stage of data collection with a unit or item response rate less than 85 percent must be evaluated for the potential magnitude of nonresponse bias before the data or any analysis using the data may be released” (NCES Standard 4-4-1).
A key motivation for developing the current WWC attrition standard was that the OMB and NCES standards did not seem appropriate for controlling attrition bias at acceptable levels in the context of RCTs. The OMB and NCES standards on attrition appear to be informed by the response rates that previous surveys had achieved, not by analyzing what attrition bias might be under plausible assumptions regarding the relationship between the propensity to respond and outcomes of interest. (E.g., as justification for its standard, the OMB cites that two thirds of the federal data collection efforts considered in its 2006 memo achieved response rates of 80% or greater.) Also, the OMB and NCES standards do not acknowledge differential attrition between treatment and control groups. This is understandable, given the broad range of studies (most of which are not RCTs) to which these standards are meant to be applied.
Because these standards did not appear to have any theoretical or empirical support in the context of RCTs, in 2008, the WWC Statistical, Technical, and Analysis Team (STAT) developed the WWC attrition model and accompanying attrition standard. 2
The WWC Attrition Model
If attrition were unrelated to outcomes, then attrition would not lead to bias. When attrition is related to outcomes, different rates of attrition between the treatment and control groups can lead to biased impact estimates. Furthermore, when attrition is related to outcomes and that relationship differs between the treatment and control groups, then attrition can lead to bias even if the attrition rate is the same in both groups. The focus here is to specify a model showing how bias depends on the correlation between outcomes and attrition and the combination of overall and differential attrition in an RCT. This model is intentionally a simple, reduced-form statistical model, not a structural/behavioral model. The model is only intended to facilitate making useful distinctions between findings with respect to attrition bias, not to understand why attrition happens or how to prevent it.
To set up the model, consider a variable representing an individual’s latent (unobserved) propensity to respond, z. Assume z is normally distributed with a mean of 0 and standard deviation (SD) of 1. If the proportion of individuals who respond is P, 3 an individual is a respondent if his or her value of z exceeds a threshold, z*:
$$z^{*} = \Phi^{-1}(1 - P), \tag{1}$$
where Φ is the standard normal cumulative distribution function. For example, if 75% of individuals respond (P = 0.75), an individual is a respondent if his or her value of z exceeds the value corresponding to the 25th percentile in the z distribution (i.e., exceeds Φ−1[1 − 0.75]).
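The threshold calculation in the example above can be sketched in a few lines of Python. This is only an illustration: the function name `response_threshold` is hypothetical, and the code relies on the standard library's `statistics.NormalDist` for the inverse normal CDF.

```python
from statistics import NormalDist


def response_threshold(response_rate: float) -> float:
    """Return z* such that a standard-normal z exceeds z* with probability response_rate.

    `response_threshold` is a hypothetical helper name used for illustration.
    """
    if not 0.0 < response_rate < 1.0:
        raise ValueError("response_rate must be strictly between 0 and 1")
    # z* = Phi^{-1}(1 - P): the (1 - P) quantile of the standard normal.
    return NormalDist().inv_cdf(1.0 - response_rate)


# With a 75% response rate, the threshold is the 25th percentile of z.
z_star = response_threshold(0.75)
print(round(z_star, 4))  # about -0.6745
```

A negative threshold here simply reflects that more than half of the sample responds, so the cutoff sits below the mean of z.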
The outcome at follow-up, y, is the key variable of interest. We assume that y has a normal distribution. Moreover, we assume that y has a mean of 0 and an SD of 1, given that any variable can be standardized in this way. The relationship between y and z can then be modeled as
$$y = \alpha z + u, \tag{2}$$
where α is the correlation between z and y, and u is a random variable that is independent of z. 4 Note that the model assumes no effect of the treatment on the mean outcome, there are no covariates, and the model does not account for imputed outcomes (see the Discussion section). If α is 1 or –1, the entire outcome is explained by the propensity to respond. If α is 0, none of the outcome is explained by the propensity to respond, which is the case when attrition is completely random.
The model is constructed so that the intervention has no impact on the mean outcomes of those randomized. However, the intervention can affect two things: (1) the correlation between outcomes and the latent propensity to respond, α, and (2) the response rate, P (equivalently, z*). Therefore, we specify Equations 1 and 2 separately for treatment and control group members (subscripted by t and c, respectively):
$$y_t = \alpha_t z_t + u_t, \qquad z_t^{*} = \Phi^{-1}(1 - P_t), \tag{3}$$
$$y_c = \alpha_c z_c + u_c, \qquad z_c^{*} = \Phi^{-1}(1 - P_c). \tag{4}$$
Because there is no true impact of the intervention on mean outcomes in this model, an unbiased estimator of the impact should, in expectation, find no difference in outcomes between the treatment and control groups. Therefore, in the presence of attrition, bias is equal to the difference between the expected values of yt and yc among respondents. Based on the properties of truncated normal distributions, the analytic formula for the bias, B, is
$$B = \alpha_t \frac{\phi(z_t^{*})}{P_t} - \alpha_c \frac{\phi(z_c^{*})}{P_c}, \tag{5}$$
where φ is the standard normal density function and $z_g^{*} = \Phi^{-1}(1 - P_g)$ is the response threshold for group g (t or c). Note that if α t and α c are both 0, then bias is 0 regardless of the values of Pt and Pc. Similarly, if Pt and Pc are both 1, then bias is 0 regardless of the values of α t and α c.
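The truncated-normal step behind the bias formula can be spelled out as follows. This is a sketch under the model's stated assumptions, plus one assumption the text implies but does not state outright: u has mean 0 (which follows from y and z both having mean 0).

```latex
\begin{align*}
% Mean of a standard normal z truncated from below at z^*, using P = 1 - \Phi(z^*).
\mathbb{E}[z \mid z > z^{*}] &= \frac{\phi(z^{*})}{1 - \Phi(z^{*})} = \frac{\phi(z^{*})}{P}, \\
% u is independent of z with mean 0, so it drops out of the respondent mean.
\mathbb{E}[y \mid z > z^{*}] &= \alpha\,\mathbb{E}[z \mid z > z^{*}] + \mathbb{E}[u]
  = \alpha\,\frac{\phi(z^{*})}{P}, \\
% Bias is the treatment-control difference in respondent means.
B &= \mathbb{E}[y_t \mid z_t > z_t^{*}] - \mathbb{E}[y_c \mid z_c > z_c^{*}]
  = \alpha_t\,\frac{\phi(z_t^{*})}{P_t} - \alpha_c\,\frac{\phi(z_c^{*})}{P_c}.
\end{align*}
```

The ratio φ(z*)/P is 0 when everyone responds and grows as the response rate falls, which is why lower response rates magnify any given correlation α.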
Equation 5 shows that bias can be generated by treatment–control differences in the response rates (Pt and Pc ) or in the correlation between y and z (α t and α c ). If neither P nor α differs between these groups, then there is no bias because the same kinds of individuals respond from both groups. 5 One example of a scenario in which P and α might realistically be the same in the treatment and control groups is when attrition occurs only during a period when the study participants are blinded to their assigned condition and have not yet experienced their assigned intervention. For example, if students are randomly assigned at the end of the current school year to receive an intervention in the next school year, then study participants could be effectively blinded to their study condition over the summer so long as the results of randomization are only revealed at the beginning of the next school year. In that scenario, any changes in school enrollment that take place over the summer (before announcing the results of random assignment to parents and students) are unlikely to affect P or α.
However, if response rates differ between the treatment and control groups (Pt ≠ Pc ), then bias occurs even when α t = α c , because respondents in the treatment and control groups have different average values of z and, thus, different average values of y. Moreover, if α t ≠ α c , then impact estimates will be biased even if the response rate is the same in both groups; respondents from the two groups will have different average values of y stemming from the differences in α. It is possible that a difference in the rate of attrition between groups could offset a difference between α t and α c . However, in developing attrition bounds, we conservatively assume the opposite—that these differences are reinforcing, not offsetting.
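The two sources of bias described above can be illustrated numerically. The following is a minimal sketch, not the WWC's own code: the function names are hypothetical, and the response rates and correlations in the scenarios are illustrative inputs chosen here.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def respondent_mean_z(response_rate: float) -> float:
    """E[z | z > z*] = phi(z*) / P for standard-normal z and response rate P."""
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    if response_rate == 1.0:
        return 0.0  # nobody attrites, so respondents have the population mean
    z_star = _STD_NORMAL.inv_cdf(1.0 - response_rate)
    return _STD_NORMAL.pdf(z_star) / response_rate


def attrition_bias(p_t: float, p_c: float, alpha_t: float, alpha_c: float) -> float:
    """Expected treatment-minus-control difference among respondents when the true impact is 0."""
    return alpha_t * respondent_mean_z(p_t) - alpha_c * respondent_mean_z(p_c)


# Scenario 1: same correlation in both groups, different response rates.
bias_from_rates = attrition_bias(0.80, 0.90, 0.45, 0.45)
# Scenario 2: same response rate in both groups, different correlations.
bias_from_alphas = attrition_bias(0.90, 0.90, 0.45, 0.39)
# Scenario 3: neither differs, so the same kinds of individuals respond in both groups.
bias_none = attrition_bias(0.90, 0.90, 0.45, 0.45)

print(round(bias_from_rates, 4), round(bias_from_alphas, 4), bias_none)
```

In these illustrative scenarios, the first two calls return nonzero bias and the third returns exactly 0, matching the qualitative statements in the text.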
Discussion
An important limitation of this model is that it does not account for covariates, although we recognize that studies that adjust for covariates correlated with both outcomes and attrition might generate more credible findings than those that do not. Evidence reviews could consider refining the attrition standard by favoring studies that adjust for covariates that are related to both outcomes and attrition (adjustment methods include regression adjustment, multiple imputations, or nonresponse weights). One approach could be to rate studies that adjust for covariates using attrition bounds based on more optimistic values of α t and α c . For example, if a study adjusts for a baseline measurement of the outcome it might be appropriate to rate the study using more optimistic values of α t and α c . In contexts (like education) where baseline measures of outcome variables are available and highly correlated with follow-up measures, adding this complexity to the attrition standard might be worthwhile.
A closely related issue is that the attrition model (and standard) also does not account for the possibility that missing values of an outcome variable could be imputed using nonmissing values of related outcome variables. For example, if a student is missing a follow-up math test score but is not missing a follow-up reading test score, then a researcher might use the reading test score to impute the missing math test score. Studies that perform such imputations to account for missing outcome data might generate more credible findings than studies that do not. In contexts where it is common to have multiple highly correlated outcome variables, and to have missing values for some outcomes but not others (i.e., missing items but not missing cases), adding this complexity to the attrition standard might be worthwhile. As with covariate adjustment, this could be accomplished by attrition bounds based on more optimistic values of α t and α c .
Using the Model to Develop Attrition Bounds
We can use the model specified in the previous section to calculate the expected bias for any combination of overall and differential attrition given estimates of α t and α c . With those estimates of bias, we can then construct bounds on acceptable rates of overall and differential attrition. In this section, we describe the process for estimating α t and α c and illustrate attrition bounds corresponding to particular values of α t and α c .
Estimating α t and α c
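If we observed outcomes for both respondents and nonrespondents in the treatment and control groups, we could calculate α t and α c directly. In the following equations, Δ g denotes the difference in outcomes—in effect size units—between respondents and nonrespondents in group g (either the treatment [t] or control [c] group). With $P_g$ the response rate in group g and $z_g^{*} = \Phi^{-1}(1 - P_g)$ the corresponding threshold, it can be shown that
$$\Delta_g = \frac{\alpha_g \, \phi(z_g^{*})}{P_g (1 - P_g)}, \tag{6}$$
which implies that
$$\alpha_g = \frac{\Delta_g \, P_g (1 - P_g)}{\phi(z_g^{*})}. \tag{7}$$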
Of course, we cannot observe outcomes for nonrespondents, so we cannot observe Δ g directly. Instead, we can estimate Δ g using a proxy variable for the outcomes of interest. A good proxy variable is available for all randomized units and is highly correlated with the outcome (the more highly correlated the proxy is with the outcome, the more likely that differences between treatment and control group respondents in the proxy variable are accurate reflections of differences in the outcome of interest). In the context of education studies examining impacts on academic test scores, a promising proxy variable is a baseline version of the test used at follow-up. The baseline test is a promising proxy variable because it is highly correlated with follow-up test scores (the correlation between pre- and posttests is often .70 or greater; Bloom, Richburg-Hayes, & Black, 2007; Schochet, 2008) and often available for all randomized units.
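The conversion from a respondent–nonrespondent gap to a correlation can be sketched as follows. The relationship used is α = Δ·P·(1 − P)/φ(z*), which follows from the truncated-normal means of respondents and nonrespondents. The function name and the input values are hypothetical; in practice the gap would be measured on a proxy such as a baseline test, so the result is only as good as that proxy.

```python
from statistics import NormalDist


def alpha_from_gap(delta: float, response_rate: float) -> float:
    """Back out alpha from a respondent-minus-nonrespondent gap (in SD units).

    Uses alpha = delta * P * (1 - P) / phi(z*), with z* = Phi^{-1}(1 - P).
    `alpha_from_gap` is a hypothetical helper name used for illustration.
    """
    if not 0.0 < response_rate < 1.0:
        raise ValueError("response_rate must be strictly between 0 and 1")
    std_normal = NormalDist()
    z_star = std_normal.inv_cdf(1.0 - response_rate)
    return delta * response_rate * (1.0 - response_rate) / std_normal.pdf(z_star)


# Hypothetical inputs: respondents score 0.50 SD higher than nonrespondents
# on the proxy, and 80% of the group responds.
alpha_hat = alpha_from_gap(delta=0.50, response_rate=0.80)
print(round(alpha_hat, 3))
```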
For the WWC, data from previously conducted RCTs informed the selection of parameter values. Using data from those studies, the WWC calculated the difference in average baseline test scores between respondents and nonrespondents within each group g,
An important limitation on using pretests as a proxy for posttests in calculating
Bounding attrition bias
Given estimates of α t and α c , we can calculate the expected bias for every combination of overall and differential attrition rates. We can then assess whether the expected bias for each combination of overall and differential attrition rates exceeds a maximum tolerable level (for the WWC, the level was set at 0.05 SD).
The WWC uses two different sets of assumptions regarding the values of α t and α c . The basis for these assumptions is described in the Appendix. The “optimistic” assumptions (α t = 0.27 and α c = 0.22) are intended for situations where the threat of attrition bias is believed to be relatively small. For example, the optimistic assumptions might be appropriate for studies of targeted curricular interventions that are less likely to affect decisions about whether to leave or enter schools during the study period. The “pessimistic” assumptions (α t = 0.45 and α c = 0.39) are intended for studies where the threat of attrition bias is believed to be relatively large. For example, the pessimistic assumptions might be appropriate for studies of more intensive or controversial interventions that could have larger effects on the types of students who leave or enter schools during the study period (i.e., the interventions are more likely to lead to larger differences between α t and α c ). The WWC principal investigators choose between either optimistic or pessimistic assumptions for studies reviewed in their topic area (most choose optimistic assumptions). The HomVEE and PPRER evidence reviews only use one set of assumptions. Both of those evidence reviews adopted the pessimistic assumptions from the WWC.
Figure 1 illustrates the combinations of overall and differential attrition that yield expected bias below the maximum tolerable amount for two different sets of assumptions for α t and α c . The leftmost region shows combinations of overall and differential attrition that yield attrition bias less than 0.05 under the WWC’s pessimistic assumptions (α t = 0.45 and α c = 0.39), the middle region shows additional combinations that yield attrition bias less than 0.05 under the WWC’s optimistic assumptions (α t = 0.27 and α c = 0.22), and the rightmost region shows combinations that yield bias greater than 0.05 even under optimistic assumptions.
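A boundary of this kind can be traced numerically. The sketch below is illustrative only and is not guaranteed to reproduce the published WWC boundaries, because it rests on assumptions introduced here: equal-sized treatment and control groups (so the overall response rate is the simple average of the two group rates), differential attrition defined as the absolute difference in group rates, and a "reinforcing" worst case taken over both ways of assigning the higher attrition rate. All function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_EPS = 1e-9  # tolerance for floating-point edge cases at response rates of 0 or 1


def respondent_mean_z(response_rate: float) -> float:
    """E[z | z > z*] = phi(z*) / P; returns 0 when everyone responds."""
    nonresponse = 1.0 - response_rate
    if nonresponse <= 0.0:
        return 0.0
    z_star = _STD_NORMAL.inv_cdf(nonresponse)
    return _STD_NORMAL.pdf(z_star) / response_rate


def worst_case_bias(overall: float, differential: float,
                    alpha_t: float, alpha_c: float) -> float:
    """Largest |bias| over both assignments of the higher attrition rate.

    Assumes equal group sizes (an assumption of this sketch).
    """
    mean_response = 1.0 - overall
    p_low = mean_response - differential / 2.0
    p_high = mean_response + differential / 2.0
    if p_low < _EPS or p_high > 1.0 + _EPS:
        return float("inf")  # infeasible combination of rates
    p_high = min(p_high, 1.0)
    candidates = (
        alpha_t * respondent_mean_z(p_low) - alpha_c * respondent_mean_z(p_high),
        alpha_t * respondent_mean_z(p_high) - alpha_c * respondent_mean_z(p_low),
    )
    return max(abs(b) for b in candidates)


def max_differential(overall: float, alpha_t: float, alpha_c: float,
                     max_bias: float = 0.05, step: float = 0.001):
    """Largest differential attrition on a grid with worst-case bias <= max_bias.

    Returns None if even zero differential attrition exceeds max_bias.
    """
    best = None
    k = 0
    while True:
        differential = k * step
        if worst_case_bias(overall, differential, alpha_t, alpha_c) > max_bias:
            return best
        best = differential
        k += 1


for overall in (0.10, 0.30, 0.50):
    pessimistic = max_differential(overall, 0.45, 0.39)
    optimistic = max_differential(overall, 0.27, 0.22)
    print(overall, pessimistic, optimistic)
```

Scanning upward and stopping at the first violation works here because, under these assumptions, the worst-case bias does not decrease as differential attrition grows.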

Figure 1. What Works Clearinghouse attrition bounds.
Sensitivity of Attrition Bounds to Key Parameter Values
The attrition bounds depend on two key parameters: the correlation between the latent propensity to respond and outcomes (α t and α c ) and the maximum tolerable bias. In this section, we examine the sensitivity of attrition bounds to these parameters.
Sensitivity to values of α t and α c
The sensitivity of WWC attrition bounds to the use of pessimistic (α t = 0.45 and α c = 0.39) or optimistic (α t = 0.27 and α c = 0.22) assumptions is evident in Figure 1. The highest acceptable overall attrition rate under optimistic assumptions is about 10 percentage points higher than under pessimistic assumptions, and the highest acceptable differential attrition rate under optimistic assumptions is about 4 percentage points higher than under pessimistic assumptions.
To more systematically investigate sensitivity, we have created two additional figures. In Figure 2, we hold the average of α t and α c constant at 0.42 (the same as the WWC conservative parameter values) but vary the difference between α t and α c . Specifically, the boundary between the leftmost region and the middle-left region corresponds to a difference of 0.12 (twice the difference for the WWC conservative parameter values), the boundary between the middle-left and middle-right regions corresponds to a difference of 0.06 (the same as the WWC conservative parameter values), and the boundary between the middle-right and rightmost regions corresponds to a difference of 0.03 (half the difference for the WWC conservative parameter values). When there is little difference between the treatment and control groups in the correlation between outcomes and the latent propensity to attrite, we can accept very high levels of overall attrition (higher than 80%). Conversely, when there is a larger difference between the treatment and control groups in this correlation, we can only accept much lower levels of overall attrition (less than 30%).
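The effect of the difference between α t and α c on the highest acceptable overall attrition rate can be checked with a short calculation. This sketch looks only at the special case of zero differential attrition (equal response rates in both groups), where the bias reduces to the difference in correlations times φ(z*)/P. The function names and the grid of one-thousandth steps are choices made here for illustration, and the tolerable bias of 0.05 SD is the WWC value cited in the text.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def respondent_mean_z(response_rate: float) -> float:
    """E[z | z > z*] = phi(z*) / P; returns 0 when everyone responds."""
    nonresponse = 1.0 - response_rate
    if nonresponse <= 0.0:
        return 0.0
    z_star = _STD_NORMAL.inv_cdf(nonresponse)
    return _STD_NORMAL.pdf(z_star) / response_rate


def max_overall_attrition(alpha_t: float, alpha_c: float, max_bias: float = 0.05):
    """Highest overall attrition (grid of 0.001) with |bias| <= max_bias
    when both groups have the same response rate."""
    best = None
    for k in range(1000):  # overall attrition from 0.000 to 0.999
        overall = k / 1000.0
        bias = abs(alpha_t - alpha_c) * respondent_mean_z(1.0 - overall)
        if bias > max_bias:
            break
        best = overall
    return best


# Average of 0.42 held fixed; difference set to 0.12, 0.06, and 0.03.
for alpha_t, alpha_c in ((0.48, 0.36), (0.45, 0.39), (0.435, 0.405)):
    print(alpha_t, alpha_c, max_overall_attrition(alpha_t, alpha_c))
```

Under these assumptions, the widest difference gives a ceiling below 30% and the narrowest gives a ceiling above 80%, in line with the ranges described for Figure 2.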

Figure 2. Sensitivity of attrition bounds to varying the difference between α t and α c .
In Figure 3, we hold the difference between α t and α c constant at 0.06 (the same as the WWC conservative parameter values) but vary the average. Specifically, the boundary between the bottom and middle-bottom regions corresponds to an average of 0.57, the boundary between the middle-bottom and middle-top regions corresponds to an average of 0.42 (the same as the WWC conservative parameter values), and the boundary between the middle-top and top regions corresponds to an average of 0.27. 6 When the overall correlation between outcomes and the propensity to attrite is low, we can accept a higher differential rate of attrition (up to about 8 percentage points). Conversely, when the overall correlation between outcomes and the propensity to attrite is high, we can only accept a lower differential rate of attrition (up to about 4 percentage points).

Figure 3. Sensitivity of attrition bounds to varying the average of α t and α c .
Discussion
Varying the level of α t and α c affects the acceptable differential rate of attrition, and varying the difference between α t and α c affects the acceptable overall rate of attrition. That is, when stayers are very different from leavers, we want the difference in attrition rates between the treatment and control groups to be small. When leavers in the treatment group are very different from leavers in the control group (a case where the intervention affects the type of individuals who leave the study sample), we want the overall rate of attrition to be small.
Our analysis for the WWC of data from education RCTs suggests that although students who left those study samples were different from students who remained in the study sample, the interventions did not appear to have a very big effect on which students left the study samples. 7 Our findings were consistent—low-achieving students changing schools more often than high-achieving students—but for reasons (e.g., perhaps parents’ employment) that are unrelated to what is happening in school. This is why most topic areas in the WWC use the optimistic parameter assumptions that allow overall attrition rates that may seem high and differential rates that may seem low. In cases where we think that interventions really do affect sample attrition (e.g., interventions to reduce dropping out of school), then lower levels of overall attrition are needed.
Sensitivity to Maximum Tolerable Bias
Given values of α t and α c , the attrition bounds are set to keep attrition bias from exceeding a maximum tolerable level. The WWC set the maximum tolerable bias to 0.05 SDs because that magnitude was deemed small relative to the WWC definition of a substantively important impact (0.25 SDs, see WWC 2008). 8 This means that in Figure 1, combinations of overall and differential attrition at the boundary between the leftmost and middle regions yield attrition bias of 0.05 SDs under conservative parameter assumptions, whereas points along the boundary between the middle and rightmost regions yield attrition bias of 0.05 SDs under optimistic assumptions.
In Figure 4, we illustrate how the attrition bounds would change if we maintain the WWC conservative parameter assumptions but change the maximum tolerable bias. The boundary between the leftmost and middle-left regions corresponds to a maximum tolerable bias of 0.025 SDs, the boundary between the middle-left and middle regions corresponds to a maximum tolerable bias of 0.05 SDs (the conservative WWC boundary), the boundary between the middle and middle-right regions corresponds to a maximum tolerable bias of 0.075 SDs, and the boundary between the middle-right and rightmost regions corresponds to a maximum tolerable bias of 0.10 SDs.

Figure 4. Sensitivity of attrition bounds to maximum tolerable bias.
Discussion
Attrition boundaries are highly sensitive to maximum tolerable bias. For example, if the maximum tolerable bias is 0.025 SDs, then the highest overall attrition rate allowed would be slightly greater than 20%, but if the maximum tolerable bias is 0.10 SDs, then the highest overall attrition rate would be greater than 80% (all under the WWC’s conservative parameter assumptions).
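One way to see where this sensitivity comes from is to look at the special case of zero differential attrition. The following is a sketch; $B_{\max}$ (the maximum tolerable bias) and $\lambda(P)$ are notation introduced here, and the attrition rates in the comments are approximate values computed from the formula rather than figures taken from the WWC.

```latex
\begin{align*}
% With equal response rates P_t = P_c = P, the bias formula collapses to one term.
|B| &= |\alpha_t - \alpha_c| \,\lambda(P), \qquad
  \lambda(P) = \frac{\phi\!\left(\Phi^{-1}(1 - P)\right)}{P}, \\
% The standard is met when lambda(P) stays below the ratio of tolerable bias to the alpha gap.
|B| \le B_{\max} \;&\Longleftrightarrow\; \lambda(P) \le \frac{B_{\max}}{|\alpha_t - \alpha_c|}. \\
% Pessimistic assumptions: alpha_t - alpha_c = 0.45 - 0.39 = 0.06.
B_{\max} = 0.025:&\quad \lambda(P) \le 0.417 \;\Rightarrow\; 1 - P \lesssim 0.25, \\
B_{\max} = 0.10:&\quad \lambda(P) \le 1.667 \;\Rightarrow\; 1 - P \lesssim 0.88.
\end{align*}
```

Because the condition depends on the parameters only through the ratio $B_{\max} / |\alpha_t - \alpha_c|$, halving the maximum tolerable bias has the same effect on the highest acceptable overall attrition rate as doubling the difference between α t and α c.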
An important implication of this sensitivity is that attrition boundaries would ideally be much stricter in contexts where small impacts are deemed to be substantively important. In developing the WWC attrition standard, we defined the maximum tolerable bias to be 0.05 SDs because we regarded that as small relative to the previously established WWC convention that impacts greater than or equal to 0.25 SDs are substantively important. But if a smaller impact—for example, 0.10 SDs—is deemed substantively important, then the maximum tolerable bias should be lower.
At least in education, smaller impacts are increasingly regarded as substantively important. Although the WWC definition of substantively important is consistent with Cohen’s (1988) classification of impacts in the range of 0.20 to 0.25 as small, prominent education researchers have more recently argued that Cohen’s benchmarks are often not an appropriate basis for designing evaluations in education and that smaller impacts can be important to detect. Hill, Bloom, Black, and Lipsey (2008), Lipsey et al. (2012), and Schochet (2008) suggested multiple criteria for assessing what a “meaningful” impact would be for a given intervention in a given context. These criteria often lead to the conclusion that impacts smaller than 0.20 SDs can be meaningful. Similarly, Kane (2015) argued that impacts as small as 0.08 SDs can be meaningful.
Conclusion
The WWC attrition standards are a data-informed approach to incorporating information about attrition and possible attrition bias into systematic reviews. The standards rely on an explicit model and a set of parameter values that are partially informed by data. We believe that these standards (and the underlying model) represent a positive step forward in how systematic reviews incorporate information about attrition into study ratings. However, there are many opportunities to refine and extend these standards—they are certainly not perfect. This article has closely examined two such opportunities involving the values used by evidence reviews for key model parameters—(1) the correlation between outcomes and the latent propensity to attrite and (2) the maximum level of attrition bias that can be tolerated.
Before considering opportunities to refine the attrition standard or use it in new settings, it is useful to keep in mind what we can and cannot do with the attrition standard:
We can make useful relative distinctions among groups of studies. All else being equal, impact estimates from studies meeting the attrition standard are likely to be less influenced by attrition bias than estimates from similar studies that do not. This is because studies with higher attrition are more likely to yield biased impact estimates. Also, the attrition standard enables us to systematically manage the trade-off between overall and differential attrition. Finally, this approach enables us to reduce the influence of bias in impact estimates in a way that is systematic and partially informed by data, which we believe is preferable to the more arbitrary cutoffs that were used prior to the development of this standard.
We cannot make absolute claims about bias in any given study. Although we select attrition boundaries with a goal of keeping attrition bias below a maximum tolerable level, we cannot claim that we are able to achieve that goal. This is primarily because (1) our empirical proxies of bias from past studies could be wrong (because pretests are an imperfect proxy for posttests), (2) our empirical proxies of bias from past studies might not apply to future studies (because, e.g., future studies might involve different populations of study participants), and (3) the attrition standard uses a binary categorization (studies can only meet or not meet the standard) that takes little study-specific information into account other than attrition rates (e.g., the standard does not account for covariate adjustment). We caution readers that the attrition standard should not be interpreted as providing a credible estimate of attrition bias for any given study.
We cannot fully predict the incentives these standards might create for researchers. In our experience, the development of evidence standards has focused primarily on how to systematically make useful distinctions between existing studies with respect to the credibility of findings. How standards might create incentives for researchers to change their practices in future studies has generally been a secondary consideration. However, as evidence reviews grow in importance, researchers might increasingly adapt their practices to meet evidence standards. Seftor (2016) reports that the number and proportion of studies meeting WWC standards have increased over time, which may be indicative of the incentive effects of standards. In many cases, this could be positive: for example, the existence of evidence standards might encourage researchers to conduct more RCTs (since RCTs have the potential to attain higher evidence ratings than nonexperimental studies). In other cases, the implications are less clear. For example, a research team might find itself in a situation where the attrition rate in the control group is much higher than in the treatment group. To attain an acceptable differential attrition rate between the treatment and control groups, the team might divert resources from increasing the response rate in the treatment group to increasing the response rate in the control group. Is this a positive or negative response to the attrition standard? The answer depends on the underlying reasons for attrition and how efforts to increase response rates differ by experimental condition. If it is generally more difficult to obtain outcome data in the control group (e.g., because members of the control group are not in regular contact with program implementation staff), then it might be necessary and appropriate to expend greater effort obtaining outcome data in the control group.
In that case, it might be a positive outcome if the attrition standard encourages a research team to shift resources toward data collection in the control group. However, this might not be a positive incentive in every study. For example, if the research team not only expends greater effort to boost response rates in the control group but also uses substantially different tactics (e.g., if they use monetary incentives in the control group but not the treatment group), then this effort could lead to systematic differences in the characteristics of respondents in the treatment and control groups. We urge researchers to think more broadly than evidence review standards when conducting studies—the quality of a study depends on more than those factors that an evidence review can assess.
With those considerations in mind, we see two main opportunities to refine existing standards or to adopt the attrition model to create attrition standards in new areas.
Increase empirical support for assumptions regarding the correlation between outcomes and the latent propensity to attrite. The empirical support for the WWC attrition standard is based on an analysis of differences in pretest scores of students with and without posttest scores from past experimental studies of curricular interventions targeting test score outcomes for disadvantaged student populations. To tailor the attrition standard to different types of outcomes, interventions, or populations, we would ideally have estimates of how baseline outcome measures relate to sample attrition at follow-up in past experimental studies conducted in those other contexts. We encourage evaluators to report this kind of information in their studies. We also encourage evaluators to report the correlation between baseline covariates and outcomes; this information could help inform a refinement of the attrition standard that would give studies credit for covariate adjustment. However, we note that it is not possible to report this kind of information in contexts where baseline outcome measures cannot exist. For example, in a study where the outcome is delay of first pregnancy for teenagers, there is no baseline outcome measure analogous to an academic pretest. In those contexts, it might be possible to identify a range of predictors that cumulatively explain as much variation in the outcome as a pretest explains in a posttest. We also encourage researchers to explore other strategies to help the field better understand the relationship between outcomes and attrition in these contexts. For example, in some contexts missing responses to survey questions can be compared to administrative records (e.g., Groves, 2006; Groves & Peytcheva, 2008).
Also, in the case of studies that use surveys (or other forms of primary data collection) to measure outcomes, researchers might compare the outcomes of subjects who respond quickly to data collection efforts to the outcomes of subjects who only respond after multiple follow-up attempts. The difference between these two groups could shed light on the correlation between outcomes and the latent propensity to respond to data collection efforts.
Vary attrition bounds based on varying values of the maximum tolerable bias. We find that attrition bounds are highly sensitive to how we define maximum tolerable bias. In contexts where impacts smaller than 0.25 SDs are deemed substantively important, the existing attrition bounds are likely to be too lenient—possibly by a wide margin. For example, Kane (2015) provided an example of a context in which an impact as small as 0.08 SDs is substantively important. In that scenario, it would be sensible to define maximum tolerable bias as something much smaller than 0.05 SDs, particularly if studies have the statistical power to detect impacts of just 0.08 SDs. One approach evidence reviews could take is to frame maximum tolerable bias in terms of controlling the type I error rate, which is the approach Stock and Yogo (2005) took in testing for weak instrumental variables. A consequence of adopting that approach is that the attrition standard would be stricter for larger studies, which might not always be appropriate. Another approach would be continuing to use a definition of substantively important that is independent of sample size but that varies with respect to combinations of interventions, outcomes, and the populations from which study samples are selected. Lipsey et al. (2012) provided guidance for assessing what constitutes a meaningful impact in different contexts.
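To illustrate how strongly the acceptable differential attrition rate can depend on the maximum tolerable bias, the Python sketch below searches for the largest differential rate that keeps bias at or below a chosen tolerance. It is a minimal sketch under assumptions that are ours for illustration: the latent propensity to respond is standard normal, the outcome (in SD units) has a non-negative correlation $\alpha$ with that propensity, the two groups are of equal size, and the treatment group has the higher attrition rate. The function names are hypothetical, and the resulting numbers are illustrative; they need not reproduce the published WWC boundaries.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: assumed latent response propensity


def respondent_mean(alpha, attrition_rate):
    """Expected outcome (SD units) among respondents under the assumed model."""
    z_star = STD_NORMAL.inv_cdf(attrition_rate)  # response threshold
    return alpha * STD_NORMAL.pdf(z_star) / (1.0 - attrition_rate)


def bias(alpha_t, alpha_c, overall, differential):
    """Assumed bias: difference in respondent means between the two groups."""
    # Assumption: equal-sized groups, treatment group has the higher attrition.
    rate_t = overall + differential / 2.0
    rate_c = overall - differential / 2.0
    return respondent_mean(alpha_t, rate_t) - respondent_mean(alpha_c, rate_c)


def max_differential(alpha_t, alpha_c, overall, max_bias, tol=1e-9):
    """Largest differential attrition with bias <= max_bias, or None if none.

    Uses bisection, which relies on bias increasing with the differential
    rate (true here when alpha_t and alpha_c are non-negative).
    """
    if bias(alpha_t, alpha_c, overall, 0.0) > max_bias:
        return None  # even zero differential attrition exceeds the tolerance
    lo = 0.0
    # Keep both group attrition rates strictly inside (0, 1).
    hi = 2.0 * min(overall, 1.0 - overall) - 1e-9
    if bias(alpha_t, alpha_c, overall, hi) <= max_bias:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if bias(alpha_t, alpha_c, overall, mid) <= max_bias:
            lo = mid
        else:
            hi = mid
    return lo


# Pessimistic parameter values from the Appendix, overall attrition of 20%.
for tolerance in (0.03, 0.05, 0.10):
    bound = max_differential(0.45, 0.39, 0.20, tolerance)
    print(f"max tolerable bias {tolerance:.2f} SDs -> differential bound {bound}")
```

Re-running the loop with other tolerances or parameter values gives a quick sense of which input moves the bound the most under these assumptions.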
Appendix
In this Appendix, we provide additional background on how we selected values for the parameters $\alpha_t$ and $\alpha_c$. Here we summarize text from a What Works Clearinghouse (WWC, 2013, 2014) white paper on the attrition standard and an addendum to that white paper. Those documents describe the attrition model and how the WWC Statistical, Technical, and Analysis Team (STAT) selected parameter values to create the attrition standard. The content of this Appendix is drawn entirely from those documents; it is not an original contribution of this article. Please see those documents for more detail.
As a first step in selecting parameter values, WWC STAT sought to determine what values would be consistent with the previous attrition standard. Before the attrition model was developed, the WWC used fixed cutoffs for acceptable overall and differential attrition rates. As described in the first section, those cutoffs were informed by a combination of the judgment of WWC topic area principal investigators (PIs) and survey response rate requirements from the Office of Management and Budget (OMB) and National Center for Education Statistics (NCES). A combination of cutoffs that PIs commonly used at the time was an overall attrition rate of 20% and a differential attrition rate of 7%. The WWC STAT examined a range of parameter values and found several candidates that were roughly consistent with the previous standards. Those values included (1) $\alpha_t = 0.27$ and $\alpha_c = 0.22$ and (2) $\alpha_t = 0.45$ and $\alpha_c = 0.39$ (see the original documents for the full range of values examined). The former ultimately became the basis for the “optimistic” attrition standard (it yielded a bias of 0.04 SDs for an overall attrition rate of 20% and a differential rate of 7%), and the latter ultimately became the basis for the “pessimistic” attrition standard (it yielded a bias of 0.06 SDs for an overall attrition rate of 20% and a differential rate of 7%).
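The bias figures quoted above can be approximated with a short calculation. The Python sketch below is illustrative only and rests on assumptions that are not spelled out in this Appendix: the latent propensity to respond is standard normal, the outcome (in SD units) has correlation $\alpha$ with that propensity, an individual attrites when the propensity falls below the quantile implied by the group's attrition rate, the two groups are of equal size, and the treatment group has the higher attrition rate. The function names are hypothetical and are not taken from the WWC documents.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: assumed latent response propensity


def respondent_mean_shift(alpha, attrition_rate):
    """Expected outcome (SD units) among respondents under the assumed model.

    alpha is the assumed correlation between the outcome and the latent
    propensity to respond; respondents are those above the threshold z_star.
    """
    if not 0.0 < attrition_rate < 1.0:
        raise ValueError("attrition_rate must lie strictly between 0 and 1")
    z_star = STD_NORMAL.inv_cdf(attrition_rate)
    # Mean of a standard normal truncated below at z_star, scaled by alpha.
    return alpha * STD_NORMAL.pdf(z_star) / (1.0 - attrition_rate)


def attrition_bias(alpha_t, alpha_c, overall, differential):
    """Assumed bias in the impact estimate, in SD units."""
    # Assumption: equal-sized groups, treatment group has the higher attrition.
    rate_t = overall + differential / 2.0
    rate_c = overall - differential / 2.0
    return (respondent_mean_shift(alpha_t, rate_t)
            - respondent_mean_shift(alpha_c, rate_c))


# Overall attrition of 20% and differential attrition of 7%, as in the text.
optimistic = attrition_bias(0.27, 0.22, overall=0.20, differential=0.07)
pessimistic = attrition_bias(0.45, 0.39, overall=0.20, differential=0.07)
print(f"optimistic:  {optimistic:.3f} SDs")
print(f"pessimistic: {pessimistic:.3f} SDs")
```

Under these assumptions, the two results should fall close to the 0.04 and 0.06 SDs reported above. Reversing which group has the higher attrition rate changes the result, so the direction of differential attrition is an assumption worth checking against the original documents.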
The WWC STAT also wanted to assess whether candidate parameter values were consistent with data from previously conducted randomized controlled trials (RCTs) in education funded by the Institute of Education Sciences. The previously conducted RCTs examined were: Evaluation of 21st Century Community Learning Centers (James-Burdumy et al., 2005); Evaluation of Education Technologies in Reading and Mathematics (Dynarski et al., 2007); Evaluation of Supplemental Reading Comprehension Interventions (James-Burdumy et al., 2010); and Evaluation of Teachers Trained Through Different Routes to Certification (Constantine et al., 2009).
Using baseline test scores and attrition data from these studies, we calculated estimates of $\alpha_t$ and $\alpha_c$ using Equations 6 and 7. Values of $\alpha_t$ ranged from 0.01 to 0.28, and values of $\alpha_c$ ranged from –0.07 to 0.26. The difference between $\alpha_t$ and $\alpha_c$ ranged from 0.01 to 0.17.
Because the parameter values $\alpha_t = 0.27$ and $\alpha_c = 0.22$ were consistent with the previous WWC attrition standard (which, in turn, was informed by OMB and NCES standards as well as PIs’ judgment) and fall within the empirical range estimated from previous RCTs, WWC STAT chose them as the optimistic parameter values for the attrition standard. The values $\alpha_t = 0.45$ and $\alpha_c = 0.39$ were chosen as the pessimistic parameter values because they yield a higher bias than the optimistic values for the same overall and differential attrition rates of 20% and 7% (0.06 SDs instead of 0.04 SDs).
Authors’ Note
John Deke led the development of the WWC attrition model and standard. Hanley Chiang substantially improved the mathematical representation of the model.
Acknowledgment
We would like to acknowledge the members of the WWC STAT (listed alphabetically: Jill Constantine, Mark Dynarski, Adam Gamoran, J. R. Lockwood, Chris Lonigan, Dan McCaffrey, Mike Puma, and Neil Seftor) who provided valuable feedback and advice throughout development of the standard.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
