Abstract
This article reports a systematic review and meta-analysis of studies that use regression discontinuity to examine the effects of placement into developmental education. Results suggest that placement into developmental education is associated with effects that are negative, statistically significant, and substantively large for three outcomes: (a) the probability of passing the college-level course in which remediation was needed, (b) college credits earned, and (c) attainment. Several sensitivity analyses suggest these results are not a function of particular stylized studies or the choices made in assembling the meta-analytic database. Two exploratory moderator analyses suggest that the negative effects of placement into developmental education are stronger for university students than for community college students and worse for students placed in reading or writing than in math. This work can inform debate and research on postsecondary policies and on alternative mechanisms for ensuring that college students have the skills needed to meet their goals.
Almost two in five beginning college students are placed in developmental education (Snyder, de Brey, & Dillow, 2016). Broadly speaking, the term “developmental education” connotes a set of policies and practices designed for students who are underprepared to do college-level work in a given area. The goal of this experience is to give students the knowledge, skills, and habits that will help them be successful in the college-level version of the course (Bailey et al., 2016). The growing use of developmental education reflects an increasingly normative transition from high school to college, which while predicated on completion of secondary schooling, does not necessarily imply adequate preparation for what is deemed “postsecondary” work.
The specific mechanisms for deciding which students should be placed into developmental education vary. These policies are sometimes set at the state level (as is true in Florida), sometimes set at the system level (as in the California State University system), and sometimes set by individual institutions. Community colleges and other open access institutions generally require all students to take placement exams. Institutions that require an entrance exam like the Scholastic Aptitude Test or American College Testing (ACT) often use a tiered system. For example, in Tennessee (Boatman & Long, 2010), students scoring below 26 on the ACT’s math subtest (approximately two thirds of all test takers score below this threshold) are required to take the COMPASS Algebra test, a placement exam developed by ACT. Students scoring 50 and above are placed into college algebra, while students scoring below 50 are placed into intermediate algebra. Depending on the specific policy in place, students may or may not be able to retake the placement exam.
Nationally, about 60% of students taking a placement exam are recommended for placement into developmental education, but not all students recommended for placement actually end up in the courses (see Bailey, 2009). According to the National Center for Education Statistics (Snyder et al., 2016), rates of developmental course taking are somewhat higher in community colleges (about 42%) than in public and private doctoral degree–granting institutions (about 25% and 22%, respectively), but even in these latter institution types, developmental course taking is common. Math is the most common subject in which remediation is needed, with participation rates (about 15%) that are two to two and a half times the participation rates in English, reading, and writing (which range from 6% to 7%).
Placement into developmental education adds costs and, critically, time to a student’s journey to a degree or certificate. With respect to student costs, Barry and Dannenberg (2016) estimate that each developmental course costs students $3,000 and adds $1,000 in student loan debt (and this analysis did not include the opportunity costs that students experienced). In addition, states are increasingly concerned about “paying twice” for courses taken both in high school and in college. Nationwide, Breneman and Haarlow (1998) estimated the cost of developmental education to be $1 billion at public postsecondary institutions in 1996 dollars, while Pretlow and Wathington (2012), using similar methodologies, arrived at about $1.13 billion (again in 1996 dollars) for the 2004–2005 year. More recently, Barry and Dannenberg (2016) put the estimate at $1.5 billion (in 2011 dollars, or about $1.05 billion in 1996 dollars).
The research reported in this article examines the predictors of success at the postsecondary level (e.g., Credé, Roch, & Kieszczynka, 2010). Specifically, we are interested in understanding the effects of developmental education on college academic outcomes. Given the high personal and societal costs of developmental education, the effectiveness of developmental education has become an important public policy question that has spurred both research and reform efforts (e.g., Complete College America, 2012). Most simple comparisons of students assigned to developmental education relative to those not assigned suggest that assignment to developmental education is associated with several negative outcomes, not least of which is a much lower likelihood of postsecondary attainment (i.e., graduation or certification). For example, using data from the National Education Longitudinal Study for the high school class of 1992, Attewell, Lavin, Domina, and Levey (2006) found that for students attending 2-year colleges, graduation rates were about 30% higher for students who did not enroll in at least one developmental education course than for students who did (36% vs. 28%). For students attending 4-year institutions, the picture is even bleaker, with students enrolling in at least one developmental course graduating at a much lower rate (52%) than students not enrolling in developmental courses (77%). But it is far from clear whether these lower completion rates are caused by developmental education. There have been no formal, high-quality systematic reviews on the effects of student placement into developmental education. In part due to the difficulty of studying these effects, existing reviews have generated conflicting conclusions and have been contentious (e.g., see Bailey, Jaggars, & Scott-Clayton, 2013; Goudas & Boylan, 2012).
This article offers a state-of-the-art systematic review and meta-analysis on the effects of placement into developmental education. We examine the effects of placement on four indicators of college attainment, including credit accumulation and degree or certificate completion.
Studying the Effects of Placement Into Developmental Education
Some of the observed differences in outcomes between students placed into developmental education in at least one subject and students not placed into developmental education are real in the sense that they reflect different levels of academic opportunities, preparation, and motivation. However, the raw statistics do little to untangle the causal effects of being placed into developmental education. There are two aspects to this problem. One is the distinction between enrollment and assignment. Attewell et al.’s (2006) data point to the negative association between enrollment in developmental education and attainment, but some students assigned to developmental education never take a developmental education course, either because they somehow avoid the placement decision and go directly into the college-level course, or because they take assignment to developmental education as a signal that they are unlikely to succeed in college and drop out (see Bailey, Jeong, & Cho, 2010; Scott-Clayton & Rodriguez, 2015); if true, this suggests that Attewell et al.’s (2006) analysis understates the negative impact of assignment to developmental education.
The second part of the problem is untangling the causal relationships (Goldrick-Rab, 2010). To test the effect of assignment to developmental education, researchers could identify a group of students for whom an institution’s policy suggests developmental education is needed and randomly recommend students for placement into either the developmental course or into the college-level course in the subject in which remediation is needed. For example, Aiken, West, Schwalm, Carroll, and Hsiung (1998) randomly assigned students to either placement into Freshman Composition or into a developmental writing course followed by Freshman Composition. However, their analyses were conditional on either passing the first assigned course (either developmental writing or Freshman Composition, depending on assignment) or on passing Freshman Composition, depending on the specific analysis (see also Moss, Yeaton, & Lloyd, 2014). Much more common are nonrandomized experiments that adopt a similar approach of conditioning, in one way or another, on success in the developmental course. A randomized experiment that followed students regardless of whether they actually enrolled in the developmental course (or even enrolled in college) and assessed an outcome that is not dependent on course participation (e.g., whether or not students ultimately passed the college-level course in which remediation was needed) would provide a fair test of whether placement into developmental education is helpful to students. Given the scarcity of randomized trials in this area, it seems likely that institutions are reluctant to randomly assign students to developmental education or not. But because students are typically assigned to developmental education on the basis of a test score, regression discontinuity is a viable option for studying the effects of assignment to developmental education.
The Regression Discontinuity Design
The basic requirement of a regression discontinuity (RD) design study is that assignment to conditions is done using a score on a continuous variable. An example is the Tennessee process described by Boatman and Long (2010) above. Students are assigned to college algebra if they score 50 and above on the COMPASS Algebra placement test, and to developmental algebra if they score below 50. Thus, in an RD study, (a) groups are formed by design and (b) the assignment mechanism is completely known if (c) the cut score is adhered to (all of these features are shared with randomized experiments). The fact that the assignment mechanism is known allows for unbiased inferences if the assumptions of RD are met and if the data are analyzed properly.
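To make the assignment rule and the resulting estimate concrete, the following is a minimal sketch of a sharp RD analysis on simulated data. It is not the procedure used by any study in this review. The cut score of 50 mirrors the Tennessee example, but the outcome model, the simulated effect of −3 credits, the bandwidth of 10 points, and all function names are illustrative assumptions.

```python
import random

CUTOFF = 50  # cut score from the Tennessee example; everything else is hypothetical


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff=CUTOFF, bandwidth=10):
    """Discontinuity at the cutoff: developmental side minus college-level side."""
    below = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    # Scores are centered at the cutoff, so each intercept is the predicted
    # outcome exactly at the cut score for that side.
    intercept_dev, _ = fit_line(*zip(*below))
    intercept_col, _ = fit_line(*zip(*above))
    return intercept_dev - intercept_col


def simulate(n=4000, true_effect=-3.0, seed=0):
    """Simulated students: placement is fully determined by the test score."""
    rng = random.Random(seed)
    scores, credits = [], []
    for _ in range(n):
        s = rng.randint(20, 80)
        developmental = s < CUTOFF  # assignment rule: below 50 -> developmental
        y = 30 + 0.5 * (s - CUTOFF) + rng.gauss(0, 2)
        if developmental:
            y += true_effect
        scores.append(s)
        credits.append(y)
    return scores, credits


scores, credits = simulate()
estimate = rd_estimate(scores, credits)
print(f"estimated discontinuity: {estimate:.2f} credits (simulated truth: -3.00)")
```

Fitting a separate line on each side of the cutoff is one simple way to respect the assignment mechanism; real analyses typically add covariates and report standard errors.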
Relative to randomized experiments, RD studies have lower statistical power (Schochet, 2009) and are dependent on more assumptions, some of which are untestable (Valentine & Thompson, 2013). Despite these drawbacks, RD is growing in popularity as researchers become more familiar with its strengths and the conditions under which it is particularly useful. Google Scholar (as of February 26, 2017) lists about 30 hits for “regression discontinuity” in 1995, about 250 in 2005, 1,320 in 2010, and 3,920 in 2016. Good primers on RD are available (e.g., Jacob, Zhu, Somers, & Bloom, 2012; Shadish, Cook, & Campbell, 2002; What Works Clearinghouse [WWC], 2015), but the basic logic underlying RD is easy to visualize. When the placement test score is presented on the x-axis of a graph and the dependent variable on the y-axis, a treatment effect can be seen as a visual break (or discontinuity) at the cut point between the scores of students who do and do not receive the intervention. Therefore, RD is similar to an interrupted time series approach, except that assignment is based on a score instead of time.
All studies should be evaluated for the rigor with which they were designed and analyzed (Valentine & Cooper, 2008), and this statement is especially true for RD studies. Although quality assessment of RD studies is still a developing area, the WWC’s (2015) RD standards provide a good example of how such an assessment might be carried out. The WWC articulates five quality markers for RD studies. These are (a) the variable used to create groups cannot be manipulated, (b) data loss due to attrition should be minimal, (c) there must be no evidence of a discontinuity anywhere other than at the cutoff, (d) the functional form of the relationship between the variable used to create groups and the outcome is properly specified (i.e., if the relationship is quadratic it should be modeled as such), and (e) the analyses are constrained to a proper “bandwidth” around the cutoff.
Systematic Review and Meta-Analysis
Systematic reviewing and meta-analysis are now the standard set of tools that researchers use to investigate the effectiveness of policies, procedures, and practices when multiple studies pertaining to the specific research question exist. As described below, we found 11 studies using RD that examine the effects of placement into developmental education. Though meta-analysis of RD studies is rare, it is not unknown (Deke, Dragoset, Bogan, & Gill, 2012; Quinn, Lynch, & Kim, 2014), and we anticipate that it will become more common in the future. In the Method section, we discuss some of the important considerations needed to support a meta-analysis of RD studies, using the set of RD studies we located as examples. We begin with a description of how we located, assessed for inclusion, and coded the studies in our analyses. After discussing meta-analysis in the context of RD studies, we present our findings on the effects of placement into developmental education on four outcomes: (a) college-level credits earned, (b) whether or not students eventually passed the college-level course in which remediation was needed, (c) student grades in the course in which remediation was needed, and (d) whether students earned a degree or certification. As will be seen, the data mostly suggest statistically significant and potentially important negative impacts on these outcomes. We conclude with suggestions about how placement into developmental education might be improved, and a discussion of the cautions and limitations that go along with our work.
Method
Literature Search
This review is part of a larger project examining interventions for developmental education students. Included studies used RD to examine the effects of placement into developmental education. We did not set inclusion or exclusion criteria around other parameters (e.g., outcomes measured) and searched for both published and unpublished studies. The electronic literature search was initially conducted in ProQuest (ProQuest Education journals and ProQuest dissertations), EBSCO (Education Research Complete and Academic Search Premier), ERIC, Wilson Education Full Text, the Social Science Citation Index, and PsycInfo, from 1993 to March 2013. Search terms were divided into three groups: (a) terms that identified the document as a study involving developmental education (developmental OR noncredit OR basic skills OR compensatory OR under achievement OR underachiev* OR remedia*); (b) terms that identified the context as postsecondary education (e.g., universit* OR “institution of higher learning” OR “community college” OR “technical college” OR “junior college” OR “institutions of higher learning” OR “community colleges” OR “technical colleges” OR “junior colleges” OR “liberal arts” OR “Historically Black colleges and universities” OR “Hispanic Serving Institutions”); and (c) a term that identified the document as a study that used RD (discontin*). Documents with at least one search term from each of these categories were screened for relevance by at least two trained individuals who worked independently. Disagreements were resolved by a third screener. We included only studies that examined the effects of placement into developmental education relative to placement directly into the college-level course (and not, e.g., studies that examined the effects of placement into different levels of developmental education; see Melguizo, Bos, Ngo, Mills, & Prather, 2016).
We also conducted ancillary searches to find studies of the effects of placement in developmental education. First, because the Journal of Higher Education does not publish abstracts, we hand searched that journal from 1993 forward. In addition, we conducted Google Scholar searches for relevant studies, and forward citation searches on the researchers who authored relevant papers. The last literature searches were run in November 2015.
Coding
Two reviewers working independently coded studies identified as potentially relevant. We coded characteristics related to study context, the developmental education placement process, the sample, and the study’s outcomes. These characteristics included institution type (community college, university), the number of institutions in the study, whether the study was published, the process used to place students into developmental education (e.g., the specific placement test used and the cutoff for placement), and information about the students in the sample (e.g., whether the study included only first-time, full-time students).
Analytic Model and Analysis Issues
We used standard meta-analytic techniques to synthesize the results of eligible studies. These techniques included inverse variance weighting, which allocates proportionally more weight to larger studies. Many studies presented the results of multiple models (e.g., models with more or fewer covariates). Rather than adopting a robust variance estimation approach, we chose models in a deliberate attempt to maximize the conceptual similarity of the studies in the analysis. Therefore, when we had a choice, we always selected the model with (a) the largest number of control variables in it, (b) the narrowest bandwidth, and (c) results that were as close to 3 years from the time of assignment as possible (except for attainment, for which we selected the longest follow-up point).
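As an illustration of inverse variance weighting, the sketch below pools a handful of effect sizes under fixed-effect assumptions and computes the z test (pooled effect ÷ standard error). The three effect sizes and standard errors are hypothetical values chosen for easy arithmetic; they are not taken from the studies in this review, and the function name is our own.

```python
import math
from statistics import NormalDist


def fixed_effect(effects, std_errors):
    """Inverse-variance weighted mean, its standard error, z, and two-sided p."""
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies weigh more
    total_weight = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = mean / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return mean, se, z, p


# Hypothetical unstandardized effects (credits earned) and standard errors.
effects = [-2.0, -4.0, -1.0]
std_errors = [0.5, 1.0, 2.0]

mean, se, z, p = fixed_effect(effects, std_errors)
print(f"pooled effect = {mean:.2f} credits, SE = {se:.2f}, z = {z:.2f}, p = {p:.4f}")
```

Here the study with the smallest standard error receives a weight of 4, against 1 and 0.25 for the others, so it dominates the pooled estimate.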
Researchers undertaking a meta-analysis need to consider whether to employ a fixed-effect or a random-effects analytic model. Using the fixed-effect model, study effect sizes can be thought of as estimating a single population value, and therefore any differences in effect sizes across studies are treated as solely due to random sampling and identifiable covariates. Using the random-effects model, reviewers assume that studies do not in fact share a single population value but instead come from a distribution of effect sizes. Therefore, any differences in effect sizes across studies are due to random sampling error, any identifiable covariates, and other random factors that cannot be identified.
The choice between fixed-effect and random-effects models can be an important one, because the confidence intervals arising from a random-effects analysis will never be smaller and are often larger than their fixed-effect counterparts; this has implications for both the statistical significance tests and interpreting the likely range of population effects. Often, the random-effects model is thought to be the most defensible choice, in part due to its somewhat better generalization properties (Hedges & Vevea, 1998). However, one issue with the random-effects model is that if the number of studies is small, the estimate of the between-studies variance component (i.e., the extent to which population effect sizes differ from one another) is both highly uncertain and highly unstable. That is, the between-studies variance component is estimated with a great deal of error, and it can be very sensitive to the inclusion of new information (e.g., a new study in an updated review). Due to these considerations, we report both the fixed-effect and the random-effects models in this review. In addition, we report several sensitivity analyses as robustness checks.
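The contrast between the two models can be sketched as follows. We assume the DerSimonian and Laird method-of-moments estimator for the between-studies variance; this is one common choice and is named here only as an assumption, not as the estimator used in this review. The effect sizes and standard errors are again hypothetical.

```python
import math


def pooled_estimates(effects, std_errors):
    """Fixed-effect and DerSimonian-Laird random-effects summaries."""
    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fe_mean = sum(wi * e for wi, e in zip(w, effects)) / sum_w
    fe_se = math.sqrt(1.0 / sum_w)

    # Homogeneity statistic Q and the method-of-moments estimate of tau^2.
    q = sum(wi * (e - fe_mean) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to every study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    re_mean = sum(wi * e for wi, e in zip(w_re, effects)) / sum_w_re
    re_se = math.sqrt(1.0 / sum_w_re)
    return {"fe_mean": fe_mean, "fe_se": fe_se, "q": q, "df": df,
            "tau2": tau2, "re_mean": re_mean, "re_se": re_se}


# Hypothetical effect sizes and standard errors.
result = pooled_estimates([-2.0, -4.0, -1.0], [0.5, 1.0, 2.0])
for label in ("fe", "re"):
    m, s = result[f"{label}_mean"], result[f"{label}_se"]
    print(f"{label}: {m:.2f}, 95% CI [{m - 1.96 * s:.2f}, {m + 1.96 * s:.2f}]")
```

Because the estimated between-studies variance cannot be negative, the random-effects standard error is never smaller than the fixed-effect one, which is why its confidence interval is never narrower. With only three studies, the variance estimate is driven by very little information, which illustrates the instability described above.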
Finally, we should note that three studies examined the effects of placement into developmental education in multiple subjects (Boatman & Long, 2010; Calcagno & Long, 2008; Scott-Clayton & Rodriguez, 2015). Within each study, we treat these effects as independent. However, it is possible that some students could have been placed into developmental education in multiple subjects, and therefore be in our analyses more than once. For example, a student in Calcagno and Long’s (2008) study could have been placed into developmental math and developmental reading and might have appeared in both of their bandwidth-constrained analyses; this would violate the statistical assumption of independence. We do not know the extent to which this combination of events happened. However, in Boatman and Long (2010), 17% of students were recommended for placement into developmental education in two subjects, and 5% were recommended for placement into three subjects. Therefore, in that study, the maximum overlap is 22%, but the overlap within the optimal bandwidth (i.e., students who scored between 47 and 52 on the math placement test and between 65 and 70 on the reading placement test) is likely much smaller (though probably not zero).
Meta-Analyzing RD Studies: RD Bandwidth
Randomized experiments can be thought of as estimating an average treatment effect. That is, in a simple randomized experiment the comparison of interest is the mean of one group relative to a mean of another group. RD studies are often thought of as estimating a local average treatment effect, with “local” defined as the group of participants who are relatively close to the cutoff. In the context of studying the effect of placement into developmental education, RD can be thought of as comparing students at the margin of college readiness, some of whom were assigned to developmental education and some of whom were assigned to college-level courses. Statistical procedures can be used to determine the optimal bandwidth within which treatment effects should be estimated (Imbens & Kalyanaraman, 2012). Some RD researchers use the entire sample instead of a bandwidth sample. This is probably reasonable only under extreme circumstances (e.g., the treatment has a constant effect on participants regardless of how far they are from the cutoff). Researchers interested in using RD studies in meta-analysis should code for the presence or absence of bandwidths in the studies in their meta-analytic data set and if so, whether a statistical procedure to determine the optimal bandwidth was used. Furthermore, if possible researchers should statistically test whether effect sizes vary as a function of bandwidth (e.g., by conducting within study comparisons of the effects observed for wide and narrow bandwidths).
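A within-study comparison of wide and narrow bandwidths can be sketched as below. The data are noise-free and entirely hypothetical: the outcome follows a cubic curve in the placement score, with a simulated effect of −3 credits at a cut score of 50. The point is only to show how a linear specification fitted over a wide bandwidth can drift away from the effect at the cutoff when the true relationship is curved.

```python
CUTOFF = 50         # hypothetical cut score
TRUE_EFFECT = -3.0  # simulated effect of developmental placement, in credits


def intercept_at_cutoff(points):
    """OLS intercept of y on x, where x is the score centered at the cutoff."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return mean_y - (sxy / sxx) * mean_x


def rd_estimate(data, bandwidth):
    """Linear fit on each side within the bandwidth; developmental minus college."""
    below = [(s - CUTOFF, y) for s, y in data if CUTOFF - bandwidth <= s < CUTOFF]
    above = [(s - CUTOFF, y) for s, y in data if CUTOFF <= s <= CUTOFF + bandwidth]
    return intercept_at_cutoff(below) - intercept_at_cutoff(above)


def outcome(score):
    """Curved (cubic) relation between score and credits, plus the jump."""
    curve = 0.0002 * (score - CUTOFF) ** 3
    return 30 + curve + (TRUE_EFFECT if score < CUTOFF else 0.0)


data = [(s, outcome(s)) for s in range(20, 81)]
estimates = {h: rd_estimate(data, h) for h in (3, 10, 30)}
for h, est in estimates.items():
    print(f"bandwidth ±{h}: estimate = {est:.3f}, bias = {est - TRUE_EFFECT:+.3f}")
```

In a meta-analytic data set, the analogous step would be to code each effect size's bandwidth and test whether effects differ between wide and narrow bandwidths.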
Meta-Analyzing RD Studies: Cutpoints
As noted, all RD studies use one or more cutpoints to assign students to conditions. This means that, in many real-world applications that feature some degree of local control over the assignment process, a somewhat different sample is being used across the studies in the meta-analysis. This problem is analogous to a meta-analysis of randomized trials in which some of the trials are targeted at very low-achieving students and others target students who are average achieving. Assuming that treatment effects are not homogeneous, this between-study variability associated with the cutpoints should be taken into account, and one way to do that is by incorporating the between-sample variability via a random-effects approach, which we have done here. In some applications of meta-analysis with RD designs, it may be possible to control for the cutpoints used in each study (though that was not possible here, as many different placement tests were used, and we do not have enough information about either the tests or the samples to equate the test scores across studies).
Meta-Analyzing RD Studies: Adherence and Selective Sorting
In RD studies, some individuals may not comply with their condition assignments, especially if one condition is generally seen as more desirable than the other. This issue is similar to crossovers in a randomized experiment. And as in a randomized experiment, crossovers could result in a bias in the estimate of the effect of being placed into developmental education. In examining the effects of placement into developmental education, college student counselors can sometimes override course placement recommendations, but probably the biggest threat to adherence is the student sorting associated with retesting. That is, institutions vary in the extent to which they allow students to retake the placement test. If students are allowed to retake the test, this has the potential to create a selection issue and carries with it the potential for bias. The main issue is that if retesting is allowed, students scoring in the developmental range are much more likely to take the test again than are students who score in the college range (for whom the probability of retaking the test is essentially zero). Thus, retesting means that some students originally assigned to the developmental group will end up in the college-level group. In Tables 1 to 4, we document retesting policies to the extent that we are able, and in most cases, retesting does not seem to be a problem. However, some authors were silent on this point. Calcagno and Long (2008) provided separate results for a subgroup of institutions that appeared to either not allow or to severely limit retesting, and we used the results from the “no retesting” group in our analyses below.
Table 1. Study characteristics and outcomes: credits earned
Note. APSU = Austin Peay State University; CSCC = Cleveland State Community College; JSCC = Jackson State Community College; SE = standard error; ACT = American College Testing; ESL = English as a second or foreign language; SES = socioeconomic status. Effect sizes are unstandardized regression coefficients, so they represent the observed effect in terms of credit hours earned. Because none of the sample sizes are small, a z test for each effect size can be computed as the effect size ÷ standard error. For Martorell and McFarlin (2011), the regression coefficient represents the total number of college-level credits attempted over a 6-year period. For institution type, 2 = community college, 4 = university. For placement test, CPT = Florida College Entry Level Placement Test; CUNY = City University of New York’s placement test; TASP = Texas Academic Skills Program.
Table 2. Study characteristics and outcomes: ever pass college-level course
Note. SE = standard error; ESL = English as a second or foreign language; SES = socioeconomic status. Effect sizes are ordinary least squares regression estimates with a binary dependent variable, so they represent the observed effect in terms of percentages passing the college-level course (e.g., −0.147 means that students assigned to developmental education passed the first college-level course in which remediation was needed at a rate that was 14.7 percentage points lower than the rate at which students assigned directly to the college-level course passed it). Because none of the sample sizes are small, a z test for each effect size can be computed as the effect size ÷ standard error. For Lesik (2006), we assumed that the base rate of passing was 50%, which resulted in the most optimistic effect size possible. The 0.307 effect size represents a translation of the logged odds ratio of 1.43 reported in Table 4 of that study. For institution type, 2 = community college, 4 = university. For placement test, CPT = Florida College Entry Level Placement Test; CUNY = City University of New York’s placement test.
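The translation of the Lesik (2006) logged odds ratio into a percentage-point difference can be reproduced with a few lines of arithmetic. The sketch below assumes that the 50% base rate is the passing rate of the comparison group and that the logged odds ratio shifts that group's odds; the function name is ours.

```python
import math


def risk_difference_from_log_odds(log_odds_ratio, base_rate):
    """Convert a logged odds ratio into a difference in proportions."""
    base_odds = base_rate / (1 - base_rate)
    new_odds = base_odds * math.exp(log_odds_ratio)
    new_rate = new_odds / (1 + new_odds)
    return new_rate - base_rate


# Logged odds ratio of 1.43 with an assumed base rate of passing of 50%.
difference = risk_difference_from_log_odds(1.43, 0.50)
print(round(difference, 3))  # 0.307
```

With a base rate of 50% the base odds equal 1, so the implied passing rate is exp(1.43) / (1 + exp(1.43)), or about 80.7%, and the difference is about 30.7 percentage points.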
Table 3. Study characteristics and outcomes: achievement in college-level course
Note. SE = standard error; SES = socioeconomic status. Effect sizes are unstandardized regression coefficients, so they represent the observed effect in terms of student grades (e.g., 0.22 means that students assigned to developmental education scored 0.22 grade points higher on average than students assigned directly to the college-level course). Because none of the sample sizes are small, a z test for each effect size can be computed as the effect size ÷ standard error. Horn et al. (2009) did not report the standard error for the regression coefficient but did report that the coefficient’s p value was less than .05 (but presumably larger than .01); 0.216 is the standard error that yields p = .011. The model without covariates has a standard error of 0.245, so 0.216 does seem reasonable. For institution type, 2 = community college, 4 = university.
Table 4. Study characteristics and outcomes: attainment
Note. SE = standard error; SES = socioeconomic status. Effect sizes are unstandardized regression coefficients, so they represent the observed effect in terms of percentage of the sample earning a certificate or degree. Because none of the sample sizes are small, a z test for each effect size can be computed as the effect size ÷ standard error. For institution type, 2 = community college, 4 = university. For placement test, CPT = Florida College Entry Level Placement Test; CUNY = City University of New York’s placement test; TASP = Texas Academic Skills Program.
Meta-Analyzing RD Studies: RD Is a Model-Based Approach
RD researchers very often use additional control variables to help increase the precision of the estimates. One side effect of this practice is that researchers can end up using fairly different estimation models. The magnitude of a regression coefficient arising from a model depends on the other variables in the model. For example, the regression coefficient describing the relationship between self-efficacy for algebra performance and actual algebra performance will change if a potential confounding variable, like math anxiety, is entered into the model (because math anxiety is correlated with self-efficacy for algebra). As can be seen in Tables 1 to 4, the models in our analyses varied somewhat. Most controlled for a robust set of student background variables including prior academic achievement and socioeconomic status, but two studies (Lesik, 2006; Moss & Yeaton, 2006) included only the placement test score in their estimation models (which is a requirement of an RD analysis). As a result, even if all of the studies were estimating the same population parameter, we would expect that their individual estimates would vary. Therefore, the fact that we observed somewhat different models across studies likely contributed to additional between-study heterogeneity.
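The dependence of a coefficient on the other variables in the model can be illustrated with simulated data for the self-efficacy example. All of the numbers below are hypothetical: we assume a coefficient of 0.5 for self-efficacy, a coefficient of −0.4 for math anxiety, and a negative correlation between the two predictors.

```python
import random


def simple_slope(x, y):
    """Coefficient on x in a regression of y on x alone."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_controlling(x1, x2, y):
    """Coefficient on x1 in a regression of y on x1 and x2 (with intercept)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


rng = random.Random(1)
anxiety = [rng.gauss(0, 1) for _ in range(5000)]
# Hypothetical data-generating process: anxiety lowers self-efficacy and performance.
efficacy = [-0.6 * a + rng.gauss(0, 0.8) for a in anxiety]
performance = [0.5 * e - 0.4 * a + rng.gauss(0, 1) for e, a in zip(efficacy, anxiety)]

without_control = simple_slope(efficacy, performance)
with_control = slope_controlling(efficacy, anxiety, performance)
print(f"self-efficacy coefficient, anxiety omitted:  {without_control:.2f}")
print(f"self-efficacy coefficient, anxiety included: {with_control:.2f}")
```

Under these assumptions the coefficient is larger when anxiety is omitted, because self-efficacy then absorbs part of the anxiety effect. Two studies estimating the same underlying relationship with different covariate sets would therefore be expected to report different coefficients.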
One could ask whether it is even sensible to combine meta-analytically effects that arise from different model specifications. In this case, we believe that the models are generally similar in that they share the most important covariate (the placement score) and tend to employ other covariates in similar domains. For example, in the analysis of the effects of placement into developmental education on college credits earned, there were 16 independent samples. All of the models include the placement score. Gender, race, and SES were also common; hence, the core covariates tended to be similar. As a result, we believe that the models were sufficiently similar to support meta-analysis, though this will be an important consideration for future researchers thinking about conducting meta-analysis with RD studies.
Results
The literature search uncovered 11 reports, with a total of 21 independent samples, that use RD to investigate the effects of placement into developmental education (henceforth, we refer to independent samples as “studies”). However, Harmon (2011) does not appear in our analyses, as that study did not examine any of our four primary outcomes. The studies varied widely in size. The within-study sample sizes of the analyses we used ranged from 185 to 59,334, with a median sample size of about 1,000 students (in all, well over 100,000 students are represented in the meta-analytic database).
Credits Earned
Sixteen analyses examined the effect of placement into developmental education on college credits earned. As can be seen in Table 1, credits earned were typically examined about 3 years after assignment. The mean effect size under fixed-effect assumptions was −1.86 credits, p < .001. The homogeneity test was statistically significant, Q(15) = 43.18, p < .001 (I² = 68%), indicating more variability in the observed effect sizes than would be expected if sampling error alone drove variation in effect sizes. The mean effect size under random-effects assumptions was −3.00 credits, p = .002. Below, we report two sensitivity analyses and two exploratory moderator analyses on this data set.
Ever Pass College-Level Course for Subject?
Table 2 houses the effect size estimates for the six analyses involving whether students ever passed the college-level course in which remediation was needed. For both fixed- and random-effects models, the mean effect size was a 7.9–percentage point reduction in the proportion of students eventually passing the college-level course in which remediation was needed (e.g., from 75% to 68%), p < .001 for the fixed-effect model and p = .004 for the random-effects model. The homogeneity test was statistically significant, Q(5) = 27.31, p < .001 (I² = 81%).
Achievement in College-Level Course if Taken and Completed
Table 3 contains effect size estimates for nine analyses addressing the academic performance in the college-level course in which remediation was needed, conditional on students taking and completing that course. Of our four main outcomes, this is the one that is most likely to be biased by treatment-induced attrition, though the direction of this bias is difficult to predict. For the fixed-effect analysis, the estimated effect size is 0.00 (p = .98). For the random-effects model, the estimated effect size is +0.01 grade points (p = .94). The homogeneity test was not statistically significant, Q(8) = 15.16, p = .06 (I² = 46%).
We should note here that while Scott-Clayton and Rodriguez (2015) also measured academic achievement in the college-level course in which remediation was needed, they did so by dummy coding achievement as whether students earned a B in the college-level course. In and of itself, this does not create a problem for our analysis, but Scott-Clayton and Rodriguez (2015) coded as “0” any student who either (a) earned less than a B or (b) never took the college-level course. Because this analysis conflates two aspects of the educational experience that we think should be kept separate, we did not use the two effect sizes from this study in our meta-analysis. Both were negative and statistically significant.
Degree Attainment
Thirteen studies examined the effect of placement into developmental education on degree or certificate attainment (see Table 4). For both fixed- and random-effects models, the mean effect size was a 1.5-percentage point reduction in the proportion of students eventually earning a degree (e.g., from 30% to 28.5%; p = .03 for both models). The homogeneity test was not statistically significant, Q(12) = 13.39, p = .34 (I² = 7%).
The raw magnitude of this effect depends on (a) the size of the incoming class and (b) the proportion of these students assigned to developmental education. At the institution level, in small institutions and in institutions with low developmental education placement rates, the negative effect of placement into developmental education will not matter much. But in larger institutions, and in institutions with higher placement rates, this effect might be large enough to matter. For example, imagine a typical mid-sized university with 6,000 incoming students, 25% of whom are assigned to developmental education. This institution could be expected to award 22 or 23 fewer degrees in that class than it would have if placement into developmental education had no effect on attainment (i.e., if the graduation rate among nondevelopmental students is 60%, then 58.5% of the 1,500 developmental students are expected to earn a degree, and the 1.5-percentage point difference between the two attainment rates, applied to those 1,500 students, amounts to 22.5 degrees).
Of course, at the policy level, the consequences are staggering. Assume that in a given year, 2.5 million students start their college careers in either a university or a community college setting, that one third of these students are placed into developmental education, and that the overall 6-year graduation rate is 34%. The 1.5–percentage point reduction can be thought of as suggesting that 35% of students not placed into developmental education and 33.5% of students placed into developmental education will graduate in 6 years. This works out to a loss of about 12,500 certificates or degrees for that year’s cohort of students.
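The arithmetic behind the institution-level and policy-level examples can be written out as a short sketch. The function name `degrees_lost` is hypothetical; the inputs are the figures given in the text.

```python
def degrees_lost(cohort_size, share_placed, effect_pp):
    """Expected reduction in degrees when the attainment rate of placed
    students is effect_pp percentage points lower than it would otherwise be."""
    placed = cohort_size * share_placed  # students placed into developmental education
    return placed * effect_pp / 100.0

# Institution-level example: 6,000 incoming students, 25% placed
print(degrees_lost(6000, 0.25, 1.5))

# Policy-level example: 2.5 million entrants, one third placed
print(round(degrees_lost(2_500_000, 1 / 3, 1.5)))
```

The loss scales linearly in both the cohort size and the placement rate, which is why the same 1.5-percentage point effect is negligible for a small institution and large in the aggregate.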
Exploratory Moderator Analyses
Our data set of studies examining the effects of placement into developmental education on credits earned is the only one large enough to support even tentative moderator analyses; we report two of these analyses below. The first examines the effects observed in community colleges relative to universities, and the second examines effects observed separately for reading, writing, and math. Even though we approached these hypothesis tests with specific predictions in mind, we believe that they are best conceptualized as exploratory analyses because, as Lipsey and Wilson (2001) observed, studies have personalities in the sense that their traits tend to cluster together. For a meta-analysis, this means that study characteristics tend to correlate with one another, confounding univariate analyses of the relationship between study characteristics and outcomes. As a result, moderator analyses in meta-analysis should generally be multivariate so that study characteristics can be examined net of other characteristics in the model. However, meta-regression (the meta-analytic analog to multiple regression) generally requires a large number of studies for both reasonable statistical power and stable estimates. The analyses below are univariate and as such warrant an extra level of caution when interpreting them.
Effects for 2- Versus 4-Year Institutions (Credit Accumulation Only)
In our meta-analytic data set, we have five estimates of the effects of placement into developmental education on college credit accumulation that are based on 4-year institutions and 11 estimates that are based on 2-year institutions. For universities, the fixed-effect and random-effects mean effect size is −4.64 credits, p = .002. The homogeneity test within these five estimates was not statistically significant, Q(4) = 2.46, p = .65. For community colleges, a somewhat different picture emerges. The mean effect size under fixed-effect assumptions was −1.56 credits, p = .001. The homogeneity test was statistically significant, Q(10) = 36.84, p < .001. The mean effect size under random-effects assumptions was −2.62 credits, p = .03.
Effects for Different Subjects
Our meta-analytic data set includes four analyses of developmental education for reading, three analyses of developmental education for writing, and seven analyses of developmental education for math. For math, the fixed-effect and random-effects mean effect size is −0.08 credits, p = .90. The homogeneity test within these seven studies was not statistically significant, Q(6) = 7.38, p = .29. For reading, the mean effect size under fixed-effect assumptions was −5.45 credits, p < .001. The homogeneity test was statistically significant, Q(3) = 8.58, p = .04. The mean effect size under random-effects assumptions was −4.87 credits, p = .01. For writing, the mean effect size under fixed-effect assumptions was −1.93 credits, p = .02. The homogeneity test was not statistically significant, Q(2) = 5.63, p = .06. The mean effect size under random-effects assumptions was −3.18 credits, p = .11.
Sensitivity Analyses
Because we have the most information on credits earned, we used this data set to conduct several sensitivity analyses. First, we Winsorized the meta-analytic weights; next, we dropped studies one at a time from the analysis. Both of these strategies are intended to ensure that our results are not being driven by a single study. Finally, five studies allow us to tentatively test the extent to which study results are sensitive to the bandwidth that was used.
Influence Analyses
As can be seen in Table 5, under fixed-effect assumptions, two studies (Hodara, 2012 and Scott-Clayton & Rodriguez’s, 2015 math analysis) have relative weights of 25% and 33%, suggesting that these studies are large relative to the other studies in the data set. Perhaps more important, Boatman’s (2012) community college reading analysis is very influential. By this, we mean that the analysis’ weight (which is above the mean) and effect size (the absolute value of which is the largest in the database) combine to exert a large influence on the fixed-effect analysis of college credits earned.
Relative weights and relative influence (college credits earned analysis)
Note. For each study, Relative Weight is the percentage of the total weight contributed by the study, and Relative Influence is defined as the weight times the square of the distance from each study’s effect size to the grand mean divided by the sum of the weights (i.e., $w_i (X_i - X_{GM})^2 / \sum w_i$). APSU = Austin Peay State University, CSCC = Cleveland State Community College, JSCC = Jackson State Community College.
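The two quantities defined in the note can be computed as in the following sketch. It assumes inverse-variance weights and takes the grand mean to be the fixed-effect weighted mean; both are assumptions, as are the function name and the example inputs, which are not the values in Table 5.

```python
def relative_weight_and_influence(effects, ses):
    """Return (grand mean, relative weights in percent, relative influence)."""
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance weights (assumed)
    total = sum(w)
    # Grand mean taken as the fixed-effect weighted mean (assumed)
    grand_mean = sum(wi * x for wi, x in zip(w, effects)) / total
    rel_weight = [100.0 * wi / total for wi in w]
    # w_i * (X_i - X_GM)^2 / sum(w), following the definition in the note
    rel_influence = [wi * (x - grand_mean) ** 2 / total
                     for wi, x in zip(w, effects)]
    return grand_mean, rel_weight, rel_influence

# Hypothetical effect sizes (credits) and standard errors
gm, rw, ri = relative_weight_and_influence([-1.0, -9.0, -2.0], [0.8, 1.5, 1.0])
for weight, influence in zip(rw, ri):
    print(round(weight, 1), round(influence, 3))
```

Under this definition a study is influential when it is both heavily weighted and far from the grand mean, so a study with a moderate weight but an extreme effect size can outrank a much larger study whose effect sits near the mean.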
In Table 6, we first report our primary analyses for comparison. Then, we present the results of the primary analysis with two outlying weights Winsorized. We defined an outlier using Tukey’s (1977) rule (i.e., an outlier is an observation that lies more than a set multiple of the interquartile range, conventionally 1.5, beyond the 75th percentile). As we suspected, Hodara (2012) and Scott-Clayton and Rodriguez’s (2015) math analysis were identified as outliers. We then trimmed the weights iteratively (recoding the weights so that they were no longer outliers, then rechecking for outliers) until no outliers were identified. This process had the effect of inflating the standard errors for these two studies (from 0.925 to 1.462 for Hodara, and from 0.796 to 1.462 for Scott-Clayton & Rodriguez, 2015). As can be seen, the patterns of statistical significance were unchanged across the mean effect size under fixed-effect assumptions, the mean effect size under random-effects assumptions, and the homogeneity analysis. Winsorizing resulted in a much larger point estimate for the fixed-effect analysis and had virtually no effect on the random effects analysis. The estimate of between study heterogeneity dropped somewhat with Winsorized weights (I² values were 68% for the main specification vs. 59% for the Winsorized analysis).
Sensitivity analyses (credits earned)
Note. ES = effect size (expressed as college credits earned), SE = standard error, p = probability, Q(df) = homogeneity test statistic value and degrees of freedom. The statistic I² expresses the proportion of variability in effect sizes that is attributable to differences across studies (as opposed to sampling error). APSU = Austin Peay State University, CSCC = Cleveland State Community College, JSCC = Jackson State Community College.
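The iterative Winsorizing of weights described above can be sketched as follows. The upper fence is computed from the quartiles of the weights, outlying weights are recoded to the fence, and the check is repeated. The fence multiplier `k`, the quartile method, the iteration cap, and the example weights are all assumptions for illustration, not details reported in the text.

```python
import math
import statistics

def winsorize_weights(weights, k=1.5, max_iter=100):
    """Cap weights at an upper Tukey-style fence, rechecking until none exceed it."""
    w = list(weights)
    for _ in range(max_iter):
        q1, _median, q3 = statistics.quantiles(w, n=4, method="inclusive")
        fence = q3 + k * (q3 - q1)  # upper fence: Q3 plus k interquartile ranges
        if all(x <= fence for x in w):
            break
        w = [min(x, fence) for x in w]  # recode outlying weights to the fence
    return w

# Hypothetical inverse-variance weights; the last two are much larger than the rest
original = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 1.17, 1.58]
trimmed = winsorize_weights(original)

# A smaller weight corresponds to a larger implied standard error: SE = 1 / sqrt(w)
for before, after in zip(original, trimmed):
    print(round(1 / math.sqrt(before), 3), round(1 / math.sqrt(after), 3))
```

Because every outlying weight is recoded to the same fence value, the affected studies end up with identical implied standard errors, which matches the pattern reported in the text, where both trimmed studies share a standard error of 1.462.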
Next, we addressed potentially influential studies by dropping one study at a time from the main analysis of the effects of placement into developmental education on college credits earned. Again, most of the changes were minor, but dropping Boatman’s (2012) community college reading effect resulted in a large change to the fixed-effect estimate (from −1.86 to −1.36 credits) and to the random-effects estimate (from −3.04 to −2.25 credits). Dropping both Hodara (2012) and Scott-Clayton and Rodriguez’s (2015) math analysis resulted in less dramatic increases to both fixed-effect and random-effects estimates. Across these “drop one study” analyses, the statistical conclusions did not change (i.e., the mean effect was negative and statistically significant under fixed- and random-effects assumptions, and the homogeneity test was statistically significant), and the substantive interpretations of the effects were highly similar.
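A “drop one study” analysis of this kind can be sketched in a few lines. For brevity the sketch recomputes only the fixed-effect mean; the function names, study labels, and inputs are hypothetical and do not correspond to the studies in the database.

```python
def fixed_effect_mean(effects, ses):
    """Inverse-variance weighted mean of the effect sizes."""
    w = [1.0 / se ** 2 for se in ses]
    return sum(wi * y for wi, y in zip(w, effects)) / sum(w)

def leave_one_out(labels, effects, ses):
    """Recompute the pooled mean with each study removed in turn."""
    effects, ses = list(effects), list(ses)
    results = {}
    for i, label in enumerate(labels):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_ses = ses[:i] + ses[i + 1:]
        results[label] = fixed_effect_mean(kept_effects, kept_ses)
    return results

# Hypothetical studies: effect sizes in credits and their standard errors
labels = ["Study A", "Study B", "Study C", "Study D"]
effects = [-1.0, -9.0, -2.0, 0.5]
ses = [0.8, 1.5, 1.0, 1.2]

print("all studies:", round(fixed_effect_mean(effects, ses), 2))
for label, estimate in leave_one_out(labels, effects, ses).items():
    print("without", label + ":", round(estimate, 2))
```

The study whose removal moves the pooled mean the most is the most influential one, which is the same information the relative influence statistic summarizes in a single number.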
RD Assumptions
Two studies (Calcagno & Long, 2008; Martorell & McFarlin, 2011) provide effect sizes both for all students in the analysis and for a specific bandwidth. Similarly, three studies in our meta-analytic database (Hodara, 2012; Moss et al., 2014; Scott-Clayton & Rodriguez, 2015) used at least two bandwidths as a sensitivity check. In Table 7, we present the effects observed in these five studies across a total of 13 analyses and also provide a statistical test for the difference in the effect sizes across each of these analyses. The statistical tests are z tests using procedures described in Borenstein, Hedges, Higgins, and Rothstein (2009) for computing the variance of two correlated variables. This procedure requires that researchers know or estimate the extent to which the standard errors are based on independent information. Though not realistic, we chose zero for this value because doing so yields the smallest possible standard error. This means that the statistical tests in Table 7 are more likely to result in a rejection of the null hypothesis of no difference, even when there is no actual difference between the estimates, and as such represent a “worst case” scenario.
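A z test for the difference between two estimates that share data can be sketched as below. The sketch uses the standard variance of a difference between two correlated quantities, with `r` denoting the assumed correlation between the estimates. The function name, the parameterization by `r`, and the input values are assumptions for illustration; the sketch does not attempt to reproduce the specific value the authors chose or the figures in Table 7.

```python
import math

def z_test_difference(es1, se1, es2, se2, r=0.0):
    """Two-sided z test for es1 - es2 when the estimates correlate at r."""
    # Var(X1 - X2) = V1 + V2 - 2 * r * sqrt(V1) * sqrt(V2)
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2
    if var_diff <= 0:
        raise ValueError("difference has non-positive variance for this r")
    z = (es1 - es2) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return z, p

# Hypothetical narrow- and wide-bandwidth estimates from the same sample
for r in (0.0, 0.5, 0.9):
    z, p = z_test_difference(-2.0, 1.0, -1.0, 0.8, r=r)
    print(r, round(z, 2), round(p, 3))
```

The more the two estimates overlap in the information they use (the larger `r` is), the smaller the standard error of their difference, and the more likely the test is to reject the null hypothesis of no difference.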
Sensitivity analysis: narrow bandwidth
Note. ES = effect size; SE = standard error.
As can be seen, only 1 of the 13 tests resulted in a rejection of the null hypothesis (p = .048). Correcting for multiple comparisons using any common procedure (e.g., a Bonferroni correction or the Benjamini–Hochberg correction) yields nonsignificant results for all tests. Furthermore, there was no consistency in the direction of the differences, and the median p value across these 13 analyses is .41. As such, we cannot find evidence in these studies that the observed effect sizes were unduly influenced by our decision to use the narrowest bandwidth given in the studies.
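To see why neither correction leaves a significant result, consider the following sketch of the Bonferroni and Benjamini-Hochberg procedures. Only the smallest p value (.048) and the median (.41) come from the text; the other eleven p values are hypothetical placeholders chosen to exceed .05, and the function names are illustrative.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Reject only tests whose p value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg_reject(pvals, alpha=0.05):
    """Step-up procedure: reject the k smallest p values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject

# 13 p values: .048 and the median .41 are from the text, the rest are hypothetical
pvals = [0.048, 0.09, 0.15, 0.22, 0.30, 0.36, 0.41,
         0.47, 0.55, 0.63, 0.72, 0.84, 0.93]

print(any(bonferroni_reject(pvals)))
print(any(benjamini_hochberg_reject(pvals)))
```

With 13 tests, the threshold for the smallest p value is .05/13 (about .0038) under both procedures, so a p value of .048 does not survive either correction as long as every other p value is above .05.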
Discussion
This article reviewed evidence on the effects of placement into developmental education as evaluated with RD designs, and as such represents the most rigorous review of the effects of placement into developmental education to date. If the causal inferences are correct and our effect sizes are reasonably accurately estimated, the meta-analyses of studies using RD to investigate the effects of placement into developmental education suggest that placement into developmental education results in statistically significant and substantively sizable negative impacts. Relative to their peers who are also on the margin of college readiness but who were placed into college-level courses, students placed into developmental education earned fewer college credits after about 3 years (our estimates ranged from about 2 to 3 credit hours, depending on model specification), were about 8 percentage points less likely to eventually pass the college-level course in which remediation was needed, and were about 1.5 percentage points less likely to earn a certificate or degree. We cannot reject the null hypothesis that marginal students placed into developmental education perform similarly (i.e., earn similar grades) in the college-level course in which remediation was needed relative to marginal students placed into the college-level course. The results for college credits earned were not sensitive to either outlier effect sizes (there were none) or outlier weights. Influential studies similarly did not affect the statistical significance of the results, though in the fixed-effect model, there was some variation in the effect sizes observed depending on which studies were in the analysis (effect sizes in the random effects model were very similar regardless of which studies were in the analysis). There is no evidence that the observed effect sizes were influenced by the decision to focus on the narrowest bandwidth presented in the studies in the review.
The exploratory moderator analyses using the studies that assessed college credits earned suggest that the negative effects of placement into developmental education are stronger for university students (but still statistically significant and negative for community college students), and for students placed into developmental education in reading and writing (recall that for writing, the fixed-effect estimate was statistically significant but the random-effects estimate was not, p = .11), but not math (the fixed- and random-effects estimates were close to zero and were not statistically significant). This latter point merits additional research attention. Using a sample of community college students enrolled in college-level English, Roksa, Jenkins, Jaggars, Zeidenberg, and Cho (2009) found that the probability of passing that course was unrelated to placement test scores. Though just one study, this finding raises questions about the adequacy of placement test scores as a basis for assigning students to developmental education.
How Can Educational Systems and Institutions Improve the Situation?
This study was designed to assess, across multiple studies in many contexts, whether placement into developmental education helps students be successful in college. It was not designed to address how or why any positive or negative effects might have occurred. That said, because students were about 8 percentage points less likely to eventually pass the college-level course in which remediation was required, it is reasonable to speculate that the college-level course in which remediation was required represents an important roadblock for students assigned to developmental education. For example, Bailey (2009) concluded that developmental education is “not very effective . . . partly because the majority of students referred to developmental education do not finish the sequences to which they are referred” (p. 12; see also Bailey et al., 2010). Furthermore, much of the national conversation on developmental education has focused on misplacement rates. As mentioned earlier, placement is generally based on a single test. No one believes, or at least no one should believe, that these tests are perfect indicators of college readiness (see Armstrong, 2000). A general principle of psychological measurement is that when a construct (like college readiness) is measured imperfectly, one way to improve measurement is to measure the construct in multiple ways. Incorporating information that many institutions already have, such as high school grade point average and scores on standardized entrance tests, into placement decisions is a relatively easy way to modify the placement rubric that has the potential to reduce misplacement rates. Title 5 §55502 of the California Code of Regulations explicitly recognizes this by requiring institutions to use multiple measures for placement into developmental education, and even placement test developers recommend that institutions use multiple measures for placement (Westrick & Allen, 2014).
If we were responsible for running an institution, attempting to reduce misplacement rates by using multiple measures would be where we would start reform efforts.
Furthermore, it is not clear that all students need a semester-long course to achieve college readiness, and researchers have been experimenting with other ways to accomplish this goal. For example, Logue, Watanabe-Rose, and Douglas (2016) conducted a randomized experiment in which algebra instruction was embedded into a college-level statistics course supplemented with weekly workshops that focused on algebra. Compared with students who took the usual developmental algebra course, or that course supplemented with weekly workshops, students taking the college statistics course earned more college-level credits over three semesters (21 vs. about 15 in the other two groups) and were more likely to pass the course to which they were assigned. Other possible ways of remediating deficits include summer bridge programs, targeted one-credit presemester tune-up courses, and additional supports (e.g., mandatory tutoring sessions during the semester). The important point is that educational leaders should think carefully about who gets placed into developmental education and develop flexible systems to help students develop the skills that they need to be successful in college (see Bailey, 2009).
Limitations and Conclusion
An important conceptual limitation is that this study did not address the effect of placement at different levels of developmental education (e.g., elementary vs. intermediate algebra). Due to the relatively small numbers of students placed at the lowest levels of developmental education, and the fact that, all else being equal, statistical power in RD is much lower than in a randomized experiment, it is likely that a series of randomized experiments will be needed to address this question.
With respect to the questions that we were able to address, perhaps the greatest threat to the conclusions we draw in this article is that our analyses are based on studies with characteristics that differ in fundamental and probably important ways. Most obviously, we included studies that examined the effects of placement into developmental reading, writing, and math, and studies that occurred in both community colleges and universities. We were only able to test these two potential modifiers of the effects of placement in developmental education for one outcome (credits earned) because we had too few studies to support parallel analyses for the other outcomes. Those analyses did suggest that there is reason to suspect heterogeneous effects (e.g., placement into developmental education appears to have more negative effects on university students than on community college students). However, these analyses were not multivariate, and therefore could confound the effects of other study characteristics with the ones we were examining. Readers can draw some reassurance from our extensive sensitivity analyses, which suggest that our results are not unduly influenced by exceptional studies or by some of the important decisions we made when assembling our meta-analytic data set.
Even exercising appropriate caution in drawing causal conclusions from our research, based on the studies we review, it is very difficult to walk away with the conclusion that placement into developmental education helps students. More than 75% of the estimates in our meta-analytic database are negative, and the meta-analytic estimates for the probability of passing the college-level course in which remediation was needed, college credits earned, and attainment are all negative, statistically significant, and large enough to be meaningful. Our hope is that this work spurs thoughtful debate and research on placement policies and on alternative mechanisms for ensuring that college students have the skills needed to meet their goals.
Authors
JEFFREY C. VALENTINE is a professor in and the coordinator of the Educational Psychology, Measurement, and Evaluation program in the College of Education and Human Development at the University of Louisville. A social psychologist by training (University of Missouri-Columbia, 2001), most of his work involves using, explaining, and seeking to improve meta-analytic techniques as a means of helping policymakers and practitioners identify effective interventions that improve the health, well-being, and educational outcomes of children, young adults, and families. He is co-editor, with Harris Cooper and Larry Hedges, of the second edition of the Handbook of Research Synthesis and Meta-Analysis; associate editor of the peer-reviewed journal Research Synthesis Methods; and statistical editor for the Cochrane Collaboration’s Psychological, Developmental, and Learning Problems Group.
SPYROS KONSTANTOPOULOS is professor of measurement and quantitative methods in the Department of Counseling, Educational Psychology, and Special Education in the College of Education at Michigan State University. He received his BA from the University of Athens in Education, his first MS from Purdue University in Educational Psychology and Research Methods, his second MS from the University of Chicago in Statistics, and his PhD from the University of Chicago in Research Methods. His research interests include the extension and application of statistical methods to issues in education, social science, and policy studies. His methodological work involves statistical methods for multilevel data structures. His substantive work encompasses research on class size effects, teacher and school effects, and the social distribution of academic achievement.
SARA GOLDRICK-RAB is professor of higher education policy & sociology at Temple University, and founder of the Wisconsin HOPE Lab, the nation’s only translational research laboratory seeking ways to make college more affordable. Dr. Goldrick-Rab’s commitment to scholar-activism is evidenced by her broad profile of research and writing dissecting the intended and unintended consequences of the college-for-all movement in the United States. In more than a dozen experimental, longitudinal, and mixed-methods studies, she has examined the efficacy and distributional implications of financial aid policies, welfare reform, transfer practices, and a range of interventions aimed at increasing college attainment among marginalized populations. She provides extensive service to local, state, and national communities, working directly with governors and state legislators to craft policies to make college more affordable, collaborating with non-profit organizations seeking to examine the effects of their practices, and providing technical assistance to Congressional staff, think tanks, and membership organizations throughout Washington, DC.
