Abstract
Students with disabilities experience poorer post-school outcomes compared with their peers without disabilities. Experimental literature on “what works” for improving these outcomes is scarce; however, a rapidly growing body of research investigates correlational relationships between experiences in school and post-school outcomes. A meta-analytic review provides a means for assessing which experiences show the strongest relationships with long-term outcomes and how these relationships vary by outcome, research design, and population. This article presents a meta-analysis of in-school predictors of postsecondary employment, education, and independent living of youth with disabilities, examining 35 sources and 27 samples (N = 16,957) published from January of 1984 through May of 2010. Predictors showed differing relationships with education versus employment. Some of the least studied predictors, especially those involving multistakeholder collaboration, had larger effects than predictors more typically the focus of correlational research. Implications for future research and practice are considered.
Individuals with disabilities have long experienced poorer post-school outcomes than those without disabilities, and evidence suggests that despite some improvement, these disparities persist (e.g., Fabian, Lent, & Willis, 1998; Newman et al., 2011). A well-developed literature on secondary transition interventions for youth with disabilities would include studies using rigorous methods to identify what works for improving post-school outcomes of this population, including experimental designs with post-school follow-up (Test, Fowler, et al., 2009). Although such studies remain scarce, sufficient correlational and quasi-experimental literature exists to begin answering questions about interventions’ effectiveness (Cobb et al., 2013). Correlational research cannot rule out confounds such as maturational effects but does allow practitioners to determine which interventions show the greatest promise for influencing secondary transition, and whether short-term targets for intervention (e.g., increases in academic achievement or changes in social skills prior to school exit) relate to later, post-school outcomes (Test, Mazzotti, et al., 2009). Furthermore, correlational studies can help identify conditions when interventions are optimally effective, especially if multiple studies are compared using meta-analytic techniques (i.e., moderator analyses; Borenstein, Hedges, Higgins, & Rothstein, 2009).
As summarized in recent reviews (Cobb et al., 2013; Test, Fowler, et al., 2009; Test, Mazzotti, et al., 2009), important progress has been made in establishing an informative correlational literature, including identification of a variety of interventions as having potential or moderate research support (with the strong level reserved for experimental designs; National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, 2011). A major contribution to this literature was provided by the initial National Longitudinal Transition Study (NLTS; Wagner, Blackorby, Cameto, & Newman, 1993). This study, the first attempt to examine prediction of postsecondary outcomes in a nationally representative sample, has been used in much correlational research to date (see Test, Mazzotti, et al., 2009). Despite this progress, questions regarding the conditions under which interventions are effective need to be answered to confidently implement evidence-based interventions in educational programs. These include questions regarding the strength of prediction by interventions or short-term intervention targets (“how much”) and the breadth of their prediction across various specific transition outcomes (“for what”), research designs and contexts (“when”), and populations (“for whom”; Cobb et al., 2013). Additionally, researchers have begun to explore ways of coordinating efforts in delivering interventions among various stakeholders such as school or nonschool professionals or parents (“with whom”). Systems of classification such as the Kohler Taxonomy for Transition Programming (Kohler, 1993) have highlighted the importance of such multistakeholder collaboration, but to date, the literature on this topic has not been examined empirically (Landmark, Ju, & Zhang, 2010).
Building on a prior systematic review of evidence in the existing correlational literature (Test, Mazzotti, et al., 2009), the present meta-analysis aims to elucidate the strength and limits of what works in this literature, by examining prediction of postsecondary outcomes from in-school interventions, short-term intervention targets (e.g., academic achievement, social skills), and types of multistakeholder collaboration (e.g., transition planning, parent involvement). In so doing, the study attempts to strengthen the foundation for emerging next-generation research examining post-school outcomes of secondary transition through rigorous, experimental designs and through describing strength and generalizability of effects across outcomes, context, research design, and populations (Cobb et al., 2013). The study also aims to provide guidance for future studies, especially those using the recently released data from the National Longitudinal Transition Study–2 (NLTS-2), by identifying where variability in effects by outcome, research design, and population may need to be considered in correlational designs.
Summarizing Correlational Research on Secondary Transition Interventions
Prior Systematic Reviews of What Works
Researchers have conducted numerous experimental studies to determine effective strategies for teaching secondary transition skills to students with disabilities (e.g., Ayres, Langone, Boon, & Norman, 2006; Bates, Cuvo, Miner, & Korabek, 2001); however, almost all of these studies assess outcomes prior to secondary school completion (e.g., skills acquisition) rather than following school (Test, Mazzotti, et al., 2009). A recent systematic review of literature on secondary transition interventions (Cobb et al., 2013) found no experimental studies and only one quasi-experimental study (Baer, Daviso, Flexer, Queen, & Meindl, 2011) assessing post-school follow-up. For the most part, studies investigating associations between in-school variables and positive post-school outcomes are correlational (e.g., Halpern, Yovanoff, Doren, & Benz, 1995; Harvey, 2002; Heal & Rusch, 1994; Wehmeyer & Schwartz, 1997; White & Weiner, 2004). These studies often assess factors studied in the experimental literature, either as independent variables (e.g., vocational education) or as proximal outcomes (i.e., outcomes assessed in school, such as social skills, self-determination), and link these to post-school education, employment, and independent living. By examining these predictors, correlational studies provide a missing link, identifying which experiences, programs, and in-school outcomes in the experimental literature are associated with post-school success (Test, Mazzotti, et al., 2009).
To facilitate use of the literature in practical settings, the National Secondary Transition Technical Assistance Center (NSTTAC) has recently sought to summarize evidence on supporting post-school transition of students with disabilities through two systematic literature reviews. The first review focused on experimental studies (i.e., single-subject and group designs) to determine evidence-based practices to teach students specific transition-related skills (Test, Fowler, et al., 2009). The second review (and precursor to the present study) synthesized the correlational literature (i.e., nonexperimental studies, excluding single-case designs) on in-school predictors of post-school outcomes for secondary students with disabilities (Test, Mazzotti, et al., 2009). This review identified 16 in-school predictors of post-school success, providing an initial summary of what works in the correlational literature: (a) Career Awareness, (b) Community Experiences, (c) Exit Exam Requirements/High School Diploma Status, (d) Inclusion (in general education), (e) Interagency Collaboration, (f) Occupational Courses, (g) Paid Employment/Work Experience, (h) Parental Involvement, (i) Program of Study, (j) Self-advocacy/Self-determination, (k) Self-care/Independent Living Skills, (l) Social Skills, (m) Student Support, (n) Transition Programs, (o) Vocational Education, and (p) Work Study (i.e., participation in work study programs). These same predictors were used in the current analysis. Complete descriptions of the predictors and associated findings are available in Test, Mazzotti, et al. (2009). Test and associates identified several notable limitations of their work. Of perhaps greatest significance was the use of a vote-counting strategy, that is, simply identifying the number of studies showing significant effects in certain ranges (i.e., small, medium, and large; Cohen, 1988). This limitation is further described in the following section.
Advantages of Meta-Analyses Versus Vote Counting
Meta-analyses hold several advantages over impressionistic methods of analyzing systematically collected studies of what works (e.g., vote counting). First, these methods produce more reliable, precise estimates than qualitative review methods, vote-counting, or simple averaging of effects, through weighting by sample size and other methodological characteristics, and including nonsignificant as well as significant findings (Borenstein et al., 2009). Meta-analyses also allow levels of reliability to be characterized through definition of confidence intervals. Specifically, confidence intervals allow for assessing the probability that effects are spurious and, if not, the likely range of “true” effects (i.e., effects that would be approximated with continued accumulation of research; Hedges & Olkin, 1985). These meta-analytic strategies (i.e., inclusion of nonsignificant effect sizes, pooling, weighting, and construction of confidence intervals) are also useful in addressing threats to validity of inferences such as the file drawer effect (misrepresentation of effect sizes due to excluded studies; Hedges & Olkin, 1985). Furthermore, the estimates provided by a meta-analysis can be used to forecast how change in a predictor might affect postsecondary outcomes. For example, following available best practice guidelines (Lipsey et al., 2012), planners can readily use effect size data to anticipate the likelihood of an outcome given the presence of a predictor (e.g., estimates of the proportions of students who will be employed following school exit in a population receiving vocational education vs. one not receiving such services).
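The contrast between vote counting and weighted pooling can be illustrated with a minimal sketch. It assumes a simple fixed-effect, inverse-variance model on Fisher's z scale, which is a simplification rather than the mixed effects models used in this study; the function names and the (r, n) pairs are hypothetical and are not drawn from the reviewed studies.

```python
import math

def pool_correlations(samples, z_crit=1.96):
    """Fixed-effect, inverse-variance pooling of correlations via Fisher's z.

    samples: list of (r, n) pairs, one per independent sample.
    Returns (pooled_r, ci_low, ci_high) on the correlation scale.
    """
    # The variance of Fisher's z is 1 / (n - 3), so the weight is n - 3.
    weights = [n - 3 for _, n in samples]
    zs = [math.atanh(r) for r, _ in samples]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    lo, hi = z_bar - z_crit * se, z_bar + z_crit * se
    # Convert the pooled estimate and its bounds back to correlations.
    return math.tanh(z_bar), math.tanh(lo), math.tanh(hi)

def vote_count(samples, z_crit=1.96):
    """Count samples whose own correlation is significant (two-tailed .05)."""
    return sum(
        1 for r, n in samples
        if abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit
    )

# Hypothetical (r, n) pairs for illustration only.
samples = [(0.10, 50), (0.15, 80), (0.12, 400), (0.30, 40)]
pooled, lo, hi = pool_correlations(samples)
print(f"significant samples: {vote_count(samples)} of {len(samples)}")
print(f"pooled r = {pooled:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

In this illustration only one of the four samples reaches significance on its own, yet the pooled confidence interval excludes zero, because the nonsignificant samples still contribute information in proportion to their size.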
Beyond What Works: Moderator Analysis
Another advantage of the meta-analytic approach is its ability to test variability of effects through moderator analyses. Moderator analyses can examine whether variability exists among effects across different predictors, outcomes, populations, or research designs. In the absence of such analyses, researchers and practitioners lack definitive guidance on which of multiple promising interventions or short-term outcomes most influence postsecondary transition, and which are generalizable to the populations, settings, and outcomes of interest.
Examining Moderation by Predictor, Taxonomy Category, and Outcomes
Vote-counting methodology treats all effects in the same manner; thus, all predictors or categories associated with significant effect sizes would be regarded similarly in a vote-counting study, even if they varied widely in magnitude. Alternatively, if differences among predictors are compared impressionistically, without use of meta-analytic adjustments (as in Test, Mazzotti, et al., 2009), their meaningfulness may not be clear. Simple comparisons of effect size magnitude do not account for imprecision due to small numbers of studies or small samples. A seemingly large difference in effect sizes for different predictors or categories might not be meaningful if based on only a few studies or small sample sizes. Similar considerations hold for comparing effects across different types of outcomes. Lacking these meta-analytic comparisons, researchers or educators might erroneously conclude that an intervention is equally effective regardless of the outcome involved, where accurate weighting and comparison would show meaningful differences (or conversely, that an intervention is more effective for one vs. another outcome when it is not). Research suggests that interventions improving one transition outcome may fail to affect, or may even compromise, other outcomes, especially among at-risk youth (e.g., education vs. employment outcomes; Haber, Loker, Deschênes, & Clark, 2008).
Examining Moderation by Demographic Variables
Beyond the type of intervention, experience, or outcome involved (what), moderation may occur by population (who). Among study population moderators, demographic variables are crucial to examine, as prediction may vary by gender, ethnicity, or disability (Hasnain & Balcazar, 2009; Wehmeyer & Schwartz, 2001; Wittenburg & Maag, 2002). Lacking knowledge of whether such moderation exists, researchers or practitioners might apply interventions that are associated with outcomes in general, but not for their specific population. This is most likely to be a problem for populations less typical in special education research, such as those that are predominantly female or have high concentrations of minority students or students with low incidence disabilities.
Moderation by Research Design
Research design may also be relevant in considering strength of evidence, especially for correlational literature. Length of follow-up may be an important variable in gauging the strength of effects, with some researchers considering follow-up of less than 2 years to be short term and thus not credible for meaningful forecasting of effects (Leff, Conley, & Elmore, 2005). All studies included in the Test, Mazzotti, et al. (2009) analysis assessed post-school outcomes, but in many cases, these assessments occurred less than 2 years from school exit. Study quality may also be important. For example, in correlational research, especially in studies including many predictors, inappropriate use of univariate techniques may inflate the size of estimates (Thompson, Diamond, McWilliam, Snyder, & Snyder, 2005). Poor study rigor also increases estimate imprecision, leading to noise that may detract from the ability to make meaningful distinctions between effects (Hunter & Schmidt, 2004).
Beyond Specific Interventions
Test, Mazzotti, et al. (2009) used an inductive approach for characterizing predictor categories rather than preexisting classification strategies. Categories were thus specific to the research and not aligned with existing means for classifying interventions in the secondary transition field. Multiple frameworks have been developed for describing the critical components of secondary transition programming (e.g., the National Standards and Quality Indicators for Secondary Education and Transition of the National Alliance for Secondary Education and Transition, 2005; Guideposts for Success of the National Collaborative on Workforce and Disability for Youth, 2009). For the current study, the Taxonomy for Transition Programming (Taxonomy; Kohler, 1993) was selected due to its capturing important distinctions in the literature predicting postsecondary outcomes, especially the distinction between specific interventions and efforts to coordinate their use among formal and informal stakeholders.
The Taxonomy, developed from program evaluation research and then expert review (Landmark et al., 2010), is composed of five categories: (a) Student-focused Planning (e.g., IEP development, student participation in IEP meetings); (b) Student Development (e.g., instruction in employment or life skills, career and vocational curricula, support services, structured work experiences); (c) Interagency Collaboration (e.g., collaborative service delivery and frameworks); (d) Program Structure (e.g., program philosophy, program evaluation, policies, resource development, and resource allocation); and (e) Family Involvement (level of involvement or interventions to improve involvement of parents or family members). Of the five categories, the three focusing on strategies for coordinating efforts of stakeholders in and outside of school (Student-focused Planning, Interagency Collaboration, and Family Involvement) were of particular interest for this study. In a recent review of the Taxonomy, Landmark et al. (2010) characterized these strategies as promising based on large effect sizes found in some existing research. Findings from the current meta-analysis supporting this assertion would push the field to strengthen its efforts to develop methods for multistakeholder collaboration.
The Present Study
Research on improving postsecondary outcomes stands at an important juncture. As shown by recent systematic reviews, experimental research examining post-school outcomes remains lacking; however, many correlational studies have been completed, including studies from a large, nationally representative sample (i.e., NLTS). As described above, because systematic reviews conducted to date (i.e., Cobb et al., 2013; Test, Mazzotti, et al., 2009) have not used meta-analytic methods, they cannot address several important research and practice concerns related to strength and generalizability of effects. As researchers begin the next phase of secondary transition research, including efforts to use data from the recently completed NLTS-2, guidance on these issues is needed.
The present study adopted a meta-analytic approach in addressing the following research questions in an updated sample of correlational studies (i.e., through May of 2010): (a) what is the average size and typical range of effects in existing correlational literature relating in-school predictors to post-school outcomes, both overall and for particular outcomes (i.e., employment, education, and independent living outcomes); (b) what effect sizes characterize existing research on the Test, Mazzotti, et al. (2009) predictors (e.g., Career Awareness, Community Experiences, Exit Exam Requirements/High School Diploma Status); (c) what overall effect sizes are associated with the Taxonomy (1996) categories (e.g., Student-focused Planning, Student Development); (d) are there differences among effect sizes within sets of in-school predictors (i.e., Test, Mazzotti, et al., 2009, categories, Taxonomy categories); (e) are there differences among effects in predicting specific outcomes (i.e., employment, education, independent living, and productivity); (f) are methodological features of studies associated with outcomes, including statistical design, length of follow-up, and overall quality as determined by widely used criteria (Thompson et al., 2005); and (g) do gender, ethnicity, and disability impact prediction of post-school outcomes?
Specific hypotheses included that (a) differences would be found among effect sizes for Test, Mazzotti, et al. (2009) predictors and Taxonomy categories; (b) Test, Mazzotti, et al. (2009) predictors and Taxonomy categories focusing on collaboration among stakeholders in school and other settings (e.g., Student-focused Planning, Interagency Collaboration, and Parent Involvement) would have larger estimates than other categories (e.g., Student Development and Program Characteristics); (c) relationships between predictors and outcomes would differ across outcome categories, with stronger relationships obtained between work outcomes and interventions focusing on work-related skills, and between educational outcomes and interventions focusing on education-related skills; (d) effects would be smaller for studies using theoretically determined covariates, including long-term follow-up, or adhering to the Thompson et al. (2005) quality indicator criteria; (e) hypothesized meaningful between-category heterogeneity (e.g., among effects for different outcomes or predictors) would be detected more consistently among studies adhering to versus those not adhering to quality indicator criteria; and (f) effects would be stronger for samples with higher proportions of majority versus minority groups (i.e., higher proportions of male or Caucasian students or those with high-incidence disabilities).
Method
Because this meta-analysis sought to complement the Test, Mazzotti, et al. (2009) systematic literature review, researchers began the new search process by examining all articles identified in the Test, Mazzotti, et al. search and evaluating whether to include these studies using the current review’s criteria. The prior review excluded studies that failed to meet a set of quality criteria recommended by Thompson et al. (2005) for correlational research. In contrast, for the current review, studies were included whether or not they met the Thompson et al. criteria, and status on these criteria was used as a moderator variable (see Analytic Plan below) and for testing the hypothesis that distinctions between effects would be more likely for studies meeting criteria than those that did not. A new electronic search was conducted using identical methods to those in Test, Mazzotti, et al. (2009) to identify all peer-reviewed articles and unpublished dissertations from January 1984 through May 2010 examining predictors of post-school success. The databases used included Academic Search Premier, Educational Administration Abstracts, Education Research Complete, Educational Resources Information Center (ERIC), MasterFILE Premier, Middle Search Plus, and PsycINFO.
Full and truncated versions of the following search terms were used: correlation, correlate, correlational, predictor, relationship, students, youth, adolescents, young adults, disability, middle school, high school, transition, education, special education, outcomes, post-school, postsecondary, post-school outcomes, in-school, postsecondary education, employment, independent living, and quality of life. Researchers also reviewed reference lists of articles from electronic searches meeting inclusion criteria to identify additional articles. Finally, experts were contacted to determine if any further published or unpublished works meeting criteria could be found, with at least three contact attempts made for each expert. Using these methods, 332 articles were found, including 28 sources from the Test, Mazzotti, et al. (2009) review.
Inclusion and Exclusion Criteria
To be included, a study had to examine (a) a predictor or predictors related to a secondary educational program, practice, or skill or achievement in school; and (b) a post-school education, employment, and/or independent living outcome or outcomes. Through reviewing abstracts and data analysis sections of the articles, researchers determined that 255 of the original 332 articles failed to meet these criteria. Additionally, articles were excluded because they (a) did not clearly differentiate students with and without disabilities in analyses (n = 3), (b) did not include students with disabilities (n = 23), (c) were not available in English (n = 2), (d) were from before 1984 (n = 2), or (e) provided insufficient data to determine effect size and authors did not provide these data on request after three attempts (n = 12). These exclusions brought the number of excluded sources to 297, with 35 sources included, 21 of which were from the original Test, Mazzotti, et al. (2009) review, and 14 of which were unique to the current analysis.
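The screening counts reported above can be tallied to confirm that they reconcile. The sketch below uses only the numbers given in the text; the variable names and category labels are shorthand introduced for illustration.

```python
# Counts as reported in the text; names are illustrative shorthand.
identified = 332
failed_core_criteria = 255
other_exclusions = {
    "students with and without disabilities not differentiated": 3,
    "no students with disabilities": 23,
    "not available in English": 2,
    "published before 1984": 2,
    "insufficient effect size data": 12,
}

excluded = failed_core_criteria + sum(other_exclusions.values())
included = identified - excluded
from_prior_review, new_sources = 21, 14

print(f"excluded = {excluded}, included = {included}")
print(f"prior review + new = {from_prior_review + new_sources}")
```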
The 35 sources yielded 27 discrete samples (due to instances where multiple studies used the same sample) with 16,957 participants and 317 effect sizes. Samples (rather than studies) were used as the unit of analysis (as further described in the Analytic Plan section). Studies from the NLTS, but not the NLTS-2, were included. This was because no studies from the NLTS-2 relating in-school predictors to post-school outcomes had been completed within the timeframe reviewed, and the current analysis was intended to inform NLTS-2 analyses using studies published prior to the release of data for post-school NLTS-2 waves.
Coding
Researchers coded all sources and associated effects using a form developed for the meta-analysis. An initial set of codes was created from those in the prior systematic review, supplemented by additional codes for the moderator analysis. Each article was coded by one of four trained doctoral students (the second through fifth authors), with articles divided roughly equally among the students and randomly assigned. For 25% (n = 7) of the articles, a second student coder was randomly assigned, with that student and the primary coder both coding each of these articles independently. All disagreements between independent codes of paired students were reconciled by the coders in consultation with the lead author so that a single code for each coding decision was determined for entry in the final database for analyses. Independent codes were also retained in a separate database for calculation of interrater reliability. Percent agreement per code was calculated by dividing the number of agreements by the total number of coding decisions and multiplying by 100. Interrater reliability ranged from 89.8% to 100% with a mean of 98.0%. Coding was also carefully reviewed for consistency and spot-checked for accuracy by the first author.
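The percent agreement calculation described above can be sketched as follows. The function name and the example codes are hypothetical; only the formula (agreements divided by total coding decisions, multiplied by 100) comes from the text.

```python
def percent_agreement(codes_a, codes_b):
    """Percent agreement between two coders' decisions on the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must supply the same nonzero number of codes.")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * agreements / len(codes_a)

# Hypothetical codes for one variable across five double-coded articles.
primary = ["employment", "education", "employment", "independent living", "education"]
second = ["employment", "education", "productivity", "independent living", "education"]
print(percent_agreement(primary, second))
```

Percent agreement does not correct for chance agreement, so it is best read alongside the number of categories available for each code.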
Study Characteristics Coding
Coded study characteristics are shown in Table 1. These included (a) institutional setting (school vs. community); (b) geographic setting (urban vs. rural); (c) sample size; (d) statistical design/effect size type, including bivariate coefficients, coefficients controlled for theoretically selected covariates (theoretically driven), and for covariates selected without explicit theoretical basis (exploratory); (e) predictor timing, or when predictors were measured (e.g., 11th grade); and (f) quality, operationalized as whether studies met criteria for methodological rigor in correlational research proposed by Thompson et al. (2005) (further described in the next section). In addition, ns and percentages for each category of several demographic variables were coded, including (a) gender; (b) age; (c) ethnicity (as defined by the U.S. census); (d) minority ethnicity (i.e., students of color vs. Caucasian students); (e) lowest socioeconomic status (as operationalized by each study); and (f) disability type (i.e., IDEA disability category). Although initial plans were to code additional characteristics related to disability beyond IDEA categories (e.g., low vs. high incidence, such as severe or moderate vs. mild intellectual disability), sources generally lacked this level of detail so coding of these characteristics was discontinued.
Table 1. Study characteristics
Note. NLTS = National Longitudinal Transition Study; NELS = National Education Longitudinal Study; NLSY = National Longitudinal Survey of Youth; QI = quality indicator; MR = mental retardation; LD = learning disability; ED = emotional disturbance; NA = not specified in article.
Studies were coded as urban if students were drawn exclusively from urban settings; all other types of samples were coded “other.”
Both Baer et al. (2011) and Flexer et al. (2011) drew from a single sample; Flexer et al. contained the complete sample, whereas Baer et al. (2011) included only youth with developmental or intellectual disabilities.
Sample for Benz et al. (1997), Doren and Benz (1998), and Halpern et al. (1995a); Doren and Benz (1998) also includes subsamples by gender, analyzed separately for test of moderation by gender.
Sample for Blackorby et al. (1993); Heal and Rusch (1994); Heal and Rusch (1995); Heal et al. (1997); Heal et al. (1998); Heal et al. (1999); Rylance (1998); and Wagner et al. (1993); note that Rylance (1998) includes youth with emotional disturbances only.
Sample for Fabian et al. (1998) and Luecking and Fabian (2000).
Sample is distinct from Halpern et al. (1995a; also used in Doren & Benz, 1998), though drawn from the same publication; thus, these two samples count as one toward the study total.
Sample for Rabren et al. (2002) and Rabren et al. (2003); Rabren et al. (2003) includes youth in vocational rehabilitation services only; Rabren et al. (2003) also includes subsamples by gender, analyzed separately for test of moderation by gender.
Study included samples from three FETPIP evaluation periods with descriptive data provided separately for each; thus, for purposes of the meta-analysis, Repetto et al. (2002) was treated as three separate samples (Ns of 148, 29, and 52).
Sample for Wehmeyer and Palmer (2003) and Wehmeyer and Schwartz (1997).
Coding of Thompson et al. (2005) quality criteria
To define study quality, a widely cited set of criteria for rigor in educational research proposed by Thompson et al. (2005) was used. These included criteria related to measurement, practical (or clinical) significance, common analytic errors, and use of confidence intervals. Relevant criteria had been adapted for the Test, Mazzotti, et al. (2009) review. These adapted criteria, applied identically in the present article, included (a) inclusion of all effect sizes in reporting of findings, including nonsignificant effect sizes; (b) appropriate interpretation of general linear model weights (e.g., beta weights); (c) use of multivariate techniques for multiple outcome variables; (d) use of the highest available scale of data (e.g., interval rather than nominal) unless the decision to use the lower scale is thoroughly justified; (e) presentation of evidence that statistical assumptions are sufficiently met (e.g., homogeneity of variance, normal distribution, measures of central tendency). Studies were coded as meeting Thompson et al. criteria if all criteria were met.
Effect Size Coding
Coded effect size characteristics included (a) length of follow-up post-school, operationalized as less than 2 years versus 2 or more years of follow-up; (b) three types of predictor codes; and (c) two types of outcome codes, further described below.
Predictor coding
The first predictor code used the 16 predictors identified by Test, Mazzotti, et al. (2009; see that review for a complete description of the categories). A new Specific Intervention category was added. This category was created to differentiate between specific transition interventions (e.g., Marriott Bridges to Work Program) and more general types of programs (e.g., types of courses such as Vocational Education, or transition programs lacking adherence to a well-described model), with the variable coded if the predictor was a specific intervention. Taxonomy categories (described previously) were also coded.
Outcome coding
Post-school outcomes were coded using the following categories: (a) employment; (b) education or training; (c) productivity (i.e., engagement in either employment or school); and (d) independent living (e.g., residential independence or quality of life variables).
Analytic Plan
Types of Effect Sizes and Effect Size Transformations
Of the 317 effect sizes in the sample, the majority (n = 189) were bivariate correlations (rs), which served as the common metric of effect size for this study and thus were the preferred effect size where available. A small number (n = 7) of other types of bivariate effect sizes were converted to rs, including ds and proportions. Remaining effect sizes used in the study were coefficients representing the unique effect of a single predictor variable, controlled for other predictors, including betas (i.e., standardized regression or path coefficients) or odds ratios from regression analyses (n = 121).
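As an illustration of converting another bivariate effect size to the r metric, the sketch below applies a widely used textbook conversion from a standardized mean difference (d) to r. The article does not state which conversion formulas were applied, so the choice of formula, the function name, and the example values are assumptions for illustration.

```python
import math

def d_to_r(d, n1, n2):
    """Convert a standardized mean difference to a correlation.

    Uses r = d / sqrt(d^2 + a), where a = (n1 + n2)^2 / (n1 * n2)
    corrects for unequal group sizes (a = 4 when groups are equal).
    """
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)

# Hypothetical example: d = 0.50 with two groups of 50 students each.
print(round(d_to_r(0.50, 50, 50), 4))
```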
Some researchers (e.g., Hunter & Schmidt, 2004) have argued against using any effects other than simple, zero-order bivariate relationships, holding that effects controlled for varying sets of other variables cannot be meaningfully combined. Others have contested this position, arguing that limiting studies to those with bivariate data results in the exclusion of many sources, in turn biasing findings in favor of whichever studies provide these data. In support of this argument, Peterson and Brown (2005) demonstrated that differences between coefficients for bivariate relationships and those controlled for other effects are small (i.e., less than .1 in most cases) and that including such effects produced more precise estimates. Furthermore, they proposed an adjustment to reduce inflation of estimated effect sizes due to inclusion of multivariate coefficients.
Multiple attempts were made to contact authors in cases in which published studies did not provide bivariate effects, but relatively few were received (these were included in the 189 total). Consequently, to avoid excluding a large number of effects, the Peterson and Brown (2005) adjustment was used where possible (i.e., for coefficients from linear regression analyses). Otherwise, effects were included without the adjustment, based on the conclusion of the Peterson and Brown (2005) analysis that exclusion of multivariate effects results in misrepresentative findings, whereas inclusion of such effects (i.e., relative to bivariate coefficients representing the same effects) is unlikely to overly influence analyses. In order to identify any lingering impact of including multivariate coefficients, researchers tested a moderator comparing bivariate to other effects and found this to be nonsignificant.
For analyses, coefficients were Fisher’s z transformed to remove bias in calculation of standard errors related to the magnitude of estimated effects. Following analyses, results were converted back to correlations for presentation (Borenstein et al., 2009).
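The transformation and back-transformation described above can be sketched as follows. This is a minimal illustration using the usual approximation that the sampling variance of z is 1 / (n - 3); the example correlation and sample size are hypothetical and are not taken from the studies reviewed.

```python
import math

def fisher_z(r):
    """Fisher's z transform of a correlation: z = atanh(r)."""
    return math.atanh(r)

def z_to_r(z):
    """Back-transform Fisher's z to a correlation: r = tanh(z)."""
    return math.tanh(z)

def z_variance(n):
    """Approximate sampling variance of Fisher's z for sample size n (n > 3)."""
    return 1.0 / (n - 3)

# Hypothetical example: r = .30 observed in a sample of n = 103
z = fisher_z(0.30)
se = math.sqrt(z_variance(103))           # standard error on the z scale
ci_z = (z - 1.96 * se, z + 1.96 * se)     # 95% interval on the z scale
ci_r = tuple(z_to_r(v) for v in ci_z)     # converted back to correlations
print(ci_r)
```

Note that the standard error on the z scale depends only on n, not on the size of r, which is the property that motivates the transformation.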
Screening for Outliers
Screening for outliers among effect sizes was accomplished through reviewing effect sizes’ z scores. Scores were dropped when they were more than three standard deviations from the mean for outcomes in a given analysis (i.e., either all outcomes or employment outcomes only). Outliers for demographic variables used in weighted regressions were similarly removed; in all cases, these were single observations for which the proportion of students on the variable was 100% and all other observations were 50% or less.
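A minimal sketch of the three-standard-deviation screening rule is shown below. The function name and data are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the source does not specify how the z scores were computed.

```python
from statistics import mean, stdev

def screen_outliers(effects, threshold=3.0):
    """Split effects into (kept, dropped) using a z-score cutoff.

    An effect is dropped when it lies more than `threshold` sample
    standard deviations from the mean of the effects in the analysis.
    """
    m = mean(effects)
    s = stdev(effects)
    kept, dropped = [], []
    for e in effects:
        z = (e - m) / s if s > 0 else 0.0
        if abs(z) > threshold:
            dropped.append(e)
        else:
            kept.append(e)
    return kept, dropped

# Hypothetical data: 20 modest correlations plus one extreme value
effects = [0.10, 0.15, 0.20, 0.25, 0.30] * 4 + [0.95]
kept, dropped = screen_outliers(effects)
print(dropped)
```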
Analysis of Effect Sizes
For all analyses, the sample rather than the study was used as the unit of analysis; thus, k values (i.e., number of discrete effects contributing to an estimate) represented discrete samples, not studies. In cases where multiple studies drew from a particular sample (most notably, studies using the NLTS sample), this resulted in multiple studies being represented by one effect. Comprehensive Meta-analysis, Version 2 (Borenstein, Hedges, Higgins, & Rothstein, 2005) was used to calculate the overall effect size and its significance, assess possible publication bias associated with the overall effect using Rosenthal’s (1991) fail-safe N, test homogeneity, and evaluate the significance of moderators. For the overall test of effect size as well as each moderator test, alpha was set at .05. Although setting alpha at this level raised some concern related to alpha inflation, typical adjustments (e.g., Bonferroni) would have resulted in overly conservative, underpowered tests, not appropriate for a broad study considering many types of potential moderators. Where a given result met more stringent standards for significance (p < .01, .001), this is noted in tables and text to assist readers preferring to interpret the data more conservatively. Both analysis of variance (ANOVA) and weighted regression approaches used mixed effects models.
ANOVA moderator analyses
In this approach, an average effect size for all effects is first calculated, with each effect weighted by its associated sample size. The homogeneity of effects contributing to the overall estimate is then tested. If significant, the implication is that studies vary more than expected by chance, and thus, that effects cannot be regarded as representing a single population effect size. In order to explain this variability, the ANOVA method analyzes categories of studies as defined by selected study characteristics (i.e., moderators) using Q statistics, which are indices of variability of averages conforming to the chi-square distribution. The Qb statistic represents the between categories effect. A significant Qb indicates that variability associated with the moderator is greater than would be expected by chance, meaning that the moderator meaningfully differentiates effect sizes. Qw represents the within categories effect. A significant Qw indicates that variability among effects exceeds expectations due to chance, necessitating further subdivision of studies to achieve homogeneity. ANOVA analyses were performed to test moderation by outcome, predictor variables (i.e., Test, Mazzotti, et al., 2009, predictors and Taxonomy categories), and design characteristics. Moderation by outcome was also tested within each Test, Mazzotti, et al. predictor and Taxonomy category, to assess whether particular predictors or categories showed differing associations across outcomes.
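The partition of variability into between- and within-category components can be sketched as follows. This is a simplified fixed-effect illustration, whereas the analyses reported here used mixed effects models, which would add a between-sample variance component to each weight. The weights (n - 3 on the Fisher's z scale), the function names, and the data are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def weighted_mean_and_q(zs, ws):
    """Return the weighted mean of zs and the Q statistic around that mean."""
    mean = sum(w * z for z, w in zip(zs, ws)) / sum(ws)
    q = sum(w * (z - mean) ** 2 for z, w in zip(zs, ws))
    return mean, q

def q_partition(effects):
    """Partition total Q into between (Qb) and within (Qw) components.

    effects: list of (category, r, n) tuples.
    Qb is referred to chi-square with (number of categories - 1) df;
    Qw is referred to chi-square with (k - number of categories) df.
    """
    zs = [math.atanh(r) for _, r, _ in effects]
    ws = [n - 3 for _, _, n in effects]
    _, q_total = weighted_mean_and_q(zs, ws)

    groups = defaultdict(lambda: ([], []))
    for (cat, _, _), z, w in zip(effects, zs, ws):
        groups[cat][0].append(z)
        groups[cat][1].append(w)

    q_within = sum(weighted_mean_and_q(gz, gw)[1] for gz, gw in groups.values())
    q_between = q_total - q_within
    return q_total, q_between, q_within

# Hypothetical effects: identical within each outcome, different between outcomes
effects = [
    ("education", 0.30, 103),
    ("education", 0.30, 53),
    ("employment", 0.10, 103),
    ("employment", 0.10, 203),
]
q_total, q_between, q_within = q_partition(effects)
print(q_total, q_between, q_within)
```

In this constructed example all variability lies between categories, so Qw is zero and Qb equals the total Q.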
Weighted regressions with demographic variables
Demographic variables were scaled continuously using percentages, including percentages of females, three commonly reported ethnic groups (i.e., African American, Hispanic/Latino, minority ethnicity), and three commonly reported disability categories (i.e., learning, intellectual, and emotional disabilities). Moderation by these percentages was tested through weighted regression, which unlike the ANOVA approach can accommodate continuous variables (Borenstein et al., 2009). To limit the number of regressions conducted, moderation was tested for employment outcomes only (as opposed to for all outcomes or other types of outcomes). This decision was made based on the fact that differences have previously been found in predicting employment by gender, ethnicity, and disability (Hasnain & Balcazar, 2009; Wehmeyer & Schwartz, 2001; Wittenburg & Maag, 2002); conversely, prior research examining demographic variation in predicting other types of outcomes for students with disabilities is relatively rare.
Tests were also limited to the two most frequent predictor categories in the sample of studies (i.e., Student Development and Vocational Education), both of which were represented by more than 10 effects (for the subset of studies meeting quality indicator criteria as well as studies overall), in order to ensure a sufficiently high number of studies for a credible estimate of relationships involving continuous predictors. 2 For each significant predictor, two Q statistics were calculated, including a Qr, representing variability associated with the regression slope, and a Qe, indicating the residual variability. Beta weights for predictors were also generated, representing the change in effect size corresponding to a 1% change in participants in a given demographic group.
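A weighted regression of effect sizes on a demographic percentage, with the accompanying Qr and Qe statistics, might look like the sketch below. It is a fixed-effect simplification with a single predictor; the weights, function name, and data are hypothetical, and the slope is expressed on the Fisher's z scale per one-percentage-point change in the demographic variable.

```python
def meta_regression(xs, zs, ws):
    """Weighted least squares of effect sizes zs on a moderator xs.

    Returns (slope, intercept, q_r, q_e), where q_r is the variability
    explained by the regression slope (1 df for a single predictor) and
    q_e is the residual variability (k - 2 df).
    """
    sw = sum(ws)
    x_bar = sum(w * x for x, w in zip(xs, ws)) / sw
    z_bar = sum(w * z for z, w in zip(zs, ws)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for x, w in zip(xs, ws))
    sxz = sum(w * (x - x_bar) * (z - z_bar) for x, z, w in zip(xs, zs, ws))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    q_total = sum(w * (z - z_bar) ** 2 for z, w in zip(zs, ws))
    q_r = slope * sxz
    q_e = q_total - q_r
    return slope, intercept, q_r, q_e

# Hypothetical samples: percentage female, Fisher's z effect, and weight
pct_female = [10, 30, 50, 70]
z_effects = [0.30, 0.25, 0.20, 0.15]
weights = [50, 100, 150, 100]
slope, intercept, q_r, q_e = meta_regression(pct_female, z_effects, weights)
print(slope, q_r, q_e)
```

Because the hypothetical effects fall exactly on a line, Qe is zero here; with real data a significant Qe would indicate variability left unexplained by the moderator.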
Examining associations with study quality
Relationships between adherence to Thompson et al. (2005) quality criteria and effect sizes were considered in two ways. First, quality criteria adherence was used as a moderator, in order to evaluate whether effects differed among higher versus lower quality studies as defined by adherence to all criteria. Because of the small number of studies overall and the relatively small proportion of studies in the low-quality category, it was impractical to examine whether effects for specific predictors or categories also varied by quality. Instead, a second set of analyses was performed including studies in the high-quality category only, allowing for determination of whether findings for specific variables differed in the absence of low-quality studies.
Approach to Effects’ Nonindependence
A shifting unit of analysis approach was used where multiple effects were drawn from the same samples (Cooper, 1998). In this approach, each sample contributes exactly one effect per category per moderator test. Thus, for the Taxonomy moderator, each sample was allowed to contribute a maximum of five effects, with one effect allowed for each of the five taxonomy categories assessed in studies using the sample. In cases in which multiple effect sizes were available per moderator category for a sample, these were averaged to yield one effect size. This approach avoids the problem of weighting certain samples more heavily based on the number of effects they contribute to an analysis. It also avoids inflation of effect sizes due to nonindependence of multiple effects from the same study.
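The averaging step in the shifting unit of analysis approach can be sketched as below. The record layout, the function name, and the decision to average raw correlations (rather than Fisher's z values) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def shift_unit(effects, moderator):
    """Collapse effects to one value per sample per moderator category.

    effects: list of dicts with keys 'sample', 'r', and the moderator name.
    Returns a dict mapping (sample, category) to the mean effect size.
    """
    buckets = defaultdict(list)
    for e in effects:
        buckets[(e["sample"], e[moderator])].append(e["r"])
    return {key: mean(rs) for key, rs in buckets.items()}

# Hypothetical effects: sample A contributes two Student Development effects
effects = [
    {"sample": "A", "taxonomy": "Student Development", "r": 0.10},
    {"sample": "A", "taxonomy": "Student Development", "r": 0.30},
    {"sample": "A", "taxonomy": "Parent Involvement", "r": 0.25},
    {"sample": "B", "taxonomy": "Student Development", "r": 0.15},
]
collapsed = shift_unit(effects, "taxonomy")
print(collapsed)
```

Sample A still contributes to two Taxonomy categories, but only one averaged effect to each, so no sample is weighted more heavily simply because it reported more effects.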
Describing Effect Sizes
In characterizing effect sizes, Cohen’s thresholds for small, medium, and large coefficients were used (rs of .1, .3, and .5, respectively; Cohen, 1988). Note that this approach is conservative and some have suggested setting lower medium and large thresholds (e.g., r = .24 and .37; McGrath & Meyer, 2006), but the r = .1, .3, and .5 thresholds are most common. Cohen (1988) warned that these thresholds were rules of thumb and that their specific interpretation depends on features of the contexts in which they are used, including base rates (i.e., prior probabilities) of outcomes, and the relative benefit of increments in outcome rates (e.g., relative value of dollars saved, employment obtained, etc., and the level of difficulty in influencing these impacts through other means). To illustrate the practical significance of the .1, .3, and .5 thresholds in the context of correlational postsecondary outcome research, consider the following: the presence of a dichotomous predictor of employment (e.g., Vocational Education experience) with an effect of .1, given a base employment rate of 60% in a nonexposed group of youth with disabilities (the approximate base rate of postsecondary employment among young adults with disabilities generally; Newman et al., 2011), predicts a conditional 68% rate; with the same base rate of 60%, .3 and .5 effects predict rates of 82% and 92% (see Schechtman, 2002). In large-scale interventions (e.g., a statewide initiative to introduce new transition supports in high schools), an increase of roughly 8 percentage points associated with an r = .1 (small) effect could be quite meaningful; in others (e.g., reforms at the school level) a corresponding difference in outcomes might be undetectable or not worth the resources involved.
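The 68%, 82%, and 92% figures in the preceding paragraph can be reproduced by converting r to a standardized mean difference, then to a log odds ratio, and applying that odds ratio to the base-rate odds. The sketch below uses the standard conversions from general meta-analysis texts; whether this is the exact route the authors followed is an assumption, and the function name is hypothetical.

```python
import math

def predicted_rate(r, base_rate):
    """Outcome rate in the exposed group implied by r and a base rate.

    Steps: r -> d = 2r / sqrt(1 - r^2); d -> ln(OR) = d * pi / sqrt(3);
    then the odds ratio is applied to the odds of the nonexposed group.
    """
    d = 2 * r / math.sqrt(1 - r ** 2)
    log_odds_ratio = d * math.pi / math.sqrt(3)
    base_odds = base_rate / (1 - base_rate)
    odds = base_odds * math.exp(log_odds_ratio)
    return odds / (1 + odds)

# Base employment rate of 60% in the nonexposed group
for r in (0.1, 0.3, 0.5):
    print(r, round(predicted_rate(r, 0.60) * 100))
```

A different base rate yields different increments for the same r, which is one reason the thresholds should be interpreted in context.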
Results
Overall Effect Size and Homogeneity Analyses
The overall Pearson correlation between predictors and post-school outcomes was r = .19, with a 95% confidence interval of r = .12 to r = .25, a small, significant effect at the p < .001 level. The homogeneity test was also significant, χ2(26) = 312.45, p < .001, indicating that effects varied more than would be expected by chance and that moderators should be assessed. Rosenthal’s (1991) fail-safe N was calculated to assess the threat of publication bias to the inference that overall effects differ from zero. This test assesses the so-called file drawer effect (i.e., the tendency for nonsignificant results to remain unpublished) by determining the number of unpublished studies containing null results (i.e., r = .00) necessary to bring the observed level of statistical significance down to a nonsignificant level (i.e., a p value of greater than .05). In the present study, this test indicated that approximately 1,598 unpublished studies with an effect size of r = .00 would be required to render nonsignificant the overall relationship between in-school predictors and post-school outcomes. Thus, although small, it appears that the overall association of predictors and outcomes in the literature reviewed is highly robust against the possibility of nonsignificance due to unpublished studies.
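Rosenthal's fail-safe N can be sketched as follows, using the commonly cited formula N = (sum of Z)^2 / z_crit^2 - k with a one-tailed critical value of 1.645. The Z values are hypothetical, and the exact options used by the software in this study (e.g., one- vs. two-tailed criterion) are not stated in the source.

```python
def fail_safe_n(z_values, z_crit=1.645):
    """Rosenthal's fail-safe N.

    z_values: standard normal deviates (Z) for the k observed effects.
    Returns the number of additional null (Z = 0) results needed to bring
    the combined Stouffer Z down to z_crit.
    """
    k = len(z_values)
    return (sum(z_values) ** 2) / (z_crit ** 2) - k

# Hypothetical example: 10 samples, each with Z = 2.0
n_fs = fail_safe_n([2.0] * 10)
print(round(n_fs, 2))
```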
Moderator Analyses
Tests of homogeneity for each set of moderator categories are first described. Where tests of homogeneity were significant (i.e., the studies varied significantly), selected comparisons of pairs of categories within the overall set are also presented in the text. Two sets of these overall and pairwise tests were conducted, the first for the complete set of studies and the second limited to studies meeting Thompson et al. (2005) quality indicator criteria. Few differences were found in point estimates for all studies versus quality indicator adhering studies; in a few cases, tests of homogeneity (i.e., Qb and Qw statistics) were significant among studies meeting quality indicator criteria versus the complete set of studies or vice versa, with neither pattern predominating. Where these differences in Qb and Qw statistics were found, results are presented both for all studies and for studies meeting Thompson et al. criteria. 3
Moderation by Outcome
Table 2 shows results of the test of moderation by outcome and estimates by outcome type. Though effects by outcome did not differ significantly from one another, some outcomes (employment and education) showed significant relationships with predictors and others (independent living and productivity) did not. Education effects appeared slightly larger than employment effects, falling in the small to moderate range (i.e., the upper limit of their 95% confidence interval fell above r = .3), as opposed to the small range for employment (i.e., with a confidence interval upper limit below r = .25); however, a pairwise test comparing education and employment effects was nonsignificant, χ2(1) = 1.55, ns.
Table 2. Moderation by outcomes
Note. CI = confidence interval.
k is the number of samples, the unit of analysis (rather than studies); multiple ks are provided where numbers differed for samples from lower quality studies versus those meeting quality indicator criteria.
***p < .001.
Moderation by Test, Mazzotti, et al. (2009) Predictors
Some of the Test, Mazzotti, et al. (2009) predictors were associated with a small number of samples (i.e., fewer than 5), and point estimates for these would be expected to be less stable (Hedges & Olkin, 1985). Consequently, homogeneity tests for the Test, Mazzotti, et al. predictors were performed only for high-frequency predictors (i.e., those examined in five or more samples). As displayed in Table 3, these effects did not vary significantly. Estimates for four of the five (all save Interagency Collaboration) were significant, falling in the small (r = .1 to .3) range. Three of these predictors, however (Interagency Collaboration, Transition Programs, and Vocational Education), varied in their effects by outcome. Variation in effects of Interagency Collaboration is described in the next section (as Interagency Collaboration is one of the Taxonomy predictor categories and contributed to a significant heterogeneity test for that moderator).
Table 3. Moderation by Test, Mazzotti, et al. (2009) high- and low-frequency predictors
Note. NA = Qw not applicable because category consists of a single study. CI = confidence interval.
k is the number of samples, the unit of analysis (rather than studies); multiple ks are provided where numbers differed for samples from lower quality studies versus those meeting quality indicator criteria.
“High-frequency” categories were those with five or more associated effects.
Test, Mazzotti, et al. and Taxonomy Interagency Collaboration categories are identical.
Tests of homogeneity were not conducted for low-frequency categories (i.e., those with <5 effects) due to inadequate N.
Test, Mazzotti, et al. and Taxonomy Parent Involvement categories are identical.
Some findings in this column differed for studies overall versus those meeting quality indicator criteria. Statistics for studies overall and the subset are presented on the left and right, respectively.
*p < .05. **p < .01. ***p < .001.
Table 4 shows homogeneity tests assessing variability across remaining high-frequency Test, Mazzotti, et al. predictors, including the significantly varying Transition Programs and Vocational Education predictors as well as those that did not significantly vary across outcomes. Transition Programs were associated with better education and productivity but not with employment outcomes. Effects of Vocational Education varied only among studies meeting quality indicator criteria; for these studies, Vocational Education effects were associated with postsecondary employment, but not education and productivity. Neither Vocational Education nor Transition Programs were related to independent living. Only one of the low-frequency Test, Mazzotti, et al. predictors, Career Awareness, had a significant estimate, though few findings were expected for these predictors, given the low power of their significance tests.
Table 4. Moderation by outcome for Test, Mazzotti, et al. predictors
Note. NA = not applicable because category consists of a single study (for Qw test) or no studies (for other statistics). CI = confidence interval.
k is the number of samples, the unit of analysis (rather than studies); multiple ks are provided where numbers differed for samples from lower quality studies versus those meeting quality indicator criteria.
Some findings in this column differed for studies overall versus those meeting quality indicator criteria. Statistics for studies overall and the subset are presented on the left and right, respectively.
*p < .05. **p < .01. ***p < .001.
Moderation by Taxonomy Categories
Table 5 shows moderator analyses for Taxonomy categories. Taxonomy categories significantly moderated effects, both among all studies and among those meeting quality criteria. Estimates were significant for four of five categories: Student-focused Planning, Student Development, Parent Involvement, and Program Characteristics. The largest effects, in the medium to large range, were found for Student-focused Planning, followed by effects for Parent Involvement, which were in the small to medium range. Effects of Student Development and Program Characteristics were in the small range. In pairwise tests comparing the pooled Student Development and Program Characteristics effects with the larger Student-focused Planning and Parent Involvement effects, the pooled effects were significantly smaller than those for Student-focused Planning, χ2(1) = 12.01, p < .01, but not those for Parent Involvement, χ2(1) = 1.14, ns. Effects of Interagency Collaboration were not significant. Table 6 displays variability in effects of Taxonomy predictors by outcome for categories containing more than five studies and showing varying significance across outcome categories (i.e., Student Development and Interagency Collaboration). As shown in Table 6, despite overall nonsignificance, the effects of Interagency Collaboration varied, such that collaboration was positively associated with education outcomes and unassociated with productivity, independent living, and employment outcomes.
Table 5. Moderation by Kohler Taxonomy category
Note. CI = confidence interval.
k is the number of samples, the unit of analysis (rather than studies); multiple ks are provided where numbers differed for samples from lower quality studies versus those meeting quality indicator criteria.
Test, Mazzotti, et al. and Kohler Interagency Collaboration categories are identical.
**p < .01. ***p < .001.
Table 6. Moderation by outcome for high-frequency Kohler et al. Taxonomy categories
Note. CI = confidence interval.
“High frequency” operationalized as those categories containing five or more effects; Program Characteristics not included due to nonsignificant variation across outcome categories (see Results).
k is the number of samples, the unit of analysis (rather than studies); multiple ks are provided where numbers differed for samples from lower quality studies versus those meeting quality indicator criteria.
Some findings in this column differed for studies overall versus those meeting quality indicator criteria. Statistics for studies overall and the subset are presented on the left and right, respectively.
*p < .05. **p < .01. ***p < .001.
Moderation by Design Characteristics
Table 7 shows findings for study design moderators. Some design moderator variables were not tested, including when predictors were measured (i.e., by age and grade), and institutional setting (i.e., school vs. other), as these data were reported too infrequently and inconsistently to permit homogeneity tests. Of moderators tested, only length of follow-up was significant in both the total set of studies and the subset meeting quality indicator criteria. A substantial difference was found between effects assessed less than 2 years from high school exit versus those assessed more than 2 years from exit, with significant, small to medium effects found for the shorter follow-up time period, and nonsignificant effects found for the longer follow-up time period. Other moderators were nonsignificant, including geographic setting (i.e., urban vs. rural), adherence to Thompson et al. (2005) criteria, and analysis type (i.e., bivariate or multivariate with exploratory or theoretically determined covariates).
Table 7. Moderation by design characteristics
Note. CI = confidence interval.
k is the number of samples, the unit of analysis (rather than studies); multiple ks are provided where numbers differed for samples from lower-quality studies versus those meeting quality indicator criteria.
Some findings in this column differed for studies overall versus those meeting quality indicator criteria. Statistics for studies overall and the subset are presented on the left and right, respectively.
*p < .05. **p < .01. ***p < .001.
Demographic Moderators
Weighted regressions assessed whether demographic variables moderated the prediction of employment by the two most frequently coded predictors: Vocational Education (the most frequent Test, Mazzotti, et al., 2009, predictor) and Student Development (the most frequently coded Taxonomy category). Demographic variables included gender (% female), minority ethnicity (% students of color), and the most commonly reported specific ethnic categories (% African American and Hispanic/Latino students). Results of these regressions are shown in Table 8 (for gender and disability status) and Table 9 (for minority or specific ethnic status). Other demographic variables coded, including high versus low incidence disability status, age, and socioeconomic status, were omitted from analyses due to lack of adequately specific detail related to these variables in most studies. In addition, only the most commonly reported disability statuses (% with intellectual disability, emotional disturbances, or learning disabilities) could be examined, as very few studies considered less common groups in sufficient numbers to permit informative analyses.
Table 8. Meta-regression of employment outcomes on percentages of gender or disability
k is the number of samples, the unit of analysis (rather than studies); multiple ks are provided where numbers differed for samples from lower quality studies versus those meeting quality indicator criteria.
Significance omitted for intercepts to improve readability.
Some findings in this column differed for studies overall versus those meeting quality indicator criteria. Statistics for studies overall and the subset are presented on the left and right, respectively.
*p < .05. **p < .01. ***p < .001.
Table 9. Meta-regression of employment outcomes on percentages of minority status or ethnicity
k is the number of samples, the unit of analysis (rather than studies); multiple ks are provided where numbers differed for samples from lower quality studies versus those meeting quality indicator criteria.
Significance omitted for intercepts to improve readability.
Some findings in this column differed for studies overall versus those meeting quality indicator criteria. Statistics for studies overall and the subset are presented on the left and right, respectively.
*p < .05. **p < .01.
Some associations with demographic variables were shown, but many of these findings were nonsignificant, inconsistent (i.e., across analyses for total set of studies vs. for those meeting quality indicator criteria), or both, somewhat limiting confidence in their interpretation. Generally, findings suggested that for one or the other of the predictors considered (i.e., either Student Development or Vocational Education), female gender and minority ethnicity were associated with smaller effects, and learning disability and Hispanic/Latino ethnicity with larger effects. More specifically, Student Development showed an inverse association with female gender (i.e., studies with higher proportions of females showed smaller Student Development effects). Student Development effects were positively associated with proportions of Hispanic/Latino students, with significance in this case found only among studies adhering to quality indicator criteria. For Vocational Education, both in the total set of studies and among those adhering to quality indicator criteria, higher percentages of students with learning disability were associated with better outcomes, whereas higher percentages of minority students were associated with poorer outcomes. All other tests of demographic moderators were nonsignificant.
Discussion
The present study expanded on the Test, Mazzotti, et al. (2009) systematic review by more precisely assessing the strength of the Test, Mazzotti, et al. predictors as well as an additional widely used classification scheme, the Taxonomy for best practices in transition services proposed by Kohler (1993). The study also examined whether associations of predictors with outcomes differ reliably from one another, and whether prediction differs across the specific outcomes of employment, education, productivity, and independent living. Furthermore, the study addressed whether other study characteristics including design and demographic variables influence these associations. Through these analyses, it was anticipated that improved guidance could be provided to the field on the generalizability of what works in the correlational literature across specific predictors, Taxonomy categories, outcomes, contexts, research designs, and populations. An additional area of interest was the extent to which coordination of interventions among formal and informal supports (e.g., planning interventions, parent involvement) might predict postsecondary outcomes.
Implications of Associations With Post-school Success Overall and by Outcome
Overall, associations between in-school predictors and post-school outcomes were reliable but small, in most cases falling below r = .3. In predicting specific outcomes, effects were significant for education and employment, with slightly (but nonsignificantly) larger effects found for education. Effects for productivity and independent living outcomes were nonsignificant, but because these outcomes were less frequently studied, robust associations of predictors with them were not expected. Poorer prediction of independent living, which served as a “catch all” outcome category in the current analyses, may also have stemmed from its vague definition relative to other outcomes. Overall, these findings suggest that existing correlational studies provide meaningful guidance on prediction of postsecondary education and employment, but no reliable findings for other outcomes.
Implications for the Test, Mazzotti, et al. (2009) Predictors and Kohler Taxonomy
It was anticipated that Test, Mazzotti, et al. (2009) predictors and Taxonomy categories would meaningfully distinguish effect sizes, such that distinct estimates would be found for different predictors and categories. Furthermore, it was hypothesized that Test, Mazzotti, et al. and Taxonomy categories focusing on collaboration among stakeholders in school and other settings (e.g., Student-focused Planning, Interagency Collaboration, and Parent Involvement) would have larger estimates than those that did not (e.g., Student Development and Program Characteristics). Findings partly followed these expectations, with evidence (in the form of a significant heterogeneity test) provided for the descriptiveness of Taxonomy categories, but not the Test, Mazzotti, et al. predictors. Furthermore, some evidence was provided of larger effects for predictors related to multistakeholder collaboration, though confidence for some of these findings is somewhat limited due to the low number of studies examining the collaboration predictors.
These findings suggest that, for understanding global differences in prediction irrespective of outcome, the broader categories used in the Taxonomy provide a more effective means of classification for the field than the Test, Mazzotti, et al. predictors, at least at this stage in the field’s development. With further accumulation of research, more specific classification systems such as the Test, Mazzotti, et al. predictors may prove more useful, though particular Test, Mazzotti, et al. predictors associated with larger numbers of studies (e.g., Vocational Education) clearly still have a role to play in characterizing findings of the field.
Implications of Outcome-Specific Relationships
Test, Mazzotti, et al. (2009) predictors and Taxonomy categories also showed outcome-specific relationships, some expected, some not. Some evidence was provided (i.e., in analyses limited to studies meeting quality indicator criteria, rather than for the complete study set) that, as anticipated, Vocational Education was related more strongly to employment than to other outcomes. Unexpectedly, Transition Programs and Interagency Collaboration predicted education outcomes, but were unrelated to employment outcomes. These differing associations with education versus employment should be further studied to determine the processes involved.
More generally, findings of differing relationships by outcome suggest that strategies to improve education should not be expected to consistently improve employment outcomes and vice versa. It is even possible that strategies improving one of these outcomes may undermine the other. Although research on tradeoffs between work and education among students with disabilities is limited, research in other populations indicates that work responsibilities or high work motivation may sometimes disrupt school activities (Monahan, Lee, & Steinberg, 2011); similarly, it is possible that educational participation may reduce work involvement in some cases, at least over the short term (e.g., among students enrolled in full-time postsecondary coursework). Future research on postsecondary transition should more consistently include relationships between predictors and multiple outcomes to enable a better understanding of whether prediction is outcome-specific or in opposed directions for one outcome versus another. Furthermore, it may be helpful to more frequently examine productivity (i.e., work or educational involvement) to assess whether predictors result in preference for one outcome over another, versus actually interfering with a competing outcome (e.g., if the predictor were associated with school outcomes, but without reducing likelihood of work for students out of school).
For Taxonomy categories related to multistakeholder collaboration, findings supported past impressions (e.g., Haber et al., 2008; Landmark et al., 2010) that such efforts might have strong effects, despite being understudied. Some of the largest estimates in the meta-analysis were obtained for these categories, but in very few (i.e., one or two) samples. In contrast, prediction by the more commonly studied specific interventions falling in the Student Development and Program Structure categories was comparatively weak. This relative weakness occurred despite the fact that many of the Student Development effects were short-term outcomes like social skills and academic achievement (though assessed in school and thus still defined as predictors in our research). The relatively stronger effects related to multistakeholder collaboration and weaker effects of other categories suggest a possible need for a shift in emphasis in the secondary transition field. It appears that more studies of multistakeholder collaboration are needed, as are efforts to better address the unique challenges associated with such research (see Foster-Fishman & Behrens, 2007).
Implications of Moderation by Design
Several design-related features related to study rigor were expected to attenuate effect sizes, including the use of theoretically determined (vs. exploratory or bivariate) covariates, long-term follow-up (i.e., 2 or more years post-school), and adherence to quality criteria for correlational research adapted from Thompson et al. (2005). One of these expectations was supported by findings—that lengthier follow-up periods would be associated with smaller effects. Among the 10 studies with longer follow-up, the average predictor–outcome relationship was small and nonsignificant. This finding raises serious questions about practical implications of existing correlational literature. Predictors that are unrelated to outcomes assessed at least 2 years from exit would seem to have limited appeal as foci for secondary transition intervention, especially given the fact that transition outcomes tend to be unstable through the mid-20s among youth, including youth with disabilities (Cooksey & Rindfuss, 2001; Wittenburg & Maag, 2002).
Adherence to quality criteria was not significantly associated with effect sizes, nor did it appear to increase precision of other moderator analyses (i.e., such that more findings emerged when studies not adhering to criteria were excluded). The lack of either significant moderation or any discernible pattern in findings for studies meeting quality criteria versus not raises questions regarding their utility in synthesizing correlational research. Determination of study quality is notoriously difficult and varied, and some recent research has suggested that use of summary indicators for study quality may be counterproductive, increasing uncertainty of estimates and actually worsening bias in some cases (Ahn & Becker, 2011). Current results are consistent with such findings. Given that the source of the selected quality criteria for the present study was a widely used, often cited article published in a well-respected journal, it would seem that implications of their lack of utility extend beyond meta-analytic methodology, and should be considered in any efforts to establish generic quality indicators for a broad field of research. Confidence in quality indicators proposed for wide application in evaluating studies would be increased through empirical evidence, including findings of good psychometric properties. Development of such an evidence base would be a useful contribution to future efforts to synthesize educational research.
In defense of the utility of the Thompson et al. (2005) criteria, despite null findings of the present research, several points should be made. First, because these criteria were used as a set, with studies said to meet criteria only in cases in which all were met, studies meeting only some criteria (and thus showing at least some degree of rigor according to Thompson et al., 2005) were combined with those meeting none. This heterogeneity among studies not meeting quality indicator criteria may have contributed to explaining the lack of significant moderation by the quality indicator variable. Furthermore, it may not be realistic to expect all methodological weaknesses to contribute to bias in the same manner. For example, studies violating Thompson et al.’s criteria by inappropriately using univariate techniques may inflate effect sizes, consistent with our expectations; however, the same studies might also use measures with poor or unestablished psychometric properties, contributing to noise and attenuating effect sizes.
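The consequence of all-or-none coding can be made concrete with a short sketch. The three studies and four indicators below are hypothetical placeholders and do not reproduce the Thompson et al. (2005) criteria or any study in this review. The helper names `meets_all` and `n_met` are likewise illustrative.

```python
from typing import Dict, List

# Hypothetical studies coded on four illustrative quality indicators
# (True = indicator met). These are not data from the meta-analysis.
CODED: Dict[str, List[bool]] = {
    "study_a": [True, True, True, True],
    "study_b": [True, True, True, False],
    "study_c": [False, False, False, False],
}

def meets_all(flags: List[bool]) -> bool:
    """All-or-none rule: a study meets criteria only if every indicator is met."""
    return all(flags)

def n_met(flags: List[bool]) -> int:
    """Number of indicators met, shown only to expose what the set rule hides."""
    return sum(flags)

set_coding = {name: meets_all(flags) for name, flags in CODED.items()}
counts = {name: n_met(flags) for name, flags in CODED.items()}

for name in CODED:
    print(name, "meets all:", set_coding[name], "| indicators met:", counts[name])
```

In this sketch, study_b (three of four indicators met) and study_c (none met) receive the same "does not meet criteria" code, so the comparison group mixes fairly rigorous and clearly weak studies. The count is printed only to show that hidden variation and is not offered as a recommended composite moderator.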
Consistent with our recommendation for subjecting indicators to further study, Ahn and Becker (2011) as well as Rubin (1992) have suggested avoiding use of either sets or composites and instead examining the influence of single quality indicators separately as a remedy to the problem. This approach was not feasible given the limited literature included in this meta-analysis, but may be in the near future with continued growth of the literature base. In particular, further work on measurement of postsecondary functioning would be helpful, as the bulk of studies to date continue to use simple, categorical indicators of postsecondary outcomes (e.g., point in time presence or absence of employment) that may poorly capture the unstable, dynamic nature of such outcomes in the early adult period (Institute of Medicine & National Research Council, 2014). Such work would allow for informative analyses accounting for measurement artifacts commonly found in meta-analytic syntheses (Hunter & Schmidt, 2004).
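One measurement artifact of the kind discussed by Hunter and Schmidt (2004) is attenuation due to unreliability. As a brief illustration, under classical test theory assumptions the observed correlation shrinks relative to the true-score correlation by the square root of the product of the two reliabilities. The symbols and the numerical values below are illustrative assumptions and are not estimates from this meta-analysis.

```latex
% rho_{xy}: true-score correlation between predictor x and outcome y
% r_{xx}, r_{yy}: reliabilities of the predictor and outcome measures
% r_{xy}: expected observed correlation
r_{xy} = \rho_{xy}\,\sqrt{r_{xx}\, r_{yy}}

% Hypothetical example: \rho_{xy} = .30,\ r_{xx} = .70,\ r_{yy} = .60
r_{xy} = .30 \times \sqrt{.70 \times .60} = .30 \times \sqrt{.42} \approx .19
```

In this hypothetical case, a true relationship of .30 would be observed as roughly .19, which shows how coarse or unreliable outcome measures could by themselves move an effect below the .2 range.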
Questions raised by the length of follow-up finding as well as the relatively small effect sizes found by the meta-analysis underscore the limits of current research on predictors of post-school outcomes. More rigorous, theoretically guided correlational designs with well-designed samples and long-term follow-up assessments (e.g., the NLTS2 study; Newman et al., 2011) are clearly needed. A possible additional area for improvement would be increasing specificity of predictors investigated. Unlike the literature on evidence-based practices, much of the extant correlational research examines relatively broad, ill-defined interventions (e.g., Vocational Education), a characteristic that would be expected to weaken effects. Finally, it appears that there is a need for further rigorous, theory-informed research on short-term outcomes (e.g., social skills, self-advocacy), as relationships between these and postsecondary outcomes were surprisingly small (below .2). A variety of issues may be worth examining in this regard, including weaknesses or inconsistency in operational definitions, lack of rigorous measurement tools, and deficiencies in the theoretical bases for selection of these outcomes.
Implications of Moderation by Demographic Characteristics
Several associations with demographic variables were shown, though some of these findings were nonsignificant or inconsistent. Female gender was associated with weaker effects for Student Development among studies meeting quality criteria. Weaker findings for Student Development in samples with higher numbers of female students may reflect dominance in some Student Development programs of stereotypically masculine vocational foci (Shaffer & Shevitz, 2001).
Findings for ethnicity were somewhat complex. Hispanic/Latino ethnicity was associated with stronger effects for Student Development. In contrast, though unrelated to Student Development, minority ethnicity was significantly associated with weaker effects for a specific type of Student Development, Vocational Education. These seemingly contrasting associations of Hispanic/Latino ethnicity (with Student Development outcomes) versus minority status (with Vocational Education) may appear counterintuitive. One possible explanation is that in studies where ethnicity was specifically identified, most minority students were African Americans, not Latinos. Thus, any differences in effects of Hispanic/Latino versus minority status may represent differences between a Hispanic/Latino subsample and a mostly African American overall sample of minority students (despite the fact that the subset of studies identifying African American students did not show the negative effect shown for minority status, likely due to the smaller sample of studies identifying students as being African American vs. belonging to any minority group).
Poorer employment rates are found among ethnic minority youth generally, especially African American youth, and thus any negative effects of minority status may not be specific to those with disabilities (Koyanagi & Alfano, 2013). Regardless of whether poorer outcomes for ethnic minority groups stem from the intersection of disability and minority status or minority status alone, they are concerning. Special or tailored efforts may be needed for some ethnic minorities—especially African Americans—to overcome systemic discrimination or other factors that may impede success in these groups. The contrasting, positive relationship between proportions of Hispanic/Latino students in samples and Student Development outcomes is consistent with the higher employment rate found in the Hispanic/Latino versus African American workforce generally. Note, however, that employed Hispanic/Latinos also show higher rates of part-time employment and lower average wages than employed African American workers (U.S. Department of Labor, 2011), and Hispanic youth show higher rates of high school dropout (Driscoll, 1999). Clearly, a full understanding of the influence of Hispanic minority status on prediction of transition outcomes awaits expansion of the literature, including further studies that include substantial numbers of Hispanic/Latino students and consider diverse types of employment outcomes.
Limitations and Directions for Future Research
Several limitations of the study should be noted. First, many coded characteristics could not be included in analyses due to infrequent or inconsistent reporting. These issues are not weaknesses of our research per se, but a function of gaps in the existing literature. One benefit of conducting systematic reviews and meta-analyses is identifying such gaps and in so doing, creating standards for subsequent research. Several specific difficulties with inadequate representativeness and gaps in reporting are notable. For instance, possible differences among effects by institutional setting (e.g., conventional school, alternative school, or other) could not be examined because the vast majority of studies involved students in conventional, mainstream school settings. Institutional setting context is often an important influence on the impact of educational interventions (Lehr, Tan, & Ysseldyke, 2009); thus, future research should strive to sample students from environments other than traditional schools in adequate numbers to assess the impact of these nontraditional settings on secondary transition outcomes.
Key demographic data were also often absent. Surprisingly, specific information on ages and grades at which students were initially assessed (i.e., ns or percentages for each year of age or grade in school) was often not included in reports, which made it impossible to analyze effects by age. Typically, only overall age ranges were reported. A cursory inspection of these ranges shows that they are not specific enough to be meaningful. For example, approximately two thirds of studies failed to report age data entirely, stated that students were “all ages,” or used a range of 14 to 25 years or broader. The fact that few studies included data on specific ethnic group membership (e.g., % of African American and Hispanic/Latino students) limited power to detect ethnic group membership effects and resulted in a somewhat complex pattern of findings for these variables.
To help build the knowledge base on issues specific to common groups among students with disabilities such as students of African American, Hispanic/Latino, and Asian ethnicities, correlational studies clearly need to more consistently provide data on proportions of students in these groups. Although we intended to examine moderation by disability status frequency, this could not be accomplished, because of (a) the general lack of any specification of disability in studies beyond IDEA disability category and (b) the scarcity of studies involving groups that could be clearly characterized as low incidence (e.g., students with Visual Impairments). Clearly, disability categories can be quite broad (e.g., mild, moderate, and severe intellectual disabilities), and greater precision in characterizing meaningful subgroups within such broad categories would be desirable. Students in certain low incidence groups may have very different needs than those with more commonly encountered disabilities. A strength of the NLTS studies is their oversampling of low incidence groups (Cameto, Wagner, Newman, Blackorby, & Javitz, 2000). This strength should be more fully exploited through research examining differences in prediction associated with membership in these low-incidence groups.
Finally, although the present meta-analysis informs the field in a variety of ways, a meta-analysis of experimental literature would further improve the existing knowledge base. Currently, sufficient literature may exist to conduct a meta-analysis of trials with short-term follow-up (i.e., assessing outcomes prior to school exit). Such a meta-analysis would provide an ideal complement to the current review, improving on the prior qualitative systematic review of the experimental literature by Test, Fowler, et al. (2009) in similar ways as the current study improves on the Test, Mazzotti, et al. (2009) systematic review of correlational research. In addition, once sufficient experimental studies with long-term follow-up are available, a meta-analysis of these studies would provide still more definitive evidence on what works and how.
Implications for Practice
Given the current absence of a well-developed literature on supporting secondary transition (i.e., using experimental or quasi-experimental designs; Cobb et al., 2013), the correlational literature provides the best available guidance for selecting promising interventions and short-term targets for interventions. The current review was intended to increase confidence in applying prior summaries of what works from this literature (i.e., in particular, the Test, Mazzotti, et al., 2009, article), using meta-analytic methods to
Despite this important caveat, findings showed positive relationships between predictors and outcomes in almost all cases, of meaningful (albeit small) magnitude, including positive effects for some of the most widely studied interventions (e.g., vocational education and specialized transition programs) and characteristics of programs (e.g., use of inclusive classrooms and experiences such as paid work). In addition, relatively large effect sizes were obtained for certain understudied areas not emphasized in the Test, Mazzotti, et al. (2009) review, including those captured by the Taxonomy categories of Student-focused Planning and Parent Involvement, and (to a lesser extent, for education outcomes) Interagency Collaboration, all of which explicitly focus on promoting collaboration among stakeholders in school, home, and elsewhere (e.g., vocational rehabilitation, mental health services). More attention should be paid to these or other means of enhancing connections between such stakeholders in different contexts or mesosystems (Bronfenbrenner, 1979), both by practitioners and researchers.
Another important finding for practitioners was the difference in effects by outcome. Predictors or Taxonomy categories that predicted education outcomes did not consistently predict employment. This result suggests educators should target interventions to specific post-school goals of youth. The weaker effects of some predictors on employment among minority students suggest that interventions may need to be tailored to these special populations. Overall, these and other findings support examining questions beyond what works, so that practitioners select specific interventions or intervention foci that fit the needs of the postsecondary outcomes, populations, setting characteristics, and so on being addressed, rather than recommending interventions as evidence-based in general fashion.
Conclusion
The present study provides a stronger foundation for inferring what works (or might work) from the correlational literature as well as the limitations of existing evidence on these influences. Findings demonstrate hazards of relying on single studies and underscore the need to apply meta-analytic techniques to interpret the rapidly expanding research base on correlational predictors. Furthermore, they highlight issues needing to be addressed to improve correlational research, including problems with study designs and reporting. Findings also suggest that educators should pay special attention to factors involving relationships among supports within and outside of the school (mesosystems; Bronfenbrenner, 1979) and assess appropriateness of interventions for specific post-school outcomes and particular student populations.
Authors
MASON G. HABER is an assistant professor of psychology and contributes to doctoral programs in community and clinical health psychology at the University of North Carolina at Charlotte (UNC–Charlotte), 9201 University City Blvd., 4040 Colvard, Charlotte, NC 28223; e-mail:
VALERIE L. MAZZOTTI is a research associate and technical assistance provider for the National Post-School Outcomes Center (NPSO) at the University of Oregon. While a PhD student at UNC–Charlotte, she worked with the National Secondary Transition Technical Assistance Center (NSTTAC), where she contributed to several reviews of secondary transition evidence-based practices and predictors of post-school success for youth with disabilities. Her research interests include self-determination, evidence-based practices and predictors of post-school success, interagency collaboration, and secondary transition for students with high incidence disabilities.
APRIL L. MUSTIAN is an assistant professor of special education at Illinois State University and an external evaluator for a State Personnel Development Grant for the Illinois State Board of Education. Her research interests include academic and behavioral interventions for students with and at risk for mild to moderate disabilities (with specific emphasis on students with emotional disturbance), and disproportionality and culturally responsive behavioral strategies.
DAWN A. ROWE is a research associate in the College of Education and the project coordinator for the National Post-School Outcomes Center at the University of Oregon. Her duties include providing technical assistance to states and local education agencies, research-based product development and dissemination, event coordination, and project coordination. Her current research interests include parent and family involvement and college and career readiness for students with disabilities.
AUDREY L. BARTHOLOMEW is an assistant professor in the Education Department at the University of New England. She is the leader for the inclusion concentration in the online masters of education program and also teaches face-to-face courses in the undergraduate program. Her research interests include aligning academic and secondary transition skills instruction for students with disabilities and effective online instruction in teacher training programs.
DAVID W. TEST is a professor of special education at the University of North Carolina at Charlotte, where he teaches courses in single-subject research, transition, classroom management, and professional writing. His publications have focused on self-determination, transition, community-based training, and supported employment. He cowrote the first transition methods textbook, titled Transition Methods for Youth with Disabilities. He currently serves as a co–project director of the National Secondary Transition Technical Assistance Center, and codirector of the North Carolina Indicator 14 Post-School Outcomes Project and the IES CIRCLES project. He also serves as coeditor of Career Development and Transition for Exceptional Individuals.
CATHERINE H. FOWLER is the project coordinator for the National Secondary Transition Technical Assistance Center, located in the College of Education, Special Education and Child Development at the University of North Carolina at Charlotte, where she develops research-based products and provides technical assistance to state and local education agencies. Her current research interests include self-determination, integrating instruction of academic content and transition-focused skills, and professional development and technical assistance strategies in schools.
