Abstract
In this systematic evaluation of intervention research for transition-age autistic youth, we examined quality indicators in 193 group and single-case design intervention studies, which tested effects on 1258 outcomes. Behaviorally based interventions were the most common intervention type. We found significant threats to internal validity for the majority of studies, including inadequate randomization, unmasked assessors, and too few data points to infer functional relations. The majority of outcomes were measured in contexts similar to the intervention and were conceptualized as behaviors directly addressed by intervention procedures. As such, they are of unclear long-term utility for autistic people entering adulthood. Adverse events were rarely reported. We suggest several avenues for improving intervention research for this age group.
Lay abstract
In this study, we assess the quality of intervention research that focuses on autistic youth who are 14–22 years old. We found 193 different studies on this topic, and carefully reviewed them. Most of these studies tested strategies that were behavioral. This means that they used procedures like prompting and rewards to change participants’ behavior. We found that the majority of studies had problems that make it hard to determine whether or not the intervention worked. The problems related to how researchers designed their studies, and how they measured the study outcomes. We also found that researchers rarely tried to find out if the strategies they studied had unintended negative effects for participants. Because of these issues, we make suggestions for how researchers might design better studies that will let people know how well the strategies worked.
Introduction
In many countries, including the United States, transition-age students enrolled in special education are entitled to school-based services geared toward preparing them for post-school contexts, such as employment, higher education, and community living (Individuals with Disabilities Education Act, 2004, section 300.43). Transition supports should be designed in tandem with the services that support academic and social development throughout preschool–12. Despite the mandated provision of these services, post-high school outcomes for autistic youth tend to be poor. In the most recent nationally representative survey of post-school outcomes for disabled youth in the United States, autistic youth were reported to be less likely to be enrolled in post-secondary education, employed, and living independently as compared to the average graduate across all disability categories (Newman et al., 2011; Test et al., 2020). Autistic adults also report lower self-esteem (Cooper et al., 2017), and are at elevated likelihood of mental health disabilities such as depression and anxiety, as compared to non-autistic adults (Croen et al., 2015).
Much research and policy relevant to autistic youth has focused on individual-level factors, such as skills acquisition, to ensure a successful transition to adulthood. However, transition outcomes remain suboptimal for this population, and this is likely due to at least two structural-level factors. The first is the lack of post-school services designed for autistic youth that would help them initiate and maintain access to meaningful adult opportunities after school exit (Anderson & Butt, 2018). Both autistic youth and their caregivers perceive the abrupt shift from receiving intensive, individualized supports in preschool–12 contexts to navigating scarce, difficult-to-access, and one-size-fits-all post-school supports as a significant barrier to wellbeing in adulthood (Sosnowy et al., 2018).
The second structural issue, which is the focus of this review, is that there may be little high-quality research available to inform educators and other service providers regarding the types of support they should provide to transition-age autistic youth. Fewer funding dollars are spent, and fewer research reports are published, on interventions designed for transition-age autistic youth than on interventions designed for younger autistic children (Cervantes et al., 2021; Hume et al., 2021). This could be because researchers and funders consider early childhood to be a “critical period” when intervention services are most likely to be effective, despite a lack of evidence for this claim (Sandbank, Bottema-Beutel, & Woynaroski, 2021). In addition, previous syntheses that address research quality suggest that autism intervention research in general lacks rigor (Davis et al., 2019; Gates et al., 2017; Sandbank et al., 2020), and this could also be true for research on transition-age autistic youth. Poor-quality research undermines our ability to train school professionals and other providers to implement appropriate transition services, because it is difficult to discern which services are worth implementing. In this study, we investigate this issue by evaluating all available research on interventions designed for transition-age autistic youth, including both group design and single-case design (SCD) studies, to inform services that focus on the transition to adulthood.
Evaluating risks of bias
Several design elements are required for intervention studies to be internally valid; internal validity refers to the extent to which causal claims can be made regarding the relationship between the intervention and the outcome. In SCD logic, internal validity refers to the extent to which one can infer a functional relation between the independent variable (the intervention) and the dependent variable (the outcome). Threats to internal validity, or “risks of bias,” stem from features of the study design that cause estimates of effectiveness to deviate from the true effect (Boutron et al., 2022). Below, we describe risks of bias relevant to both group design and SCD studies.
Group design studies
The Cochrane Collaboration provides a risk of bias tool that has been widely used in research syntheses of group design intervention studies (Higgins et al., 2021). This tool divides risks of bias into five domains. Selection bias consists of two sub-domains: sequence generation is the adequacy of procedures to randomly assign participants to intervention and control groups, whereas allocation concealment is the adequacy of procedures for ensuring that participants and investigators are unable to foresee assignment to intervention and control groups prior to the moment of assignment. Performance bias is the extent to which the participants in the intervention, including users and their families, and the personnel conducting the intervention are aware of group allocation. This type of bias is difficult to minimize in autism intervention research. Unlike most drug trials, in which a placebo can be administered to the control group in a manner similar to the active treatment, most autism intervention trials use a business-as-usual control group. Therefore, the treatment group is the only group that interfaces with the study team to receive intervention, which reveals group membership to both the users and the research team. Detection bias refers to the extent to which outcome assessors, including examiners conducting standardized assessments, coders who score performance on assessments, and informants (such as parents) who complete questionnaires, remain unaware of group assignment. Reporting bias refers to the completeness of the outcome data; that is, the extent to which group-level data can be influenced by substantial numbers of participants withdrawing from the study, which threatens group equivalence. Selective reporting refers to whether all of the outcomes that were assessed as part of the study plan are included in published reports, or whether outcomes were strategically excluded because the intervention failed to show an effect on them.
A source of bias that is somewhat specific to autism research is correlated measurement error (i.e. measurement error that inflates scores in the treatment group relative to the control group) arising when an assessor receives some aspect of the intervention (Sandbank et al., 2020). This bias occurs when someone involved in the assessment procedures, usually a parent or teacher, has also received training from the study team. These assessors are of course subject to detection bias, but they may also have learned techniques for interacting with the participant as part of the intervention. Because of this, the participant’s scores on a given assessment may improve because the assessor has changed their behavior in a way that reveals capacities the participant had all along but that the assessor did not previously elicit. This change in the assessor occurs only in the treatment group and not the control group, resulting in an additional source of bias.
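The mechanism can be illustrated with a toy model. The sketch below is purely illustrative and assumes, hypothetically, that a trained assessor adds a fixed `elicitation_gain` to every score while the participants themselves do not change; the function name and all numbers are invented rather than drawn from any study.

```python
from statistics import mean


def observed_score(true_skill, assessor_trained, elicitation_gain=2.0):
    """Toy model (hypothetical): the score an assessor records for a participant.

    A trained assessor elicits `elicitation_gain` points of capacity the
    participant already had; an untrained assessor elicits none of it.
    """
    return true_skill + (elicitation_gain if assessor_trained else 0.0)


# Same true skills in both groups: the intervention itself changed nothing.
true_skills = [10.0, 12.0, 14.0, 16.0]

# Only treatment-group assessors (e.g. parents) received training.
treatment = [observed_score(s, assessor_trained=True) for s in true_skills]
control = [observed_score(s, assessor_trained=False) for s in true_skills]

# The apparent group difference comes entirely from the assessors.
apparent_effect = mean(treatment) - mean(control)
print(apparent_effect)  # 2.0
```

In this toy model, masking the assessor to group assignment would not remove the difference, because it stems from how the trained assessor interacts with the participant rather than from what the assessor knows.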
SCD studies
Recently, Reichow and colleagues (2018) developed a tool for evaluating risks of bias in SCDs. Although the tool is not yet in wide use, at least a few systematic reviews have used it to examine SCDs of interventions relevant to autism research (e.g. Biggs & Robison, 2022; Davis et al., 2019). This tool is meant to be an analog to the Cochrane guidelines, but with the domains of bias adjusted to be relevant to SCD procedures. For this tool, selection bias is again divided into two types: sequence generation, which refers to whether there is some element of randomization in moving participants between baseline and treatment conditions; and participant selection, which refers to whether the participants were determined to be eligible for the intervention based on a systematic assessment of need for the intervention. Performance bias refers to whether procedural fidelity (implementing the intervention according to planned procedures) was appropriately monitored and achieved. Detection bias is divided into four types. First, masked assessment is defined similarly to the same concept in group design studies, but reflects the fact that the vast majority of SCDs use observational measures that occur within the intervention context. Therefore, substantial effort is needed to ensure that assessors are unable to distinguish between baseline (or withdrawal) and intervention conditions. Second, selective outcome reporting refers to whether all participants who began the study provide outcome data for each phase. Third, dependent variable reliability refers to whether researchers report adequate inter-coder reliability for observational assessment procedures, which indicates that coding categories are adequately defined, that coders were adequately trained to implement the coding procedures, and that the coding procedures are replicable.
Fourth, data sampling refers to whether the amount of data, and the design by which data were collected, are sufficient to determine a functional relation between the intervention and the outcome. For most SCDs, this means at least three outcome data points per phase, and three opportunities to demonstrate an effect (i.e. shifts between the baseline and intervention conditions). An additional design aspect that is not included in Reichow and colleagues’ tool is nevertheless important for discerning internal validity: uncontrolled baselines (Kennedy, 2005). This refers to whether the baseline condition allows the participant to show evidence of the outcome variable. An example of an uncontrolled baseline would be a study in which the outcome variable is measured by tallying interactions with peers, but there are no peers in proximity to the participant in the baseline condition.
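The data-sampling criterion can be checked mechanically once each data series is written as a sequence of phase labels. The sketch below is a simplification, assuming that each change of phase within a series counts as one opportunity to demonstrate an effect; the function names and the "A"/"B" labels are hypothetical, and the check is not part of Reichow and colleagues’ published tool.

```python
from itertools import groupby


def phase_runs(series):
    """Collapse per-session phase labels into (phase, n_points) runs."""
    return [(phase, len(list(run))) for phase, run in groupby(series)]


def meets_data_sampling(tiers, min_points=3, min_effects=3):
    """Simplified data-sampling check (hypothetical helper, not the published tool).

    `tiers` is a list of series, one per tier; each series lists the phase label
    ("A" = baseline, "B" = intervention) of every session in order. A withdrawal
    (ABAB) design is one series; a multiple-baseline design has one per tier.
    """
    opportunities = 0
    for series in tiers:
        runs = phase_runs(series)
        if not runs:
            continue  # an empty tier contributes nothing
        # Every phase needs at least `min_points` data points.
        if any(n < min_points for _, n in runs):
            return False
        # Each phase change within a tier is one opportunity to show an effect.
        opportunities += len(runs) - 1
    return opportunities >= min_effects


abab = ["A"] * 3 + ["B"] * 3 + ["A"] * 3 + ["B"] * 3
print(meets_data_sampling([abab]))                   # True: 3 phase changes
print(meets_data_sampling([["A"] * 3 + ["B"] * 3]))  # False: only 1 change
```

Under this simplification, a multiple-baseline design with three AB tiers also passes (one change per tier). The sketch does not check whether tiers are actually staggered in time, and it does not handle alternating-treatments designs.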
Adverse event reporting
In addition to evaluating risks of bias to discern the internal validity of intervention studies, systematic review guidelines such as those established by the Cochrane Collaboration recommend reporting whether primary studies monitored adverse events, and describing the frequency and nature of these events (Preyer et al., 2019). This allows researchers, practitioners, and end users to weigh the potential positive effects of an intervention against its potential negative effects. Adverse event reporting involves researchers actively monitoring for unfavorable or harmful events that occur before, during, or after implementation of the intervention, regardless of whether it can be easily determined that the event was caused by the intervention (Preyer et al., 2019). However, at least in autism intervention research for young children, adverse event monitoring and reporting is so scarce as to be practically nonexistent. Bottema-Beutel and colleagues (2021a) examined 150 studies on interventions for young autistic children to determine whether adverse events were reported, and found that only 11 of these studies made any mention of adverse events, and none provided any insight into how adverse events were monitored. This suggests that researchers relied on passive monitoring, in which adverse events are noted only if the participant or caregiver mentions them to the researcher, rather than on systematic procedures to determine whether such events occur. Therefore, even in the very few studies that do report adverse events, they are likely undercounted.
Failing to consider adverse events in the context of autism intervention research is problematic for at least two reasons. First, practitioners and policymakers are left without sufficient information to weigh potential benefits against potential harms when recommending particular types of interventions. Second, ignoring the potential for adverse events conveys that harming autistic people is not of general concern to researchers or providers. That is, it perpetuates an ideology that being autistic is such a negative state of affairs in and of itself that autistic people are considered “unharmable” (Dawson & Fletcher-Watson, 2022). For these reasons, there have been calls to vastly improve adverse event monitoring and reporting in autism research (Bottema-Beutel et al., 2021a; Fletcher-Watson et al., 2021; Lord et al., 2022).
Evaluating outcomes according to the scope of change
Recently, autism researchers have begun to pay closer attention to the scope of change indexed by a given intervention outcome (Biggs & Robison, 2022; Carruthers et al., 2020; Sandbank et al., 2020; Yoder et al., 2013), in line with guidance for evaluating intervention outcomes more generally (e.g. What Works Clearinghouse, n.d.). Sandbank, Chow, Bottema-Beutel, and Woynaroski (2021) provide a tutorial for classifying intervention outcomes along two continua: the boundedness of the outcome to the intervention context, and the proximity of the outcome to the intervention targets. An outcome is context-bound if the measurement context is very similar to the intervention context; it is a generalized characteristic if the two contexts differ on multiple dimensions (e.g. the interaction partner, the materials used, and the setting). Outcomes that are generalized characteristics provide evidence that the outcome has been incorporated into the participant’s behavioral repertoire or developmental profile, while context-bound outcomes provide no evidence that what has been learned in the intervention will manifest outside the intervention context.
An outcome is proximal to the intervention if it indexes a construct directly targeted by the intervention procedures (e.g. measuring instances where the participant says “hello” during or after an intervention focused on offering greetings), and distal to the intervention if it measures a construct developmentally beyond or broader than what was directly targeted in the intervention (e.g. measuring language development after participation in an intervention facilitating peer interaction). Distal outcomes provide evidence that the intervention has tapped into a developmental pathway, suggesting that effects may influence the participant’s growth even after the intervention has stopped, while proximal outcomes provide no evidence that growth will continue after the intervention ends (Sandbank, Chow, Bottema-Beutel, & Woynaroski, 2021; Yoder et al., 2013).
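The two continua described above can be expressed as a simple decision rule. The sketch below is a simplification under stated assumptions: the `Outcome` class, the `classify` function, and the dimension names are hypothetical, and it treats "differ on multiple dimensions" as at least two differing dimensions, a threshold we assume rather than take from the tutorial. Real coding decisions involve more judgment than two yes/no inputs.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical record of one outcome variable."""
    name: str
    # Does the outcome index a construct the intervention procedures directly targeted?
    directly_targeted: bool
    # Dimensions (e.g. partner, materials, setting) on which the measurement
    # context differs from the intervention context.
    differing_dimensions: frozenset[str] = frozenset()


def classify(outcome: Outcome, min_differences: int = 2) -> tuple[str, str]:
    """Return (proximity, boundedness); min_differences=2 is an assumed threshold."""
    proximity = "proximal" if outcome.directly_targeted else "distal"
    if len(outcome.differing_dimensions) >= min_differences:
        boundedness = "generalized"
    else:
        boundedness = "context-bound"
    return proximity, boundedness


# The two examples from the text (measurement details assumed for illustration).
greeting = Outcome("says 'hello' in intervention sessions", directly_targeted=True)
language = Outcome(
    "standardized language assessment",
    directly_targeted=False,
    differing_dimensions=frozenset({"partner", "materials", "setting"}),
)
print(classify(greeting))  # ('proximal', 'context-bound')
print(classify(language))  # ('distal', 'generalized')
```

Keeping the two judgments as separate inputs mirrors the point that the continua are independent: an outcome can be proximal yet generalized, or distal yet context-bound.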
Despite the importance of distal and generalized outcomes, previous reviews of autism intervention research suggest that children are more likely to show improvement on proximal than on distal outcomes, and that effect sizes for proximal outcomes are generally larger than those for distal outcomes. Similarly, context-bound outcomes are more likely to show improvement than generalized outcomes, and effect sizes for context-bound outcomes are generally larger than those for generalized outcomes (Sandbank et al., 2020; Yoder et al., 2013). Promoting distal and generalized outcomes may be especially important for transition-age autistic youth, because these students are nearing school exit. That is, this is a particularly important age at which to ensure that intervention effects will outlast the provision of the intervention, given that most intervention services provided in schools will no longer be available after the transition out of school.
Study aims
At present, there are no systematic, comprehensive reviews that: (a) cover both single-case and group design studies conducted with autistic youth during the transition period (14–22 years), and (b) rigorously evaluate study and outcome quality. This information is important because it can clarify the strength of evidence for support practices that may be in wide use for transition-age autistic youth (including those currently designated as “evidence-based practices”), offer avenues for further research, and ultimately provide insight into structural reasons why outcomes for autistic adults are less than favorable after high school exit. Because we anticipated significant methodological limitations in the majority of the literature in this field, we did not plan to generate summary effect sizes for either group design studies or SCDs. Our reasoning is that summary effect sizes can invite readers to treat the estimates as evidence of effectiveness, even if, given quality concerns, we were to caution against such interpretations. Specifically, our aims were to:
Provide basic descriptions of the types of interventions that have been studied for this population;
Classify outcome variables in terms of their domain, their boundedness to the intervention context, and their proximity to the intervention targets;
Provide an evaluation of key design elements using the Cochrane Collaboration Risk of Bias tool (Higgins et al., 2021) for group design studies, and for SCDs using the risk of bias tool proposed by Reichow and colleagues (2018);
Determine the extent to which adverse events were monitored and reported.
Methods
The review protocol was pre-registered on PROSPERO (registration number: 231764), an open-access database for systematic reviews. After the initial registration, we added an exploratory aim: to determine the extent to which adverse events were monitored and reported. In addition to pre-registration, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Page et al., 2021), and published all screening procedures, coding manuals, and coding spreadsheets with all study- and outcome-level codes on the Open Science Framework (OSF; https://osf.io/qzdru/).
Eligibility criteria
Articles met inclusion criteria if they were published (a) in a peer-reviewed journal, (b) in English, and (c) between 1970 and November 2020. In addition, the study had to (d) include an experimental design (i.e. SCD or group design study that included a control group), (e) implement a non-pharmacological intervention with transition-age autistic youth aged 14–22 years, and (f) report outcomes for autistic participants separately. If outcomes were not reported separately, then 70% or more of participants had to have a diagnosis of autism, pervasive developmental disorder not otherwise specified, or Asperger’s Syndrome. Studies that included students enrolled in a university were excluded, as these students were already post-transition; our aim was to evaluate research that could inform the provision of transition services.
Search strategy
The search for peer-reviewed journal articles commenced in November 2020 and, in consultation with the university librarian, involved 11 databases: Academic Search Complete, CINAHL Plus, Sociological Abstracts, Social Work Abstracts, Education Source, ERIC, PsycARTICLES, SocINDEX with Full Text, Web of Science, PsycINFO, and Medline. Searches of the latter two databases were limited to abstracts because these databases cap the number of citations that can be exported, and our search results exceeded that cap. The full texts of articles in the remaining nine databases were searched.
Databases were searched using terms that reflected the population (autis* OR ASD), age (transition* OR adolescen* OR school age OR high school OR secondary), and the provision of an intervention (intervention OR therapy OR teach* OR treat* OR program OR practice* OR strateg*). These categories were combined using the Boolean operator “AND.” As indicated in the eligibility criteria, the search was limited to articles published in English between 1970 and November 2020. This search yielded 37,724 articles.
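The query structure can be made explicit by building it from the three term lists. The sketch below is illustrative only: exact syntax (phrase quoting, truncation, field limits) differs across database platforms, and quoting multi-word terms is our assumption rather than a documented step of this search.

```python
# Search terms as listed above; one list per concept.
POPULATION = ["autis*", "ASD"]
AGE = ["transition*", "adolescen*", "school age", "high school", "secondary"]
INTERVENTION = ["intervention", "therapy", "teach*", "treat*",
                "program", "practice*", "strateg*"]


def or_block(terms):
    """Join synonyms with OR; multi-word phrases are quoted (an assumption)."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


# Concepts are combined with AND, so a record must match every concept.
query = " AND ".join(or_block(group) for group in (POPULATION, AGE, INTERVENTION))
print(query)
```

This prints `(autis* OR ASD) AND (transition* OR adolescen* OR "school age" OR "high school" OR secondary) AND (intervention OR therapy OR teach* OR treat* OR program OR practice* OR strateg*)`. OR broadens the search within each concept, while AND narrows it across concepts.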
Selection process
All articles identified from the database search were exported to RefWorks, and duplicates were removed. Next, a multi-step screening process was followed. First, 34,394 articles were imported into Abstrackr, a web-based application designed to organize and screen articles for systematic reviews (Wallace et al., 2012). Second, titles and corresponding abstracts were reviewed by one of four authors (K.B.B., S.C.P., S.M., or Q.Y.), who excluded articles that were clearly irrelevant to the study. Third, two of three authors (S.C.P. & S.M.) independently coded the full-texts of 3193 articles to determine if they met inclusion criteria, with 97% agreement. This resulted in 193 articles that were retained for coding. See Figure 1 for the PRISMA diagram illustrating article selection.

PRISMA diagram.
Coding procedure
In a first pass, studies were coded for: (a) commonly reported participant characteristics (age, mental age, gender, and scores on measures of autism characteristics), (b) study design features (group vs SCD, and SCD type), (c) intervention characteristics (setting, interventionist), (d) a description of the intervention, (e) a description of the outcome variable, (f) effect size information for group design studies, and (g) study- and outcome-level risks of bias specific to group designs (Higgins et al., 2021) and SCDs (Reichow et al., 2018, with some adaptations to improve reliability, such as adding examples and definitions specific to the studies in our review). In a second pass, categories for coding domains (d) and (e) were developed inductively to capture the range of intervention types and outcome domains represented in the included studies. For intervention types, we developed categories that captured the primary theory of change where possible, and grouped interventions according to similar procedures in cases where the theory of change was not discernible (see Table 1 for definitions and examples). We divided outcome domains into two broad types: discrete outcomes that were conceptualized as distinct units, and non-discrete outcomes that were conceptualized as broader constructs measured by standardized instruments or rating scales (see Table 2). In a third and final pass, we coded each study for adverse event reporting. This final coding domain included adverse events, which are any instances of unfavorable outcomes that occurred during the intervention period but were not necessarily caused by the intervention; adverse effects, which are any instances of unfavorable outcomes that occurred during the intervention period and could be attributed to the intervention; and harms, which are unfavorable outcomes that continue to negatively impact participants over extended periods of time after the intervention is over.
In addition to coding specific mentions of adverse events, we also coded reasons for withdrawal to determine whether any could be interpreted as an adverse event or adverse effect.
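Under the definitions above, the three codes overlap (an adverse effect is also an adverse event), so they can be read as independent yes/no questions rather than mutually exclusive categories. The sketch below, with a hypothetical function name and inputs, encodes the definitions as stated; it assumes a coder can answer each question from the study report.

```python
def adverse_event_codes(during_intervention: bool,
                        attributable: bool,
                        persists_after: bool) -> set[str]:
    """Return every label from the definitions above that applies (hypothetical helper).

    during_intervention: the unfavorable outcome occurred during the intervention period
    attributable: it could be attributed to the intervention
    persists_after: it keeps negatively affecting the participant for an extended
        period after the intervention is over
    """
    codes = set()
    if during_intervention:
        codes.add("adverse event")       # causation not required
        if attributable:
            codes.add("adverse effect")  # occurred during AND attributable
    if persists_after:
        codes.add("harm")
    return codes


# e.g. behavior concerns that increase during intervention, with an unclear link
print(adverse_event_codes(True, False, False))  # {'adverse event'}
```

Returning a set rather than a single label keeps the overlap between the definitions visible instead of forcing a choice of the "most serious" category.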
Definitions and examples of primary intervention types.
Number of variables coded for each outcome domain.
IQ: intelligence quotient; IEP: individualized education program.
Reliability
Each included article was independently coded by two coders (two of K.B.-B., S.C., and S.Y.K. for the first two coding passes; Q.Y. and R.M. for adverse event coding). During the first round of coding, agreement ranged from 52% to 100% across variables. Five variables did not reach the 80% agreement threshold (interventionist type, detection bias, reporting bias, correlated measurement error, and participant selection), so they were re-coded after the coding manual was adjusted to improve reliability. For these follow-up rounds, 20% of variables were double-coded to calculate agreement. This process continued until 80% agreement was reached for each variable, which required three rounds of re-coding. After the final round of coding, agreement ranged from 81% to 100%. All disagreements between coders were resolved through discrepancy discussions.
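Agreement in this section is reported as a percentage. A minimal sketch of that calculation, and of flagging variables that fall below the 80% threshold, is shown below; the function names, variable names, and codes are hypothetical and do not come from the review’s coding spreadsheets. Percent agreement is not corrected for chance agreement, unlike statistics such as Cohen’s kappa.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of units that two coders coded identically."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("coders must code the same, non-empty set of units")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def variables_below_threshold(coder_a, coder_b, threshold=80.0):
    """Variables whose agreement falls below `threshold` and need re-coding."""
    return sorted(
        var for var in coder_a
        if percent_agreement(coder_a[var], coder_b[var]) < threshold
    )


# Hypothetical codes for five studies (not data from this review).
coder_a = {"setting": ["school"] * 5,
           "detection_bias": ["high", "low", "high", "unclear", "low"]}
coder_b = {"setting": ["school"] * 5,
           "detection_bias": ["high", "high", "high", "low", "low"]}

to_recode = variables_below_threshold(coder_a, coder_b)
print(to_recode)  # ['detection_bias'] (3 of 5 codes match, i.e. 60%)
```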
Community involvement statement
This study was not co-produced with autistic people, but we do believe that it reflects autistic community priorities for autism research (Pellicano et al., 2014).
Results
Basic study and participant descriptors
Of the 193 studies that met inclusion criteria (see https://osf.io/qzdru/ for a list of included studies), 181 were SCDs (94%) and 12 were group design studies (6%). Of the group design studies, 11 were randomized controlled trials, and 1 was a quasi-experimental design. See Figure 2 for the number of studies published by year. The first study was published in 1984, and more than half of the studies were published after 2010. A total of 987 youth participated across the studies; the majority of participants were male, and the mean chronological age was 17 years. The mean intelligence quotient (IQ) of 58 (reported in only 38% of studies) suggests that at least some participants had co-occurring intellectual disability. See Table 3 for means and ranges of the most commonly reported demographic characteristics.

Number of studies included in the review by publication year.
Characteristics of participants in included studies.
M: mean; SD: standard deviation; ABC: Autism Behavior Checklist; ADOS: Autism Diagnostic Observation Schedule; ASDS: Asperger Syndrome Diagnostic Scale; ASRS: Autism Spectrum Rating Scale; CARS: Childhood Autism Rating Scale; GADS: Gilliam Asperger’s Disorder Scale; GARS: Gilliam Autism Rating Scale; SCQ: Social Communication Questionnaire; SRS: Social Responsiveness Scale.
Interventions
Frequencies of intervention types, settings, and interventionists are provided in Table 4. Behaviorally based interventions were the most common, accounting for approximately 70% of included studies. Each of the other intervention types accounted for fewer than 10% of studies. Most interventions were examined in schools (63%), and the most frequent type of interventionist was some combination of researcher, educator, other professional, or computer instruction (45%).
Intervention characteristics.
In one study, authors compared two different intervention types; therefore, frequencies add to 194 total interventions.
Outcomes
Of the 1258 outcomes we extracted across the included studies, 1198 (95%) were coded as proximal to the intervention, and 60 (5%) were coded as distal to the intervention. A total of 925 (74%) were coded as context-bound and 333 (26%) were coded as generalized characteristics. A total of 176 (14%) were measured as non-discrete outcomes and 1082 (86%) were measured as discrete behaviors. See Table 2 for the number of outcomes by domain.
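Each classification above partitions all 1258 outcomes into two categories, so the reported percentages can be recomputed directly from the counts. The short check below uses only counts reported in this paragraph; the dictionary layout and the function name are our own.

```python
TOTAL_OUTCOMES = 1258

# Counts reported above, grouped by classification dimension.
splits = {
    "proximity": {"proximal": 1198, "distal": 60},
    "boundedness": {"context-bound": 925, "generalized": 333},
    "measurement": {"discrete": 1082, "non-discrete": 176},
}


def rounded_percentages(split, total=TOTAL_OUTCOMES):
    """Percent of all outcomes in each category, rounded to whole numbers."""
    if sum(split.values()) != total:
        raise ValueError("categories must partition all outcomes")
    return {label: round(100 * n / total) for label, n in split.items()}


percentages = {dim: rounded_percentages(split) for dim, split in splits.items()}
print(percentages["proximity"])  # {'proximal': 95, 'distal': 5}
```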
Risks of bias
We report quality indicators separately for SCDs and group designs, and separately for indicators that reflect overall study design versus outcome-level design, because the proportions for these two groups of biases have different denominators (number of studies vs number of outcomes, respectively). See Figures 3 and 4 for aggregate ratings on the Cochrane risk of bias tool and the Reichow et al. (2018) tool, respectively.

Risks of bias for group design studies.

Risks of bias for single-case design studies.
Adverse events
Only 2 of the 193 studies mentioned adverse events (1%). Both studies reported that no adverse events had occurred. Only one of the studies provided details about how adverse events were monitored, and it was via a single rating item on a social validity questionnaire provided to the teacher immediately following the intervention (Cihak & Schrader, 2008). Harms were not mentioned in any study, and do not appear to have been monitored. Twelve studies reported reasons for withdrawal from the intervention; four reported reasons that were too vague to determine whether an adverse event had occurred (e.g. the participant stopped coming to intervention sessions), four reported reasons that included at least one adverse event but it was unclear if these events were related to the intervention (e.g. increasing behavior concerns), and four studies reported reasons that were not adverse events (e.g. the participant moved to a different town).
Discussion
Types of interventions
The prevalence of behaviorally based studies in this literature is consistent with previous reviews that include transition-age autistic youth (Hume et al., 2021). While developmentally informed intervention approaches (i.e. approaches that account for developmental level and are designed to support the achievement of developmental milestones along a specific trajectory) make up a significant portion of the research with younger autistic children (Schreibman et al., 2015) and have a stronger evidence base than other types of intervention (Sandbank et al., 2020), we did not find any examples of this approach in research on transition-age autistic youth. Some intervention categories reflect a focus on influencing developmentally relevant domains (e.g. cognitive and social cognitive interventions) and leverage natural interactions in a way similar to many developmental approaches (e.g. facilitating peer interaction), but developmental sequencing was not emphasized as a component.
The school-based nature of this research is helpful for discerning which approaches are feasible for the learning contexts readily accessible for most students in this age group. However, relatively few interventions were implemented by educators alone; most were implemented by a combination of adults (e.g. researchers and educators) or researchers alone. This means that, for many interventions, it is unclear how feasible implementation is when researchers are not present.
Risks of bias
There were some risk of bias categories where “low” risks of bias were relatively common. For example, in SCDs, more than 75% of studies reported adequate fidelity, reliability of outcome variables, and complete outcome data. In group design studies, selective reporting was also quite rare, and was rated as low for more than 80% of studies (we note, however, that this form of bias is difficult to detect in group design studies without pre-registration).
However, the prevalence of high risks of bias in many categories is concerning. Among group design studies, few authors provided details about randomization procedures (sequence generation), or about how researchers were kept from discerning which groups participants would be assigned to until the moment of randomization (allocation concealment). This means that participants could have been strategically allocated to a particular group, or could have been given information regarding group allocation that influenced whether they granted consent. Among SCDs, researchers very infrequently included a randomization component for determining how participants moved between phases. To illustrate why this could be problematic, consider a multiple baseline across participants design. According to the logic of this design, to establish a functional relation between the intervention and the outcome, participants should start the intervention phase at staggered time points. Without randomization, the researcher could determine the order in which participants begin the intervention phase, or the onset of the phase shift, by selecting the participant who appears to be most amenable to participating in an intervention on the day phase shifts are set to start, or by selecting a start date based on participant mood. This could happen in any number of phase-shift decisions; that is, in the absence of a random component, phase shifts could be timed to coincide with moments when participants are most likely to show immediate improvement for reasons unrelated to the intervention. Although randomization is clearly not yet in wide practice, there are calls for SCD researchers to use randomization components to decrease this type of bias (Ledford & Gast, 2018).
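One way to add such a component, sketched below under stated assumptions, is to draw both the order in which participants enter intervention and their start sessions at random before data collection begins, using a recorded seed so the schedule can be audited. The function name, the minimum of three baseline sessions, the three-session stagger, and the 0 to 2 session jitter are hypothetical choices for illustration, not procedures from Ledford and Gast (2018) or from the reviewed studies.

```python
import random


def randomize_multiple_baseline(participants, earliest_start=4, stagger=3, seed=None):
    """Hypothetical randomization scheme for a multiple baseline across participants.

    - The order in which participants enter intervention is a random permutation.
    - The first tier starts no earlier than session `earliest_start`, leaving at
      least three baseline sessions (1 to 3) under the defaults.
    - Each later tier starts at least `stagger` sessions after the previous one,
      plus a random jitter of 0 to 2 sessions.
    Returns {participant: first intervention session}.
    """
    rng = random.Random(seed)  # a recorded seed makes the schedule reproducible
    order = rng.sample(participants, k=len(participants))
    schedule = {}
    start = earliest_start
    for participant in order:
        start += rng.randint(0, 2)  # inclusive on both ends
        schedule[participant] = start
        start += stagger
    return schedule


schedule = randomize_multiple_baseline(["P1", "P2", "P3"], seed=7)
print(schedule)
```

Because the schedule is fixed before any data are seen, neither the order of entry nor the timing of phase shifts can be chosen in response to a participant’s mood or apparent readiness on a given day.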
Very few studies, across both group designs and SCDs, used masked assessors. Unmasked assessors increase bias for multiple reasons. First, placebo effects among unmasked assessors are well documented. For example, Jones and colleagues (2017) showed that caregivers who believed their autistic child was participating in an intervention rated their children higher at post-assessment than caregivers who were not told their child was receiving an intervention, even when no intervention was provided. Second, in many intervention studies in which assessments are completed by the research team, the researchers (a) also developed the intervention being studied, (b) serve dual roles as clinical providers of similar interventions, and/or (c) are students conducting a study needed for program completion (Bottema-Beutel & Crowley, 2021; Bottema-Beutel et al., 2021b; Reichow et al., 2018). These conflicts of interest create an incentive to score participants higher in the treatment group or intervention phase than in the control group or baseline phase.
There were also a few serious risks of bias specific to SCDs. More than half of the outcomes measured via SCDs had too few data points per phase (i.e. < 3) and/or too few opportunities to demonstrate an effect (i.e. < 3) to infer a functional relation. In addition, in nearly one-quarter of the SCDs, the baseline phase was constructed in such a way that participants had insufficient opportunity to show evidence of the outcome (i.e. uncontrolled baselines). A recurring example of this design issue occurred in studies of task-analysis interventions with prompt hierarchies and reinforcement, in which the outcome was the percentage of task-analysis steps completed by the participant. In many such studies, participants were not told which steps needed to be completed during the baseline phase, but were given these steps during the treatment phase. It is difficult to discern what can be learned from such designs; it is unsurprising that people complete more steps of a task when they are told what the steps are than when they are not.
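The two numeric thresholds above can be expressed as a simple screening check. The sketch below is hypothetical and is not the coding procedure used in this review: the names `Phase`, `count_opportunities`, and `check_scd_outcome` are illustrative, and it assumes, as a simplification, that opportunities to demonstrate an effect can be counted as condition changes within each tier (so an ABAB reversal yields three, and a three-tier multiple baseline with AB tiers yields three). It does not handle alternating treatments or changing criterion designs, and it cannot detect uncontrolled baselines, which require judgment about how the baseline condition was constructed.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    condition: str  # e.g. "A" for baseline, "B" for intervention
    n_points: int   # number of data points collected in this phase


def count_opportunities(tiers):
    """Simplified count of opportunities to demonstrate an effect:
    the number of condition changes within each tier, summed across tiers.
    One ABAB tier -> 3; three AB tiers (multiple baseline) -> 3."""
    total = 0
    for tier in tiers:
        for prev, nxt in zip(tier, tier[1:]):
            if prev.condition != nxt.condition:
                total += 1
    return total


def check_scd_outcome(tiers, min_points=3, min_opportunities=3):
    """Return (passes, problems) for one SCD outcome, flagging phases with
    fewer than `min_points` data points and designs with fewer than
    `min_opportunities` opportunities to demonstrate an effect."""
    problems = []
    for i, tier in enumerate(tiers, start=1):
        for j, phase in enumerate(tier, start=1):
            if phase.n_points < min_points:
                problems.append(
                    f"tier {i}, phase {j} ({phase.condition}): "
                    f"only {phase.n_points} data points")
    opportunities = count_opportunities(tiers)
    if opportunities < min_opportunities:
        problems.append(
            f"only {opportunities} opportunities to demonstrate an effect")
    return len(problems) == 0, problems


if __name__ == "__main__":
    # A single AB tier: enough data points per phase, but only one
    # opportunity to demonstrate an effect, so the check fails.
    ok, problems = check_scd_outcome([[Phase("A", 5), Phase("B", 8)]])
    print(ok, problems)
```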
Adverse event reporting
Similar to autism intervention research for younger children (Bottema-Beutel et al., 2021a), adverse event reporting was rare in this literature. Only 1% of reports mentioned adverse events, and only a single study described how adverse events were measured: a single item on a teacher questionnaire administered at the end of the study, with no input from parents or participants. Despite autism intervention researchers’ seeming indifference to the occurrence of adverse events, there is a distinct possibility that intervention procedures could harm the transition-age autistic youth who participate in them. Several of the interventions involved heavy equipment or appliances (e.g. weight lifting for exercise, kitchen stoves for cooking), promoted activities with high potential for sensory overload (e.g. wearing mascot costumes with full head and body gear), and targeted behaviors with significant health impacts (e.g. food refusal). Many also involved strategies that autistic people who have experienced them have identified as harmful, such as the suppression of “stimming” via response cost or planned ignoring (Kim & Bottema-Beutel, 2019), and that have been hypothesized to cause long-term harm, such as an over-focus on compliance (Dawson & Fletcher-Watson, 2022; Sandoval-Norton & Shkedy, 2019).
Outcomes
The outcomes that were examined reflect the behavioral orientation of much of this literature, with the majority conceptualized as discrete behaviors. Behaviorist theory frames behavior as composed of discrete entities characterized by stimulus–response relationships (Baer et al., 1968; Watson & Kimble, 1998). This framing is a good match for SCD research logic, which calls for outcomes to be measured repeatedly at relatively short intervals, and often requires operational definitions that divide behavior into individuated, talliable units (Ledford & Gast, 2018). In addition, SCD logic requires that outcomes show improvement relatively quickly after the intervention begins. Because of these features, SCDs are less suited than group designs to examining broader, developmental changes that are distal to the intervention procedures. Distal outcomes can take time to emerge and are often measured with standardized assessments that have been construct validated and cannot be re-administered multiple times in close succession. Given the prevalence of SCDs and behavioral approaches, outcomes in the included studies were almost exclusively proximal to the intervention. A drawback of discrete, proximal outcomes is that, compared with broad, distal outcomes, they may have more limited relevance to the variety of post-school opportunities transition-age autistic youth may wish to pursue. There is no evidence that achieving exclusively proximal outcomes leads to continued growth with broad applicability (Sandbank, Chow, Bottema-Beutel, & Woynaroski, 2021; Yoder et al., 2013).
In addition, in the vast majority of SCDs, the outcome is measured in the intervention context while the intervention is taking place. As such, most outcome measures were context-bound. Generalization probes were sometimes conducted in other contexts, but these probes do not allow researchers to infer a functional relation between the outcome and the intervention, because baseline measures of the outcome in the generalization context are very rarely collected. Indeed, there were no examples of such baseline measures in the included studies. Therefore, most outcomes that were coded as generalized did not permit inferring a functional relation between the outcome and the intervention. This means there is little evidence that what autistic youth learn in intervention programs will be available to them once the program is over. This is especially concerning given that the intensive supports needed to maintain context-bound outcomes (e.g. prompting and reinforcement, stimuli specific to the context) are unlikely to be available after school exit.
In terms of outcome domain, employment and functional skills were relatively common, indicating that researchers conceptualize the transition period as a shift from exclusively academic and social goals toward skills that at least fall within domains related to post-school contexts. However, researchers paid little attention to outcomes reflecting autonomy and choice-making; self-determination variables accounted for less than 1% of outcomes and were examined in only a single study. This is surprising, given that self-determination is considered an important component of wellbeing in autistic people (Kim, 2019).
Implications
Researchers who focus on transition-age autistic youth need to design and implement high-quality intervention studies that are free from serious risks of bias and that target outcomes associated with lasting, meaningful change relevant to adulthood. We note that researchers have been calling for such studies for many years (Westbrook et al., 2015), yet few examples have materialized. This could be because quality evaluations have been a relatively recent feature of research syntheses, especially for studies on transition-age youth. Research on transition-age autistic youth continues to lag behind research on younger children in both volume and quality. For example, in their meta-analysis of group design interventions for autistic children between the ages of 2 and 8, Sandbank and colleagues (2020) reported a shift from quasi-experimental to randomized designs. In addition, although the literature on younger children is not free of research quality issues, it includes at least some exemplary studies that meet all or most quality standards (e.g. Green et al., 2010). In this review, we were unable to find any studies that combined masked assessors, assessors who were not participants in the intervention, and adequate randomization procedures.
Many of the persistent quality issues do not appear to be related to cost. It is difficult to imagine that designing bias-free procedures for allocating participants to groups in studies that already use randomization, or ensuring that assessors are masked to treatment status, would add insurmountable cost. For SCDs, randomization procedures could be added without significant cost or overly burdensome planning. However, masking assessors to group assignment or phase may require innovation in both group design studies and SCDs. New measures that do not rely on parent or self-report may need to be developed, or studies will need to be designed with active treatment control groups so that, even if informants know that they or their child are receiving an intervention, they are masked as to which intervention is hypothesized to be superior. Masking assessors in SCDs will require significant planning and innovation so that assessors cannot distinguish between baseline and intervention phases (see Reichow et al., 2010, for an example). Furthermore, researchers who designed or provided the intervention being studied (or who have other conflicts of interest) should not be involved in data collection or analysis.
Journal editors and funders may have significant roles to play in bringing about these changes; declining to fund or publish studies that have significant risks of bias, that report on outcome variables of unclear utility, or that fail to monitor and report adverse events may disincentivize researchers from producing them. In the meantime, recommendations about interventions for transition-age autistic youth should be made with appropriate caution. The serious risks of bias in this literature, and the near-total lack of information about adverse events, should be clearly communicated to both practitioners and families.
Conclusion
In this evaluation, we identify a structural issue with significant implications for autistic youth transitioning from high school to adulthood: low-quality intervention research. Unsatisfactory outcomes for recently graduated autistic adults could be, at least in part, because the services provided to support their transition to adulthood simply are not efficacious, and not because their disability status inherently leads to poor outcomes. Our findings may also explain the negative perceptions of services for transition-age autistic youth held by both autistic youth and parents. Autistic youth who recently graduated high school describe a lack of appropriate supports that meet their specific needs (Bottema-Beutel et al., 2020), and parents report optimism about school services when their children first enter the transition period, but find over time that school-based transition services do not meet their expectations (Kirby et al., 2020). For intervention research that uses the experimental methods we examine here, significant improvements are needed. These include reducing serious risks of bias, adequately assessing adverse events, and focusing on meaningful outcomes. These improvements are essential for ensuring that professionals provide the support transition-age autistic students deserve.
Footnotes
Acknowledgements
We would like to thank Philip Postek for his assistance with coding.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Kristen Bottema-Beutel has previously received fees for consulting with school districts on intervention practices for autistic children and teaches courses on autism interventions in her role as an Associate Professor of Special Education. She has also accepted speaker fees in the amount of US$750 to discuss her work on research quality, adverse events, and researcher conflicts of interest as they pertain to autism intervention research. She also receives royalties for a co-edited book titled Clinical Guide to Early Interventions for Children with Autism, published by Springer. At the time of publication, the total amount of royalties received for this work was US$261. Shannon Crowley LaPoint was formerly affiliated with an entity that trained students to become Board Certified Behavior Analysts and provided Early Intensive Behavioral Intervention. Rachael Mckinnon was previously employed as a special educator in a school that provided transition supports to autistic students.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was partially supported by a grant from the Michael and Susan Argyelan Foundation administered by the Lynch School of Education and Human Development at Boston College.
