Abstract
The number of trials aimed at evaluating treatments for autism spectrum disorder has been increasing progressively. However, it is not clear which outcome measures should be used to assess their efficacy, especially for treatments which target core symptoms. The present review aimed to provide a comprehensive overview regarding the outcome measures used in clinical trials for people with autism spectrum disorder. We systematically searched the Web of KnowledgeSM database between 1980 and 2016 to identify published controlled trials investigating the efficacy of interventions in autism spectrum disorder. We included 406 trials in the final database, from which a total of 327 outcome measures were identified. Only seven scales were used in more than 5% of the studies, among which only three measured core symptoms (Autism Diagnostic Observation Schedule, Childhood Autism Rating Scale, and Social Responsiveness Scale). Of note, 69% of the tools were used in the literature only once. Our systematic review has shown that the evaluation of efficacy in intervention trials for autism spectrum disorder relies on heterogeneous and often non-specific tools for this condition. The fragmentation of tools may significantly hamper the comparisons between studies and thus the discovery of effective treatments for autism spectrum disorder. Greater consensus regarding the choice of these measures should be reached.
Introduction
Autism spectrum disorder (ASD) comprises a group of lifelong neurodevelopmental conditions that, according to recent estimates, affect 1.5% of the population in developed countries (Baxter et al., 2015). Since its first descriptions, ASD prevalence has been increasing (Xu, Strathearn, Liu, & Bao, 2018) and one explanation has been the development and use in diagnostic practice of common and validated diagnostic tools shared by researchers and clinicians worldwide (Hansen, Schendel, & Parner, 2015). Nevertheless, the focus of research on accurate and early diagnosis has not been accompanied by corresponding progress in the development and validation of standardized outcome assessment tools (Bolte & Diehl, 2013). This discrepancy persists, despite the usage of assessment tools being an important topic of debate for many years (Burgess & Gutstein, 2007; Campbell, Kafantaris, Malone, Kowalik, & Locascio, 1991; Kasari, 2002; Lord et al., 2005; Susan, Angela, Caitlin, Brenna, & Saray, 2014).
The growing range of interventions for ASD has resulted in the concomitant development and proliferation of a scattered variety of instruments to assess changes in symptoms or abilities after intervention trials. This fragmentation complicates key processes, such as comparing results between ASD intervention studies (Magiati, Moss, Yates, Charman, & Howlin, 2011) and drawing conclusions regarding the effectiveness of different treatments in this population (Pijl et al., 2018). In mental health research, the primary outcome measure frequently evaluates changes in the severity of core symptoms of the condition. However, there is no consensus about specific measures for monitoring core changes in ASD (Brugha, Doos, Tempier, Einfeld, & Howlin, 2015) or for measuring response to interventions (Magiati et al., 2011). Also, given the heterogeneity of ASD and the frequent co-occurrence of associated symptoms, the correct definition of the target population is crucial. Furthermore, it is important to evaluate objective measures of treatment effectiveness that go beyond core symptoms, such as changes in problem behaviours, adaptive skills, quality of life, global functioning and psychiatric comorbidities (Matson et al., 2013). At present, the outcome measures commonly used within this field of research are often unrelated to the main focus of treatment, are non-ASD specific, or rely on measures developed for individuals without intellectual or developmental disabilities (Brugha et al., 2015).
The need of the scientific community to address this lack of consensus led to the establishment, in 2011, of a panel focusing on the choice of outcome tools in ASD clinical trials (Anagnostou, 2018), which subsequently produced recommendations regarding measures for social communication (Anagnostou et al., 2015), repetitive behaviours (Scahill et al., 2015) and anxiety (Lecavalier et al., 2014). Nonetheless, the authors reported several limitations for all of the proposed instruments (Anagnostou, 2018). Analogously, the assessment of quality of life has been progressively gaining attention in this field, but a pertinent measure, specifically developed for people with ASD, has only been validated recently (McConachie et al., 2018).
In their review, McConachie et al. (2015) found 131 different instruments employed as measures of effect in young children with autism, concluding that this large set of tools is used inconsistently, with differing relevance and variable or no evidence of psychometric properties. Another study, reviewing intervention trials from 2001 to 2010 (Bolte & Diehl, 2013), identified 289 unique measurement tools: 61.6% of them were used only once and 20.8% were designed ad hoc by the investigators. Brugha et al. (2015) conducted a systematic review of 30 studies, focusing on assessment of core symptoms in interventions aimed at adults with ASD, and showed that the evaluation of outcomes frequently involved the use of non-standardized tools, while scales designed specifically for ASD or instruments focusing on core ASD deficits were used very little. However, even if existing reviews have paid specific attention to core symptomatology (Bolte & Diehl, 2013; Brugha et al., 2015; McConachie et al., 2015), the publication period of the included studies was generally limited. In addition, earlier reviews have not considered ASD across the lifespan, from childhood and adolescence through adulthood, focusing instead on more restricted age groups.
Therefore, our goal was to conduct a systematic, exhaustive and up-to-date review of the tools used to measure outcomes after intervention trials for people with ASD, with a specific focus on core symptoms, that will aid the selection of appropriate instruments in future trials. In order to circumvent the methodological limitations of previous reviews, no limitations to the age of the sample were set, different interventions targeting the same outcome were included, and the search was extended to 1980, corresponding to the publication of the Diagnostic and Statistical Manual of Mental Disorders (3rd ed.; DSM-III), when autism was introduced as a distinct diagnostic category. Our primary objective was to better classify the outcome tools used since 1980 for quantifying changes in a wide array of ASD-related domains, especially core symptoms, in two key areas: (a) appropriateness of the tool to the targeted domain and (b) frequency of use of the tool. Second, we intended to investigate how the use of instruments changed over time and which scales were most used according to the characteristics of the study (such as its design and the length of follow-up).
Methods
Selection procedures and data collection
Search strategy
We performed a systematic search to identify original studies investigating the clinical efficacy of controlled intervention trials in people with ASD. Four investigators (L.F.-P., U.P., M.V., and S.D.) conducted a two-step literature search. As a first step, the Web of KnowledgeSM database (including Web of Science, MEDLINE®, KCI – Korean Journal Database, Russian Science Citation Index and SciELO Citation Index) was searched. The search was run in January 2017 and extended from 1980 (corresponding to the publication of DSM-III, where autism was introduced as a distinct diagnostic category) to December 2016, limited to English language and human studies only. The electronic search adopted the following terms: ((autis* OR (developm* AND disorder) OR Asperger OR Kanner OR ASD OR PDD)) AND ((RCT OR trial OR observational OR open label OR prospective OR longitudinal OR randomized OR cohort)) (see Supplementary Material for details). Records were imported into reference management software and de-duplicated, and titles and abstracts were screened by two investigators. Studies meeting inclusion criteria were then assessed further for eligibility through full-text reading by two authors, and any doubt was resolved through consensus. The second step involved a manual search of the reference lists of the retrieved articles and relevant reviews on the topic to identify additional studies not identified in the electronic search. To achieve a high standard of reporting, we followed the PRISMA guidelines and completed the PRISMA checklist (Liberati et al., 2009).
Selection criteria
Studies meeting the following criteria were included: (a) original articles, published in peer-reviewed scientific journals, written in English; (b) studies including participants with a pervasive developmental disorder (PDD) or ASD diagnosis; (c) randomized controlled trials (RCTs) or observational longitudinal studies comparing at least two different interventions directed to people with ASD or one treatment versus placebo; (d) studies reporting a clinical outcome.
The following studies were excluded: (a) reviews, meta-analyses, case reports, congress abstracts, and articles in languages other than English; (b) studies with retrospective observational design or longitudinal observational design without a comparison group; (c) studies investigating the effect of indirect treatments (e.g. interventions directed towards parents or teachers); and (d) studies failing to report at least one behavioural outcome measure (e.g. biomarkers and imaging were not considered). We excluded indirect interventions in order to examine only the outcome measures directly related to symptoms and functioning of people on the autism spectrum. Given these objectives, we did not exclude papers with overlapping data sets, to retain the largest number of outcome assessment tools.
Data extraction
Study selection and data extraction were performed independently by two investigators. Cases of inconsistency and disagreement were double-checked and discussed with a third author (M.R.). The following variables were recorded on a standardized pro-forma: author, year of publication, study location, study design (randomized or non-randomized trial), length of follow-up, type of intervention (pharmacological, psychotherapy, educational, nutraceutical and other), sample size, sample age, gender (female proportion), psychiatric comorbidities other than intellectual disability (excluded, acknowledged and unclear), diagnosis, intelligence quotient (IQ) assessment, aim/target of the study (core symptoms, problem behaviours, adaptive functioning, cognitive functioning and medical and psychiatric comorbidities), assessment tools, and whether or not they were validated. We evaluated the matching between the primary aim of the study and the primary outcome measure. The aim was coded ‘unclear’ when the article did not clearly report a specific target symptom or domain for the intervention, or when the aims were too numerous or vaguely defined.
An outcome assessment tool (e.g. ‘coding of videos of adult-child interaction’) was considered ad hoc if: (a) a specific methodology that could be clearly identifiable in previous publications was not reported; (b) a modified or adapted version of a standardized assessment tool was used; or (c) the tool was defined in the original study as an ad hoc or investigator-designed tool. A detailed description of the data extraction is reported in the Supplementary Material.
Appraisal of methodological quality
Assessment of study quality was performed independently by two investigators using the Cochrane Risk of Bias Tool (Higgins et al., 2011). Discrepancies were resolved after consultation with a third reviewer (M.R.). We considered the following domains: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting and other sources of bias. According to Cochrane’s tool, each domain could be judged to be at ‘low risk’, ‘unclear risk’ or ‘high risk of bias’ (Higgins et al., 2011).
Statistical analysis
Trends in outcome assessment strategy were analysed descriptively, to identify the most used instruments (>5% of the studies). This was performed both on the overall database and according to the explicit aim of each single study. Furthermore, we performed exploratory analyses by correlating the most frequently used outcome assessment tools with several clinical and methodological variables (use over time, for specific aims/targets, clinical populations or methodological designs). Considering the exploratory nature of these analyses, after checking for assumption violations, we adopted Fisher’s exact testing and non-parametric Spearman correlation without correcting for multiple comparisons. Specifically, for each outcome assessment measure used in more than 5% of the studies, we correlated the use of the tool with methodological and clinical variables, analysing the studies published after the publication of the assessment tool. We correlated the publication year with the proportion of studies published in the same year adopting a specific outcome measure. Using Fisher’s exact test, we tested the association between the use of the most frequently adopted outcome measures and the length of follow-up (divided as ⩽12 weeks and >12 weeks), and study design. Analyses were performed using SPSS version 24.
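As an illustration of the association test described above, the following minimal sketch computes a two-sided Fisher's exact p-value for a 2 × 2 table in plain Python. It is not the SPSS procedure used in this review: the function name `fisher_exact_two_sided` and the example counts are hypothetical, and the two-sided rule (summing all tables no more probable than the observed one) is an assumption about how the test was configured.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be, e.g., follow-up <=12 weeks vs >12 weeks, and columns
    'tool used' vs 'tool not used' (an illustrative layout only).
    """
    row1, row2, col1 = a + b, c + d, a + c
    # Number of ways to fill the first column with both margins fixed.
    total_tables = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalised hypergeometric weight of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Integer arithmetic keeps the "as or less likely" comparison exact.
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / total_tables


# Hypothetical counts, chosen only to exercise the function.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided p = {p_value:.4f}")
```

Because each tool was tested separately and no correction for multiple comparisons was applied, p-values obtained this way are best read as exploratory.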
Results
The search yielded 100,635 items, from which 3486 full-text publications were selected for detailed screening, and an additional 77 studies were retrieved through manual search. In total, 406 trials from 402 publications were included in the final database. The selection process is reported in the PRISMA flow diagram appended as Supplementary Material.
We included 354 (87.2%) RCTs and 52 (12.8%) non-randomized trials. The active intervention was educational in 137 (34%) studies, pharmacological in 132 (33%) studies, nutraceutical in 50 (12%) studies and psychotherapy in 30 (7%) studies. Other forms of interventions were adopted in 57 (14%) studies. Acknowledging potential overlapping data sets, our database included 17,240 participants, of which 11,246 were assigned to the active treatment. Sample sizes ranged from 4 to 308 individuals. On average, each study included 17.7% female participants (range 0%–51%, unclear in 30 studies). In total, 315 (77.6%) studies recruited only children and 19 (4.7%) only adults, while 39 (9.6%) studies included both children and adults (unclear in 33 studies; 8.1%). Psychiatric comorbidities (excluding intellectual disability (ID)) were excluded in 56 (13.8%) studies, acknowledged in 52 (12.8%) studies and unclear in 298 (73.4%) studies. IQ characteristics of the sample were not specified in 227 (55.9%) studies, samples with ID only were recruited in 25 (6.2%) studies, while 81 (20.0%) studies included only individuals without ID. In 73 (18.0%) studies, the recruited population was mixed, including both ASD people with and without ID. Follow-up duration varied from a single administration (1 day) to 208 weeks (mean follow-up of 17.4 weeks). Most of the included studies were conducted in the United States (221; 54% of the studies). A complete list of the studies included can be found in Table S1 in Supplementary Material.
Quality of the included studies
A summary of the risk of bias of RCT studies is presented in Figure 1. Only 11 (3%) studies were rated as good quality, 107 (30%) scored as fair and 235 (66%) had poor quality of reporting according to the methodology reported by Penson, Krishnaswami, Jules, and McPheeters (2013). The complete list of RCT risk of bias evaluation can be found in Table S3 in Supplementary Material.

Figure 1. Quality assessment of the studies’ methods according to the Cochrane Risk of Bias Tool.
Summary of outcome assessment tools
A total of 475 tools were used across all 406 trials to assess clinical outcome. Of these, 247 (52.0%) were validated and 53 (11.2%) were at least previously cited in the literature, while 148 (31.2%) were ad hoc measures. Information about validation was obtained by searching for articles which examined validation of the tool or analysing the tool’s manual for use. Excluding ad hoc measures, each study included a mean of 3.2 outcome assessment tools (median 3, range from 0 (only ad hoc measures) to 13).
Only seven validated tools were used for outcome assessment in more than 5% of the included studies: Social Responsiveness Scale (SRS) (Constantino et al., 2003) in 56 (13.8%), Childhood Autism Rating Scale (CARS) (Schopler, Reichler, & Renner, 1988) in 34 (8.4%) and Autism Diagnostic Observation Schedule (ADOS; Lord et al., 1989) in 32 (7.9%) for core symptoms. None of these three tools has been specifically designed for outcome evaluation. The Aberrant Behaviour Checklist (ABC), mainly investigating problem behaviours (Aman, Singh, Stewart, & Field, 1985), was used in 97 (23.9%) of the studies; problem behaviours were also assessed with Conners Rating Scale (CRS) (Conners, 1969) in 28 (6.9%) studies. The Clinical Global Impression (CGI) scale was used in its ‘improvement’ and ‘severity’ versions (Beneke & Rasmus, 1992) in 79 (19.5%) and 40 (9.9%) of the studies, respectively. Adaptive behaviours were also assessed as outcome measures using the Vineland Adaptive Behaviour Scales (VABS) (Sparrow, Balla, & Cicchetti, 1984) in 63 (15.5%) of the trials.
In pharmacological trials, the ABC was the most used tool, employed in 66 (50%) studies, followed by Clinical Global Impression–Improvement (CGI-I) and Clinical Global Impression–Severity (CGI-S) used in 53 (40.2%) and 26 (19.7%) of the studies, respectively. Among educational interventions, three tools were used in more than 10% of the trials: VABS in 36 (26.3%); SRS in 23 (16.8%); and ADOS in 17 (12.4%). A complete list of the retrieved outcome measures is reported in Table S2 in Supplementary Material.
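The 5% threshold used above to identify the most frequently adopted tools can be reproduced with simple arithmetic. The sketch below uses the study counts reported in this section and the total of 406 trials; the helper names `usage_share` and `above_threshold`, and the extra entry `HYPOTHETICAL_TOOL`, are illustrative assumptions and not part of the review's database.

```python
TOTAL_TRIALS = 406  # trials included in the final database

# Number of trials using each tool, as reported in the text.
# CGI-I and CGI-S are listed separately here, although they are
# two versions of the same scale.
tool_counts = {
    "ABC": 97,
    "CGI-I": 79,
    "VABS": 63,
    "SRS": 56,
    "CGI-S": 40,
    "CARS": 34,
    "ADOS": 32,
    "CRS": 28,
    "HYPOTHETICAL_TOOL": 20,  # invented entry to show a tool below the cut-off
}


def usage_share(count, total=TOTAL_TRIALS):
    """Percentage of trials in which a tool was used."""
    return 100 * count / total


def above_threshold(counts, total=TOTAL_TRIALS, threshold_pct=5.0):
    """Return tools used in more than threshold_pct of trials, with rounded shares."""
    return {
        tool: round(usage_share(count, total), 1)
        for tool, count in counts.items()
        if usage_share(count, total) > threshold_pct
    }


for tool, share in above_threshold(tool_counts).items():
    print(f"{tool}: {share}%")
```

The same calculation applies within subgroups (for example, pharmacological trials only) by replacing the total and the counts with the subgroup figures.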
Primary aims and outcome measures
The primary aims of the retrieved studies have been categorized according to the intended effect of the intervention and are reported in Figure 2. Sixty-five clinical controlled trials (CCTs) reported more than one primary aim. The matching between the primary aim of the study and the primary outcome measure was rated as ‘unclear’ in 123 CCTs, with no matching in three studies. The most common types of interventions directed towards core symptoms A, core symptoms B and problem behaviours are summarized in Figure 3.

Figure 2. Aims/targets of the retrieved studies.

Figure 3. Interventions targeting core symptom domains and problem behaviours.
The outcome assessment tools which were most frequently used (>5% and used in more than three studies) are reported in Table 1, according to the primary aim. A significant consensus emerged within the 28 studies investigating the effect of intervention on adaptive functioning: the VABS was used in 60.7% (17) of the studies. The evaluation of the treatment effect on cognitive functioning was extremely heterogeneous with none of the tools used in more than three studies. Of note, the most targeted psychiatric comorbidity was anxiety, for which the ADIS (Anxiety Disorders Interview Schedule for Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM-IV)) was the most frequently used measure in 9 out of 37 studies (24.3%). Only three CCTs specifically aimed at improving quality of life, each of which used different instruments.
Table 1. Assessment tools used to assess effect of intervention in clinical controlled trials (CCTs).
SRS: Social Responsiveness Scale; CGI: Clinical Global Impression; ABC: Aberrant Behaviour Checklist; ADOS: Autism Diagnostic Observation Schedule; VABS: Vineland Adaptive Behaviour Scales; YBOCS: Yale-Brown Obsessive-Compulsive Scale; ATEC: Autism Treatment Evaluation Checklist; CARS: Childhood Autism Rating Scale; SSRS: Social Skills Rating System; CPRS: Children’s Psychiatric Rating Scale.
Trends in outcome assessment for trials in ASD
Only two scales showed a significant Spearman’s correlation with time: SRS has been increasingly adopted as an outcome assessment tool since 2004 (ρ = 0.840, p < 0.001), while the use of CRS has decreased since 1982 (ρ = –0.454, p = 0.008). The use of the ABC (p = 0.497), CGI-I (p = 0.477), VABS (p = 0.789), CGI-S (p = 0.686), CARS (p = 0.806) and ADOS (p = 0.699) was more constant over time, when the number of studies using the tools is adjusted for the number of studies published each year since their first application as an outcome assessment tool.
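The trend analysis above correlates publication year with the proportion of studies in that year adopting a given tool. A minimal sketch of that calculation follows, assuming Spearman's ρ is obtained as the Pearson correlation of average ranks; the helper names and the yearly counts are hypothetical and do not reproduce the review's data, and the p-value is omitted.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def yearly_proportions(studies_per_year, tool_uses_per_year):
    """Proportion of each year's studies that used the tool."""
    years = sorted(studies_per_year)
    props = [tool_uses_per_year.get(y, 0) / studies_per_year[y] for y in years]
    return years, props


# Hypothetical yearly counts, for illustration only.
studies = {2010: 20, 2011: 25, 2012: 30, 2013: 28, 2014: 35}
tool_uses = {2010: 1, 2011: 3, 2012: 3, 2013: 6, 2014: 9}

years, props = yearly_proportions(studies, tool_uses)
print(f"rho = {spearman_rho(years, props):.3f}")
```

Dividing by the number of studies published each year is what keeps a general growth in the literature from being mistaken for growing adoption of a specific tool.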
Dividing the studies included by length of follow-up, a significant association was found for ADOS (p = 0.002), VABS and ABC (p < 0.001), CRS (p = 0.026) and CGI-I (p = 0.020). Specifically, ADOS and VABS were used more frequently than expected in studies with longer follow-up periods (zresid = 2.5 and 4.7, respectively), whereas the ABC was used more than expected in shorter follow-up periods (zresid = 2.8). Finally, several outcome assessment tools were used differently in randomized and non-randomized designs (SRS, p = 0.016; VABS and ABC, p < 0.001; CRS, p = 0.036; and CGI-I and CGI-S, p = 0.005). In particular, the VABS was used more frequently than expected in studies without randomisation (zresid = 4.1), while this design was associated with a less frequent use of SRS (zresid = −2.0), ABC (zresid = −2.7) and CGI (both zresid = −2.3). Table 2 reports Fisher’s exact test significance between the length of the follow-up, the study design and the use of each of the selected outcome assessment tools.
Table 2. Association between the use of outcome assessment measures, clinical and methodological variables.
ADOS: Autism Diagnostic Observation Schedule; CARS: Childhood Autism Rating Scale; SRS: Social Responsiveness Scale; VABS: Vineland Adaptive Behaviour Scales; ABC: Aberrant Behaviour Checklist; CRS: Conners’ Rating Scale; CGI-I: Clinical Global Impression–Improvement; CGI-S: Clinical Global Impression–Severity.
Bold: Significant Fisher’s test p-value; italic: significant standardized residuals for studies using the tool.
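The standardized residuals (zresid) reported above indicate in which cells a tool was used more or less often than expected under independence. The sketch below assumes the Pearson form, (observed − expected) / √expected; the text does not state whether this or the adjusted variant was used, so the formula, the function name `standardized_residuals` and the counts are all assumptions made for illustration.

```python
from math import sqrt


def standardized_residuals(table):
    """Pearson standardized residuals for a contingency table.

    Assumes every expected count is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            # Expected count if tool use were independent of the row variable.
            expected = row_totals[i] * col_totals[j] / n
            out.append((observed - expected) / sqrt(expected))
        residuals.append(out)
    return residuals


# Hypothetical counts. Rows: follow-up <=12 weeks, >12 weeks.
# Columns: tool used, tool not used.
table = [[10, 190],
         [30, 170]]

for label, row in zip(("<=12 weeks", ">12 weeks"), standardized_residuals(table)):
    print(label, [round(z, 2) for z in row])
```

Residuals with an absolute value of about 2 or more are conventionally read as cells that depart from the expected frequency, which matches how the values in this section are interpreted.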
Discussion
Given the rising prevalence of ASD, developing effective treatments for this group of conditions is a priority, which has led to a marked increase in funding allocation for ASD research (US Department of Health and Human Services, 2017). Consequently, the crucial importance of adopting appropriate measures for assessing the outcomes of both novel and consolidated treatments has been highlighted (McConachie et al., 2015). Our systematic review is the largest review, so far, investigating the characteristics of assessment tools in intervention studies focusing on ASD.
Lack of consensus in ASD outcome evaluation
A key finding of this synthesis is the complexity of assessing outcomes in ASD and the difficulties in reaching a consensus in the scientific community. Among 406 studies, 475 measurement tools were used. Of these, 69% of the tools were used in the literature only once. Of note, only seven of the retrieved tools were adopted in more than 5% of the trials: ADOS (Lord et al., 1989), CARS (Schopler et al., 1988), SRS (Constantino et al., 2003), VABS (Carter et al., 1998), ABC (Aman et al., 1985), CRS (Conners, 1969), and CGI (both Severity and Improvement) (Beneke & Rasmus, 1992). This striking finding may explain the inconsistent conclusions obtained by research on ASD until now. In fact, as reported by Virues-Ortega (2010), it is quite difficult to compare results between trials if the investigators use different outcome tools, even when targeting specific symptom domains.
The use of ad hoc tools represents a significant issue, as they represented almost one-third of the total number of unique measurement tools observed in our systematic review. The use of these instruments to measure ASD symptoms could represent innovation (Bolte & Diehl, 2013), but this approach is questionable when used as the primary (or only) outcome measure, as was the case in 32 of the included studies. In fact, psychometric properties of ad hoc tools were generally not evaluated in the studies we reviewed, thus preventing the possibility of conducting reproducible studies.
Target aims, core symptoms and outcome tools
Nearly 30% of the reviewed studies did not clearly state the primary aim. This means that almost one-third of interventions did not target a clear symptom domain, or targeted multiple and vaguely defined domains. The rapid growth in ASD research, the shifting boundaries of the autism spectrum, and the stability and resistance of core symptoms to intervention (King, Navot, Bernier, & Webb, 2014) are possible reasons for why some studies adopt a more diffuse approach, where the precise aim or outcome of investigation is not defined, and measures are gathered from many different and unrelated target domains with the purpose of demonstrating a therapeutic effect.
With regard to interventions directed towards core symptoms, 38% of the studies targeted core symptoms in domain A (social communication challenges) and 10% targeted domain B core symptoms (repetitive behaviours and sensory alterations). An explanation for the limited number of studies investigating core symptoms may be related to their stability over time, particularly in the context of short follow-up periods (Bieleninik et al., 2017). Consequently, investigators may avoid selecting core symptoms as a primary outcome, as these may show relative invariability in comparison to other domains. Another potential reason might be related to publication bias: previous researchers have shown that positive or significant results are more likely to be published, and outcomes that are statistically significant have higher odds of being fully reported (Dwan et al., 2008). Of note, even though pharmacological treatments have yet to demonstrate efficacy towards core symptoms (Howes et al., 2018), 54% of the studies targeting core symptoms domain B employed a pharmacological treatment. However, the majority of the studies targeting core symptoms domain A used a psychoeducational treatment.
Of the scales most frequently used for evaluating changes from interventions targeting core symptoms, three (ADOS, CARS and SRS) were originally developed to assess core symptomatology (Lord et al., 1989) or autistic-like traits (Constantino et al., 2003), and were thus designed as diagnostic tools. Their use in evaluating outcomes is therefore somewhat controversial: for instance, the ADOS, even if used to evaluate outcomes in 13% (aim: core A) and 21% (aim: core B) of the trials targeted towards core symptoms, is not a tool designed to measure subtle improvements or behavioural changes (Lord et al., 1989). The implementation of the newer ADOS Calibrated Severity Score helped to overcome this limitation, although further research is needed to support its use in clinical trials (Shumway et al., 2012). The CARS (Schopler et al., 1988) showed excellent psychometric properties in diagnosing ASD, but has never been specifically tested for measuring outcomes. However, the fact that CARS provides a cut-off measure for both diagnosis and for ASD severity has potentially extended its use for outcome evaluation. The SRS (Constantino et al., 2003) is a valid and time-saving instrument for screening ASD in clinical settings and has been proposed also as a measure to detect changes after treatment (Payakachat, Tilford, Kovacs, & Kuhlthau, 2012). Of note, we have observed a rising trend in the use of the SRS in ASD research over the years, which may be related to its usability. In addition, in contrast with ADOS, the SRS does not require specific training before administration, and in contrast with the CARS, it can be administered across the entire lifespan.
VABS was the third most used outcome assessment tool in studies aiming at core symptoms A, although the VABS is a measure of adaptive functioning not specific to ASD, and its use in tracking changes in core symptoms has been debated. A panel sponsored by Autism Speaks recently evaluated the Vineland-II as having adequate reliability, validity and responsiveness to quantify social and communication deficits in clinical trials of ASD (Anagnostou et al., 2015; McConachie et al., 2015).
Of interest, the trials targeting core symptoms also showed an extensive use of the CGI, which is a rapid and useful tool not only in ASD – with and without ID – but also virtually in every clinical context (especially when time for assessment is limited). Therefore, it may be a reliable tool to compare outcomes between different treatments in the same population, but also between different populations treated with the same intervention. The recommendation made by Aman et al. (2004) 14 years ago could be considered still valid at the present day: […] regardless of the objective, CGI should be a universal measure in all ASD clinical trials. Raters should use the CGI to assess all behaviour of the participants (in as many contexts as possible) so that the score is truly a reflection of the participant’s global functioning.
However, since this tool can be used to reflect changes in both core symptoms and in comorbid behaviours, its lack of specificity is the main reason why it should not be used as a single measure of evaluation in any symptom domain, but as a helpful addition to another more specific tool. Results from recent studies provided encouraging evidence for the Brief Observation of Social Communication Change (BOSCC) (Kim, Grzadzinski, Martinez, & Lord, 2018; Pijl et al., 2018) and the Autism Behavior Inventory (ABI) (A. Bangerter et al., 2017) as candidate outcome measures, although these instruments have not yet been tested extensively and require further research.
Beyond core symptoms: evaluation of other outcome domains
It is well known that ASD is a complex condition with an extensive spectrum of associated symptoms (Lai, Lombardo, & Baron-Cohen, 2014), and 169 studies (41%) defined non-core symptoms as their primary aim. Also, taking into consideration the multiple outcome assessments used in the wide range of studies that we included, five of the most frequently used scales evaluate associated symptoms of ASD such as problem behaviours (ABC and CRS), adaptive behaviours (VABS) or, more generally, Clinical Global Impression (CGI-I and CGI-S). These rating scales are focused on different behavioural aspects that are not ASD-specific and could be used in a wide range of conditions as in the case of the ABC (Johannes, Jonathan, & Jessica, 2010; Rojahn, Wilkins, Matson, & Boisjoli, 2010). The prominent use of these measures in ASD could reflect the importance of addressing these symptoms in targeted interventions (Hanley, Jin, Vanselow, & Hanratty, 2014). In fact, the severity of problem behaviours and the presence of poor daily living skills have a greater impact on caregivers of individuals with ASD compared to the severity of ASD core symptoms (Estes et al., 2013). A recent study (Chatham et al., 2018) critically discussed the issue of adaptive behaviour in ASD and suggested the clinical significance of minimal differences for Vineland-II in ASD, thereby enabling researchers to use this tool to assess efficacy of interventions for clinical practice.
The heterogeneous findings of our research also reflect, from a clinical perspective, the different needs of people with ASD. Core symptoms are not the only important target for autistic people and their families, who face myriad associated concerns, each with their individual and social consequences. For example, a recent systematic review estimated that depression and anxiety are highly prevalent in adults with ASD (Hollocks, Lerh, Magiati, Meiser-Stedman, & Brugha, 2018), and some treatments have been specifically designed to target these comorbidities. However, tools measuring psychiatric comorbidities were often developed for neurotypical populations, and not people suffering from ASD (e.g. Hamilton Depression Rating Scale, Beck Depression Inventory, Anxiety Disorders Interview Schedule for DSM-IV and Spence Children’s Anxiety Scale). This could represent a bias, since both ASD people and their clinicians often have difficulties in detecting and recognizing internalizing symptoms (Weisbrot, Gadow, DeVincent, & Pomeroy, 2005). Of note, after a panel promoted by Autism Speaks failed to find a sufficiently reliable instrument to measure anxiety in individuals with ASD (Lecavalier et al., 2014), two measures have been adapted and validated for this condition: the Anxiety Scale for Children with Autism Spectrum Disorder (ASC-ASD; Rodgers et al., 2016) and the ADIS (Kerns, Renno, Kendall, Wood, & Storch, 2017). In our findings, this last measure was by far the most used in clinical trials targeting anxiety in ASD.
There is growing interest in subjective quality of life as a crucial outcome. Quality of life has been shown to be consistently lower for people on the autism spectrum than for typically developing peers, across the whole lifespan (Ayres et al., 2018; van Heijst & Geurts, 2015). In our systematic review, only seven trials included measures evaluating quality of life, and three studies had quality of life as a primary aim. None of the instruments used in these studies was designed specifically for people with autism. Recently, McConachie et al. (2018) validated the Autism-specific Quality of Life Schedule (ASQoL) as an add-on module to the World Health Organization Quality of Life Schedule (The WHOQoL Group, 1998), to evaluate quality of life in people on the autism spectrum. As awareness of quality of life and functioning in ASD grows, this outcome should be assessed more consistently, and the ASQoL might be integrated into clinical and research practice, especially for the evaluation of treatment effectiveness.
Limitations
To our knowledge, the present synthesis represents the most comprehensive systematic review of outcome measures used in ASD trials. However, some limitations should be discussed. Our review did not analyse individual tools by comparing their availability, psychometric properties and standardization, as these issues have already been extensively explored by previous studies and reviews (Cassidy, Bradley, Bowen, Wigham, & Rodgers, 2018a, 2018b; Hanratty et al., 2015; Norris, Aman, Mazurek, Scherr, & Butter, 2019; Wigham & McConachie, 2014). Therefore, this review does not allow conclusions to be drawn on the best tools to use, but only provides an overview of their pattern of utilization. A further potential limitation of our analysis relates to the characteristics of the included studies: most studies did not adequately report important data, especially primary aims and sample characteristics (such as IQ, age and gender). Therefore, the hypotheses derived from this analysis should be considered cautiously. As ASD is defined by a spectrum of symptoms and includes a heterogeneous population, knowing which subpopulation a specific intervention targets may help to increase its efficacy. Moreover, this incomplete data reporting limited our ability to adopt an approach more focused on specific ASD subgroups (e.g. according to age and cognitive abilities). Future studies should specify sample characteristics in order to improve reproducibility and consistency across trials. In addition, we considered only behavioural outcome measures and did not analyse biological markers of treatment response (e.g. eye tracking and neuroimaging), which are of growing interest in ASD research (Del Valle Rubido et al., 2018; McPartland, 2017). Finally, the quality of the included RCTs was generally low. Although some interventions have intrinsic limitations (e.g. the inability to blind psychoeducational interventions), it is important to note that only 11 studies (3%) out of 406 were considered of good quality, using previously defined thresholds (Penson et al., 2013).
Conclusion
Our review highlights a critical point in the assessment of treatment efficacy in trials for individuals with ASD. The choice of outcome measures remains highly heterogeneous, making synthesis and generalization a challenge. Overall, it is likely that several factors contribute to the inconsistency in the choice of measurement tools, including the difficulty of choosing specific target domains, as underlined by the significant number of studies lacking a clear primary aim.
Our findings also suggest that, given the evolving and still unclear phenomenology of ASD, there may be no single tool capable of detecting and assessing changes in symptoms and behaviours. Therefore, in the absence of an overarching consensus regarding reliable standardized measures, the use of multiple instruments (together with the CGI) could be recommended. In addition, the efficacy of treatments on some important domains which are frequently impaired in autistic people (e.g. quality of life) demands further investigation.
Supplemental Material
AUT854641_Lay_Abstract – Supplemental material for What are we targeting when we treat autism spectrum disorder? A systematic review of 406 clinical trials
Supplemental material, AUT854641_Lay_Abstract for What are we targeting when we treat autism spectrum disorder? A systematic review of 406 clinical trials by Umberto Provenzani, Laura Fusar-Poli, Natascia Brondino, Stefano Damiani, Marco Vercesi, Nicholas Meyer, Matteo Rocchetti and Pierluigi Politi in Autism
Supplemental Material
AUT854641_Supplemental_material – Supplemental material for What are we targeting when we treat autism spectrum disorder? A systematic review of 406 clinical trials
Supplemental material, AUT854641_Supplemental_material for What are we targeting when we treat autism spectrum disorder? A systematic review of 406 clinical trials by Umberto Provenzani, Laura Fusar-Poli, Natascia Brondino, Stefano Damiani, Marco Vercesi, Nicholas Meyer, Matteo Rocchetti and Pierluigi Politi in Autism
