Abstract
Systematic reviews have traditionally focused on internal validity, while external validity has often been overlooked. In this study, we systematically reviewed determinants of external validity in the accumulated randomized controlled trials of social skills group interventions for children and adolescents with autism spectrum disorder. We extracted data clustered into six overarching themes: source population, included population, context, treatment provider, treatment intervention, and outcome. A total of 15 eligible randomized controlled trials were identified. The eligible population was typically limited to high-functioning school-aged children with autism spectrum disorder, and the included population was predominantly male and Caucasian. Scant information about the recruitment of participants was provided, and details about treatment providers and settings were sparse. It was not evident from the trials to what extent acquired social skills were enacted in everyday life and maintained over time. We conclude that the generalizability of the accumulated evidence is unclear and that the determinants of external validity are often inadequately reported. At this point, more effectiveness-oriented randomized controlled trials of equally high internal and external validity are needed. More attention to the determinants of external validity is warranted when this new generation of randomized controlled trials is planned and reported. We provide a tentative checklist for this purpose.
Introduction
Multiple strategies have been proposed to promote social communication skills in children and adolescents with autism spectrum disorder (ASD). Among these, the concept of social skills group interventions (SSGI) has attracted the most attention in the past 20 years (Reichow et al., 2012; Williams White et al., 2007). SSGI is an umbrella term comprising a wide range of teaching methods such as social stories, video modeling, social problem solving, pivotal response training, scripting procedures, computer-based teaching, prompting procedures, and self-monitoring (Bohlander et al., 2012). A recent systematic review of SSGI evidence identified five randomized controlled trials (RCTs) on school-aged children and adolescents with high-functioning ASD (Reichow et al., 2012). The review concluded that there is some evidence that SSGI can improve social skills in individuals with ASD.
Notably, the generalizability was deemed to be unclear, as all studies were conducted in the United States, focused mainly on children aged from 7 to 12 years, and included only participants of average to high intelligence (Reichow et al., 2012). Recent reviews focusing on methodological aspects of the accumulated research on SSGI raise additional questions about generalizability, including how gains are generalized and maintained (Kaat and Lecavalier, 2014; McMahon et al., 2013). The methodological review by Kaat and Lecavalier also indicates that the determinants of external validity are often inadequately reported. For instance, they found that less than half of the identified studies had a well-characterized sample, defined as information about age and sex of participants and either an independent evaluation of intellectual functioning or an independent confirmation of ASD diagnosis.
External validity refers to whether findings can be generalized across populations and environments. Reliable judgment about the generalizability of the accumulated body of RCTs is essential if interventions are to be successfully implemented in routine clinical practice. The external validity is sensitive to many factors, for instance, setting of the trial, selection of participants, characteristics of included participants, differences between the protocol and routine practice, choice of outcome measures, follow-up timing, providers’ characteristics and training, treatment fidelity, number of participants, and trainer ratio (Rothwell, 2006).
Traditionally, systematic reviews have focused on the determinants of internal validity (i.e. the quality of design, conduct, and analysis) to ensure that the results are valid for the study population and setting (Higgins and Green, 2011). External validity, however, often receives insufficient attention in primary research studies and systematic reviews (Rothwell, 2006). The need for increased attention to such issues has been explicitly highlighted for psychosocial interventions (Gardner et al., 2013; Grant et al., 2013).
In this systematic review, we expand on previous reviews by fine-mapping the determinants of external validity of RCTs on SSGI reported in primary research studies, with the aim of drawing attention to these aspects when the next generation of RCTs is planned, evaluated, and reported in the scientific literature. We do this at a point in time when the SSGI research is entering a new phase, where the key question is how the results from early efficacy-oriented trials are transported into real-world clinical settings. We evaluate different aspects (source population, included population, context, treatment provider, treatment intervention, and outcome) of the clinical generalizability for individual trials and the SSGI evidence base as a whole.
Material and methods
Protocol and registration
This study was based on a systematic review conducted by the authors of this article on behalf of the Swedish Board of Health and Welfare, registered in PROSPERO (International prospective register of systematic reviews; crd.york.ac.uk/NIHR_PROSPERO/), registration no. CRD42013003780. In the commissioned report, four groups of interventions of relevance for children and adolescents with ASD were assessed: SSGI, early intensive behavioral intervention (EIBI), treatment and education of autistic and related communication handicapped children (TEACCH), and interventions involving relatives of children and adolescents with ASD. A combined search was used to address all four questions. In the present review, only the external validity pertaining to SSGI was examined.
Eligibility criteria
Participants
Children and adolescents with a diagnosis of ASD, autism, Asperger syndrome (AS), or pervasive developmental disorders-not otherwise specified (PDD-NOS) (as diagnosed using the DSM-IV, DSM-IV-TR, or ICD-10 criteria). Studies with adult populations (>18 years) were excluded.
Interventions
SSGI, defined as an intervention aimed at improving social skills characterized by participation of at least two individuals with ASD in therapy sessions led by one or more therapists.
Comparator
Any comparator.
Study design
Randomized controlled trial.
Information sources
Electronic searches were conducted using medical subject headings (MeSH) and relevant text word terms. Five databases (Medline, PubMed, PsycINFO, CINAHL, and ERIC) were searched, up to 19 December 2014.
Search strategy
The search was performed by a specialist at the Karolinska Institutet library. We used search terms relevant for the eligible interventions and the population. Search results were limited to original studies from 1990 or later, written in English, Danish, Norwegian, or Swedish. Animal studies and case studies were excluded. The updated search included only the search terms related to SSGI. For a detailed description of search terms related to SSGI, see Appendix 1, available online.
Study selection
Two reviewers independently screened the titles and abstracts of all the citations identified by the search strategy. Studies of potential relevance for SSGI were screened a second time by two reviewers. If deemed necessary at this stage, the article was obtained in full text and independently assessed for inclusion by two reviewers. Any disagreements were resolved by discussion. Reference lists and systematic reviews were screened for additional studies of relevance. The citations identified in the updated search were screened in the same manner, with the exception that the search tool in EndNote was first used to identify citations that included, in any field, either the truncated term random or the term RCT.
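The keyword pre-screening step applied to the updated search can be illustrated with a short sketch. This is not the EndNote query actually used, which is not reported here; the pattern, the function name `mentions_randomization`, and the three example citations are hypothetical and serve only to show how a truncated term and an exact term can be matched across all fields of a citation record.

```python
import re

# Assumed pattern: the truncated term "random" (random, randomized, randomly, ...)
# or the standalone term "RCT". The exact EndNote query is not given in the text.
RCT_PATTERN = re.compile(r"\brandom\w*|\bRCT\b", re.IGNORECASE)


def mentions_randomization(citation: dict) -> bool:
    """Return True if any field of the citation record matches the pattern."""
    return any(RCT_PATTERN.search(str(value)) for value in citation.values())


# Hypothetical citation records, for illustration only.
citations = [
    {"title": "A randomized controlled trial of a social skills group", "abstract": ""},
    {"title": "Case report of a social skills programme", "abstract": "Single case."},
    {"title": "Group training for adolescents", "abstract": "Pilot RCT with waitlist control."},
]

# Citations passing the keyword filter would then go on to manual screening.
prescreened = [c for c in citations if mentions_randomization(c)]
```

A filter of this kind only narrows the set of citations; under the procedure described above, the remaining citations would still be screened by two reviewers.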
Data collection process
From each included study, data were extracted and inserted in a table by one of the authors. A second author checked the integrity of the data. Any disagreements were resolved by discussion. If no agreement could be reached, a third author would decide. In case the information was reported or partly reported, the relevant available information was inserted in the table. If the information was missing in the report, the cell was left empty.
It was also planned that study protocols referred to in the original publications would be searched for additional information. No other sources were searched for information.
Data items
In order to systematically evaluate external validity, a list of potential determinants was compiled. Items of relevance for external validity were selected from a review of reporting guidelines by Grant et al. (2013) and supplemented with additional items of clinical significance selected for their face validity. Included items were clustered into six overarching themes related to external validity: source population, included population, context, treatment provider, treatment intervention, and outcome (Table 1).
Table 1. Checklist of information to be included in trial reports to facilitate assessment of external validity, with information provided in one of the included trial reports given as an example.
The largest included trial was selected as an example. Detailed information about all included trials is provided in Appendix 2 (available online).
ADI-R: The Autism Diagnostic Interview–Revised; ADOS: The Autism Diagnostic Observation Schedule; ASSQ: Autism Spectrum Screening Questionnaire; NIMH IRB: National Institute of Mental Health Institutional Review Board; PEI: The Pupil Evaluation Inventory; QPQ: The Quality of Play Questionnaire; SD: standard deviation; SSRS: social skills rating system.
Planned methods of analysis
For each coded item, the coding was summarized across all included studies. In a narrative synthesis, we summarized the extracted information for the whole body of included studies.
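As an illustration of this summary step, the following minimal sketch counts, for each coded item, how many trials reported it. The extraction table, trial labels, and item names are hypothetical placeholders, not data from the included trials; empty cells are represented as `None`, in line with the rule that a cell was left empty when the information was missing.

```python
# Hypothetical extraction table: one row per trial, None marks an empty cell.
extraction = {
    "Trial A": {"recruitment": "clinic referral", "setting": None, "follow_up": "3 months"},
    "Trial B": {"recruitment": "public announcement", "setting": "university", "follow_up": None},
    "Trial C": {"recruitment": None, "setting": None, "follow_up": None},
}


def reporting_summary(table: dict) -> dict:
    """Map each coded item to (number of trials reporting it, total number of trials)."""
    n_trials = len(table)
    items = sorted({item for cells in table.values() for item in cells})
    return {
        item: (sum(1 for cells in table.values() if cells.get(item) is not None), n_trials)
        for item in items
    }


summary = reporting_summary(extraction)
for item, (reported, total) in summary.items():
    print(f"{item}: reported in {reported} of {total} trials")
```

Counts of this kind correspond to statements such as "9 of the 15 trials reported ..." in the narrative synthesis.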
Results
In the first screening of abstracts, 217 unique citations were identified as of potential relevance for social skills training. A total of 199 of these reports were excluded: 57 case studies, 39 non-randomized studies, 75 studies with irrelevant research question, 13 studies of another intervention, 3 studies with adult participants, 6 study protocols, and 6 reviews (see Figure 1). In total, 15 eligible RCTs (reported in 18 publications) were identified (Baghdadli et al., 2013; Beaumont and Sofronoff, 2008; Begeer et al., 2011; DeRosier et al., 2011; Frankel et al., 2010; Koenig et al., 2010; Koning et al., 2013; Laugeson et al., 2009; Lerner and Mikami, 2012; Lopata et al., 2010; Schohl et al., 2014; Solomon et al., 2004; Thomeer et al., 2012; White et al., 2013; Yoo et al., 2014). A long-term follow-up of the study by Frankel and colleagues was presented in a separate publication (Mandelberg et al., 2014). The study by Schohl et al. (2014) also included measures of parent and family outcome and electroencephalography (EEG), which were reported in separated publications (Karst et al., 2015; Van Hecke et al., 2015).

Figure 1. PRISMA flow-chart.
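The counts reported for the selection process can be checked with simple arithmetic. The sketch below uses only the numbers given in the text above; the variable names are illustrative.

```python
# Counts as reported in the Results text.
identified = 217
excluded = {
    "case studies": 57,
    "non-randomized studies": 39,
    "irrelevant research question": 75,
    "another intervention": 13,
    "adult participants": 3,
    "study protocols": 6,
    "reviews": 6,
}

total_excluded = sum(excluded.values())
included_publications = identified - total_excluded

# 199 excluded reports leave 18 publications, which describe the 15 eligible RCTs.
print(total_excluded, included_publications)  # prints "199 18"
```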
Source population
An ASD diagnosis (AS, high-functioning autism (HFA), or PDD-NOS) was a criterion for eligibility in all included trials. In most trials, the diagnosis was confirmed by standardized diagnostic tools or rating scales, such as the Autism Diagnostic Observation Schedule (ADOS) (Lord et al., 1999), Autism Diagnostic Interview–Revised (ADI-R) (Rutter et al., 2003), and Social Responsiveness Scale (SRS) (Bölte et al., 2008), but in two of the trials this was not clearly stated (Laugeson et al., 2009; Lopata et al., 2010). A minimum IQ or verbal IQ level (>60–85) was an additional criterion for inclusion in all trials except one, in which only participants with HFA were eligible (Lerner and Mikami, 2012). Two trials also used a minimum score on the Comprehensive Assessment of Spoken Language as a criterion for inclusion (Lopata et al., 2010; Thomeer et al., 2012). In one trial (White et al., 2013), only individuals with a comorbid anxiety disorder were included.
All trials except one (Thomeer et al., 2012) provided some information about the recruitment procedure, although this information was insufficient. Two of the trials used public announcements (Beaumont and Sofronoff, 2008; Lopata et al., 2010) whereas the remaining 12 recruited from academic centers, clinics, schools, or local organizations. (For details, see Appendix 2(a), available online.)
Included population
Of the 15 trials, 9 reported the number of screened and randomized participants, as well as the number of participants who completed the study (Baghdadli et al., 2013; Begeer et al., 2011; Koenig et al., 2010; Laugeson et al., 2009; Lopata et al., 2010; Schohl et al., 2014; Thomeer et al., 2012; White et al., 2013; Yoo et al., 2014), while some of this information was omitted in the remaining trials. No trial provided clear information about the number of invited individuals. No information about the participants’ expectations and preferences was included in any of the trials.
In total, 11 trials included children within an age range spanning from 6 to 13 years, while 4 trials focused mainly on adolescents (Laugeson et al., 2009; Schohl et al., 2014; White et al., 2013; Yoo et al., 2014). Only a minority of females was included in the trials and no study presented gender-specific results. Two reports included some information about psychiatric comorbidity among the included participants (Begeer et al., 2011; White et al., 2013), while the rest did not.
The sample characteristics presented varied across studies, but information about the participants’ diagnosis, ethnicity, IQ, and parental education was frequently reported. Some trials mainly included participants with a diagnosis of PDD-NOS (Begeer et al., 2011; Koenig et al., 2010), while HFA and AS were the most common diagnoses in other trials. A number of different measures of social skills were used in the trials, complicating assessment of differences in severity of the target symptom between studies. The available information suggests that the participants were predominantly Caucasian and had parents with relatively high educational level. (For details, see Appendix 2(b), available online.)
Context
One trial was conducted in Australia (Beaumont and Sofronoff, 2008), one in Canada (Koning et al., 2013), one in France (Baghdadli et al., 2013), one in the Netherlands (Begeer et al., 2011), one in South Korea (Yoo et al., 2014), and the remaining in the United States. The setting was described briefly or not at all: four trials were set at universities (Beaumont and Sofronoff, 2008; Lopata et al., 2010; Solomon et al., 2004; Thomeer et al., 2012), five in clinics, research clinics, or school (Baghdadli et al., 2013; Begeer et al., 2011; Frankel et al., 2010; Koenig et al., 2010; Schohl et al., 2014), one within family service (DeRosier et al., 2011), and five reports did not provide explicit information about setting (Koning et al., 2013; Laugeson et al., 2009; Lerner and Mikami, 2012; White et al., 2013; Yoo et al., 2014).
Incentives included a light meal and paid parking (Laugeson et al., 2009), an honorarium of US$50.00 (Koning et al., 2013), and no fee for participation (Koenig et al., 2010). External events occurring at the time when the trials were conducted were not mentioned in any of the reports. (For details, see Appendix 2(c), available online.)
Treatment provider
Generally, information about staffing was scarce. The number of trainers in each skill-training group ranged from one to four across studies. The qualifications of the trainers varied: in several studies post-graduate, graduate, or undergraduate students delivered the intervention (Beaumont and Sofronoff, 2008; Laugeson et al., 2009; Lopata et al., 2010; Schohl et al., 2014; Thomeer et al., 2012; White et al., 2013; Yoo et al., 2014), while other studies also used clinical psychologists, psychotherapists, psychiatrists, social workers, registered nurses, and speech and language pathologists. The experience of the trainers varied from considerable practical training in conducting social skills groups in ASD to less experience. In total, 10 trials reported that providers received specific training in delivering the intervention (Baghdadli et al., 2013; Begeer et al., 2011; DeRosier et al., 2011; Laugeson et al., 2009; Lerner and Mikami, 2012; Lopata et al., 2010; Schohl et al., 2014; Thomeer et al., 2012; White et al., 2013; Yoo et al., 2014), while 5 trials did not include such information. Eight trials reported that the trainers received supervision (Baghdadli et al., 2013; Begeer et al., 2011; Laugeson et al., 2009; Lerner and Mikami, 2012; Schohl et al., 2014; Thomeer et al., 2012; White et al., 2013; Yoo et al., 2014). Treatment fidelity and adherence to the treatment protocol were controlled in some way in all trials. (For details, see Appendix 2(d), available online.)
Treatment intervention
All reports explicitly referred to a treatment manual or gave relatively detailed descriptions of the interventions. The treatments varied substantially across studies and focused on several aspects of social skills, including verbal communication (e.g. tone of voice and nonliteral language), non-verbal communication (e.g. eye contact, facial expressions, posture, and gestures), and social interaction (e.g. initiation, maintaining or leaving a social interaction, conversation skills and social problem solving, conflict in relationships, coping with bullying and teasing, self-regulation, self-monitoring, and controlling negative emotions). However, only a few trials gave a detailed description of the number of groups treated and the number of participants in each group (Lopata et al., 2010; Solomon et al., 2004; Thomeer et al., 2012; Yoo et al., 2014).
The duration of the interventions varied between 4 and 20 weeks. Two trials had fewer than 10 sessions (Beaumont and Sofronoff, 2008; Lerner and Mikami, 2012), two trials used high-intensity training with five daily treatment cycles 5 days a week for 5 weeks (Lopata et al., 2010; Thomeer et al., 2012), and the remaining trials ranged between 12 and 20 sessions. Session length ranged from 60–75 min to 2 h. (For details, see Appendix 2(e), available online.)
Outcome
A range of different measures of social skills was used across the studies, including rating scales, tests, and observations. All trials except one (Koenig et al., 2010) used more than one source of information (e.g. the child, parents, teachers, staff, or tests). Two trials used blinded observations (Koning et al., 2013; Lerner and Mikami, 2012), while an additional five trials included some form of blinded assessment (Baghdadli et al., 2013; Koenig et al., 2010; Laugeson et al., 2009; Schohl et al., 2014; White et al., 2013). Five trials reported follow-up assessments after 2–5 months (Beaumont and Sofronoff, 2008; Frankel et al., 2010; Laugeson et al., 2009; Thomeer et al., 2012; Yoo et al., 2014), while the other trials only included pre- and post-measurements.
A few studies used tools to measure symptoms of depression or anxiety (Schohl et al., 2014; Solomon et al., 2004; White et al., 2013; Yoo et al., 2014). One trial included global assessment of everyday functioning (White et al., 2013) and two trials reported change in clinical global impression (Koenig et al., 2010; White et al., 2013).
In the discussion sections of all studies, the authors refer to limitations related to external validity, such as restricted samples, no follow-up assessments, no evaluation of the transportability to other settings, and no information about generalizability of gains. For details, see Appendix 2(f), available online.
Discussion
For many years, ASD intervention research was criticized for a lack of evidence-based practice. Recently, several systematic reviews of SSGI in ASD identified a range of RCTs showing reasonable internal validity. However, the clinical applicability and implementability of these interventions have not yet been rigorously investigated. We examined information provided in RCTs of SSGI in ASD to judge the external validity of these trials. This information is pivotal to address questions raised by clinicians, clients, parents, policy makers, and other stakeholders related to implementation: To whom do the results apply? To what kind of settings do the results apply? Who can provide the treatment and how should the treatment be delivered? Are the gains generalized to the child’s everyday environment and maintained over time?
Our review suggests that the generalizability of the evaluation research within this field is unclear. Moreover, information necessary for proper assessment of generalizability is often missing in the reports. Notably, the scant information about the pathway to recruitment precluded identification of the source population from which the participants were drawn. Unless detailed information about the source population is available (e.g. the clinic’s patient mix and catchment area), general conclusions about generalizability are undermined.
A homogeneous population allows for a higher internal validity, but highly selective eligibility criteria can also considerably reduce the applicability of the trial results (Khorsan and Crawford, 2014; Rothwell, 2006). The eligibility criteria reported in the included trials mainly concerned age, IQ, aggressive behavior, and severe mental health problems. These eligibility criteria could be justified, provided that the intervention is primarily intended for high-functioning school-aged children and adolescents. Nevertheless, the study populations might still be highly selective, not representing clinical reality.
Detailed reporting of baseline characteristics of participants included in RCTs should allow clinicians to judge how similar trial participants are to the patients they see in routine practice. Although baseline characteristics were described in almost all reports, the majority of trials did not take into account the high rates of comorbid conditions among persons with ASD, which is an important aspect when assessing to whom the results can be applied (Fortin et al., 2006; Matson and Cervantes, 2014; Rothwell, 2006). ASD is a heterogeneous condition with a spectrum of symptom severity, and individual differences can have a significant impact on the treatment effect (Bölte, 2014). Notably, some trials mainly included participants with a diagnosis of PDD-NOS. The use of PDD-NOS as a diagnostic category may vary between countries depending on diagnostic routines, traditions, and applied diagnostic systems (Elsabbagh et al., 2012; Huerta et al., 2012).
Some of the sample characteristics suggest that the samples consisted of highly selected individuals with ASD: for instance, most parents were relatively highly educated according to the available information. It is unclear to what extent the included participants were those with highly motivated and dedicated parents, as this is hardly ever assessed. Overall, there is a need for studies examining whether participant and intervention characteristics mediate/moderate treatment efficacy, as highlighted by a recent review (McMahon et al., 2013).
The feasibility and effects of some psychological interventions could depend on cultural aspects and it might be necessary to adapt the interventions according to cultural differences before implementation. Unfortunately, psychological interventions in general are rarely evaluated outside North America and Europe (Arnberg et al., 2013). Notably, one of the included trials was set in South Korea (Yoo et al., 2014) and indicated that the intervention with modest cultural adjustments was efficacious in this cultural setting.
With respect to social skills outcome measures, different tools related to different components of social skills were used. Since “social skills” are described as a complex and multidimensional construct, it can be difficult to adequately assess improvements in social skills (Koenig et al., 2009; McMahon et al., 2013; Williams White et al., 2007). Doing so may require a multi-method measurement approach. Another issue is that few studies included blinded observations in the participants’ everyday environment. Also, follow-up of long-term effects was not regularly reported. Only one trial report stated clearly that spontaneously reported adverse events were logged and reviewed (White et al., 2013). Missing information about potential adverse effects is a general shortcoming in RCTs of psychological interventions (Jonsson et al., 2014).
Treatment preferences might have an effect on the outcome (Swift and Callahan, 2009), but no report included information about the participants’ or providers’ preferences. Such information can be particularly important when two or more active treatments are compared. In addition, no information about costs related to the intervention was provided. This is unfortunate, since economic aspects of interventions are of paramount importance for implementation likelihood (National Institute for Health and Care Excellence, 2013).
Vital information was missing from all the reviewed research reports. This is understandable, given all the information that is demanded and the limited space provided by many journals. However, inadequate information about the determinants of external validity could considerably reduce the applicability of the trial results and ultimately lead to a waste of resources allocated to research (Glasziou et al., 2014). There are signs of improvement, though. Several recent research protocols for RCTs of SSGI suggest an increased focus on external validity (Choque Olsson and Bölte, 2013; Dekker et al., 2014; Freitag et al., 2013). In order to facilitate further improvement and a next generation of RCTs of SSGI in ASD accounting equally for internal and external validity, we suggest the list of items we used for this review as a tentative checklist (Table 1). Previous efforts to improve the reporting of results of RCTs, such as the CONSORT recommendations (Schulz et al., 2010), have mainly focused on internal validity. This checklist focuses exclusively on the determinants of external validity. We believe that the checklist can be useful for both researchers and reviewers.
It should be emphasized that the setting, care providers, and centers have obvious implications for external validity (Rothwell, 2006; Slack and Draugalis, 2001). In order to increase ecological validity, it is important that applied research uses a bottom-up approach with the intervention embedded in real clinical settings and delivered by clinicians, teachers, or other practitioners who have direct contact with individuals with ASD in their everyday environment (Williams White et al., 2007). Top-down intervention research in academic settings reflecting no or limited clinical reality may have little impact on clinical practice, as it does not face and comply with the challenges of true clinical service.
Ensuring generalizability to the everyday clinical settings might compromise the internal validity. This “trade-off” is reflected in the differentiation between efficacy (how the intervention performs under ideal and controlled conditions) and effectiveness (how the intervention performs in real-world settings). A primacy of internal validity and efficacy is often appropriate in the first place for two reasons. First, initial trials might be of low scientific rigor and therefore have limited internal validity. Second, internal validity is an indispensable prerequisite for external validity (not vice versa). Once an intervention is proven efficacious, a natural next step is to continue with effectiveness studies, combining satisfactory internal validity with a high degree of external validity. As the quantity of SSGI RCTs is increasing continuously, a gradual shift or enlargement of the focus into effectiveness research now seems warranted in order to stimulate a new generation of RCTs of equally high internal and external validity.
A possible limitation of the present review is that it was restricted to RCTs. It is important to note that designs other than RCTs can give valuable insight into generalizability (Wells, 1999). Still, the results must have a satisfactory internal validity and sufficient information must be available regarding the setting, participants, and intervention. Many valuable one-group and quasi-experimental SSGI trials have been published in ASD and paved the way for RCTs (Kaat and Lecavalier, 2014; McMahon et al., 2013). These studies may appear more clinically naturalistic than RCTs. However, it is technically impossible that studies with poor internal validity have high external validity. It is evident from the review by Kaat and Lecavalier (2014) that extant non-randomized studies either had a small sample size or did not have a well-characterized sample, suggesting that they would add little to our understanding of generalizability.
An additional limitation is that, consistent with standard systematic review procedures, we reviewed only information provided in the primary research report. We did not contact the authors for missing information, nor did we review all possibly available literature and information outside of scientific journals. In particular, more detailed information about the intervention could be available in manuals or educational material. However, we believe that the information we checked for should be provided in the primary research reports. We also did not investigate a number of other factors necessary for successful implementation, including availability of the manuals, educational material, and training courses.
Conclusion
Trial reports should allow clinicians to judge to whom and in which context these results could reasonably be applied and how the treatment should be provided in order to obtain similar effects. In order to facilitate this, journal editors, reviewers, and researchers within this research field should pay more attention to information of relevance for external validity. In addition, more effectiveness-oriented RCTs of equally high internal and external validity are needed. This will provide a better evidence base for decisions about implementation into regular care.
Footnotes
Acknowledgements
The authors would like to thank Viviann Nordin for her contribution to the selection of studies in the systematic literature review, as well as Carl Gornitzki and Klas Moberg for their assistance with the literature search.
Declaration of conflicting interests
The authors declare that there is no conflict of interest.
Funding
This study was supported by the National Board of Health and Welfare (Socialstyrelsen), Barnforskningen Astrid Lindgrens Barnsjukhus, Stiftelsen Sunnerdahls Handikappfond, Majblomman, Sällskapet Barnavård, and the Stockholm County Council. Sven Bölte was supported by the Swedish Research Council (grant no. 523-2009-7054).
