Abstract
This study reports the results of a methodological quality review of the single-case research literature on social skills interventions for students with challenging behavior. A systematic review of the social skills literature was conducted to update the Mathur et al. meta-analysis of social skills interventions. Twenty-four studies published between 1998 and 2014 were identified and coded for methodological quality. Findings indicated that more than half of the studies (13 of 24) failed to meet single-case design standards. Many studies did not sufficiently report reliability or implementation fidelity, or did not provide adequate opportunities to demonstrate an intervention effect. The three most common target behaviors across studies were noncompliance, negative verbal interactions, and class disruptions. The majority of studies were conducted in the early elementary grades. Results are discussed in the context of the need for greater methodological rigor in future single-case research on social skills instruction.
Challenging behaviors and social skills deficits are defining characteristics of students with or at-risk of Emotional and Behavioral Disorders (EBDs) as well as of individuals with Autism Spectrum Disorders (ASDs; Forness, Freeman, Paparella, Kauffman, & Walker, 2012; Walker, Ramsey, & Gresham, 2004). Students with or at-risk of EBD are often characterized as having externalizing and/or internalizing behavioral patterns that are linked to social skills deficits (Lane, Parks, Kalberg, & Carter, 2007; Walker et al., 2004). These deficits have been described as either skill-based or performance-based (Gresham, Sugai, & Horner, 2001). According to this model, a student with a skill deficit has not learned a given social skill, whereas a student with a performance deficit has learned the skill but chooses not to perform it (Gresham et al., 2001; Mathur & Rutherford, 1996).
As with students with or at-risk of EBD, social skill deficits are a defining characteristic of students with ASD, particularly those who are higher functioning (Wang, Cui, & Parrila, 2011). Although children with high-functioning autism, pervasive developmental disorder not otherwise specified, or Asperger syndrome may show fewer cognitive and language deficits than students with more severe forms of ASD, the development of social skills remains a major problem (Rao, Beidel, & Murray, 2008). Deficits in social interaction skills are especially problematic, including difficulties initiating interactions, maintaining reciprocity, understanding others' perspectives, and inferring meaning in social situations (Bellini, Peters, Benner, & Hopf, 2007).
Use of Single-Case Research (SCR) to Identify Evidence-Based Practices
The field of special education is placing greater emphasis on identifying evidence-based interventions (Shavelson & Towne, 2002). Evidence-based practices (EBPs) are defined as “practices and programs shown by high-quality research to have meaningful effect on student outcomes” (Cook & Odom, 2013, p. 136). Arguably, SCR has an important role in identifying EBPs, and several methodological quality indicators have been put forward regarding what constitutes high-quality SCR (Horner et al., 2005; Horner & Kratochwill, 2012; Kratochwill et al., 2010).
Methodological quality pertains to the methods of a research study as well as the safeguards implemented to reduce the likelihood of alternative explanations for observed outcomes (Shadish, Cook, & Campbell, 2002). Proposed quality indicators for SCR design and reporting include (a) operational definitions and descriptions of variables, (b) replication of effects, (c) fidelity of implementation, (d) reliability, and (e) social validity (Horner et al., 2005; Logan, Hickman, Harris, & Heriza, 2008; Tate et al., 2008). First, operational definitions require detailed reporting of study features, which allows commonalities and differences across studies to be identified. Clear operational definitions also make it easier for other researchers to replicate study effects (Wolery & Ezell, 1993). Clear descriptions and operational definitions should be provided for all aspects of a study, including (a) student populations, (b) independent variables, (c) procedures, (d) dependent variables, and (e) settings.
Second, within-study replication is an important quality indicator in SCR. Within-study replication is determined by the extent to which treatment effects are consistently observed across phases, participants, settings, and behaviors (Horner et al., 2005). Treatment effects are established when a desired change in the dependent variable coincides with the systematic manipulation of the independent variable or intervention. Third, fidelity of implementation is a key element of SCR methodology and refers to the consistency of intervention delivery. Data on the fidelity of intervention implementation should be collected to ensure that the treatment or intervention was carried out as planned. Measures of fidelity can also help validate treatment effects by documenting that a particular treatment protocol was adhered to during the course of a study (Horner et al., 2005).
Fourth, reliability of measurement should be assessed for each dependent variable by more than one observer. Inter-observer agreement (IOA) data must be collected for each participant and each dependent variable, and agreement must reach acceptable levels. Although many IOA indices are available, percent agreement and Cohen’s kappa are often used. Minimum acceptable values for percent agreement and Cohen’s kappa are .80 and .60, respectively (Hartmann, 1977). Fifth, social validity provides helpful information on the acceptability and appropriateness of an intervention (Horner et al., 2005). Social validity data are also useful in determining the feasibility of an intervention (Spear, Strickland-Cohen, Romer, & Albin, 2013).
Recently, the What Works Clearinghouse (WWC) developed a framework for evaluating single-case designs (Kratochwill et al., 2013). This framework classifies studies into three categories: Meets Design Standards, Meets Design Standards With Reservations, and Does Not Meet Design Standards. Under the WWC recommendations, a study meets basic SCR design standards if it (a) systematically manipulates an independent variable, (b) systematically measures each dependent variable over time with more than one observer, (c) uses a design that documents at least three attempts to demonstrate an intervention effect, and (d) includes a minimum of five data points per phase. A study that meets criteria (a) through (c) but has only three or four data points in a phase is rated Meets Design Standards With Reservations.
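The four criteria and the reservations rule can be read as a small decision procedure. The Python sketch below is a hypothetical illustration, not a WWC tool: the function name and its inputs are assumptions, the rating codes follow this article's rubric (0, 1, 2), and it assumes that a phase with fewer than three data points fails the standard.

```python
# Rating codes follow the rubric note: 0 = Does Not Meet, 1 = Meets With Reservations, 2 = Meets.
DOES_NOT_MEET, WITH_RESERVATIONS, MEETS = 0, 1, 2


def wwc_design_rating(iv_manipulated: bool, multiple_observers: bool,
                      effect_attempts: int, min_points_per_phase: int) -> int:
    """Hypothetical check of the four basic WWC single-case criteria."""
    # Criteria (a) to (c) are pass/fail: any failure means Does Not Meet.
    if not iv_manipulated or not multiple_observers or effect_attempts < 3:
        return DOES_NOT_MEET
    # Criterion (d): five or more points per phase meets; three or four meets with reservations.
    if min_points_per_phase >= 5:
        return MEETS
    if min_points_per_phase >= 3:
        return WITH_RESERVATIONS
    # Assumption: fewer than three points in any phase fails the standard.
    return DOES_NOT_MEET


# An ABAB design whose shortest phase has four data points.
print(wwc_design_rating(True, True, effect_attempts=3, min_points_per_phase=4))  # 1
```

Passing the shortest phase length, rather than every phase, keeps the sketch simple: one short phase is enough to lower the rating.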
Previous Reviews of the Social Skills Literature
Social skills interventions (SSIs) focus on teaching prosocial or alternative social behaviors and skills using nonaversive methods (Elliott & Gresham, 1993). There is a robust research base on SSIs, as evidenced by a number of narrative, quantitative, and meta-analytic reviews dating back to 1981 (Cappadocia & Weiss, 2011; Flynn & Healy, 2012; Gillis & Butler, 2007; Gresham, 1981, 1985; Gresham & MacMillan, 1997; Maag, 2006; Reichow & Volkmar, 2010; White, Koenig, & Scahill, 2007). Several literature reviews have concluded that SSIs can be effective in promoting the acquisition and performance of prosocial behaviors (Gresham, 1981, 1985; McIntosh, Vaughn, & Zaragoza, 1991). However, subsequent meta-analyses on SSIs have reported mixed findings (Ang & Hughes, 2001; Beelmann, Pfingsten, & Lösel, 1994; Cook et al., 2008; Quinn, Kavale, Mathur, Rutherford, & Forness, 1999; Schneider, 1992).
One SCR meta-analysis germane to the present study focused on SSIs for students with or at-risk of EBD and students with autism (Mathur, Kavale, Quinn, Forness, & Rutherford, 1998). Mathur et al. (1998) analyzed a total of 64 single-case studies and reported a mean percentage of nonoverlapping data (PND; Scruggs, Mastropieri, & Casto, 1987) of 62% (SD = 33%) across all studies. Their meta-analysis included 283 participants identified as having behavioral problems, including those with EBD and autism. Results indicated that participants at the elementary and secondary levels benefited more from SSIs than participants at the preschool level. In addition, larger effects were reported for SSIs promoting social interaction skills than for those fostering communication skills. The mean PND for studies that evaluated maintenance and generalization of social skills was 64%, and greater effects were reported for studies that included only students with autism.
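PND, the effect size summarized by Mathur et al. (1998), is the percentage of intervention-phase data points that exceed the most extreme baseline point: the highest baseline point when the target behavior is expected to increase, or the lowest when it is expected to decrease. The minimal Python sketch below assumes a single baseline-intervention contrast; the function name and the example data are hypothetical and are not drawn from any reviewed study.

```python
def pnd(baseline, intervention, increase=True):
    """Percentage of nonoverlapping data for one baseline-intervention contrast."""
    if not baseline or not intervention:
        raise ValueError("both phases need at least one data point")
    if increase:
        # Count intervention points above the highest baseline point.
        extreme = max(baseline)
        nonoverlapping = sum(1 for x in intervention if x > extreme)
    else:
        # Count intervention points below the lowest baseline point.
        extreme = min(baseline)
        nonoverlapping = sum(1 for x in intervention if x < extreme)
    return 100.0 * nonoverlapping / len(intervention)


# Hypothetical counts of positive peer initiations: 3 of 4 points exceed the baseline maximum of 4.
print(pnd([2, 3, 4], [3, 5, 6, 7]))  # 75.0
```

A mean PND such as the 62% reported above would be the average of values like this one across contrasts or studies.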
Purpose and Research Questions
The primary purpose of this systematic literature review was to evaluate the recent evidence base for SSIs using SCR design standards (Horner et al., 2005; Kratochwill et al., 2013). A rigorous analysis of single-case design methodology is needed to determine the quality of the SSI evidence base and whether SSIs qualify as an evidence-based practice. The present study also extended Mathur et al. (1998) by (a) providing an updated synthesis of the literature and (b) applying proposed quality standards to SCR studies on SSIs. Because students with or at-risk of EBD and students with ASD are often characterized as having social skills deficits and challenging behavior (Denning, 2007; Kauffman, Mock, & Simpson, 2007), the present review included both populations. SSI studies published from 1998 to 2014 were identified and evaluated for methodological quality according to SCR design standards. The following research questions were posed:
Method
Literature Search
A systematic review of the literature was conducted to identify SSI studies to be included and evaluated for methodological quality. Electronic searches of the following psychology and education databases were conducted to identify the initial pool of articles: PsycINFO, Educational Resources Information Center, Academic Search Complete, and Education Full Text. Two sets of search terms, one related to challenging behavior and one related to SSIs, were combined using the Boolean operator AND. The first set included the following terms: behavioral disorders, emotional disorders, seriously emotionally disturbed, disruptive behavior, social behavior problems, antisocial behavior, autism, OR conduct disorders. The second set included the following terms: social skills training, social skills instruction, OR social skills interventions.
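The two term sets combine as (first-set terms joined by OR) AND (second-set terms joined by OR). The minimal Python sketch below builds such a string; the function name is hypothetical, quoting multiword phrases is an assumption, and each database applies its own field tags and syntax, which are not shown here and are not reported in the article.

```python
def boolean_query(term_sets) -> str:
    """Join terms within a set with OR, then join the sets with AND."""
    groups = []
    for terms in term_sets:
        # Quote multiword phrases so they are searched as phrases (an assumption).
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


behavior_terms = ["behavioral disorders", "emotional disorders", "seriously emotionally disturbed",
                  "disruptive behavior", "social behavior problems", "antisocial behavior",
                  "autism", "conduct disorders"]
ssi_terms = ["social skills training", "social skills instruction", "social skills interventions"]

query = boolean_query([behavior_terms, ssi_terms])
print(query)
```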
Inclusion and Exclusion Criteria
The intent of this literature review was to evaluate the quality of the evidence for SSIs for students with or at-risk of EBD and students with ASD exhibiting challenging behavior. Studies were included in this literature review if (a) participants were educated in a school setting; (b) participants were described as students with or at-risk of EBD or identified with ASD; (c) participants were described as exhibiting challenging behavior in school settings; (d) the intervention implemented taught social skills related to school-based prosocial behaviors or positive social interactions; (e) outcome measures assessed school-related social skill behaviors as a primary dependent variable; (f) they used a single-case design methodology; and (g) they were written in English, conducted in the United States, and published in a peer-reviewed journal between 1998 and 2014. Because a secondary goal of this study was to update the Mathur et al. (1998) meta-analysis, the search was limited to articles published between 1998 and 2014. Dissertations and book chapters were excluded because the goal of this review was to draw conclusions from work that had been evaluated through the peer-review process.
The initial search yielded 1,067 articles. After 373 duplicates were removed, the titles and abstracts of the remaining 694 articles were screened to determine whether each should be read in its entirety. The reference lists of identified studies were then reviewed to find other articles that met inclusion criteria. In addition, the journals in which included articles were published were searched for articles from 2013 and 2014 that may not yet have been added to the electronic databases; these journals were Exceptional Children, Behavioral Disorders, Journal of Emotional and Behavioral Disorders, Journal of Autism and Developmental Disorders, Journal of Applied Behavior Analysis, Journal of Positive Behavior Interventions, and Remedial and Special Education. A total of 22 articles were identified through the electronic search and one through the extended search, resulting in 23 articles included in the present review (see Figure 1). However, 24 studies were analyzed because Blake, Wang, Cartledge, and Gardner (2000) reported two studies.
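The selection counts can be checked with simple arithmetic. The minimal Python sketch below uses the figures reported above; the function and key names are hypothetical.

```python
def selection_counts(hits, duplicates, from_databases, from_hand_search, extra_studies):
    """Tally the article-selection steps reported in the text."""
    screened = hits - duplicates  # titles and abstracts screened
    articles = from_databases + from_hand_search  # articles included in the review
    studies = articles + extra_studies  # an article reporting two studies adds one extra unit
    return {"screened": screened, "articles": articles, "studies": studies}


# Blake et al. (2000) reported two studies, contributing one extra study.
counts = selection_counts(1067, 373, from_databases=22, from_hand_search=1, extra_studies=1)
print(counts)  # {'screened': 694, 'articles': 23, 'studies': 24}
```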

Figure 1. Article selection flowchart.
Article Coding
Included articles were reviewed, and descriptive information was extracted for coding. Each article was coded for participant, setting, and study characteristics. Methodological design features were coded and used to determine the overall methodological quality of the study.
Participant and setting characteristics
Participants were coded on (a) age, (b) gender, (c) school level, (d) ethnicity, (e) disability, and (f) educational setting. Age was recorded in whole years, rounded down when studies reported age in years and months. Gender was coded dichotomously as male or female. School level had three levels: early elementary (pre-kindergarten–fourth grade), intermediate/middle (fifth–eighth grade), and secondary (ninth–12th grade). Ethnicity had five levels: White, Black, Hispanic, Asian, and Mixed/Other. Disability was coded as (a) identified with or at-risk of EBD or (b) identified with ASD. Educational setting had three levels: general education, special education, or both.
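Two of these coding rules, age rounding and the grade-to-school-level mapping, are mechanical enough to sketch. The minimal Python illustration below is not the authors' codebook; the function names and the "pK"/"K" grade labels are assumptions.

```python
def age_in_years(years: int, months: int = 0) -> int:
    """Record age in whole years, rounding down any leftover months."""
    return years + months // 12


def school_level(grade: str) -> str:
    """Map a grade label ("pK", "K", or "1" to "12") to the coded school level."""
    if grade in ("pK", "K"):
        return "early elementary"
    number = int(grade)
    if 1 <= number <= 4:
        return "early elementary"
    if 5 <= number <= 8:
        return "intermediate/middle"
    if 9 <= number <= 12:
        return "secondary"
    raise ValueError(f"unexpected grade label: {grade!r}")


print(age_in_years(9, 11), school_level("6"))  # 9 intermediate/middle
```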
Study and intervention characteristics
Study characteristics, including experimental design, intervention development, intervention implementation, and dependent measures, were coded.
Experimental design and intervention
The SCR design used in each study was recorded. Social skill interventions were categorized as teaching (a) positive social interactions, (b) prosocial classroom behaviors, or (c) mixed.
Intervention development
Intervention development was coded at two levels: individualized or general. Interventions that were created specifically for the target student and matched social skills instruction to that student's deficits were coded as individualized. All other interventions, in which packaged or scripted social skills curricula were implemented, were coded as general.
Intervention implementation
Intervention implementation was coded as alone or combined. Studies in which social skills training was the only intervention implemented were categorized as alone. Studies in which SSIs were combined with other strategies such as self-monitoring, cueing, group contingency, or other forms of reinforcement were categorized as combined.
Dependent measures
Dependent measures were coded at three levels: social interaction skills, social classroom behavior, or mixed. The first category, social interaction, described behaviors in which the participant engaged in physical or verbal interactions directed at peers or adults in school settings. The second category, classroom behavior, described behaviors the student displayed in response to a classroom rule, procedure, or task. Target behaviors categorized as classroom behavior included refusal to complete academic tasks, temper tantrums, and property destruction. Target behaviors categorized as social interaction included physical or verbal aggression, hypersensitivity to redirection, and refusal to play cooperatively with peers. Measures involving both social interaction and classroom behavior were categorized as mixed.
Dependent measure categories were predetermined and largely based on the Caldarella and Merrell (1997) taxonomy of social skill dimensions, which includes peer relations, self-management, academic, compliance, and assertion skills. Descriptive information from the studies also guided how behaviors were categorized. Behaviors relating to peer relations and assertion were categorized as social interaction skills. Academic and compliance behaviors were categorized as social classroom behavior skills. Self-management behaviors were categorized according to the context and description of the behavior. Likewise, compliance behaviors that directly involved a peer or adult were categorized as social interaction skills. For example, noncompliance in completing academic tasks was coded as a classroom behavior skill because the student was described as refusing to complete an assignment, which did not directly involve social interaction with a peer or adult. In a different study, however, noncompliance described as refusal to take turns during an activity was coded as social interaction because taking turns necessarily involves a peer. Table 1 organizes social skill dependent measures into social interaction skills and social classroom behavior skills. The codebook containing operational definitions for all coded variables can be obtained from the first author.
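The categorization rules above can be sketched as a small decision function. This Python illustration is hypothetical: the names are assumptions, and the contextual judgment applied to self-management and compliance behaviors is reduced to a single yes/no question (does the behavior directly involve a peer or adult?), which simplifies the coders' actual reading of each study.

```python
INTERACTION = "social interaction"
CLASSROOM = "social classroom behavior"


def classify_behavior(dimension: str, involves_peer_or_adult: bool) -> str:
    """Assign one target behavior, labeled by its Caldarella and Merrell (1997) dimension."""
    if dimension in ("peer relations", "assertion"):
        return INTERACTION
    if dimension == "academic":
        return CLASSROOM
    if dimension in ("compliance", "self-management"):
        # Assumption: context is reduced to whether a peer or adult is directly involved.
        return INTERACTION if involves_peer_or_adult else CLASSROOM
    raise ValueError(f"unknown dimension: {dimension!r}")


def study_measure(categories) -> str:
    """A study measuring behaviors from both categories is coded as mixed."""
    kinds = set(categories)
    if not kinds:
        raise ValueError("no target behaviors coded")
    return kinds.pop() if len(kinds) == 1 else "mixed"


# Refusing an assignment versus refusing to take turns with a peer.
task_refusal = classify_behavior("compliance", involves_peer_or_adult=False)
turn_refusal = classify_behavior("compliance", involves_peer_or_adult=True)
print(task_refusal, "|", turn_refusal, "|", study_measure([task_refusal, turn_refusal]))
# social classroom behavior | social interaction | mixed
```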
Table 1. Classification of Social Skills.
Quality of Evidence Evaluation
Quality of evidence evaluations and the application of the WWC SCR standards were modeled after Maggin, Chafouleas, Goddard, and Johnson (2011). Design standards were applied to each study. In addition, to help ensure design quality, the standards for experimental control and opportunities to demonstrate effect were applied to each graph in studies that included multiple graphs. Studies with more than one graph received a two-part rating reported as (S, G), where S is the overall quality rating of the study and G is the rating for an individual graph within the study. For example, a study using a multiple baseline across behaviors design with graphs for three participants would receive three ratings: (S, G1), (S, G2), and (S, G3), where GX is the rating for each participant’s graph. The two-part rating was intended to identify inconsistencies in experimental control and opportunities to demonstrate effect between the study as a whole and the individual graphs within it. Studies that included only one graph received a single rating.
Coding for design standards
The methodological quality rubric focused on research design and methods. The following standards were assessed: (a) systematic manipulation of the independent variable, (b) IOA, (c) fidelity of implementation, (d) opportunities to demonstrate effect, and (e) phase length. Each standard assessed was coded as Meets Design Standards, Meets Design Standards With Reservations, or Does Not Meet Design Standards (see Table 2).
Table 2. Methodological Quality Rubric Design Standards, Definitions, and Codes.
Note. 0 = Does Not Meet Design Standards; 1 = Meets Design Standards With Reservations; 2 = Meets Design Standards; IOA = inter-observer agreement.
Systematic manipulation of the independent variable
The independent variable must be systematically manipulated to be coded Meets Design Standards. The researcher must determine when and how the SSI was implemented. If this was not done intentionally, this standard was not met, and the study was coded Does Not Meet Design Standards.
IOA
Each dependent variable must be measured repeatedly over time by more than one observer. Agreement data should be collected on at least 20% of data points overall and on at least 20% of data points in each condition, setting, or phase. In addition, reported agreement must meet the minimum thresholds for agreement indices: .80 for percent agreement and .60 for Cohen’s kappa.
The WWC standard for IOA was divided into three parts: (a) collection of IOA, (b) percentage of data assessed for reliability in each condition, and (c) meeting minimum thresholds for agreement indices. If a study reported reliability at or above .80 for percent agreement and/or .60 for Cohen’s kappa on 20% of data overall and across all conditions, it received a rating of Meets Design Standards. If a study reported reliability for 20% of the data overall but did not indicate whether that 20% included data from each condition, it received a rating of Meets Design Standards With Reservations. If a study reported reliability below the minimum threshold for percent agreement or Cohen’s kappa, it received a rating of Does Not Meet Design Standards.
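The three-part IOA rule can be sketched as follows. This is a hypothetical Python illustration, not the authors' coding sheet; the function name and parameters are assumptions. Two cases the rubric text does not spell out, no IOA reported at all and less than 20% of data checked, are assumed here to fail the standard.

```python
from typing import Optional

DOES_NOT_MEET, WITH_RESERVATIONS, MEETS = 0, 1, 2


def ioa_rating(pct_data_checked: Optional[float], each_condition_reported: bool,
               agreement: Optional[float] = None, kappa: Optional[float] = None) -> int:
    """Hypothetical three-part IOA rating: collection, coverage, and thresholds."""
    # Part (a): no IOA reported (assumed to fail the standard).
    if pct_data_checked is None or (agreement is None and kappa is None):
        return DOES_NOT_MEET
    # Part (c): any reported index below its minimum threshold fails.
    if (agreement is not None and agreement < 0.80) or (kappa is not None and kappa < 0.60):
        return DOES_NOT_MEET
    # Part (b): less than 20% of data checked (assumed to fail).
    if pct_data_checked < 20:
        return DOES_NOT_MEET
    # 20% overall: Meets only if coverage of each condition is also documented.
    return MEETS if each_condition_reported else WITH_RESERVATIONS


print(ioa_rating(25, each_condition_reported=False, agreement=0.92))  # 1
```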
Fidelity of implementation
A standard on fidelity was added to the adapted quality rubric. Fidelity of implementation criteria mirrored the design standards for IOA, requiring studies to collect fidelity data for at least 20% of data in all conditions, with accurate implementation at or above 80%. To receive a rating of Meets Design Standards, a study had to collect and report fidelity measures for at least 20% of data, with accuracy at or above 80%. If a study collected only informal measures of fidelity, it received a rating of Meets Design Standards With Reservations. If no measures of fidelity were reported, the study was coded Does Not Meet Design Standards.
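The fidelity standard reduces to three cases. The Python sketch below is hypothetical: the function name and the 'formal'/'informal'/'none' labels are assumptions, and because the rubric does not say how formal fidelity data below either threshold were rated, the sketch assumes Does Not Meet, mirroring the IOA rule.

```python
DOES_NOT_MEET, WITH_RESERVATIONS, MEETS = 0, 1, 2


def fidelity_rating(measure: str, pct_data_checked: float = 0.0, accuracy_pct: float = 0.0) -> int:
    """measure is 'formal', 'informal', or 'none' (hypothetical labels)."""
    if measure == "formal":
        # Formal data must cover at least 20% of data with accuracy at or above 80%.
        if pct_data_checked >= 20 and accuracy_pct >= 80:
            return MEETS
        # Assumption: formal data below either threshold fails, mirroring the IOA rule.
        return DOES_NOT_MEET
    if measure == "informal":
        return WITH_RESERVATIONS
    if measure == "none":
        return DOES_NOT_MEET
    raise ValueError(f"unknown measure type: {measure!r}")


print(fidelity_rating("formal", pct_data_checked=30, accuracy_pct=94))  # 2
```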
Opportunities to demonstrate effect
Opportunities to demonstrate an effect were assessed by the number of adjacent phase changes within a study. To receive a rating of Meets Design Standards, a study had to include at least three attempts to demonstrate an intervention effect at three different points in time, and each attempt had to occur between adjacent phases. If this standard was not met, the study was coded Does Not Meet Design Standards. Examples of designs meeting this standard include ABAB reversal designs, multiple baseline designs (MBDs) with at least three baseline conditions, alternating treatment designs with at least three alternating treatments compared with a baseline condition or two alternating treatments compared with each other, changing criterion designs with at least three different criteria, and more complex variants of these designs. Examples of designs not meeting this standard include basic two-phase AB designs and some variations of reversal designs (e.g., ABA and BAB).
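For reversal and multiple baseline designs, counting attempts is mechanical. The Python sketch below is a hypothetical illustration that covers only these two families (alternating treatment and changing criterion designs are not handled) and assumes each letter in a phase string is one phase.

```python
def reversal_attempts(phases: str) -> int:
    """Count adjacent phase changes in a reversal sequence such as "ABAB"."""
    return sum(1 for before, after in zip(phases, phases[1:]) if before != after)


def multiple_baseline_attempts(n_tiers: int) -> int:
    """Each staggered tier (participant, setting, or behavior) is one attempt."""
    return n_tiers


def meets_effect_standard(attempts: int) -> bool:
    """At least three attempts at three different points in time are required."""
    return attempts >= 3


for design in ("AB", "ABA", "BAB", "ABAB"):
    print(design, reversal_attempts(design), meets_effect_standard(reversal_attempts(design)))
```

Run as written, the loop reports 1, 2, 2, and 3 attempts, so only the ABAB design meets the standard, matching the examples in the text.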
Phase length
Opportunities to demonstrate an effect within a phase were determined by the number of data points in that phase, or phase length. For a phase to qualify as an attempt to demonstrate an effect, it needed at least three data points to be rated Meets Design Standards With Reservations and five or more data points to be rated Meets Design Standards.
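A minimal Python sketch of the phase-length rule follows. The function name is hypothetical, and rating a design by its shortest phase is an assumption about how individual phases combine into one rating.

```python
DOES_NOT_MEET, WITH_RESERVATIONS, MEETS = 0, 1, 2


def phase_length_rating(points_per_phase) -> int:
    """Rate a design by its shortest phase (an assumption about how phases combine)."""
    shortest = min(points_per_phase)
    if shortest >= 5:
        return MEETS
    if shortest >= 3:
        return WITH_RESERVATIONS
    return DOES_NOT_MEET


# ABAB design with 5, 4, 6, and 5 data points per phase.
print(phase_length_rating([5, 4, 6, 5]))  # 1
```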
Overall quality ratings
Overall quality ratings were coded as Meets Design Standards, Meets Design Standards With Reservations, or Does Not Meet Design Standards. To receive an overall quality rating of Meets Design Standards, a study had to be coded Meets Design Standards on every item in the quality rubric. If any item was coded Does Not Meet Design Standards, the overall rating was Does Not Meet Design Standards; otherwise, if any item was coded Meets Design Standards With Reservations, the overall rating was Meets Design Standards With Reservations.
The same logic was applied to studies receiving two-part ratings. For a study with multiple graphs to receive an overall rating of Meets Design Standards, all of its ratings had to be coded Meets Design Standards. If any graph within a study received a rating of Does Not Meet Design Standards, the study's overall rating was Does Not Meet Design Standards; otherwise, if any graph received a rating of Meets Design Standards With Reservations, the study's overall rating was Meets Design Standards With Reservations.
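Because the rubric codes are ordered (0 = Does Not Meet, 1 = With Reservations, 2 = Meets), the "weakest rating wins" rule is simply a minimum. The Python sketch below is a hypothetical illustration; the function names are assumptions.

```python
DOES_NOT_MEET, WITH_RESERVATIONS, MEETS = 0, 1, 2


def overall_rating(item_ratings) -> int:
    """The weakest item decides: any 0 gives 0, otherwise any 1 gives 1, otherwise 2."""
    return min(item_ratings)


def study_overall(study_items, graph_ratings) -> int:
    """Two-part (S, G) ratings: the study rating and every graph rating must all meet."""
    return min([overall_rating(study_items), *graph_ratings])


# Five rubric items all meet, but one of three graphs meets only with reservations.
print(study_overall([2, 2, 2, 2, 2], [2, 1, 2]))  # 1
```

Encoding ratings as ordered integers is a design choice that makes the precedence rule explicit; with string labels, the same logic would need an explicit ranking.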
Reliability
Reliability estimates were collected for (a) article selection, (b) descriptive coding of studies, and (c) the application of the methodological quality rubric. The first author independently coded each of the 24 included studies. A co-author served as an additional reviewer and coded a random sample of included studies to check reliability. Following the recommendations of Hartmann, Barrios, and Wood (2004), 20% (n = 5) of included studies were checked for reliability by the additional reviewer. Matching results from the two coders were coded as agreements, and mismatched results were coded as disagreements. Disagreements were resolved by discussing discrepancies until agreement was reached.
Simple percent agreement and Cohen’s kappa, a more conservative measure of reliability that adjusts for chance agreement (Ary & Suen, 1989), were calculated for each area of reliability. Simple percent agreement was calculated by dividing the number of agreements by the total number of agreements plus disagreements and multiplying by 100. Cohen’s kappa was calculated using the VassarStats website (Lowry, 2016). Percent agreement above 80% and Cohen’s kappa values above .60 were considered acceptable (Kratochwill et al., 2013).
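The authors computed kappa with VassarStats; the Python sketch below uses the standard two-coder formulas instead and is not their tool. Kappa is (observed agreement minus chance agreement) divided by (1 minus chance agreement), where chance agreement sums the products of the two coders' marginal proportions. The function names and the include/exclude data are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b) -> float:
    """Agreements / (agreements + disagreements) x 100."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must rate the same non-empty set of items")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * agreements / len(coder_a)


def cohens_kappa(coder_a, coder_b) -> float:
    """Cohen's kappa for two coders assigning nominal codes."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b) / 100.0
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal proportions, summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical include/exclude decisions on ten articles.
coder_a = ["include"] * 2 + ["exclude"] * 8
coder_b = ["include"] * 3 + ["exclude"] * 7
print(percent_agreement(coder_a, coder_b), round(cohens_kappa(coder_a, coder_b), 2))  # 90.0 0.74
```

In this example 90% raw agreement yields a kappa of only .74, because the coders' shared preference for "exclude" already produces 62% agreement by chance.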
Article selection
To check the reliability of article selection, an additional reviewer with expertise in the systematic literature review process applied the inclusion and exclusion criteria to 20% (n = 55) of eligible articles (n = 267). Each of the 55 articles was categorized as include or exclude. Initial percent agreement and Cohen’s kappa for article selection were 90% and .81, respectively. Disagreements most commonly concerned the inclusion criterion on intervention type. Further explanation and examples of interventions using social skills to teach school-based prosocial behaviors or positive social interactions were discussed until 100% agreement was reached.
Descriptive coding
Articles included in the review were also checked for reliability of descriptive coding. More than 20% (n = 6) of the included studies were checked for descriptive coding reliability. Each study was coded by an additional coder who was trained in single-case design methodology and blind to the initial coding results. Reliability was assessed on 15 descriptive items for each participant in a study; for example, a study with three participants provided 45 opportunities for agreement (15 items × 3 participants). Initial percent agreement was 87%. Because most disagreements occurred in the coding of dependent measures, Cohen’s kappa was calculated for that area and was .70. The most common disagreements concerned behaviors described as noncompliance. To resolve differences, the reviewers discussed the outcome behaviors and referred back to the codebook containing operational definitions for all coded variables. All disagreements were discussed until 100% agreement was reached.
Methodological quality coding
The same five articles randomly selected for descriptive coding were checked for reliability on the application of the quality design rubric. Each graph within a study was assessed for opportunities to demonstrate effect and phase length. Initial percent agreement for graphs was 95%. At the study level, percent agreement on overall ratings of methodological quality was 100%, and Cohen’s kappa was 1.00.
Results
Participant and Study Characteristics
Participants and setting
A total of 75 participants were included across the 24 studies examined in this systematic literature review (see Table 3). Studies were published between 1998 and 2014. The majority of the participants were male (89%, n = 68). Although ethnicity was not reported for 23 participants (31%), Black (33%, n = 25) and White (31%, n = 23) were the two ethnic groups with the greatest representation. All studies involved students with behavioral difficulties: 15 participants (20%) were at-risk for EBD, 29 (39%) were identified with EBD, and 31 (41%) were identified with ASD. The largest share of participants (47%, n = 35) were educated in special education settings, which included specialized schools for students with disabilities. Twenty-six participants (35%) were educated in general education settings. Only 18% (n = 14) of participants were educated in both special education and general education settings (see Table 3).
Table 3. Participant and Study Characteristics.
Note. EBD = emotional and behavioral disorder; ASD = autism spectrum disorder.
Number of studies.
Experimental design and intervention
Because one article included two studies, a total of 24 studies were evaluated. MBDs were the most commonly used experimental design (67%, n = 16), followed by AB or reversal designs (16.7%, n = 4) and mixed designs (16.7%, n = 4). The four mixed designs included an MBD across subjects with randomization of intervention implementation (Bardon, Dona, & Symons, 2008), an MBD across behaviors with two treatments (Blake et al., 2000, Study 1), a combined ABAB and MBD across behaviors (Hagopian, Kuhn, & Strother, 2009), and an MBD across settings and behaviors with reversals (Herring & Northup, 1998).
Classroom behavior was the most common dependent measure (46%, n = 11). Social interaction was the dependent measure in 29% (n = 7) of studies, and 25% (n = 6) of studies measured both classroom behavior and social interaction skills. In 62.5% (n = 15) of studies, the intervention implemented was individualized to the student. Most studies implemented SSIs alone (62.5%, n = 15) rather than combining them with other behavioral strategies (see Table 4). Table 5 lists the SSIs implemented.
Table 4. Study Characteristics.
Note. pK = pre-kindergarten; EBD = emotional and behavioral disorder; ASD = autism spectrum disorder; MBD = multiple baseline design; NP = not provided.
Table 5. Social Skills Interventions.
Note. PATHS = Promoting Alternative Thinking Strategies; SODA = Stop-Observe-Deliberate-Act; SDLMI = Self-Determined Learning Model of Instruction.
Methodological Quality
Overall ratings
Each study, as well as each graph within each study, was assessed with the quality rubric and given an overall rating of methodological quality. Twenty-four studies and 43 graphs were evaluated (see Table 6).
Table 6. Design Standards Ratings.
Note. 0 = Does Not Meet Design Standards; 1 = Meets Design Standards With Reservations; 2 = Meets Design Standards; S = rating by study; G = rating by graphs.
Study ratings
Twenty-four studies were assessed for methodological quality. Only one study received a rating of Meets Design Standards (Campbell & Tincani, 2011). About 42% (n = 10) of the studies evaluated received a rating of Meets Design Standards With Reservations. The remaining 54% (n = 13) of studies Did Not Meet Design Standards (see Table 7).
Table 7. Methodological Quality Results.
Note. IV = internal validity; IOA = inter-observer agreement; Y = yes; N = no; F = formal; I = informal. IOA-A = 20% of data; IOA-B = 20% of data within each condition; IOA-C = minimum thresholds; Fidelity B = 20% of data; Fidelity C = minimum thresholds.
Individual graph ratings
The areas of opportunities to demonstrate effect and phase length from
Individual Design Standard Ratings
The quality rubric assessed five standards: (a) systematic manipulation of the independent variable, (b) IOA, (c) fidelity of implementation, (d) opportunities to demonstrate effect, and (e) phase length. Failure to meet all design standards for IOA was the primary reason studies received a rating of Meets Design Standards With Reservations (71%, n = 17), followed by phase length (58%, n = 14). Nine studies did not include at least three attempts to demonstrate intervention effects at three different points in time.
Systematic manipulation of the independent variable
All of the 24 included studies systematically introduced the SSIs. One study (Bardon et al., 2008) used a randomization technique to determine when the intervention would be implemented with each participant.
IOA
The majority of studies (96%, n = 23) reported IOA on at least 20% of the data overall, with agreement at or above 80% for percent agreement or .60 for Cohen’s kappa. Only one study (Keeling, Smith, Myles, Gagnon, & Simpson, 2003) failed to provide any information on reliability. However, more than 70% (n = 17) of studies did not specify whether IOA was collected during each condition.
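The two agreement indices named above are computed differently: percent agreement counts matching intervals, whereas Cohen's kappa corrects for chance agreement, κ = (p_o − p_e) / (1 − p_e). The Python sketch below illustrates both, together with the minimal thresholds and a per-condition coverage check corresponding to the "20% of data within each condition" criterion. The observer records and all function names are hypothetical, and interval-by-interval scoring is an assumption; individual studies may have used other IOA methods.

```python
from collections import Counter


def percent_agreement(obs_a, obs_b):
    """Interval-by-interval agreement between two observers, as a percentage."""
    if not obs_a or len(obs_a) != len(obs_b):
        raise ValueError("records must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100 * agreements / len(obs_a)


def cohens_kappa(obs_a, obs_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_e is chance agreement
    computed from each observer's marginal category frequencies."""
    p_o = percent_agreement(obs_a, obs_b) / 100
    n = len(obs_a)
    count_a, count_b = Counter(obs_a), Counter(obs_b)
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when both observers use one category")
    return (p_o - p_e) / (1 - p_e)


def meets_ioa_threshold(agreement_pct=None, kappa=None):
    """Minimal thresholds described in the text: >= 80% agreement or kappa >= .60."""
    return (agreement_pct is not None and agreement_pct >= 80) or (
        kappa is not None and kappa >= 0.60
    )


def ioa_in_each_condition(sessions, ioa_sessions, min_share=0.20):
    """Hypothetical check: IOA collected on >= 20% of sessions in every condition.
    Both arguments map condition name -> number of sessions; a condition with
    zero sessions fails the check."""
    return all(
        total > 0 and ioa_sessions.get(cond, 0) / total >= min_share
        for cond, total in sessions.items()
    )


# 1 = target behavior scored in the interval, 0 = not scored.
observer_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
observer_b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
agreement = percent_agreement(observer_a, observer_b)  # 80.0
kappa = cohens_kappa(observer_a, observer_b)  # about 0.52
```

In this made-up example the observers reach the 80% agreement threshold while kappa falls below .60, which illustrates why the two indices are judged against different cutoffs.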
Fidelity of implementation
Formal and informal fidelity of implementation data were reported for 66% (n = 16) and 8% (n = 2) of studies, respectively. Six studies (25%) did not report any data on fidelity of implementation.
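Formal fidelity data are often summarized as the percentage of planned intervention steps delivered as intended. The Python sketch below shows that calculation and a check analogous to the Fidelity B (20% of data) and Fidelity C (minimal threshold) criteria. The 80% threshold, the checklist format, and the function names are assumptions made for illustration; this section does not state the exact threshold the rubric used.

```python
def fidelity_percent(checklist):
    """Percentage of planned intervention steps delivered as intended in one
    observed session; checklist holds one True/False entry per planned step."""
    if not checklist:
        raise ValueError("checklist must contain at least one step")
    return 100 * sum(checklist) / len(checklist)


def meets_fidelity_standard(observed_checklists, n_intervention_sessions,
                            min_share=0.20, min_percent=80.0):
    """Hypothetical check: fidelity observed in >= 20% of intervention sessions
    and mean fidelity at or above an assumed 80% threshold."""
    if not observed_checklists or n_intervention_sessions <= 0:
        return False
    share = len(observed_checklists) / n_intervention_sessions
    mean_pct = sum(fidelity_percent(c) for c in observed_checklists) / len(observed_checklists)
    return share >= min_share and mean_pct >= min_percent


# Three of twelve intervention sessions observed with a ten-step checklist.
observed = [[True] * 10, [True] * 9 + [False], [True] * 8 + [False] * 2]
print(meets_fidelity_standard(observed, 12))  # True (25% of sessions, mean 90%)
```

A study that reports no fidelity data at all corresponds to an empty list here, which fails the check outright.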
Opportunities to demonstrate effect and phase length
Of the 24 studies evaluated, 62.5% (n = 15) included sufficient opportunities to demonstrate intervention effects at three or more different points in time, and 42% (n = 10) used designs that included at least five data points per phase.
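The two experimental-control standards can be expressed as simple counting rules. The Python sketch below is a WWC-style approximation, not the rubric itself: the five-point criterion and the three-attempt criterion come from the text, whereas the three-to-four-point band for Meets With Reservations follows the WWC pilot standards as commonly summarized and is an assumption here. The function names are hypothetical.

```python
def phase_length_rating(points_per_phase):
    """WWC-style phase length rule (approximation): every phase with >= 5 data
    points -> 2 (Meets); every phase with >= 3 -> 1 (With Reservations); else 0."""
    shortest = min(points_per_phase)
    if shortest >= 5:
        return 2
    if shortest >= 3:
        return 1
    return 0


def reversal_attempts(phases):
    """Attempts to demonstrate an effect in a reversal design = number of phase
    changes, e.g. 'ABAB' (A = baseline, B = intervention) has 3."""
    return sum(1 for prev, cur in zip(phases, phases[1:]) if prev != cur)


def multiple_baseline_attempts(n_tiers):
    """Each tier introduces the intervention at a different point in time."""
    return n_tiers


def enough_opportunities(attempts, minimum=3):
    """At least three attempts at three different points in time."""
    return attempts >= minimum


print(enough_opportunities(reversal_attempts("ABAB")))      # True
print(enough_opportunities(reversal_attempts("AB")))        # False
print(enough_opportunities(multiple_baseline_attempts(2)))  # False
print(phase_length_rating([5, 4, 6, 5]))                    # 1
```

Under these rules, an AB design or a two-tier multiple baseline cannot provide three demonstrations of effect, however many data points each phase contains.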
Discussion
The purpose of this systematic literature review and quality evaluation was to update and evaluate the evidence base of SSIs for students with or at-risk of EBD and students with ASD. Results from the quality evaluation provided information on the methodological rigor within the existing social skills literature. Because standards for determining the quality of SCR studies are now available (e.g., Kratochwill et al., 2010), the field of special education has moved toward evaluating the methodological quality of the studies being conducted. Although some may have concerns about applying recently developed criteria to studies retrospectively, applying design standards to the social skills literature provides a better understanding of the methodological quality of the research historically conducted in this area (Kratochwill et al., 2013). Moreover, applying these design standards provides important information for moving the field of SSIs forward by identifying problematic design issues, which, in turn, will hopefully improve the quality and reproducibility of research conducted in this area. Three research questions were investigated in this study; findings for each are discussed below.
Major Findings
The first question focused on the methodological quality of single-case studies researching SSIs for students with or at-risk for EBD and students with ASD who exhibit challenging behavior. Applying the quality rubric to each of the 24 included studies indicated that the methodological quality of the evidence base in this area is not ideal but holds some potential. More than half of the identified studies (54%, n = 13) failed to meet minimum design standards with or without reservations. However, 10 studies met standards with reservations, and one study met standards without reservations. This evaluation identified three areas of methodological weakness: reliability, fidelity of implementation, and experimental control.
First, although 96% (n = 23) of the studies evaluated included acceptable data on reliability, only 29% (n = 7) specified conducting IOA across all participants and phases. Adequate collection and reporting of IOA increases confidence in the reliability of the effects reported in the research literature. Fifteen of the included studies reported positive intervention effects but did not include adequate IOA measures. Insufficient IOA lowers the overall quality of data collection and, in the present review, decreases confidence in the results of these 15 studies (e.g., Bardon et al., 2008; Blake et al., 2000; Chan & O’Reilly, 2008; Schneider & Goldstein, 2010).
Second, six studies did not include information on fidelity of implementation (Blood et al., 2011; Hagiwara & Myles, 1999; Hagopian et al., 2009; Herring & Northup, 1998; Keeling et al., 2003; Miller et al., 2011). The quality rubric used in this review was based on the WWC design standards but was modified to include criteria on fidelity of implementation. Adding these criteria lowered the overall ratings of the evidence for SSIs: three studies rated as Does Not Meet Standards would have been rated Meets Standards With Reservations had fidelity of implementation not been considered (Hagiwara & Myles, 1999; Hagopian et al., 2009; Miller et al., 2011).
Although fidelity of implementation is not included in the WWC standards, reporting fidelity data documents whether interventions were delivered as intended. Fidelity of implementation data are a key element in the description of intervention procedures (Horner et al., 2005). A lack of fidelity measures not only limits confidence in treatment efficacy but also hinders the ability of future researchers to replicate effects. Given that replication is essential to determining EBPs, the absence of fidelity measures is a serious problem. Furthermore, without a measure of treatment fidelity, it is unknown whether an SSI was ineffective because the strategy itself was ineffective or because it was poorly implemented.
The third area of concern, experimental control, is foundational in SCR (Horner et al., 2005). The design standards for opportunities to demonstrate effect and phase length evaluated the adequacy of experimental control. Adequate experimental control is what allows researchers to establish a functional relationship between the independent and dependent variables. Of the studies evaluated, 63% (n = 15) included three or more opportunities to demonstrate an intervention effect at three different points in time. However, the remaining nine studies (37%) reported positive effects without sufficient experimental control to support their findings (Bardon et al., 2008; Blake et al., 2000; Blood et al., 2011; Hansen & Lignugaris-Kraft, 2005; Herring & Northup, 1998; Hune & Nelson, 2002; Kelly & Shogren, 2014; Kuoch & Mirenda, 2003; Schneider & Goldstein, 2010). Additional single-case studies with proper methodological rigor related to experimental control are needed to strengthen the evidence of SSIs for students with or at-risk of EBD and students with ASD who display challenging behavior.
The second question focused on identifying the most common school-related behaviors targeted for improvement through SSIs. Results were organized into two categories. The first category, social interaction, described behaviors in which the participant engaged in physical or verbal interactions with peers or adults in school settings. This included behaviors such as physical or verbal aggression and difficulties interacting with peers or adults. The second category, classroom behavior, was used to describe all other behaviors that did not explicitly focus on social interactions with peers or adults. Target behaviors categorized as classroom behavior included behaviors such as noncompliance, temper tantrums, and property destruction. The three most common behaviors across all studies in both categories were noncompliance, negative verbal interactions, and class disruptions.
Within the classroom behavior category, off-task behavior, class disruptions, and noncompliance were the three most commonly targeted behaviors. Within the social interaction category, physical or verbal aggression and negative verbal interactions were the most commonly targeted behaviors. The majority of participants with target behaviors in the social interaction category were students with or at-risk for EBD. Conversely, the majority of participants with target behaviors in the classroom behavior category were students with ASD. Social stories and video modeling were often used to teach students with ASD social skills related to appropriate classroom behavior. Perhaps one reason studies including students with or at-risk of EBD primarily focused on social interaction skills is that verbal and physical aggression, which is more characteristic of this population, is more visible than the social interaction deficits of students with ASD, such as off-topic comments during conversations.
The third area of inquiry focused on identifying for whom and under what conditions SSI research has been conducted. Information was collected on the percentages of included studies that (a) were conducted in preschool, elementary, and secondary settings; (b) researched the effects of SSIs for students with or at-risk of EBD; (c) researched the effects of SSIs for students with ASD; (d) individualized treatment to the social skill deficits of the student; and (e) implemented SSIs alone versus combining social skills instruction with other behavioral strategies. Results for this third area showed that the majority of studies were conducted in the early grades, from prekindergarten through fourth grade. Consistent with previous research, findings indicated a lack of research on SSIs for students in intermediate/middle and secondary settings (Maag, 2006). The number of studies including participants with ASD equaled the number including participants with or at-risk for EBD. However, at the participant level, more research was conducted with participants with or at-risk of EBD (n = 44) than with students with ASD (n = 31).
Similarly, 15 studies individualized the SSI to the student; however, at the participant level the numbers were almost even (individualized, n = 37; general, n = 38). Individualized interventions appeared primarily in studies including students with ASD. In contrast, general SSIs were implemented with groups of students with or at-risk for EBD. This is most likely because students with or at-risk for EBD are often separated from the general learning environment and grouped with other students displaying similar behaviors. In addition, the majority of studies implemented SSIs alone (62.5%, n = 15) rather than combining the intervention with other behavioral strategies. In the remaining nine studies, SSIs were combined with one of the following strategies: visual reminders, peer training, self-management, reinforcement, corrective feedback, and group contingency.
Limitations and Future Research
Several limitations should be considered when interpreting the findings of this review. First, although efforts were made to identify all studies meeting inclusion criteria, some suitable studies may not have been identified because inclusion was limited to peer-reviewed studies published between 1998 and 2014. It is possible that evaluating the complete body of research related to SSIs could have affected results. In addition, limiting the search to peer-reviewed literature can exclude relevant studies that were never published. Second, the majority of included studies were conducted in pre-kindergarten through fourth-grade settings. Therefore, care should be taken when interpreting results for students in intermediate and secondary settings. Third, the specific focus of this study was on (a) students with or at-risk of EBD, (b) students with ASD, and (c) outcome measures for the remediation of social interaction or classroom behaviors only. These conditions limit the generalization of findings to studies with similar participant and study characteristics.
Fourth, results presented on SCR design standards pertain only to the methodological quality of the included SSI studies and not to intervention effects. Many reviews apply criteria for demonstrating evidence of a relationship between an independent and dependent variable to studies that meet standards (with or without reservations). Because the focus of this study was to evaluate the methodological quality of the social skills literature, studies were not evaluated for evidence of effect. Therefore, results of this review relate only to the methodological quality of the SSI literature base and do not address whether SSIs are effective for students with or at-risk of EBD or students with ASD who exhibit challenging behavior. Future research should examine SSI effectiveness for these populations.
Results of this systematic literature review on the quality of SCR for SSIs suggest that future research be conducted with greater methodological rigor, particularly in the areas of IOA, fidelity of implementation, and experimental control. If SCR studies are to be used to identify EBPs, then the credibility of their results is directly linked to the methodological quality of the study design (Cook, Tankersley, & Landrum, 2009). Future research should adhere, at a minimum, to the guidelines set by WWC (Kratochwill et al., 2010). Methodological quality at the study, design, or participant level may need to be assessed to accurately capture features of research design. Additional research on the effectiveness of SSIs from studies meeting SCR design standards is also needed to determine whether SSIs qualify as an EBP.
Since the development of SCR design standards, areas of the literature that were assumed to be evidence-based (e.g., SSIs) have been revealed to contain methodologically flawed research (Chard, Ketterlin-Geller, Baker, Doabler, & Apichatabutra, 2009; Maggin et al., 2011). Although all studies have methodological limitations, flawed research that does not meet basic design standards is of concern. The concern regarding methodological quality is shared not only in special education but also across the scientific community (Open Science Collaboration, 2015). High-quality studies that are reproducible are an important component of establishing the evidence base for a particular practice. Reproducibility is especially important in SCR, which depends on across-study or systematic replication to establish external validity (Horner et al., 2005). If replications are based on poor-quality research, then the evidence base is undermined, and many practices may be put forward as promising or evidence-based when, in fact, the evidence supporting those practices is questionable.
This brings to light the role of the peer-review process in supporting the publication of high-quality research. Although evidence reviews are important for developing recommendations for practice, they may also provide additional guidance for peer reviewers and editorial boards regarding the level of research accepted for publication in peer-reviewed journals. Reviewers have a responsibility to voice concerns about studies that do not meet basic design criteria and are methodologically flawed to the point that their results are questionable. Studies should be scrutinized on their level of methodological quality, and the current SCR methodological standards put forward by WWC are arguably a reasonable way of applying “minimum” quality standards.
The proposed WWC design standards provide a solid framework for assessing methodological rigor in SCR. Additional areas that should be considered are fidelity of implementation and social validity. Fidelity is an important factor in increasing confidence that the observed effects were due to the intervention. Previous research as well as the current study provides compelling support for including fidelity measures as part of the WWC design standards (Kratochwill et al., 2013; Wolery, 2013). Measuring fidelity is also regarded as “best practice” (Keller-Margulis, 2012). Replication is essential to the identification of an EBP (Horner et al., 2005), and fidelity of implementation is a key variable in replicating intervention effects. Future research should measure fidelity and consider including this area as a basic standard for SCR design.
Another area that may warrant consideration for inclusion as a design standard is social validity. Social validity was not included as part of the quality rubric used in this study because it speaks more to intervention effectiveness than to methodology. However, measures of social validity would add an additional level of rigor to the current SCR design standards. Two major barriers to the implementation of EBPs in school settings are the lack of time and inadequate support from administrators (Kratochwill & Shernoff, 2004). Future research on the social validity of SSIs, including feasibility of application, can further advance this area of research. For example, surveys or post-intervention interviews on the real-world application of interventions would provide meaningful data on barriers to implementation and success. Measures of social validity are vital if the ultimate goal is to transfer EBPs into practice. Finally, future studies are needed to extend the research on SSIs to other populations of individuals with disabilities (e.g., adults with disabilities), to a variety of settings (e.g., naturalistic, home, or employment settings), and to other behaviors of interest (e.g., problem solving, safety skills, or social competence).
In conclusion, this review found that SSI research using SCR contained several methodological problems. Although SSIs have historically been thought of as an EBP, Gresham (1998) asked whether we should “raze, remodel, or rebuild” the social skills literature when pointing out the historically weak effects found in meta-analyses of social skills training. In paving the way forward, we offer that an EBP should be based on the best available evidence. However, there should come a time in a line of research when the studies conducted reach sufficient maturity, in terms of both quality and quantity, to conclude that a practice is evidence-based. It appears that the SSI literature needs more methodologically rigorous studies before SSIs can be identified as EBPs. We believe the application of design standards to the SSI literature provides important information for “remodeling” this area of research. Identifying problematic design issues, we hope, will inform the development of future studies that use more rigorous single-case methods.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
