Social Skills Interventions for Students With Challenging Behavior

Abstract

This study reports the results of a methodological quality review of the single-case research literature from 1998 to 2014 on the use of social skills interventions for students with challenging behavior. A systematic review of the social skills literature was conducted with the intent of updating the Mathur et al. study of social skills interventions. Twenty-four studies, published between 1998 and 2014, were identified and coded for methodological quality. Findings indicated half the studies failed to meet single-case design standards. Many studies did not sufficiently report reliability or implementation fidelity, or did not provide adequate opportunities to demonstrate an intervention effect. The three most common behaviors across all studies were noncompliance, negative verbal interactions, and class disruptions. The majority of studies were conducted in early elementary grades. Results are discussed in the context of the need for greater methodological rigor for future single-case research studies in the area of social skills instruction.

Keywords

single-subject research, methodology, emotional and behavioral disorders, exceptionalities, autism, social skills

Challenging behaviors and social skills deficits are defining characteristics of students with or at-risk of Emotional and Behavioral Disorders (EBDs) as well as individuals with Autism Spectrum Disorders (ASDs; Forness, Freeman, Paparella, Kauffman, & Walker, 2012; Walker, Ramsey, & Gresham, 2004). Students with or at-risk of EBD are often characterized as having externalizing and/or internalizing behavioral patterns that are linked to social skills deficits (Lane, Parks, Kalberg, & Carter, 2007; Walker et al., 2004). These deficits have been described as either skill-based or performance-based deficits (Gresham, Sugai, & Horner, 2001). According to this model, students with social skill deficits either have not learned a given social skill (representing a skill deficit) or have learned the skill but choose not to perform it (reflecting a performance deficit; Gresham et al., 2001; Mathur & Rutherford, 1996).

As with students with or at-risk of EBD, social skill deficits are a defining characteristic among students with ASD, particularly those who are higher functioning (Wang, Cui, & Parrila, 2011). Although children with high functioning autism, pervasive developmental disorders not otherwise specified, or Asperger syndrome may show fewer cognitive and language deficits compared with students with more severe forms of ASD, the development of social skills continues to be a major problem (Rao, Beidel, & Murray, 2008). Social interaction skill deficits are especially problematic, including initiating interactions, maintaining reciprocity, understanding perspectives, and inferring meanings during social situations (Bellini, Peters, Benner, & Hopf, 2007).

Use of Single-Case Research (SCR) to Identify Evidence-Based Practices

The field of special education is placing greater emphasis on identifying interventions that are evidence-based (Shavelson & Towne, 2002). Evidence-based practices (EBPs) are defined as “practices and programs shown by high-quality research to have meaningful effect on student outcomes” (Cook & Odom, 2013, p. 136). Arguably, SCR has an important role in identifying EBPs, and several methodological indicators have been put forward regarding what constitutes high-quality SCR (Horner et al., 2005; Horner & Kratochwill, 2012; Kratochwill et al., 2010).

Methodological quality pertains to the methods of a research study as well as the safeguards implemented to reduce the likelihood of alternative explanations for observed outcomes (Shadish, Cook, & Campbell, 2002). Proposed quality indicators for SCR design and reporting include (a) operational definitions and descriptions of variables, (b) replication of effects, (c) fidelity of implementation, (d) reliability, and (e) social validity (Horner et al., 2005; Logan, Hickman, Harris, & Heriza, 2008; Tate et al., 2008). First, operational definitions focus on the detailed reporting of study features; they allow for the identification of commonalities and disparities across studies. Clear operational definitions can increase the ability of other researchers to replicate study effects (Wolery & Ezell, 1993). Clear descriptions and operationalized definitions should be provided for all aspects of a study including (a) student populations, (b) independent variables, (c) procedures, (d) dependent variables, and (e) settings.

Second, within-study replication is an important quality indicator in SCR. Within-study replication is determined by the extent to which treatment effects are consistently observed across phases, participants, settings, and behaviors (Horner et al., 2005). Treatment effects are established when a desired change in the dependent variable coincides with the systematic manipulation of the independent variable or intervention. Third, fidelity of implementation is a key element of SCR methodology and refers to the consistency of intervention delivery. Data on the fidelity of intervention implementation should be collected to ensure that the treatment or intervention was carried out as planned. Measures of fidelity can also help validate treatment effects by documenting that a particular treatment protocol was adhered to during the course of a study (Horner et al., 2005).

Fourth, more than one observer should be used to assess the reliability of measurement for each dependent variable. Acceptable reliability of measurement, or inter-observer agreement (IOA), must be collected for each participant and each dependent variable. Although many indices for IOA are available, percent agreement and Cohen’s kappa are often used. Minimum acceptable values for percent agreement and Cohen’s kappa are .80 and .60, respectively (Hartmann, 1977). Fifth, social validity provides helpful information on the acceptability and appropriateness of an intervention (Horner et al., 2005). Social validity data are also useful in determining the feasibility of an intervention (Spear, Strickland-Cohen, Romer, & Albin, 2013).

Recently, the What Works Clearinghouse (WWC) developed a framework for evaluating single-case designs (Kratochwill et al., 2013). This framework classifies studies into three categories: Meets Design Standards, Meets Design Standards With Reservations, or Does Not Meet Design Standards. Following the recommendations of the WWC, to meet basic SCR design standards, studies must (a) systematically manipulate an independent variable, (b) systematically measure each dependent variable over time by more than one observer, (c) use a design that documents at least three attempts to demonstrate intervention effects, and (d) have a minimum of five data points per phase in the design. If a study meets the previously mentioned criteria, but phases only include three or four data points per phase, then the study Meets Design Standards With Reservations.

Previous Reviews of the Social Skills Literature

Social skills interventions (SSIs) focus on teaching prosocial or alternative social behaviors and skills using nonaversive methods (Elliott & Gresham, 1993). There is a robust research literature base on SSIs as evidenced by a number of narrative, quantitative, and meta-analytic reviews dating back to 1981 (Cappadocia & Weiss, 2011; Flynn & Healy, 2012; Gillis & Butler, 2007; Gresham, 1981, 1985; Gresham & MacMillan, 1997; Maag, 2006; Reichow & Volkmar, 2010; White, Keonig, & Scahill, 2007). Several literature reviews have concluded that SSIs can be effective in promoting the acquisition and performance of prosocial behaviors (Gresham, 1981, 1985; McIntosh, Vaughn, & Zaragoza, 1991). However, subsequent meta-analyses on SSIs have reported mixed findings (Ang & Hughes, 2001; Beelmann, Pfingsten, & Lösel, 1994; Cook et al., 2008; Quinn, Kavale, Mathur, Rutherford, & Forness, 1999; Schneider, 1992).

One SCR meta-analysis germane to this current study focused on SSIs for students with or at-risk of EBD and students with autism (Mathur, Kavale, Quinn, Forness, & Rutherford, 1998). In the Mathur et al. (1998) meta-analysis, a total of 64 single-case studies were analyzed. Authors reported the mean percentage of nonoverlapping data (PND; Scruggs, Mastropieri, & Casto, 1987) across all studies as 62% (SD = 33%). Their meta-analysis included 283 participants identified as having behavioral problems, including those with EBD and autism. Results indicated that participants at the elementary and secondary levels were found to benefit more from SSIs than participants at the preschool level. In addition, greater SSI effects were reported for promoting social interaction skills than fostering communication skills. The mean PND for studies that evaluated maintenance and generalization of social skills was 64%, and greater effects were reported for studies that only included students with autism.

Purpose and Research Questions

The primary purpose of this systematic literature review was to evaluate the recent evidence base of SSIs using SCR design standards (Horner et al., 2005; Kratochwill et al., 2013). Rigorous analysis of single-case design methodology is needed to determine the quality of the SSI evidence base and whether SSIs are an evidence-based intervention. The present study also extended the Mathur et al. (1998) study by (a) providing an updated synthesis of the literature and (b) applying proposed quality standards to SCR studies on SSIs. Because students with or at-risk of EBD and students with ASD are often characterized as having social skills deficits and challenging behavior (Denning, 2007; Kauffman, Mock, & Simpson, 2007), the present SCR review included both populations. SSI studies from 1998 to 2014 were identified and evaluated for methodological quality according to SCR design standards. The following research questions were posed:

Research Question 1: What is the methodological quality of recent (1998–2014) SCR studies using SSIs to remediate challenging classroom behaviors for students with or at-risk of EBD and with ASD?

Research Question 2: What are the most common behaviors targeted for improvement through SSIs?

Research Question 3: For whom and under what conditions have SSIs been evaluated?

Method

Literature Search

A systematic review of the literature was conducted to identify SSI studies to be included and evaluated for methodological quality. Electronic searches of the following psychology and educational databases were conducted to identify the initial pool of articles to be included: PsycINFO, Educational Resources Information Center, Academic Search Complete, and Education Full Text. Search terms related to challenging behavior and SSIs were combined using the Boolean operator AND. The first set of terms included the following: behavioral disorders, emotional disorders, seriously emotionally disturbed, disruptive behavior, social behavior problems, antisocial behavior, autism, social behavior problems OR conduct disorders. The second set of terms included the following: social skills training, social skills instruction OR social skills interventions.
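The search logic described above can be sketched as a query string. This is only an illustration: it assumes a generic syntax with quoted phrases, OR within each term set, and AND between the two sets, and the helper names `or_group` and `build_query` are hypothetical. The exact syntax accepted by each database may differ, and the repeated term "social behavior problems" is listed once.

```python
# Illustrative sketch only: (set 1 joined by OR) AND (set 2 joined by OR).
# The quoting and parenthesis syntax is an assumption, not taken from the study.
behavior_terms = [
    "behavioral disorders",
    "emotional disorders",
    "seriously emotionally disturbed",
    "disruptive behavior",
    "social behavior problems",
    "antisocial behavior",
    "autism",
    "conduct disorders",
]
intervention_terms = [
    "social skills training",
    "social skills instruction",
    "social skills interventions",
]


def or_group(terms):
    """Quote each phrase, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*term_sets):
    """Join the OR-groups with AND."""
    return " AND ".join(or_group(terms) for terms in term_sets)


query = build_query(behavior_terms, intervention_terms)
print(query)
```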

Inclusion and Exclusion Criteria

The intent of this literature review was to evaluate the quality of the evidence of SSIs for students with or at-risk of EBD and students with ASD exhibiting challenging behavior. Studies were included in this literature review if (a) participants were educated in a school setting; (b) participants were described as students with or at-risk of EBD or identified with ASD; (c) participants were described as exhibiting challenging behavior in school settings; (d) the intervention implemented taught social skills related to school-based prosocial behaviors or positive social interactions; (e) outcome measures assessed school-related social skill behaviors as a primary predictor; (f) they used a single-case design methodology; and (g) they were written in English, conducted in the United States, and published in a peer-reviewed journal between 1998 and 2014. Because a secondary goal of this study was to update the Mathur et al. (1998) study, the search was limited to articles published between 1998 and 2014. Dissertations and book chapters were excluded because the goal of this review was to draw conclusions based on information that had been evaluated through the peer-review process.

The initial search yielded 1,067 articles. After 373 duplicate articles were removed, 694 titles and abstracts were evaluated to determine whether the article should be read in its entirety. References of identified studies were then reviewed to find other articles that met inclusion criteria. In addition, journals of articles meeting inclusion criteria were searched between 2013 and 2014 to find published articles that may not yet have been added to the electronic databases, including Exceptional Children, Behavioral Disorders, Journal of Emotional and Behavioral Disorders, Journal of Autism and Developmental Disorders, Journal of Applied Behavior Analysis, Journal of Positive Behavior Interventions, and Remedial and Special Education. A total of 22 articles were identified through the electronic search, and one article was identified through the extended search, resulting in 23 articles included in the present literature review (see Figure 1). However, a total of 24 studies were analyzed because Blake, Wang, Cartledge, and Gardner (2000) consisted of two studies.
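The counts reported above can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those given in the text.

```python
# Counts as reported in the text; variable names are illustrative.
initial_hits = 1067
duplicates_removed = 373
screened = initial_hits - duplicates_removed  # titles and abstracts evaluated

from_database_search = 22
from_extended_search = 1
articles_included = from_database_search + from_extended_search

# One included article (Blake et al., 2000) reported two studies.
studies_analyzed = articles_included + 1

print(screened, articles_included, studies_analyzed)  # 694 23 24
```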

Figure 1.

Article selection flowchart.

Article Coding

Included articles were reviewed, and descriptive information was extracted for coding. Each article was coded for participant, setting, and study characteristics. Methodological design features were coded and used to determine the overall methodological quality of the study.

Participant and setting characteristics

Participants were coded on (a) age, (b) gender, (c) school level, (d) ethnicity, (e) disability, and (f) educational setting. The age of each participant was recorded in years, rounding down in instances where studies reported age in years and months. Gender was dichotomous and included male and female. School level of participants included three levels: early elementary (pre-kindergarten–fourth grade), intermediate/middle (fifth–eighth grade), and secondary (ninth–12th grade). Ethnicity included five levels: White, Black, Hispanic, Asian, and Mixed/Other. Disability was coded as (a) identified with or at-risk of EBD or (b) identified with ASD. Educational setting included three levels: general education, special education, or both.
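The age and school-level rules above could be expressed as follows. This is a hedged sketch rather than the authors' codebook: the integer grade encoding (pre-kindergarten as -1, kindergarten as 0) and the function names are assumptions made for illustration.

```python
def age_in_years(years, months=0):
    """Record age in whole years, rounding down any reported months."""
    return years + months // 12


def school_level(grade):
    """Map a grade to the three school levels described in the text.

    Assumed encoding (hypothetical): -1 = pre-kindergarten, 0 = kindergarten,
    1-12 = first through 12th grade, None = grade not reported.
    """
    if grade is None:
        return "not provided"
    if -1 <= grade <= 4:
        return "early elementary"
    if 5 <= grade <= 8:
        return "intermediate/middle"
    if 9 <= grade <= 12:
        return "secondary"
    raise ValueError(f"grade out of range: {grade}")


print(age_in_years(7, 11), school_level(4), school_level(5), school_level(9))
```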

Study and intervention characteristics

Study characteristics including experimental design, intervention development, intervention implementation, and dependent measures were coded.

Experimental design and intervention

The SCR design used in each study was recorded. Social skill interventions were categorized as teaching (a) positive social interactions, (b) prosocial classroom behaviors, or (c) mixed.

Intervention development

Two levels were included in intervention development: individualized or general. Instances where the social skill intervention was created or developed specifically for the target student and matched social skills instruction to student deficits were coded as individualized. All other interventions in which packaged or scripted social skills curriculums were implemented were categorized as general.

Intervention implementation

Intervention implementation was coded as alone or combined. Studies in which social skills training was the only intervention implemented were categorized as alone. Studies in which SSIs were combined with other strategies such as self-monitoring, cueing, group contingency, or other forms of reinforcement were categorized as combined.

Dependent measures

Dependent measures included three levels: social interaction skills, social classroom behavior, or mixed. The first category, social interaction, described behaviors in which the participant engaged in physical or verbal interactions aimed at peers or adults in school settings. The second category, classroom behavior, was used to describe behaviors in which the student displayed behavior in response to a classroom rule, procedure, or task. Target behaviors categorized as classroom behavior included behaviors such as refusal to complete academic tasks, temper tantrums, and property destruction. Target behaviors categorized as social interaction included behaviors such as physical or verbal aggression, hypersensitivity to redirection, and refusal to play cooperatively with peers. Behaviors involving social interaction and classroom behavior were categorized as mixed.

Dependent measure categories were pre-determined and largely based on the Caldarella and Merrell (1997) taxonomy of social skill dimensions which includes peer relations, self-management, academic, compliance, and assertion skills. Descriptive information from the studies also guided how behaviors were categorized. Behaviors relating to peer relations and assertion were categorized as social interaction skills. Academic and compliance behaviors were categorized as social classroom behavior skills. Self-management skill behaviors were categorized according to the context and description of the behavior. Similarly, instances where compliance behaviors directly involved another peer or adult were categorized as social interaction skills. For example, noncompliance in completing academic tasks was coded as classroom behavior skills because the student was described as refusing to complete an assignment which did not directly involve social interaction with a peer or adult. However, in a different study, noncompliance described as refusal to take turns during an activity was coded as social interaction because of the necessary involvement of a peer to take turns. Table 1 organizes social skill dependent measures into social interaction skills and social classroom behavior skills. The codebook containing operational definitions for all coded variables can be obtained from the first author.

Table 1.

Classification of Social Skills.

Social interaction	Classroom behavior
Interfering in the business of classmates	Off-task
Being overbearing	Losing homework and school supplies
Pushing to be in charge of any interaction	Shouting out answers
Yelling or crying when challenged by peers or corrected by a teacher	Becoming over excited and rowdy
Hypersensitivity to redirection	Making inappropriate noises
Physical threats to peers	Talking during lessons
Excessive talkbacks to adults	Daydreaming
Refusal to play cooperatively with peers	Class disruptions
Interruptions	Excessive use of inappropriate language
Physical aggression	Temper tantrums
Verbal aggression	Noncompliance
Being oppositional	Impulsive
General difficulties with peers	Property destruction
Invading the personal space of peers	Off-topic comments
Self-injury	Complaining/whining
Negative verbal interactions	Inappropriate tone of voice
Name calling	Cheating
Inappropriate gestures towards peers	Failure to complete tasks
Difficulties sharing	Failure to appropriately request for help
Difficulties taking turns	Poor adaption to changes in activity

Quality of Evidence Evaluation

Quality of evidence evaluations and the application of the WWC SCR Standards were modeled after Maggin, Chafouleas, Goddard, and Johnson (2011). Design standards were applied to each study. In addition, to help ensure design quality, standards for experimental control and opportunities to demonstrate effect were applied to each graph in studies that included multiple graphs. Studies with more than one graph received a two-part rating reported as (S, G). S signifies the overall quality rating of the study, and G represents the additional ratings for each graph within a study. For example, a study with a multiple baseline across behaviors design, and graphs for three participants, would receive three ratings: (S, G₁), (S, G₂), and (S, G₃), where S represents the overall quality rating and Gₓ represents the rating for each participant’s graph. The purpose of the two-part rating was to identify inconsistencies in experimental control and opportunities to demonstrate effect between the study as a whole and individual graphs within a study. Studies that included only one graph received only one rating.

Coding for design standards

The methodological quality rubric focused on research design and methods. The following standards were assessed: (a) systematic manipulation of the independent variable, (b) IOA, (c) fidelity of implementation, (d) opportunities to demonstrate effect, and (e) phase length. Each standard assessed was coded as Meets Design Standards, Meets Design Standards With Reservations, or Does Not Meet Design Standards (see Table 2).

Table 2.

Methodological Quality Rubric Design Standards, Definitions, and Codes.

Design standard	Definition [Codes]
Systematic manipulation of the independent variable	The independent variable (i.e., the intervention) must be systematically manipulated, with the researcher determining when and how the independent variable conditions change. [0, 2]
Inter-observer agreement	Each outcome variable must be measured systematically (i.e., repeatedly) over time by more than one assessor. [0, 2]
	The study needs to (a) collect IOA on at least 20% of the data points overall and (b) indicate that IOA was collected on 20% of the data points within each condition (e.g., baseline, intervention). [0, 1, 2]
	The inter-assessor agreement must meet minimal thresholds of .80 for percentage agreement indices and .60 for kappa measures. [0, 2]
Fidelity of implementation	Implementation procedures must be assessed for accuracy and consistency by a second observer to ensure the intervention was provided as intended. [0, 1, 2]
	The study needs to collect fidelity on at least 20% of the intervention data points. [0, 2]
	Fidelity of implementation percentages should be at or above 80%. [0, 2]
Opportunities to demonstrate an effect	The study must include at least three attempts to demonstrate an intervention effect at three different points in time or with three different phase repetitions. An attempt to demonstrate a treatment effect refers explicitly to phase contrasts that are adjacent (e.g., AB). [0, 2]
Phase length	For a phase to qualify as an attempt to demonstrate an effect, the phase must have a minimum of three data points to meet this standard with reservations and five data points to meet this standard. [0, 1, 2]
Overall quality rating	Review your responses on the above items to determine whether the study has met design standards, met design standards with reservations, or has not met design standards. [0, 1, 2]

Note. 0 = Does Not Meet Design Standards; 1 = Meets Design Standards With Reservations; 2 = Meets Design Standards; IOA = inter-observer agreement.

Systematic manipulation of the independent variable

The independent variable must be systematically manipulated to be coded Meets Design Standards. The researcher must determine when and how the SSI was implemented. If this was not done intentionally, this standard was not met, and the study was coded Does Not Meet Design Standards.

IOA

Each of the dependent variables must be measured repeatedly over time by more than one observer. Data on agreement between the two assessors should be collected on at least 20% of data points overall, and the study should indicate that these data were collected on 20% of data points in each condition, setting, or phase. In addition, agreement reported must meet the minimum thresholds of agreement indices: .80 for percentage of agreement and .60 for measures of Cohen’s kappa.

The WWC standard for IOA was sectioned into three parts: (a) collection of IOA, (b) percentage of data assessed for reliability from each condition, and (c) meeting minimum thresholds of agreement indices. If a study reported reliability data above .80 for percent agreement and/or .60 for Cohen’s kappa, on 20% of data overall and across all conditions, it received a rating of Meets Design Standards. If the study reported reliability for 20% of the data overall but did not indicate whether the 20% represented data from each condition, it received a rating of Meets Design Standards With Reservations. If a study reported reliability data that were below the minimum thresholds for percent agreement or Cohen’s kappa, it received a rating of Does Not Meet Design Standards.
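The three-part IOA decision described above can be sketched as a small function. This is an illustrative reading, not the authors' rubric code: the function and parameter names are hypothetical, and treating missing IOA data or coverage below 20% overall as Does Not Meet Design Standards is an assumption based on the rubric in Table 2.

```python
MEETS, RESERVATIONS, DOES_NOT_MEET = 2, 1, 0


def rate_ioa(percent_agreement=None, kappa=None,
             share_of_data=0.0, each_condition_covered=False):
    """Return 2, 1, or 0 for the IOA standard.

    percent_agreement and kappa are proportions (e.g., 0.92), or None if the
    index was not reported. share_of_data is the proportion of data points
    with IOA overall. each_condition_covered is True only when the study
    indicates that 20% of data points in every condition were checked.
    """
    if percent_agreement is None and kappa is None:
        return DOES_NOT_MEET  # assumption: no IOA reported
    if percent_agreement is not None and percent_agreement < 0.80:
        return DOES_NOT_MEET
    if kappa is not None and kappa < 0.60:
        return DOES_NOT_MEET
    if share_of_data < 0.20:
        return DOES_NOT_MEET  # assumption: under 20% overall fails
    return MEETS if each_condition_covered else RESERVATIONS


print(rate_ioa(percent_agreement=0.92, share_of_data=0.25,
               each_condition_covered=True))
```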

Fidelity of implementation

The quality rubric was adapted to include a standard on fidelity. Fidelity of implementation criteria mirrored design standards for IOA, requiring studies to collect data on fidelity of implementation for at least 20% of all conditions, with percentages of accurate implementation at or above 80%. To receive a rating of Meets Design Standards, a study must collect and report fidelity of implementation measures for at least 20% of the data, with fidelity at or above 80%. If a study collected an informal measure of fidelity, it received a rating of Meets Design Standards With Reservations. If no measures of fidelity were reported, the study was coded Does Not Meet Design Standards.
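A minimal sketch of the fidelity decision follows. The names are hypothetical, and coding a formal measure that falls short of either 20% coverage or 80% accuracy as Does Not Meet Design Standards is an assumption drawn from the [0, 2] codes in Table 2.

```python
def rate_fidelity(measure=None, share_of_data=0.0, accuracy=0.0):
    """Return 2, 1, or 0 for the fidelity-of-implementation standard.

    measure is "formal", "informal", or None when no fidelity data were
    reported. share_of_data and accuracy are proportions (e.g., 0.25, 0.95).
    """
    if measure is None:
        return 0  # no fidelity measure reported
    if measure == "informal":
        return 1  # informal measure: meets with reservations
    if share_of_data >= 0.20 and accuracy >= 0.80:
        return 2
    return 0  # assumption: formal measure below either threshold


print(rate_fidelity("formal", share_of_data=0.25, accuracy=0.95))
```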

Opportunities to demonstrate effect

Opportunities to demonstrate an effect were assessed by the number of adjacent phase changes within a study. To receive a rating of Meets Design Standards, the study must include at least three attempts to demonstrate an intervention effect at three different points in time. Attempts to demonstrate a treatment effect must occur between phase contrasts that are adjacent to one another. If this standard was not met, the study was coded Does Not Meet Design Standards. Examples of designs meeting this standard include ABAB reversal designs, multiple baseline designs (MBDs) with at least three baseline conditions, alternating treatment designs with at least three alternating treatments compared with a baseline condition or two alternating treatments compared with each other, changing criterion designs with at least three different criteria, and more complex variants of these designs. Examples of designs not meeting this standard include basic two-phase AB designs, and some variations of reversal designs (e.g., ABA and BAB).
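Counting adjacent phase contrasts can be illustrated for reversal and multiple baseline designs. This sketch is an assumption-laden simplification: each tier is written as a string of phase labels, each staggered tier of a multiple baseline design is treated as a separate point in time, and alternating treatment and changing criterion designs are not handled. The function names are hypothetical.

```python
def phase_contrasts(sequence):
    """Count adjacent phase changes in a sequence such as 'ABAB'."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)


def rate_opportunities(tiers):
    """Return 2 if the design offers at least three attempts, else 0.

    tiers is a list of phase sequences: one entry for a reversal design,
    one entry per baseline for a multiple baseline design.
    """
    attempts = sum(phase_contrasts(tier) for tier in tiers)
    return 2 if attempts >= 3 else 0


for design in (["ABAB"], ["AB", "AB", "AB"], ["AB"], ["ABA"], ["BAB"]):
    print(design, rate_opportunities(design))
```

Under these assumptions, ABAB and a three-tier multiple baseline design each yield three attempts, whereas AB, ABA, and BAB yield one or two.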

Phase length

Opportunities to demonstrate an effect within a phase were determined by the number of data points within a phase, or the phase length. For a phase to qualify as an attempt to demonstrate an effect, the phase must have a minimum of three data points to be coded Meets Design Standards With Reservations and five or more data points to be coded Meets Design Standards.
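The phase-length rule can be sketched as below. Rating a graph by its shortest phase is an assumption made for this illustration, and the function name is hypothetical.

```python
def rate_phase_length(points_per_phase):
    """Return 2, 1, or 0 based on the shortest phase in a graph.

    points_per_phase is a non-empty list of data-point counts, one per phase.
    """
    shortest = min(points_per_phase)  # assumption: shortest phase governs
    if shortest >= 5:
        return 2  # Meets Design Standards
    if shortest >= 3:
        return 1  # Meets Design Standards With Reservations
    return 0      # Does Not Meet Design Standards


print(rate_phase_length([6, 5, 7, 5]), rate_phase_length([5, 3, 6, 4]),
      rate_phase_length([5, 2, 6, 5]))
```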

Overall quality ratings

Overall quality ratings were coded as Meets Design Standards, Meets Design Standards With Reservations, or Does Not Meet Design Standards. To receive an overall quality rating of Meets Design Standards, all items assessed in the quality rubric must be coded as Meets Design Standards. If any of the items assessed in the quality rubric are coded Meets Design Standards With Reservations, then the overall quality is coded as Meets Design Standards With Reservations. Similarly, if any item within the quality rubric is coded Does Not Meet Design Standards, then the overall quality rating is coded as Does Not Meet Design Standards.

The same logic was applied to studies receiving two-part ratings. For a study with multiple graphs to receive an overall quality rating of Meets Design Standards, all ratings for that study must be coded as Meets Design Standards. If one of the graphs within a study received a rating of Meets Design Standards With Reservations, the overall quality rating for that study received a rating of Meets Design Standards With Reservations. If one of the graphs within a study received a rating of Does Not Meet Design Standards, the overall quality rating for that study received a rating of Does Not Meet Design Standards.
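Because the codes are ordered (0, 1, 2), the aggregation described above amounts to taking the lowest code. The sketch below assumes that a Does Not Meet Design Standards code takes precedence when both lower codes are present; the function names are hypothetical.

```python
LABELS = {
    2: "Meets Design Standards",
    1: "Meets Design Standards With Reservations",
    0: "Does Not Meet Design Standards",
}


def overall_rating(item_codes):
    """Overall rating is the lowest code among the rubric items."""
    return min(item_codes)


def study_rating(study_item_codes, graph_codes=()):
    """Combine the study-level rating S with per-graph ratings G1..Gn."""
    return min([overall_rating(study_item_codes), *graph_codes])


# Hypothetical study: all study-level items meet standards, but one of three
# graphs has short phases and meets standards only with reservations.
rating = study_rating([2, 2, 2, 2, 2], graph_codes=[2, 1, 2])
print(LABELS[rating])
```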

Reliability

Reliability estimates were collected for (a) article selection, (b) descriptive coding of studies, and (c) the application of the methodological quality rubric. The first author independently coded each of the 24 included studies. A co-author served as an additional reviewer and coded a random sampling of included studies to check for reliability. Based on the recommendations of Hartmann, Barrios, and Wood (2004), 20% (n = 5) of included studies were checked for reliability by an additional reviewer. If the two coders’ results matched, this was coded as an agreement. If results between the two coders did not match, this was coded as a disagreement. Disagreements were resolved by discussing discrepancies until agreement was reached.

Simple percent agreement and Cohen’s kappa, a more conservative measure of reliability adjusting for chance agreement (Ary & Suen, 1989), were calculated for each area of reliability. Simple percent agreement was calculated by dividing the number of agreements by the total number of agreements plus disagreements and multiplying the result by 100. Cohen’s kappa was calculated using the VassarStats website (Lowry, 2016). Percent agreement above 80% and Cohen’s kappa values above .60 were considered acceptable (Kratochwill et al., 2013).
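For readers who want to reproduce these indices, the following sketch computes both from two coders' category labels. It is not the procedure used in the study, which relied on the VassarStats website for kappa; the function names and the example labels are hypothetical. Kappa is the observed agreement corrected for the agreement expected by chance from each coder's marginal proportions.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Agreements divided by agreements plus disagreements, times 100."""
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * agreements / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    Raises ZeroDivisionError when chance agreement p_e equals 1, for example
    when both coders assign every item to the same single category.
    """
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    count_a, count_b = Counter(coder_a), Counter(coder_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical include/exclude decisions for 10 articles (not study data).
coder_1 = ["include", "include", "exclude", "exclude", "include",
           "exclude", "exclude", "exclude", "include", "exclude"]
coder_2 = ["include", "exclude", "exclude", "exclude", "include",
           "exclude", "exclude", "exclude", "include", "exclude"]

print(percent_agreement(coder_1, coder_2))
print(round(cohens_kappa(coder_1, coder_2), 2))
```

In this made-up example the coders agree on 9 of 10 decisions, and kappa is lower than the raw agreement because some agreement is expected by chance.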

Article selection

To check for the reliability of article selection, an additional reviewer, with expertise in the systematic literature review process, assessed 20% (n = 55) of eligible articles (n = 267) for the application of inclusion and exclusion criteria. Each of the 55 articles was categorized as include or exclude. Initial percent agreement and Cohen’s kappa results for article selection were 90% and .81, respectively. Disagreements related to the inclusion criteria of intervention type were the most common. Further explanation and examples of interventions using social skills to teach school-based prosocial behaviors or positive social interactions were discussed until 100% agreement was reached.

Descriptive coding

Articles included in the review were also checked for reliability on descriptive coding. More than 20% (n = 6) of the included studies were checked for descriptive coding reliability. Each study was coded by an additional coder trained in single-case design methodology and unaware of initial coding results. Reliability on 15 different descriptive items was assessed across the number of participants in a study. For example, a study with three participants allowed for 45 opportunities of agreement. Initial reliability using percent agreement was 87%. Cohen’s kappa was assessed for the coding of dependent measures because the majority of disagreements occurred in this area. Kappa for coding of dependent measures was .70. The most common disagreements related to behaviors described as noncompliance. To resolve differences, outcome behaviors were discussed and the reviewers referred back to the codebook containing operational definitions for all coded variables. All disagreements were discussed until 100% agreement was reached.

Methodological quality coding

The same five articles randomly selected for descriptive coding were checked for reliability on the application of the quality design rubric. Each graph within a study was assessed for opportunities to demonstrate effect and phase length. Initial percent agreement for graphs was 95%. Reliability on overall ratings of methodological quality at the study level was 100% for percent agreement and Cohen’s kappa.

Results

Participant and Study Characteristics

Participants and setting

A total of 75 participants were included across the 24 studies examined in this systematic literature review (see Table 3). Studies were published between 1998 and 2014. The majority of the participants were male (88%, n = 66). Although ethnicity was not reported for 24 participants (32%), Black (33%, n = 25) and White (31%, n = 23) were the two ethnic groups with the greatest representation. All studies involved students with behavioral difficulties, with 15 participants (20%) at-risk for EBD, 29 participants (39%) identified with EBD, and 31 participants (41%) identified with ASD. Nearly half of the students (47%, n = 35) were educated in special education settings, which included specialized schools for students with disabilities. Twenty-six participants (35%) were educated in general educational settings. Only 18% (n = 14) of participants were educated in both special education and general education settings.

Table 3.

Participant and Study Characteristics.

Characteristic	n	%
Gender
Male	66	88
Female	9	12
School Level
Early elementary	46	61
Intermediate/middle	15	20
Secondary	11	15
Not provided	3	4
Race/ethnicity
Black	25	33
White	23	31
Hispanic	1	1
Asian	1	1
Not provided	24	32
Mixed/Other	1	1
Disability
At-risk for EBD	15	20
EBD	29	39
ASD	31	41
Educational setting
Special education	35	47
General education	26	35
Both	14	18
Target behavior
Social interaction	26	35
Classroom behavior	30	40
Both	19	25
Intervention
Individualized	37	49
General	38	51
Experimental design
AB or reversal	4^a	16.5
Multiple baseline	16^a	67
Mixed	4^a	16.5

Note. EBD = emotional and behavioral disorder; ASD = autism spectrum disorder.

^a Number of studies.

Experimental design and intervention

Because one article included two studies, a total of 24 studies were evaluated. MBDs were the most commonly used experimental design (67%, n = 16) followed by AB or reversal designs (16.5%, n = 4) and mixed designs (16.5%, n = 4). The four mixed designs included MBD across subjects with randomization of intervention implementation (Bardon, Dona, & Symons, 2008), MBD across behaviors with two treatments (Blake et al., 2000, Study 1), a combined ABAB and MBD across behaviors (Hagopian, Kuhn, & Strother, 2009), and an MBD across setting and behaviors with reversals (Herring & Northup, 1998).

The largest share of studies examined classroom behavior (46%, n = 11) as the dependent measure. Social interaction was the dependent measure for 29% (n = 7) of studies; 25% (n = 6) of the studies measured both classroom behavior and social interaction skills. In 62.5% (n = 15) of studies, the intervention implemented was individualized to the student. The majority of studies implemented SSIs alone (62.5%, n = 15), as opposed to combining the intervention with other behavioral strategies (see Table 4). Table 5 contains a list of the SSIs implemented.

Table 4.

Study Characteristics.

Study	n	Male	School level	Educational setting	Disability	Target behavior	Intervention implementation	Intervention development	Experimental design
Bardon, Dona, and Symons (2008)	3	1	pK–4	General	At-risk	Social interaction	Alone	General	Mixed
Blake, Wang, Cartledge, and Gardner (2000, Study 1)	3	3	5–8	Special	EBD	Mixed	Alone	General	Mixed
Blake et al. (2000, Study 2)	6	6	5–8 (3); pK–4 (3)	Special	EBD	Social interaction	Combined	General	MBD-participant
Blood et al. (2011)	1	1	5–8	Special	EBD	Classroom behavior	Combined	Individualized	AB (B + C)
Bock (2007, Article 1)	1	1	5–8	General	ASD	Classroom behavior	Alone	Individualized	MBD-setting
Bock (2007, Article 2)	4	4	pK–4	Special	ASD	Classroom behavior	Alone	Individualized	MBD-setting
Campbell and Tincani (2011)	3	2	pK–4	Special	ASD	Classroom behavior	Combined	Individualized	MBD-participant
Chan and O’Reilly (2008)	2	2	pK–4	General	ASD	Mixed	Alone	Individualized	MBD-behavior
Hagiwara and Myles (1999)	3	3	pK–4	Both	ASD	Classroom behavior	Alone	Individualized	MBD-setting
Hagopian, Kuhn, and Strother (2009)	1	1	NP	General	ASD	Social interaction	Combined	Individualized	Mixed
Hansen and Lignugaris-Kraft (2005)	9	9	5–8	Special	EBD	Social interaction	Combined	General	ABAB
Herring and Northup (1998)	1	1	pK–4	General	EBD	Classroom behavior	Combined	Individualized	Mixed
Hune and Nelson (2002)	4	3	pK–4	General	At-risk	Social interaction	Combined	General	AB
Keeling, Smith, Myles, Gagnon, and Simpson (2003)	1	0	pK–4	Both	ASD	Classroom behavior	Alone	Individualized	MBD-setting
Kelly and Shogren (2014)	4	4	9–12	Both	EBD	Classroom behavior	Combined	General	MBD-participant
Kuoch and Mirenda (2003)	3	3	pK–4	Special	ASD	Classroom behavior (2); Social interaction (1)	Alone (2); Combined (1)	Individualized	ABA/mixed
Lo et al. (2002)	5	4	pK–4	General	At-risk	Mixed	Combined	General	MBD-participant
Miller and Cole (1998)	1	1	NP	Special	EBD	Social interaction	Alone	Individualized	MBD-behavior
Miller et al. (2011)	3	3	pK–4	General	At-risk	Classroom behavior	Alone	Individualized	MBD-participant
Ozdemir (2008)	3	3	pK–4	General	ASD	Classroom behavior	Alone	Individualized	MBD-participant
Presley and Hughes (2000)	4	3	9–12	General	EBD	Mixed	Alone	General	MBD-participant
Scattone et al. (2006)	3	3	pK–4	General	ASD	Social interaction	Alone	General	MBD-participant
Schneider and Goldstein (2010)	3	3	pK–4 (2); 5–8 (1)	Special (1); Both (2)	ASD	Classroom behavior	Combined	Individualized	MBD-participant
Simpson et al. (2004)	4	2	pK–4	Both	ASD	Mixed	Alone	Individualized	MBD-participant

Note. pK = pre-kindergarten; EBD = emotional and behavioral disorder; ASD = autism spectrum disorder; MBD = multiple baseline design; NP = not provided.

Table 5.

Social Skills Interventions.

Study	Social skill interventions
Bardon, Dona, and Symons (2008)	PATHS curriculum
Blake, Wang, Cartledge, and Gardner (2000, Study 1)	Working together curriculum
Blake et al. (2000, Study 2)	Working together curriculum
Blood et al. (2011)	Video-modeling with an iPod Touch
Bock (2007, Article 1)	SODA
Bock (2007, Article 2)	SODA
Campbell and Tincani (2011)	Power card strategy
Chan and O’Reilly (2008)	Social stories
Hagiwara and Myles (1999)	Multimedia social story
Hagopian, Kuhn, and Strother (2009)	Social skills training
Hansen and Lignugaris-Kraft (2005)	Social skills strategies program
Herring and Northup (1998)	Social skills instruction
Hune and Nelson (2002)	Problem-solving strategy
Keeling, Smith, Myles, Gagnon, and Simpson (2003)	Power card strategy
Kelly and Shogren (2014)	SDLMI
Kuoch and Mirenda (2003)	Social story
Lo et al. (2002)	Working together curriculum
Miller and Cole (1998)	Social skills training package
Miller et al. (2011)	Skillstreaming in elementary school child skill cards
Ozdemir (2008)	Social stories
Presley and Hughes (2000)	Triple A strategy (ASSESS, AMEND, and ACT)
Scattone et al. (2006)	Social stories
Schneider and Goldstein (2010)	Social stories
Simpson et al. (2004)	Video/computer-based social skills instruction

Note. PATHS = Promoting Alternative Thinking Strategies; SODA = Stop-Observe-Deliberate-Act; SDLMI = self-determined learning model of instruction.

Methodological Quality

Overall ratings

Each study, as well as each graph within each study, was assessed with the quality rubric and given an overall rating of methodological quality. Twenty-four studies and 43 graphs were evaluated (see Table 6).

Table 6.

Design Standards Ratings.

Article	Rating (S, G)
Bardon, Dona, and Symons (2008)	0
Blake, Wang, Cartledge, and Gardner (2000)
Graph 1	(0, 0)
Graph 2	(0, 0)
Graph 3	(0, 0)
Blood et al. (2011)	1
Bock (2007, Article 1)	1
Bock (2007, Article 2)
Graph 1	(1, 1)
Graph 2	(1, 1)
Graph 3	(1, 1)
Campbell and Tincani (2011)	2
Chan and O’Reilly (2008)
Graph 1	(1, 1)
Graph 2	(1, 1)
Hagiwara and Myles (1999)
Graph 1	(0, 0)
Graph 2	(0, 0)
Hagopian, Kuhn, and Strother (2009)	0
Hansen and Lignugaris-Kraft (2005)
Graph 1	(0, 1)
Graph 2	(0, 0)
Graph 3	(0, 1)
Graph 4	(0, 1)
Graph 5	(0, 1)
Graph 6	(0, 1)
Graph 7	(0, 1)
Graph 8	(0, 1)
Graph 9	(0, 1)
Herring and Northup (1998)	0
Hune and Nelson (2002)
Graph 1	(0, 0)
Graph 2	(0, 0)
Graph 3	(0, 0)
Graph 4	(0, 0)
Keeling, Smith, Myles, Gagnon, and Simpson (2003)	0
Kelly and Shogren (2014)
Graph 1	(0, 0)
Graph 2
Kuoch and Mirenda (2003)
Graph 1	(0, 0)
Graph 2	(0, 0)
Graph 3	(0, 2)
Lo et al. (2002)	1
Miller and Cole (1998)	1
Miller et al. (2011)	0
Ozdemir (2008)	1
Presley and Hughes (2000)	1
Scattone et al. (2006)	1
Schneider and Goldstein (2010)
Graph 1	(0, 1)
Graph 2	(0, 0)
Simpson et al. (2004)	1

Note. 0 = Does Not Meet Design Standards; 1 = Meets Design Standards With Reservations; 2 = Meets Design Standards; S = rating by study; G = rating by graphs.

Study ratings

Twenty-four studies were assessed for methodological quality. Only one study received a rating of Meets Design Standards (Campbell & Tincani, 2011). About 42% (n = 10) of the studies evaluated received a rating of Meets Design Standards With Reservations. The remaining 54% (n = 13) of studies Did Not Meet Design Standards (see Table 7).

Table 7.

Methodological Quality Results.

Study	IV	IOA-A	IOA-B	IOA-C	Fidelity A	Fidelity B	Fidelity C	Opportunities for effect	Data points	Overall rating
Bardon, Dona, and Symons (2008)	Y	Y	N	Y	I	N	N	N	≥5	Does not meet
Blake, Wang, Cartledge, and Gardner (2000)	Y	Y	N	Y	F	Y	Y	N	<3	Does not meet
Blake et al. (2000)	Y	Y	N	Y	F	Y	Y	Y	≥5	Reservations
Blood et al. (2011)	Y	Y	Y	Y	N	N	N	N	≥3	Does not meet
Bock (2007)	Y	Y	N	Y	F	Y	Y	Y	≥3	Reservations
Bock (2007)	Y	Y	Y	Y	F	Y	Y	Y	≥3	Reservations
Campbell and Tincani (2011)	Y	Y	Y	Y	F	Y	Y	Y	≥5	Meets
Chan and O’Reilly (2008)	Y	Y	N	Y	F	Y	Y	Y	≥3	Reservations
Hagiwara and Myles (1999)	Y	Y	N	Y	N	N	N	Y	≥5	Does not meet
Hagopian, Kuhn, and Strother (2009)	Y	Y	N	Y	N	N	N	Y	≥3	Does not meet
Hansen and Lignugaris-Kraft (2005)	Y	Y	N	Y	F	Y	Y	N	≥3	Does not meet
Herring and Northup (1998)	Y	Y	N	Y	N	N	N	N	≥3	Does not meet
Hune and Nelson (2002)	Y	Y	N	Y	F	Y	Y	N	≥3	Does not meet
Keeling, Smith, Myles, Gagnon, and Simpson (2003)	Y	N	N	N	N	N	N	Y	≥5	Does not meet
Kelly and Shogren (2014)	Y	Y	Y	Y	F	Y	Y	N	≥5	Does not meet
Kuoch and Mirenda (2003)	Y	Y	N	Y	F	Y	Y	N	≥3	Does not meet
Lo et al. (2002)	Y	Y	N	Y	I	Y	N	Y	≥5	Reservations
Miller and Cole (1998)	Y	Y	N	Y	F	Y	Y	Y	≥5	Reservations
Miller et al. (2011)	Y	Y	N	Y	N	N	N	Y	≥5	Does not meet
Ozdemir (2008)	Y	Y	N	Y	F	Y	Y	Y	≥3	Reservations
Presley and Hughes (2000)	Y	Y	Y	Y	F	Y	Y	Y	≥3	Reservations
Scattone et al. (2006)	Y	Y	Y	Y	F	Y	Y	Y	≥3	Reservations
Schneider and Goldstein (2010)	Y	Y	N	Y	F	Y	Y	N	≥5	Does not meet
Simpson et al. (2004)	Y	Y	Y	Y	F	Y	Y	Y	≥3	Reservations

Note. IV = internal validity; IOA = inter-observer agreement; Y = yes; N = no; F = formal; I = informal. IOA-A = 20% of data; IOA-B = 20% of data within each condition; IOA-C = minimal thresholds; Fidelity B = 20% of data; Fidelity C = minimal thresholds.

Individual graph ratings

The areas of opportunities to demonstrate effect and phase length from the methodological quality rubric were also applied to individual graphs within a study for a total of 43 graphs evaluated. About 44% of the designs (n = 19) did not meet design standards. About half of the designs (51%, n = 22) met design standards with reservations. Only two designs met design standards (Campbell & Tincani, 2011; Kuoch & Mirenda, 2003).
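The graph-level tally can be reproduced from Table 6 with a short script. This is a minimal sketch under two assumptions that are not stated in the article: a study shown with a single rating is counted as one graph, and Graph 2 of Kelly and Shogren (2014), which has no rating in the table, is left out. The variable names are illustrative only.

```python
from collections import Counter

# Graph-level (G) ratings transcribed from Table 6.
# 0 = does not meet, 1 = meets with reservations, 2 = meets.
GRAPH_RATINGS = {
    "Bardon et al. (2008)": [0],
    "Blake et al. (2000)": [0, 0, 0],
    "Blood et al. (2011)": [1],
    "Bock (2007, Article 1)": [1],
    "Bock (2007, Article 2)": [1, 1, 1],
    "Campbell and Tincani (2011)": [2],
    "Chan and O'Reilly (2008)": [1, 1],
    "Hagiwara and Myles (1999)": [0, 0],
    "Hagopian et al. (2009)": [0],
    "Hansen and Lignugaris-Kraft (2005)": [1, 0, 1, 1, 1, 1, 1, 1, 1],
    "Herring and Northup (1998)": [0],
    "Hune and Nelson (2002)": [0, 0, 0, 0],
    "Keeling et al. (2003)": [0],
    "Kelly and Shogren (2014)": [0],  # Graph 2 is unrated in Table 6
    "Kuoch and Mirenda (2003)": [0, 0, 2],
    "Lo et al. (2002)": [1],
    "Miller and Cole (1998)": [1],
    "Miller et al. (2011)": [0],
    "Ozdemir (2008)": [1],
    "Presley and Hughes (2000)": [1],
    "Scattone et al. (2006)": [1],
    "Schneider and Goldstein (2010)": [1, 0],
    "Simpson et al. (2004)": [1],
}

counts = Counter(r for ratings in GRAPH_RATINGS.values() for r in ratings)
total = sum(counts.values())

for rating in (0, 1, 2):
    share = counts[rating] / total
    print(f"rating {rating}: {counts[rating]} of {total} graphs ({share:.1%})")
```

Under these assumptions the tally matches the counts reported in the text (19, 22, and 2 of 43 graphs).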

Individual Design Standard Ratings

The quality rubric assessed five standards: (a) systematic manipulation of the independent variable, (b) IOA, (c) fidelity of implementation, (d) opportunities to demonstrate effect, and (e) phase length. Failing to meet all design standards for IOA was the primary reason studies received the rating of Meets Design Standards With Reservations (71%, n = 17), followed by phase length (58%, n = 14). Nine studies did not include at least three attempts to demonstrate intervention effects at three different points in time.

Systematic manipulation of the independent variable

All of the 24 included studies systematically introduced the SSIs. One study (Bardon et al., 2008) used a randomization technique to determine when the intervention would be implemented with each participant.

IOA

The majority of studies (96%, n = 23) reported IOA on 20% of data overall, at or above 80% for percent agreement or 60% for Cohen’s kappa. Only one study (Keeling, Smith, Myles, Gagnon, & Simpson, 2003) failed to provide any information on reliability. However, more than 70% (n = 17) of studies did not specify whether IOA was collected during each condition.

Fidelity of implementation

Formal and informal fidelity of implementation data were reported for 67% (n = 16) and 8% (n = 2) of studies, respectively. Six studies (25%) did not report any data on fidelity of implementation.

Opportunities to demonstrate effect and phase length

Of the 24 studies evaluated, 62.5% (n = 15) of studies included sufficient opportunities to demonstrate intervention effects for at least three different points in time, and 42% (n = 10) of studies utilized designs that included at least five data points per phase.
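The per-standard counts in this section can be re-derived from Table 7. The sketch below transcribes five of the table's columns and tallies them; the tuple layout and variable names are illustrative choices, not part of the original rubric.

```python
from collections import Counter

# (study, IOA-A, IOA-B, Fidelity A, opportunities for effect, data points)
TABLE_7 = [
    ("Bardon et al. (2008)", "Y", "N", "I", "N", ">=5"),
    ("Blake et al. (2000, Study 1)", "Y", "N", "F", "N", "<3"),
    ("Blake et al. (2000, Study 2)", "Y", "N", "F", "Y", ">=5"),
    ("Blood et al. (2011)", "Y", "Y", "N", "N", ">=3"),
    ("Bock (2007, Article 1)", "Y", "N", "F", "Y", ">=3"),
    ("Bock (2007, Article 2)", "Y", "Y", "F", "Y", ">=3"),
    ("Campbell and Tincani (2011)", "Y", "Y", "F", "Y", ">=5"),
    ("Chan and O'Reilly (2008)", "Y", "N", "F", "Y", ">=3"),
    ("Hagiwara and Myles (1999)", "Y", "N", "N", "Y", ">=5"),
    ("Hagopian et al. (2009)", "Y", "N", "N", "Y", ">=3"),
    ("Hansen and Lignugaris-Kraft (2005)", "Y", "N", "F", "N", ">=3"),
    ("Herring and Northup (1998)", "Y", "N", "N", "N", ">=3"),
    ("Hune and Nelson (2002)", "Y", "N", "F", "N", ">=3"),
    ("Keeling et al. (2003)", "N", "N", "N", "Y", ">=5"),
    ("Kelly and Shogren (2014)", "Y", "Y", "F", "N", ">=5"),
    ("Kuoch and Mirenda (2003)", "Y", "N", "F", "N", ">=3"),
    ("Lo et al. (2002)", "Y", "N", "I", "Y", ">=5"),
    ("Miller and Cole (1998)", "Y", "N", "F", "Y", ">=5"),
    ("Miller et al. (2011)", "Y", "N", "N", "Y", ">=5"),
    ("Ozdemir (2008)", "Y", "N", "F", "Y", ">=3"),
    ("Presley and Hughes (2000)", "Y", "Y", "F", "Y", ">=3"),
    ("Scattone et al. (2006)", "Y", "Y", "F", "Y", ">=3"),
    ("Schneider and Goldstein (2010)", "Y", "N", "F", "N", ">=5"),
    ("Simpson et al. (2004)", "Y", "Y", "F", "Y", ">=3"),
]

n_studies = len(TABLE_7)
ioa_overall = sum(row[1] == "Y" for row in TABLE_7)
ioa_not_by_condition = sum(row[2] == "N" for row in TABLE_7)
fidelity = Counter(row[3] for row in TABLE_7)  # F = formal, I = informal, N = none
enough_opportunities = sum(row[4] == "Y" for row in TABLE_7)
five_points_per_phase = sum(row[5] == ">=5" for row in TABLE_7)

summary = {
    "IOA on 20% of data overall": ioa_overall,
    "IOA not reported by condition": ioa_not_by_condition,
    "formal fidelity data": fidelity["F"],
    "informal fidelity data": fidelity["I"],
    "no fidelity data": fidelity["N"],
    "three or more opportunities for effect": enough_opportunities,
    "five or more data points per phase": five_points_per_phase,
}
for label, n in summary.items():
    print(f"{label}: {n} of {n_studies} ({n / n_studies:.1%})")
```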

Discussion

The purpose of this systematic literature review and quality evaluation was to update and evaluate the evidence base of SSIs for students with or at-risk of EBD and students with ASD. Results from the quality evaluation provided information on the methodological rigor within the existing social skills literature. Currently in the field of special education, there is a move to evaluate the methodological quality of studies being conducted due to standards now available for determining quality SCR studies (e.g., Kratochwill et al., 2010). Although some may have concerns about applying recently developed criteria to studies retrospectively, the application of design standards to the social skills literature provides a better understanding of the methodological quality of research that has been historically conducted (Kratochwill et al., 2013). Moreover, application of these design standards provides important information for moving the field of SSIs forward by identifying problematic design issues, which, in turn, will hopefully improve the quality and reproducibility of research conducted in this area. Three research questions were investigated in this study. Findings for each are discussed as follows.

Major Findings

The first question focused on the methodological quality of single-case studies researching SSIs for students with or at-risk for EBD and students with ASD who exhibit challenging behavior. After applying the quality rubric to each of the 24 included studies, results indicated that the methodological quality of the evidence base in this area is not ideal, but holds some potential. More than half of studies identified failed to meet minimum design standards with or without reservations (54%, n = 13). However, 10 studies met standards with reservations. From this evaluation, three areas of methodological weakness were identified: reliability, fidelity of implementation, and experimental control.

First, although 96% (n = 23) of the studies evaluated included acceptable data on reliability, only 29% (n = 7) of studies specified conducting IOA across all participants and phases. Adequate collection and reporting of IOA increases confidence in the reliability of effects reported in research literature. Fifteen of the included studies reported positive intervention effects but did not include adequate IOA measures. Insufficient IOA decreases the overall quality of data collection and, in the present study, decreases confidence in the results for 15 studies (e.g., Bardon et al., 2008; Blake et al., 2000; Chan & O’Reilly, 2008; Schneider & Goldstein, 2010).

Second, six studies did not include information on fidelity of implementation (Blood et al., 2011; Hagiwara & Myles, 1999; Hagopian et al., 2009; Herring & Northup, 1998; Keeling et al., 2003; Miller et al., 2011). The quality rubric developed was based on the WWC design standards but was modified to include criteria on the fidelity of implementation. The addition of fidelity of implementation criteria weakened the overall quality of the evidence for SSIs. Three studies originally rated as Does Not Meet Standards appear to Meet Standards With Reservations when fidelity of implementation is not considered (Hagiwara & Myles, 1999; Hagopian et al., 2009; Miller et al., 2011).
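The effect of the fidelity criterion can be illustrated with a small sensitivity check on the six studies that reported no fidelity data. The `rate` function below is a hypothetical decision rule inferred only from the pattern of entries in Table 7; it is an assumption for illustration and should not be read as the rubric actually applied in this review.

```python
def rate(ioa_a, ioa_b, fidelity, opportunities, min_points, use_fidelity=True):
    """Hypothetical rating rule (an assumption, inferred from Table 7)."""
    fails = ioa_a == "N" or opportunities == "N" or min_points < 3
    if use_fidelity and fidelity == "N":
        fails = True
    if fails:
        return "Does not meet"
    fully_met = (
        ioa_b == "Y"
        and min_points >= 5
        and (not use_fidelity or fidelity == "F")
    )
    return "Meets" if fully_met else "Reservations"


# Table 7 entries for the six studies without fidelity data:
# study -> (IOA-A, IOA-B, Fidelity A, opportunities, minimum data points)
NO_FIDELITY = {
    "Blood et al. (2011)": ("Y", "Y", "N", "N", 3),
    "Hagiwara and Myles (1999)": ("Y", "N", "N", "Y", 5),
    "Hagopian et al. (2009)": ("Y", "N", "N", "Y", 3),
    "Herring and Northup (1998)": ("Y", "N", "N", "N", 3),
    "Keeling et al. (2003)": ("N", "N", "N", "Y", 5),
    "Miller et al. (2011)": ("Y", "N", "N", "Y", 5),
}

with_fidelity = {s: rate(*row) for s, row in NO_FIDELITY.items()}
without_fidelity = {
    s: rate(*row, use_fidelity=False) for s, row in NO_FIDELITY.items()
}
moved = sorted(s for s in NO_FIDELITY if with_fidelity[s] != without_fidelity[s])

for study in moved:
    print(f"{study}: {with_fidelity[study]} -> {without_fidelity[study]}")
```

Under this assumed rule, the three studies that change rating are the same three named above; the other three remain below standards because of insufficient opportunities to demonstrate effect or missing IOA data.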

Although fidelity of implementation is not included in the WWC standards, reporting fidelity data helps establish that interventions were provided as intended. Fidelity of implementation data are a key element in the description of intervention procedures (Horner et al., 2005). A lack of fidelity measures not only limits confidence in treatment efficacy but also hinders the ability of future researchers to replicate effects. Given that replication is essential to determining EBPs, the absence of fidelity measures is a serious problem. Furthermore, without the measurement of treatment fidelity, it is unknown whether an SSI was ineffective because the strategy itself was ineffective or because it was poorly implemented.

The third area of concern, experimental control, is foundational in SCR (Horner et al., 2005). Design standards related to opportunities to demonstrate effect and phase length evaluated the adequacy of experimental control. Adequate experimental control is what allows a functional relationship between the independent and dependent variables to be inferred. Of the studies evaluated, 62.5% (n = 15) included three or more opportunities to demonstrate intervention effect at three different points in time. However, the remaining nine studies (37.5%) reported positive effects without sufficient experimental control to support their findings (Bardon et al., 2008; Blake et al., 2000; Blood et al., 2011; Hansen & Lignugaris-Kraft, 2005; Herring & Northup, 1998; Hune & Nelson, 2002; Kelly & Shogren, 2014; Kuoch & Mirenda, 2003; Schneider & Goldstein, 2010). Additional single-case studies with proper methodological rigor related to experimental control are needed to strengthen the evidence of SSIs for students with or at-risk of EBD and students with ASD who display challenging behavior.

The second question focused on identifying the most common school-related behaviors targeted for improvement through SSIs. Results were organized into two categories. The first category, social interaction, described behaviors in which the participant engaged in physical or verbal interactions with peers or adults in school settings. This included behaviors such as physical or verbal aggression and difficulties interacting with peers or adults. The second category, classroom behavior, was used to describe all other behaviors that did not explicitly focus on social interactions with peers or adults. Target behaviors categorized as classroom behavior included behaviors such as noncompliance, temper tantrums, and property destruction. The three most common behaviors across all studies in both categories were noncompliance, negative verbal interactions, and class disruptions.

Specific to the category of classroom behavior, off-task behavior, class disruptions, and noncompliance were the three most common targeted behaviors. Specific to the category of social interaction, physical or verbal aggression and negative verbal interactions were the most common targeted behaviors. The majority of participants with target behaviors in the social interaction category were students with or at-risk for EBD. Conversely, the majority of participants with target behaviors in the classroom behavior category were students with ASD. Social stories and video modeling were often used to teach students with ASD social skills related to appropriate classroom behavior. Perhaps one reason studies including students with or at-risk of EBD primarily focused on social interaction skills is that verbal and physical aggressions, which are more characteristic for this population, are more visible than the social interaction deficits of students with ASD, such as off-topic comments during conversations.

The third area of inquiry focused on identifying for whom and under what conditions SSI research has been conducted. Information was collected on the percentages of included studies that (a) were conducted in preschool, elementary, and secondary settings; (b) researched the effects of SSIs for students with or at-risk of EBD; (c) researched the effects of SSIs for students with ASD; (d) individualized treatment to the social skill deficits of the student; and (e) implemented SSIs alone versus combining social skills instruction with other behavioral strategies. Results focused on this third area showed that the majority of studies were conducted in early elementary grades (prekindergarten through fourth grade). Consistent with previous research, findings indicated a lack of research on SSIs for students at the intermediate/middle and secondary settings (Maag, 2006). An equal number of studies included participants with ASD compared with participants with or at-risk for EBD. However, at the participant level, more research was conducted with participants with or at-risk of EBD (n = 44) than students with ASD (n = 31).
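The difference between study-level and participant-level counts can be seen by aggregating Table 4 both ways. The sketch below uses the participant counts and disability labels from that table; grouping "At-risk" with "EBD" follows the comparison made in the text, and the variable names are illustrative.

```python
from collections import defaultdict

# (study, number of participants, disability label) transcribed from Table 4.
TABLE_4 = [
    ("Bardon et al. (2008)", 3, "At-risk"),
    ("Blake et al. (2000, Study 1)", 3, "EBD"),
    ("Blake et al. (2000, Study 2)", 6, "EBD"),
    ("Blood et al. (2011)", 1, "EBD"),
    ("Bock (2007, Article 1)", 1, "ASD"),
    ("Bock (2007, Article 2)", 4, "ASD"),
    ("Campbell and Tincani (2011)", 3, "ASD"),
    ("Chan and O'Reilly (2008)", 2, "ASD"),
    ("Hagiwara and Myles (1999)", 3, "ASD"),
    ("Hagopian et al. (2009)", 1, "ASD"),
    ("Hansen and Lignugaris-Kraft (2005)", 9, "EBD"),
    ("Herring and Northup (1998)", 1, "EBD"),
    ("Hune and Nelson (2002)", 4, "At-risk"),
    ("Keeling et al. (2003)", 1, "ASD"),
    ("Kelly and Shogren (2014)", 4, "EBD"),
    ("Kuoch and Mirenda (2003)", 3, "ASD"),
    ("Lo et al. (2002)", 5, "At-risk"),
    ("Miller and Cole (1998)", 1, "EBD"),
    ("Miller et al. (2011)", 3, "At-risk"),
    ("Ozdemir (2008)", 3, "ASD"),
    ("Presley and Hughes (2000)", 4, "EBD"),
    ("Scattone et al. (2006)", 3, "ASD"),
    ("Schneider and Goldstein (2010)", 3, "ASD"),
    ("Simpson et al. (2004)", 4, "ASD"),
]

GROUP = {"ASD": "ASD", "EBD": "EBD or at-risk", "At-risk": "EBD or at-risk"}

studies = defaultdict(int)
participants = defaultdict(int)
for _, n, disability in TABLE_4:
    studies[GROUP[disability]] += 1       # unit of analysis: the study
    participants[GROUP[disability]] += n  # unit of analysis: the participant

for group in ("ASD", "EBD or at-risk"):
    print(f"{group}: {studies[group]} studies, {participants[group]} participants")
```

The two groups are tied at the study level but not at the participant level, because studies involving students with or at-risk of EBD tended to enroll more participants per study.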

Similarly, 15 studies individualized the SSI to the student. However, the numbers were almost even at the participant level (individualized, n = 37; general, n = 38). Interventions that were individualized were primarily in studies including students with ASD. In contrast, general SSIs were implemented for groups of students with or at-risk for EBD. This is most likely due to the fact that students with or at-risk for EBD are often separated from the general learning environment and grouped with other students displaying similar behaviors. In addition, the majority of studies implemented SSIs alone (62.5%, n = 15) as opposed to combining the intervention with other behavioral strategies. For nine studies, SSIs were combined with one of the following strategies: visual reminders, peer training, self-management, reinforcement, corrective feedback, and group contingency.

Limitations and Future Research

Several limitations should be considered when interpreting the findings of this review. First, although efforts were made to identify all studies meeting inclusion criteria, all suitable studies may not have been identified because consideration for inclusion was limited to peer-reviewed studies conducted between 1998 and 2014. It is possible that evaluating the complete body of research related to SSIs could have affected results. In addition, limiting the search to peer-reviewed literature can result in the exclusion of relevant studies that may not be published. Second, the majority of included studies were conducted in pre-kindergarten through fourth-grade settings. Therefore, care should be taken when interpreting results for students in intermediate and secondary settings. Third, the specific focus of this study was on (a) students with or at-risk of EBD, (b) students with ASD, and (c) outcome measures for the remediation of social interaction or classroom behaviors only. These conditions limit the generalization of findings to similar participant and study characteristics.

Fourth, results presented on SCR design standards pertain only to the methodological quality of included SSI studies and not to intervention effect. Many reviews apply criteria for demonstrating evidence of a relationship between an independent and dependent variable to studies that meet standards (with or without reservations). Because the focus of this study was to evaluate the methodological quality of the social skills literature, studies were not evaluated for evidence of effect. Therefore, results of this review relate only to methodological quality of the SSI literature base without assessing whether SSIs are effective for students with or at-risk of EBD or students with ASD who exhibit challenging behavior. Future research should examine SSI effectiveness for these populations.

Results of this systematic literature review on the quality of SCR for SSIs suggest that future research be conducted with greater methodological rigor, particularly in the areas of IOA, fidelity of implementation, and experimental control. If SCR studies are to be used to identify EBPs, then their results are directly linked to the methodological quality of the study design (Cook, Tankersley, & Landrum, 2009). Future research should adhere to, at a minimum, guidelines set by WWC (Kratochwill et al., 2010). Methodological quality at the study, design, or participant level may need to be assessed to accurately capture features of research design. Additional research on the effectiveness of SSIs from studies meeting SCR design standards is also needed to determine SSIs as an EBP.

Since the development of SCR design standards, areas of the literature that were assumed to be evidence-based (i.e., SSI) were revealed to contain methodologically flawed research (Chard, Ketterlin-Geller, Baker, Doabler, & Apichatabutra, 2009; Maggin et al., 2011). Although all studies have methodological limitations, flawed research that does not meet basic design standards is of concern. The concern regarding methodological quality is shared not only in special education but also in the scientific community (Open Science Collaboration, 2015). High quality studies that are reproducible are an important component of establishing the evidence base regarding a particular practice. Reproducibility is especially important in SCR which depends upon across study or systematic replication to establish external validity (Horner et al., 2005). If replications are based on poor quality research, then the evidence base is undermined, and many practices may be put forward as promising or evidence-based, when in fact, the evidence supporting those practices is questionable.

This brings to light the role of the peer-review process in supporting the publication of high-quality research. Although evidence reviews are important for developing recommendations for practice, they may also provide additional guidance for peer reviewers and editorial boards regarding the level of research accepted for publication in peer-reviewed journals. Reviewers have a responsibility to voice concerns for studies that do not meet basic design criteria and are methodologically flawed to a point that their results are questionable. Studies should be scrutinized on their level of methodological quality, and current SCR methodological standards put forward by WWC are, arguably, a reasonable way of applying “minimum” quality standards.

The proposed WWC design standards provide a solid framework for assessing methodological rigor in SCR. Additional areas that should be considered are fidelity of implementation and social validity. Fidelity is an important factor in increasing confidence that the intervention effects observed were related to the intervention. Previous research as well as the current study provides compelling support for the inclusion of fidelity measures as part of the WWC design standards (Kratochwill et al., 2013; Wolery, 2013). It is also seen as “best practice” (Keller-Margulis, 2012). Replication is essential to the identification of an EBP (Horner et al., 2005), and fidelity of implementation is a key variable in replicating intervention effects. Future research should measure fidelity and consider including this area as a basic standard for SCR design.

Another area that may warrant consideration for inclusion as a design standard is social validity. Social validity was not included as part of the quality rubric used in this study because it speaks more to intervention effectiveness than to methodology. However, measures of social validity would add an additional level of rigor to the current SCR design standards. Two major barriers to the implementation of EBPs in school settings are the lack of time and inadequate support from administrators (Kratochwill & Shernoff, 2004). Future research on the social validity of SSIs, including feasibility of application, can further advance this area of research. For example, surveys or post-intervention interviews on the real-world application of interventions would provide meaningful data on barriers to implementation and success. Measures of social validity are vital if the ultimate goal is to transfer EBPs into practice. Finally, future studies are needed to extend the research on SSI to other populations of individuals with disabilities (e.g., adults with disabilities), in a variety of settings (e.g., naturalistic, home, or employment settings), and other behaviors of interest (e.g., problem solving, safety skills, or social competence).

In conclusion, this review found that SSI research using SCR contained several methodological problems. Although historically SSIs have been thought of as an EBP, Gresham (1998) asked the question of whether we should “raze, remodel, or rebuild” the social skills literature when pointing out the historically weak effects found in meta-analyses conducted on social skills training. In paving the way forward, we offer that an EBP should be based on the best available evidence. However, there should come a time in a line of research that studies conducted reach a level of maturity in terms of both quality and quantity to conclude that a practice is evidence-based. It appears that the literature on SSIs needs more methodologically rigorous studies to be able to identify them as EBPs. We believe the application of design standards to the SSI literature provides important information for “remodeling” this area of research. Identifying problematic design issues, we hope, will inform the development of future studies that use more rigorous single-case methods.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Ang

R. P.

Hughes

J. N.

(2001). Differential benefits of skills training with antisocial youth based on group composition: A meta-analytic investigation. School Psychology Review, 31(2), 164–185.

Ary

Suen

H. K.

(1989). Analyzing qualitative behavioral observational data. Mahwah, NJ: Lawrence Erlbaum.

Beelmann

Pfingsten

Lösel

(1994). Effects of training social competence in children: A meta-analysis of recent evaluation studies. Journal of Clinical Child Psychology, 23, 260–271.

Bellini

Peters

J. K.

Benner

Hopf

(2007). A meta-analysis of school-based social skills interventions for children with autism spectrum disorders. Remedial and Special Education, 28, 153–162.

Caldarella

Merrell

K. W.

(1997). Common dimensions of social skills of children and adolescents: A taxonomy of positive behaviors. School Psychology Review, 26, 264–278.

Cappadocia

M. C.

Weiss

J. A.

(2011). Review of social skills training groups for youth with Asperger syndrome and high functioning autism. Research in Autism Spectrum Disorders, 5(1), 70–78.

Chard

D. J.

Ketterlin-Geller

L. R.

Baker

S. K.

Doabler

Apichatabutra

(2009). Repeated reading interventions for students with learning disabilities: Status of the evidence. Exceptional Children, 75, 263–281.

Cook

B. G.

Odom

S. L.

(2013). Evidence-based practices and implementation science in special education. Exceptional Children, 79(2), 135–144.

Cook

B. G.

Tankersley

Landrum

T. J.

(2009). Determining evidence-based practices in special education. Exceptional Children, 75, 365–383.

10.

Cook

C. R.

Gresham

F. M.

Kern

Barreras

R. B.

Thornton

Crews

S. D.

(2008). Social skills training for secondary students with emotional and/or behavioral disorders: A review and analysis of the meta-analytic literature. Journal of Emotional and Behavioral Disorders, 16, 131–144.

11.

Denning

C. B.

(2007). Social skills interventions for students with Asperger syndrome and high-functioning autism: Research findings and implications for teachers. Beyond Behavior, 16(3), 16–23.

12.

Elliott

S. N.

Gresham

F. M.

(1993). Social skills interventions for children. Behavior Modification, 17, 287–313.

13.

Flynn

Healy

(2012). A review of treatments for deficits in social skills and self-help skills in autism spectrum disorder. Research in Autism Spectrum Disorders, 6(1), 431–441.

14.

Forness

S. R.

Freeman

S. F. N.

Paparella

Kauffman

J. M.

Walker

H. M.

(2012). Special education implications of point and cumulative prevalence for children with emotional or behavioral disorders. Journal of Emotional and Behavioral Disorders, 20, 4–18.

15.

Gillis

J. M.

Butler

R. C.

(2007). Social skills interventions for preschoolers with autism spectrum disorder: A description of single-subject design studies. Journal of Early and Intensive Behavior Intervention, 4, 532–547.

16.

Gresham

F. M.

(1981). Social skills training with handicapped children: A review. Review of Educational Research, 51, 139–176.

Gresham, F. M. (1985). Utility of cognitive-behavioral procedures for social skills training with children: A critical review. Journal of Abnormal Child Psychology, 13, 411–423.

Gresham, F. M. (1998). Social skills training: Should we raze, remodel, or rebuild? Behavioral Disorders, 24(1), 19–25.

Gresham, F. M., & MacMillan, D. L. (1997). Social competence and affective characteristics of students with mild disabilities. Review of Educational Research, 67, 377–415.

Gresham, F. M., Sugai, & Horner, R. H. (2001). Interpreting outcomes of social skills training for students with high-incidence disabilities. Exceptional Children, 67, 331–344.

Hartmann, D. P. (1977). Considerations in the choice of interobserver reliability estimates. Journal of Applied Behavior Analysis, 10(1), 103–116.

Hartmann, D. P., Barrios, B. A., & Wood, D. D. (2004). Principles of behavioral observation. In S. N. Haynes & E. M. Heiby (Eds.), Comprehensive handbook of psychological assessment, behavioral assessment (Vol. 3, pp. 108–127). New York, NY: John Wiley.

Horner, R. H., Carr, E. G., Halle, McGee, Odom, & Wolery (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71, 165–179.

Horner, R. H., & Kratochwill, T. R. (2012). Synthesizing single-case research to identify evidence-based practices: Some brief reflections. Journal of Behavioral Education, 21, 266–272.

Kauffman, J. M., Mock, D. R., & Simpson, R. L. (2007). Problems related to underservice of students with emotional or behavioral disorders. Behavioral Disorders, 33(1), 43–57.

Keller-Margulis, M. A. (2012). Fidelity of implementation framework: A critical need for response to intervention models. Psychology in the Schools, 49, 342–352.

Kratochwill, T. R., Hitchcock, Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish (2010). Single-case designs technical documentation. Retrieved from http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf

Kratochwill, T. R., Hitchcock, J. H., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2013). Single-case intervention research design standards. Remedial and Special Education, 34, 26–38.

Kratochwill, T. R., & Shernoff, E. S. (2004). Evidence-based practice: Promoting evidence-based interventions in school psychology. School Psychology Review, 33(1), 34.

Lane, K. L., Parks, R. J., Kalberg, J. R., & Carter, E. W. (2007). Systematic screening at the middle school level: Score reliability and validity of the Student Risk Screening Scale. Journal of Emotional and Behavioral Disorders, 15, 209–222.

Logan, L. R., Hickman, R. R., Harris, S. R., & Heriza, C. B. (2008). Single-subject research design: Recommendations for levels of evidence and quality rating. Developmental Medicine & Child Neurology, 50, 99–103.

Lowry (2016). VassarStats: Statistical Computation. Available from http://vassarstats.net/

Maag, J. W. (2006). Social skills training for students with emotional and behavioral disorders: A review of reviews. Behavioral Disorders, 32(1), 4–17.

Maggin, D. M., Chafouleas, S. M., Goddard, K. M., & Johnson, A. H. (2011). A systematic evaluation of token economies as a classroom management tool for students with challenging behavior. Journal of School Psychology, 49, 529–554.

Mathur, S. R., Kavale, K. A., Quinn, M. M., Forness, S. R., & Rutherford, R. B. (1998). Social skills interventions with students with emotional and behavioral problems: A quantitative synthesis of single-subject research. Behavioral Disorders, 23(3), 193–201.

Mathur, S. R., & Rutherford, R. B., Jr. (1996). Is social skills training effective for students with emotional or behavioral disorders? Research issues and needs. Behavioral Disorders, 22(1), 21–28.

McIntosh, Vaughn, & Zaragoza (1991). A review of social interventions for students with learning disabilities. Journal of Learning Disabilities, 24, 451–458.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, 943–951. doi:10.1126/science.aac4716

Quinn, M. M., Kavale, K. A., Mathur, S. R., Rutherford, R. B., & Forness, S. R. (1999). A meta-analysis of social skill interventions for students with emotional or behavioral disorders. Journal of Emotional and Behavioral Disorders, 7, 54–64.

Rao, P. A., Beidel, D. C., & Murray, M. J. (2008). Social skills interventions for children with Asperger’s syndrome or high-functioning autism: A review and recommendations. Journal of Autism and Developmental Disorders, 38, 353–361.

Reichow, & Volkmar, F. R. (2010). Social skills interventions for individuals with autism: Evaluation for evidence-based practices within a best evidence synthesis framework. Journal of Autism and Developmental Disorders, 40, 149–166.

Schneider, B. H. (1992). Didactic methods for enhancing children’s peer relations: A quantitative review. Clinical Psychology Review, 12, 363–382.

Scruggs, T. E., Mastropieri, M. A., & Casto (1987). The quantitative synthesis of single-subject research: Methodology and validation. Remedial and Special Education, 8, 24–33.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Belmont, CA: Wadsworth Cengage Learning.

Shavelson, R. J., & Towne (2002). Scientific research in education. Washington, DC: National Academies Press.

Spear, C. F., Strickland-Cohen, M. K., Romer, & Albin, R. W. (2013). An examination of social validity within single-case research with students with emotional and behavioral disorders. Remedial and Special Education, 34, 357–370.

Tate, McDonald, Perdices, Togher, Schultz, & Savage (2008). Rating the methodological quality of single-subject designs and n-of-1 trials: Introducing the Single-Case Experimental Design (SCED) scale. Neuropsychological Rehabilitation, 18, 385–401.

Walker, H. M., Ramsey, & Gresham, F. M. (2004). Antisocial behavior in school: Evidence-based practices. Belmont, CA: Wadsworth.

Wang, Cui, & Parrila (2011). Examining the effectiveness of peer-mediated and video-modeling social skills interventions for children with autism spectrum disorders: A meta-analysis in single-case research using HLM. Research in Autism Spectrum Disorders, 5, 562–569.

White, S. W., Koenig, & Scahill (2007). Social skills development in children with autism spectrum disorders: A review of the intervention research. Journal of Autism and Developmental Disorders, 37, 1858–1868.

Wolery (2013). A commentary: Single-case design technical document of the What Works Clearinghouse. Remedial and Special Education, 34, 39–43.

Wolery & Ezell, H. K. (1993). Subject descriptions and single-subject research. Journal of Learning Disabilities, 26, 642–647.