Abstract
In this article, we reviewed social validity in single-case research studies that focused on interventions for students who have been identified as having, or as being at risk for, emotional and behavioral disorders. This review focused on studies from four peer-reviewed journals known to publish single-case research with this population: the Journal of Applied Behavior Analysis, Journal of Emotional and Behavioral Disorders, Journal of Positive Behavior Interventions, and Behavioral Disorders. We reviewed 22 studies published from January 2008 to November 2011 that met inclusion criteria. The purposes of this review were (a) to evaluate how researchers are addressing social validity as defined by Horner and colleagues (2005) and (b) to explore how single-case researchers are measuring social validity. Overall, results indicated that the research studies included in this review addressed socially important questions within typical contexts, but that most did not address social validity explicitly.
Recently, single-case research has been recognized as an approach for establishing evidence-based practices (Horner et al., 2005; Kratochwill et al., 2010). As the implications of single-case research become increasingly far reaching and focused on socially meaningful outcomes, single-case researchers are expected not only to document a functional relation between manipulating the independent variable and change in the dependent variable, but also to demonstrate that the independent variable under investigation is perceived as socially acceptable and feasible by typical stakeholders (Horner et al., 2005). In other words, single-case researchers may be able to demonstrate sizable effects, but if the intervention is not well received by educators, students, and their families, it is unlikely that the intervention will be adopted or maintained.
Applied behavior analysis, and more recently positive behavior support, emphasize the importance of valued clinical outcomes as an essential feature of behavioral research and practice (e.g., Baer, Wolf, & Risley, 1968; E. G. Carr et al., 2002). In his seminal article addressing the importance of social validity, Montrose Wolf (1978) argued that making socially meaningful contributions through research involves investigating the relationship between what is objectively measured and how it is perceived by its consumers. In the 1970s, the practice of asking participants to report what they think was novel and counterintuitive to the singular focus on directly observable and measurable behaviors that was typical in applied behavior analytic research. Nonetheless, it had become apparent that if practices were going to be adopted, they needed to be not only effective but also well regarded. To avoid making assumptions about the subjective judgments of consumers, researchers needed to begin using methods to assess social validity. Wolf (1978) therefore outlined what he termed three levels of social validity: (a) Are the specific behavioral goals really what society wants? (b) Do the ends justify the means? That is, do the participants, caregivers and other consumers consider the treatment procedures acceptable? and (c) Are the consumers satisfied with the results? . . . including any unpredicted ones? (p. 207)
More recently, Horner et al. (2005) expanded and further defined the features of social validity identified by Wolf in 1978 and incorporated these into their recommended guidelines for documenting evidence-based practices using single-case research. The quality indicators of social validity identified by Horner et al. emphasize dependent variables (i.e., behavioral goals and outcomes) valued by society and procedures that are acceptable to those individuals implementing interventions within day-to-day settings, and they contain additional standards of social validity such as maintenance. The four quality indicators of social validity identified by Horner et al. are (1) the dependent variable is socially important; (2) the magnitude of change in the dependent variable resulting from the intervention is socially important; (3) implementation of the independent variable is practical and cost effective; and (4) social validity is enhanced by implementation of the independent variable over extended time periods, by typical intervention agents, in typical physical and social contexts. (2005, p. 174)
Since their publication, the indicators recommended by Horner et al. (2005) for evaluating and identifying evidence-based practices using single-case research have been used to guide research reviews of specific practices (e.g., Gardner, Spencer, Boelter, Dubard, & Jennett, 2012; Lane, Kalberg, & Shepcaro, 2009) and incorporated into the standards of evidence adopted by major establishments such as the What Works Clearinghouse (Kratochwill et al., 2010). The measurement of social validity can provide researchers with important knowledge about how to adapt and package interventions so that they are consumable and contextually relevant, facilitate researcher understanding of social validity, and reveal unexpected, yet important effects of interventions beyond documented changes of dependent variables (Strain, Barton, & Dunlap, 2012). When social validity is assessed and reported by researchers, that information can also be used by consumers to select practices that are not only empirically supported by single-case research, but that have also been demonstrated to be socially valid and feasible for implementation.
The development and implementation of intervention services for students with emotional and behavioral disorders (EBD) represent one area in which there is a critical need for more understanding of the effectiveness and social validity of various practices. Students with EBD are at risk for myriad negative long-term outcomes including low overall academic achievement, high rates of suspension and absenteeism, low graduation rates, and poor postschool outcomes (e.g., Cook, Landrum, Tankersley, & Kauffman, 2003; Wagner, Kutash, Duchnowski, Epstein, & Sumi, 2005). These students, who make up a small and often underidentified and underserved population, are characterized by complex and highly disruptive behavior problems that are usually intense and highly variable, making it difficult to develop generalizable interventions. Single-case research, which is known to be particularly useful for identifying effective practices at the individual student level (Horner et al., 2005), can therefore provide important information for identifying interventions that have been shown to be effective when intervening with this resource-intensive population.
The social validity of interventions for students experiencing emotional and behavioral problems is particularly important because teachers often have limited training in supporting students with significant behavior problems (Cook et al., 2003; Oliver & Reschly, 2010). Thus, interventions aimed at this population of students need to be not only highly effective but also feasible and acceptable to teachers supporting an entire classroom of students.
Despite the importance of measuring social validity, particularly for students with emotional and behavioral problems, a lack of focus on variables associated with social validity in single-case research has been consistently noted (J. E. Carr, Austin, Britton, Kellum, & Bailey, 1999; Clarke & Dunlap, 2008; Conroy, Dunlap, Clarke, & Alter, 2005). For instance, in a review of articles focused on positive behavioral interventions with young children published across 23 journals between 1984 and 2003, Conroy et al. (2005) found that only 26% of the 73 included articles reported social validity measures. Clarke and Dunlap (2008) found similar results in their review of intervention research for children and youth with disabilities published from 1990 to 2005, where social validity measurement was directly reported in 31% of articles published in the Journal of Positive Behavior Interventions (JPBI), 20% of articles in Education and Training on Mental Retardation and Developmental Disabilities (ETMRDD), and 2% of articles in the Journal of Applied Behavior Analysis (JABA). J. E. Carr et al. (1999) conducted a review of articles published in the JABA from 1968 to 1998 to identify trends in how frequently social validity measures (i.e., treatment outcome and treatment acceptability) were reported and found that such measures were reported in fewer than 13% of all articles. These studies provide a description of the frequency with which researchers are reporting social validity outcomes. Yet, there is a paucity of reviews investigating how social validity is currently being assessed by single-case researchers, and to our knowledge, no review has covered the period in which the Horner et al. (2005) recommendations can be assumed to have affected published single-case research.
The purpose of this review was to explore the literature base and examine how the quality indicators for social validity outlined by Horner et al. (2005) have been applied in single-case research involving students with EBD. Specifically, we evaluated (a) whether researchers investigating interventions for students with EBD were addressing social validity as defined by Horner and colleagues, and if so, (b) how single-case researchers evaluating interventions for this population were measuring social validity.
Method
Our literature search included studies published from January 2008 to November 2011, as Horner et al.’s proposed standards article was published in 2005. The year 2008 was chosen under the assumption that most research would have been conducted after Horner et al.’s (2005) article had been published. Our search was limited to studies published in the JABA, the Journal of Emotional and Behavioral Disorders (JEBD), the JPBI, and Behavioral Disorders (BD). We do not claim this to be an exhaustive list, but rather suggest that these publications represent examples of peer-reviewed journals with (a) a large selection of single-case research and (b) studies focused on students with EBD. We conducted an archival hand search of all studies that were published in JABA, JEBD, JPBI, and BD between 2008 and November 2011 to select those studies that met inclusion criteria.
Our initial literature search returned 26 studies that met the following initial inclusion criteria: (a) the study involved single-case research methodology, (b) the study took place in a U.S. school or program, and (c) at least 50% of the participants were explicitly identified as having an emotional or behavioral disorder, were identified as at risk for an emotional or behavioral disorder using a validated screening measure, or were individuals, such as teachers, who supported students explicitly identified as having or being at risk for an emotional or behavioral disorder.
We then conducted a secondary review in which we examined these 26 studies to ensure that they met the experimental design standards set out by Horner et al. (2005), as we wanted to include only those studies that could make valid claims of an effect based on their experimental design. Only those studies that would allow for the demonstration of experimental effects at three different points in time were included in the final analysis. For example, a study that utilized a multiple-baseline (MBL) design with a minimum of three participants and at least three data points per phase would meet the experimental design standards. By contrast, an A-B design with only one participant, or a study with a sufficient number of participants but only one data point during the baseline phase, would not be included. Of the 26 articles that met our initial inclusion criteria, 4 did not meet experimental design criteria and were excluded, leaving a total of 22 studies that were included in the final analysis.
Characteristics of Included Studies
All 22 studies involved students or teachers of students with or at-risk for EBD. These studies included participants from preschool to high school, and took place in a wide range of geographic settings across the continental United States. All 22 studies included reports of participant age ranges and diagnoses, though only 14 studies reported the racial or ethnic backgrounds of participants. The included studies utilized various single-case research designs to examine a range of dependent variables. Two studies were conducted using reversal designs, while 4 studies were conducted using alternating treatment designs. Sixteen studies were conducted using variations of a MBL design. Sixteen of the 22 studies involved either a direct or indirect measure of students’ behaviors. Nine of the studies focused primarily on student problem behaviors, and 4 of those 9 studies focused on changing teacher behaviors as a method for decreasing student problem behaviors. One study looked purely at changes in teacher behavior. The other 14 studies focused mainly on academic issues for students with EBD, though 6 of these studies also examined problem behavior. Table 1 presents an overview of the final 22 studies.
Characteristics of Studies Included in the Analysis.
Note. JABA = Journal of Applied Behavior Analysis; EBD = emotional and behavioral disorders; MBL = multiple baseline; AT = alternating treatment; REV = reversal; BD = Behavioral Disorders; FBA = functional behavioral assessment; CBCL = Child Behavior Checklist; SRSD = Self-Regulated Strategy Development; SSBD = Systematic Screening for Behavior Disorders; SSRS = Social Skills Rating System; JEBD = Journal of Emotional and Behavioral Disorders; ABA = applied behavior analysis; JPBI = Journal of Positive Behavior Interventions; ESP = early screening project.
Rating Procedures for Quality Indicators
As noted earlier, Horner et al. (2005) recommended four components of social validity that should be used as quality indicators within single-case research. First, the dependent variable must be socially important. Second, the magnitude of change in the dependent variable resulting from the intervention must be socially important. Third, implementation of the independent variable must be practical and cost-effective. Finally, social validity is enhanced by implementation of the independent variable over extended time periods, by typical intervention agents, in typical physical and social contexts.
To rate studies on the presence or absence of these quality indicators, we created operational definitions by combining recommendations from Horner et al. (2005), Wolf (1978), and Lane et al. (2009). Two out of the three authors coded each article independently across each component of the operational definitions of the four quality indicators. We considered this to be an exploratory examination of the ways in which these quality indicators are being applied across the research base, taking into account the complexity of measuring social validity as a construct and the challenges that this can add to the research process. Assessing social validity involves measuring participants’ perceptions of value, which often means moving beyond the objective measurement of discrete, observable behaviors. However, this measurement challenge should not impede the process of examining questions about factors that critically affect the implementation and success of needed interventions. To capture as fully as possible the range of ways that researchers are examining the complex elements of social validity, we decided to be purposefully liberal in our examination of the application of the quality indicators. We scored each component of the quality indicators with a yes/no rating, but we considered each quality indicator as present if the study met any part of the operational definition components. Each operational definition is described in detail below. Table 2 provides an overview of how each component of social validity was coded.
Operational Definitions of Quality Indicators and Additional Measures of Social Validity.
Quality Indicator 1: The dependent variable is socially important
Quality Indicator 1 was considered present if (a) the dependent variable(s) had high social importance (Horner et al., 2005) or (b) the goals were really “what society wants” (Wolf, 1978, p. 207). Society here was defined as typical behavior-change agents, participants, and family members. This quality indicator was coded as present if the researchers made a compelling case for the high social importance of the dependent variable to society through their description of the significance of the problem, or if they demonstrated that they had included stakeholder input during intervention development (e.g., including school staff in the development of outcome goals or the identification of dependent variables). This represents an example of one of the more liberal applications of the quality indicators; however, we felt this was a potentially important component of social validity in that it showed that researchers were soliciting and including typical stakeholders’ views and values in developing interventions. This operational definition also allowed us to assess whether researchers were addressing elements of social validity throughout the research process, rather than simply through the inclusion of a social validity measure at the end of the intervention.
Quality Indicator 2: The magnitude of change in the dependent variable resulting from the intervention is socially important
The second quality indicator was defined as a “demonstration that the intervention produced an effect that met the defined, clinical need” (Horner et al., 2005, p. 172). Similar to previous reviews (i.e., Lane et al., 2009), we considered this quality indicator present if a functional relation was evident between the introduction of the independent variable and change in the target behavior. Although not necessarily sufficient alone, when considered in conjunction with the previous quality indicator (i.e., the dependent variable is socially important), the demonstration of a functional relationship was judged by the authors to be one indicator of a socially important change. In fact, none of these quality indicators, when considered in isolation, would sufficiently indicate social importance but should rather be evaluated as indicators of the larger construct of social validity.
Visual analysis was used to rate this quality indicator. In this study, visual analysis was first used to rate the extent to which the study author(s) attempted to demonstrate an intervention effect at three different points in time or with three different phase repetitions (Kratochwill et al., 2010). Next the “level” of that effect (i.e., the extent to which the predicted change in the dependent variable covaried with the manipulation of the independent variable) was assessed by examining (a) level, trend, and variability of data within experimental phases; (b) immediacy of effect (i.e., magnitude of change in level, trend, and variability between phases); (c) overlapping data across adjacent phases; and (d) consistency of data patterns in similar phases (Parsonson & Baer, 1978). Each study was visually analyzed by the authors and rated for level of effect using a dichotomous scale (i.e., yes = quality indicator present, no = quality indicator not present) adapted from Kratochwill et al. (2010) by the authors for the purposes of this review. This quality indicator was judged to be present if there was a strong or moderate demonstration of effect. A “strong” rating indicated that the data showed (a) an immediate and significant change in level, trend, and/or variability; (b) limited or no overlapping data between phases; and (c) consistency across similar phases, for all participants. Studies were rated as demonstrating a “moderate” effect if either the data established two clear demonstrations of effect with a third effect judged as present “with reservations” (e.g., significant data overlap due to initially high rates of variability during baseline or intervention), or the data provided three demonstrations of effect, and also included one or more demonstrations of a noneffect.
Quality Indicator 3: Implementation of the independent variable is practical and cost-effective
The third quality indicator was defined as a demonstration that the intervention was practical and feasible given the resources available to typical stakeholders. This quality indicator was considered present if either cost-effectiveness was explicitly stated or the intervention was delivered by typical agents (e.g., teachers) with typical resources (e.g., classroom materials; Lane et al., 2009).
Quality Indicator 4: Social validity is enhanced by implementation of the independent variable over extended time periods by typical intervention agents, in typical physical and social contexts
The fourth quality indicator was defined as a demonstration that typical stakeholders considered the intervention to be not only acceptable and worthwhile but also something they would consider implementing over the long term. This quality indicator was considered present if typical agents in the setting reported that (a) the procedures were acceptable, (b) the procedures were feasible with available resources, or (c) they would choose to continue use of the intervention after the research project ended (Lane et al., 2009). We rated the fourth quality indicator as present if any one of these three components was documented.
We rated each study with a yes or no rating on every component of all four quality indicators. We also noted how, when applicable, these components of social validity were being measured. For instance, we noted whether researchers included a specific social validity measure. We also recorded whether social validity was explicitly addressed, and how researchers reported social validity results.
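The liberal scoring rule described above (each component rated yes or no, with a quality indicator counted as present when any one of its components is rated yes) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the coding tool used in this review: the function names, the component labels, and the example ratings are all hypothetical.

```python
from typing import Dict, List

# One study's ratings: indicator name -> {component label -> yes/no}.
StudyRatings = Dict[str, Dict[str, bool]]

# Hypothetical ratings for ONE study (True = "yes", False = "no").
# The component labels are illustrative shorthand, not the wording of Table 2.
example_study: StudyRatings = {
    "QI1": {"high_social_importance": True, "stakeholder_input": False},
    "QI2": {"strong_or_moderate_effect": True},
    "QI3": {"cost_effectiveness_stated": False, "typical_agents_and_resources": True},
    "QI4": {"acceptable": False, "feasible": False, "would_continue": False},
}


def indicator_present(components: Dict[str, bool]) -> bool:
    """Liberal rule: the indicator is present if ANY component is rated yes."""
    return any(components.values())


def score_study(study: StudyRatings) -> Dict[str, bool]:
    """Collapse component-level ratings into one yes/no per quality indicator."""
    return {name: indicator_present(parts) for name, parts in study.items()}


def percent_meeting(studies: List[StudyRatings], indicator: str) -> float:
    """Percentage of studies in which the given indicator is present."""
    if not studies:
        raise ValueError("at least one study is required")
    met = sum(indicator_present(study[indicator]) for study in studies)
    return met / len(studies) * 100


scores = score_study(example_study)
print(scores)
```

Under this rule, a study is credited with an indicator even when only one of several components is documented, which is why component-level results are also worth reporting alongside the indicator-level summary.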
Typical stakeholders continue to utilize intervention methods after formal supports are removed, and outcomes generalize across contexts
As noted by Horner et al. (2005), a demonstration that the typical agents “choose to continue use of the intervention procedures after formal support/expectation of use is removed” is a compelling sign of the social validity of a research project (p. 172). We accepted stakeholder reports (e.g., survey responses) of an expectation to continue use of an intervention as evidence that a study met the fourth quality indicator; however, we also rated whether there were any measures of maintenance or generalization reported in each study. While demonstrations of maintenance and generalization are not directly representative of continued intervention use, we consider them to be closely intertwined with this final component of social validity. To be considered truly “effective,” interventions must be capable of producing generalized behavior change that maintains across time (Baer et al., 1968). Interventions that are viewed as socially valid are more likely to be continued after the researchers have removed formal supports and, as a result, are more likely to produce outcomes that endure over time and generalize to novel settings. Therefore, while not sufficient to ensure social validity, generalization and maintenance can be viewed as necessary components of socially meaningful outcomes.
Interrater Agreement (IRA)
IRA was calculated using the overall agreement approach across two phases of the review process, one for ratings of experimental control and one for ratings of the quality indicators. The formula for this approach was A ÷ (A + D) × 100%, whereby the total number of agreements (A) was divided by the total number of agreements plus disagreements (A + D), and the resulting quotient was multiplied by 100% (C. H. Kennedy, 2005).
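As a worked illustration of the overall agreement approach (agreements divided by agreements plus disagreements, multiplied by 100), the calculation can be sketched in Python. The function names and the example ratings below are hypothetical and are not data from this review; the sketch assumes each rater's yes/no ratings are stored in the same order.

```python
from typing import Sequence


def overall_agreement(agreements: int, disagreements: int) -> float:
    """Overall agreement: A / (A + D) * 100, returned as a percentage."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one rating is required")
    return agreements / total * 100


def agreement_from_ratings(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Count item-by-item matches between two raters, then apply the formula."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must rate the same items")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    disagreements = len(rater_a) - agreements
    return overall_agreement(agreements, disagreements)


# Hypothetical yes/no ratings from two raters on four coding components.
primary = [True, True, False, True]
secondary = [True, False, False, True]
print(agreement_from_ratings(primary, secondary))  # 3 agreements out of 4 -> 75.0
```

Note that the parentheses matter: dividing A by the sum (A + D) before multiplying by 100 is what yields a percentage between 0 and 100.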
First, the 22 studies were independently rated and reviewed to examine whether they met the experimental control criteria outlined above. All three authors were involved in this process, with one author serving as the primary rater and the other authors serving as secondary raters. IRA was calculated on 100% of the studies. IRA for experimental control criteria was 94% (range = 88%–100%). The first and second authors then independently rated these studies on the four Horner et al. (2005) quality indicators. One study was randomly selected to establish an acceptable rate of IRA (Ardoin, Williams, Klubnik, & McCall, 2009). From there, IRA was calculated on each specific component of our coding definitions and then averaged for each quality indicator on 63% (n = 14) of the remaining studies. IRA for the four quality indicators was 93% overall (range = 83%–100%). Specifically, IRA was 93% for Quality Indicator 1, 100% for Quality Indicator 2, 96% for Quality Indicator 3, and 83% for Quality Indicator 4. The majority of disagreements on Quality Indicator 4 concerned the second component of this indicator, which was coded as present if typical agents reported that the procedures or interventions were feasible with available resources. These disagreements centered on rater interpretations of “feasibility”; however, as IRA rates were still considered within an acceptable range, each disagreement was discussed until consensus was reached, and a final rating was agreed upon.
Results
Results are reviewed in regard to the two primary research questions. First, we explored how many studies in our review met the quality indicators outlined by Horner et al. (2005). Each article was coded across the social validity quality indicator components, and these results are presented in Table 3. Second, we provide results related to the ways in which these studies addressed additional aspects of social validity. Overall, we found that although many studies do not address social validity explicitly, half met all of the quality indicators based on our criteria for this review. Table 4 presents an overview of these findings across all included studies.
Applications of Social Validity Quality Indicators Across Articles.
Percentage of Studies that Included the Quality Indicators for Social Validity (Horner et al., 2005).
Quality Indicator 1
The dependent variable is socially important. All 22 studies made a case that the dependent variable was socially important in their introduction. All 22 studies established this social importance with support from a review of the literature. More specifically, in all but one study, the authors explained why the dependent variable was socially important for students who have been identified as having or as at-risk of emotional and behavioral problems. For example, C. Kennedy and Jolivette (2008) clearly established that suspension and removal from instruction is a consistent problem for students with EBD, and they posited that increasing praise and positive reinforcement for these students may result in increased instructional time. Ardoin et al. (2009), however, made a case for increased focus on reading, but they did not specifically link this to the needs of students with EBD. Only one of the studies established the importance of the dependent variable through consultation with the participants themselves (Billingsley, Scheuermann, & Webber, 2009). However, 14 other studies did include some stakeholder input that helped to establish that the goals of the study were wanted by society, such as surveying teachers about the importance of study goals or the feasibility of study procedures at baseline, including teacher input in the development of performance criteria, or nominating participants. These examples, while imperfect proxies of this aspect of social validity, were considered to represent an effort to include typical stakeholders’ input from the outset of the intervention.
Quality Indicator 2
The magnitude of change in the dependent variable resulting from the intervention is socially important. All of the studies met our criteria for the magnitude of change in the dependent variable being socially important. We rated 10 out of the 22 studies as having a strong effect, and 12 of the studies as having a moderate effect. It should be highlighted again that these ratings were based on our visual analysis and should therefore not be interpreted as the opinions of the study authors; however, our ratings typically aligned with authors’ conclusions, and we had strong rates of agreement across the strong and moderate coding categories using the rating scale described above. For instance, Turton, Umbreit, and Mathur (2011) provided an example of a “strong” effect, with the data demonstrating clear, immediate, and sustained changes in level and trend following the introduction of the intervention for all three participants. One study that was rated as demonstrating a moderate effect was Carter and Horner (2009). The data from this study showed immediate changes in level and trend with limited data overlap for the first two participants. For the third participant, while there was an immediate change in level following intervention, there was also a significant amount of overlapping data due to relatively high rates of variability in baseline and intervention phases.
Quality Indicator 3
Implementation of the independent variable is practical and cost-effective. None of the studies included in this review provided specific information about the cost of the intervention under investigation, or a detailed description of the cost of the resources (e.g., materials, training) needed to implement the intervention. However, 14 of the studies demonstrated the use of “practical” interventions in that the intervention was delivered by typical agents in typical contexts with typical resources. For example, Waller and Higbee (2010) noted that a paraprofessional was trained to conduct the functional analysis conditions and deliver the intervention, and that instruction was delivered by the classroom teacher, which was typical of those studies that met these criteria. Studies that did not meet this component were varied, but most consisted of highly trained researchers or research assistants delivering interventions, such as the one-on-one training delivered by researchers to teachers in the Trussell, Lewis, and Stichter (2008) study, or the use of time or material resources that were not considered typical in school settings.
Quality Indicator 4
Social validity is enhanced by implementation of the independent variable over extended time periods by typical intervention agents, in typical physical and social contexts. To receive credit for this quality indicator, studies had to include an assessment of participants’ views on the intervention. Fifteen of the studies included participants’ reports of the acceptability of procedures. Thirteen studies included reports of whether participants would continue the intervention. Five studies included information on typical stakeholders’ reports of the feasibility with available resources. The majority of studies that did not meet this quality indicator simply did not include an assessment of participants’ views of the intervention (e.g., Ardoin et al., 2009; Daly et al., 2009). Those studies that did meet these criteria generally provided information obtained through interviews, anecdotal conversations, or questionnaires that typically asked participants to address each component (e.g., “How acceptable and socially appropriate were the interventions?” “How likely are you to continue using the intervention?” Carter & Horner, 2009). One study (Mason, Kubina, Valasa, & Mong-Cramer, 2010) used a generalization essay prompt that asked student participants whether other students should be taught how to write using the delivered intervention.
Additional Indicators of Social Validity
Measures of maintenance and generalization were also included as indicators of social validity, as they determine the sustainability of the intervention and its effects (C. H. Kennedy, 2005). Twelve of the 22 studies measured maintenance; 11 of these measured maintenance with the intervention supports removed. Seven of the 22 studies directly measured generalization. Most of the maintenance probes were administered within a few weeks of the end of the study; however, one study (Mastropieri et al., 2009) measured maintenance 11 weeks after the end of the intervention and the removal of researcher supports. Generalization was typically measured across settings or interventionists (e.g., Beare, Torgenson, & Creviston, 2008; Turton et al., 2011); however, several studies also presented different stimuli, such as Mason et al. (2010), who included a standardized writing assessment along with the essay probes that were used as the primary dependent measures. Studies that did not meet these criteria typically did not include a maintenance or generalization probe.
Social validity was addressed, assessed, and included in a variety of ways throughout these studies. The large majority of studies directly addressed some aspect of social validity in their methods and discussion sections, though six studies did not explicitly address social validity in any form. Two studies (Carter & Horner, 2009; Hawkins & Heflin, 2011) addressed the importance of social validity in their introduction, and two studies addressed social validity within one of their primary research questions (C. Kennedy & Jolivette, 2008; Mason et al., 2010). Sixteen out of the 22 studies included a specific measure of social validity. Thirteen of the 22 studies utilized a questionnaire or survey to obtain social validity information. Two studies included interviews, two included anecdotal reports, and one study (Mason et al., 2010) included students’ written essay responses to a social validity prompt. Despite this range of social validity measures, authors generally summarized the results rather than providing the same type of objective reporting used for the dependent variable(s); however, three studies (Carter & Horner, 2009; Hawkins & Heflin, 2011; Todd, Campbell, Meyer, & Horner, 2008) reported full social validity results in table formats.
Discussion
The findings of this review suggest that researchers using single-case design to investigate outcomes of intervention practices for students with EBD are, at least in part, addressing the recommendations outlined by Horner et al. (2005) when it comes to social validity. Overall, data from the current study suggest that researchers in the area of EBD are investigating socially important outcomes and utilizing research designs focused on practical interventions implemented within typical contexts by typical interventionists. All of the studies examined measured socially important dependent variables and reported socially important magnitudes of change in those dependent variables, which is unsurprising given that these indicators are commonly included, and arguably expected, for publication. Over half of the studies examined practical interventions, and a similar number measured whether typical agents felt the intervention could be used in their typical contexts.
The explicit use of social validity measures, however, still does not appear to be universal practice. Although 50% of the studies met at least one component of all four quality indicators, none of the studies met all components of all four social validity indicators, and only one study (Turton et al., 2011) met all but one component (cost-effectiveness). Even with the broad definitions used in this exploratory review, many researchers were not fully addressing the recommendations of Horner et al. (2005). For example, the fourth quality indicator assessing use in typical contexts was only reported in 68% of the studies. When it came to maintenance and generalization, the numbers were even lower.
The findings of this review might best be thought of as a starting point for further research and development. Although almost all of the dependent variables included in this review were rated as being socially important, only one study (Billingsley et al., 2009) included practitioner input when developing their intervention. For the remainder of the studies, rather than gathering input from practitioners, students, or their families, thorough literature reviews were used to provide a rationale for why the dependent variable(s) were of social importance. With researchers consuming and contributing to the literature, this leads to a situation where researchers are essentially driving the measure of social need, rather than explicitly consulting and collaborating with practitioners to determine the research questions that are most valued in typical contexts.
This lack of explicit consultation and collaboration with typical stakeholders needs to be considered in light of the evidence of the overrepresentation of students of color in the EBD population, and concerns about cultural competency with students in high-incidence disability categories (Blanchett, 2006; Skiba, Poloni-Staudinger, Gallini, Simmons, & Feggins-Azziz, 2006). For this review, Horner et al.’s (2005) first quality indicator was in part a measure of whether the goals of the study were valued by society, with society being operationally defined as typical behavior-change agents, family members, and the participants themselves. The fact that very few of these key stakeholders were consulted in the identification of dependent variables or development of outcome goals is particularly problematic given the cultural assumptions and power dynamics that may be in play when researchers are independently making decisions. It is also interesting to note that 7 of the 22 studies did not report the racial or ethnic background of their participants. Information about contextual and cultural variables is clearly important to consumers of research attempting to determine whether an intervention can effectively be implemented within a given context, and whether the variables and procedures being researched are aligned with the values of the population with whom they are working.
Consultation with practitioners is also necessary to determine whether the magnitude of change in the dependent variable is socially valid. For instance, Ardoin et al. (2009) noted that six rereadings of basal passages resulted in minimally higher Words Read Correctly per Minute than three rereadings of the passages. Both interventions resulted in improvement, but no measures of teacher or student views of acceptability were taken. It is possible that, although students showed minimal improvement with six compared with three rereadings, the added time, repetition, and behavior management that might be needed to support six rereadings would limit the social validity of the extended intervention. Given the importance of utilizing effective interventions to support students with EBD in their academic, social, and behavioral learning, researchers need to consider the reality that typical intervention agents are often supporting these students with limited resources, competing demands, and values that may differ from those of researchers.
The fourth quality indicator, use in typical contexts, was counted as present if explicit measures or reports were obtained from the typical agents, or if maintenance or generalization were included. Seventeen studies met this quality indicator. Despite the measurement of social validity being a well-established component of single-case research design (C. H. Kennedy, 2005; Wolf, 1978), these measures, along with maintenance, were often included in a limited fashion. Questionnaire and interview data were often included only in part or limited to a brief summary of what agents reported. Similarly, most researchers did not return to their research site to assess whether the intervention or the effect of the intervention maintained over time. Mastropieri et al. (2009) collected maintenance probes 11.5 weeks after they conducted their intervention, but this was not the norm. Most other researchers who collected maintenance probes conducted them within a few days of the intervention phase. In addition, only seven of the studies reported generalization data. This is particularly problematic, because failing to demonstrate that changes in behavior generalize outside of the experimental setting and maintain over time limits the claims that can be made regarding the “social validity” of those outcomes.
Future Directions: Implications for Future Practice and Research
Our discussion of future directions is aimed at researchers as well as educators, as we view the link between the two as essential to socially important research and good practice. In practice, a previously researched intervention with high social validity but a smaller demonstration of an effect may be a better choice than an intervention with low social validity but a strong effect. The movement toward evidence-based practices is not limited to practices themselves, but includes factors associated with the process of embedding and sustaining effective practices within educational systems (Carter & Horner, 2009). Without regard for factors such as cost, feasibility, and acceptability, an intervention may be effective within the context of research but ultimately never improve outcomes for the populations for which it was intended. Because of limited resources, educators may choose to adopt an intervention that is more cost-effective and easier to implement over an intervention that has been shown to be more effective but less feasible (Merrell & Buchanan, 2006). Information about the social validity of interventions as well as the actual costs, lists of materials, or prerequisite skills of an implementer can aid educators as they consider practical and contextual factors that guide the selection of interventions and maintenance of evidence-based practices over time.
Collaboration among educators, families, and researchers continues to be a critical feature of embedding social validity into the entire research process. There were several noteworthy examples of collaboration within this review. C. Kennedy and Jolivette (2008) and Mason and colleagues (2010) addressed social validity in one of two primary research questions, giving the evaluation of social validity equal weight with the effectiveness of the intervention under investigation. Beare et al. (2008) used a collaborative approach throughout the planning process and designed a study that first and foremost met the needs of the participant. Finally, Turton et al. (2011) used teachers’ ratings of intervention feasibility to select participants, and social validity was measured at baseline and post-intervention.
Without input from typical stakeholders, potential generalization and collaboration between researchers, educators, and families may be limited. This is a particularly important consideration when working with students and families from culturally diverse groups and creating interventions and practices that address the dynamic role that culture plays in developing and modifying behaviors (Arzubiaga, Artiles, King, & Harris-Murri, 2008). Thus, we recommend obtaining information on social validity from multiple sources (e.g., parents, teachers, administrators, and students) and providing sufficient information about the cultural and social variables of the context and participants in an effort toward adopting more culturally responsive approaches to conducting research.
Finally, one of the biggest challenges we faced throughout the review process was the lack of explicit measures of social validity in the research we reviewed. In fact, only 4 of the 22 studies included in this review used a research-based social validity measure (Little et al., 2010; Schoenfeld & Mathur, 2009; Todd et al., 2008; Turton et al., 2011). The inclusion of more standardized social validity measures (i.e., rating scales) such as the ones used in the studies we reviewed (e.g., Intervention Rating Profile-15; Witt & Elliott, 1985) is needed to make more objective comparisons between studies. Without the use of psychometrically established measures, it will remain difficult to assess the true state of single-case research as it relates to social validity.
Limitations
The findings of this review of current single-case research offer some important insight into the ways social validity is being addressed in the EBD literature; however, this review provides merely a “snapshot” of the current use of social validity in single-case research involving students with EBD, and studies from a limited number of journals and years were included in the review. Further research is needed to explore whether the field has made changes or improved social validity measurement since the Horner et al. (2005) recommendations were published. In addition, due to the subjective nature of the “social validity” construct and a lack of standardized social validity measures, we developed operational definitions for each of the Horner et al. quality indicators, and different definitions may have led to different findings. For example, our definition of Quality Indicator 2 required studies to demonstrate a functional relation between the introduction of the independent variable (IV) and a change in the dependent variable (DV). We interpreted that change as socially valid if the studies established the DV as socially important in Quality Indicator 1; however, this combination may have overlooked instances where a moderate change in a socially valid DV did not adequately meet the defined clinical need. In addition, the specificity of the definitions led to some studies receiving credit for indicators of social validity when their actual measures were highly limited, while other studies did not receive credit even though aspects of the study design or intervention did appear to enhance social validity. For example, Billingsley et al. (2009) included a discussion of the fact that the lead teacher, who was also an author and involved in the study design, felt that the intervention procedures allowed her to individualize instruction for her students and were quite useful in her classroom. Social validity, however, was not explicitly measured, and the three components of Quality Indicator 4 were not specifically addressed, so this study did not receive credit for this quality indicator. Future research should address these issues, and these operational definitions should be viewed as a starting point for measuring the quality indicators recommended by Horner et al.
Conclusion
Single-case research is making important contributions to the evidence-based practice movement. This is particularly the case with research investigating intervention outcomes for children and adolescents identified as having disabilities, including those with emotional and behavioral disorders. As single-case methodology becomes more rigorous (e.g., Horner et al., 2005; Kratochwill & Levin, 2009), so should the measurement of social validity. We believe that an explicit emphasis on social validity throughout the research process and information about how intervention outcomes generalize and maintain over time will maximize the impact of single-case research on practice, thereby improving outcomes for children, educators, and families.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
