Abstract
Restricted interests and repetitive behaviors vary widely in type, frequency, and intensity among children and adolescents with autism spectrum disorder. They can be stigmatizing and interfere with more constructive activities. Accordingly, restricted interests and repetitive behaviors may be a target of intervention. Several standardized instruments have been developed to assess restricted interests and repetitive behaviors in the autism spectrum disorder population, but the rigor of psychometric assessment is variable. This article evaluated the readiness of available measures for use as outcome measures in clinical trials. The Autism Speaks Foundation assembled a panel of experts to examine available instruments used to measure restricted interests and repetitive behaviors in youth with autism spectrum disorder. The panel held monthly conference calls and two face-to-face meetings over 14 months to develop and apply evaluative criteria for available instruments. Twenty-four instruments were evaluated and five were considered “appropriate with conditions” for use as outcome measures in clinical trials. Ideally, primary outcome measures should be relevant to the clinical target, be reliable and valid, and cover the symptom domain without being burdensome to subjects. The goal of the report was to promote consensus across funding agencies, pharmaceutical companies, and clinical investigators about advantages and disadvantages of existing outcome measures.
Introduction
The detected prevalence of autism spectrum disorder (ASD) has increased over the past decade with a current estimate of 1.1% in school-aged children (Centers for Disease Control and Prevention (CDC), 2012; www.cdc.gov). In the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5), ASD is characterized by qualitative impairments in social interaction and social communication, as well as restricted interests and repetitive behavior (RRB; American Psychiatric Association (APA), 2013). RRBs vary widely in type, frequency, and intensity among children with ASD. In many cases of ASD, RRBs can be stigmatizing, can interfere with more constructive activities, and contribute to the overall disability (Honey et al., 2012b; Lam et al., 2008; Scahill et al., 2006a; South et al., 2005; Turner-Brown et al., 2011). Accordingly, RRBs may be a target of intervention (King et al., 2009). Several instruments have been developed to assess RRBs in the ASD population, but the rigor of psychometric assessment is variable and few measures have a solid track record as an outcome measure in clinical trials.
The challenges of measuring RRBs in ASDs are manifold. First, ritualistic behaviors are common in young typically developing children (Arnott et al., 2009; Leekam et al., 2007). Thus, depending on the age of the child, it may be difficult to distinguish normal ritualistic behavior from symptomatic RRBs in ASD (Goldman et al., 2009). Second, repetitive behaviors are defining features of Tourette syndrome and obsessive–compulsive disorder (OCD). Tourette syndrome is defined by persistent motor tics and phonic tics such as blinking, facial grimacing, head jerking, grunting, coughing, or throat clearing. Some children exhibit more complex tics such as arm thrusting, twirling, blurting out words, and, in rare cases, uttering obscene words (Scahill et al., 2006a). In a given child, tics and stereotypic behavior may be difficult to differentiate, and some children with ASD may have both.
OCD is characterized by recurrent intrusive thoughts or worries (obsessions), repetitive behaviors (compulsions), or both. The obsessions and compulsions are time-consuming, distressing, and interfere with everyday living. Common obsessions in children with OCD include worries about contamination, harm coming to the self or family members, or need for symmetry. Common compulsions include hand-washing, checking, touching in complex patterns, repeating routine activities such as moving back and forth across a doorway, and persistent requests for reassurance in response to specific fears (Scahill et al., 2003). Children with OCD usually describe their worries as unpleasant and difficult to dislodge and their compulsions as difficult to control.
By contrast, repetitive behaviors in children with ASD may not cause distress and children often appear to have a strong motivation to perform repetitive behaviors. However, the function of RRBs is not always clear (Honey et al., 2012a). The same repetitive behavior may be used for self-stimulation or to reduce high levels of arousal. For example, parents may comment that the child’s hand flapping increases during periods of excitement, suggesting overstimulation or an attempt to reduce arousal. Rocking behaviors are not specific to ASD and may provide a form of self-soothing. Distress and tantrums may occur when the child is instructed to stop an apparently preferred behavior. The drive to engage in the preferred behavior may contribute to noncompliant and disruptive behavior (McDougle et al., 2005; Scahill et al., 2006b). Although many patients with ASD do not appear distressed by their repetitive behaviors, some are troubled by intrusive thoughts or compulsions in a manner that suggests coexisting OCD (McDougle et al., 1995). Thus, careful assessment of the context and triggers of the repetitive behavior is warranted.
A third measurement challenge for RRBs is the diversity of these behaviors in children with ASD. Behaviors can range from stereotyped motor behaviors such as hand flapping, rocking and flipping an object in front of the eyes, lining up toys, and repetitive self-injury to more complex behaviors such as repeating phrases from movies, watching the same video segment over and over, and insisting on following routines in everyday living (e.g. getting dressed or ready for bed in a ritualized sequence). Age and intellectual functioning affect the expression of RRBs, but simple typologies may be difficult to identify. For example, stereotypic behavior appears more common in young children (Esbensen et al., 2009), but these behaviors occur in adults as well, especially in lower functioning individuals (Rojahn et al., 2000). In a sample of 830 children from 15 months to 11 years of age, Bishop et al. (2006) observed that stereotypic behavior and repetitive use of objects are common in children with ASD below 5 years of age across the full range of intelligence quotient (IQ). In children over 5 years of age, however, the frequency of stereotypic behavior persisted in children with nonverbal IQ below 70 but declined with age in children in the normal IQ range. This finding suggests an interaction between age and intellectual functioning. Preoccupations with fans, air conditioners, garage door openers, train schedules, or historical facts and figures are more common in higher functioning children (South et al., 2005; Turner-Brown et al., 2011). The behavioral correlate of such preoccupations may include repeated questioning or expounding on the topic long after the listener’s interest has waned. Preoccupations and perseverative behavior can occur in lower functioning youth, but incessant talk about the preferred topic requires language.
The heterogeneity of RRBs in children with ASDs has prompted attempts to classify behaviors into subtypes with inconsistent results. Based on expert opinion, the World Health Organization (WHO, 2007) identified four subtypes of RRBs: (a) preoccupations with part-objects or nonfunctional elements of materials, (b) stereotyped and repetitive motor mannerisms, (c) preoccupations or circumscribed patterns of interest, and (d) compulsive adherence to specific nonfunctional routines or rituals. These subtypes are not entirely consistent with factor analyses conducted over the past decade, showing two factors (repetitive motor behaviors and insistence on sameness) (see Honey et al., 2012b, for a detailed review). This two-factor structure is consistent with the classification of behaviors as lower order (e.g. stereotypy) or higher order (more complex RRBs) (Turner, 1999), but the two-factor model has not been reported in all studies. The differences across studies may be due to differences in sample ascertainment, age, and developmental level of the sample, as well as the selection of the repetitive behavior in the analysis. For example, several studies did not include circumscribed interest as an item in the analysis. In a study of 316 school-aged children, Lam et al. (2008) did include circumscribed interest and identified it as a third factor.
A fourth challenge is the wide variability in language and cognitive functioning in children with ASD. Although some children with ASD may be able to describe the purpose for their repetitive behaviors, nonverbal children cannot. Even children with language may have limited insight into the motivation for their behavior. Thus, assessment of RRBs in children with ASD is likely to depend on behavior reported by parents or observed during clinical assessment.
Finally, self-injurious behaviors (SIBs) pose a particular challenge in the evaluation of repetitive behavior in children with ASD. Bishop et al. (2006) observed SIB in 20%–58% of their sample of 830 children, with the highest percentage in children between 6 and 11 years with nonverbal IQ < 50. Behaviorists note that SIB often serves a function such as attention from an adult, access to a preferred item or activity, or escape from environmental demands (Iwata et al., 1994). In some children, especially lower functioning children, SIB appears to be stereotypic in nature (Brown et al., 2002). Moreover, SIB is not specific to ASD. In a sample of 943 children with intellectual disability (aged 4–18 years) from special schools in Britain, Oliver et al. (2012) observed SIB in 17% of the sample and aggressive behavior in 40%, with high co-occurrence of SIB and aggression. SIB was associated with lower functioning and with a high frequency of other repetitive behaviors. Thus, SIB in youth and adults with ASD warrants careful consideration and classification as an RRB or a serious maladaptive behavior serving a social (e.g. attention), instrumental (e.g. tangible item), or escape function.
Honey et al. (2012b) examined the psychometric properties of frequently used instruments for measuring RRBs in ASD, although the review was not focused on outcome measurement in clinical trials. This article extends this review by ranking the readiness of available measures for use as an outcome measure in clinical trials. The goal of this report is to promote consensus across funding agencies, pharmaceutical companies, and clinical investigators about the advantages and disadvantages of existing outcome measures.
Methods
Identification of existing measures
In 2011, Autism Speaks empanelled workgroups to evaluate outcome measures for clinically meaningful targets, including social communication deficits (Anagnostou et al., in preparation), anxiety (Lecavalier et al., in press), and repetitive behavior. Workgroup members included investigators with clinical trial expertise in ASD or assessment of RRBs in ASD, as well as Autism Speaks science program staff. The workgroup had monthly conference calls and two face-to-face meetings over the course of 14 months.
Measures considered for evaluation by the three workgroups were identified from PubMed, Web of Knowledge, Google Scholar, and clinicaltrials.gov with the search terms “autism and clinical trial” and “autism and treatment,” covering literature from 2005 to 2010. The initial literature survey was limited to pharmacological treatments or complementary and alternative medicines and to studies in English. Relevant published data in other languages were noted. Additional measures were considered for evaluation based on expertise or knowledge of the workgroup. Measures with potentially relevant subscales were included for consideration. A follow-up search was conducted of literature from 2005 to 2012 that included trials of behavioral interventions with group designs using the term “autism” and the terms “clinical trial,” “trial,” “randomized,” and “treatment.” The title of each measure was also searched with the terms “clinical trial,” “randomized,” and “trial.” Additional databases included PsycINFO, PsycARTICLES, and Linguistics and Language Behavior Abstracts. The instrument manuals were obtained and the authors and publishers of each instrument were contacted and invited to share pertinent information.
Evaluation of measures
The literature review identified 51 published reports and 24 measures. The measures were categorized according to their type (questionnaire, observation, interview), respondent type (clinician-completed, informant-completed, or self-assessed), breadth of symptom domains (clinical relevance), age and developmental range, precedent for use in ASD or other developmental disabilities, availability in other languages, burden on the patient, caregiver, clinician and investigator, and sensitivity to change. The central evaluative criteria included evidence of content and construct validity, internal consistency, interrater reliability (especially for clinician-rated measures) and test–retest reliability. Evaluations across the three workgroups were consensus based and used all available published documentation. Key considerations for the final classification were clinical relevance and strength of the measure’s psychometric properties. Slight differences in how these were defined emerged across the three workgroups due to differences among the actual measures evaluated. Other criteria were acknowledged and could be used to judge suitability for specific studies but were not core criteria used for the overall rating. A full description of the criteria is included in Appendix 1.
Based on the above-mentioned criteria, instruments were characterized as follows: appropriate, appropriate with conditions, potentially appropriate, unproven, or not appropriate. Due to minor differences in the available measures across symptom domains, there were slight differences across the three workgroups on the definition of these categories. Table 1 presents definitions used to evaluate RRB measures. The classification of appropriate, appropriate with conditions, or potentially appropriate was based on the judgment that the content of the instrument was clinically relevant. Following this essential criterion, we examined the data on reliability and validity to classify the measure. Sensitivity to change is an essential requirement for an outcome measure in clinical trials. Given that only a handful of adequately powered trials focused on RRBs have been completed in youth with ASD, however, promising outcome measures could be overlooked if this criterion were rigidly applied in this review. Therefore, measures with demonstrated sensitivity to change were acknowledged for achieving this benchmark. Measures without such evidence would not be dismissed as inappropriate if other evaluative criteria were met and, in the opinion of the workgroup, the measure had promise as a measure of change.
Table 1. Classification of outcome measures.
ASD: autism spectrum disorder; ADOS: Autism Diagnostic Observation Schedule.
Results
Twenty-four instruments were evaluated, and five were considered appropriate with conditions for use as an outcome measure in clinical trials. Table 2 summarizes the evaluation of each measure considered in this review (for a detailed description, see Appendix 1).
Table 2. Descriptive information and evaluative criteria for RRB measures considered appropriate with conditions for youth with ASD.
RRB: restricted interests and repetitive behavior; ASD: autism spectrum disorder; DD: developmentally disabled; PDD: Pervasive Developmental Disorder.
Measure has been translated into one or more non-English languages, but reliability and validity of translation may vary; there is limited information on translations; a = sensitivity to change not clearly demonstrated; b = consists of multiple subscales, some with few items; c = includes a narrow set of behaviors; d = has limited range of scores; e = atypical scoring (items not scored on same metric); f = checklist items remain weighted toward obsessive–compulsive disorder; g = lack of variability for the resistance item.
Appropriate with conditions
Children’s Yale-Brown Obsessive Compulsive Scales for Pervasive Developmental Disorder
The Children’s Yale-Brown Obsessive Compulsive Scales for Pervasive Developmental Disorder (CYBOCS-PDD) is a modified version of the original CYBOCS, which was designed to measure symptom severity in children with OCD (Scahill et al., 1997, 2006). The original CYBOCS is a semistructured clinician interview that includes a separate symptom checklist for obsessions and compulsions. The checklists are reviewed jointly with parent and child to identify the obsessions and compulsions present over the previous week. Once the primary symptoms are established, the interviewer considers the five severity dimensions: time spent, interference, distress, resistance, and degree of control separately for obsessions and compulsions. Each dimension is rated from 0 to 4 with higher scores reflecting greater severity. Thus, the original CYBOCS yields three scores: total for obsessions (0–20), total for compulsions (0–20), and overall total (0–40). This scale has demonstrated reliability, validity, and sensitivity to change following treatment in children with OCD (Riddle et al., 2001; Scahill et al., 1997).
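The scoring arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration rather than the published scoring procedure: the dimension keys, function names, and example ratings are hypothetical and chosen only to show how five 0–4 ratings per domain yield the three scores.

```python
from typing import Dict, Mapping

# Hypothetical keys for the five severity dimensions named in the text.
DIMENSIONS = ("time_spent", "interference", "distress", "resistance", "control")


def severity_total(ratings: Mapping[str, int]) -> int:
    """Sum five severity ratings (each 0-4) into a 0-20 subtotal."""
    for name in DIMENSIONS:
        if name not in ratings:
            raise ValueError(f"missing rating for {name}")
        if not 0 <= ratings[name] <= 4:
            raise ValueError(f"{name} must be between 0 and 4")
    return sum(ratings[name] for name in DIMENSIONS)


def cybocs_scores(obsessions: Mapping[str, int],
                  compulsions: Mapping[str, int]) -> Dict[str, int]:
    """Return the obsessions total, compulsions total, and overall total."""
    obs = severity_total(obsessions)
    comp = severity_total(compulsions)
    return {"obsessions": obs, "compulsions": comp, "total": obs + comp}


# Illustrative ratings only; not data from any study.
example_obsessions = {"time_spent": 2, "interference": 2, "distress": 2,
                      "resistance": 2, "control": 2}
example_compulsions = {"time_spent": 3, "interference": 2, "distress": 2,
                       "resistance": 4, "control": 3}
print(cybocs_scores(example_obsessions, example_compulsions))
```

Validating each rating before summing keeps an out-of-range entry from silently inflating or deflating a total that is bounded at 20 per domain and 40 overall.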
The modified CYBOCS-PDD for children with ASD excludes the obsessions checklist and the related severity items. This decision was based on the difficulty of ascertaining information about obsessional thoughts in children with ASD, who may have language and cognitive deficits (Scahill et al., 2006b). The modified scale also added several repetitive behaviors that are relevant to ASD (e.g. motor stereotypy, watching the same video over and over, and insisting on routines in everyday situations). In addition, the revised version included minor alterations in the anchor points of the five severity dimensions (each rated 0–4). Thus, the CYBOCS-PDD has a scoring range from 0 to 20 to capture the degree of the child’s involvement in repetitive behavior (time spent) and impairment (e.g. interference and distress if prevented) over the previous week. Analysis of baseline data from 172 participants (aged 5–17 years) in clinical trials conducted by the Research Units on Pediatric Psychopharmacology (RUPP) Autism Network showed that the clinician-rated CYBOCS-PDD is a valid measure for repetitive behavior. It has solid internal consistency (alpha = 0.85) and excellent interrater reliability (intraclass correlation = 0.97) (Scahill et al., 2006). Sensitivity to change with treatment has been demonstrated in a secondary analysis of the RUPP Autism Network placebo-controlled risperidone trial (McDougle et al., 2005).
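For readers less familiar with the internal consistency statistic reported above, the following sketch shows how an alpha coefficient of this kind is conventionally computed, assuming the reported value is Cronbach's alpha: alpha = (k / (k − 1)) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the ratings are illustrative assumptions and are not taken from the RUPP data.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a participants-by-items table of ratings.

    rows: one sequence per participant, each holding k item ratings.
    Sample variances are used for both the items and the total scores.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two participants")
    if any(len(row) != k for row in rows):
        raise ValueError("every participant needs the same number of items")
    item_variances: List[float] = [
        variance([row[i] for row in rows]) for i in range(k)
    ]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up ratings for five participants on five 0-4 items (illustration only).
toy_ratings = [
    [3, 3, 2, 1, 3],
    [1, 2, 1, 0, 1],
    [4, 3, 3, 2, 4],
    [2, 2, 2, 1, 2],
    [0, 1, 0, 0, 1],
]
print(round(cronbach_alpha(toy_ratings), 2))
```

Alpha approaches 1 when items rise and fall together across participants, which is why it is read as an index of how consistently the items measure a common construct.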
The CYBOCS-PDD was used as a central outcome measure in a multisite trial of citalopram in 149 children (aged 5–17 years) with ASD (King et al., 2009). In that study, citalopram was no better than placebo in reducing repetitive behavior. Indeed, there was no change in either treatment group, suggesting acceptable test–retest reliability. The CYBOCS-PDD was also used in a registration trial of a novel formulation of fluoxetine in 158 children with ASDs (http://clinicaltrials.gov/ct2/show/NCT00515320), and once again, there was no difference between drug and placebo. The use of the CYBOCS-PDD in this registration trial suggests that the US Food and Drug Administration (FDA) accepted this instrument as a suitable outcome measure of repetitive behavior in children with ASDs. This is meaningful because the FDA considers the relevance, reliability, and validity of proposed outcome measures carefully for studies seeking to establish a claim of efficacy and safety for a given compound (US FDA, 2009).
The CYBOCS-PDD has several strengths and is ready for use as a primary outcome measure in its current form. Nonetheless, there are areas that require additional research. First, the symptom checklist of the CYBOCS-PDD contains several items (e.g. excessive hand-washing and rituals to prevent harm) that are relevant to OCD but may not be relevant to ASD. Second, the resistance item (scored 0–4) shows little variability in children with ASD (Scahill et al., 2006). Briefly, the resistance item reflects how much the patient struggles against the urge to perform the compulsive behavior. In OCD, resistance is a sign of health. Thus, the patient who pushes against the urge to repeat the behavior most of the time gets a score of 1. The patient who does not resist the urge to perform the compulsion is given a score of 4. In ASD, RRBs are often preferred behaviors, suggesting that the concept of resistance does not apply to ASD.
Repetitive Behavior Scale–Revised
The parent-rated Repetitive Behavior Scale–Revised (RBS-R) measures the severity of repetitive behaviors in children and adults with ASD (Bodfish, 2003; Bodfish et al., 2000). The 43 items were organized into six conceptually derived subscales: stereotyped behavior (hand flapping, object spinning), SIB (head hitting, wrist biting, skin picking), compulsive behavior (washing, arranging, or ordering objects, need to touch or tap things), ritualistic behavior (taking a certain route during travel, scripted conversation during social interactions), sameness behavior (resistance to change, insisting that furniture remain in the same place), and restricted behavior (preoccupation with specific topics). Considering frequency, response when behavior is interrupted, and degree of interference in daily living, the parent (primary caregiver) is asked to score each item on a 4-point scale from 0 (behavior does not occur) to 3 (behavior occurs and is a severe problem).
Mirenda et al. (2010) examined the factor structure of the RBS-R in 287 preschool children. The investigators proposed a three-factor solution (stereotyped + restricted behaviors, SIB, and compulsive + ritualistic + sameness behaviors) and the same five-factor solution identified by Lam and Aman (2007) as the preferred models. A one-factor solution was ruled out.
In a sample of 1825 subjects (4–18 years of age), Bishop et al. (2012) identified five factors similar to the previous studies. These investigators also examined the correlations between five RBS-R factors and two repetitive behavior subscales from the Autism Diagnostic Interview–Revised (ADI-R) (Repetitive Sensory Motor Behaviors and Insistence on Sameness) as well as the association of the RBS-R subscales with age and IQ. The correlations between the RBS-R and ADI-R subscales ranged from 0.18 to 0.57. The strongest correlations were the ADI-R Insistence on Sameness subscale with the RBS-R Ritualistic + Sameness Behavior subscale (r = 0.47) and the ADI-R Repetitive Sensory Motor subscale with the RBS-R Stereotyped Behavior subscale (r = 0.57). These findings suggest good convergent validity between these two RBS-R factors and the two ADI-R subscales. There were no significant associations for age or IQ with RBS-R factors.
The RBS-R has been used as an outcome measure in at least two randomized clinical trials (Hardan et al., 2012; King et al., 2009). Given that citalopram was no better than placebo for reducing repetitive behavior on any measure, the failure to observe change on the RBS-R can be attributed to the lack of efficacy for citalopram. Hardan et al. (2012) evaluated N-acetylcysteine (NAC), which may reduce glutamate transmission, in a sample of 29 children (aged 3–11 years) with autistic disorder. The 14 subjects treated with NAC showed significant improvement on the RBS-R Stereotyped Behavior subscale compared to the placebo group (n = 15). Thus, the evidence for using the RBS-R to measure change with treatment is limited in keeping with the limited support for any treatment for repetitive behaviors in ASD.
Collectively, the available data on the RBS-R support its reliability and validity. The total score does not appear to be useful as an outcome measure because RRBs are not unidimensional. With minor differences, the five-factor solution has been replicated, but only two of these five subscales have more than six items. Subscales with six or fewer items may not provide sufficient symptom coverage or scoring range to serve as primary endpoints in a clinical trial. The three-factor model identified by Mirenda et al. (2010) and replicated by Bishop et al. (2012) would obviate the scoring range problem and reduce the number of outcome measures from five to three; it warrants additional study.
Aberrant Behavior Checklist Stereotypic Behavior subscale
The Aberrant Behavior Checklist (ABC) is a five-factor scale developed to assess treatment effects in patients with intellectual disability (Aman et al., 1985, 1989). The developers of the ABC compiled a list of problem behaviors from the medical records of residents in a developmental center and then asked caregivers to refine these items. Residential caregivers rated 125 items in 418 individuals. Based on preliminary review, this list was subsequently reduced to 76 items and caregivers were asked to rate an independent group of 509 individuals. Across these two samples of individuals with intellectual disability, autism or both, subjects ranged from 5 to 60 years of age. Separate principal components analyses were used to analyze ratings from these two samples. Fifty-eight items were retained on the five factors: irritability (15 items), lethargy/social withdrawal (16 items), stereotypic behavior (7 items), hyperactivity (16 items), and inappropriate speech (4 items). Considering frequency and interference in everyday living activities, informants are asked to rate on a 4-point scale (0 = “not at all a problem” to 3 = “the problem is severe in degree”). The Stereotypic Behavior subscale includes whole-body movements (“rocks body back and forth”) and more specific movements (e.g. “waves or shakes the extremities repeatedly”).
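The item counts and rating scale described above fix the possible score range of each ABC subscale. The sketch below simply multiplies the item counts given in the text by the top rating of 3; the dictionary and function names are hypothetical and the code is not part of the published instrument.

```python
from typing import Dict, Sequence

# Item counts per subscale as listed in the text.
ABC_SUBSCALE_ITEMS: Dict[str, int] = {
    "irritability": 15,
    "lethargy/social withdrawal": 16,
    "stereotypic behavior": 7,
    "hyperactivity": 16,
    "inappropriate speech": 4,
}
MAX_ITEM_RATING = 3  # each item is rated 0-3


def max_subscale_scores(items: Dict[str, int] = ABC_SUBSCALE_ITEMS) -> Dict[str, int]:
    """Highest possible score for each subscale (item count x top rating)."""
    return {name: count * MAX_ITEM_RATING for name, count in items.items()}


def score_subscale(ratings: Sequence[int], subscale: str) -> int:
    """Sum item ratings for one subscale after checking count and range."""
    expected = ABC_SUBSCALE_ITEMS[subscale]
    if len(ratings) != expected:
        raise ValueError(f"{subscale} expects {expected} ratings")
    if any(not 0 <= r <= MAX_ITEM_RATING for r in ratings):
        raise ValueError("each rating must be between 0 and 3")
    return sum(ratings)


print(sum(ABC_SUBSCALE_ITEMS.values()))  # 58 items retained in total
print(max_subscale_scores()["stereotypic behavior"])  # 21
```

With seven items, the Stereotypic Behavior subscale spans 0 to 21, compared with 0 to 48 for the 16-item subscales.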
A separate survey in 601 children, adolescents, and young adults (aged 6–22 years) from special education rosters replicated the same five factors and provided normative data in a broad sample of individuals with developmental disabilities (Brown et al., 2002). This survey also showed that the scores on the Stereotypic Behavior subscale ranged from low to high. The Stereotypic Behavior subscale has also demonstrated sensitivity to change (Aman et al., 1989; RUPP Autism Network, 2002). A potential drawback of the Stereotypic Behavior subscale is that it contains only seven items (mostly describing motor stereotypy and movements of the extremities).
Stereotyped Behavior Scale
The Stereotyped Behavior Scale (SBS) is a 24-item measure of repetitive behavior rated on two scales: a frequency scale (0 (behavior does not occur) to 6 (behavior occurs approximately every 5 min)) and a severity scale (0 (none) to 3 (severe); Rojahn et al., 2000). The investigators moved through several steps to develop this measure. First, using factor analysis, the measure was winnowed to 24 items from a much larger set of items (Rojahn et al., 1997). These 24 items cover common, observable, repetitive behaviors as a single factor. Normative data on the frequency scale came from a sample of 550 attendees in community day programs serving intellectually disabled individuals (aged 13–82 years). Although there were some differences by age and gender, scores in the mid-30s on the SBS frequency scale marked approximately 1.5 standard deviations above the population mean and presumably reflect a clinically meaningful score.
A more fine-grained examination of the psychometric properties of this scale was conducted in a separate sample of 120 adults with developmental disabilities. Internal consistency was excellent for the frequency and the severity scales. Other indices of reliability and validity were also strong (e.g. excellent test–retest reliability and high correlation with the ABC-Stereotypic Behavior subscale).
To our knowledge, the SBS has not been used as an outcome measure in a clinical trial. The authors suggest that either the frequency or severity scale could be used as an outcome measure. However, the frequency scale offers greater range and has richer normative data, which could be used to set a benchmark for study entry. Although the SBS has good coverage of stereotypic motor behavior, it does not include more complex repetitive behaviors or circumscribed interests (e.g. watching the same video over and over, talking about the same topic over and over, and lining up objects). Thus, its use as an outcome measure in clinical trials may be confined to treatments focused on repetitive motor behaviors.
Repetitive Behavior Questionnaire
The Repetitive Behavior Questionnaire (RBQ) is a 26-item parent-rated instrument derived from the more detailed Repetitive Behavior Interview (Honey et al., 2012a; Turner, 1999). In its original conceptualization, the RBQ included four domains: repetitive motor behavior, behavior reflecting insistence on sameness, repetitive speech (e.g. echolalia), and circumscribed interests. In a sample of 180 children (aged 3–16 years) with ASD, Honey et al. (2012a) identified two factors for the RBQ: sensory/motor behaviors and insistence on sameness/circumscribed interests. These results are consistent with the concept of higher order (examples of specific items: “rituals for everyday activities,” “plays the same music, game or video,” “insists things in the house stay the same”) and lower order (“arrange toys or other items,” “hand and finger mannerisms,” “repetitively fiddle with toys or other items”) repetitive behaviors (Turner, 1999). In the study by Honey et al. (2012a), the subscale scores and the total score showed excellent internal consistency. For convergent validity, the RBQ shows medium or higher correlations (range: 0.43–0.60) with other measures of repetitive behavior such as the RBS-R (Honey et al., 2012a). Test–retest reliability for the RBQ was not examined. To date, the RBQ has not been used in a clinical trial. The scoring is slightly atypical with some items rated on a 3-point scale and others on a 4-point scale. In summary, the RBQ is relatively brief and appears reliable and valid. Whether it consists of four domains or two is not completely settled, and this may affect its application in clinical trials. If future study supported the two-factor scale, the total score may be useful as an outcome measure. On the other hand, if future study supported a four-factor structure (as originally proposed), the total score may be less useful due to the multiple pathways to the same total score.
Selected measures reflecting the evaluation process
Clinical Global Impression–Severity (CGI-S) and Clinical Global Impression–Improvement (CGI-I) are commonly used, clinician-rated measures of overall severity and improvement (Guy, 1976). The CGI-S is a 7-point scale ranging from 1 (normal, not at all ill) through 7 (among the most extremely impaired patients). Over the past decade, many clinical trials have used a minimum score of 4 (moderate) as an entry criterion (King et al., 2009; RUPP Autism Network, 2002). On the CGI-I, the clinician uses all available information to rate the subject’s improvement compared to baseline regardless of whether or not the improvement is due to the experimental treatment. The CGI-I is also a 7-point scale with responses ranging from 1 (very much improved) through 4 (no change) to 7 (very much worse).
An advantage and a disadvantage of the CGI-S and CGI-I is the “global” nature of these ratings (Busner et al., 2009). The specific study provides a clinical context that influences the approach to rating the CGI-S and the CGI-I. For example, clinical trials in children (aged 5–17 years) with ASD may focus on target symptoms such as serious behavioral problems (RUPP Autism Network, 2002), hyperactivity (RUPP Autism Network, 2005), or repetitive behavior (King et al., 2009). To ensure that raters appropriately weight the target problems as well as the overall clinical picture, rater training is essential (Arnold et al., 2000). The narrow range of the CGI-S limits its use as an outcome measure. The CGI-I can be used to classify subjects as “responders” (CGI-I of very much improved or much improved) and “nonresponders” (all other CGI-I ratings on the 7-point scale). The results on the CGI-I can be used to make a statement about the likelihood of therapeutic success for an experimental treatment. For example, a positive response rate of 50% suggests a 50% probability that patients with a similar clinical profile will benefit from the treatment. The CGI-I is most often used as an important secondary measure.
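The responder classification described above amounts to a simple dichotomy of the 7-point CGI-I. The sketch below assumes the conventional numbering in which 1 is very much improved and 2 is much improved; the function names and the list of ratings are hypothetical and do not come from any trial.

```python
from typing import Iterable, List

# Assumption: 1 = very much improved, 2 = much improved on the 7-point CGI-I.
RESPONDER_RATINGS = frozenset({1, 2})


def is_responder(cgi_i: int) -> bool:
    """Classify a single CGI-I rating as responder or nonresponder."""
    if not 1 <= cgi_i <= 7:
        raise ValueError("CGI-I ratings range from 1 to 7")
    return cgi_i in RESPONDER_RATINGS


def response_rate(ratings: Iterable[int]) -> float:
    """Proportion of subjects rated very much improved or much improved."""
    flags: List[bool] = [is_responder(r) for r in ratings]
    if not flags:
        raise ValueError("no ratings supplied")
    return sum(flags) / len(flags)


# Illustrative endpoint ratings for eight hypothetical subjects.
toy_ratings = [1, 2, 4, 3, 2, 5, 2, 4]
print(response_rate(toy_ratings))  # 0.5, i.e. four of eight are responders
```

In a trial, this proportion would be computed separately for each treatment arm so that the response rate on active treatment can be compared with the rate on placebo.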
Autism Impact Measurement
In a recent study (Kanne et al., 2013), the construct validity, convergent validity, and test–retest reliability of the Autism Impact Measurement (AIM) were examined in 440 participants (aged 2–16 years) in the Autism Treatment Network. The original AIM included 41 items, each rated for frequency (how often the behavior occurred over the past 2 weeks) and impact (how much the behavior adversely affected everyday living over the past 2 weeks). The frequency and impact dimensions are scored from 1 to 5 for each item. Following the evaluation of the 41 items, the investigators proposed a 25-item measure made up of four factors: repetitive/restricted behavior, communication/language, social–emotional reciprocity, and odd/atypical behavior. Eight items inquire about RRBs. The findings of this study support the reliability and validity of this measure. However, the scope of RRBs in the eight items is limited. In addition, the additive scoring of frequency and impact results in a wide range from 0 to 80. Thus, the AIM was classified as potentially appropriate awaiting information from additional research.
Social Responsiveness Scale
Of the 24 measures reviewed, 12, including the Social Responsiveness Scale (SRS), were classified as not appropriate. Although the SRS has been used as an outcome measure in clinical trials in ASD, questions remain about how to apply the scale in clinical trials. The SRS is a 65-item parent- or teacher-rated instrument with each item scored from 0 to 3 (not true, sometimes true, often true, almost always true). Versions of the SRS are available for children aged 3–4 years, youth aged 4–18 years, and adults (Constantino et al., 2000). The instrument has been examined in several population-based studies, and results indicate that the total score is normally distributed and that it can be considered a single scale (Constantino et al., 2000, 2003; Constantino and Todd, 2003). In school-aged children, a raw score above 65 indicates clinically meaningful deficits in social behavior and is predictive of an ASD diagnosis. However, high scores may also be observed in children with attention deficit hyperactivity disorder (ADHD) or anxiety disorders (Constantino and Gruber, 2012; Hus et al., 2013).
In addition to a total score (0–195), the SRS has five subscales based on consensus of an expert panel: Social Awareness, Social Communication, Social Cognition, Social Motivation, and Repetitive Behavior (Constantino and Gruber, 2012). Some of the 12 items on the Repetitive Behavior subscale reflect observable repetitive behaviors (e.g. “thinks or talks about the same thing over and over,” “has unusually narrow range of interests,” and “has repetitive, odd behaviors such as hand flapping or rocking”). However, this subscale also includes behaviors that may not reflect repetitive behavior (e.g. “behaves in ways that seem strange or bizarre” and “is regarded by other children as odd or weird”). The 12-item Repetitive Behavior subscale was used as an outcome measure in the pilot trial of N-acetylcysteine (NAC) in 29 subjects (aged 3–11 years) with autistic disorder (Hardan et al., 2012). On the SRS Social Cognition and Repetitive Behavior subscales, active treatment was superior to placebo. Given the small sample size, these results are of interest but not convincing. From a practical perspective, repeated administration of the 65-item SRS to obtain scores on the 12 repetitive behavior items warrants careful consideration. Thus, the SRS appears useful for population screening, but its use as an outcome measure for repetitive behavior is less certain, resulting in the classification as not appropriate.
Discussion
This review of RRB measures focused on the use of available instruments as outcome measures in clinical trials. Thus, measures judged “not appropriate” (e.g. SRS) may indeed be reliable and valid for other purposes such as screening or characterizing children with ASD.
DSM-5 criteria retain RRBs as core features of ASD. Although there are multiple challenges in the assessment and quantification of RRBs in children with ASD, these symptoms can be a source of impairment and are an appropriate target for intervention. Among the many challenges is that RRBs show a wide range in children with ASD from motor stereotypy to incessant talking about a preferred topic. This wide range of behaviors has prompted several investigators to conduct factor analyses, with findings supporting 2–6 factors. In addition to the factor analytic work described above, several investigations have examined the factor structure of the repetitive behavior items on the ADI-R (see Honey et al., 2012b, for a review). These studies consistently identified a two-factor structure: stereotypy and insistence on sameness. However, the repetitive behavior section of the ADI-R includes only 12 items. The limited coverage of RRBs on the ADI-R may hinder the identification of symptom clusters not represented in the truncated list.
What is clear is that RRBs in ASD are not a single dimension. In designing clinical trials in this population, therefore, the subtype of repetitive behavior warrants consideration. For example, an instrument such as the 24-item SBS would be a sound choice as an outcome measure for a treatment targeting motor stereotypy. The SBS would be less relevant for tracking effects of a treatment focused on ritualistic behavior. There is evidence that Insistence on Sameness is associated with specific genetically determined neurobiological pathways, which may inform selection of new medications (Hus et al., 2007; Smith et al., 2009; Sutcliffe et al., 2005). Given that subtypes of repetitive behaviors may have different abnormal neurobiological pathways, the match between a specific drug or behavioral intervention and type of repetitive behavior is essential.
Repetitive behaviors are present in typically developing children and in children with developmental disabilities, and they are defining features of Tourette syndrome and OCD. This observation poses challenges for delineating normal from abnormal behaviors on the one hand and differential diagnosis on the other. Historically, the most effective treatments for tics are dopamine receptor blocking antipsychotic medications, which have also been shown to improve RRBs in children with autism. For example, the potent dopamine D2 receptor antagonist, haloperidol, was significantly better than placebo in reducing stereotypic behavior in children with autism (Anderson et al., 1984). Two decades later, risperidone also showed superiority to placebo in reducing repetitive behavior in children with autism (McDougle et al., 2005). The success of dopamine D2 receptor antagonists for tics and stereotypic behavior is consistent with an extensive body of preclinical and clinical evidence implicating dysregulation of dopamine-mediated cortical-striatal-thalamo-cortical pathways in these motor symptoms (Alexander et al., 1986). Depending on age and severity, however, clinicians and parents may be reluctant to use antipsychotic medications for the treatment of RRBs in children with ASD.
Preclinical data also support a role for serotonin in normal and abnormal habit formation (Barnes et al., 2005). The selective serotonin reuptake inhibitors (SSRIs) are often effective in reducing symptom severity in children with OCD. To date, however, the SSRIs have not shown superiority to placebo in children with ASD (Dove et al., 2012; King et al., 2009). As noted, haloperidol and risperidone are potent D2 receptor blockers. Unlike haloperidol, however, risperidone is also a serotonin receptor antagonist. This pharmacological property fits with preclinical data suggesting that serotonin receptor antagonists may improve behavioral inflexibility, stereotypic behavior, and repetitive grooming (Klejbor et al., 2009; Taylor et al., 2006; Wagner et al., 2004). Clearly, drug development for RRBs in ASD is at an early stage. Identification of a reliable and valid measure that is sensitive to change is an essential prerequisite for effective drug development.
In this review, the CYBOCS-PDD, RBS-R, ABC-Stereotypic Behavior subscale, SBS, and RBQ were rated appropriate with conditions. The clinician-rated CYBOCS-PDD has the strongest track record to date as an outcome measure in clinical trials in the ASD population. It has established reliability, validity, and sensitivity to change (Marcus et al., 2009; McDougle et al., 2005; Scahill et al., 2006). In the federally funded multisite trial of citalopram (King et al., 2009) and the industry-sponsored registration trial of fluoxetine (http://clinicaltrials.gov/ct2/show/NCT00515320), there was minimal change on the CYBOCS-PDD and no difference between active drug and placebo. These results suggest that the CYBOCS-PDD is not highly vulnerable to placebo effects or random fluctuation with time. The CYBOCS-PDD measures severity of the reported behaviors without regard to the RRB type. If a compound under study is aimed at a specific subtype of RRBs, selected measures such as the SBS or RBS-R might be considered as a primary or important secondary measure. Despite its strengths, the CYBOCS-PDD was modified from an instrument designed to evaluate symptom severity and treatment outcome in children with OCD.
The Stereotypic Behavior subscale of the ABC has normative data in developmentally disabled (DD) populations, has established validity via factor analysis, and has demonstrated sensitivity to change (Marcus et al., 2009; RUPP Autism Network, 2002). The 58-item ABC has been used in many clinical trials, suggesting that it does not pose undue burden on respondents. The drawback of the ABC Stereotypic Behavior subscale is the modest symptom coverage of its seven items.
The RBS-R has demonstrated reliability and validity, but more research is needed to evaluate its use as an outcome measure in clinical trials. The scale includes multiple factors with low correlation across some of the subscales, suggesting that these factors measure separate symptom dimensions. A potential strength of the RBS-R, therefore, is its ability to capture the subdomains of RRBs. Accordingly, a drug development program may select a specific RRB subscale, such as the Stereotypy subscale or the Sameness subscale, as an outcome measure. These two subscales provide adequate symptom coverage, in contrast to other RBS-R subscales with fewer items and a more limited scoring range (Lam and Aman, 2007).
The SBS has solid psychometric properties and adequate coverage of motor stereotypies, and it would likely perform well in a study focused on stereotypic behavior. The test–retest results suggest that it is not vulnerable to random fluctuation. To date, however, it has not been used as an outcome measure. Another strength of the SBS is the available data supporting its use in adults with developmental disabilities. The RBQ has solid internal consistency and validity. The total score captures both lower order and higher order behaviors. Test–retest reliability and sensitivity to change have not been evaluated. In addition, it is not clear whether the total score or the individual subscales should be used as a primary endpoint.
Conclusion
Despite the increased recognition of ASD and demand for clinical services for affected children, drug development in this population remains at an early stage. To date, only two medications are approved for use in children with Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) defined autistic disorder accompanied by tantrums, aggression, or self-injury. There are no approved medications for core features of ASD, and the evidence supporting the use of any drug for RRBs or social disability is limited. To be useful as a primary outcome measure, an instrument must be relevant to the clinical target and must have adequate coverage of the symptom domain without being burdensome to informants or subjects. The measure must also be reliable and valid. Reliability requires demonstration of internal consistency, minimal fluctuation on test–retest, and, for clinician-rated instruments, good to excellent interrater agreement. Validity is established by showing that the instrument measures the intended symptom domain. This may be done by showing separation from measures tapping other symptom domains and convergence with instruments measuring the intended domain. Sensitivity to change with treatment is an obvious requirement as well and often requires the “test of time” to distinguish the utility of the measure from ineffective treatments that fail to produce change. Given the range of RRBs in children with ASD, this is a challenging, but important, endeavor.
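Two of the reliability requirements named above, internal consistency and test–retest stability, can be sketched numerically. The example below assumes Cronbach's alpha as the index of internal consistency and a Pearson correlation as a simple test–retest index. These are common choices, but this review does not prescribe a particular statistic, and other indices such as the intraclass correlation are also used. The function names are hypothetical, and complete item-level data are assumed.

```python
# Minimal sketch of two reliability indices (illustrative assumptions only).
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for item-level data.

    rows: one list of item scores per respondent (no missing values).
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must have a score for every item")
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two administrations (e.g. test and retest)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores have zero variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: five respondents, four items scored 0-3.
item_scores = [
    [0, 1, 0, 1],
    [2, 2, 3, 2],
    [1, 1, 1, 0],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
# Hypothetical total scores at two time points for the same five respondents.
time_1 = [2, 9, 3, 11, 3]
time_2 = [3, 8, 3, 10, 4]

print(f"Cronbach's alpha:  {cronbach_alpha(item_scores):.2f}")
print(f"test-retest r:     {pearson_r(time_1, time_2):.2f}")
```

A correlation alone does not detect a systematic shift in scores between administrations, so in practice the test–retest means would also be compared.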
Acknowledgements
Autism Speaks provided resources for the collaborative activities of this workgroup. Disclosures: Dr Scahill: Roche, consultant; Bracket, consultant; BioMarin, consultant; Shire, research support; Roche, research support; Pfizer, research support. Dr Aman: Roche, consultant; Bristol-Myers Squibb, consultant, research grant; Forest, consultant; Pfizer, consultant; Supernus, consultant; Johnson & Johnson, research grant; ProPhase LLC, investigator training; CogState, investigator training. Dr Handen has received research support from Eli Lilly, Curemark, and Bristol-Myers Squibb. Dr King reports serving as a consultant to BioMarin and Neuropharm and as an unpaid consultant to Forest, Nastech, and Seaside Therapeutics. He has received or has pending research grant support from Neuropharm and Seaside Therapeutics. Drs Horrigan and Jones are currently employed at Neuren Pharmaceuticals. Dr Dawson: consultant to Johnson & Johnson, Roche, Seaside Therapeutics, SynapsDx, and Nastech. Professional advisory board: Integragen, Inc.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
