Abstract
The use of behavior checklists is a valuable component of the assessment of ADHD, and checklists are recommended for evaluation and treatment of ADHD (Pliszka & American Academy of Child and Adolescent Psychiatry [AACAP] Work Group on Quality Issues, 2007; M. Wolraich et al., 2011). The American Academy of Pediatrics (2000) recommends that in assessing ADHD, clinicians should acquire evidence directly obtained from parents/caregivers and teachers, and systematic assessment of symptoms during follow-up and monitoring is recommended. Rating scales provide multiple advantages to this clinical endeavor, allowing for systematic assessment of symptoms from the perspective of multiple respondents while placing minimal burden on treatment providers (Rushton, Fant, & Clark, 2004).
Due to the time constraints that are universal in ADHD evaluation and treatment settings, there is a need for ADHD rating scales that are not only reliable and valid but also brief and practical. Appointments in child psychiatry, psychology, and primary care pediatrics and family practice clinics require clinicians to accomplish clinical evaluation and treatment goals in short amounts of time. Rating scales can assist in meeting these clinical goals, but only to the extent that they are practical for use in clinical settings. However, despite the potential value of rating scales for the evaluation of ADHD, their use is not universal. For instance, one survey of family physicians and pediatricians noted that approximately 20% to 30% of practitioners do not routinely use rating scales in assessing ADHD symptoms (Rushton et al., 2004).
There are many possible explanations for why the use of rating scales is not more widespread in clinical settings: Some clinicians may feel that they lack sufficient knowledge to effectively use rating scales and may not have time to research and learn about their use. Access to rating scales may be limited by factors such as cost and copyright limitations. Implementation of rating scales may be hindered by excessive length and/or complexity of scoring and interpretation. The disparity between published assessment guidelines and recommendations for use of behavior checklists of ADHD and their limited use in clinical settings (particularly for follow-up monitoring of symptoms) further emphasizes the need for addressing barriers to clinical use of those scales.
We propose that, in addition to strong psychometrics, five core properties of a rating scale determine its practical utility in clinical settings: (a) brevity in administration, (b) correspondence to well-accepted diagnostic nomenclature (typically reflected by overlap with Diagnostic and Statistical Manual of Mental Disorders [4th ed.; DSM-IV] criteria; American Psychiatric Association [APA], 2000), (c) breadth of additional relevant content (in the case of ADHD assessment, content reflecting commonly comorbid Disruptive Behavior Disorder [DBD] symptoms), (d) ease of scoring and interpretation, and (e) ease of availability for use.
Brevity in administration reflects the time burden placed on the examiner in presenting the scale and on the respondent in completing it. One of the biggest challenges in designing a rating scale is the trade-off between depth and length of the rating scale. Examples of longer, broad-band rating scales include the Behavior Assessment System for Children, second edition (BASC-2; Reynolds & Kamphaus, 2004); the Child Symptom Inventory–4 (CSI-4; Gadow & Sprafkin, 1997b); and the Child Behavior Checklist (CBCL; Achenbach & Rescorla, 2001), all of which have 90 items or more. While helpful for investigating a wide variety of behavioral symptoms, these scales typically take longer to complete and are harder to score, reducing their utility for everyday clinical use, particularly treatment monitoring.
Narrow-band ADHD and related scales typically contain fewer than 50 items, with the shortest ADHD scales consisting of 25 items or fewer. Commonly used narrow-band scales include the Brown Attention Deficit Disorder Scales (BADDS; Brown, 2001), the ADHD Symptom Checklist–IV (ADHD-SC-4; Gadow & Sprafkin, 1997a), the ADHD Rating Scale–IV (ADHD-RS-IV; DuPaul, Power, Anastopoulos, & Reid, 1998), and short versions of the Swanson, Nolan, and Pelham–IV rating scale (SNAP-IV rating scale; J. M. Swanson, 1992; J. Swanson et al., 2001), the National Initiative for Children’s Healthcare Quality (NICHQ) Vanderbilt Assessment Scale (Schoemaker et al., 2012; M. L. Wolraich, Feurer, Hannah, Baumgaertel, & Pinnock, 1998), and the Conners’ Rating Scales (Conners, 2008). Allowing for 10 s per item, 30 or fewer items would be required for a scale that takes 5 min or less to complete. Hence, even most narrow-band scales require more than a 5-min time investment on the part of the respondent. While this might be justified for an initial diagnostic evaluation, such a time investment becomes less feasible for repeat administrations or for completion of multiple questionnaires by a single informant (e.g., teacher completion of multiple ADHD questionnaires over time, or a single respondent completing multiple questionnaires for the same child as a part of an evaluation).
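The time estimate above is simple arithmetic, and a minimal sketch may make it easier to apply to other scale lengths. The function names are illustrative only, and the 10 s per item rate is the working assumption stated in the text rather than a measured value.

```python
SECONDS_PER_ITEM = 10  # per-item rate assumed in the text


def completion_minutes(n_items, seconds_per_item=SECONDS_PER_ITEM):
    """Estimated respondent time in minutes for a scale with n_items items."""
    return n_items * seconds_per_item / 60


def max_items_within(minutes, seconds_per_item=SECONDS_PER_ITEM):
    """Largest number of items that fits within a time budget given in minutes."""
    return int(minutes * 60 // seconds_per_item)


# Scale lengths chosen for illustration: 22, 30, and 50 items.
for n_items in (22, 30, 50):
    print(n_items, "items:", round(completion_minutes(n_items), 2), "min")

print("Items that fit in 5 min:", max_items_within(5))
```

Under this assumption, a 5-min budget corresponds to exactly 30 items, which is the threshold used in the argument above.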
Correspondence to diagnostic nomenclature is advantageous for clinically useful rating scales for several reasons, including relevance for establishing a diagnosis, identifying evidence-based treatments (most of which are based on diagnosis), and communication with other professionals about symptoms. Most of the major ADHD rating scales are either partly or completely DSM-based, although the degree of this varies between scales. For example, the SNAP-IV and the ADHD-RS-IV contain items for each of the 18 DSM-IV (APA, 2000) criteria for ADHD, whereas other scales (e.g., the BASC-2) are not directly based on the DSM-IV.
Clinically useful ADHD behavior checklists ideally address a breadth of additional relevant content reaching beyond ADHD alone. Among the most common comorbid disorders with ADHD are the DBDs: Oppositional-Defiant Disorder (ODD) and Conduct Disorder (CD; Efron & Sciberras, 2010). ADHD comorbid with DBD is associated with poorer outcomes than ADHD alone (Barkley, 2006), and it is unlikely that treatment will be considered to be effective if ADHD symptoms improve but DBD symptoms do not. Although some of the longer ADHD scales (such as the 50-item ADHD-SC-4, 55-item NICHQ Vanderbilt Assessment Scale, and the 40-item SNAP-IV) include additional DBD items commonly associated with ADHD, most short ADHD scales do not. For example, ADHD symptom–based rating scales (such as the 18-item SNAP-IV rating scale and 18-item ADHD-RS-IV) cover only ADHD symptoms. Similarly, the follow-up (short; 26-item) version of the NICHQ Vanderbilt Assessment Scale includes all 18 ADHD symptoms as items, along with 8 academic, social, and activity performance items, but no items evaluating ODD or CD. Hence, existing ADHD behavior checklists have the advantage of significant brevity (30 items or less) or breadth of additional DBD-related content, but typically not both. In particular, short versions of ADHD rating scales, which are often used in repeated measurement to track change, typically do not include items evaluating ODD or CD.
Ease of scoring and interpretation is another component affecting the clinical utility of behavior checklists. The length of subscales and grouping of items determine the ease (or difficulty) of calculating subscale scores, and as a result, the amount of time and resources needed to score and interpret the checklist. Some scales such as the SNAP-IV and ADHD-SC-4 group subscale items together so that subscales can be tallied quickly, enhancing the scoring ease and clinical utility of the scale (although this also increases the chances of response bias affecting clusters of items). Other scales use systematic (although not adjacent) placement of items to simplify scoring, allowing for easy (although slower and possibly more prone to error) hand scoring. Still other scales (BASC-2, CBCL/TRF [Teacher’s Report Form]) mix subscale items in a quasirandom order, requiring the use of templates or computers for scoring. In general, subscales with fewer items placed adjacent to each other are easier to score.
Finally, the scale’s clinical usefulness can be affected by its availability. Factors limiting availability include cost, ordering procedures, needs for storage and inventory of purchased forms, and copyright limitations. The monetary cost of behavior checklists can be large in a busy clinic seeing hundreds of new and return clinic visits. If checklists must be ordered from a publisher, it is necessary to be aware of inventory and to find time to place and monitor an order.
Many excellent behavior checklists exist for the measurement of ADHD and have seen widespread clinical use. Given the existence of these instruments, it is reasonable to have some caution about the need for yet another behavior checklist measuring ADHD. The prior review, however, suggests that such a need does exist for two reasons. First, no existing scale meets the ideal characteristics for clinical utility in the five areas described earlier. Second, research suggests that approximately 20% to 30% of pediatricians and family physicians do not routinely use behavior checklists in the assessment of ADHD (Rushton et al., 2004), and more practices likely do not use behavior checklists regularly to monitor symptoms at every visit. Therefore, there is a need for new instruments to meet the needs of busy clinical practices, in the hopes of increasing the use of behavior checklists to enhance evaluation and treatment of ADHD.
This study was designed to evaluate the psychometric properties of the CHAOS (Conduct-Hyperactive-Attention Problem-Oppositional Symptom) scale, which was designed a priori as a clinically useful scale for the evaluation and symptom monitoring of children with ADHD. Consistent with the five ideal characteristics of a clinically useful scale, the CHAOS scale is concise (22 items); DSM-based; contains subscales to measure ADHD, ODD, and CD symptoms; and is easy to score and interpret. Furthermore, the CHAOS scale is easily available from the authors (see appendix) for unlimited use.
Method
Participants
Subscale development sample
Subscale Development Sample participants were 205 children and adolescents aged 6 to 17 years (M age = 11.1 years, SD = 2.9 years; 160 males, 45 females). The sample was obtained from consecutive patients who (a) were seen for evaluation, medication, or behavior therapy visits at an outpatient ADHD specialty clinic over approximately a 2-year period; and (b) had at least one mother-report CHAOS scale in the chart. If a patient was administered multiple CHAOS scales during that time period, only the earliest mother-completed scale was selected for entry into the data set. Therefore, the inclusion criteria for the Subscale Development Sample were as follows: (a) seen in the ADHD clinic by either the child psychiatrist or psychologist, (b) completion of the CHAOS scale by mother during the (approximately 2-year) time period sampled from the chart, and (c) age 6 to 17 years.
Validation sample
Participants for the Validation Sample were 139 consecutive patients aged 6 to 17 years (M age = 10.7 years, SD = 3.1 years; 82 males, 57 females; 10 African American, 1 Indian-Asian, 1 Pacific Islander, and 127 White) who were seen in a comprehensive psychological testing clinic (for cognitive and/or personality assessment) over approximately an 8-year period. The inclusion criteria for the Validation Sample were as follows: (a) seen in a comprehensive psychological testing clinic, (b) completion of the CHAOS scale by a parent during the time period sampled, and (c) age 6 to 17 years. The primary missions of this testing clinic are training of advanced graduate students/interns and assessment of attention, learning, and other possible cognitive processing problems; because of its teaching mission, the clinic has a low patient volume. Across primary, secondary, and tertiary referral questions coded by the primary clinician, the most common reasons for referral were attention/concentration problems (n = 103), learning problems (n = 89), and aggression, anger, or oppositionality (n = 35). Based on the testing and interview results, 78 participants (56%) were clinically diagnosed with ADHD (42 with the predominantly Inattentive subtype, 36 with the Combined subtype), 26 (19%) were diagnosed with CD or ODD, 39 (28%) were diagnosed with an Anxiety Disorder (including adjustment disorders with an anxiety component), and 52 (37%) were diagnosed with a Depressive Disorder (including adjustment disorders with a depression component).
The primary CHAOS scale respondents for the Validation Sample were 15 fathers, 121 mothers, and 3 grandmothers. The 139 CHAOS scales completed by these primary respondents during the psychological testing visit will be referred to as the “Primary CHAOS scales.” In addition, 96 mothers and 17 fathers completed a CHAOS scale at an earlier pretesting clinic intake visit (average duration between visits = 67.8 days for mothers and 120.0 days for fathers). In 27 cases, 2 parents from the same family completed CHAOS scales during the psychological testing visit. Finally, 101 teachers completed the CHAOS scale for participants in the Validation Sample. Socioeconomic status of the sample was estimated from parental level of education; fathers (data available for 137 families) averaged 15.5 years of education (SD = 1.9), while mothers (data available for 138 families) averaged 15.0 years of education (SD = 1.9).
Procedure
Data for the present study were obtained using chart-review methods approved by the university institutional review board. CHAOS scales are routinely completed in the clinic by parents and teachers to provide clinical assessment and monitoring information.
Participants in the Subscale Development Sample were being seen for routine clinic visits and therefore did not complete other measures. As a result, only basic demographic data and CHAOS scale results were available for these participants; this was sufficient because only relationships between CHAOS scale items were analyzed in the Subscale Development Sample. All patients in the ADHD-DBD clinic who had at least one mother-completed CHAOS scale were selected for the Subscale Development Sample. If multiple CHAOS scales had been completed for a participant, the earliest mother-completed scale was used so that each participant contributed only one scale. The only other information abstracted from the chart was the child’s age and gender.
Participants in the Validation Sample (age 6-17 years) were seen for psychological testing and were required to have at least one parent-completed CHAOS scale. Patients in the testing clinic are routinely administered behavior checklists and measures of executive functioning. Clinically assigned diagnoses and reasons for referral (as coded by the clinician responsible for the testing) were abstracted from test reports in the charts.
Measures
CHAOS scale
The CHAOS scale is a very brief (22-item) parent- and teacher-completed rating scale of core symptoms of ADHD and DBDs (see appendix). CHAOS scale items were selected using a content validity analysis in which five clinical experts (two child psychiatrists, two clinical social workers, and one clinical psychologist) were asked to identify the five most prototypical symptoms from the DSM-IV criteria for each of the following four diagnoses: ADHD Predominantly Inattentive Type, ADHD Predominantly Hyperactive-Impulsive Type, ODD, and CD. Symptoms identified by a majority (three or more) of the experts were retained for the CHAOS scale, resulting in a 20-item scale consisting of 5 items for each of the diagnostic areas. Two additional items reflecting extremely common child behaviors (complaining when asked to do an undesired task; being unhappy when denied something) were also added to the scale to assist with identifying problems with underreporting, resulting in a final 22-item scale. This research project focused only on the first 20 (clinical) items of the scale. Each item is rated on a 4-point scale ranging from 0 (never) to 3 (very often).
Symptom Inventories–4 (SI-4)
In addition to the CHAOS scale, parents and teachers in the Validation Sample completed the DSM-IV-based SI-4: either the Adolescent Symptom Inventory-4 (ASI-4; Gadow & Sprafkin, 1998; n = 33 parents and 20 teachers) or the CSI-4 (Gadow & Sprafkin, 1997b; n = 100 parents and 88 teachers). Most SI-4 items are rewordings of DSM-IV symptoms from the most common child and adolescent psychiatric disorders, rated on a 0 to 3 scale of severity. SI-4 subscale (corresponding to DSM-IV diagnosis) severity scores are sums of severity ratings for items from that diagnosis/subscale. In the present study, severity rating raw scores for ADHD-Inattentive, ADHD-Hyperactive-Impulsive, ODD, and CD were obtained from the SI-4 scales. Because the ASI-4 and CSI-4 items for these four subscales are identical, data from those scales could be combined for analysis (133 parents and 108 teachers).
Cognitive testing measures
The majority of sample participants were administered several performance-based measures of executive functioning and intellectual ability. The Stroop Color and Word Test (SCWT; Golden, 1978; n = 111) measures ability to inhibit an automatic response (word reading) in favor of a more effortful, incongruent response (color naming). In the color–word condition of the SCWT, the participant must name the color of the ink in which one of three words (red, green, or blue) is printed. In this study, raw scores on the color–word condition of the SCWT (number of accurate responses in 45 s) were used to assess executive functioning. The Counting Interference Test (CIT; Hummer et al., 2011; n = 110) is a counting Stroop test for which participants must state the number of numerals present in a series of one-, two-, or three-digit numbers (e.g., 222, 11, 3), suppressing numeral naming in favor of identifying the number of digits present. CIT number-count raw scores (the number of accurate responses in 45 s) differentiate participants with ADHD from those with a DBD and from those with no psychiatric diagnosis (Hummer et al., 2011), and CIT scores are related to activation of the dorsolateral prefrontal cortex (Mathews et al., 2005). The Conners’ Continuous Performance Test (CPT; Conners & MHS Staff, 2000; n = 114) is a computer-administered test of attention and inhibition. In the present study, the Hit Reaction Time Standard Error (RTSE) raw score was used to measure vigilance during the test. The RTSE score is the standard error of the participant’s response time to all targets, and is the strongest contributor to discriminant function indexes that predict attention problems (Conners & MHS Staff, 2000). The Kaufman Brief Intelligence Test (KBIT; Kaufman, 1990; n = 110) is a brief, well-validated measure of verbal and nonverbal intellectual abilities. In addition to separate estimates for verbal and nonverbal IQ, it yields an IQ composite score reflecting global intellectual ability.
Data Analysis
Item-level analyses were conducted with the Subscale Development Sample. First, items were evaluated with descriptive statistics for appropriate distribution properties to test for floor or ceiling effects (extremely low or high levels of endorsement) or for restriction of range that might affect item utility and validity. Second, principal components analysis (PCA) was used to empirically group items into components to develop subscales. Although it was hypothesized that the items would fall into groupings consistent with the four DSM-IV diagnoses reflected by the CHAOS scale, PCA was used to provide an empirical test of the item groupings. The a priori criteria for number of components selected and rotated were eigenvalue > 1 and Promax rotation, although other methods of component extraction and rotation were tested to ensure that the PCA results were robust.
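The eigenvalue > 1 retention rule can be illustrated with a short sketch. The code below is not the analysis software used in this study; it assumes NumPy, uses synthetic continuous data (not CHAOS ratings) generated with a known four-cluster structure, and omits the Promax rotation step, which NumPy does not provide. The function name `kaiser_component_count` is hypothetical.

```python
import numpy as np


def kaiser_component_count(ratings):
    """Count principal components with eigenvalue > 1.

    ratings: array of shape (n_respondents, n_items).
    Returns (number of retained components, eigenvalues in descending order).
    """
    corr = np.corrcoef(ratings, rowvar=False)     # item-by-item correlation matrix
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return int(np.sum(eigenvalues > 1.0)), eigenvalues


# Synthetic illustration only: 4 latent dimensions, 5 items each, 205 respondents.
rng = np.random.default_rng(0)
n_respondents, n_factors, items_per_factor = 205, 4, 5
latent = rng.normal(size=(n_respondents, n_factors))
# Each item loads on exactly one latent dimension (block structure, shape 4 x 20).
loadings = np.kron(np.eye(n_factors), np.ones((1, items_per_factor)))
noise = rng.normal(scale=0.6, size=(n_respondents, n_factors * items_per_factor))
ratings = latent @ loadings + noise

n_components, eigenvalues = kaiser_component_count(ratings)
print("Components retained:", n_components)
print("Largest eigenvalues:", np.round(eigenvalues[:6], 2))
```

Because the eigenvalues of a correlation matrix sum to the number of items, an eigenvalue above 1 marks a component that accounts for more variance than a single item would on its own, which is the rationale for the cutoff.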
The Validation Sample was used for the primary reliability and validity analyses (DeVellis, 1991). Internal consistency of CHAOS subscales was evaluated using Cronbach’s alpha. Interrater reliability (between mothers and fathers and between parents and teachers) and test–retest reliability were evaluated with correlational analyses. Criterion validity was investigated by correlating CHAOS subscale scores and corresponding (DSM-based) subscale scores from the SI-4. Construct validity was evaluated with correlations between CHAOS ADHD-related subscales and the cognitive measures of executive functioning, as executive dysfunction is a core deficit in children with ADHD (Barkley, 2006). Finally, differential validity based on clinically assigned DSM-IV diagnoses was calculated using t tests comparing groups of participants with similar diagnoses: ADHD diagnosis without CD or ODD diagnosis (n = 60; 40 with ADHD-Inattentive Type and 20 with ADHD-Combined Type), CD or ODD diagnosis without ADHD (n = 8), ADHD with comorbid CD or ODD diagnosis (n = 18; 2 with ADHD-Inattentive Type and 16 with ADHD-Combined Type), and a residual clinical group with no ADHD, CD, or ODD diagnosis (n = 53; 17 with an Anxiety Disorder, 21 with a Depressive Disorder, 3 with Bipolar I or II Disorder, 24 with a Learning Disorder, and 2 with a Somatoform Disorder; numbers add to more than 53 due to comorbidities).
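For readers less familiar with the internal consistency statistic, a minimal sketch of Cronbach's alpha for a five-item subscale follows. It uses the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores); the ratings are hypothetical values made up for illustration and are not study data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))                        # one tuple per item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical 0-3 ratings for six respondents on one five-item subscale.
ratings = [
    [0, 1, 0, 1, 0],
    [1, 1, 2, 1, 1],
    [2, 2, 2, 3, 2],
    [3, 3, 2, 3, 3],
    [1, 0, 1, 1, 2],
    [2, 3, 3, 2, 2],
]
alpha = cronbach_alpha(ratings)
print("Cronbach's alpha:", round(alpha, 3))
```

Alpha rises when items covary strongly relative to their individual variances, so a subscale with rarely endorsed, weakly related items would be expected to show lower values.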
Results
Item Distribution
Across the Subscale Development Sample, a full range (0-3) of ratings was given for all 20 clinical CHAOS scale items. Item means fell between 1.6 and 2.4 for ADHD-Inattentive items (SD = 0.8-1.1), between 1.7 and 2.1 for ADHD-Hyperactive-Impulsive items (SD = 0.8-1.2), and between 1.5 and 1.9 for ODD items (SD = 0.9-1.2). Item means (0.33-1.0) for the CD items were lower (SD = 0.4-1.1), reflecting the lower frequency of occurrence of these symptoms (a table with means and SD of all items is available from the authors). Overall, the distribution of the items indicated no significant problems with restriction of range.
Principal Components Analysis
Using the Subscale Development Sample, the 20 clinical CHAOS scale items were subjected to a PCA with Promax rotation (other factor analytic and rotation methods were also applied, with similar results) to investigate whether items grouped empirically into clusters consistent with diagnostic symptom groupings. The eigenvalue > 1 convention and scree plot inspection supported a four-component solution. Inspection of the pattern of PCA loadings (Table 1) indicated that CHAOS scale items fell into groups consistent with DSM-IV groupings, with Components 1 to 4 reflecting ADHD-Hyperactive-Impulsive symptoms, ODD symptoms, CD symptoms, and ADHD-Inattentive symptoms, respectively. The four CHAOS components (subsequently referred to as subscales) were named Hyperactivity-Impulsivity, Oppositional Behavior, Conduct Problems, and Attention Problems. Means and standard deviations for the CHAOS subscales in the Subscale Development Sample are reported in Table 1. Males scored higher than females on Hyperactivity-Impulsivity, t(203) = 3.1, p < .01; Oppositional Behavior, t(203) = 2.0, p < .05; Conduct Problems, t(203) = 2.3, p < .05; and Attention Problems, t(203) = 2.1, p < .05.
Table 1. Principal Components Analysis of CHAOS Items.
Note. Values for items are component loadings following PCA with Promax rotation. Items are shortened and paraphrased. PCA = principal components analysis; CHAOS = Conduct-Hyperactive-Attention Problem-Oppositional Symptom.
Denotes loadings greater than 0.50.
Reliability
Internal consistency, interrater, and test–retest reliability values were obtained from the Validation Sample. Internal consistency for all CHAOS subscales was very strong for parent- and teacher-completed subscales (Table 2). Interrater agreement between mothers and fathers was moderate to strong (r = .58-.63) for all subscales except for the Conduct Problems subscale (r = .21). Parent–teacher interrater reliability was lower (r = .17-.41) but statistically significant for most subscales (Table 2).
Table 2. CHAOS Subscale Reliability.
Note. Alpha is Cronbach’s alpha for internal consistency. Values for test–retest and interrater reliability are Pearson correlations. CHAOS = Conduct-Hyperactive-Attention Problem-Oppositional Symptom.
p < .10. *p < .05. **p < .01. ***p < .001.
Across the entire test–retest sample, very strong test–retest reliability was found for all CHAOS subscales (r = .74-.87; Table 2). However, these data included a very wide range of test–retest times, with 15 parents providing test–retest data over a period of greater than 6 months. As a result, separate test–retest analyses were conducted for the 58 parents who provided test–retest data within a period of 2 to 9 weeks and for the 37 parents who provided test–retest data for a 10- to 26-week period. For the 2- to 9-week and the 10- to 26-week test–retest periods, reliability values were high and similar to those found for the entire sample (Table 2).
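The split of test–retest data by interval can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used here: the record layout, the function names, and the scores are hypothetical, and only the two interval bands (2 to 9 weeks and 10 to 26 weeks) come from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def retest_by_interval(records, bands):
    """Compute test-retest r separately within each interval band.

    records: list of (weeks_between, score_time1, score_time2) tuples.
    bands: dict mapping a label to an inclusive (min_weeks, max_weeks) range.
    Returns {label: (n_pairs, r)} for bands with at least 3 pairs.
    """
    results = {}
    for label, (low, high) in bands.items():
        pairs = [(t1, t2) for weeks, t1, t2 in records if low <= weeks <= high]
        if len(pairs) >= 3:
            time1, time2 = zip(*pairs)
            results[label] = (len(pairs), pearson_r(time1, time2))
    return results


BANDS = {"2-9 weeks": (2, 9), "10-26 weeks": (10, 26)}

# Hypothetical subscale scores; the last record falls outside both bands.
records = [
    (3, 10, 11), (5, 4, 5), (8, 13, 12), (9, 7, 9),
    (12, 6, 6), (15, 12, 10), (20, 3, 5), (26, 9, 9),
    (40, 8, 2),
]
results = retest_by_interval(records, BANDS)
for label, (n_pairs, r) in results.items():
    print(label, "n =", n_pairs, "r =", round(r, 2))
```

Analyzing the bands separately keeps a handful of very long retest intervals from masking or inflating the stability estimate for the typical interval.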
Validity
SI-4
Correlations between CHAOS subscales and corresponding SI-4 DSM-based subscales for the same respondent were .86 or higher for all subscales with the exception of Conduct Problems (Table 3). Correlations between CHAOS and SI-4 subscales that did not correspond directly in DSM-IV content (e.g., CHAOS Attention Problems and SI-4 ODD) were much lower than those for corresponding subscales, although most remained statistically significant (Table 3).
Table 3. Relationships Between CHAOS and SI-4 Scores.
Note. Values are Pearson correlations. ADHD-I = SI-4 ADHD-Inattentive subscale; ADHD-H = SI-4 ADHD-Hyperactive-Impulsive subscale; ODD = SI-4 Oppositional-Defiant subscale; CD = SI-4 Conduct Disorder subscale. Top rows of correlations are for parent-report CHAOS scale and parent-report SI-4; second rows are for teacher-report CHAOS scale and teacher-report SI-4. CHAOS = Conduct-Hyperactive-Attention Problem-Oppositional Symptom; SI-4 = Symptom Inventories–4.
p < .10. *p < .05. **p < .01. ***p < .001.
Cognitive measures
Correlations between the CHAOS scales and cognitive measures are reported in Table 4. Parent-report Attention Problems and Hyperactivity-Impulsivity subscale scores correlated in the expected direction with measures of executive functioning (SCWT, CIT, and Conners’ CPT) as well as with IQ. The magnitude of the correlations, although statistically significant, was small in most cases. Correlations for the teacher-report CHAOS scores and the cognitive test scores were generally not significant.
Table 4. Relationships Between CHAOS Scores and Cognitive Functioning Measures.
Note. Values are Pearson correlations. CHAOS = Conduct-Hyperactive-Attention Problem-Oppositional Symptom.
p < .10. *p < .05. **p < .01. ***p < .001.
DSM diagnosis
Analyses comparing different diagnostic groups on CHAOS subscale scores are presented in Table 5. For parent-report, both subgroups with ADHD scored higher than the other diagnostic subgroups on the Attention Problems scale, whereas only the comorbid ADHD-CD/ODD group scored higher than the other subgroups on the Hyperactivity-Impulsivity scale. This pattern reflects the predominance of the Combined subtype in the comorbid subgroup (16 of 18 participants), whereas the ADHD without CD/ODD subgroup had more participants of the Inattentive (n = 40) than of the Combined (n = 20) subtype. For the parent-report CHAOS Oppositional Behavior subscale, the subgroups with the ODD-CD diagnoses scored higher than the other clinical groups, while the only difference between groups on the Conduct Problems subscale was found between the comorbid ADHD-CD/ODD group and the ADHD-only group (Table 5). For teacher-report CHAOS scales, results were similar to those found for parent-report, albeit with weaker statistical significance.
Table 5. CHAOS Scores by Diagnosis.
Note. Values in parentheses are standard deviations. Comparisons are for group numbers, with greater than (>) indicating significant differences by t test at p < .05. The residual group consisted of all individuals not diagnosed with ADHD, ODD, or CD (n = 53 for parent-report, n = 40 for teacher-report); n’s for ADHD only, CD/ODD only, and ADHD with CD/ODD groups were 60, 8, and 18 for parent-report and 43, 5, and 13 for teacher-report, respectively. CHAOS = Conduct-Hyperactive-Attention Problem-Oppositional Symptom; CD = Conduct Disorder; ODD = Oppositional-Defiant Disorder.
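The group comparisons reported in this section rely on t tests. The source does not name the variant used; the degrees of freedom reported for the sex comparison, t(203) with 160 males and 45 females, are consistent with an independent-samples test using pooled variance, so the sketch below assumes that form. The function name and the scores are hypothetical, and the code returns only the test statistic and degrees of freedom, not a p value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent-samples Student's t with pooled variance.

    Returns (t, df), where df = n_a + n_b - 2.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical subscale raw scores for two diagnostic groups (not study data).
group_with_adhd = [12, 10, 14, 11, 13]
residual_group = [6, 8, 5, 9, 7]
t_value, df = pooled_t(group_with_adhd, residual_group)
print("t(%d) = %.2f" % (df, t_value))
```

With very unequal group sizes, such as the 8-participant CD/ODD-only group compared against groups of 53 or 60, a reader reproducing these comparisons may also wish to check an unequal-variance (Welch) version.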
Discussion
This study provided preliminary evidence supporting the reliability and validity of a brief measure of symptoms of ADHD and DBDs. CHAOS subscales were initially created based on expert judgment of the most prototypical symptoms of ADHD (Inattention and Hyperactivity-Impulsivity), ODD, and CD. Those subscales were subsequently supported by PCA and internal consistency analysis. Validity analyses demonstrated very high correlations between same-respondent CHAOS subscales and corresponding DSM-based subscales consisting of all symptoms of ADHD, ODD, and CD.
Although many other measures exist for the assessment of ADHD and related disorders, the CHAOS scale fills an important, largely unaddressed niche by combining brevity with broad coverage of disruptive behaviors, very easy scoring, and correspondence to DSM-IV. Other behavior checklists incorporate some of these features, but no existing checklist covers DSM-based symptoms of ADHD, ODD, and CD in a very short (e.g., less than 25 items) format with items grouped into subscales of identical length (e.g., 5 items) to facilitate extremely fast completion, scoring, and interpretation. Validity analyses suggest that the CHAOS subscales provide very similar results compared with longer measures that include all DSM-IV criteria for ADHD and DBDs.
Interrater reliability analyses demonstrated strong agreement between mother and father scores on the CHAOS Attention Problems, Hyperactivity-Impulsivity and Oppositional Behavior subscales. The mother–father interrater reliability values obtained for these CHAOS subscales (r = .58-.63) were highly significant but somewhat lower than the median parent–parent interrater reliabilities reported for longer scales such as the BASC-2 (median r = .76-.79 across subscales and age ranges; Reynolds & Kamphaus, 2004) and CBCL (mean r = .73-.76 across subscales; Achenbach & Rescorla, 2001). This may reflect differences in the types of items on the scales or may be influenced by the smaller number of items on the CHAOS scale. Parent–teacher interrater reliability of CHAOS subscales was, as expected, lower than parent–parent interrater reliability, consistent with findings on other behavior checklists (e.g., Achenbach & Rescorla, 2001; Reynolds & Kamphaus, 2004). Furthermore, the range of parent–teacher interrater reliability values for CHAOS subscales (r = .17-.41) was consistent with that reported for other major scales (e.g., mean parent–teacher interrater r = .29 for corresponding CBCL and TRF subscales, Achenbach & Rescorla, 2001; mean parent–teacher interrater r = .40 for the total score of the ADHD-RS-IV, DuPaul et al., 1998). Test–retest reliability values for CHAOS subscales were high (r = .74-.87), demonstrating good stability of CHAOS results over time, consistent with findings of other commonly used behavior checklists (e.g., DuPaul et al., 1998).
Although CHAOS subscales correlated very highly with same-respondent ratings of DSM-IV symptoms of corresponding disorders on the SI-4, relationships between CHAOS subscales and cognitive tests were more modest and were significant only for parent-report CHAOS subscales. Consistent with conceptualizations of ADHD symptoms as reflecting executive functioning (e.g., Barkley, 2006), parent-report CHAOS Attention Problems and Hyperactivity-Impulsivity scores correlated significantly with the SCWT, the CIT, and the Conners’ CPT. Importantly, CHAOS Oppositional Behavior and Conduct Problems subscales were unrelated to these measures of attention and executive functioning, consistent with research suggesting that executive dysfunction is not a core characteristic of oppositional-defiant and delinquent behavior independent of symptoms of ADHD (Hummer et al., 2011). Interestingly, teacher-report CHAOS subscale scores were not correlated with measures of executive functioning; this finding should be investigated in future research.
CHAOS subscale scores related well with clinically assigned DSM-IV diagnoses. Those who carried a diagnosis of ADHD (regardless of ODD/CD comorbidity) scored higher on CHAOS Attention Problems scores, and those who carried a diagnosis of CD/ODD (regardless of ADHD comorbidity) scored higher on the CHAOS Oppositional Behavior scores. Although the only participants who scored consistently higher on the CHAOS Hyperactivity-Impulsivity subscale were those who carried the diagnosis of comorbid ADHD+CD/ODD, this likely reflects the predominance of the Combined subtype (89%) relative to the Inattentive subtype (11%) of ADHD in that diagnostic grouping. Conversely, in the diagnostic grouping of participants with ADHD but no diagnosis of CD/ODD, two thirds were diagnosed with the Inattentive subtype of ADHD, and only one third had the Combined Type diagnosis.
Almost no differences were found between participants diagnosed with CD/ODD and other diagnostic groups on CHAOS Conduct Problems scores. This finding may have been influenced by the low endorsement of CHAOS Conduct Problems items in the present sample. Symptoms of CD are much less frequent in outpatient samples than symptoms of ADHD or ODD (Achenbach & Rescorla, 2001; Gadow & Sprafkin, 1997b). The low endorsement rate of CD behaviors in this study may have restricted the range of that subscale and affected results in some cases. For example, the CHAOS Conduct Problems subscale had the lowest internal consistency and lowest mother–father interrater reliability in the sample. Additional research with clinical samples with a higher base rate of CD symptoms is warranted to better understand that subscale.
Our study results demonstrate that a brief measure of ADHD and key related symptoms (of ODD and CD) can provide valid and relevant data for these diagnoses. With only 22 items, the CHAOS scale takes less than 4 min to complete at a rate of 10 s per item, and item sets of 5 (corresponding to the four CHAOS subscales) can be easily summed by the clinician without the need to refer to information about which items are assigned to which subscales. Validity analyses demonstrated that higher CHAOS subscale scores correspond to greater number and severity of symptoms, as well as to diagnostic classification. As a result, the CHAOS can be used in clinical practice to assess ADHD symptom severity. Because ADHD symptoms occur on a severity continuum (i.e., ADHD is not a dichotomous diagnostic entity; Barkley, 2006), CHAOS raw scores can be used to capture the severity of symptoms without reference to fixed cutoff scores.
However, some benchmarks for clinical elevations would enhance the clinical utility of the CHAOS scale, and future research should investigate CHAOS subscale scores in nonreferred populations (e.g., norms) to provide such benchmarks. In the absence of such norms, the current data can be used to provide some guidance about benchmarks for elevated scores. Post hoc analyses of mean CHAOS Attention Problems scores for the 42 participants with the predominantly Inattentive subtype (parent-report: M = 10.5, SD = 3.3; teacher-report: M = 10.2, SD = 3.6) and the 36 participants with Combined subtype (parent-report: M = 11.0, SD = 3.6; teacher-report: M = 10.5, SD = 4.0) suggest that a raw Attention Problems subscale score of approximately 10 is typical of children with ADHD (differences between Inattentive and Combined subtypes were not significant, p > .48 for parent-report and p > .76 for teacher-report). Approximately 65% of ADHD-diagnosed participants in the sample (51/78) received a raw score of 10 or higher on the CHAOS Attention Problems scale for parent-report, and 94% (73/78) scored 5 or higher (comparable values for teacher-report were 68% and 91%, respectively). Mean CHAOS Hyperactivity-Impulsivity scores for the 36 participants with the Combined subtype were also approximately 10, with slightly lower scores for teacher-report (parent-report: M = 10.6, SD = 3.9; teacher-report: M = 8.9, SD = 4.2). Approximately 69% (25/36) of participants with ADHD Combined Type diagnoses received a parent-report Hyperactivity-Impulsivity score of 10 or higher, and 89% (32/36) scored 5 or higher (values for teacher-report were lower at 52% and 81%, respectively). Mean scores on the Attention Problems and Hyperactivity-Impulsivity subscales for clinically referred participants with other DSM-IV diagnoses were significantly lower (typically a raw score of 6 or less; Table 5).
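The parent-report percentages above follow directly from the counts given in the text. As a quick arithmetic check, the short Python sketch below recomputes them. The dictionary keys are illustrative labels chosen for this sketch, not names from the original analysis.

```python
# Counts reported in the text for parent-report CHAOS scores:
# (participants at or above the raw score, participants in the group).
# Keys are illustrative labels chosen for this sketch.
reported_counts = {
    "attention_ge_10": (51, 78),
    "attention_ge_5": (73, 78),
    "hyperactivity_ge_10": (25, 36),
    "hyperactivity_ge_5": (32, 36),
}


def percent(numerator, denominator):
    """Return the proportion as a percentage rounded to a whole number."""
    return round(100 * numerator / denominator)


percentages = {
    label: percent(n, d) for label, (n, d) in reported_counts.items()
}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```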
Therefore, raw scores of 10 or higher on the CHAOS Attention Problems and Hyperactivity-Impulsivity subscales are typical of children with ADHD, and a score of 10 on these scales is likely to be a rigorous cutoff value. Future research should investigate not only norms but also sensitivity and specificity of various cutoff values.
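To illustrate what an evaluation of candidate cutoff values might involve, the following Python sketch computes sensitivity and specificity for a raw-score cutoff. It is a minimal example under stated assumptions: the function name is hypothetical, the scores and diagnoses are invented for illustration and are not study data, and a score at or above the cutoff is treated as a positive screen.

```python
def sensitivity_specificity(scores, has_adhd, cutoff):
    """Treat scores >= cutoff as positive and compare with diagnoses.

    Assumes at least one diagnosed and one non-diagnosed case,
    so neither denominator is zero.
    """
    pairs = list(zip(scores, has_adhd))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)      # true positives
    fn = sum(1 for s, d in pairs if d and s < cutoff)       # false negatives
    tn = sum(1 for s, d in pairs if not d and s < cutoff)   # true negatives
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)  # false positives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical raw subscale scores and diagnoses, invented for illustration.
example_scores = [12, 10, 9, 14, 6, 4, 11, 3]
example_has_adhd = [True, True, True, True, False, False, False, False]

for cutoff in (5, 10):
    sens, spec = sensitivity_specificity(
        example_scores, example_has_adhd, cutoff
    )
    print(f"cutoff {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this invented example, raising the cutoff from 5 to 10 lowers sensitivity and raises specificity, which is the trade-off that a formal study of cutoff values would need to quantify with real data.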
Several limitations should be kept in mind when interpreting the results of this study. First, study data are based on the chart review of a clinical sample. As a result, little is known about the background of participants in the Subscale Development Sample, and recruitment was not comprehensive or systematic. However, data from the Subscale Development Sample were used only for the PCA to evaluate relationships between items in forming subscales. Because ADHD, ODD, and CD symptoms have been found across a diversity of backgrounds and cultures (Barkley, 2006), it is unlikely that item–subscale relationships would vary systematically based on background information such as age, gender, or socioeconomic status.
Second, the study samples consisted exclusively of children being seen at outpatient ADHD-DBD and psychological testing clinics. As a result, mean scores for the Subscale Development Sample (drawn from an ADHD-DBD specialty clinic) are likely higher than those for samples with other diagnoses (particularly internalizing disorders). Furthermore, diagnoses represented in the Validation Sample were likely affected by the need for psychological testing. As a result, relatively few participants in the Validation Sample had diagnoses of CD/ODD alone, and over half of the sample had learning problems as one of the referral issues.
A third consideration in the interpretation of study results is that diagnoses were clinically assigned, and referral questions were coded based on the judgment of the primary clinician. These methods are less standardized and therefore provide less confidence in objectivity and interrater reliability of diagnosis and referral question. While referral questions were used only to describe the sample, diagnoses were used to validate the CHAOS scale. Furthermore, because CHAOS scale results were available to the clinician at the time that the diagnosis was made, the relationship between CHAOS scores and diagnosis may reflect, in part, the influence of the CHAOS scale in diagnostic decision making. However, this latter effect is unlikely to be large, because the clinician had access to many other psychological testing results, clinical interview, and more established rating scales (including the SI-4) when making the diagnosis. Nevertheless, relationships between CHAOS scale results and diagnosis should be viewed with caution pending further research.
Finally, with the recent revision of the DSM to DSM-V (APA, 2013) and the resulting changes in symptoms for some disorders, the value of some DSM-IV-based symptom scales has been diminished. However, because the behavioral symptoms of ADHD, ODD, and CD are largely unchanged in DSM-V (other than some additional examples for ADHD items and minor rewordings to increase relevance for adults), the CHAOS scale (like other DSM-IV-based ADHD and DBD behavior checklists) will be appropriate for use under the DSM-V system. In addition, because the CHAOS scale incorporates items judged by experts to be the most relevant for the specific diagnoses that they represent, it is more likely that these items will continue to be relevant for future revisions of the DSM.
Acknowledgements
The authors thank Ann Giauque, Jim Rizzo, Pat Brearton, and Chris McDougle for their assistance with this research.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
