Socioemotional and Autism Spectrum Disorder Screening for Toddlers in Early Intervention: Agreement Among Measures

Abstract

Identification of problems with socioemotional functioning is an important task in early childhood, particularly for children in early intervention (EI). However, socioemotional concerns raised by families may be under-identified in practice. In accordance with Division for Early Childhood (DEC) recommended practices, Part C providers could benefit from additional guidance on socioemotional screening and assessment, including additional research on available tools. Therefore, we examined agreement among three commonly used measures of socioemotional functioning in an EI sample (N = 50). Overall, the measures did not have adequate agreement. We found substantial agreement between the Ages and Stages Questionnaires: Social-Emotional (ASQ:SE, first edition) and the Brief Infant Toddler Social-Emotional Assessment (BITSEA), moderate agreement between the ASQ:SE and the Child Behavior Checklist (CBCL), and fair agreement between the BITSEA and CBCL. We also examined their potential to screen for autism spectrum disorder (ASD) by examining agreement with the Modified Checklist for Autism in Toddlers, Revised with Follow-up (M-CHAT-R/F). The BITSEA had substantial agreement with the M-CHAT-R/F, providing initial support for its use as an ASD screener. These findings are preliminary and further study in larger, more diverse samples would be beneficial. Evaluation of the sensitivity and specificity of these tools is also needed.

Keywords

screening, socioemotional, ASD, BITSEA

Identification of difficulties with socioemotional functioning is an important task in early childhood (e.g., Carter, 2010; Zeanah, Bailey, & Berry, 2009). Socioemotional functioning includes socioemotional competence (i.e., the skills and behaviors that are expected in typical development) and the presence of emotion and behavior problems. Emotion and behavior problems encompass externalizing problems (e.g., aggression) and internalizing problems (e.g., anxiety). In children who are developing typically, difficulties with socioemotional functioning are prevalent, with estimates of up to 10% to 15% in 2- to 3-year-olds, and may portend risk for future psychopathology (Carter, Briggs-Gowan, & Davis, 2004; Jones, Greenberg, & Crowley, 2015). Given such high rates, some researchers argue for universal screening of socioemotional functioning (e.g., Briggs et al., 2012; Weitzman & Wegner, 2015). Screening for problems in socioemotional functioning is particularly relevant for young children with developmental delay (DD). Parent and teacher reports indicate that young children with DD experience higher rates of emotion and behavior problems than their peers without identified delays, with as many as 25% of children with DD experiencing emotion and behavior problems in the clinically concerning range (Baker, Blacher, Crnic, & Edelbrock, 2002). Language delays are associated with increased risk of emotion and behavior problems, whereas strong language abilities serve as a protective factor in typical socioemotional development (Hartas, 2011; Rose, Lehrl, Ebert, & Weinert, 2018). These findings suggest that young children with DD are at increased risk for difficulties with socioemotional functioning relative to their peers.

It is also important to evaluate the ability of socioemotional screening tools to be sensitive to the presence of autism spectrum disorder (ASD) in young children. ASD is characterized by poor socioemotional functioning and affects 1 in 59 children (American Psychiatric Association [APA], 2013; Baio et al., 2018). Early screening of ASD is important to facilitate early provision of ASD-specific intervention services and better outcomes for children and families (e.g., Webb, Jones, Kelly, & Dawson, 2014). Screening practices are evolving to include neurophysiological methods that can predict ASD as early as 3 months of age (Bosl, Tager-Flusberg, & Nelson, 2018). Despite advances in screening and diagnostic practices, many families experience significant delays in obtaining diagnoses (Daniels & Mandell, 2014). Among children with ASD, most parents have concerns about their child’s development prior to age 3, yet less than half of children receive a developmental evaluation before 3 years of age (Baio et al., 2018). This results in a delay in services for many families. ASD screening in the early intervention (EI) system may be an important catalyst for increasing timely access to developmental evaluations. Population estimates indicate that around 46% to 64% of children with ASD also have DD, with delays being more prevalent at younger ages (Christensen et al., 2016; Kogan et al., 2009). Because a large proportion of children with DD receive services through the federally funded Part C EI program, Part C providers are theoretically well-positioned to detect both socioemotional concerns and ASD. However, it is also important to prevent increased burden on providers and families. Therefore, a tool that broadly assesses for problems with socioemotional functioning and is sensitive to the presence of ASD could facilitate increased identification of diverse social needs and reduce burden on service providers and families that could arise with multiple measures.

Unfortunately, current evidence suggests that many Part C providers may not be adequately prepared to detect and treat socioemotional concerns. Based on data from the National EI Longitudinal Study (Hebbeler et al., 2007), more than 25% of parents reported at least one socioemotional concern (e.g., child is jumpy and easily startled; aggressive with other children; does not show interest in nearby adults). However, only 4% of children were identified as eligible for EI due in part to a social or behavioral problem. This suggests that many children enrolled in Part C EI experience socioemotional problems, but that these problems may not be identified or are perhaps less salient to providers and families relative to other delays.

There are a number of reasons why socioemotional concerns, including ASD risk, may be overlooked in children enrolled in Part C EI. Federal regulations require that children’s socioemotional development be assessed as a part of eligibility determination for Part C services. The majority of states (85%) recommend use of the Ages and Stages Questionnaire (ASQ-3; Squires & Bricker, 2009), 73% recommend its socioemotional supplement (ASQ:SE; Squires, Bricker, & Twombly, 2002), and 70% of states recommend both for use in screening, with at least seven other measures being recommended in different states, such as the Parents’ Evaluation of Developmental Status (PEDS; Glascoe, 1997). Only 9% of states recommend the Brief Infant Toddler Social and Emotional Assessment (BITSEA; Briggs-Gowan, Carter, Irwin, Wachtel, & Cicchetti, 2004). A substantial number of states (31%) do not recommend any specific tool for socioemotional screening (Cooper & Vick, 2009). Importantly, we do not yet know which of these tools may be most useful for information about socioemotional concerns and ASD within the EI population, as most screening studies have been conducted in samples of children without DD.

In addition to uncertainty about the validity of screening measures, Part C personnel may not have adequate training or expertise on how to interpret results of these measures, and how to respond when children screen positive for socioemotional concerns. In fact, a survey of Part C coordinators from 48 states indicated that only 16 states (34%) required a professional with expertise in socioemotional development to participate on the evaluation team (Cooper & Vick, 2009). This suggests that EI personnel may not feel prepared to adequately assess for socioemotional concerns and ASD. Similarly, EI personnel may also need additional guidance to implement ASD screening. Only 20% of state Part C EI systems reported having guidelines for diagnostic assessment of ASD, and only 27% had guidelines for treatment of ASD (Stahmer & Mandell, 2007). Furthermore, in a review of Part C service coordinators, 50% indicated that ASD screening was not completed in EI settings, and more than 80% identified lack of knowledge as a barrier to screening (Pizur-Barnekow, Muusz, McKenna, O’Connor, & Cutler, 2012).

Taken together, these findings suggest that EI providers may benefit from additional guidance regarding how to most adequately screen young children in the Part C system for socioemotional concerns and ASD. In their policy statement regarding promoting social-emotional well-being in EI services, Cooper and Vick (2009) state that “there is a clear need for state agencies and local communities to have expert advice from the research community on the validity of the instrumentation” (pp. 22-23). Furthermore, the Division for Early Childhood (DEC) lays out several recommended practices for assessment. These recommendations indicate that practitioners need to use “assessment materials and strategies that are appropriate for the child’s age and level of development,” that “practitioners conduct assessments that include all areas of development and behavior,” and that “practitioners use assessment tools with sufficient sensitivity . . .” (p. 8). To meet these recommendations, it may be useful for Part C practitioners to have additional information about socioemotional tools for screening young children with DD.

Guidelines exist for how to best use existing screening tools to ensure the early identification of DD or other concerns (including socioemotional difficulties of ASD). Filipek and colleagues (1999) have succinctly delineated tool use within the progression of surveillance, screening, referral, and diagnosis. They stratify this process into two levels. Level 1 screening tools are intended to identify at-risk children within a general setting, such as a pediatrician’s office. This includes, first, surveillance and assessment with developmental screening tools such as the ASQ-3. If a child fails at this level, they are to be screened for ASD with a tool such as the Modified Checklist for Autism in Toddlers, Revised with Follow-up (M-CHAT-R/F; Robins et al., 2014). Hardy, Haisley, Manning, and Fein (2015) suggest that a two-stage screening process with the ASQ-3 (first) and M-CHAT-R/F (if indicated) may effectively identify children with ASD while minimizing the burden of ASD-specific screening for all. Following this screening process, the child is then referred for evaluation and to Part C services. Level 2 is the subsequent diagnosis and evaluation; tools used during Level 2 provide further diagnostic information about children referred for an evaluation, or children identified as at-risk for ASD or socioemotional difficulties. However, not all children referred to Part C have gone through this recommended screening and diagnostic pathway. As a result, not all of this valuable information that can inform treatment and referral practices has been collected on children referred to Part C. It is important for providers to understand the available socioemotional and ASD screening instruments.

In this study, we examine agreement among three measures that may be appropriate for socioemotional and ASD screening for children with DD: the Ages and Stages Questionnaires: Social-Emotional (ASQ:SE; Squires et al., 2002), the BITSEA (Briggs-Gowan et al., 2004), and the Child Behavior Checklist for ages 1.5 to 5 (CBCL/1.5-5; Achenbach & Rescorla, 2000). In the general pediatric population, the ASQ:SE and BITSEA are considered as Level 1 screening tools, whereas the Child Behavior Checklist (CBCL) often serves as a Level 2 diagnostic assessment tool, providing further information about children’s emotional and behavioral functioning during an evaluation. We selected these measures because both the ASQ:SE and BITSEA are routinely recommended for use in EI settings, and the CBCL is clinically useful in the general population and has considerable research in populations of children with DD. In this study, we examined the utility of these measures to address the screening needs of children in EI by examining the extent to which they agree on identifying children as at-risk for difficulties with socioemotional functioning.

Ages and Stages Questionnaires: Social-Emotional

The ASQ:SE (Squires et al., 2002) is a socioemotional complement to one of the most frequently recommended (e.g., King et al., 2010) screening tools for evaluating early childhood development, the ASQ-3 (Squires & Bricker, 2009). For the purposes of this study, we examined the first edition of the ASQ:SE (Squires et al., 2002). The second edition of the ASQ:SE became available during the course of this study (ASQ:SE-2; Squires, Bricker, & Twombly, 2015), with authors recommending a transition to the second edition by 2017. However, there is no established consensus in clinical practice regarding a timeframe for transitioning to new editions of measures; some even argue that transitions should not occur until there is sufficient time to conduct research on newer measures (Bush, 2010; Bush et al., 2018). In addition, replacing measures is costly and may not always occur according to publisher guidelines.

The ASQ:SE (first edition) is for children between ages 3 and 66 months and was validated on a standardization sample of over 3,000 (Squires, Bricker, Heo, & Twombly, 2001). The authors of the ASQ:SE used stratified sampling to achieve a representative sample of the United States, as well as sex and age ranges represented by the forms. The sample consisted mostly of children who were developing typically, with less than 20% of the sample qualifying for EI or special education services.

The ASQ:SE provides a total score, which reflects levels of both problem behaviors and deficits in socioemotional competence. The total scores are compared with cut-off scores, which were derived from optimal sensitivity and specificity¹ according to an outcome criterion, including an earlier version of the CBCL (Achenbach, 1992). Although the ASQ:SE is promising as a socioemotional screening tool, its psychometric properties need to be evaluated further in populations of children with DD. The ASQ:SE also includes items that are intended to identify “red flags” for ASD, and there is preliminary evidence of the validity of the ASQ:SE as an ASD screening tool for preschool children (Alkherainej & Squires, 2016). The validity of its use as an ASD screener in early childhood has not yet been examined, but it is nonetheless widely used as an ASD screening tool. In a survey of Part C program coordinators, the ASQ:SE was the most commonly reported screening tool for ASD, with 83% of coordinators reporting its use in their state or jurisdiction (Shaw & Hatton, 2009). Given the widespread use of the ASQ:SE for ASD screening, further research on its utility as an ASD screening tool is also needed.

Brief Infant-Toddler Social and Emotional Assessment

The Brief Infant-Toddler Social and Emotional Assessment (BITSEA) is a screening tool that assesses problem behaviors and delays in socioemotional competence in young children between the ages of 12 and 36 months (Briggs-Gowan & Carter, 2006; Briggs-Gowan et al., 2004). The BITSEA comprises two scales, a Problem scale and a Competence scale (Briggs-Gowan & Carter, 2006). In validity analyses, the Problem scale was positively correlated with the ASQ:SE (Squires et al., 2002; r = .55) and the Competence scale was negatively correlated with the ASQ:SE (r = –.55). Similarly, the BITSEA Problem scale was positively correlated with the CBCL Total Problem score (Achenbach & Rescorla, 2000; r = .60).

The BITSEA was standardized on a general pediatric sample of 1,237 children recruited from a random sample of children born at Yale New Haven Hospital (Briggs-Gowan et al., 2004). Children with DD or disabilities were excluded from the original study. Sensitivity and specificity analyses were conducted with the CBCL/1.5-5, among other outcome criteria (Achenbach & Rescorla, 2000). Compared with subclinical or clinical elevations on the CBCL, sensitivity of the BITSEA (both scales combined) is .85 and specificity is .75. Psychometric properties of the BITSEA have also been examined among 11- to 36-month-old children who were referred for evaluation through birth to three EI programs (Briggs-Gowan & Carter, 2007). Children with ASD were not included in this sample. In this sample, the BITSEA Problem scale was correlated with the CBCL Internalizing, Externalizing, and Total scales (rs = .63-.75; Briggs-Gowan & Carter, 2007). In contrast, the Competence scale was negatively correlated with the CBCL scales (rs = –.39 to –.42). Sensitivity and specificity values in this sample were not reported.

In addition to the Problem and Competence scales, it is possible to calculate an Autism score for the BITSEA (e.g., Kruizinga et al., 2014). This score combines 17 autism-specific items from the Competence and Problem scales, to assess risk for ASD. The Autism score is promising, but may not have better screening accuracy than the Competence scale (Kruizinga et al., 2014). In a comparison of a community sample with an ASD sample, a cut-off score of ≥9 for males had a sensitivity of .88 and specificity of .89, and a cut-off score of ≥8 for females had a sensitivity of .97 and specificity of .89. A cut-off score of ≤15 on the Competence scale resulted in a sensitivity of .82 and specificity of .88 for males, and a sensitivity of .88 and specificity of .94 for females.

Child Behavior Checklist

The CBCL/1.5-5 is a widely researched instrument for evaluating emotion and behavior problems in children (Achenbach & Rescorla, 2000). The CBCL was standardized on 700 children between ages 18 months and 5 years, from a representative sample around the United States. Children were excluded if they had significant disabilities or qualified for mental health or special education services. The CBCL has seven syndrome scales, derived from factor analysis. These include Emotionally Reactive, Anxious/Depressed, Somatic Complaints, Withdrawn, Sleep Problems, Attention Problems, and Aggressive Behavior. There are also five scales that overlap with the Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM-IV-TR; APA, 2000) criteria,² including a Pervasive Developmental Problems scale (APA, 2000). Finally, the CBCL includes three broadband scales: Internalizing Problems, Externalizing Problems, and Total Problems. The CBCL has evidence of reliability and validity, and its psychometric properties have been studied extensively in various populations, including children who are typically developing, children from community mental health settings, and internationally (e.g., Achenbach & Rescorla, 2000; Ivanova et al., 2010; Rescorla, 2005; Rishel, Greeno, Marcus, Shear, & Anderson, 2005). Internal consistencies of the CBCL broadband scales range from α = .89 to α = .95 (Achenbach & Rescorla, 2000). Test–retest reliability of the CBCL Total Problems scale is r = .90, with a mean value of r = .85 across all subscales (Rescorla, 2005). CBCL problem scale scores are also significantly higher in clinically referred young children than in non-referred young children. In addition, the factor structure of the CBCL was confirmed internationally and found to have acceptable to good fit (e.g., root mean square error of approximation, RMSEA = .036-.059) in all 23 countries included in the study (Ivanova et al., 2010).

The CBCL is typically used to gather more in-depth information about behavior concerns during an evaluation and may also be used as part of the screening process. The CBCL has also been proposed as a screening tool for ASD; in particular, the Pervasive Developmental Problems scale and Withdrawn scale may be most appropriate for identifying children with ASD (e.g., Rescorla et al., 2019). Findings from a meta-analysis show that clinically significant (t ≥ 65) scores on the Withdrawn scale have an average sensitivity of .81 and average specificity of .87, and clinically significant scores on the Pervasive Developmental Problems scale have an average sensitivity of .89 and average specificity of .81 in identifying ASD in settings of children referred for services (Hampton & Strand, 2015).

The Current Study

A first step to assisting practitioners in conducting more effective socioemotional and ASD screening for children in Part C EI is to better understand which tools may be most useful within this population. The purpose of the current study was to compare three commonly used socioemotional measures in an EI population. We had two research questions:

1. To what extent do socioemotional measures agree with one another in classifying children as “at-risk” for socioemotional problems?

2. To what extent are socioemotional tools useful for ASD screening?

We expected that there would be moderate-to-strong agreement between all measures, with the highest agreement between the ASQ:SE and BITSEA as they both measure delays in socioemotional competence. In addition, although these measures were not developed specifically to screen for ASD, they are all proposed to be able to screen for ASD. If tools can serve multiple functions (i.e., screen for socioemotional problems and ASD), this can prevent increased burden on families and providers. Understanding to what extent these measures are measuring the same constructs will help to guide practitioners and researchers as they are selecting tools.

Method

Participants

Participants included 50 caregivers of children with DD from one center-based EI site (n = 37), five other EI sites, and one children’s hospital in a Midwestern state (n = 13). Caregivers participated in this study if they met the following inclusion criteria: (a) their child was 18 to 36 months old, (b) their child qualified for EI services, and (c) the caregiver was a native English speaker.

All children qualified for EI services. In our state, eligibility criteria for EI services are as follows: (a) diagnosed condition with a high likelihood of DD or disability or (b) documented delay of at least 1.5 standard deviations below the mean in at least one developmental domain on an approved evaluation tool (further information about specific tools is detailed in testing procedures below).

All families received a questionnaire packet that was sent home. At the center-based program, these questionnaires are administered as part of routine screening for all children twice per year (fall and spring). Center-based participants were included in this study if they had consented to participation in the center’s research database (separate from this study), met inclusion criteria for this project, and had returned completed measures. During the study period, 120 children were enrolled in the center. Of these children, 67 met study inclusion criteria. Of the children who met study inclusion criteria, 43 families had consented to participate in the research database, and 37 of those who consented returned packets. Families were also recruited from 14 EI programs across the state, including the local children’s hospital, through the distribution of flyers and word of mouth from staff. From these recruitment efforts, 22 families called in and consented to the project. Of these families, 14 returned packets, but only 13 still met the age inclusion criterion at the time that the packet was returned. The institutional review board of the university approved all recruitment and study procedures.

Procedures

Caregivers completed a questionnaire packet on their child’s behavior. Questionnaire order for all children was counter-balanced to minimize order effects. However, families completed packets independently and may have completed them out of order. All measures were paper and pencil measures where caregivers selected appropriate numerical responses. The primary author on the study scored questionnaires for all participants based on measure guidelines provided below. All questionnaire data were subjected to a two-pass verification and were double-entered by multiple study team members. Any discrepancies in scoring or data entry were resolved. For children in the center-based program (n = 37), demographic information was collected from a review of educational records, including enrollment paperwork and most recent testing scores. For children recruited from sites around the state (n = 13), caregivers completed a demographic form and provided their child’s testing results. Results of developmental testing completed by EI staff at Part C entry, or reevaluation, were available for 45 of the children in the sample. Four additional participants provided results from developmental testing through other evaluations. Most of the testing (n = 46) was conducted with the Battelle Developmental Inventory, Second Edition (Newborg, 2005). The remaining testing was conducted with the Bayley Scales of Infant and Toddler Development, Third Edition (n = 1) and with the Mullen Scales of Early Learning (n = 2; Bayley, 2006; Mullen, 1995). Delays on all measures were defined as z-scores of −1.5 or lower, consistent with state guidelines. If scores were provided in scaled or standard scores, they were converted to z-scores to facilitate comparison. Demographic information for children is presented in Table 1.
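The conversion of developmental testing results to z-scores, and the −1.5 delay criterion, can be illustrated with a minimal sketch. The score metrics used here (standard scores with M = 100 and SD = 15; scaled scores with M = 10 and SD = 3) are conventional values assumed for illustration and are not stated in this article, and the function names are hypothetical.

```python
def to_z(score: float, mean: float, sd: float) -> float:
    """Convert a norm-referenced score to a z-score."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (score - mean) / sd


def is_delayed(z: float, threshold: float = -1.5) -> bool:
    """Delay is defined in the text as a z-score of -1.5 or lower."""
    return z <= threshold


# Assumed conventional metrics (not taken from the article).
STANDARD_SCORE = (100.0, 15.0)  # (mean, sd)
SCALED_SCORE = (10.0, 3.0)      # (mean, sd)

z_from_standard = to_z(70, *STANDARD_SCORE)  # two SDs below the mean
z_from_scaled = to_z(7, *SCALED_SCORE)       # one SD below the mean

print(z_from_standard, is_delayed(z_from_standard))
print(z_from_scaled, is_delayed(z_from_scaled))
```

Placing scores from different instruments on a common z metric is what allows a single delay criterion to be applied across the three developmental tests.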

Table 1.

Child Demographic Information.

	M (SD)	N (%)
Age in months	29.44 (3.89)
Sex (% male)^a
Total sample		38 of 50 (76%)
21-26 months		7 of 13 (54%)
27-32 months		19 of 25 (76%)
33-41 months		12 of 12 (100%)
Race
White		36 (72%)
African American		3 (6%)
Hispanic or Latino		2 (4%)
Multiracial		7 (14%)
Other		1 (2%)
Unreported		1 (2%)
At-risk on M-CHAT-R/F		22 (44%)
Eligibility domain^b
Adaptive		16 (32%)
Cognitive		19 (38%)
Communication		41 (82%)
Motor		18 (36%)
Personal-social		8 (16%)
Diagnosis		2 (4%)
Developmental testing: z-scores
Adaptive	−1.13 (0.86)	n = 48
Cognitive	−1.02 (1.05)	n = 48
Receptive communication	−1.82 (1.26)	n = 49
Expressive communication	−2.05 (0.94)	n = 49
Gross motor	−0.64 (1.13)	n = 47
Fine motor	−0.99 (1.14)	n = 49
Personal-social	−0.88 (0.91)	n = 47

Note. n values reflect the number of children who had data for each developmental domain. M-CHAT-R/F = Modified Checklist for Autism in Toddlers, Revised with Follow-up; ASQ:SE = Ages and Stages Questionnaires: Social-Emotional.

^a Sex presented by age ranges of ASQ:SE forms.

^b Percentages for eligibility domain add up to more than 100% as children may be eligible on multiple domains.

Measures

ASQ:SE

The ASQ:SE has separate forms that span eight age intervals (Squires et al., 2002). Parents are asked to check the box that best describes their child’s behavior on each item. Sample items include the following: “Does your child like to hear stories and sing songs?” and “Does your child cry, scream, or have tantrums for long periods of time?” ASQ:SE items include three anchors (0 = most of the time, 5 = sometimes, 10 = rarely or never; items are reverse scored when appropriate). There is an additional option to check if the item is a concern (5 points = yes, 0 points = no). Scores are provided as totals. Due to the age range of our study participants, we included three age interval forms: the 24-month questionnaire (26 items; cut-off score of 50), the 30-month questionnaire (29 items; cut-off score of 57), and the 36-month questionnaire (31 items; cut-off score of 59). For the ASQ:SE, children with scores falling above the cut-off are considered to be at-risk. Item counts did not include open-ended items (e.g., “what things do you enjoy most about your child?”), which were excluded from analyses. Although our inclusion age range began at 18 months, none of the children in this study were young enough for the 18-month questionnaire. With the cut-off scores provided by instrument authors, the sensitivity and specificity of the ASQ:SE, compared with a criterion instrument such as the CBCL (Achenbach, 1992), are as follows: 24 months (sensitivity = .71, specificity = .93), 30 months (sensitivity = .80, specificity = .90), and 36 months (sensitivity = .78, specificity = .93).
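The scoring rule described above can be sketched as follows. This is an illustrative sketch rather than the publisher's scoring procedure: the function names are hypothetical, item points are assumed to be supplied after any reverse scoring, and a score exactly at the cut-off is treated as not at-risk because the text refers to scores "above" the cut-off.

```python
# Cut-off scores by form (in months), as reported in the text.
ASQSE_CUTOFFS = {24: 50, 30: 57, 36: 59}


def asqse_total(item_points, concern_flags):
    """Sum item points (0, 5, or 10, already reverse scored where needed)
    and add 5 points for each item the parent marked as a concern."""
    points = list(item_points)
    if any(p not in (0, 5, 10) for p in points):
        raise ValueError("item points must be 0, 5, or 10")
    return sum(points) + 5 * sum(1 for flag in concern_flags if flag)


def asqse_at_risk(total, form_months):
    """A total above the form's cut-off is classified as at-risk."""
    return total > ASQSE_CUTOFFS[form_months]


# Hypothetical responses for four items, one flagged as a concern.
total = asqse_total([0, 5, 10, 10], [False, True, False, False])
print(total, asqse_at_risk(total, 24))
```

Because the cut-off differs by form, the same total can be at-risk on one form and not on another, so the form completed must be recorded alongside the score.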

BITSEA

The BITSEA consists of 42 items and requires about 5 to 7 min to complete (Briggs-Gowan et al., 2004). Parents are asked to select one response that best describes their child’s behavior in the last month on each item. Sample items include the following: “Looks for you (or other parent) when upset” and “Has trouble adjusting to changes.” There are three anchors per item (0 = not true, 1 = sometimes or somewhat true, and 2 = very true or often true). Cut-off scores differ based on the age and sex of children and range from 12 to 14 for girls and 13 to 15 for boys on the Problem scale, and 12 to 16 for girls and 12 to 14 for boys on the Competence scale. Children are considered at-risk on the BITSEA if they meet or are above the Problem scale cut-off, or if they meet or are below the Competence scale cut-off. Cut-off scores for the Problem scale identify children above the 75th percentile, whereas cut-off scores for the Competence scale identify the bottom 15th percentile as at-risk. Lower scores on the Competence scale indicate less competence and greater risk of socioemotional delay. There are two additional questions that are not included in the scores for the scales that ask about the parents’ level of worry. Within the 42 items, there are 17 items that may detect ASD: 8 competence items and 9 problem items (Kruizinga et al., 2014). The Autism score can be calculated by reverse-scoring ASD Competence items and summing the total with ASD Problem items for a total Autism score. We used the gender-specific cut-offs for ASD total scores recommended by Kruizinga et al. (≥8 for females and ≥9 for males).
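The BITSEA classification rules can be sketched as follows. Because the age- and sex-specific Problem and Competence cut-offs are not tabulated in this article, they are passed in as parameters, and the values in the usage lines are placeholders chosen from within the reported ranges. Reverse-scoring a 0-to-2 item as 2 minus the response is an assumption, as are all function names.

```python
# Autism score cut-offs reported in the text (Kruizinga et al., 2014).
AUTISM_CUTOFFS = {"female": 8, "male": 9}


def bitsea_at_risk(problem, competence, problem_cutoff, competence_cutoff):
    """At-risk if Problem is at or above its cut-off, or Competence is at
    or below its cut-off (cut-offs depend on age and sex)."""
    return problem >= problem_cutoff or competence <= competence_cutoff


def bitsea_autism_score(asd_competence_items, asd_problem_items):
    """Reverse-score the ASD Competence items (assumed 2 - response) and
    add the ASD Problem items."""
    competence = list(asd_competence_items)
    problem = list(asd_problem_items)
    if any(x not in (0, 1, 2) for x in competence + problem):
        raise ValueError("item responses must be 0, 1, or 2")
    return sum(2 - x for x in competence) + sum(problem)


def bitsea_autism_at_risk(score, sex):
    return score >= AUTISM_CUTOFFS[sex]


# Placeholder cut-offs (13 and 14) for illustration only.
print(bitsea_at_risk(problem=5, competence=14,
                     problem_cutoff=13, competence_cutoff=14))
score = bitsea_autism_score([2, 2, 0, 1, 2, 2, 2, 2],
                            [0, 1, 0, 0, 2, 0, 0, 0, 0])
print(score, bitsea_autism_at_risk(score, "female"))
```

The two scales run in opposite directions, which is why the Problem comparison uses "at or above" and the Competence comparison uses "at or below."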

CBCL/1.5-5

The CBCL/1.5-5 is a 99-item problem behavior checklist for young children between ages 18 months and 5 years 11 months (Achenbach & Rescorla, 2000). For each item, parents are asked to select the response that describes their child now or within the past 2 months. Sample items include the following: “Afraid to try new things.” “Has trouble getting to sleep.” Response options for each item are the same as on the BITSEA (0 = not true, 1 = sometimes or somewhat true, and 2 = very true or often true). There is an additional item (Item 100), which provides space to write in additional problem behaviors. For this item, only the highest score is included in the total score. For example, if the parent lists five problem behaviors and all are rated as a “2,” only one score of “2” is included in the total score. Scores on the syndrome scales, DSM-oriented scales, and on the composite Internalizing, Externalizing, and Total Problems scales are provided in terms of t-score distributions (populations M = 50, SD = 10). For the syndrome (e.g., Withdrawn) and DSM-oriented (e.g., Pervasive Developmental Problems) scales, the borderline clinical range is set at t = 65-69 (which corresponds to the 93rd-97th percentiles), and the clinical range is set at t ≥ 70. For the composite scales, the borderline clinical range is set at t = 60-63 (which corresponds to the 83rd-90th percentiles), and the clinical range is set at t ≥ 64. The authors of the CBCL recommend including the borderline clinical range in dichotomous analyses, which would result in a score of t ≥ 60 indicating risk on a composite scale and a score of t ≥ 65 indicating risk on a syndrome or DSM-oriented scale (Achenbach & Rescorla, 2000).
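The t-score ranges described above can be expressed as a short sketch. The thresholds are those given in the text; the function names and the label "normal" for scores below the borderline range are choices made here for illustration.

```python
def cbcl_range(t_score, composite):
    """Classify a CBCL t-score.

    composite=True  -> Internalizing, Externalizing, or Total Problems
                       (borderline 60-63, clinical >= 64)
    composite=False -> syndrome or DSM-oriented scale
                       (borderline 65-69, clinical >= 70)
    """
    borderline, clinical = (60, 64) if composite else (65, 70)
    if t_score >= clinical:
        return "clinical"
    if t_score >= borderline:
        return "borderline"
    return "normal"


def cbcl_at_risk(t_score, composite):
    """Dichotomous rule: borderline and clinical ranges both count as at-risk."""
    return cbcl_range(t_score, composite) != "normal"


print(cbcl_range(62, composite=True))   # composite scale
print(cbcl_range(64, composite=False))  # syndrome or DSM-oriented scale
```

The same t-score can therefore fall in different ranges depending on the type of scale, so the scale type has to accompany each score.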

M-CHAT-R/F

The M-CHAT-R/F is a parent-completed checklist of ASD symptoms for children between ages 16 and 30 months (Robins et al., 2014). The M-CHAT-R/F consists of 20 items with response options of “yes” or “no.” Parents are asked to respond based on how their child usually behaves. Sample items include the following: “If you point at something across the room, does your child look at it?” “When you smile at your child, does he or she smile back at you?” Items are scored as either pass or fail. If a child fails 0 to 2 items, he or she is classified as low risk for ASD. If a child fails between 3 and 7 items, he or she is classified as medium risk for ASD. If a child fails 8 or more items, he or she is classified as high risk for ASD. If a child is classified as medium risk, the follow-up interview is administered. The follow-up interview is a structured set of clarification questions and examples with a pass/fail decision tree. Only items that were initially failed (between 3 and 7 items) receive follow-up clarification. Following clarification of these items, if the child’s total score remains at 2 failed items or higher, they are considered to be at-risk for ASD. Consistent with these guidelines, we administered the follow-up interview for every case in which the child was classified as medium risk (n = 18). The primary author conducted all follow-up interviews. Scores described for children in this category (medium risk) are the final scores following the clarifying interview. With the follow-up interview, the M-CHAT-R/F has evidence of good sensitivity (.94) and acceptable specificity (.83) in general pediatric settings (Barton, Dumont-Mathieu, & Fein, 2012). We considered the M-CHAT-R/F to be a gold-standard screening measure for ASD due to the relatively wide body of literature on the measure and its strong psychometric properties in general pediatric settings. 
However, it is important to provide a caveat that less is known about the psychometric properties of the M-CHAT-R/F in samples of children with DD. There is some research to suggest that its psychometric properties are not as strong in these samples, as screening positive may reflect other developmental problems, such as motor or cognitive concerns (Kim et al., 2016; Weitlauf, Vehorn, Stone, Fein, & Warren, 2015). Nonetheless, we include it here as a strong ASD screening tool for this age range.
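The two-stage scoring rule described above can be sketched as a small decision function. This is a minimal illustration, not the published scoring form: the function names are hypothetical, and treating the high-risk group as screening positive without a follow-up interview is an assumption made for the dichotomous classification.

```python
from typing import Optional


def mchat_initial_risk(failed_items: int) -> str:
    """Risk category from the number of failed items on the 20-item checklist."""
    if not 0 <= failed_items <= 20:
        raise ValueError("failed_items must be between 0 and 20")
    if failed_items <= 2:
        return "low"
    if failed_items <= 7:
        return "medium"
    return "high"


def mchat_screen_positive(initial_failed: int,
                          followup_failed: Optional[int] = None) -> bool:
    """Final at-risk classification after the follow-up interview.

    followup_failed is the number of items still failed after clarification;
    it is required only for the medium-risk group.
    """
    risk = mchat_initial_risk(initial_failed)
    if risk == "low":
        return False
    if risk == "high":
        # Assumption: high-risk cases count as at-risk without follow-up.
        return True
    if followup_failed is None:
        raise ValueError("medium-risk cases need a follow-up score")
    if not 0 <= followup_failed <= initial_failed:
        raise ValueError("follow-up score cannot exceed the initial score")
    return followup_failed >= 2
```

For example, a child who initially fails five items and still fails two after clarification remains at-risk, whereas a child whose score drops to one does not.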

Analyses

To calculate agreement, we classified scores on each measure as at-risk or not at-risk. For the ASQ:SE and BITSEA, we used the cut-off scores recommended by the authors of the measure (Briggs-Gowan & Carter, 2006; Squires et al., 2002). Children were considered to be at risk on the BITSEA if they scored in the at-risk range on either the Problem or Competence scale. Consistent with the recommendations of the CBCL authors for dichotomous analyses, children were considered to be at risk on the CBCL if they were elevated in either the borderline or clinical range for any of the composite scales: Internalizing, Externalizing, or Total Problems. We decided to calculate agreement in this manner, by collapsing across scales, to mirror what would most likely occur in a community setting—that a child would be considered at-risk if they were at-risk on either scale of the BITSEA or on any of the composite scales of the CBCL. We felt this would best allow us to answer our primary question of whether these questionnaires agree in terms of identifying children as at-risk. We use the term at-risk to reflect that children are classified as at-risk for problems with socioemotional functioning according to the measure. Thus, we examined agreement in how the measures classify children.

We also conducted exploratory analyses to examine two possible sources of measurement disagreement. We explored agreement with the BITSEA Problem scale and CBCL composite scales separately because we hypothesized that the BITSEA’s inclusion of socioemotional competence may lead to disagreement. Measurement agreement may also be confounded by different percentile cut-offs across measures; therefore, we conducted exploratory analyses with adjusted percentiles with the BITSEA and CBCL, by setting percentiles to be equal to each other. We were not able to conduct these analyses with the ASQ:SE as the authors do not provide percentile information for their measure, and as the ASQ:SE provides one total score that subsumes socioemotional problems and competence.

For the ASD analyses, we calculated agreement between our “gold-standard” ASD screening tool (the M-CHAT-R/F) and the following possible ASD screening scales: the ASQ:SE total score, the BITSEA Competence scale, the BITSEA Autism score, and the CBCL Withdrawn and Pervasive Developmental Problems scales. We classified children as at-risk or not at-risk based on their final score on the M-CHAT-R/F, which for children classified as medium risk was the score after the follow-up interview (as outlined above). We did not examine agreement among all proposed ASD screening scales because, unlike in socioemotional screening, there is already a well-established measure of ASD screening. As with our socioemotional analyses, we calculated agreement with dichotomous risk classification.

We calculated measurement agreement with Cohen’s κ and determined that a sample size of n = 25 participants would be needed to detect moderate agreement (κ = .5) with 80% power (Sim & Wright, 2005). One participant did not return the CBCL and another was outside the age range of the BITSEA. Consequently, κ values for agreement between the CBCL and BITSEA were calculated based on 48 participants, and κ values for other comparisons involving either the BITSEA or the CBCL were calculated based on 49 participants. Calculation of the remaining κ value, for the ASQ:SE and M-CHAT-R/F, included all 50 participants. Because the M-CHAT-R/F is only validated up to 30 months, we also completed exploratory analyses in an age-restricted subsample. In this subsample, analyses between the M-CHAT-R/F and the CBCL included 28 participants, and the remaining analyses, between the M-CHAT-R/F and the ASQ:SE or BITSEA, included 29 participants. We followed guidelines set forth by Landis and Koch (1977) for interpreting κ values. According to these guidelines, κ = .01-.20 is slight agreement, κ = .21-.40 is fair agreement, κ = .41-.60 is moderate agreement, κ = .61-.80 is substantial agreement, and κ = .81-1.0 is almost perfect agreement. For descriptive purposes, we also provide absolute agreement that is not corrected for chance (i.e., percent agreement).
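To make the relation between percent agreement and κ concrete, the sketch below computes both from a 2 × 2 cross-classification and attaches the Landis and Koch label. It is an illustration in Python rather than the SPSS procedure used in this study; the function names and the example counts are hypothetical, and the label returned for κ ≤ 0 is an assumption, because the bands listed above start at .01.

```python
def cohen_kappa_2x2(both_risk: int, only_a: int, only_b: int, neither: int):
    """Return (percent agreement, Cohen's kappa) for two dichotomous measures."""
    n = both_risk + only_a + only_b + neither
    p_observed = (both_risk + neither) / n
    a_risk = (both_risk + only_a) / n      # proportion at-risk on measure A
    b_risk = (both_risk + only_b) / n      # proportion at-risk on measure B
    # Agreement expected by chance if the two classifications were independent.
    p_expected = a_risk * b_risk + (1 - a_risk) * (1 - b_risk)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_observed, (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa: float) -> str:
    """Verbal label for kappa following the Landis and Koch (1977) bands."""
    if kappa <= 0:
        return "no agreement beyond chance"  # assumption: below the listed bands
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts for 50 children (not data from this study).
agreement, kappa = cohen_kappa_2x2(both_risk=22, only_a=3, only_b=5, neither=20)
print(round(agreement, 2), round(kappa, 2), landis_koch_label(kappa))
```

With these hypothetical counts, 84% raw agreement corresponds to a κ of about .68, which illustrates how chance correction lowers the value relative to percent agreement.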

In addition to Cohen’s κ, we calculated agreement with Krippendorff’s α. In a review of reliability statistics, Krippendorff’s α was recommended as a standard measure (Hayes & Krippendorff, 2007). Krippendorff’s α provides calculations of agreement between multiple measures simultaneously. Since it is not strongly affected by missing data, data from all 50 participants were used to calculate α. Values of at least .667 may be used to draw tentative conclusions and values of .800 or higher are considered sufficient agreement (Krippendorff, 2004). For binary comparisons, Krippendorff’s α values were similar to Cohen’s κ values; therefore in this article, we report only κ values. However, a limitation of κ is that agreement cannot be calculated among all three measures; for this purpose, we report α. Finally, in addition to dichotomous agreement, we calculated correlations among the measures based on continuous raw scores.
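For readers unfamiliar with Krippendorff’s α, the following Python sketch shows the nominal-data calculation from a coincidence matrix, including how units with a missing classification are handled. It is a minimal illustration under stated assumptions, not the SPSS macro used in this study: the function name is hypothetical, the data in the usage example are invented, and the bootstrap confidence interval is not reproduced.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: one sequence per child, holding the classification from each
    measure (e.g., 1 = at-risk, 0 = not at-risk) or None when missing.
    """
    coincidences = Counter()
    for ratings in units:
        values = [v for v in ratings if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two values carries no pairs
        # Every ordered pair of values from different measures adds 1/(m - 1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable values")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1 - observed / expected


# Invented example: three measures, one missing classification.
example = [(1, 1, 1), (1, 1, 0), (0, 0, 0), (1, None, 0), (0, 0, 0)]
print(round(krippendorff_alpha_nominal(example), 3))
```

Because each unit contributes only the pairs that are actually present, a child with one missing measure still informs the estimate, which is why all 50 participants could be retained for α.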

We conducted all analyses in SPSS and used an SPSS macro to calculate Krippendorff’s α (Hayes & Krippendorff, 2007). Study data were collected and managed using REDCap (Research Electronic Data Capture) electronic data capture tools hosted at our university (Harris et al., 2009). REDCap is a secure, web-based application designed to support data capture for research studies, providing (a) an intuitive interface for validated data entry; (b) audit trails for tracking data manipulation and export procedures; (c) automated export procedures for seamless data downloads to common statistical packages; and (d) procedures for importing data from external sources.

Results

Descriptive Statistics

Descriptive statistics for all measures are reported in Table 2. The ASQ:SE and the BITSEA identified 58% to 61% of the sample as at-risk for problems with socioemotional functioning, whereas the CBCL identified 37% of the sample as at-risk. A visual depiction of how children were classified according to each measure is provided in Figure 1. This figure includes the 48 participants who completed all three primary measures. Overall, 70% of children (n = 35) were identified as at-risk on at least one of the three measures. In addition, 44% of the children in this sample were at-risk for ASD on the M-CHAT-R/F. A visual depiction of the overlap among the tools for ASD screening is shown in Figure 2.

Table 2.

Descriptive Statistics for Measures.

	Completed measures	Total scores M (SD)	Sample range	Cut-off score for concern	Sample classified at-risk (%)
ASQ:SE	50	86.66 (64.72)	0-315		58
24 months	13	37.69 (23.68)	0-80	50	23
30 months	25	99.60 (64.26)	5-265	57	64
36 months	12	112.75 (71.29)	35-315	59	83
BITSEA	49				61
Problem		12.53 (8.12)	1-40	12-15	33
Competence		14.04 (4.51)	4-22	12-16	51
Autism score		8.08 (5.70)	1-25	8-9	43
CBCL	49				37
PDP		61.47 (12.26)	50-91	t = 65	39
Withdrawn		59.63 (11.49)	50-100	t = 65	29
Internalizing		50.80 (12.89)	29-76	t = 60	27
Externalizing		53.08 (11.54)	28-76	t = 60	31
Total		53.49 (12.57)	29-79	t = 60	29

Note. Scores for ASQ:SE and BITSEA presented as raw total scores. Scores for CBCL presented as t-scores. For the Withdrawn and Pervasive Developmental Problems scales, scores are truncated, so the lowest possible score is t = 50. BITSEA cut-off scores vary by age and sex; here, we list the full range of cut-off scores. Scores above cut-off indicate concern with two exceptions: (a) for the BITSEA Competence score, scores below the cut-off indicate concern and (b) for the BITSEA Autism score, scores at or above cut-off indicate concern. ASQ:SE = Ages and Stages Questionnaires: Social-Emotional; BITSEA = Brief Infant-Toddler Social and Emotional Assessment; CBCL = Child Behavior Checklist; PDP = Pervasive Developmental Problems scale.

Socioemotional Agreement

All measures were significantly correlated (see Table 3). Agreement statistics are presented in Table 4. Absolute percentage agreement (AA) between the ASQ:SE and the BITSEA was 84%, between the ASQ:SE and the CBCL was 69%, and between the BITSEA and the CBCL was 65%. According to Cohen’s κ, there was substantial agreement between the ASQ:SE and the BITSEA (κ = .66, p < .001) and moderate agreement between the ASQ:SE and the CBCL (κ = .42, p = .001). Agreement between the BITSEA and the CBCL was only fair (κ = .34, p = .006). Overall, Krippendorff’s α for all three measures (calculated with 10,000 bootstrap samples) was below the adequate range (α = .46, 95% CI = [.28, .62]).

Table 3.

Pearson’s Product-Moment Correlations Between Measures.

	ASQ:SE	BITSEA-P	BITSEA-C	CBCL-I	CBCL-E	CBCL-T
ASQ:SE	1.00
BITSEA-P	.77**	1.00
BITSEA-C	−.71**	−.57**	1.00
CBCL-I	.77**	.83**	−.60**	1.00
CBCL-E	.68**	.70**	−.47**	.71**	1.00
CBCL-T	.78**	.83**	−.57**	.91**	.91**	1.00

Note. ASQ:SE = Ages and Stages Questionnaires: Social-Emotional; BITSEA = Brief Infant-Toddler Social and Emotional Assessment; BITSEA-P = BITSEA Problem scale; BITSEA-C = BITSEA Competence scale; CBCL = Child Behavior Checklist; CBCL-I = CBCL Internalizing; CBCL-E = CBCL Externalizing; CBCL-T = CBCL Total.

**All correlations are significant at the p < .01 level.

Table 4.

Agreement Values.

	Absolute agreement	Cohen’s κ	Krippendorff’s α [95% CI]
ASQ:SE and BITSEA	.84	.66
ASQ:SE and CBCL	.69	.42
BITSEA and CBCL	.65	.34
All measures	.56	—	.46 [.28, .62]

Note. ASQ:SE = Ages and Stages Questionnaires: Social-Emotional; BITSEA = Brief Infant-Toddler Social and Emotional Assessment; CBCL = Child Behavior Checklist.

For our exploratory analyses (completed with the full sample), we examined agreement between the CBCL and the BITSEA Problem scale. Agreement was substantial for κ (AA = 85%, κ = .68, p < .001). We also examined agreement with adjusted percentiles. We adjusted CBCL risk classification to identify children who were in the 75th percentile or higher on any composite scale (Internalizing, Externalizing, Total Problems) and compared CBCL elevations with elevations on either scale of the BITSEA. Agreement was in the moderate range for κ (AA = 73%, κ = .46, p = .001). Results were similar when we adjusted the BITSEA cut-off to align with the CBCL percentiles; agreement was barely in the moderate range (AA = 71%; κ = .42, p = .002).
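The percentile adjustment can be illustrated by converting between t-scores and percentiles. The Python sketch below assumes a normal distribution with M = 50 and SD = 10; this is an approximation adopted for illustration only (published percentiles come from normative tables, and some scales are truncated at t = 50), and the function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: t-scores are treated as normally distributed (M = 50, SD = 10).
T_DISTRIBUTION = NormalDist(mu=50, sigma=10)


def t_to_percentile(t_score: float) -> float:
    """Approximate percentile rank for a t-score."""
    return 100 * T_DISTRIBUTION.cdf(t_score)


def percentile_to_t(percentile: float) -> float:
    """Approximate t-score at a given percentile (0 < percentile < 100)."""
    return T_DISTRIBUTION.inv_cdf(percentile / 100)


print(round(percentile_to_t(75), 1))   # t-score near the 75th percentile
print(round(t_to_percentile(60), 1))   # percentile near the composite cut-off
```

Under this approximation, the 75th percentile corresponds to a t-score of roughly 57, which lies below the recommended composite cut-off of t = 60 and therefore classifies more children as at-risk.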

ASD Agreement

Results from ASD analyses are provided in Table 5. All scales were significantly correlated with the M-CHAT-R/F. The M-CHAT-R/F had moderate agreement with the ASQ:SE (AA = 78%, κ = .57, p < .001) and substantial agreement with the BITSEA Competence scale (AA = 84%, κ = .67, p < .001) and BITSEA Autism score (AA = 84%; κ = .67, p < .001). Agreement with the CBCL Withdrawn scale (AA = 67%, κ = .32, p = .018) and Pervasive Developmental Problems scale was fair (AA = 69%; κ = .37, p = .008).

Table 5.

Agreement Values With the M-CHAT-R/F.

	Absolute agreement^a	Cohen’s κ^a	r ^a	Absolute agreement^b	Cohen’s κ^b	r ^b
	Full sample	Full sample	Full sample	≤30 months	≤30 months	≤30 months
ASQ:SE	.78	.57	.69**	.86	.72	.62**
BITSEA-C	.84	.67	−.77**	.97	.93	−.71**
BITSEA Autism	.84	.67	.82**	.90	.78	.81**
CBCL Withdrawn	.67	.32	.75**	.71	.38	.72**
CBCL PDP	.69	.37	.58**	.71	.39	.51**

Note. All agreement values measure agreement among the stated measure and the M-CHAT-R/F. r is Pearson’s product-moment correlations between measure and M-CHAT-R/F. ASQ:SE = Ages and Stages Questionnaires: Social-Emotional; BITSEA-C = Brief Infant-Toddler Social and Emotional Assessment Competence scale; BITSEA Autism = BITSEA Autism score; CBCL Withdrawn = Child Behavior Checklist Withdrawn scale; CBCL PDP = CBCL Pervasive Developmental Problems scale; M-CHAT-R/F = Modified Checklist for Autism in Toddlers, Revised with Follow-up.

^a Data calculated for full sample.

^b Data calculated for subsample of participants 30 months and younger.

**All correlations are significant at the p < .01 level.

Figure 1.

Children classified as at-risk by measure.

Figure 2.

Children classified as at-risk for autism spectrum disorder in comparison with the M-CHAT-R/F.

All agreement values were higher in a subsample of children 30 months and younger (see Table 5). Agreement values with the CBCL were still fair: Withdrawn scale (AA = 71%, κ = .38, p = .024) and Pervasive Developmental Problems scale (AA = 71%, κ = .39, p = .030). In this age range, the ASQ:SE had substantial agreement with the M-CHAT-R/F (AA = 86%, κ = .72, p < .001), as did the BITSEA Autism score (AA = 90%, κ = .78, p < .001). The BITSEA Competence scale had almost perfect agreement with the M-CHAT-R/F (AA = 97%, κ = .93, p < .001).

Discussion

The primary objective of this study was to examine the agreement among three socioemotional tools: the ASQ:SE, the BITSEA, and the CBCL. We examined agreement in a sample of toddlers with DD, a population that is at increased risk of problems with socioemotional functioning (e.g., Baker et al., 2002). In this sample, the BITSEA and CBCL had fair agreement, the ASQ:SE and CBCL had moderate agreement, and the ASQ:SE and BITSEA had substantial agreement. When all three measures were compared together, they did not have adequate agreement. These preliminary results suggest that these instruments differ in terms of which children are classified as at-risk for problems with socioemotional functioning.

The higher classification rates of the ASQ:SE and BITSEA appear to result from the fact that the ASQ:SE and BITSEA assess for deficits in socioemotional competence, especially considering that agreement between the CBCL and the BITSEA was substantial when only the Problem scale was compared. It appears that if the CBCL identifies a child as at-risk, she or he is also likely to be at-risk on the ASQ:SE and/or BITSEA. Closer examination of the data supports this assertion and shows that almost every child identified by the CBCL was also identified by the ASQ:SE and BITSEA (see Figure 1). Of the 17 children identified as at-risk on the CBCL, 14 were at risk on both the ASQ:SE and the BITSEA and an additional 2 were at-risk on either the ASQ:SE or the BITSEA. However, there were 17 children who were identified on the ASQ:SE, BITSEA, or both who were not classified as at-risk on the CBCL. Higher identification rates of the ASQ:SE and BITSEA have two possible interpretations. First, it is possible that the CBCL does not adequately identify children with delays who are in need of further evaluation and services. However, it is also possible that the ASQ:SE and BITSEA over-identify children with DD, and that elevations on these measures can be attributed to overall delays in other areas of development, rather than additional problems with socioemotional functioning. For example, items such as “does your child let you know how she is feeling with gestures or words” may reflect language delays (ASQ:SE; Squires et al., 2002). Overall, these preliminary findings suggest that the CBCL may be measuring a different construct than the ASQ:SE and BITSEA (i.e., solely emotion and behavior problems) and may not be comparable to them as a comprehensive socioemotional screening tool that also assesses for risk in socioemotional competence. 
Although we are not able to determine solely based on this study whether the CBCL is under-identifying children or the ASQ:SE and BITSEA are over-identifying children, it is worth considering whether the inclusion of items that frame behaviors in a positive way, such as competence items, may be beneficial to parents as they are completing these measures. This would be consistent with a strengths-based approach that seeks to capitalize on children’s strengths and not only identify areas of weakness.

Some of our results differed from those in previous research on these measures. Absolute agreement values were lower than previously reported results from the authors of the ASQ:SE (Squires et al., 2001), suggesting that agreement between the ASQ:SE and the CBCL may be lower in an EI population. In our study, chance-corrected agreement between the ASQ:SE and the CBCL was only moderate, but still higher than agreement between the CBCL and the BITSEA. This may be explained by differences in how the cut-off scores were derived. The creators of the ASQ:SE derived cut-off scores that maximize sensitivity and specificity with a criterion, including the CBCL (Yovanoff & Squires, 2006), whereas the authors of the BITSEA did not (Briggs-Gowan et al., 2004). Therefore, it is not surprising that the CBCL had stronger agreement with the ASQ:SE than with the BITSEA.

All of the primary measures correlated significantly with each other, which is consistent with prior research (Briggs-Gowan & Carter, 2007; Briggs-Gowan et al., 2004). Correlations between the ASQ:SE and BITSEA were higher in this study than were previously reported in a sample of children with typical development (Briggs-Gowan & Carter, 2006). However, correlations between the BITSEA and the CBCL were similar to reported values in an EI sample (Briggs-Gowan & Carter, 2007). Lower correlations between CBCL scales and the BITSEA Competence scale, compared with the Problem scale, were consistent with lower agreement among the measures.

The findings from this study revealed lower agreement than previous findings from de Wolff, Theunissen, Vogels, and Reijneveld (2013) in a general pediatric population in the Netherlands. At 24 months, they found moderate agreement among Dutch translated versions of these measures, with higher agreement between the BITSEA and the CBCL than between the ASQ:SE and the CBCL. They did not report agreement between the ASQ:SE and the BITSEA. We found lower agreement among these measures, as well as the opposite pattern: the BITSEA and CBCL had lower agreement than the ASQ:SE and CBCL. There were notable differences between our method and the method in their study. Unlike our study, they operationalized risk on the CBCL as a clinically significant elevation (90th percentile or higher) on the Total Problems scale. In this study, we included the borderline range and included elevations on the Internalizing and Externalizing composite scales. Furthermore, since there were no cut-off scores available for Dutch versions of measures, de Wolff et al. derived cut-off scores that maximized specificity with the CBCL (i.e., specificity of ≥ .9).³ Thus, it would be expected that they would find higher agreement with the CBCL than we would, given that we used cut-off scores provided by measure developers, although, as mentioned previously, the original ASQ:SE cut-off scores were derived with the CBCL as one outcome criterion. Finally, their sample characteristics differ from ours in the following ways: a slightly younger sample, a general pediatric sample, and a Dutch sample. Discrepancies in our findings highlight the importance of evaluating agreement among these measures in various populations because findings may not necessarily generalize between populations, methods, or versions of measures.

The secondary aim of this study was to examine the agreement of the socioemotional tools for the identification of ASD. The ASQ:SE is often interpreted in EI settings as an ASD screening tool (Shaw & Hatton, 2009), and the BITSEA is promising as an ASD screening tool (e.g., Kruizinga et al., 2014). The CBCL Withdrawn and Pervasive Developmental Problem scales have also been suggested for use in ASD screening (Hampton & Strand, 2015). If these broadband measures that identify risk for general problems with socioemotional functioning can also screen for ASD, this could eliminate the need to use an additional autism-specific screening tool and reduce burden on caregivers and health care providers.

In this study, agreement between the M-CHAT-R/F and the CBCL scales was only fair. These results are consistent with a recent study examining a version of the CBCL for older children that demonstrated inadequate sensitivity and specificity in detecting ASD in an outpatient behavioral health setting (Hoffmann, Weber, König, Becker, & Kamp-Becker, 2016). It appears that measurement disagreement stems primarily from the CBCL under-identifying children relative to the M-CHAT-R/F (see Figure 2). With the ASQ:SE, agreement was moderate in the full sample, and substantial in children up to 30 months. Agreement was highest with the BITSEA scales. Both the BITSEA Autism score and Competence scale had substantial agreement with the M-CHAT-R/F in the full sample, but the Competence scale had higher agreement in the subsample of younger children. It is still unclear whether the Autism score adds substantial value to the Competence scale, which is important for providers to know given that calculation of the Autism score takes additional time during the screening process. These findings again highlight that measures may perform differently in different samples, whether due to age, gender (all of the females in this sample were in the younger group), or the presence of DD. It is important to be aware of this when thinking about the generalizability of current research on these measures.

These findings provide some initial support for the use of the ASQ:SE and BITSEA as ASD screening tools and suggest that the BITSEA may be a strong broadband socioemotional screening tool for identifying ASD in EI. Clinically, EI providers may find the BITSEA to be particularly useful because it categorizes whether the child is at-risk due to behavior problems, competence, or both. However, while these findings are promising, agreement with another screening tool does not provide sufficient information about the screening accuracy of a measure. Further research should examine the sensitivity and specificity of measures by comparing them with full diagnostic evaluations of ASD. Furthermore, the BITSEA Autism score is continuing to undergo further research and revisions. The authors of the BITSEA recently examined preliminary cut-off scores for maximizing sensitivity and specificity with pre-existing ASD diagnoses (Kiss, Feldman, Sheldrick, & Carter, 2017). They demonstrated evidence of good preliminary sensitivity and adequate specificity, but found that lower specificity values related to language abilities, highlighting the need for continued research and possible adjustments to cut-off scores specifically for children with DD.
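The distinction between agreement and accuracy can be made concrete with a short calculation. The Python sketch below shows how sensitivity and specificity would be computed once each child’s screening result is compared with a diagnostic evaluation; the function name and the counts are hypothetical and do not come from this study, which did not include diagnostic outcomes.

```python
def screening_accuracy(true_pos: int, false_pos: int,
                       false_neg: int, true_neg: int) -> dict:
    """Sensitivity and specificity of a screener against a diagnostic outcome."""
    return {
        # Proportion of children with the diagnosis who screened positive.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Proportion of children without the diagnosis who screened negative.
        "specificity": true_neg / (true_neg + false_pos),
    }


# Hypothetical counts: 20 children with ASD and 30 without.
print(screening_accuracy(true_pos=18, false_pos=6, false_neg=2, true_neg=24))
```

Two screeners can agree closely with each other and still share the same false positives, so high κ with the M-CHAT-R/F does not by itself establish either quantity.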

Limitations

This study had several limitations. First, there were some demographic limitations to our study. Our sample was small and most of the children enrolled in our study came from our university’s center-based EI program. Most of the children were White, and a higher proportion had language delays than would be expected. As most of our sample was White and we did not collect information on caregiver education for all children, we were unable to explore how these factors may have influenced caregiver ratings. In addition, the population of interest was children receiving EI; this resulted in a heterogeneous DD population. Further research is needed in larger, more diverse samples. We were unable to characterize current developmental functioning of the sample. Although developmental testing scores were available for almost all children, for some children, they were provided from the time of entry into services and may have been more than a year old. Ideally, we would have been able to characterize the developmental functioning of the sample at the time of measure completion. We also are not able to conclude which measure is best based on our findings. To determine which measures are best, a head-to-head comparison of the psychometric properties of each measure is still needed. We included the M-CHAT-R/F as a gold-standard ASD screening tool. However, there is limited information on the performance of the M-CHAT-R/F in DD samples. Therefore, high agreement with the M-CHAT-R/F does not necessarily reflect high accuracy of ASD screening. Further research is needed in this area. Finally, a newer version of the ASQ:SE (ASQ:SE-2; Squires et al., 2015) was published while we were conducting this study. We were unfortunately unable to examine the agreement with the ASQ:SE-2, which is a significant limitation of our study. 
Furthermore, the ASQ:SE-2 includes additional screening items for ASD; therefore, the second version is potentially more useful as an ASD screener than the first version. Future research should examine this possibility.

Future Directions and Clinical Implications

These findings highlight the importance of evaluating measures in different populations. The three socioemotional tools evaluated in this study were standardized based on young children with typical development. These measures might perform differently for children with DD. Results from this study suggest that agreement values may be lower in a sample of children with DD than in a sample of children who are developing typically. Research on the psychometric properties of these measures in samples of children with DD is still needed and can help to elucidate the lack of agreement among the ASQ:SE, BITSEA, and CBCL. Furthermore, additional research on how these measures perform at the younger end of their age range (i.e., 12-18 months) would be beneficial. The question of whether the CBCL is under-identifying children or whether the ASQ:SE and BITSEA are over-identifying children with delays remains. Comparison with more comprehensive socioemotional assessment (e.g., with the Preschool Age Psychiatric Assessment, a structured interview for 2- to 5-year-old children) can help determine whether the CBCL is sensitive enough, and whether the ASQ:SE and BITSEA are specific enough, in a population of children with DD (Egger & Angold, 2004). Specificity of these measures is particularly important to examine. It is likely that specificity is lower in a high risk sample, such as children in EI. However, specificity still needs to be adequate for use within a given population, or children will be inappropriately over-identified. These findings also highlight that some of the broadband socioemotional screening tools may be able to adequately identify children with ASD. As stated previously, future research should examine sensitivity and specificity with comprehensive ASD evaluations. We could also benefit from more information on how well the M-CHAT-R/F identifies ASD in a sample of children with DD. 
It may be particularly helpful to conduct a head-to-head comparison of the ASQ:SE, BITSEA, and M-CHAT-R/F to see which measure is best at identifying ASD for children with DD. Identifying measures that have adequate sensitivity and specificity in a population of children with DD, while also being brief and easy to administer, is the key for utility in clinical practice. Eligibility and testing requirements for DD and ASD vary widely across states. In many cases, it is difficult for children to qualify for EI services based on social-emotional delays alone. Furthermore, children with suspected ASD may go through different assessment processes (which may or may not include socioemotional screening) depending on local procedures and requirements. Therefore, a brief, accurate assessment tool that could capture socioemotional delays, emotion and behavior problems, and ASD risk, and that would be useful within a variety of different types of evaluation and screening systems, would be especially useful for ensuring that these concerns are being adequately addressed and children referred for appropriate follow-up evaluation and services.

Conclusion

In this sample of children with DD, a large percentage (~40% to 60%) was identified as at-risk for problems with socioemotional functioning, which is consistent with prior reports in similar samples (Baker et al., 2002; Briggs-Gowan & Carter, 2007). Results of this study reveal that some of the most commonly used tools, the ASQ:SE, BITSEA, and CBCL, do not have adequate agreement for toddlers with DD. The CBCL had particularly poor agreement with the other two tools. As a problem behavior checklist, it does not appear to identify the same children as “at-risk,” compared with measures that assess for a broader range of problems with socioemotional functioning. This is not wholly surprising, given that the CBCL, unlike the BITSEA and ASQ:SE, is not a Level 1 screening tool. However, providers should be aware that these tools are not measuring the same constructs and elevations on these tools (or lack thereof) are not necessarily comparable. Further research is needed to determine which measure(s) are best at screening for socioemotional problems and ASD in an EI population and should be recommended for use. It would be ideal if one measure could do both well. Although the CBCL has been proposed to have utility as an ASD screening tool (e.g., Rescorla et al., 2019), our findings suggest that the CBCL may not be useful as an ASD screening tool for young children in EI. The ASQ:SE has potential as an ASD screener, but further research on the ASQ:SE-2 is needed. The BITSEA appears to be a particularly promising measure for ASD screening, which means that it may be able to perform “double duty” and reduce burden on families and caregivers. Our findings are preliminary and warrant replication in a larger sample prior to the establishment of clinical guidelines. Both the ASQ:SE-2 and the BITSEA should be further evaluated for use as potential universal broadband socioemotional screening tools for children with DD.

Footnotes

Acknowledgements

This project was a master’s thesis, and we would like to thank the committee for their integral feedback: Drs. Theodore Beauchaine, Luc Lecavalier, and Barbara Andersen. We thank all participating families and the ECE staff, without whom this work would not have been possible.

Authors’ Note

Some information contained in this article was presented during poster sessions at the International Meeting for Autism Research in Baltimore, MD (May 2016) and at the American Psychological Association Annual Convention in Denver, CO (August 2016).

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was funded by the Ohio State University Center for Clinical and Translational Science Grant support (National Center for Advancing Translational Sciences, Grant Number UL1TR001070) and an Award from the Nisonger Center Research Fund. It was also supported by NIH/NCRR Colorado CTSI Grant Number UL1 RR025780. Its contents are the authors’ sole responsibility and do not necessarily represent official NIH views.

ORCID iDs

Dana Kamara

Andrea N. Witwer

Notes

References

Achenbach, T. M. (1992). Manual for the Child Behavior Checklist/2-3 and 1992 profile. Burlington: Department of Psychiatry, University of Vermont.

Achenbach, T. M., & Rescorla, L. A. (2000). Manual for the ASEBA preschool forms and profiles. Burlington: Research Center for Children, Youth, & Families, University of Vermont.

Alkherainej, & Squires (2016). Accuracy of three screening instruments in identifying preschool children at risk for autism spectrum disorder. Journal of Intellectual Disability—Diagnosis and Treatment, 3, 156-163. doi:10.6000/2292-2598.2015.03.04.1

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed.; text rev.). Washington, DC: Author.

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Arlington, VA: American Psychiatric Publishing.

Baio, Wiggins, Christensen, D. L., Maenner, M. J., Daniels, Warren, . . . Dowling, N. F. (2018). Prevalence of autism spectrum disorder among children aged 8 years—Autism and developmental disabilities monitoring network, 11 sites, United States, 2014. Morbidity and Mortality Weekly Report (MMWR) Surveillance Summaries, 67, 1-23. doi:10.15585/mmwr.ss6706a1

Baker, B. L., Blacher, Crnic, K. A., & Edelbrock (2002). Behavior problems and parenting stress in families of three-year-old children with and without developmental delays. American Journal on Mental Retardation, 107, 433-444. doi:10.1352/0895-8017

Barton, M. L., Dumont-Mathieu, & Fein (2012). Screening young children for autism spectrum disorders in primary practice. Journal of Autism and Developmental Disorders, 42, 1165-1174. doi:10.1007/s10803-011-1343-5

Bayley (2006). Bayley Scales of Infant and Toddler Development (3rd ed.). San Antonio, TX: Harcourt Assessment.

Bosl, W. J., Tager-Flusberg, & Nelson, C. A. (2018). EEG analytics for early detection of autism spectrum disorder: A data-driven approach. Scientific Reports, 8, Article 6828. doi:10.1038/s41598-018-24318-x

Briggs, R. D., Stettler, E. M., Silver, E. J., Schrag, R. D. A., Nayak, Chinitz, & Racine, A. D. (2012). Social-emotional screening for infants and toddlers in primary care. Pediatrics, 129, e377-e384. doi:10.1542/peds.2015-2927

Briggs-Gowan, M. J., & Carter, A. S. (2006). BITSEA: Brief Infant-Toddler Social and Emotional Assessment. Examiner’s manual. San Antonio, TX: Harcourt Assessment.

Briggs-Gowan, M. J., & Carter, A. S. (2007). Applying the infant-toddler social and emotional assessment (ITSEA) and Brief-ITSEA in early intervention. Infant Mental Health Journal, 28, 564-583. doi:10.1002/imhj.20154

14.

Briggs-Gowan

M. J.

Carter

A. S.

Irwin

J. R.

Wachtel

Cicchetti

D. V.

(2004). The Brief Infant-Toddler Social and Emotional Assessment: Screening for social-emotional problems and delays in competence. Journal of Pediatric Psychology, 29, 143-155. doi:10.1093/jpepsy/jsh017

15.

Bush

S. S.

(2010). Determining whether or when to adopt new versions of psychological and neuropsychological tests: Ethical and professional considerations. The Clinical Neuropsychologist, 24, 7-16. doi:10.1080/13854040903313589

16.

Bush

S. S.

Sweet

J. J.

Bianchini

K. J.

Johnson-Greene

Dean

P. M.

Schoenberg

M. R.

(2018). Deciding to adopt revised and new psychological and neuropsychological tests: An interorganizational position paper. The Clinical Neuropsychologist, 32, 319-325. doi:10.1080/13854046.2017.1422277

17.

Carter

A. S.

(2010). The field of toddler/preschool mental health has arrived—On a global scale. Journal of the American Academy of Child and Adolescent Psychiatry, 49, 1181-1182. doi:10.1016/j.jaac.2010.09.006

18.

Carter

A. S.

Briggs-Gowan

M. J.

Davis

N. O.

(2004). Assessment of young children’s social-emotional development and psychopathology: Recent advances and recommendations for practice. Journal of Child Psychology and Psychiatry, 45, 109-134. doi:10.1046/j.0021-9630.2003.00316.x

19.

Christensen

D. L.

Bilder

D. A.

Zahorodny

Pettygrove

Durkin

M. S.

Fitzgerald

R. T.

. . . Yeargin-Allsopp

(2016). Prevalence and characteristics of autism spectrum disorder among 4-year-old children in the autism and developmental disabilities monitoring network. Journal of Developmental & Behavioral Pediatrics, 37, 1-8. doi:10.1097/dbp.0000000000000235

20.

Cooper

J. L.

Vick

(2009). Promoting social-emotional wellbeing in early intervention services: A fifty-state view. New York, NY: National Center for Children in Poverty.

21.

Daniels

A. M.

Mandell

D. S.

(2014). Explaining differences in age at autism spectrum disorder diagnosis: A critical review. Autism, 18, 583-597. doi:10.1177/1362361313480277

22.

de Wolff

M. S.

Theunissen

M. H. C.

Vogeis

A. G. C.

Reijneveld

S. A

. (2013). Three questionnaires to detect psychosocial problems in toddlers: A comparison of the BITSEA, ASQ:SE, and KIPPPI. Academic Pediatrics, 13, 587-593. doi:10.1016/j.acap.2013.07.007

23.

Egger

H. L.

Angold

(2004). The preschool age psychiatric assessment (PAPA): A structured parent interview for diagnosing psychiatric disorders in preschool children. In DelCarmen-Wiggins

Carter

(Eds.), Handbook of infant, toddler, and preschool mental assessment (pp. 223-243). New York, NY: Oxford University Press.

24.

Filipek

P. A.

Accardo

P. J.

Baranek

G. T.

Cook

E. H.

Jr. Dawson

Gordon

Volkmar

F. R.

(1999). The screening and diagnosis of autistic spectrum disorders. Journal of Autism and Developmental Disorders, 29, 439-484.

25.

Glascoe

F. P.

(1997). Parents’ evaluations of developmental status: A method for detecting and addressing developmental and behavioral problems in children. Nashville, TN: Ellsworth and Vandermeer Press.

26.

Hampton

Strand

P. S.

(2015). A review of level 2 parent-report instruments used to screen children aged 1.5-5 for autism: A meta-analytic update. Journal of Autism and Developmental Disorders, 45, 2519-2530. doi:10.1007/s10803-015-2419-4

27.

Hardy

Haisley

Manning

Fein

(2015). Can screening with the ages and stages questionnaire detect autism? Journal of Developmental and Behavioral Pediatrics, 36, 536-543. doi:10.1097/DBP.0000000000000201

28.

Harris

P. A.

Taylor

Thielke

Payne

Gonzalez

Conde

J. G.

(2009). Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics, 42, 377-381. doi:10.1016/j.jbi.2008.08.010

29.

Hartas

(2011). Children’s language and behavioural, social and emotional difficulties and prosocial behaviour during the toddler years and at school entry. British Journal of Special Education, 38, 83-91. doi:10.1111/j.1467-8578.2011.00507.x

30.

Hayes

A. F.

Krippendorff

(2007). Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1, 77-89. doi:10.1080/19312450701277427

31.

Hebbeler

Spiker

Bailey

Scarborough

Mallik

Simeonsson

Nelson

(2007). Early intervention for infants and toddlers with disabilities and their families: Participants, services, and outcomes. Menlo Park, CA: SRI International.

32.

Hoffman

Weber

König

Becker

Kamp-Becker

(2016). The role of the CBCL in the assessment of autism spectrum disorders: An evaluation of symptom profiles and screening characteristics. Research in Autism Spectrum Disorders, 27, 44-53. doi:10.1016/j.rasd.2016.04.002

33.

Ivanova

M. Y.

Achenbach

T. M.

Rescorla

L. A.

Harder

V. S.

Ang

R. P.

Bilenberg

. . . Jeng

(2010). Preschool psychopathology reported by parents in 23 societies: Testing the seven-syndrome model of the Child Behavior Checklist for ages 1.5-5. Journal of the American Academy of Child & Adolescent Psychiatry, 49, 1215-1224. doi:10.1016/j.jaac.2010.08.019

34.

Jones

D. E.

Greenberg

Crowley

(2015). Early social-emotional functioning and public health: The relationship between kindergarten social competence and future wellness. American Journal of Public Health, 105, 2283-2290. doi:10.2105/AJPH.2015.302630

35.

Kim

S. H.

Joseph

R. M.

Frazier

J. A.

O’Shea

T. M.

Chawarska

Allred

E. N.

Kuban

K. K.

(2016). Predictive validity of the modified checklist for autism in toddlers (M-CHAT) born very preterm. The Journal of Pediatrics, 178, 101-107. doi:10.1016/j.jpeds.2016.07.052

36.

King

Tandon

Macias

Healy

Duncan

Swigonski

. . . Lipkin

P. H.

(2010). Implementing developmental screening and referrals: Lessons learned from a national project. Pediatrics, 125, 350-356. doi:10.1542/peds.2009-0388

37.

Kiss

I. G.

Feldman

M. S.

Sheldrick

R. C.

Carter

A. S.

(2017). Developing autism screening criteria for the brief infant toddler social emotional assessment (BITSEA). Journal of Autism and Developmental Disorders, 47, 1269-1277. doi:10.1007/s10803-017-3044-1

38.

Kogan

M. D.

Blumberg

S. J.

Schieve

L. A.

Boyle

C. A.

Perrin

J. M.

Ghandour

R. M.

. . . van Dyck

P. C.

(2009). Prevalence of parent-reported diagnosis of autism spectrum disorder among children in the US, 2007. Pediatrics, 124, 1395-1403. doi:10.1016/s0084-3954(10)79730-x

39.

Krippendorff

(2004). Content analysis: An introduction to its methodology (2nd ed.). Thousand Oaks, CA: Sage.

40.

Kruizinga

Visser

J. C.

van Batenburg-Eddes

Carter

A. S.

Jansen

Raat

(2014). Screening for autism spectrum disorders with the Brief Infant-Toddler Social and Emotional Assessment. Public Library of Science One, 9, e97630. doi:10.1371/journal.pone.0097630

41.

Landis

J. R.

Koch

G. G.

(1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174. doi:10.2307/2529310

42.

Mullen

E. M.

(1995). Mullen scales of early learning. Circle Pines, MN: AGS.

43.

Newborg

(2005). Battelle developmental inventory (2nd ed.). Itasca, IL: Riverside.

44.

Pizur-Barnekow

Muusz

McKenna

O’Connor

Cutler

(2013). Service coordinators’ perceptions of autism-specific screening and referral practices in early intervention. Topics in Early Childhood Special Education, 33, 153-161. doi:10.1177/0271121412463086

45.

Rescorla

L. A.

(2005). Assessment of young children using the Achenbach system of empirically based assessment (ASEBA). Mental Retardation and Developmental Disabilities Research Reviews, 11, 226-237. doi:10.1002/mrdd.20071

46.

Rescorla

L. A.

Winder-Patel

B. M.

Paterson

S. J.

Pandey

Wolff

J. J.

Schultz

R. T.

Piven

(2019). Autism spectrum disorder screening with the CBCL/1½-5: Findings for young children at high risk for autism spectrum disorder. Autism, 23, 29-38. doi:10.1177/1362361317718482

47.

Rishel

C. W.

Greeno

Marcus

S. C.

Shear

M. K.

Anderson

(2005). Use of the child behavior checklist as a diagnostic screening tool in community mental health. Research on Social Work Practice, 15, 195-203. doi:10.1177/1049731504270382

48.

Robins

D. L.

Casagrande

Barton

Chen

C. M. A.

Dumont-Mathieu

Fein

(2014). Validation of the modified checklist for autism in toddlers, revised with follow-up (M-CHAT-R/F). Pediatrics, 133, 37-45. doi:10.1542/peds.2013-1813

49.

Rose

Lehrl

Ebert

Weinert

(2018). Long-term relations between children’s language, the home literacy environment, and socioemotional development from ages 3 to 8. Early Education and Development, 29, 342-356. doi:10.1080/10409289.2017.1409096

50.

Shaw

Hatton

(2009). Screening and early identification of autism spectrum disorders. Updated. Queries: An occasional paper compiling states’ approaches to current topics. Chapel Hill, NC: National Early Childhood Technical Assistance Center. Retrieved from http://ectacenter.org/~pdfs/pubs/queries/queries_asdscreening.pdf

51.

Sim

Wright

C. C.

(2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy, 85, 257-268. doi:10.1093/ptj/85.3.257

52.

Squires

Bricker

(2009). Ages and Stages Questionnaires (ASQ-3): A parent-completed child monitoring system (3rd ed.). Baltimore, MD: Paul H. Brookes.

53.

Squires

Bricker

Heo

Twombly

(2001). Identification of social-emotional problems in young children using a parent-completed screening measure. Early Childhood Research Quarterly, 16, 405-419. doi:10.1016/S0885-2006(01)00115-6

54.

Squires

Bricker

Twombly

(2002). The ASQ:SE user’s guide. Baltimore, MD: Paul H. Brookes.

55.

Squires

Bricker

Twombly

(2015). The ASQ:SE-2 user’s guide. Baltimore, MD: Paul H. Brookes.

56.

Stahmer

A. C.

Mandell

D. S.

(2007). State infant/toddler program policies for eligibility and services provision for young children with autism. Administration and Policy in Mental Health and Mental Health Services Research, 34, 29-37. doi:10.1007/s10488-006-0060-4

57.

Webb

S. J.

Jones

E. J. H.

Kelly

Dawson

(2014). The motivation for very early intervention for infants at high risk for autism spectrum disorders. International Journal of Speech-language Pathology, 16, 36-42. doi:10.3109/17549507.2013.861018

58.

Weitlauf

A. S.

Vehorn

A. C.

Stone

W. L.

Fein

Warren

Z. E.

(2015). Using the M-CHAT-R/F to identify developmental concerns in a high-risk 18-month-old sibling sample. Journal of Developmental Behavioral Pediatrics, 36, 497-502. doi:10.1097/DBP.0000000000000194

59.

Weitzman

Wegner

(2015). Promoting optimal development: Screening for behavioral and emotional problems. Pediatrics, 135, 384-395. doi:10.1542/peds.2014-3716

60.

Yovanoff

Squires

(2006). Determining cutoff scores on a developmental screening measure: Use of receiver operation characteristics and item response theory. Journal of Early Intervention, 29, 4-62. doi:10.1177/105381510602900104

61.

Zeanah

P. D.

Bailey

L. O.

Berry

(2009). Infant mental health and the “real world”—Opportunities for interface and impact. Child and Adolescent Psychiatric Clinics of North America, 18, 773-778. doi:10.1016/j.chc.2009.03.006