Abstract
Although symptoms of autism are present early in life and early diagnosis can lead to better outcomes, there is a dearth of validated caregiver-report interviews designed for children under the age of 3 years. We developed the Toddler Autism Symptom Inventory, a semi-structured interview designed to assess the presence and absence of skills and symptoms in children aged 12–36 months. Reliability and validity of items and a cutoff score for likelihood of autism spectrum disorder were established. Specificity and sensitivity of this cutoff were confirmed with a cross-validation sample. The Toddler Autism Symptom Inventory effectively identified most children with autism without excessive false positives. The Toddler Autism Symptom Inventory is a developmentally appropriate caregiver interview for use in diagnostic evaluations of children under age 3 years that offers clearly operationalized diagnostic criteria and a cutoff for autism likelihood for very young children.
Lay abstract
Determining whether a young child has an autism spectrum disorder requires direct observation of the child and caregiver report of the child’s everyday behaviors. There are few interviews for parents that are specifically designed for children under 3 years of age. The Toddler Autism Symptom Inventory is a new interview that asks caregivers of children age 12–36 months about symptoms of possible autism spectrum disorder. The Toddler Autism Symptom Inventory uses a cutoff score to indicate likelihood for autism spectrum disorder; this cutoff score appears to accurately identify most children who are diagnosed with autism spectrum disorder without identifying too many who do not have autism spectrum disorder. The Toddler Autism Symptom Inventory interview can help clinicians to determine whether a young child shows symptoms suggestive of an autism spectrum disorder.
Symptoms of autism spectrum disorder (ASD) usually emerge within the first 3 years of life (Ozonoff & Iosif, 2019). Whereas typically developing (TD) children point, show objects to others, look toward others, and respond to their name, these behaviors are often reduced or absent as early as the first birthday in children later diagnosed with ASD (Osterling & Dawson, 1994; Ozonoff et al., 2010). In addition, young children later diagnosed with ASD show early difficulty in disengaging visual attention, reduced or decreasing eye contact, reduced expression of positive affect and anticipation during a social interaction, atypical play behaviors, repetitive movements, and motor delays (Jones & Klin, 2013; Leonard et al., 2014; Ozonoff et al., 2008; Sacrey et al., 2018; Zwaigenbaum et al., 2005). Although repetitive behaviors and a preference for routine are not uncommon in TD children (Evans et al., 1997; Harrop et al., 2014), by 12–24 months of age, the frequency of stereotypical and restricted and repetitive behavior (RRB) is higher in high-risk children diagnosed with ASD than in high- and low-risk children not diagnosed with ASD (Wolff et al., 2014). The divergence of both social-communicative and RRB symptoms starting by age 12 months suggests that this may be an appropriate time to begin detection of elevated ASD likelihood in the general population.
In spite of some providers’ hesitation about diagnosing children with an ASD in the first 2 years of life, diagnoses assigned under the age of 2 years have been shown to be stable and reliable (Chawarska et al., 2007; Kleinman, Ventola, et al., 2008), with one recent study suggesting that diagnoses are stable for most children as young as 14 months of age (Pierce et al., 2019). Children with less clear or milder manifestations of ASD, however, may not be able to be detected until later in life. The finding of diagnostic stability holds even in children with cognitive, social, and language abilities under the age equivalent of 1 year (Hinnebusch et al., 2017).
Although ASD symptoms are present and recognizable early in life, the median age of diagnosis in the United States in 2014 was still around 51 months (Maenner et al., 2020). Use of ASD-specific screeners in young children has been shown to aid in early identification and to reduce disparities in age of diagnosis, provided that the child is referred for a developmental evaluation after a positive screening result (Chlebowski et al., 2013; Herlihy et al., 2014; Robins et al., 2014; Sánchez-García et al., 2019; Wetherby et al., 2008).
Early diagnoses are essential for optimizing outcome; children who access intervention services early have better outcomes than those who access services later (MacDonald et al., 2014; Rogers et al., 2014). Thus, reliable and valid screening and evaluation tools for assessing ASD symptoms in young children are important for achieving the best possible outcomes.
Diagnosis of ASD in toddlers
While screening tools are invaluable for identifying children at elevated likelihood for ASD, they do not provide a formal diagnosis of ASD, which is often required for children to receive access to the specialized intervention approaches that have been associated with the most favorable outcomes. Full diagnostic evaluations, which are more time- and cost-intensive than screeners, allow a trained clinician to gather more information about a child’s functioning, and ideally include both direct observation and caregiver report (Kim & Lord, 2012a; Sacrey et al., 2018). Direct observation allows a clinician to assess a child’s social interactions with familiar and unfamiliar adults, which is crucial. Most commonly used is the Autism Diagnostic Observation Schedule–Second Edition (ADOS-2; Lord et al., 2012), an empirically supported observational measure that can be used to determine presence and severity of ASD. Several factors indicate the need for caregiver report in addition to direct observation: the child’s behavior may not be typical for him or her in an unfamiliar setting and caregiver report provides information about the child’s everyday functioning. In addition, caregiver report includes behaviors that may not be observable in an unfamiliar office or during a time-limited observation, including interaction with peers, response to displays of emotion by others, and repetitive play with a favorite toy.
Caregiver report, which is generally quite accurate (Miller et al., 2017; Sacrey et al., 2018), can be gathered using semi-structured interviews. Caregivers often are able to report a child’s daily functioning with high accuracy (Miller et al., 2017), and when differences between parents and clinicians are found, parent report shows greater ability to differentiate children with ASD from those without ASD (Sacrey et al., 2018). The most widely used and evidence-supported caregiver report interview is the Autism Diagnostic Interview–Revised (ADI-R; Rutter et al., 2003), a standardized interview used in research and clinical diagnostic evaluations when ASD is suspected. Toddler algorithms have been developed, extending use of the ADI-R down to 12 months chronological or 10 months developmental age, with sensitivity ranging from 67% to 100% and specificity between 64% and 94% for a Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM-IV) diagnosis (Kim et al., 2013). The ADI-R toddler algorithms are based on verbal ability and age, and the lowest-functioning published sample includes children under 20 months of age and non-verbal children age 21–47 months (Kim & Lord, 2012b). However, this combined sample is heavily weighted toward older non-verbal children (n = 318, age 21–47 months with ASD) rather than younger children (n = 43, under 20 months of age with ASD). Thus, the reported validity of the interview may not accurately describe how it functions with children under 20 months of age. In addition, the ADI-R does not align with the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association, 2013) and is not designed to specifically ask about ASD symptoms as they manifest in toddlers; most questions on the ADI-R apply to individuals of all ages, and behaviors that may characterize toddlers with ASD in particular are not addressed. 
For example, the ADI-R does not specifically ask caregivers about some of the pre-verbal social communication behaviors that are important for early detection of ASD, including integrating eye contact with gestures, noticing another’s positive emotions and joining in them, pointing to show something the child has done or is doing (as opposed to pointing to express interest in something at a distance), or engaging in back-and-forth babbling. While other semi-structured caregiver interviews are available, including the Diagnostic Interview for Social and Communication Disorders (DISCO-11) and the Developmental, Dimensional, and Diagnostic Interview (3di), to our knowledge there are no published semi-structured interviews specifically designed and validated for caregivers of toddlers suspected of having ASD (Randall et al., 2018). A relatively brief interview that incorporates how ASD symptoms may manifest in toddlers, has a validated cutoff score indicating elevated likelihood of ASD, and is aligned with DSM-5 symptoms will aid in streamlining diagnostic evaluations for very young children.
Challenges associated with early diagnosis
Several important challenges are present when young children are referred for diagnostic evaluations. First, caregivers of children later diagnosed with ASD most often report language concerns, which are also common caregiver concerns in children with global delay and language disorders (Coonrod & Stone, 2004; Richards et al., 2016). While many children with other developmental disorders (DDs) or even typical development may demonstrate some behaviors seen in children who are subsequently diagnosed with ASD, determining whether the number, frequency, and severity of ASD symptoms are sufficient for an ASD diagnosis can be difficult.
Second, the DSM-5 ASD diagnosis describes criteria for individuals of all ages; there are no specific criteria based on the individual’s age or developmental level. These criteria can be challenging to apply to toddlers (Barton et al., 2013; Matson et al., 2012). Operationalizing these criteria into specific, observable toddler-level behaviors is essential for accurate identification of ASD in early childhood.
Third, toddlers with delayed cognitive skills and other circumstances such as inconsistent caregiving or a lack of consistent exposure to peers may be particularly difficult to diagnose. For example, children who do not yet have the requisite cognitive abilities, such as imitation and representation (Nielsen & Dissanayake, 2004), are unlikely to demonstrate pretend play because of developmental delays. If a young child spends much of their time at home or with only adults, it will be difficult to judge their difficulty forming and maintaining age-appropriate peer relationships.
A caregiver interview addressing the toddler-specific behaviors that are characteristic of ASD, using DSM-5 criteria, may help clinicians identify the specific skills and symptoms that are needed for diagnosis. An operationalized measure will also allow for more careful examination of behaviors common in ASD that are also observed in children with other DDs or limited experiences.
The aim of this study was to develop a caregiver interview for use as part of the ASD diagnostic evaluation in toddlers age 12–36 months. Reliability and validity demonstrating the measure’s ability to identify children with ASD are reported.
Methods
Participants
Participants were recruited by pediatricians involved in large cross-site studies of the early detection of ASD. Caregivers completed one or more age-appropriate screening tools: the Modified Checklist for Autism in Toddlers–Revised with Follow-up (M-CHAT R/F) at age 15 or 18 months (Robins et al., 2009, 2014), First Year Inventory–Lite (FYI-L) at 12 and 15 months (Baranek et al., 2014), or Infant-Toddler Checklist (ITC) at 12 months (Wetherby & Prizant, 2002). All children screened before 18 months were rescreened with the M-CHAT R/F at 24 or 36 months. Children (aged 12–36 months) identified by a positive screen or by ASD-specific concerns raised by a participating pediatrician were referred for a cost-free developmental/diagnostic evaluation. Exclusionary criteria for children included a previous diagnosis of an ASD and significant sensory or motor impairments that would preclude diagnostic testing (e.g. blindness, deafness, and severe cerebral palsy).
Caregivers of children who attended an evaluation (n = 336) were administered an extensive, semi-structured pilot interview during the evaluation. Forty-two interviews (19 children with ASD, 23 non-ASD; mean age 27.25 months, standard deviation (SD) = 4.14) were excluded for having more than four missing responses. Data were gathered over 5 years. Data from the first 204 children were used to develop the current version of the Toddler Autism Symptom Inventory (TASI) and the total score cutoff indicating likelihood of ASD (Sample 1). Data from the next 90 children were used to cross-validate this cutoff (Sample 2). Demographic and other characteristics of each sample are reported in Table 1.
Table 1. Comparison of participant samples on demographic characteristics and diagnoses.
SD: standard deviation; CBE: clinical best estimate; ASD: autism spectrum disorder; DD: developmental disorder; TD/ND: typically developing or no-diagnosis; SES: socio-economic status.
TASI item and cutoff score development
The TASI is a semi-structured interview developed to be used by clinicians during diagnostic evaluations of very young children. The TASI was produced in a four-step process. First, a team of expert clinicians, including psychologists (M.L.B., D.L.R., W.L.S., and D.A.F.) and a developmental-behavioral pediatrician (T.D.-M.), collaborated to identify behaviors seen in toddlers age 12–36 months with any form of ASD. Observable behaviors relating to each Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM-IV-TR) symptom were first generated. Behaviors commonly associated with ASD that are not apparent in very young children, such as changing behavior to suit specific social contexts, were not included. The pilot interview comprised 63 such items.
Next, data were collected on this 63-item pilot interview during diagnostic evaluations at three sites. Caregiver responses were classified as indicating symptom presence or absence.
Third, and occurring concurrently with data collection, our team evaluated TASI items for fit with the DSM-5 diagnostic criteria. Items that did not align with these diagnostic criteria, or that were found to be frequently misunderstood by caregivers, were removed. One example of such an item is how much the child seems to enjoy praise and attention, which was often misunderstood.
Fourth, mapping each TASI item onto a DSM-5 symptom was a challenge, as has previously been found with other measures (Barton et al., 2013). Extensive discussion led to consensus about assigning each item to the most relevant DSM-5 symptom. In general, our mapping of behaviors onto DSM-5 symptoms largely aligns with other work with similar measures (Evers et al., 2020). For example, while the authors of the 3di classified responses to a parent’s facial expression as a behavior aligned with the social-emotional reciprocity symptom of the DSM-5, as the TASI does, other authors have argued that this item better fits within the DSM-5 non-verbal communication symptom (Evers et al., 2020). Examination of the TASI’s mapping of symptoms compared to other measures revealed only one difference compared to the 3di (smile to familiar adult; 3di classified this in non-verbal communication while the TASI classifies it in social-emotional reciprocity), one in the ADOS-2 (response to joint attention; in social-emotional reciprocity symptom on ADOS-2 (Evers et al., 2020), while in non-verbal communication symptom of TASI), and none compared to the DISCO-11.
Finally, the percent of caregivers endorsing each TASI item was examined. One item (child uses no gestures) was removed from the total score calculation because it was almost never endorsed, and one item (how child responds to a hurt or sad adult) was removed due to its frequent endorsement in children with DD and TD.
Each TASI item was aligned with one ASD symptom in the DSM-5. Within the social-emotional reciprocity symptom, for example, the TASI includes questions about a child’s responses to others’ emotions, response to their name, initiation of joint attention for the purpose of showing items of interest or things the child has done, and engaging in back-and-forth babbling. Behaviors in the non-verbal communication symptom include making eye contact when requesting or during physical or social games, use of gestures, and whether the child follows an adult’s point. The relationship symptom includes questions regarding a child’s interest in other children, how the child usually plays when other children are present or when approached by another child, and whether the child engages in spontaneous imitation or pretend play. The RRB symptom of stereotyped and repetitive movements or speech includes questions about any atypical play behaviors such as lining up toys, immediate and delayed echolalia, and repetitive hand, finger, and body movements. The insistence on sameness section includes questions about how a child responds to a change in routine or in the environment, and whether the child attempts to impose their routines on others. The restricted, fixated interests symptom includes two questions about whether the child has any intense or atypical interests. Finally, the sensory table includes 29 examples of sensory symptoms often observed in toddlers with ASD and sorts them into three categories: sensory-seeking (e.g. spins car wheels while watching), hyper-sensitivity (e.g. startles easily, and covers ears), and hypo-sensitivity behaviors (e.g. does not react to painful stimuli) across sensory modalities of visual, tactile, auditory, and smell and taste. Each symptom (e.g. sensory-seeking) only requires one behavior to be scored as present.
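The rule that a single endorsed behavior is enough to score a sensory category as present can be sketched as follows. This is a minimal illustration, not the published scoring tool: the function name, the dictionary layout, and the choice of behaviors (borrowed from the examples above) are assumptions made here.

```python
def score_sensory_table(endorsed):
    """Mark each sensory category present (1) if any behavior in it is endorsed.

    `endorsed` maps a category name to the list of behaviors the caregiver
    endorsed. This layout is a hypothetical stand-in for the TASI sensory table.
    """
    return {category: int(len(behaviors) > 0)
            for category, behaviors in endorsed.items()}


# Hypothetical caregiver responses, using example behaviors from the text.
responses = {
    "sensory-seeking": ["spins car wheels while watching"],
    "hyper-sensitivity": ["startles easily", "covers ears"],
    "hypo-sensitivity": [],
}
sensory_scores = score_sensory_table(responses)
print(sensory_scores)
```

How these category scores enter the total score is set out in the scoring manual and is not modeled in this sketch.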
The final TASI is composed of 37 interview items and a table for eliciting information about sensory symptoms. The TASI interview form, administration and scoring manual, and scoring tool are accessible online at https://mchatscreen.com/tasi/ .
The TASI also includes specific questions addressing regression of skills, which are not included in the TASI total score. On average, the TASI interview takes about 40 min to administer and score.
Diagnostic procedures
Diagnostic evaluations were conducted by a clinical team composed of a licensed psychologist or developmental-behavioral pediatrician, all of whom were autism specialists, and a graduate student or other trainee. Final diagnosis was a DSM-5 clinical best estimate (CBE) assigned by the clinical team; CBE has long been regarded as the gold standard in diagnosing ASD (Klin et al., 2000; Ventola et al., 2006). Diagnosis was based on data from the Mullen Scales of Early Learning (MSEL), the ADOS-2, the Vineland Adaptive Behavior Scales–2 (VABS-2), the Child Behavior Checklist (CBCL), demographic and history forms, and the TASI pilot interview. Each DSM-5 symptom was determined to be either present or absent based on these data, and only children who met full DSM-5 criteria were given an ASD diagnosis. TASI scores were not used in CBE decision-making; clinicians knew that the TASI was in development but were free to use caregivers’ answers as part of their clinical judgment.
Children not meeting criteria for an ASD diagnosis were evaluated for other neurodevelopmental disorders, including Language Disorder, Global Developmental Delay, and Developmental Coordination Disorder. All children with these diagnoses, and all those who demonstrated sub-threshold ASD symptoms that did not meet DSM-5 criteria, were included in the other DD group. The third group was the typically developing or no-diagnosis (TD/ND) group; it comprised children who demonstrated some mild delays but did not meet full diagnostic criteria for any disorder, as well as children without any apparent delays.
Legal guardians provided informed consent for themselves and their child. Following the evaluation, results and recommendations were discussed with caregivers, and a full written report followed. All procedures were approved by the Institutional Review Boards at the University of Connecticut, Drexel University, and Georgia State University.
Diagnostic measures
The MSEL is a developmental assessment designed for use with children age 0–68 months (Mullen, 1995). Visual Reception, Fine Motor, Receptive Language, and Expressive Language domains were assessed. The MSEL provides T-scores for each domain as well as a composite score (Early Learning Composite, ELC).
The ADOS-2 (Lord et al., 2012) is the most widely used observational measure of symptoms of ASD. It is a play-based measure designed to elicit social and communicative behaviors and provides opportunities for children to engage in RRBs. Calibrated severity scores (CSS; Esler et al., 2015; Gotham et al., 2009) for the Toddler Module and Module 1 were calculated.
Psychometric evaluation of the TASI
Inter-rater reliability
A third sample was used to calculate inter-rater reliability of the TASI interview. This sample was composed of 15 evaluated children aged 12–36 months, not in Samples 1 or 2. These children were evaluated after the collection of Sample 2. They were included sequentially, without reference to diagnosis or specific age; interviews in which the audio was not clear enough to understand the clinician and parent were excluded. The mean age (20 months) was similar to the two main samples (19 and 19.5 months). Approximately half of the sample were children with ASD (eight children, 53%), whose mean ELC (62) and CSS (8.4) were similar to the ASD groups in the two main samples (ELC of 64 and 65 and CSS of 7.9 and 7.9). Two clinicians (K.L.C. and D.A.F.) independently reviewed videotapes of TASI administrations and independently scored each item on the TASI. Hyper-sensitivity and hypo-sensitivity were scored together as a single category indicating atypical sensory response. Cohen’s κ and percent agreement are shown for each individual item and for ASD likelihood classification (see section “Validity” below) (Table 3).
Validity
Sensitivity (ability to detect ASD) was operationalized as the percent of children with a CBE diagnosis of ASD who were identified by the TASI cutoff. Specificity was operationalized as the proportion of children without ASD who were correctly identified by the cutoff as not having ASD (Dow et al., 2017). Positive predictive value (PPV; the confidence that a positive score indicates the disorder) was defined as the percent of children who met the TASI cutoff for an ASD diagnosis and independently received a CBE of ASD among all children who met the TASI cutoff. Negative predictive value (NPV; the confidence that a negative score indicates the disorder is not present) was defined as the percent of children who did not meet the TASI cutoff and who did not have a CBE diagnosis of ASD among all children below the TASI cutoff. Validity statistics were first calculated for the initial 204 children (Sample 1; Tables 4 and 5) based on the receiver operating characteristic (ROC) curve for total TASI score (Figure 1); the cutoff and other psychometric properties were then verified in a cross-validation sample (Sample 2; Tables 4 and 5).
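In terms of the four cells of a 2 × 2 table crossing TASI cutoff status with CBE diagnosis, these definitions can be written compactly as below. The symbols TP, FN, FP, and TN (true positives, false negatives, false positives, and true negatives) are notation introduced here for illustration; each ratio is multiplied by 100 to give the percentages reported.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}.
\end{align*}
```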

Figure 1. Receiver operating characteristic curve for Sample 1.
Validity was also calculated for subsamples of children 23 months and younger and 24–36 months of age (Table 6), for subsamples of boys and girls, and for the TASI’s ability to differentiate ASD from other DDs (Table 7). Finally, agreement between the ADOS-2 and TASI scores was examined in children with ASD.
Community involvement
Participating children and families were not involved in study design, implementation, analysis, or interpretation.
Results
Sample
In Sample 1 (n = 204), 53 children were given a CBE diagnosis of DSM-5 ASD. In Sample 2, the cross-validation sample (n = 90), 19 children received an ASD diagnosis (Table 1). The proportions of diagnostic classifications (ASD vs DD vs TD/ND) differed between the two samples, with a greater proportion of DD children in Sample 2.
Samples 1 and 2 differed in racial/ethnic makeup, with a greater proportion of Black/African American children in the cross-validation sample (Table 1). There were no significant differences in annual income between Samples 1 and 2.
Samples 1 and 2 were largely comparable in ADOS-2 and MSEL scores (Table 2). The ASD groups in Samples 1 and 2 did not differ on any ADOS-2 or MSEL score, whereas the other two diagnostic groups (DD and TD/ND) had a few small differences between Samples 1 and 2; the only difference that remained significant after Bonferroni’s correction was the finding of more RRBs in the DD Sample 2 group than Sample 1.
Table 2. Comparison of participant samples on assessment results.
ASD: autism spectrum disorder; MSEL: Mullen Scales of Early Learning; ELC: Early Learning Composite; ADOS-2: Autism Diagnostic Observation Schedule–Second Edition; SA: social affect; RRB: restricted repetitive behavior; TASI: Toddler Autism Symptom Inventory; CSS: calibrated severity scores; DD: developmental disorder; TD/ND: typically developing or no-diagnosis.
Bonferroni’s corrected p-value = 0.00556.
MSEL ELC standard score (M = 100, SD = 15).
MSEL domain T-score (M = 50, SD = 10).
Reliability
Cohen’s κ was calculated to compare the two independent evaluators’ TASI scores for each item and for the cutoff score that determined ASD likelihood classification (Table 3).
Table 3. Inter-rater reliability for TASI diagnostic cutoff score and items (reliability sample).
TASI: Toddler Autism Symptom Inventory; CI: confidence interval.
Item not included in TASI total score.
κ not able to be calculated due to insufficient variability.
Inter-rater reliability of all TASI items was good or very good (κ >0.60; Altman, 1991), except for item 36: “Is there anything that your child is interested in that seems like all he or she wants to do?,” which had moderate reliability. At the level of ASD likelihood classification, discussed below, the raters achieved 100% agreement. Thus, Cohen’s κ was in the “very good” range (κ = 1.00).
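For readers less familiar with the statistic, Cohen’s κ compares observed agreement with the agreement expected by chance from each rater’s marginal proportions. The sketch below is a minimal implementation for illustration only; the function name and the ratings are hypothetical and are not study data. It also shows why κ cannot be computed when neither rater’s scores vary.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same cases.

    Returns None when chance agreement equals 1 (both raters used a single,
    identical label throughout), because kappa is then undefined.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of cases with identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    p_expected = sum((rater_a.count(lab) / n) * (rater_b.count(lab) / n)
                     for lab in labels)
    if p_expected == 1.0:
        return None
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical present (1) / absent (0) scores for one item across 15 children.
rater_1 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
rater_2 = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
kappa_item = cohens_kappa(rater_1, rater_2)

# Perfect agreement with both labels in use gives kappa = 1.
kappa_perfect = cohens_kappa(rater_1, rater_1)

# No variability: every child scored absent by both raters, kappa undefined.
kappa_flat = cohens_kappa([0] * 15, [0] * 15)
```

In this invented example the raters disagree on one child out of 15, which gives κ of about 0.87.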
Validity
A total score cutoff indicating likelihood of ASD was derived using ROC analyses. Using Sample 1, an ROC curve (Figure 1) indicated excellent discrimination between ASD and non-ASD groups (area under the curve (AUC) = 0.92; Table 4). An optimal cutoff that maximized sensitivity while maintaining specificity above 0.80 was determined to be a score of 7 or more out of a total possible score of 40 (Table 4). A second ROC curve (Figure 2) calculated for Sample 2 showed a similarly high AUC (0.89; Table 4). Validity for alternate cut-off points, which may be helpful for purposes in which an emphasis on higher sensitivity or specificity is required, is reported in Table 4.
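The selection rule described above (highest sensitivity subject to specificity above 0.80) can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the function names are hypothetical, the scores are invented, and the AUC is computed with the Mann-Whitney formulation instead of a statistics package.

```python
def sens_spec_at(cutoff, asd_scores, non_asd_scores):
    """Sensitivity and specificity when 'score >= cutoff' counts as positive."""
    sensitivity = sum(s >= cutoff for s in asd_scores) / len(asd_scores)
    specificity = sum(s < cutoff for s in non_asd_scores) / len(non_asd_scores)
    return sensitivity, specificity


def pick_cutoff(asd_scores, non_asd_scores, min_specificity=0.80, max_score=40):
    """Return (cutoff, sensitivity, specificity) for the cutoff with the highest
    sensitivity whose specificity exceeds min_specificity, or None."""
    best = None
    for cutoff in range(0, max_score + 2):
        sens, spec = sens_spec_at(cutoff, asd_scores, non_asd_scores)
        if spec > min_specificity and (best is None or sens > best[1]):
            best = (cutoff, sens, spec)
    return best


def auc(asd_scores, non_asd_scores):
    """AUC as the probability that a random ASD score exceeds a random
    non-ASD score, counting ties as one half."""
    wins = sum((a > b) + 0.5 * (a == b)
               for a in asd_scores for b in non_asd_scores)
    return wins / (len(asd_scores) * len(non_asd_scores))


# Hypothetical total scores, invented purely for illustration.
asd = [5, 7, 8, 9, 11, 12, 14, 15, 18, 20]
non_asd = [0, 1, 1, 2, 3, 3, 4, 4, 5, 9]

chosen = pick_cutoff(asd, non_asd)
area = auc(asd, non_asd)
print(chosen, area)
```

Because sensitivity can only fall as the cutoff rises, the rule amounts to taking the lowest cutoff that clears the specificity floor.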
Table 4. Sensitivity and specificity of alternate cutoff points for ROC of Samples 1 and 2.
AUC: area under the curve.

Figure 2. Receiver operating characteristic curve for Sample 2.
Validity is summarized in Table 5. In Sample 1, with the cutoff of 7 or more, sensitivity was 88.68% and specificity was 81.46%. PPV for ASD was 62.67% and NPV was 95.35%. Sensitivity in Sample 2 was slightly higher (89.47%), with a concomitant decrease in specificity (67.61%). PPV for ASD in this sample was 42.50% and NPV was 96.00%.
Table 5. TASI sensitivity, specificity, PPV, and NPV for Sample 1 (development) and Sample 2 (cross-validation) using a cutoff of 7.
TASI: Toddler Autism Symptom Inventory; CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value.
As discussed in Pandey et al. (2008), false positives are primarily undesirable because they can create unnecessary alarm. However, if the false positive cases are primarily children with other developmental disorders rather than TD children, this concern is somewhat obviated. Therefore, we also calculated PPVanyDD, in which only children in the TD/ND group were counted as false positives. These values were calculated for both Sample 1 (86.67%) and Sample 2 (95.00%; Table 5), indicating that the non-ASD children exceeding the cutoff score usually had another developmental disorder.
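The reported values can be reproduced from 2 × 2 counts, as the sketch below shows. The counts are an assumption: they were back-calculated here from the reported percentages and the sample sizes (53 of 204 and 19 of 90 children with ASD) and are not taken from the study tables. The function name and argument names are likewise hypothetical.

```python
def validity_stats(tp, fn, fp, tn, fp_td_nd):
    """Return sensitivity, specificity, PPV, NPV, and PPV for any DD, in percent.

    fp_td_nd is the number of false positives from the TD/ND group; all other
    children above the cutoff have ASD or another developmental disorder.
    """
    screen_pos = tp + fp
    return {
        "sensitivity": round(100 * tp / (tp + fn), 2),
        "specificity": round(100 * tn / (tn + fp), 2),
        "ppv": round(100 * tp / screen_pos, 2),
        "npv": round(100 * tn / (tn + fn), 2),
        "ppv_any_dd": round(100 * (screen_pos - fp_td_nd) / screen_pos, 2),
    }


# Counts back-calculated from the reported percentages and sample sizes;
# they are an assumption, not figures taken from the study tables.
sample_1 = validity_stats(tp=47, fn=6, fp=28, tn=123, fp_td_nd=10)
sample_2 = validity_stats(tp=17, fn=2, fp=23, tn=48, fp_td_nd=2)
print(sample_1)
print(sample_2)
```

Under these assumed counts, 75 children in Sample 1 and 40 in Sample 2 met the cutoff, of whom 10 and 2, respectively, would fall in the TD/ND group.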
To determine whether the TASI’s psychometric properties hold in children under 24 months of age, for whom other measures often fall short, we repeated several analyses in this subsample. Our sample was young; 89% of children were under 24 months of age (32% were under 18 months of age). Validity statistics were calculated for children aged 12–23 months and compared to those for children aged 24–36 months (Table 6). Sensitivity and specificity of the younger group were found to be quite similar to those of the older group; however, PPV was lower in the younger group. The AUC value for children under 24 months of age was found to be 0.89 compared to 0.97 in children 24–36 months of age.
Table 6. Sensitivity, specificity, PPV, and NPV for subsamples of children younger and older than 24 months of age using a cutoff of 7.
CI: confidence interval; PPV: positive predictive value; NPV: negative predictive value.
TASI psychometrics were also calculated separately for girls and boys. Within the whole sample, the AUC values for girls (AUC = 0.95, standard error (SE) = 0.021) and boys (AUC = 0.89, SE = 0.027) were comparable.
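One way to gauge whether two AUC values differ is a z-test for independent estimates, which divides the difference by the square root of the summed squared standard errors. The check below is an illustration added here, not an analysis reported in the study; it assumes the girls’ and boys’ estimates are independent and approximately normally distributed, and the function name is hypothetical.

```python
import math


def auc_difference_z(auc_1, se_1, auc_2, se_2):
    """z statistic and two-sided p-value for two independent AUC estimates."""
    z = (auc_1 - auc_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# AUC and SE values as reported for girls and boys.
z, p = auc_difference_z(0.95, 0.021, 0.89, 0.027)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With the reported values, z is about 1.75, which falls short of the conventional 1.96 threshold for a two-sided test at the 0.05 level.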
We also sought to demonstrate that the TASI effectively differentiates children with ASD from those with other DDs and that the psychometrics above were not simply driven by the ability to differentiate ASD from TD. Not surprisingly, AUC and specificity were found to be slightly lower for discriminating ASD from other DDs than for discriminating ASD from non-ASD (above), but remained satisfactory (AUC >0.85, Table 7).
Table 7. Sensitivity and specificity of alternate cutoff points for ASD versus DD ROC of Samples 1 and 2.
AUC: area under the curve; DD: developmental disorder; ROC: receiver operating characteristic.
Finally, agreement between the ADOS-2 and TASI was evaluated. Of the 72 children diagnosed with ASD as per CBE, only one scored in the non-ASD range on the ADOS-2; this child had a TASI score that fell in the elevated likelihood range. Six of the 72 children with ASD did not meet the TASI cutoff score; all of these children were identified by the ADOS-2.
Discussion
The TASI is a relatively brief, easily scored parent interview addressing the symptoms and behaviors often observed in toddlers with ASD. A cutoff score was developed and tested across two samples. The TASI form itself offers clinicians an opportunity to identify not only whether the child meets the cutoff for elevated ASD likelihood but also whether each operationalized DSM-5 criterion is present or absent based on caregiver report. We evaluated the validity of the cutoff against a DSM-5 CBE diagnosis in children aged 12–36 months (32% of whom were under 18 months of age), replicated these psychometrics in a cross-validation sample, and demonstrated inter-rater reliability of interview items and TASI ASD likelihood classification.
The TASI fills a gap in available interviews designed for use with very young children with suspected ASD, and has the advantage of operationalizing DSM-5 symptoms by providing specific behaviors and examples relevant to toddlers. This operationalization provides support in identifying specific language and/or play delays, which are clear challenges in early ASD diagnosis. The sample was referred based on positive results on standardized ASD screening or pediatric provider concern about ASD, making it comparable to clinically referred community samples. Children were included regardless of developmental level and represented a sample younger than most other studies of caregiver interviews. Furthermore, the TASI is efficient relative to other ASD semi-structured interviews, taking an average of 40 min to complete. The TASI’s ease of administration and immediate, simple scoring should make it a useful addition to available tools for assessment of ASD in clinical and research contexts.
Reliability
Inter-rater reliability of TASI items was strong; all Cohen’s κ values were “good” or “very good” (κ > 0.60; Altman, 1991), except for one item (#36) on intense interest in a certain activity (Is there anything that your child is interested in that seems like all he or she wants to do?). Low reliability of this question may indicate that it is challenging to assess, based exclusively on caregiver report, whether the intensity of a young child’s interests is age-appropriate. Brief intense interests in activities (“read it again!” and “read it again!”) can be developmentally appropriate in children this age (Evans et al., 1997), and it is difficult to determine whether the intensity of a specific interest exceeds typical behavior using only caregiver-report information. At the level of TASI ASD likelihood classification, inter-rater agreement was 100%. Thus, item-level and ASD likelihood classification reliability of the TASI is comparable to other widely used diagnostic measures, including the ADI-R (Chakrabarti & Fombonne, 2001; Lord et al., 1994) and ADOS-2 (Lord et al., 2012; Zander et al., 2016), suggesting that it has sufficient reliability for eliciting caregiver report of symptoms in this population. A further strength of the TASI is that strong reliability was possible without any intensive training; raters used a scoring manual that includes specific examples and scoring guidance.
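For readers less familiar with Cohen’s κ, the statistic corrects raw agreement between two raters for the agreement expected by chance. The following is a minimal sketch, not the analysis code used in this study; the ratings are hypothetical and the function name is an assumption made for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same set of items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of one item across ten interviews
# (1 = symptom present, 0 = symptom absent); NOT study data.
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]

kappa = cohens_kappa(rater_a, rater_b)
meets_threshold = kappa > 0.60  # the "good" threshold cited above
print(f"kappa = {kappa:.2f}, above 0.60: {meets_threshold}")
```

In this hypothetical example the raters agree on 9 of 10 interviews, but because half of that agreement would be expected by chance, κ is lower than the raw agreement rate.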
Validity
The TASI appears to achieve our goal of adequate sensitivity in a young sample, but due to the overlap in symptoms commonly observed in ASD and other DDs in young children, it also produces some false positives. The TASI’s validity (Table 4) is comparable to other diagnostic measures with mixed age samples (Charman & Gotham, 2013; Kim & Lord, 2012b); however, direct comparisons with other measures such as the ADI-R, whose algorithms also differ based on age and verbal abilities (Kim et al., 2013), are challenging due to these sample differences. Using a cutoff of 7, sensitivity ranged from 88% to 89%, and specificity ranged from 67% to 81%. Sample differences may have contributed to the observed differences in specificity between groups. The TASI’s normative sample includes a larger sample of very young children (children <24 months) and more children with developmental delays than normative samples for other measures (Kim & Lord, 2012b). In spite of the difficulty associated with identifying symptoms of ASD in this young age group, including the confounding effects of adaptive skills and developmental level (Zwaigenbaum et al., 2016), as well as the high frequency of language concerns in children with other developmental disorders (Coonrod & Stone, 2004; Richards et al., 2016), validity remained good even in the subsample of children under 24 months (Table 6), with relatively similar sensitivity and specificity, but lower PPV in the younger sample. PPV of the TASI cutoff for ASD indicates that 43%–63% (in the two samples) of children who meet the cutoff get a CBE diagnosis of ASD. Low PPV values are common in measures of symptoms in very young children (Kleinman, Robins, et al., 2008), since many ASD symptoms in young children overlap with the clinical presentation of other developmental disorders or delays. However, PPV for having any diagnosable developmental disorder, including ASD, was found to be 87%–95%. 
Therefore, there were very few children (12 total across both samples, n = 294) showing typical development or non-diagnosable mild delays who were inaccurately classified by the TASI cutoff score. Detecting other neurodevelopmental disorders is still a misclassification for an instrument designed to detect autism and should be taken into account as such by users. However, most false positives did warrant another diagnosis and appropriate intervention referral.
NPV, or the probability that a child whose score did not meet the ASD threshold on the TASI will not have ASD, is important when considering that a non-ASD diagnosis may limit access to early intervention. Failing to identify a child with ASD and thus disqualifying that child from accessing early intensive services is a key concern. The TASI was found to have high NPVs, such that fewer than 5% of children who score as non-ASD will in fact receive a CBE diagnosis of ASD, suggesting that the TASI can appropriately rule out ASD. This NPV is higher than that of other commonly used instruments (Charman & Gotham, 2013), perhaps due to the TASI’s focus on symptoms most common to a narrow age range of very young children.
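The relationships among sensitivity, specificity, PPV, and NPV discussed above all follow from a 2 × 2 table of cutoff classification against CBE diagnosis. The sketch below shows the standard definitions; the counts are hypothetical and chosen only to illustrate how a measure can combine a modest PPV with a high NPV when most referred children do not have ASD. They are not the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # met the cutoff and received a CBE diagnosis of ASD
    fp: int  # met the cutoff but did not receive an ASD diagnosis
    fn: int  # did not meet the cutoff but received an ASD diagnosis
    tn: int  # did not meet the cutoff and did not receive an ASD diagnosis


def diagnostic_metrics(c):
    """Standard validity indices computed from a 2 x 2 table."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for illustration only; NOT the study data.
example = ConfusionCounts(tp=45, fp=30, fn=5, tn=120)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```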
Results indicate that even when differentiating children with ASD from those with other delays (DD), the TASI performs well (Table 7). The TASI was also found to work well in both boys and girls.
Finally, research or clinical evaluations may more strongly favor either detecting all children presenting with symptoms even at a cost to specificity, or classifying only the most certain cases by maximizing specificity (e.g., for genetic studies). Validity for alternate cutoff scores is reported in Table 4. For clinical or research purposes that seek to maximize sensitivity, a cutoff score of 5 or more yields sensitivity of 96%. To prioritize specificity, a cutoff score of 9 or more yields specificity of 92%. In settings in which specificity is more important, a cutoff score of 8 or 9 would likely be appropriate. The identified cutoff of 7 was determined based on the psychometrics observed in Sample 1; differences were noted in Sample 2, and further work with other samples should continue to elucidate the most appropriate cutoff score.
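The trade-off between sensitivity and specificity across cutoff scores can be illustrated with a short sketch. The total scores and diagnoses below are hypothetical, and the rule that a score at or above the cutoff counts as positive is an assumption based on the "or more" wording above; the values in Table 4 remain the reference for the TASI itself.

```python
def sensitivity_specificity(scores, has_asd, cutoff):
    """Sensitivity and specificity when a score >= cutoff counts as positive."""
    pairs = list(zip(scores, has_asd))
    tp = sum(s >= cutoff and d for s, d in pairs)
    fn = sum(s < cutoff and d for s, d in pairs)
    tn = sum(s < cutoff and not d for s, d in pairs)
    fp = sum(s >= cutoff and not d for s, d in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical total scores and CBE diagnoses; NOT study data.
scores = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12]
has_asd = [False, False, False, True, False, True, False, True, True, True]

# Evaluate a lower, middle, and higher cutoff on the same hypothetical data.
results = {c: sensitivity_specificity(scores, has_asd, c) for c in (5, 7, 9)}

for cutoff, (sens, spec) in results.items():
    print(f"cutoff >= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the cutoff flags more children with ASD at the cost of more false positives, and raising it does the reverse, which is the pattern that motivates reporting alternate cutoffs.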
It has been noted that diagnostic accuracy with young children improves when observation is supplemented by parent report (Kim & Lord, 2012a; Sacrey et al., 2018). Our data indicate that neither the TASI nor the ADOS-2 catches all children determined to have ASD, although the ADOS-2 in the hands of expert diagnosticians only missed one child, who was accurately identified by the TASI. In other situations, children attending an evaluation may be shy, tired, ill, and so on; these factors may result in an inaccurate sample of the child’s typical behavior. Therefore, parent report is essential. In situations where the opportunity for direct observation is limited, a caregiver interview that can be conducted via telehealth may enable clinicians to still conduct ASD evaluations and may ensure continued access to intervention services.
In addition to a clear cutoff score, the TASI also provides explicit operationalization of the DSM-5 symptoms as they manifest in toddlers. The TASI’s accompanying scoring sheet enables a clinician to clearly visualize which DSM-5 symptoms are present based on caregiver report.
Limitations
Several limitations should be noted. DSM-5 diagnostic criteria are written to capture individuals of all ages; there may be different perspectives within the field on how to best operationalize individual criteria for children under 3 years of age. Very few differences were noted between the TASI and other measures (Evers et al., 2020), suggesting that our group’s consensus in mapping TASI items to diagnostic criteria was very similar to others’ assignments.
Another limitation is sample ascertainment. Children entered our sample after screening positive on an ASD screener or because of provider concern during a primary care check-up. Different types of samples (e.g., younger siblings of affected children, or children with other risk factors such as prematurity) may yield different results. The TASI includes questions about regression that are not included in the calculation of the TASI total score; however, clinicians may interpret a positive response to regression questions as a cue to complete a more comprehensive interview about regression (Ozonoff & Iosif, 2019).
In addition, Sample 1 (development), Sample 2 (cross-validation), and Sample 3 (inter-rater reliability) were not collected during the same time period, randomly assigned to development and cross-validation groups, or matched for sample characteristics. Instead, data collected first were used to develop the cutoff score, while data entered later were assigned to the cross-validation sample. This resulted in small demographic differences between the samples and introduced time of data collection as a confound. Samples 1 and 2 were nonetheless generally very similar in characteristics (Table 2); the only significant difference was between DD groups in ADOS RRB CSS, which we assume is a random finding. Due to the young age of the children in our sample, children and parents were not separated; although this may have resulted in some bias in scoring the TASI, the responses elicited from parents were not changed by the interviewer based on direct observation. Future work should validate the TASI in situations where the parent is interviewed before the clinician observes the child.
Finally, our sample does not reflect the demographics of the current US census or of the specific study sites. It is essential that new tools such as the TASI are developed with data from a sample that is diverse in terms of race, ethnicity, and socio-economic factors, and that their performance within specific groups is assessed. The TASI should be evaluated in other samples in order to further validate or revise the recommended cutoff score.
Directions for future work will include examining the TASI’s performance in various racial/ethnic groups, and with larger samples. It will also be important to examine the TASI’s performance in children ascertained in different ways, such as using different screeners, or in specific high-risk groups (e.g., younger siblings of affected children and premature children). Validity should be further studied by comparing TASI results with those of other diagnostic interviews such as the ADI-R, when both are given experimentally to a common sample. Additional psychometrics of the TASI, including item loadings in factor analysis and internal consistency, will also be examined.
Conclusion
The TASI is a novel caregiver-report interview designed to identify the presence and absence of ASD symptoms in children 12–36 months old. This interview aids in differentiating ASD from other developmental disorders and from typical development. The TASI is brief, semi-structured, and offers a clear operationalization of DSM-5 diagnostic criteria for young children. The TASI allows for both the calculation of a total sum to evaluate elevated likelihood of ASD, and a clear determination of whether each individual DSM-5 symptom is present or absent based on caregiver report.
The TASI cutoff score offers good psychometrics, suggesting it can be a useful way to assess elevated likelihood of ASD. The TASI shows good reliability and validity, and is an improvement over existing caregiver-report diagnostic measures in its ease of administration, clear scoring aligned with DSM-5, and brevity. Furthermore, children who are false positives on the ASD likelihood cutoff score are very likely to have a diagnosable neurodevelopmental disorder, and thus to require referral to appropriate intervention services.
Combining information from caregiver report instruments and direct observation improves accuracy of these tests to identify those with a clinical best estimate diagnosis of ASD (Risi et al., 2006). The data presented here indicate that while neither the TASI nor the ADOS-2 alone accurately identified every child with ASD, their use in conjunction reduced the number of misses to zero. The TASI is only one source of information regarding a child’s functioning, and thus should always be used in conjunction with other diagnostic tools including a complete developmental history, developmental testing, and structured behavioral observations. The TASI, when used in conjunction with other measures, will allow for early and accurate diagnoses, thus permitting children to access early intervention services.
Acknowledgements
The authors are deeply grateful to all of the children, families, pediatric providers, and colleagues who have been a part of our work. The authors also thank Lauren B. Adamson, PhD, Sarah Dufek, PhD, Sherira Fernandes, PhD, Elizabeth Karp, PhD, and Aubyn Stahmer, PhD, for their thoughtful contributions to the TASI. Data used in the preparation of this manuscript were submitted to the National Institute of Mental Health (NIMH) Data Archive (NDA). NDA is a collaborative informatics system created by the National Institutes of Health to provide a national resource to support and accelerate research in mental health (data set identifier: DOI: 10.15154/1519083). This manuscript reflects the views of the authors and may not reflect the opinions or views of the NIH.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Drs D.L.R., M.L.B., and D.A.F. are co-owners of M-CHAT LLC, which licenses use of the M-CHAT in electronic products. Dr D.L.R. sits on the advisory board of Quadrant Biosciences Inc.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by funds from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01HD039961) and the National Institute of Mental Health (R01MH115715).
