Abstract
Males are more likely to be diagnosed with autism than females, and at earlier ages, yet few studies examine sex differences in screening. This study explored sex differences in psychometric properties, recommended cutoff scores, and overall scores of the Modified Checklist for Autism in Toddlers, Revised, with Follow-Up. Participants were 28,088 toddlers enrolled in four early detection of autism studies. Children (N = 731) at high likelihood for autism attended evaluations after screening and/or primary care clinician concern. Females were less likely to screen at high likelihood for autism at each stage of screening and therefore less likely to be invited for evaluations. Positive predictive value was significantly lower among females than males, but sensitivity was similar. False positive females were likely to have another developmental delay. Cutoff scores for males and females matched recommended guidelines. Final scores on the Modified Checklist for Autism in Toddlers, Revised, with Follow-Up did not differ between males and females diagnosed with autism, but did for the overall sample identified at high likelihood for autism. Our findings suggest that females are less likely to be referred for evaluations, but the Modified Checklist for Autism in Toddlers, Revised, with Follow-Up accurately identifies both males and females with autism at established cutoffs. Future research should examine methods to reduce false positives in females.
Lay abstract
This study examined a widely used autism screening tool, the Modified Checklist for Autism in Toddlers, Revised, with Follow-Up to identify differences in screening for autism between toddler males and females. Examining sex differences in screening for autism in toddlerhood is important as it determines who will be referred for evaluations and receive diagnoses, which is critical for access to autism-specific early intervention. This study found that females were less likely to screen positive and be invited for evaluations compared with males. Females at high likelihood for autism were less likely to be diagnosed with autism, which decreases confidence in the screener’s results. Importantly, the Modified Checklist for Autism in Toddlers, Revised, with Follow-Up accurately identified both males and females with autism. Future research should examine ways to improve accuracy in screening results for females.
Although autism can be reliably diagnosed in toddlers (Landa et al., 2013; Pierce et al., 2019), most children are first identified later in childhood (Maenner et al., 2021). Delayed diagnosis causes children to miss the opportunity for autism-specific early intervention, which contributes to poorer outcomes (Anderson et al., 2014; Landa, 2018; Vivanti et al., 2016). Given the positive effects associated with early intervention (Fuller & Kaiser, 2020; Fuller et al., 2020; Zwaigenbaum et al., 2015), there is a pressing need to improve early detection practices to streamline diagnosis and access to services for young autistic children.
The Modified Checklist for Autism in Toddlers, Revised, with Follow-Up (M-CHAT-R/F; Robins et al., 2009) is one of the most widely used autism screening tools. Although estimates of M-CHAT sensitivity vary (range = 22%–100%; Carbone et al., 2020; Chlebowski et al., 2012; Guthrie et al., 2019; Robins et al., 2014; Wieckowski et al., 2023), there is consensus that not all children who screen at high likelihood (HL) for autism are ultimately diagnosed with autism, although most are diagnosed with another developmental disorder (DD; Robins et al., 2014). However, reducing the number of false positive (FP) screens is important to shorten waitlists and facilitate timely access to autism-specific services for both female and male autistic children.
Recent prevalence estimates indicate that 4.2 males are diagnosed for every female with autism (Maenner et al., 2021). Studies examining sex differences in core autism symptoms in young children display equivocal findings. Some toddler autism studies suggest that autistic males demonstrate more restricted and repetitive patterns of behavior (RRBs), and fewer social communication impairments compared with females (Hartley & Sikora, 2009; Lawson et al., 2018; Ros-Demarize et al., 2020; Sipes et al., 2011). In contrast, a study examining retrospective caregiver ratings of preschool behavior in children later diagnosed with autism indicated that females engaged in more complex social imitation than males and that females later diagnosed with autism used mimicking in social situations, whereas male counterparts were more likely to isolate themselves (Hiller et al., 2016). Furthermore, other studies have found minimal to no sex differences in toddlers/preschoolers (Mussey et al., 2017; Reinhardt et al., 2015; Van Wijngaarden-Cremers et al., 2014). The equivocal findings may impede optimizing early detection for both females and males.
Indeed, evidence indicates sex differences in the timing of an autism diagnosis. Specifically, females were diagnosed significantly later than males (i.e. 20.19 months) despite similar age of first caregiver concerns (McDonnell et al., 2020). Notably, this effect was moderated by verbal IQ such that stronger verbal skills were associated with later diagnosis more so for females than males, consistent with findings of a 1.8-year delay in childhood diagnosis of Asperger’s disorder for females compared with males (Begeer et al., 2013). Evidence of sex-based disparity in the timing of an autism diagnosis illustrates the urgent need to improve early detection practices to facilitate access to early interventions for autistic females.
Previous studies have not found significant sex differences in the age of first caregiver concerns, but have documented differences in the types of early caregiver concerns that emerge (Dillon et al., 2021; Hiller et al., 2016; McDonnell et al., 2020; Ramsey et al., 2018). Caregivers of males reported more concerns about RRBs than caregivers of females among toddlers (Ramsey et al., 2018), preschoolers (Hiller et al., 2016), and children and adolescents (Dillon et al., 2021). Caregivers also report greater concern about externalizing behavior in females and internalizing behavior in males during the preschool years (Hiller et al., 2016), and greater concern about social interactions for males diagnosed with autism, compared with females diagnosed with autism and children diagnosed with another developmental disability (Little et al., 2017). These findings of possible sex differences in caregiver concerns may inform improvements to caregiver-report screeners to close the gap between age of first concerns and autism diagnosis for females.
Males tend to score higher (indicative of more autism characteristics) than females on autism screening tools in the general population (Autism Spectrum Quotient (ASQ); Auyeung et al., 2008; Baron-Cohen et al., 2001, 2006; Childhood Autism Spectrum Test (CAST); Williams et al., 2008; Social Responsiveness Scale (SRS); Constantino & Todd, 2003). A similar pattern emerges for toddler-specific screeners in the general population, including the Quantitative Checklist for Autism in Toddlers (Allison et al., 2008) and the M-CHAT-R/F (Øien et al., 2017). In contrast, among autistic children, no sex differences have emerged in the M-CHAT-R/F total score (Øien et al., 2017; Ros-Demarize et al., 2020) or in the likelihood of endorsing most M-CHAT items, although caregivers endorsed “difficulty with imitation” more often in females than in males and “difficulty following a point” more often in males than in females (Øien et al., 2017).
Considering findings suggesting possible sex differences in the early clinical presentation of autism, combined with evidence of later autism diagnosis for females, there is an urgent need to systematically explore sex differences in early detection tools for autism. This study explored sex differences in the performance of the M-CHAT-R/F in a large sample of toddlers screened during well-child primary care visits. We predicted that males would be more likely than females to screen at HL of autism on initial and follow-up screening, and that sensitivity and positive predictive value (PPV) of the M-CHAT-R/F would be stronger in males compared to females. Exploratory analyses examined whether sex-specific cutoff scores would improve the utility of the M-CHAT-R/F for females.
Methods
Participants
Participants included 28,088 toddlers (14,331 males, 13,757 females), aged 14.07–30.98 months, who participated in one of four studies examining early detection of autism between 2009 and 2020, across four universities located in Philadelphia, Pennsylvania; Storrs, Connecticut; Atlanta, Georgia; and Sacramento, California. Toddlers classified as HL of autism on the M-CHAT-R/F, another study-specific screener, or whose primary care clinician (PCC) indicated concern for autism were invited for a no-cost evaluation at their university site (n = 1112; 728 males, 384 females); 731 toddlers (65.7%; 488 males, 243 females) attended a diagnostic evaluation (see Table 1). Of those evaluated, 677 (92.6%; 451 males, 226 females) were HL based on initial screen and/or surveillance. Eight children (4 males, 4 females) were missing information about the reason for evaluation.
Demographic characteristics for all screened toddlers and for toddlers who completed an autism evaluation.
Other developmental disorders.
Inclusion and exclusion criteria
For inclusion in parent studies, participants were screened using the English or Spanish M-CHAT-R/F (Robins et al., 2009) during at least one primary care well-child visit prior to 31 months of age. When children had more than one M-CHAT-R/F, only initial screen data were included. Exclusion criteria for this study included (1) missing information on child’s sex (n = 398), (2) three or more items missing on the M-CHAT-R/F, which makes classification of score indeterminate (n = 99), and (3) completed diagnostic evaluation prior to initial M-CHAT-R/F screen (n = 63). Missing data on the M-CHAT-R/F were not recoded. In addition, evaluation data were excluded (but screening data were kept) for 786 participants whose final outcome was undetermined: (1) screen positive on the M-CHAT-R/F but family did not attend evaluation, or (2) evaluation was invalid, due to severe motor or sensory impairment, or low compliance. Thirteen children classified as medium likelihood (ML) who did not receive Follow-Up but attended evaluation were excluded from analyses specifically examining Follow-Up scores, but were included in all other analyses with their initial score counting as their final score.
Description of studies
Participant data were aggregated from four early detection of autism studies that screened toddlers with the M-CHAT-R/F during well-child visits and invited children at HL of autism for no-cost evaluation: (1) Early Detection of Pervasive Developmental Disorders (R01HD039961, 2009-2014) screened toddlers during 18- and/or 24-month well-child visits, validated the M-CHAT-R/F, and demonstrated improved utility compared to the original M-CHAT (Robins et al., 2014); (2) Validation of Web-Based Administration of the M-CHAT-R/F (Autism Speaks #8368, 2012-2016) screened using electronic delivery at 18- and/or 24-month visits; Follow-Up was administered in the same electronic session (54%) or by phone with a trained member of the research staff, validating electronic screening (46%; Attar et al., in press); (3) Early Detection of Autism (R01HD039961, 2014-2020) randomized pediatric practices to one of three screening schedules, beginning at 12, 15, or 18 months and encouraged rescreening at 18, 24, and 36 months. Caregivers completed the M-CHAT-R/F electronically or on paper during all visits except 12 months, demonstrating that earlier and repeated screening detects autism (Wieckowski et al., 2021); and (4) Connecting the Dots (R01MH115715, 2017-2022) screened electronically during well-child visits at 18 and 24 months to relate primary care detection to outcomes at age 5 (McClure et al., 2021). See supplemental Tables 1 and 2 for study/site subgroups.
Measures
Screening
Demographic information
Caregivers of screened toddlers reported demographics after enrolling: child’s age, sex, race, ethnicity, and maternal education (as a measure of socioeconomic status).
Modified Checklist for Autism in Toddlers, Revised, with Follow-Up
The M-CHAT-R/F (Robins et al., 2009) is a two-part autism screener validated in 16- to 30-month-old children (Robins et al., 2014). The initial screen is a 20-item caregiver questionnaire assessing communication, joint attention, and pretend play. Toddlers who initially scored a 0–2 were classified as low likelihood (LL). Caregivers of toddlers who scored 3–7 (ML) completed structured Follow-Up, and a final score of 2 or higher was classified as HL of autism and 0–1 as LL. Toddlers who scored an 8 or above on initial M-CHAT-R were classified as HL of autism.
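The two-stage classification rule described above can be summarized in a minimal sketch. The function names and the treatment of a missing Follow-Up score (the child simply stays at ML) are illustrative assumptions and are not taken from the published scoring materials.

```python
from typing import Optional


def classify_initial(initial_score: int) -> str:
    """Classify the 20-item initial screen: 0-2 = LL, 3-7 = ML, 8-20 = HL."""
    if not 0 <= initial_score <= 20:
        raise ValueError("initial score must be between 0 and 20")
    if initial_score <= 2:
        return "LL"
    if initial_score <= 7:
        return "ML"
    return "HL"


def classify_final(initial_score: int, followup_score: Optional[int] = None) -> str:
    """Apply the Follow-Up stage to children in the ML range only."""
    level = classify_initial(initial_score)
    if level != "ML":
        # LL and HL on the initial screen are final; no Follow-Up is needed.
        return level
    if followup_score is None:
        # Assumption for this sketch: without Follow-Up the child stays ML.
        return "ML"
    # After Follow-Up, a score of 2 or higher is HL; 0-1 is LL.
    return "HL" if followup_score >= 2 else "LL"


print(classify_final(1))      # low initial score, no Follow-Up required
print(classify_final(5, 2))   # medium initial score, positive Follow-Up
print(classify_final(5, 1))   # medium initial score, negative Follow-Up
print(classify_final(9))      # high initial score, no Follow-Up required
```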
Clinician concern
PCCs were asked to indicate concern for autism at each screening visit.
Diagnostic evaluation measures
Across all studies, the clinical best-estimate diagnosis incorporated data from the Toddler Autism Symptom Interview (TASI; Barton et al., 2012; Coulter et al., 2021) or Autism Diagnostic Interview–Revised (ADI-R; Rutter et al., 2003); Autism Diagnostic Observation Schedule (ADOS), original or 2nd Edition (Lord et al., 2012, 1999); Mullen Scales of Early Learning (Mullen, 1995); Vineland Adaptive Behavior Scales, 2nd or 3rd Edition (Sparrow et al., 2005, 2016); and a Medical, Developmental, and Family History form.
Procedure
Screening for all studies occurred in pediatric practices during well-child visits using electronic and/or paper M-CHAT-R/F. For paper forms, caregivers of children with ML scores were contacted by research staff to complete Follow-Up over the phone. For electronic screeners, Follow-Up questions displayed immediately after completion of the initial screener. Electronic screens were automatically scored, and PCCs were able to view the results. Caregivers were not given screening results directly for any of the studies. In Study 1, all screen-positive participants were eligible for Follow-Up and were reclassified as LL or HL based on the results; the ML threshold was established based on this sample. In Studies 2, 3, and 4, only children in the ML range received the Follow-Up. For the purpose of analyses in this study, Follow-Up data were only included for children who should have received Follow-Up based on the current protocol. Children were re-screened at later ages depending on original study procedures; although only data from first screens were included in this study, final diagnosis of children identified with autism after rescreening was used to identify false negative cases. Screeners were offered in both English and Spanish at the Connecticut, Sacramento, and Philadelphia sites due to capacity for clinical evaluations in Spanish; Atlanta was an English-only site. Caregivers waived documentation of consent at screening and provided written, informed consent at evaluations. Toddlers classified as HL of autism based on M-CHAT-R/F, other study-specific screeners, and/or whose PCC reported an autism concern were invited for no-cost diagnostic evaluations. Average time between HL of autism classification and diagnostic evaluation was 4.40 (SD = 4.43) months. Evaluations occurred at university clinics or pediatric offices; teams included a licensed psychologist, certified school psychologist, or developmental pediatrician and a trainee. 
Each member of assessment teams was research-reliable on all measures they administered. Clinical best-estimate diagnosis was based on International Classification of Diseases, 10th edition (ICD-10; World Health Organization, 2004) or Diagnostic and Statistical Manual of Mental Disorders, 4th edition, text revision (DSM-IV-TR; American Psychiatric Association [APA], 2000) or 5th edition (DSM-5; APA, 2013) criteria. Diagnosis of Autistic Disorder, Pervasive Developmental Disorder, Not Otherwise Specified, Atypical Autism, Childhood Autism, and Asperger’s Syndrome were grouped into an autism classification. When autism was ruled out, other DDs were considered, including Global Developmental Delay or Language Delay, or the child was determined to have no diagnosis (ND). Caregivers received oral and written feedback about diagnoses, including recommendations and information about local intervention resources. When caregivers declined to complete the M-CHAT-R Follow-Up or evaluation, the PCC was informed. Institutional review boards approved each of the four studies; three studies had approval at each university and Connecting the Dots used a single institutional review board (IRB) on which other universities relied. PCCs and caregiver advocates contributed to study design, implementation, and dissemination of findings.
Statistical analyses
To evaluate difference in the distribution of autism likelihood classifications between males and females, 2 (sex) × 3 (likelihood level) chi-square analyses were run, followed by pairwise comparisons of autism likelihood level through 2 × 2 chi-square analyses, providing the appropriate effect size (V or ø). Similarly, two additional chi-square analyses were run to explore the relationship between sex and likelihood classification of Follow-Up scores and evaluation attendance among those who screened positive. To evaluate the association of autism diagnosis and sex with final scores on the M-CHAT-R/F among those who screened as HL of autism, a two-way analysis of variance (ANOVA) was run, with sex and diagnosis (autism spectrum disorder (ASD), non-ASD) as independent variables. Exploratory 2 (sex) × 2 (item endorsement) chi-square analyses were run to examine potential item-level sex differences among diagnosed children. For the M-CHAT-R/F psychometric properties, sensitivity (i.e. detecting autism when truly present) was calculated by dividing the number of true positive (TP) cases (i.e. positive screen and received autism diagnosis) by the total number of children diagnosed with autism. In the study samples pooled for this analysis, not all children who screened positive received a diagnostic evaluation. The sensitivity value reported in this article is computed based on the subset of screen-positive children who attended the evaluation, and as such, it should be considered an estimate and interpreted with caution. Positive predictive value for autism (PPVautism; likelihood that positive result is a true autism case) was calculated by dividing TP for autism by all screen positives, whereas PPV for any developmental disability (PPVDD; likelihood that positive result indicates autism or another DD) was calculated by dividing TP for autism or DD diagnosis by all screen positives. 
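The definitions of sensitivity, PPVautism, and PPVDD given above can be written out as a short sketch. The counts in the usage example are hypothetical and chosen only for illustration; they are not study data, and the function and argument names are assumptions.

```python
def screening_metrics(tp_autism: int, screen_pos_other_dd: int,
                      screen_pos_no_dx: int, fn_autism: int) -> dict:
    """Compute estimated sensitivity and the two PPVs from evaluated cases.

    tp_autism            screen positive and diagnosed with autism
    screen_pos_other_dd  screen positive, autism ruled out, another DD diagnosed
    screen_pos_no_dx     screen positive, no diagnosis
    fn_autism            screen negative but later diagnosed with autism
    """
    screen_positives = tp_autism + screen_pos_other_dd + screen_pos_no_dx
    return {
        # TP divided by all children diagnosed with autism
        "sensitivity": tp_autism / (tp_autism + fn_autism),
        # TP for autism divided by all screen positives
        "ppv_autism": tp_autism / screen_positives,
        # TP for autism or another DD divided by all screen positives
        "ppv_dd": (tp_autism + screen_pos_other_dd) / screen_positives,
    }


# Hypothetical counts, for illustration only.
metrics = screening_metrics(tp_autism=40, screen_pos_other_dd=30,
                            screen_pos_no_dx=10, fn_autism=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```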
TP and FP cases were determined by M-CHAT-R/F results, whereas false negative (FN) cases were detected by PCC concern, positive M-CHAT-R/F rescreen, or other positive screener result. Because the whole sample did not receive confirmatory evaluations to verify true negatives (TNs), specificity and negative predictive value were not included. Chi-square analyses were run to compare components of sensitivity (TP to FN), initial and final PPVautism (TP for autism to FP), and final PPVDD (TP for autism or DD to FP) by sex. Due to the strong influence of the Follow-Up interview, PPV was examined for both initial and final scores. For these calculations, children classified as LL (not evaluated) were presumed not to have autism. The two-proportion Z test (Lowry, n.d.) was used to compare the improvement of PPV from initial to final scores in males and females. Exploratory receiver operating characteristic (ROC) curve analyses were conducted to investigate different cutoff scores at initial and final M-CHAT-R/F screening for males and females.
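For readers unfamiliar with the two-proportion Z test mentioned above, the following is a minimal sketch of the standard pooled-variance form with a two-sided p value. Whether the cited calculator uses exactly this form is an assumption, and the counts in the usage example are hypothetical rather than study data.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """Pooled two-proportion z test; returns (z, two-sided p value).

    x1/n1 and x2/n2 are the two proportions being compared, for example
    TP / screen positives after the initial screen and after Follow-Up.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only: 30 of 100 versus 10 of 100.
z, p = two_proportion_z(30, 100, 10, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```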
Results
Autism likelihood classification from M-CHAT-R/F
Initial score classification and item level
The proportion of males and females classified in the low (LL), medium (ML), and high (HL) range on the initial M-CHAT-R significantly differed (χ2(2, 28,088) = 87.28, p < 0.001, V = 0.056; see Table 2). Pairwise comparisons indicated that on initial M-CHAT-R, females were less likely than males to be classified as HL vs LL (odds ratio (OR) = 0.48, 95% confidence interval (CI) = [0.37, 0.61]; χ2(1, 26212) = 37.98, p < 0.001, ø = –0.038), HL vs ML (OR = 0.67, 95% CI = [0.52, 0.87]; χ2(1, 2182) = 9.14, p = 0.003, ø = –0.065), and ML vs LL (OR = 0.71, 95% CI = [0.64, 0.78]; χ2(1, 27782) = 51.69, p < 0.001, ø = –0.043). Notably, in the subsample diagnosed with autism, there was no significant difference between males and females in the proportion of HL vs ML classification on the initial M-CHAT-R (χ2(1, 272) = 0.20, p = 0.658, ø = –0.027) or on any of the exploratory item-level analyses (ps > 0.05). Effect sizes of analyses were small.
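The pairwise statistics reported above (odds ratio with 95% CI, chi-square, and ø) can all be derived from a single 2 × 2 table. The sketch below assumes a Wald interval on the log odds ratio and an uncorrected Pearson chi-square; the paper does not state which interval or correction was used, and the cell counts in the example are hypothetical.

```python
import math


def two_by_two(a: int, b: int, c: int, d: int) -> dict:
    """Statistics for a 2 x 2 table laid out as

                 HL    LL
        female    a     b
        male      c     d

    so that OR < 1 and phi < 0 mean females are less often HL than males.
    """
    n = a + b + c + d
    cross = a * d - b * c
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * cross ** 2 / margins
    phi = cross / math.sqrt(margins)
    odds_ratio = (a * d) / (b * c)
    # Wald 95% CI on the log odds ratio (an assumption of this sketch).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
          math.exp(math.log(odds_ratio) + 1.96 * se_log_or))
    # Upper-tail p value of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"chi2": chi2, "p": p_value, "phi": phi, "or": odds_ratio, "ci95": ci}


# Hypothetical cell counts, for illustration only.
result = two_by_two(a=10, b=20, c=30, d=40)
print({k: (round(v, 3) if not isinstance(v, tuple) else tuple(round(x, 3) for x in v))
       for k, v in result.items()})
```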
Sex distribution in initial and follow-up screening likelihood levels.
A total of 266 cases were excluded for missing data in Follow-Up from the medium-likelihood group.
Final score among those who completed follow-up
Among children in the ML category who completed Follow-Up, a significant relationship between sex and Follow-Up outcome was found (OR = 0.72, 95% CI = [0.59, 0.88]; χ2(1, 1610) = 10.23, p = 0.001, ø = –0.08); females were less likely than males to be classified as HL of autism. Two-way ANOVA examined the association of autism diagnosis and sex among those who screened at HL (final) of autism. Final scores significantly differed (F(1, 727) = 30.56, p < 0.001,
Psychometric properties, autism rates, and evaluation attendance
Overall estimated sensitivity of M-CHAT-R/F was strong, but there was no relationship between sensitivity and sex (χ2(1, 319) = 0.11, p = 0.743, ø = –0.018; see Table 3). On both the initial M-CHAT-R (TPmale = 22.5%, TPfemale = 12.8%; FPmale = 77.5%, FPfemale = 87.2%; OR = 0.50, 95% CI = [0.38, 0.67]; χ2(1, 1468) = 22.19, p < 0.001, ø = –0.123) and final M-CHAT-R/F (TPmale = 51.2%, TPfemale = 37.4%; FPmale = 48.8%, FPfemale = 62.6%; OR = 0.57, 95% CI = [0.40, 0.82]; χ2(1, 555) = 9.42, p = 0.002, ø = –0.13) scores, females were less likely than males to have a TP versus an FP score. Notably, PPV after Follow-Up improved significantly relative to PPV after the initial score, by a factor of 2.92 for females (z = 7.46, p < 0.0001), 2.28 for males (z = 10.01, p < 0.0001), and 2.51 overall (z = 12.80, p < 0.0001). When the final M-CHAT-R/F was examined in a combined autism and DD diagnosis group, females were less likely than males to be TP versus an FP (TPmale = 89.5%, TPfemale = 79.7%; FPmale = 10.5%, FPfemale = 20.3%; OR = 0.46, 95% CI = [0.28, 0.75]; χ2(1, 555) = 12.13, p < 0.001, ø = –0.135).
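As an arithmetic check, the fold improvement in PPV and the odds ratios can be approximately recovered from the TP percentages reported above. This sketch uses only those rounded percentages, so a recomputed odds ratio can differ from the reported value in the second decimal place (for the initial screen it comes out slightly above the reported 0.50). The function names are illustrative.

```python
def fold_improvement(initial_ppv_pct: float, final_ppv_pct: float) -> float:
    """Ratio of final PPV to initial PPV."""
    return final_ppv_pct / initial_ppv_pct


def odds_ratio_from_tp_pct(tp_pct_female: float, tp_pct_male: float) -> float:
    """Odds of TP (versus FP) for females relative to males."""
    odds_female = tp_pct_female / (100 - tp_pct_female)
    odds_male = tp_pct_male / (100 - tp_pct_male)
    return odds_female / odds_male


# TP percentages as reported in the text (initial, final).
females = (12.8, 37.4)
males = (22.5, 51.2)

print(round(fold_improvement(*females), 2))   # reported as 2.92
print(round(fold_improvement(*males), 2))     # reported as 2.28
print(round(odds_ratio_from_tp_pct(37.4, 51.2), 2))  # final screen, reported as 0.57
print(round(odds_ratio_from_tp_pct(79.7, 89.5), 2))  # autism or DD, reported as 0.46
```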
Frequency of classification and M-CHAT-R/F psychometric properties for males and females.
M-CHAT-R/F: Modified Checklist for Autism in Toddlers, Revised, with Follow-Up; TP: true positive; FP: false positive; FN: false negative; TN: true negative.
Autism rates were examined by combining TP and FN cases, compared to the total sample. The rate of autism was 1.63% for males and 0.62% for females, with approximately 2.8 males receiving an autism diagnosis for every female. Among children classified as HL, there was no significant association between sex and evaluation attendance (χ2(1, 1112) = 1.57, p = 0.21, ø = 0.038). Across all analyses, effect sizes were small.
Exploratory ROC analysis
Due to the high rates of FP for females in our sample, an exploratory receiver operating characteristic (ROC) curve analysis was conducted to investigate whether optimal cutoff scores at initial and final M-CHAT-R/F screening differ for males and females. Area under the curve (AUC) metrics showed strong performance across cutoff scores at both initial and final screening (see Table 4). The sample showed optimal performance for both sensitivity and specificity at a score of ⩾2.5 at initial screen and at a score of ⩾1.5 at final screen in the overall sample as well as separately for both males and females (see Table 5). Because scores are integers, these thresholds correspond to the M-CHAT-R/F authors’ recommended cutoff scores of 3 at initial screen and 2 at final screen for optimal utility for both males and females. Although the ROC analyses conducted above included the original M-CHAT-R/F validation sample, excluding that sample resulted in consistent findings, so the larger sample was maintained to maximize power.
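One common way to pick an optimal cutoff from a ROC analysis is to maximize Youden's J (sensitivity + specificity − 1). The text does not state which criterion was applied, so the use of Youden's J here is an assumption, and the scores and diagnoses below are hypothetical. With integer scores, a cutoff of 3 is equivalent to the threshold of ⩾2.5 reported by ROC software that places thresholds midway between observed values.

```python
def sens_spec(scores, has_autism, cutoff):
    """Sensitivity and specificity when scores >= cutoff count as positive."""
    pairs = list(zip(scores, has_autism))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)
    fn = sum(1 for s, y in pairs if y and s < cutoff)
    tn = sum(1 for s, y in pairs if not y and s < cutoff)
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, has_autism, candidates):
    """Return the candidate cutoff with the largest Youden's J."""
    def youden_j(cutoff):
        sensitivity, specificity = sens_spec(scores, has_autism, cutoff)
        return sensitivity + specificity - 1
    return max(candidates, key=youden_j)


# Hypothetical screening scores and diagnoses, for illustration only.
scores = [0, 1, 1, 2, 2, 3, 4, 5, 6, 9]
has_autism = [False, False, False, False, False, True, False, True, True, True]

cutoff = best_cutoff(scores, has_autism, candidates=range(1, 9))
print(cutoff, sens_spec(scores, has_autism, cutoff))
```

Running the same function separately on male and female subsets would mirror the sex-specific comparison described above.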
Psychometric properties of different cutoff scores for males and females.
Bold represents the values for the optimal cutoff score.
ROC curve analyses of optimal cutoff score by sex.
ROC: receiver operating characteristic; AUC: area under the curve; SE: standard error; CI: confidence interval.
Discussion
The goal of this study was to examine sex differences in toddler screening for autism using the M-CHAT-R/F in unselected community samples. Overall, the M-CHAT-R/F demonstrates good psychometric properties across sex; however, sex differences emerged.
Psychometric properties of the M-CHAT-R/F by sex
As expected, females were less likely than males to be classified as HL of autism at all stages of the screening process using the M-CHAT-R/F; however, the accuracy of an HL screen result in predicting an autism diagnosis differed based on sex. This is reflected in lower PPVs (i.e. a higher proportion of FP to TP cases) for females compared to males. Positively, the addition of the Follow-Up significantly improved the PPV for both sexes (i.e. over two-fold for males and approximately three-fold for females). As such, findings support the utility of the two-stage M-CHAT-R/F, for both females and males.
Despite a higher number of FP screens in females compared to males, it is encouraging that there were no sex differences in the estimated sensitivity of the tool. This result provides support that the M-CHAT-R/F readily identifies both males and females whose autism is detectable during toddlerhood, with the caveat that not all children received confirmatory evaluations; children with LL of autism based on multiple screens and clinician surveillance were not evaluated and were presumed to be TN. However, we also interpret estimates of sensitivity among toddlers with caution, understanding that not all children with autism will be detected at this young age (Robins, 2020). The lower rate of HL classification among females may be due to lower prevalence of autism in females. This interpretation is strengthened by the fact that sex differences did not emerge across HL or ML levels on initial total scores among those who were ultimately diagnosed with autism. In addition, similar to previous research examining sex differences in other autism-detecting measures, our exploratory ROC analysis supports the current recommended cutoff scores for the M-CHAT-R/F for both males and females (Kaat et al., 2021; Kalb et al., 2022). In developing neurodevelopmental screening measures, it is important to optimize sensitivity considering the negative consequences of not identifying a child with the disorder. Delayed diagnosis for an autistic child can impede access to autism-specific interventions at a critical time during brain development. Thus, despite a higher number of FP screens for females compared to males, strong sensitivity across sex should be recognized as evidence that the M-CHAT-R/F readily identifies both autistic males and females.
In this sample, sex differences remained when considering risk for all developmental disabilities detected by M-CHAT-R/F. PPV of any DD was significantly higher for males compared to females. Nonetheless, clinicians can be assured that most (i.e. 79.7%) females and males (89.5%) classified as HL of autism on the M-CHAT-R/F are diagnosed with autism or another DD.
It is unclear why there may be a higher number of FP screens on the M-CHAT-R/F for females compared to males. In general, PPV of a test increases as base rate increases, and vice versa. As such, lower PPV for females compared to males may be due to the higher prevalence of autism (more TP cases proportional to FP cases) in males compared to females, although we found fewer positive results overall (both TP and FP) for females compared to males. As there was no difference in estimated sensitivity of the M-CHAT-R/F between males and females, it is possible that there are fewer autistic females but that the M-CHAT-R/F is still successfully catching those at HL. Similarly, there is a higher prevalence of other neurodevelopmental disorders for males compared to females. For example, the male to female ratio is 2:1 for intellectual disability (Ropers, 2008), 1.5:1 for language delay (Shriberg et al., 1999), and 3:1 for attention deficit hyperactivity disorder (Willcutt, 2012). As such, differential sex prevalence may also explain lower PPV for females compared to males when considering risk for neurodevelopmental conditions more broadly. Furthermore, it is crucial to keep in mind that the M-CHAT-R/F as well as all other autism screening and diagnostic tools were developed based primarily on the male autism phenotype given higher prevalence of autism in males compared to females. As a result, there is currently debate within the field about whether current diagnostic tools are less sensitive to the female phenotype compared to the male phenotype (D’Mello et al., 2022). It is therefore possible that the greater FP rate may be an artifact of missing females who truly are autistic. Reliable and valid biomarkers demonstrating equivalence of underlying autism independent of behavioral diagnostic systems will allow examination of whether M-CHAT-R/F and other tools must be adjusted to increase accuracy in females.
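The dependence of PPV on base rate noted above follows from Bayes' rule: PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)). The sketch below applies this to the autism rates observed in this sample (1.63% for males, 0.62% for females). The sensitivity and specificity values are hypothetical placeholders, since specificity was not estimated in this study, so the resulting PPVs illustrate the direction of the effect only.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, identical for both sexes (assumption).
SENSITIVITY = 0.90
SPECIFICITY = 0.95

# Autism rates observed in this sample.
ppv_male = ppv(SENSITIVITY, SPECIFICITY, prevalence=0.0163)
ppv_female = ppv(SENSITIVITY, SPECIFICITY, prevalence=0.0062)

print(f"male PPV: {ppv_male:.3f}, female PPV: {ppv_female:.3f}")
```

Even with identical sensitivity and specificity, the group with the lower base rate has the lower PPV, which is the pattern described in the paragraph above.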
The male-to-female autism prevalence ratio was 2.8 in this sample, which is somewhat lower than the current estimate of 4.2 in 8-year-old children but is more similar to the 3.4 ratio in 4-year-old children derived by Centers for Disease Control and Prevention (CDC) autism surveillance (Maenner et al., 2021; Shaw et al., 2021). It is important to note that children identified with autism early in life may be more cognitively impacted than those identified later (Christensen et al., 2019) and autistic females are more likely to be diagnosed with co-occurring intellectual disability (Fombonne, 2003; Loomes et al., 2017). As such, it is likely that our sample of children includes more children who will eventually be diagnosed with intellectual disability than the general population of autistic individuals across the life span.
M-CHAT-R/F total score by sex
Differences in final scores were also examined among children classified as HL of autism in order to explore sex differences in continuous M-CHAT-R score. It is important to note that the M-CHAT-R was not initially developed to be used in this manner; however, total score has been used across many autism screening and diagnostic tools as a continuous measure of autism characteristics, and there was a significant difference between scores of children with and without autism highlighting the M-CHAT-R/F’s construct validity as an autism screener. Consistent with previous research, there were no sex differences in total score among autistic children; however, contrary to Øien and colleagues (2017), there were also no sex differences in total score among those without autism. Methodological differences between these studies may have contributed to differences. Specifically, Øien and colleagues used the original M-CHAT and did not conduct the Follow-Up. In this study, the M-CHAT-R/F was used and final scores after Follow-Up were examined; adding this critical second stage of screening may effectively reduce potential sex differences. In addition, Øien and colleagues examined scores for all children, not only those deemed at HL for autism. In this study, we examined total scores only for children deemed at HL of autism because the variance in scores for children not at HL of autism was much lower than for children classified as HL of autism due to the vast majority of children scoring 0–1, which violates statistical assumptions of ANOVA. The lack of sex differences among children classified as HL of autism based on the M-CHAT-R/F in this sample is consistent with recent literature examining total scores in a group of children referred for autism evaluation due to clinical concerns about autism (Ros-Demarize et al., 2020). Taken together, these data support that the total score does not differ across males and females among children at HL of autism.
Limitations and future directions
The findings from this study must be interpreted within the context of several limitations. Research has demonstrated more consistent autism sex differences later in development, and this cross-sectional design does not follow children as they age. In addition, it is likely that some children who do not meet clinically significant criteria in toddlerhood will do so later in development (Ozonoff et al., 2018); evaluation later in childhood is expected to lead to identification of more autistic children, consistent with recent record review studies after children are 4 years or older (Carbone et al., 2020; Guthrie et al., 2019). It is also possible that if presentation of autism differs in females versus males, the M-CHAT-R/F, a tool primarily validated by detecting autism in toddler boys, may be mistuned and missing girls who would otherwise be referred for evaluations.
The sensitivity observed in our sample should be interpreted as an estimate rather than a true value, because FNs were identified as missed cases through concurrent clinician surveillance, a later positive toddler or preschool M-CHAT-R/F rescreen, or another positive toddler screen, rather than by confirming TN classification in every LL child. While we acknowledge this limitation, we also note that CDC estimates of autism prevalence are lower for 4-year-olds than for 8-year-olds (Shaw et al., 2021). Therefore, it is possible that the decreased rate of autism detected in this study is due to the even younger sample of primarily 18- to 24-month-olds, who may show lower prevalence estimates than are reported in older children. Similarly, it is important to note that not all children who were identified as HL for autism were evaluated. Evaluation non-attendance did not significantly differ by sex and was due to parents declining to attend rather than to a lack of evaluation availability.
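To make the dependence of sensitivity on FN ascertainment concrete, the following minimal Python sketch computes sensitivity and positive predictive value (PPV) from confusion counts. The counts are hypothetical placeholders chosen only for illustration, not values from this study, and the function name `screening_metrics` is an assumption of this sketch.

```python
def screening_metrics(tp, fp, fn):
    """Return (sensitivity, ppv) from true positive, false positive,
    and false negative counts."""
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one autistic case and one screen-positive case")
    sensitivity = tp / (tp + fn)  # share of autistic children who screened positive
    ppv = tp / (tp + fp)          # share of screen-positive children who are autistic
    return sensitivity, ppv


# Hypothetical counts (NOT study data): FNs found through surveillance or rescreening.
observed = screening_metrics(tp=80, fp=70, fn=20)

# Same screen-positive group, but assume 20 further missed cases exist among
# low-likelihood children who were never evaluated.
with_undetected = screening_metrics(tp=80, fp=70, fn=40)

print("observed sensitivity, PPV:", observed)
print("with undetected FNs:      ", with_undetected)
```

In this illustration, additional undetected FNs lower the sensitivity estimate while leaving PPV unchanged, because PPV depends only on children who screened positive and were evaluated.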
While no item-level sex differences on the M-CHAT-R/F emerged in exploratory chi-square analyses, future research should use strategies such as differential item functioning, sex-specific algorithms, or item weighting, which may maximize psychometric properties based on sex, to evaluate whether the M-CHAT-R/F is equivalent across males and females. Future research should also explore clinician and caregiver perceptions of the M-CHAT-R/F for boys and girls and whether the higher FP rate in females impacts referral practices.
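As an illustration of what an item-level chi-square comparison involves, the sketch below tests whether endorsement of a single item is independent of sex in a 2 × 2 table. It uses the Pearson statistic without continuity correction; the counts are hypothetical, the function name `chi_square_2x2` is an assumption, and the exact procedure used in this study (for example, any correction for multiple comparisons) is not reproduced here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (chi2, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (NOT study data) for one item:
# rows = males, females; columns = item endorsed as at-risk, not endorsed.
chi2, p_value = chi_square_2x2(60, 40, 25, 25)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A test of this kind only asks whether endorsement rates differ by sex; unlike differential item functioning, it does not condition on the child's overall level of autism characteristics.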
In order to obtain a large sample of females with autism, four studies, including the sample used to validate the M-CHAT-R/F, were aggregated for this study, and study-specific differences in FN detection may impact results. Finally, small effect sizes should be interpreted with caution when contextualizing the results given the relatively large sample size.
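The caution about small effect sizes in large samples can be illustrated with a short sketch. Assuming hypothetical 2 × 2 counts (not study data) and a helper named `phi_and_p` introduced only for this example, the code holds the proportions fixed and scales the sample by 100: the phi coefficient is unchanged, while the p-value shrinks sharply.

```python
import math


def phi_and_p(a, b, c, d):
    """Return (phi, p_value) for the 2x2 table [[a, b], [c, d]] using the
    Pearson chi-square statistic with 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    phi = math.sqrt(chi2 / n)                 # effect size, independent of n
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail for 1 df
    return phi, p_value


# Hypothetical tables with identical proportions but different sample sizes.
small = phi_and_p(52, 48, 48, 52)          # n = 200
large = phi_and_p(5200, 4800, 4800, 5200)  # n = 20,000

print("n = 200:    phi, p =", small)
print("n = 20,000: phi, p =", large)
```

Reporting an effect size alongside the p-value therefore helps separate statistically detectable differences from practically meaningful ones.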
Conclusion
Our study found that autistic males and females screened positive on the M-CHAT-R/F at similar rates, highlighting similar sensitivity across males and females as a strength of the tool. We found no indication that the detection of autism in females would benefit from risk score thresholds different from those currently recommended for the M-CHAT-R/F. Even in early childhood, when there is less expected differentiation of autism symptomatology, however, females are still less likely than males to be classified as HL of autism and referred for evaluations. Our results highlight a need to improve the screening process in females: even when females are referred, they are more likely to be FP for autism than TP, although they are still likely to have other DDs and to benefit from detection. The higher FP rate in females compared to males may influence clinician confidence in referring families at this crucial earlier age. Future research should focus on examining methods to decrease FPs in the screening process for females.
Supplemental Material
sj-docx-1-aut-10.1177_13623613231154728 – Supplemental material for Sex differences in early autism screening using the Modified Checklist for Autism in Toddlers, Revised, with Follow-Up (M-CHAT-R/F)
Supplemental material, sj-docx-1-aut-10.1177_13623613231154728 for Sex differences in early autism screening using the Modified Checklist for Autism in Toddlers, Revised, with Follow-Up (M-CHAT-R/F) by Sherief Y Eldeeb, Natasha N Ludwig, Andrea Trubanova Wieckowski, Mary FS Dieckhaus, Yasemin Algur, Victoria Ryan, Sarah Dufek, Aubyn Stahmer and Diana L Robins in Autism
Footnotes
Acknowledgements
We thank the healthcare clinicians, toddlers, and their families for participating in this study as well as the many individuals involved in data collection. In addition, we would like to thank the late Lauren Adamson, PhD, for her contributions to this work.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: D.L.R. is a co-owner of M-CHAT LLC, which receives royalties from parties that license use of the M-CHAT in electronic products. No royalties were received for any of the data presented in this study. D.L.R. sits on the advisory board of Quadrant Biosciences Inc, for which she receives an honorarium. The other authors have indicated they have no financial relationships or conflicts of interest relevant to this article to disclose.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development Grant R01HD039961, Autism Speaks Grant 8368, NICHD Grant P50HD103526, and the National Institute of Mental Health, R01MH115715.
Supplemental material
Supplemental material for this article is available online.