Abstract
BACKGROUND:
Parent-completed tools like the Ages and Stages Questionnaire Third Edition (ASQ-3) are important in developmental screening. As a screening tool, a high negative predictive value (NPV) is critical to avoid missing the diagnosis of developmental delay. This study evaluated the NPV and accuracy of the ASQ-3 in assessing the development of preterm infants.
METHODS:
Infants born at <32 weeks and/or <1250 grams, presenting to the Neonatal Neurodevelopmental Clinic at the Singapore General Hospital for follow-up from January 2014 to June 2017, at 6, 12, and 18 months corrected age, were included. The ASQ-3 and standardized tests – Peabody Developmental Motor Scale-Second Edition (PDMS-2) and Preschool Language Scale, Fourth Edition UK (PLS-4 UK) – were administered. ASQ-3 gross motor and fine motor scores were compared to PDMS-2 at 6 and 12 months, and ASQ-3 communication scores to PLS-4 UK at 18 months.
RESULTS:
At 6 months (n = 145), NPV for gross motor and fine motor were 96.4% (accuracy 80.0%) and 95.4% (accuracy 77.2%) respectively. At 12 months (n = 127), NPV for gross motor and fine motor were 88.9% (accuracy 79.8%) and 82.8% (accuracy 74.0%) respectively. At 18 months (n = 113), NPV for language was 56.9% (accuracy 63.7%).
CONCLUSIONS:
The ASQ-3 showed high NPV and accuracy in screening gross motor and fine motor skills at 6 and 12 months, but not in screening language skills at 18 months. Judicious use of the ASQ-3 may allow for more effective utilization of resources.
Introduction
Preterm infants are at higher risk of developmental delay, with preterm birth and low birth weight being associated with problems like cerebral palsy and language delay [1]. Compared with term infants, the risk of cerebral palsy and developmental delay was approximately eight times and two times higher respectively in children born at 30–33 weeks’ gestation, and three times and 1.3 times higher in late preterm infants [2]. Studies have shown that early experience can modify the anatomy of the rapidly developing brain; thus, early identification and implementation of interventions are of paramount importance in improving long-term outcomes [3–6]. Early intervention programs aimed at preterm infants have been shown to improve cognitive and motor outcomes [7, 8].
There are no standard protocols or guidelines on the best tools to use in assessing the developmental profile of high-risk preterm infants. In Singapore, significant resources are channeled into the follow-up of preterm infants, with programs in various hospitals utilizing different schedules and standardized tests. Most of these standardized tests are administered by trained personnel in the hospital setting and are hence both time- and resource-intensive. For example, the Peabody Developmental Motor Scale-Second Edition (PDMS-2) is an early childhood motor development program that provides in-depth assessment and training or remediation of gross and fine motor skills [9]. It is typically administered by trained personnel such as physiotherapists and takes around 45–60 minutes to complete. Another example is the Preschool Language Scale, Fourth Edition UK (PLS-4 UK), an instrument which measures young children’s receptive and expressive language skills and is usually administered by trained speech and language therapists [10]. It takes around 20–45 minutes to complete.
Several standardized parent-completed questionnaires are also available for developmental screening, including the Ages and Stages Questionnaire Third Edition (ASQ-3), Parents’ Evaluation of Developmental Status (PEDS), and CSBS DP™ Infant-Toddler Checklist [11–13]. These typically include checklists or questions to be completed by caregivers and are aimed at helping parents evaluate their child’s development and milestones. In particular, the ASQ-3 is a parent-reported initial-level developmental screening instrument consisting of 21 age intervals from 1–66 months, each with 30 items in five areas: personal social, gross motor, fine motor, problem solving, and communication. It has been widely used due to its cost-effectiveness, availability in multiple languages and vast evidence of its reliability in screening the developmental profile in both term and preterm infants and children. Its psychometric properties include test-retest reliability of 92%, sensitivity of 87.4%, and specificity of 95.7%, with its validity examined across different cultures and communities across the world [14–17].
Studies on both low-risk and preterm infants have previously shown correlation between the ASQ and other developmental instruments such as the Battelle Developmental Inventory, the Bayley Scales of Infant Development II, and the revised Brunet Lezine psychometric test, thus positing parent-completed screening questionnaires as a promising alternative to professional screening [18–21]. In Singapore however, data on the agreement between the findings of the ASQ-3 questionnaire and that of standardized tests is limited. Agarwal et al. reported on the use of ASQ-3 in a smaller cohort of preterm babies and evaluated the optimal referral ASQ-3 cut-off scores that differentiate low-risk from high-risk infants using the total ASQ-3 scores [22]. However, this study focused on total ASQ-3 scores as opposed to individual domain-specific scores.
Currently, preterm infants are subjected to diagnostic evaluation using standardized tests, necessitating a significant amount of time and resources which could be better channeled toward those truly in need of intervention. It would be prudent to streamline a diagnostic process for identifying neurodevelopmental delay, where preliminary screening with a parent-completed questionnaire is first carried out to identify those at risk of neurodevelopmental delay, before more specialized diagnostic evaluation using standardized tests administered by trained professionals is performed on those identified as at risk of delay.
Basic measurements such as positive and negative predictive values may be used to quantify the accuracy of the proposed screening test at predicting true disease status. Hence, the positive predictive value (PPV) of the ASQ-3 questionnaire is the probability that a child with a positive screening test result indeed is delayed, and negative predictive value (NPV) is the probability that a child with a negative screening test result is indeed not delayed [23]. As a screening tool, it is of greater importance that the parent-completed ASQ-3 questionnaire achieves a high NPV. This is critical to avoid missing the diagnosis of developmental delay, thus minimizing unnecessary delay in early intervention. Conversely, the consequences of low PPV are the increased and inefficient use of resources leading to over-testing and resulting anxiety for parents whose children are picked up as false positives.
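The definitions above can be written out from a 2×2 table of screening result against reference-test result. The following is a minimal sketch in Python; the class name, function name, and the counts are hypothetical and chosen only for illustration (they are not study data), and the sketch assumes every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # screen 'High Risk', reference test 'Delayed'
    fp: int  # screen 'High Risk', reference test 'Age-appropriate'
    fn: int  # screen 'Low Risk', reference test 'Delayed'
    tn: int  # screen 'Low Risk', reference test 'Age-appropriate'


def screening_metrics(c: ConfusionCounts) -> dict:
    """Return the standard screening metrics as proportions (0 to 1)."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),  # P(delayed | positive screen)
        "npv": c.tn / (c.tn + c.fn),  # P(not delayed | negative screen)
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts for illustration only (not study data).
example = ConfusionCounts(tp=8, fp=20, fn=2, tn=70)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this illustrative table most positive screens are false positives, so PPV is low, yet NPV remains high because only 2 of the 72 negative screens are missed cases. This is the pattern the paragraph above describes as acceptable for a screening tool.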
Given its potential for early detection of developmental delay, an assessment of ASQ-3 NPV and accuracy relative to standardized tests as gold standards is warranted, with the aim of enabling domain-specific early intervention. Our primary aim was to evaluate the NPV and accuracy of the ASQ-3 compared with two standardized tests, the PDMS-2 and the PLS-4 UK, done at the corrected ages of 6, 12, and 18 months. Our secondary aims included the determination of the discrimination ability of the ASQ-3 for suspected developmental delay and ASQ-3 cut-off scores for suspected developmental delay in our population using these two standardized tests as a reference.
Materials and methods
Study population
This observational study included infants born at <32 weeks and/or <1250 grams followed up at the Neonatal Neurodevelopmental Clinic at Singapore General Hospital at 6, 12, and 18 months corrected age (±1 month) from January 2014 to June 2017. There were no exclusion criteria. Parents completed a form detailing personal, socio-economic, demographic and medical information. Ethics approval and consent waiver were obtained from the hospital’s institutional review board (CIRB Ref 2016/2771).
Outcome measures
During follow-up, parents and/or caregivers were asked to complete the ASQ-3, with guidance from the clinic assistant where required. Parents and/or caregivers answered six questions in each of the five developmental domains: gross motor, fine motor, communication, problem solving, and personal social. Following the validated ASQ-3, the scores in the gross motor, fine motor, and communication domains were computed and the developmental outcome summarized as: ‘Development on Schedule’, ‘Monitor’, or ‘Further Assessment Needed’. For the purpose of comparison with the standardized tests, the developmental outcomes on the ASQ-3 were stratified into two categories: ‘Low Risk’ and ‘High Risk’. ‘Development on Schedule’ was classified as ‘Low Risk’, while ‘Monitor’ and ‘Further Assessment Needed’ were classified as ‘High Risk’.
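The two-category stratification described above amounts to a simple mapping from the three ASQ-3 outcome labels. A minimal sketch follows; the function and constant names are hypothetical, and only the labels themselves come from the text.

```python
# ASQ-3 outcome labels that the study grouped as 'High Risk'.
HIGH_RISK_OUTCOMES = {"Monitor", "Further Assessment Needed"}
LOW_RISK_OUTCOMES = {"Development on Schedule"}


def asq3_risk_category(outcome: str) -> str:
    """Collapse a three-level ASQ-3 domain outcome into two risk levels."""
    if outcome in LOW_RISK_OUTCOMES:
        return "Low Risk"
    if outcome in HIGH_RISK_OUTCOMES:
        return "High Risk"
    raise ValueError(f"Unrecognized ASQ-3 outcome: {outcome!r}")


categories = [
    asq3_risk_category(label)
    for label in ("Development on Schedule", "Monitor", "Further Assessment Needed")
]
print(categories)
```

Grouping ‘Monitor’ with ‘Further Assessment Needed’ makes the screen more inclusive, which tends to favor NPV at the expense of PPV.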
Standardized tests
The PDMS-2 was administered by one of two physiotherapists at 6 and 12 months corrected age. The PLS-4 UK was administered by one of two speech and language therapists at 18 months corrected age. Parents and/or caregivers were reminded not to prompt the infant throughout the assessment. For the purposes of risk stratification and comparison with the ASQ-3, the outcomes on PDMS-2 and PLS-4 UK were stratified into two categories: ‘Age-appropriate’ and ‘Delayed’, with scores average and above considered ‘Age-appropriate’, and scores below average considered ‘Delayed’. Below-average scores on the PDMS-2 and PLS-4 UK were defined as ≥1 SD below the mean; hence, all infants considered delayed had scores at least 1 SD below the mean. The gross motor and fine motor domains on the ASQ-3 were compared to their corresponding domains on the PDMS-2, and the communication domain on the ASQ-3 was compared to the PLS-4 UK, which assessed language skills. (For the purposes of this study, the terms ‘communication’ and ‘language’ are used interchangeably to refer to the language domain of development.)
Statistical analysis
Follow-up included all infants who met the study inclusion criteria. However, only infants who completed the follow-up assessments were analyzed. These assessments were done at the corrected ages of 6, 12, and 18 months. Demographics, birth weight and gestational age of infants were summarized using count (percentage) for categorical variables and mean±SD for continuous variables. McNemar’s test was used to compare proportions of infants classified as ‘High Risk’ and to obtain point estimates and Wald 95% confidence intervals on the differences.
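For readers unfamiliar with the paired comparison, the sketch below shows one common form of McNemar’s test together with a Wald 95% confidence interval for the difference in paired proportions. It is an illustration only: the text does not state whether the asymptotic or exact variant was used in SAS, so the asymptotic chi-square form without continuity correction is an assumption here, and the function name and counts are hypothetical (not study data).

```python
import math


def mcnemar_wald(asq_only: int, ref_only: int, n: int):
    """Asymptotic McNemar test and Wald 95% CI for a paired difference.

    asq_only: infants 'High Risk' on the ASQ-3 only (discordant pairs)
    ref_only: infants 'High Risk' on the reference test only (discordant pairs)
    n:        all infants assessed with both instruments
    """
    discordant = asq_only + ref_only
    if discordant == 0:
        raise ValueError("McNemar's test is undefined without discordant pairs")
    # McNemar statistic, chi-square with 1 degree of freedom.
    statistic = (asq_only - ref_only) ** 2 / discordant
    # Upper tail of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    # Difference in 'High Risk' proportions (ASQ-3 minus reference test).
    diff = (asq_only - ref_only) / n
    std_err = math.sqrt(discordant - (asq_only - ref_only) ** 2 / n) / n
    z = 1.959963984540054  # 97.5th percentile of the standard normal
    return statistic, p_value, diff, (diff - z * std_err, diff + z * std_err)


# Hypothetical discordant counts for illustration only (not study data).
statistic, p_value, diff, ci = mcnemar_wald(asq_only=25, ref_only=5, n=100)
print(f"chi2={statistic:.2f}, p={p_value:.4f}, diff={diff:.2f}, "
      f"95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Only the discordant pairs enter the test statistic; infants classified the same way by both instruments contribute to the confidence interval solely through the total n.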
Accuracy, sensitivity, specificity, PPV, and NPV were calculated for gross motor, fine motor, and language domains based on local cut-off scores obtained by Youden’s rule and validated ASQ-3 cut-off scores. Cohen’s Kappa adjusted for chance agreement. Results were presented as histograms showing false positive (FP), true negative (TN), true positive (TP), and false negative (FN) outcomes. Receiver operating characteristic (ROC) analysis and logistic regression were used to assess overall ASQ-3 accuracy as a predictor of ‘High Risk’ for developmental delay based on a below-average cut-point for each standardized test score as the gold standard. All analyses were performed using SAS University Edition, Version 9.4 of the SAS System Copyright © 2016 SAS Institute Inc., Cary, NC, USA.
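To make the chance adjustment concrete, the following sketch computes simple accuracy and Cohen’s Kappa from the same 2×2 counts. The function name and the counts are hypothetical and serve only as an illustration (they are not study data).

```python
def accuracy_and_kappa(tp: int, fp: int, fn: int, tn: int):
    """Return (observed agreement, Cohen's Kappa) for a 2x2 table."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal totals of each test.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for illustration only (not study data).
accuracy, kappa = accuracy_and_kappa(tp=8, fp=20, fn=2, tn=70)
print(f"accuracy={accuracy:.2f}, kappa={kappa:.2f}")
```

With these illustrative counts the two tests agree on 78% of infants, but because most infants are not delayed, 67.6% agreement would be expected by chance alone, so Kappa is considerably lower than the simple accuracy.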
ASQ-3 cut-off scores for our study population
Youden’s index, defined as ‘sensitivity + specificity − 1’, summarizes the performance of a diagnostic test with values ranging from 0 to 1. A value of 0 means the test is useless, i.e. no better than a guess; a value of 1 indicates a perfect test, i.e. no false positives or negatives. In conjunction with ROC analysis, the maximum value is used as a criterion for selecting the optimum cut-off point for a diagnostic test as it estimates the probability of an informed decision. The cut-off score for each ASQ-3 domain was calculated using Youden’s index.
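The selection of a cut-off by Youden’s index can be sketched as a search over candidate cut-offs. The example below is a hedged illustration rather than the procedure run in SAS: it assumes that a domain score at or below the cut-off is treated as a positive screen, and the function name, scores, and labels are hypothetical (not study data).

```python
def youden_cutoff(scores, delayed):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Assumption: a score <= cutoff counts as a positive ('High Risk') screen.
    Both classes must be present so that neither denominator is zero.
    """
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, delayed) if d and s <= cutoff)
        fn = sum(1 for s, d in zip(scores, delayed) if d and s > cutoff)
        fp = sum(1 for s, d in zip(scores, delayed) if not d and s <= cutoff)
        tn = sum(1 for s, d in zip(scores, delayed) if not d and s > cutoff)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:  # on ties, the lowest cut-off is kept
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical domain scores and reference-test outcomes (not study data).
scores = [10, 20, 25, 30, 35, 40, 45, 50, 55, 60]
delayed = [True, True, True, False, True, False, False, False, False, False]
cutoff, j = youden_cutoff(scores, delayed)
print(f"cut-off={cutoff}, J={j:.3f}")
```

In this toy data set a cut-off of 35 captures all four delayed infants while misclassifying one of six non-delayed infants, giving the largest J.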
Results
Follow-up rates
At 6 months corrected age, 162 infants were available for follow-up, of whom 89.5% (145) were assessed (Fig. 1). At 12 months corrected age, 85.2% (127) of 149 available infants were assessed. At 18 months corrected age, 80.7% (113) of 140 available infants were assessed.

Study population flow chart of assessment at 6, 12, and 18 months corrected age.
The perinatal characteristics of infants included in the study were compared to those of infants lost to follow-up. There were no significant differences between the two groups for gender and birth weight across all ages, and for gestational age and race at 6 months. Infants lost to follow-up were significantly more mature compared with infants who were assessed at 12 and 18 months, with gestational ages of 29.6±2.4 vs 28.0±2.4 weeks (p = 0.005) and 29.4±2.4 vs 27.8±2.3 weeks (p = 0.002) respectively. The largest proportion of infants lost to follow-up at 12 and 18 months were of Malay ethnicity (40.9% and 51.9%, respectively). This differed significantly from the group assessed at 12 and 18 months, in which Chinese ethnicity constituted the majority at 59.1% (p = 0.02) and 61.9% (p < 0.001) respectively.
Demographic data is shown in Table 1, with no significant difference between the three populations. The proportions of parents who completed the ASQ-3 who were also the primary caregivers were 51.0%, 52.8% and 46.9% at 6, 12, and 18 months respectively.
Infant demographics at 6, 12, and 18 months corrected age
Numbers represent n (%); † Mean±SD; ‡ The remaining parents did not disclose their education level.
In terms of racial demographics, all three cohorts surveyed at 6, 12, and 18 months were made up of majority Chinese, followed by Malay and Indian, and lastly other races. This racial make-up mirrors that of the larger Singaporean population, with 2019 population data showing Chinese to comprise the largest racial group, followed by Malay, Indian, and others [24].
At 6 months, the ASQ-3 showed simple accuracy/Cohen’s Kappa of 0.80/0.29 for the gross motor domain and 0.77/0.22 for the fine motor domain, indicating that the ASQ-3 results agreed with those of the PDMS-2 80% and 77% of the time respectively (Table 2). At 12 months, accuracy/Kappa was 0.80/0.47 for the gross motor domain and 0.74/0.25 for the fine motor domain. At 18 months, accuracy/Kappa was 0.64/0.32 for the language domain.
Comparison of infants classified as high risk based on the ASQ-3 and standardized tests
† Standardized tests: Gross motor and fine motor, PDMS-2; Language, PLS-4 UK. ‡ Kappa agreement [26]: <0.20, poor; 0.20–0.40, fair; 0.40–0.60, moderate; 0.60–0.80, good; 0.80–1.00, very good.
ROC curves for ASQ-3 scores as predictors of risk of developmental delay are shown in Fig. 2. Area under the ROC curve (AUC) for ASQ-3 gross motor and fine motor scores at 6 months were 0.87 and 0.80 respectively, and at 12 months were 0.84 and 0.65 respectively. The AUC for the ASQ-3 language score at 18 months was 0.69.
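As an aid to interpreting these values, the AUC can be read as the probability that a randomly chosen delayed infant has a lower ASQ-3 domain score than a randomly chosen non-delayed infant, with ties counted as one half. The sketch below computes this rank-based estimate directly; it is an illustration under that assumed direction of scoring, not the SAS procedure used in the study, and the function name and data are hypothetical (not study data).

```python
def auc_low_score_flags_delay(scores, delayed):
    """Rank-based AUC, assuming lower scores indicate higher risk of delay."""
    delayed_scores = [s for s, d in zip(scores, delayed) if d]
    typical_scores = [s for s, d in zip(scores, delayed) if not d]
    if not delayed_scores or not typical_scores:
        raise ValueError("Both delayed and non-delayed infants are required")
    concordant = 0.0
    for ds in delayed_scores:
        for ts in typical_scores:
            if ds < ts:
                concordant += 1.0  # pair ranked correctly
            elif ds == ts:
                concordant += 0.5  # tie counts as half
    return concordant / (len(delayed_scores) * len(typical_scores))


# Hypothetical domain scores and reference-test outcomes (not study data).
scores = [10, 20, 25, 30, 35, 40, 45, 50, 55, 60]
delayed = [True, True, True, False, True, False, False, False, False, False]
auc = auc_low_score_flags_delay(scores, delayed)
print(f"AUC={auc:.3f}")
```

An AUC of 0.5 corresponds to scores that carry no information about delay, and 1.0 to scores that separate the two groups perfectly.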

ROC curves for ASQ-3 gross motor, fine motor, and language scores as predictors of developmental delay risk.
The NPV of the ASQ-3 showed a similar trend (Table 3). NPV was highest at 6 months for both gross motor and fine motor domains at 96.4% and 95.4% respectively, and remained high at 12 months for both gross motor and fine motor domains at 88.9% and 82.8% respectively. However, at 18 months, NPV for the language domain decreased to 56.9%. Psychometric properties and local cut-off scores derived from this study are shown in Table 3.
Comparison of psychometric properties of ASQ-3 validated cut-off scores with local cut-off scores for gross motor, fine motor, and language domains at 6, 12, and 18 months corrected age
†ASQ-3 validated cut-off scores of 1 SD from the mean; ‡ Local cut-off scores based on Youden’s index; SENS: sensitivity; SPEC: specificity; PPV: positive predictive value; NPV: negative predictive value.
Corresponding histograms illustrating the discrimination capability of ASQ-3 gross motor, fine motor, and communication scores are presented in Fig. 3. Local cut-off scores based on our population were derived for each ASQ-3 domain at the different age groups, together with the corresponding rates of FP, TN, TP, and FN results.

Discrimination capability of the validated ASQ-3 scores for gross motor, fine motor, and language domains by assessment age at 6, 12, and 18 months corrected age.
The ASQ-3 is reliable in assessing motor development
The psychometric properties of gross motor and fine motor domains showed good accuracy when compared with the PDMS-2 at 6 and 12 months corrected age, implying that these domains discriminate infants with age-appropriate motor skills from infants with motor delay with an accuracy close to or equal to 80%. AUC for 6-month gross motor and fine motor domains and 12-month gross motor domain were ≥0.80, indicating good capability as a classifier of low- and high-risk neurodevelopmental outcomes. The 12-month fine motor score appears to be a less reliable predictor of developmental delay as reflected by the AUC but, with relatively good accuracy and NPV, remains clinically useful for screening. Our findings compare favorably with previous reports of the high NPV of the ASQ in screening motor development in ex-preterm infants [25].
Based on simple accuracy alone, agreement for the ASQ-3 gross motor and fine motor domains at 6 and 12 months would be considered good. However, after adjusting for chance agreement, Cohen’s Kappa reflected only fair to moderate agreement [26]. Studies have shown that Kappa tends to underestimate agreement in situations with low prevalence and is an overly conservative measure of agreement [27, 28]. This limitation is apparent in our population due to the low prevalence of high-risk infants. More follow-up cases will be needed to better confirm the value of these methods.
The ASQ-3 is limited in its ability to assess language
At 18 months, the ASQ-3 was poor in identifying infants at risk of language delay when compared with the PLS-4 UK, with an accuracy of 63.7%. Our findings of limited accuracy in the ASQ-3 domain of communication concur with a previous study in which Simard et al. questioned the ASQ’s capacity to detect developmental delay when individual domain scores were analyzed instead of total scores [29]. From our study, a reasonable conclusion would be that the ASQ has a greater capacity to detect motor delay than language delay.
Using the ASQ-3 as a screening tool
As a screening tool, it is of greater importance that the ASQ-3 achieves a high NPV. This is critical to avoid missing the diagnosis of developmental delay, minimizing unnecessary delay in early intervention. In our study, NPV was high for both gross motor and fine motor domains at 6 and 12 months, suggesting that the ASQ-3 remains reliable in screening motor domains. However, NPV was low for the language domain at 18 months. A possible explanation for this finding could be that gross and fine motor movements are easily noticed, whereas development in receptive and expressive skills demands targeted observation, and effort to elicit language skills may be required to ensure optimal accuracy. Caregivers may be better at detecting motor developmental milestones compared to an infant’s communication skills.
Studies on ASQ use in developmental screening have shown conflicting results. Skellern et al. reported high NPV for the ASQ in the follow-up of ex-preterm infants compared with the 12- and 24-month Griffiths Mental Developmental Scales, 18-month Bayley Mental Developmental Intelligence Scale, and 48-month McCarthy General Cognitive Intelligence Scale, supporting its use as a screening tool for cognitive and motor delay [25]. In contrast, Lindsay reported that the ASQ is able to detect only those with severe developmental delay, rendering it of little value in screening [30].
Who is the main caregiver?
ASQ-3 accuracy depends on the respondent’s understanding of the child’s development, an area of knowledge largely dependent on time spent with the child [31]. In our cohort, only 46.9–52.8% of respondents at the various assessment ages were parents who were also the infant’s primary caregivers. This could mean that the person completing the ASQ-3 might not have the best understanding of the infant’s development. Using the ASQ-3 in other settings such as the home may allow more time for reflection and assessment by the infant’s primary caregiver, thereby improving the accuracy of input.
Our study also showed that the follow-up rate and psychometric properties were higher at 6 months than at 12 and 18 months, suggesting that parents could be more concerned in the initial months post-discharge. The decreasing trend in psychometric properties with increasing age may also reflect difficulty in comprehending ASQ-3 questions, which increase in complexity as development progresses. This may highlight the need to be proactive in educating caregivers on developmental milestones expected at various ages. Resources such as educational activities should be shared with caregivers in a progressive, timely manner.
The ASQ-3 in the Singapore population
Population-specific ASQ-3 cut-off scores for gross motor and fine motor domains at 6 and 12 months approximated the validated ASQ-3 cut-off scores, with differences ranging from 1 to 7 points. However, the population-specific ASQ-3 cut-off score for the language domain at 18 months showed a 7-point difference. Except for fine motor skills at 6 months, the population-specific cut-off was consistently higher than the validated ASQ-3 cut-off across domains assessed at 6 and 12 months (+1 to +4 points for motor skills) and at 18 months (+7 points for the communication domain). In contrast, the population-specific ASQ-3 cut-off score for fine motor skills at 6 months was 7 points lower than that of the validated ASQ-3.
Population size and the main language used at home are major differences when comparing the population in which the ASQ was first validated with our local population. Hence, our population-specific ASQ-3 scores in the communication domain may not be reliable as standardized reference values for developmental screening in high-risk preterm infants. The reliability and validity of the ASQ-3 should be further explored with a larger population size. Furthermore, responders to the questionnaire may not be the primary caregiver. Cultural differences prevail, with other adult figures like grandparents, nannies, and live-in domestic helpers playing primary or supporting roles in different societies and families. Fewer opportunities for skill acquisition in particular fine motor skills in early infancy may in part account for the lower scores in our study cohort.
With our findings of good accuracy and discrimination in the gross and fine motor domains of the ASQ-3, we recommend using it in screening motor development in high-risk preterm infants. Its high NPV as early as 6 months may be used to reassure parents and clinicians of normal motor development. On the other hand, infants failing to meet the reference cut-off score should be recommended for further structured assessment. Optimal and judicious ASQ-3 use would lead to early referral for assessment and intervention while reducing costs of follow-up programs. This will limit the number of high-risk preterm infants requiring specialized motor assessments, optimizing resources of time, manpower and cost.
Our study adds to the current literature as it focuses on ASQ-3 properties other than merely its validity. NPV, accuracy, and discrimination ability are crucial to complete the evaluation of the ASQ-3 as a screening tool and may possess higher predictive value, thus better guiding clinical decisions [32]. Our population included infants of multiple ethnicities, further contributing to the cross-cultural adaptation recommended in using the ASQ-3 as a developmental screener [33]. Our study population is representative of the larger preterm cohort in terms of gender and birth weight. However, we did find significant differences between those included in the study and those lost to follow-up with respect to race and gestational age at 12 and 18 months. A larger proportion of the infants lost to follow-up were of other races and may have moved back to their home countries after birth. In addition, variations in cultural practices amongst different ethnic groups may in part explain the differences in compliance with recommended preventive healthcare follow-up. Also, at 12 and 18 months, the infants lost to follow-up tended to have a longer mean gestational age compared to those included in the study, suggesting that parents of more mature preterm infants may be less likely to have concerns over the neurodevelopment of their infants.
Study limitations
In using the ASQ-3 as a developmental screener, cultural and language differences may have resulted in variation in parental expectation and the resultant developmental progression. Moreover, the relatively small study population limits the reliability of population-specific ASQ-3 cut-off scores, particularly in the communication domain. It is also important for practitioners involved in the care of these infants to recognize that most standardized tests currently used as gold standards for assessing development have not been normed for our local population. Lastly, tests were not performed in our study to determine the extent of inter-rater variability in the completion of the ASQ-3 and standardized tests.
Conclusion
The ASQ-3 has the potential to assess motor development accurately and could thus be used as a primary mode of assessment in preterm infants. It may optimally serve as the first-line screening tool to risk stratify infants who can then be appropriately referred for specialized assessment and intervention as required. However, in its current way of assessment, the ASQ-3 is unable to accurately assess the language domain and may have to be restructured to improve its accuracy. Optimal ASQ-3 use could lead to early referral for assessment and intervention while reducing the costs of follow-up programs. Moreover, it creates room for parent education to occupy a greater role in the screening of childhood development.
Conflicts of interest
The authors declare that they have no relevant financial interests in this manuscript. Ethics approval and consent waiver were obtained from the hospital’s institutional review board (CIRB Ref 2016/2771). This project was supported by a grant from the AM-ETHOS Duke Medical Student Fellowship Award.
Acknowledgments
Many thanks to Dr Joanne Ngeow, Dr Deirdre Anne De Silva, Ms Susan Phung, Ms Juriffah Abdul Talib and Ms Sharon Yam for their contributions. Deepest appreciation to the assessors, children and families of the Neonatal Neurodevelopmental Clinic.
