Abstract
Autism spectrum disorder is a neurodevelopmental disorder that affects communication and social skills. Screening and diagnosing autism with conventional methods, such as interviews with parents or caregivers and observational assessments, takes a long time, and accurate diagnosis remains challenging for physicians and healthcare professionals. By analyzing data on autistic children, medical professionals can learn about decision making in autism screening assessment. The present study aims to develop a parental autism screening tool, termed the Indian Autism Grading Tool (IAGT), for early screening of autism. Data are collected using the Indian Autism Parental Questionnaire and assigned grades. This dataset is used to test five supervised machine learning models, which are compared on accuracy, precision and recall, and the most effective model is used to implement the autism screening application. Multinomial Logistic Regression (MLR) is known to be more robust and to work with smaller datasets, so it can be employed for the implementation of ML-powered mobile applications. MLR achieves an overall accuracy of 97.85%, which is 0.72%, 2.37%, 0.84% and 1.54% better than Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbors (KNN) and Gaussian Naive Bayes (GNB) respectively. The proposed tool is developed in both Tamil and English. A pilot study is conducted with 30 children, and the predictions of the tool are compared with those of the clinician. In the pilot study, the tool achieves a level of accuracy consistent with that of clinicians.
Introduction
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by repetitive patterns of behavior, difficulties in social communication [1] and impaired social interaction [2, 3]. It can be identified by symptoms such as atypical social behaviors, stereotyped [4] or repetitive activities, uncontrolled body movements, fascination with moving or spinning objects, and other sensory abnormalities. A widely cited estimate states that 1 in 59 children below 10 years of age is affected by ASD, and the rate appears to be increasing year by year.
A child’s early signs of ASD can be identified by the parents. Nevertheless, many parents find it difficult to recognize the symptoms of autism because the child shows the normal physical growth of a typical child of that age while behaving atypically. Although the symptoms of ASD vary, some resemble those of other disorders, which makes it difficult to identify autism at an early stage. Parents therefore often observe their child closely for some time and consult friends, relatives and doctors about the child’s abnormal behaviors, impaired activities, developmental delays and other concerns, which delays the diagnosis of ASD [5]. Medical practitioners tend to follow different schedules and various autism screening and diagnostic tools [6], which leads to a lack of unified diagnosis. Several parental interview-based ASD diagnostic instruments are available: Autism Diagnostic Interview-Revised (ADI-R) [7], Diagnostic Interview for Social and Communication Disorders (DISCO), Developmental, Diagnostic and Dimensional interview (3Di), Childhood Autism Rating Scale (CARS/CARS-2) [8] and Indian Scale for Assessment of Autism (ISAA).
Experts and therapists suggest that the effects of ASD can be reduced by early screening and early intervention. However, early identification becomes complex and is often delayed [9, 10] because of misunderstanding of disabilities, misguidance by elders, ignorance due to social status, feelings of inferiority and so on. Early identification of autism by parents plays a vital role in starting the child’s therapy early and in limiting the disability associated with ASD. However, in lower middle-income countries like India, the ASD screening and assessment process is costly and time-consuming because it depends on patented western tools like ADI-R, DISCO and CARS, which may not be accessible to people in remote villages.
Many machine learning studies aim to improve ASD diagnosis. Recently, a variety of methodologies have been followed for autism screening and diagnosis: eye tracking systems, Personal Characteristic Data (PCD), structural Magnetic Resonance Imaging (MRI), the Autism Brain Imaging Data Exchange (ABIDE), diagnosis of autism from home videos, and multi-disease prediction using deep learning. The last of these builds its prediction methodology from data acquisition, optimal feature selection, statistical feature extraction and prediction, and the system is verified using different deep learning algorithms [11].
This study contributes to the development of machine learning models for the automated diagnosis of ASD. The above methods of ASD screening and diagnosis are impractical for people living in India. A parental interview system is more suitable for the Indian population because the parent is the person who knows most about the child’s abnormal activities and difficulties in regular daily activities.
In the present research, a novel ML-powered mobile application called the Indian Autism Grading Tool (IAGT) is developed for parents, caregivers and special school teachers for the initial screening and diagnosis of ASD. The parental questionnaire is designed in an easily understandable format in both Tamil and English, with the aim of making it accessible to underprivileged people. This helps people to screen for and identify ASD on their mobile devices. It also acts as a platform for giving a first opinion to assessors. In addition, the IAGT is an intelligent tool-based screener that gives immediate results in a user-friendly environment. This ASD diagnostic tool simulates the expertise of the clinician through an Artificial Intelligence (AI) based mobile application. Machine Learning (ML) is a branch of artificial intelligence that supports decision making in the medical and healthcare sectors [12].
ML offers a variety of algorithms for supervised and unsupervised learning [13, 14]. Decision Tree Classifier, Gaussian Naive Bayes, K-Nearest Neighbors (KNN), Multinomial Logistic Regression (MLR) [14] and Support Vector Machine (SVM) are supervised machine learning techniques that can be used for automatic ASD screening, and their results can be validated against real-world autism data.
Due to the practical difficulties in data collection, an algorithm that performs well on a small amount of data has to be identified. The MLR and DT models outperform the other classifiers in training and testing. However, in validation DT has lower prediction accuracy on the unseen dataset [15]. Among these classifiers, MLR is known to be more robust and to work with a small amount of data, so it can be used for the implementation of an ML-powered mobile application.
The IAGT provides strong support for initial autism screening to the parents, teachers, social workers and experts who work with autistic children [16]. It clears the parents’ doubts regarding their child’s developmental delays. A mobile-based computing tool is compact and serves as an automated evaluation with an optimized data management system for storage and retrieval. The most important aspect of a mobile device is portability, which makes it possible to use the autism grading application at home, at school and in public places. Since mobile technology has proven to be consistent and supports interactive dynamic applications, it reduces the time and cost of the screening process. The IAGT gives parents the ability to assess and recognize their child’s behaviors and to identify whether the child has autistic symptoms. The application further guides the parents if the child needs to be referred to a clinician for detailed diagnosis.
The key contributions of the present work are as follows:
The IAGT application is capable of predicting ASD symptoms based on the medical records of a child with ASD.
From the behavior responses given by the parents and caregivers, it determines the autism grade of the child: Mild, Moderate, Severe or No Autism.
It reduces the problem of inaccurate diagnosis and narrows the gap between hospitals and patients in autism screening.
Data are collected using the Indian Autism Parental Questionnaire and assigned grades, and this dataset is used to test five supervised machine learning models.
The classification performance of the proposed ML-based IAGT is evaluated using accuracy, f1 score, precision and recall.
The rest of this work is organized as follows. Section 2 briefly discusses the related works, Section 3 presents the materials and methods for developing IAGT, Section 4 gives the results and discussion, and Section 5 concludes the work.
Related works
Various autism screening mobile applications are available for early diagnosis. Most of the existing applications employ score-based evaluation [17, 18]. A few applications have implemented Artificial Intelligence (AI) techniques that are suitable for clinicians or experts in autism assessment, but they are not useful to parents for the initial assessment of their child. The applications used for autism assessment include the Indian Scale for Assessment of Autism (ISAA), ASD Tests app, Eye-Tracking App, Autism & Beyond, Mobile Video Rater platform for autism Risk Classification, Naturalistic Observation Diagnostic Assessment (NODA) and Autism AI App [19, 20].
ISAA
ISAA is a mobile application used for clinical diagnosis, including severity grading, intervention and monitoring. It was jointly established by the National Trust, the Ministry of Health and Family Welfare, and the Ministry of Social Justice and Empowerment of the Government of India. It includes 40 items under 6 domains to diagnose children from 3 years to 22 years and above. In this diagnostic process, each parental question carries marks from 1 to 5. After the assessment, the 40 item scores are summed and the grade is assigned according to the total score. The average administration time is 10 to 15 minutes, and the application is available in English. This mobile application is designed specifically to assess children with ASD in India. The ISAA tool predicts the autism grade from the child’s behavioral score.
ASD tests app
ASD Tests app is a machine learning based mobile application [21] that assists healthcare professionals in the diagnosis of ASD. Two types of questionnaires are used: the Q-CHAT questionnaire for children younger than 36 months, and the AQ Test questionnaire for those from 36 months to 17 years. The application is available in 11 languages with four test modules, each test consisting of 10 questions with 5 options. In addition, Naïve Bayes and Logistic Regression algorithms are used for the predictive analysis of the ASD features. These algorithms are trained with 1100 instances. Data collection is the most important module in this application.
Eye tracking app
Eye tracking is a mobile application [22] used to diagnose ASD in children. It identifies changes in the children’s eye contact. A child’s social activities are monitored through games, and eye tracking is done with moving forms and curves in attractive colors. The application helps to identify children with ASD between 18 months and 7 years old. A 1-minute video is shown containing two scenes on the same screen: the left side of the mobile screen shows social scenes and the right side shows colorful moving objects. The webcam records the children’s visual activities for ASD diagnosis.
Autism & beyond
Autism & beyond is an iPhone app for screening autism. A smartphone records, in video format, the children’s behavioral responses to activities like bubbles, bunny, mirror, toys and songs at home and at school. The child’s facial landmarks are extracted from all the videos using embedded face detection software. Data collection covers the caregivers’ report and the child’s behavioral video recorded at home by caregivers. The mobile application is deployed with automatic computer vision and machine learning analytics for autism screening. The collected videos quantify young children’s behaviors and emotions in their natural environment using video coding. This automated system helps to analyze and track the children’s movements, behavioral responses, emotions and attention.
Mobile video rater platform for autism risk classification
Mobile video rater platform is employed to diagnose autism using home-based videos [23]. The application is built on machine learning based autism classification from 3-minute home videos of children with and without ASD. A mobile web portal is used by video raters to assess 30 behavioral features, which are used in 8 ML models for diagnosing ASD. The participants record videos of children between 12 months and 17 years and upload them to the portal, or refer to videos already uploaded to YouTube, using a crowdsourcing method. This video rater platform is an artificial intelligence-based model that generalizes to the task of automatically detecting autism in short home video clips from mobile devices.
Autism detection via Machine learning techniques
In 2021 Hossain, et al. [24] devised a machine learning algorithm for detecting autism spectrum disorder. The characteristics of toddler, child, adolescent and adult ASD datasets were examined, and a relationship was established between demographic characteristics and ASD incidence. The ASD cases were classified on the most important characteristics using multiple ML techniques, and the most efficient algorithm was then chosen. In the comparison of various ML classification algorithms, the SMO (Sequential Minimal Optimization) based Support Vector Machine (SVM) algorithm performed best for detecting ASD cases in all ASD datasets.
In 2021 Mohanty, et al. [25] proposed a machine learning based classification approach for the detection of toddler ASD cases. The proposed approach makes use of the Qualitative Checklist for Autism in Toddlers (Q-CHAT-10), which is based on a number of behaviorally independent factors in the dataset, including age, gender and ethnicity. The ASD toddler data are categorized using various ML classification algorithms in two phases, training parameter ɛ and k-fold cross validation (k = 10), which achieves high classification performance.
In 2021 Vakadkar, et al. [26] designed a machine learning technique for the prediction of ASD in children. This method uses five ML algorithms to categorize subjects as ASD or Non-ASD using features like age, sex and ethnicity. Each classifier was analyzed to find the one that performed best. In the comparative results, Logistic Regression obtained the highest accuracy. Open-source ASD datasets are scarce, which limits what this approach can achieve in terms of performance.
Naturalistic observation diagnostic assessment (NODA)
Naturalistic Observation Diagnostic Assessment (NODA) [27] is also an autism diagnostic mobile application. Parents record their child’s behavior in particular situations and upload the recordings to the application, where experts can interpret them. The scenarios provide opportunities for the children to show social communication skills and play, specific social presses and typical social communicative behavior. Clinicians observe the captured behaviors through a secure platform and provide a detailed observation report.
Autism AI app
Autism AI is an ASD screening application that uses AI. This assessment tool has 10 questions with four outcome options and notifies the assessing person if there is an autistic trait. It supports the basic autism assessment process and suggests whether a child needs to consult health professionals. Table 1 compares the different applications used for autism diagnosis.
Survey of autism diagnostic mobile application
System Architecture and Design: The present research is carried out in four phases, namely Parental Interview Questionnaire Development, Data Collection, Development of ML models and implementation of the best performing ML model for Development of IAGT Mobile Application. The architecture and design of ASD screening is shown in Fig. 1.

Architecture and design of ASD screening.
The available standard autism parental questionnaires require some assistance for the parents to complete, and many screeners from western countries are socio-culturally unsuitable for the eastern population. Hence it is necessary to create an easily understandable and socio-culturally sensitive parental questionnaire. The initial test and autism administration for a single child take considerable time, and the patented tools, being costly, add to the burden on parents. With this concern, a simple, easily understandable and culturally sensitive questionnaire suitable for the south Indian population is developed in Tamil and English, and it is planned to translate it into other regional languages.
A five-point Likert scale (never, sometimes, often, very often, every time) is selected as the response category. The IAPQ screener is developed to assess ASD in children from the age of 6 months to 11 years. The Indian Autism Parental Questionnaire (IAPQ) has 37 questions in six groups of items: Social Skills, Language Skills, Physical Skills, Activity/Play Skills, Sensory Ability and Intellectual Ability.
The IAPQ screening uses parental interview, questionnaire and grading methods for autism assessment. The assessment takes < 10 minutes. The comparison of this tool with other tools is given in Table 2.
Comparison of autism diagnostic tools with IAPQ
Children’s behavior data are collected from parents, caregivers and teachers using the IAPQ tool at District Early Intervention Centers (DEIC), special schools, therapy centers and disability medical camps. Further, a parental interview is conducted, and information on the children’s present and past medical history, behavioral problems and mental ability is collected. These children are then assessed by a clinician and graded. In total, 182 records are collected: 69 children are normal, 33 have mild, 51 have moderate and 29 have severe autism. The participants include 145 male and 37 female children; 29 children are aged 0 to < 3 years, 48 are aged 3 to < 5 years, 89 are aged 5 to < 10 years and 16 are aged 10 to < 11 years. The mean age in the data is 4.05 years. The dataset is created by combining all the IAPQ records collected from various places in Erode district, Tamil Nadu, India. Each record consists of answers to 37 questions. Each answer is given a score, and the scores for all the responses are summed. Based on the score and behavioral observations, the clinician assigns a grade: normal, mild, moderate or severe. The grade assigned by the clinician is recorded at the end of each record and serves as the class label for the classification algorithm.
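As a minimal sketch of how one IAPQ response sheet might be turned into model input, the Python snippet below maps the five Likert categories to integers, builds a 37-element feature vector and sums the scores. The 1 to 5 numeric coding, the name `encode_responses` and the example answers are assumptions made for illustration; the text does not state which numeric value is assigned to each category.

```python
# Hypothetical numeric coding of the five Likert categories (assumed, not from the paper).
LIKERT_SCORES = {
    "never": 1,
    "sometimes": 2,
    "often": 3,
    "very often": 4,
    "every time": 5,
}

NUM_QUESTIONS = 37  # number of IAPQ items stated in the text


def encode_responses(responses):
    """Convert 37 Likert answers into a list of integer features."""
    if len(responses) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} answers, got {len(responses)}")
    try:
        return [LIKERT_SCORES[answer.strip().lower()] for answer in responses]
    except KeyError as exc:
        raise ValueError(f"unknown response category: {exc.args[0]!r}") from None


# Made-up response sheet for illustration: 36 "never" answers and one "often".
example_answers = ["never"] * 36 + ["Often"]
features = encode_responses(example_answers)
total_score = sum(features)  # summed score for the record
print(len(features), total_score)
```

The feature vector, rather than the summed score alone, is what a classifier would consume; the clinician-assigned grade would be stored alongside it as the class label.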
Machine learning models
In this work, the efficiency of the proposed approach is measured by accuracy, recall, precision and F1 score. To assess the autism features, the ASD screening test is carried out with five machine learning classifiers: Gaussian Naive Bayes, Decision Tree, KNN, MLR and SVM. These models are built on the IAPQ dataset, in which all 182 records are taken as the training set and 109 records (60% of the records) are taken as the test set. The best-performing model is identified by calculating accuracy, precision, recall and f1-score for each class, as shown in Table 3.
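The per-class metrics named above can be computed directly from true and predicted grades. The following Python sketch shows one way to do so; the function name `per_class_metrics` and the six example labels are hypothetical and are not the study's predictions.

```python
from collections import Counter

GRADES = ["No Autism", "Mild", "Moderate", "Severe"]


def per_class_metrics(y_true, y_pred, labels):
    """Return overall accuracy and {label: (precision, recall, f1, support)}."""
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1, tp + fn)  # support = true count
    accuracy = sum(pairs[(label, label)] for label in labels) / len(y_true)
    return accuracy, report


# Made-up labels for illustration only.
y_true = ["Mild", "Mild", "Moderate", "Moderate", "Severe", "No Autism"]
y_pred = ["Mild", "Moderate", "Moderate", "Moderate", "Severe", "No Autism"]
accuracy, report = per_class_metrics(y_true, y_pred, GRADES)
for label, (precision, recall, f1, support) in report.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} "
          f"f1={f1:.2f} support={support}")
```

Support here is the number of true records per class, which is how the support column of such a report is usually read.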
Accuracy, Precision, Recall, f1-score and support values for each machine learning model
The machine learning models are validated on two datasets, module 1 and module 2. Module 1 contains part of the training data, and module 2 contains new data that is not used in training or testing. Both modules have 30 individual records with four classifiers, and the validation results are shown in Fig. 4.
Mobile application tool for autism identification
The best-performing machine learning approach is then integrated into a mobile application, which serves as a tool for the autism parental interview and screening based on the IAPQ questionnaire.
This questionnaire is answered by parents, caregivers and teachers. The IAGT parent ASD screening tool is shown in Fig. 2(a), (b), (c) and (d). The IAGT mobile application encourages home-based autism screening on smartphones, tablets and other mobile devices. It is a robust and upgradable application. Parents can install the application on their mobile phones and perform autism screening easily. They can choose to respond in either Tamil or English, and the mobile tool then displays the questions in the chosen language. Initially, personal and medical information about the child is entered, such as the child’s behavior, birth history, problems and present treatment. Then the user answers 37 questions, each with 5 prescribed options that reflect the child’s behavior. Figure 3 illustrates the steps involved in using the ML-powered IAGT mobile tool.

Autism screening mobile application; (a) home screen; (b) parent questionnaire in Tamil; (c) parent questionnaire in English; (d) ML predicted result.

Flow of IAGT.
Finally, the answers are fed into the machine learning model to predict the autism grade automatically for the child. Then the autism grade given in Fig. 2(d) is displayed. This outcome is obtained from a highly trained ML model rather than a score-based prediction.
The IAGT stores and manages data in two places: local storage and cloud storage. The assessment records are stored on the mobile device in an SQLite database for future reference. Past assessment records help parents to check the child’s level of improvement and to compare the current and previous screening results. The application runs in both offline and online modes. If the device has an internet connection, the data are automatically stored in cloud storage; otherwise, they are stored only on the local device. Cloud storage is an optional feature for collecting data to enhance the ML model with more training data.
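The offline-first storage described above can be sketched with an SQLite table that keeps every assessment locally and flags the rows not yet uploaded. The sketch below uses Python's standard `sqlite3` module purely for illustration; the table name, column names, the `synced` flag and the helper functions are assumptions and do not describe the actual schema of the IAGT application.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open (or create) the local assessment store. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS assessments (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               child_id TEXT NOT NULL,
               taken_at TEXT NOT NULL,
               answers TEXT NOT NULL,
               grade TEXT NOT NULL,
               synced INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def save_assessment(conn, child_id, answers, grade):
    """Store one screening result locally; it stays unsynced until uploaded."""
    conn.execute(
        "INSERT INTO assessments (child_id, taken_at, answers, grade) "
        "VALUES (?, ?, ?, ?)",
        (child_id, datetime.now(timezone.utc).isoformat(),
         json.dumps(answers), grade),
    )
    conn.commit()


def history(conn, child_id):
    """Return (timestamp, grade) pairs in the order they were recorded."""
    rows = conn.execute(
        "SELECT taken_at, grade FROM assessments WHERE child_id = ? ORDER BY id",
        (child_id,),
    )
    return rows.fetchall()


def pending_upload(conn):
    """Rows that would be sent to cloud storage once a connection exists."""
    return conn.execute("SELECT id FROM assessments WHERE synced = 0").fetchall()


conn = open_store()
save_assessment(conn, "child-001", [1] * 37, "Moderate")  # made-up records
save_assessment(conn, "child-001", [1] * 37, "Mild")
grades = [grade for _, grade in history(conn, "child-001")]
print(grades, len(pending_upload(conn)))
```

Keeping a per-row sync flag lets the same table serve both modes: offline use only ever writes locally, and an upload step can later mark rows as synced without changing how history is read.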
Experiment and results
The machine learning models are trained and tested using 37 features. Hyperparameter tuning is an essential aspect of the machine learning process. A hyperparameter is an input to a model that is set before the learning process starts. The model is validated by partitioning the collected samples into a training set and a test set to evaluate performance. The effect of increasing a model’s complexity or learning rate demonstrates the importance of hyperparameters. For models with many hyperparameters, finding the optimal combination can be treated as a search problem. Each classifier has its own hyperparameters. For example, SVM has C and gamma, and tuning them yields a weighted average precision of 0.98. DT has max_depth, max_features and max_leaf_nodes, and tuning them achieves a weighted precision of 0.98. KNN yields a weighted precision of 0.96 by tuning n_neighbors and leaf_size. The regression parameters (β coefficients) are treated here as the hyperparameters of MLR and can be tuned to yield a weighted precision of 1. GNB does not have hyperparameters, and it obtains a weighted precision of 0.97, as depicted in Table 4. The parameters learned in GNB include the prior probability of each class and the probability of each attribute given the class. Due to the assumption of feature independence, GNB may produce erroneous class probabilities.
Performance evaluation of five ML classifiers based on the tuned hyperparameters
Furthermore, the performance of the models is analyzed in two ways. First, the prediction accuracy, sensitivity, specificity and Kappa of all five models are obtained for each class, as shown in Table 7. Second, the machine learning models are verified on previously trained data and on unseen data. The validation results of DT, GNB, KNN, MLR and SVM for both Module 1 and Module 2 are shown in Fig. 4.
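Sensitivity, specificity, Cohen's Kappa and Hamming loss can all be derived from a multiclass confusion matrix. The Python sketch below illustrates the calculation under stated assumptions: the row totals follow the support counts reported in the text (26 mild, 29 moderate, 31 no autism, 23 severe), but the placement of one moderate record in the mild column is invented for illustration and is not taken from the study's confusion matrices.

```python
GRADES = ["Mild", "Moderate", "No Autism", "Severe"]

# Hypothetical confusion matrix: rows = clinician grade, columns = predicted grade.
CONFUSION = [
    [26, 0, 0, 0],
    [1, 28, 0, 0],
    [0, 0, 31, 0],
    [0, 0, 0, 23],
]


def summarize(cm):
    """Compute accuracy, per-class sensitivity/specificity, kappa, Hamming loss."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]                           # true counts
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts
    accuracy = sum(cm[i][i] for i in range(k)) / n

    sensitivity, specificity = [], []
    for i in range(k):
        tp = cm[i][i]
        fn = row_tot[i] - tp
        fp = col_tot[i] - tp
        tn = n - tp - fn - fp
        sensitivity.append(tp / (tp + fn) if tp + fn else 0.0)
        specificity.append(tn / (tn + fp) if tn + fp else 0.0)

    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)
    # Kappa is undefined when chance agreement is 1; report 0.0 in that case.
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
        "hamming_loss": 1 - accuracy,  # single-label case: misclassified fraction
    }


stats = summarize(CONFUSION)
for name, sens, spec in zip(GRADES, stats["sensitivity"], stats["specificity"]):
    print(f"{name}: sensitivity={sens:.4f} specificity={spec:.4f}")
print(f"kappa={stats['kappa']:.4f} hamming_loss={stats['hamming_loss']:.4f}")
```

For a single-label multiclass problem the Hamming loss reduces to the fraction of misclassified records, which is why it is computed here as one minus accuracy.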

Validation results for DT, GNB, KNN, MLR and SVM for both module 1 and module 2.
Table 5 shows the estimated ROC-AUC values of the model. The one-vs-one ROC-AUC value is 0.967 with macro averaging and 0.979 when weighted by prevalence. Similarly, the one-vs-rest ROC-AUC yields the same values for the macro and prevalence-weighted averages.
Calculation of ROC and AUC Scores
Figure 5 shows the ROC curve for autism and Fig. 6 shows the ROC curves for the multiclass classification of autism based on the TPR and FPR. Figure 7 shows that, for each of the ML classifiers tested, normal (class 2) has the highest classification value, with only one misclassification. For the MLR classifier, the ASD prognosis was classified correctly 100% of the time, successfully classifying 75% of the testing data.

ROC curve of ASD.

ROC curve for multiclass of ASD.

Confusion matrices of (a) MLR (b) SVM (c) DT (d) KNN and (e) GNB classifiers.
The ML-powered autism diagnostic mobile application allows people to screen for and detect autism spectrum disorder in both offline and online modes. The tool can switch languages according to the user’s native language. The robustness of the proposed ML techniques was evaluated using accuracy, precision, recall and f1 score; the experimental results in Table 6 indicate that the models are robust, and the scalability of the models is depicted in Fig. 8. Table 6 also lists the support values for the machine learning models. The DT and MLR classifiers reach 100% accuracy, precision, recall and f1-score for all four classes. For the GNB and SVM classifiers, accuracy is 100% for the mild, no autism and severe classes and 97% for the moderate class; precision is 100% for the moderate, no autism and severe classes and 96% for the mild class; recall is 100% for the mild, no autism and severe classes and 97% for the moderate class; and the f1 score is 98% for both the mild and moderate classes and 100% for the no autism and severe classes. For the KNN classifier, accuracy, precision, recall and f1-score are 100% for the no autism and severe classes, 97% for the mild class and 96% for the moderate class. Finally, the support counts are 26 records for mild, 29 for moderate, 31 for normal and 23 for severe autism.
Accuracy, Precision, Recall, f1-score and support values for each machine learning model

Scalability of the ML classifiers.
In the validation of modules 1 and 2 shown in Fig. 4, KNN has the smallest difference between the two: its accuracy is 98.63% on trained data and 93.33% on new data. GNB, MLR and SVM reach 100% and 86.67%, and DT reaches 100% and 76.67%, as shown in Table 7.
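The gap between module 1 and module 2 accuracy gives a quick view of how much each classifier degrades on unseen data. The short Python sketch below computes that gap from the percentages quoted above; the variable names are illustrative only.

```python
# (module 1 accuracy %, module 2 accuracy %) as quoted in the text.
validation = {
    "KNN": (98.63, 93.33),
    "GNB": (100.0, 86.67),
    "MLR": (100.0, 86.67),
    "SVM": (100.0, 86.67),
    "DT": (100.0, 76.67),
}

# Drop in accuracy, in percentage points, when moving to unseen data.
gaps = {name: round(seen - unseen, 2)
        for name, (seen, unseen) in validation.items()}

# Smallest gap first: the classifier that generalizes most evenly.
ranked = sorted(gaps.items(), key=lambda item: item[1])
for name, gap in ranked:
    print(f"{name}: {gap:.2f} percentage points")
```

On these figures KNN loses about 5.30 points, GNB, MLR and SVM about 13.33 points each, and DT about 23.33 points.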
Accuracy, sensitivity, specificity, kappa and hamming loss for DT, GNB, KNN, MLR and SVM
The predictive model is cross validated using 10-fold cross validation, which partitions the collected samples into training and test sets to evaluate model performance. In 10-fold cross validation, the dataset is randomly partitioned into 10 groups of equal size, and the predictive model’s performance is validated on them. The validation accuracy is 95.53 for DT, 96.34 for GNB, 97.02 for KNN, 97.85 for MLR and 97.14 for SVM, as shown in Fig. 9, and the training and testing loss curves are depicted in Fig. 10.
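The partitioning step of 10-fold cross validation can be sketched in plain Python as follows. The function name `k_fold_indices` and the fixed seed are assumptions for illustration, and the sketch uses simple random folds; the text does not say whether the study stratified the folds by grade.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k random folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(182, k=10))  # 182 records, as in the IAPQ dataset
fold_sizes = [len(test_idx) for _, test_idx in splits]
print(fold_sizes)
```

Because 182 is not divisible by 10, the folds can only be nearly equal: two folds hold 19 records and eight hold 18. Each record appears in exactly one test fold, and the reported accuracy would be the mean over the ten folds.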

10-fold cross validation result for autism predictive model.
From Fig. 9, MLR achieves an overall accuracy of 97.85%, which is 0.72%, 2.37%, 0.84% and 1.54% better than SVM, DT, KNN and GNB respectively. Figure 10 plots the loss value against the number of iterations and shows that the model loss decreases as the number of iterations increases, which supports the reliability of the model’s predictions.
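The reported margins are close to, but not exactly, the plain differences between the cross-validation accuracies. One reading, offered here as an interpretation rather than something the text states, is that they are relative differences expressed as a percentage of MLR's accuracy. The Python sketch below computes both forms from the quoted figures so that the two can be compared.

```python
# 10-fold cross-validation accuracies (%) as quoted in the text.
cv_accuracy = {"DT": 95.53, "GNB": 96.34, "KNN": 97.02, "MLR": 97.85, "SVM": 97.14}
# Margins (%) by which MLR is reported to be better.
reported_gain = {"SVM": 0.72, "DT": 2.37, "KNN": 0.84, "GNB": 1.54}

best = cv_accuracy["MLR"]
absolute_gain = {}  # difference in percentage points
relative_gain = {}  # difference as a percentage of MLR's accuracy (assumed reading)
for name in reported_gain:
    absolute_gain[name] = best - cv_accuracy[name]
    relative_gain[name] = 100 * absolute_gain[name] / best

for name, reported in reported_gain.items():
    print(f"{name}: {absolute_gain[name]:.2f} points, "
          f"{relative_gain[name]:.2f}% relative (reported {reported}%)")
```

Under this reading the computed relative margins agree with the reported values to within about 0.01, whereas the plain percentage-point differences (for example 97.85 minus 95.53 for DT) are slightly smaller.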

Loss function curve for (a) MLR, (b) SVM, (c) DT, (d) KNN and (e) GNB classifiers.
Mobile applications are increasingly used in the healthcare sector. The majority of healthcare applications can only be operated by clinicians, and very few automated patient assessment applications can be used by the general public; parents need easy-to-use applications that motivate them to carry out a first screening. The proposed IAGT mobile healthcare application allows people to screen for autism anytime and anywhere, and thus provides a valuable first opinion at an affordable cost. As the prevalence of ASD grows rapidly, diagnostic centers are busy and doctors spend less time on screening, which leads to misdiagnosis. Moreover, the initial autism screening assessment takes a long time because of the parental interview and autism administration process. Therefore, screening and diagnosis should be administered at an earlier stage, and the procedures should be understandable to everyone. The present study proposes a multiple-choice IAGT tool consisting of 37 questions with a five-point Likert scale. Five machine learning models, DT, GNB, KNN, MLR and SVM, are identified and trained using the 182 samples collected. MLR is selected as the most suitable of these models because it has 100% accuracy in both training and testing, 100% accuracy in module 1 validation and 86.67% in module 2 validation. The ML model was validated by 10-fold cross validation to guard against overfitting. MLR is known to be more robust and to work with smaller datasets, so it can be used for the implementation of ML-powered mobile applications. This machine learning model is incorporated into a mobile application. The ML-powered autism diagnostic mobile application allows people to screen for and detect ASD in both offline and online modes. Furthermore, the Tamil version makes it easier for parents from the Tamil region to assess and screen for autism, while the English version can be used by all parents.
Footnotes
Acknowledgment
This study is supported by the Department of Science and Technology (DST) (Ref.No. SEED/TIDE/106/2016, Dated: 13.11.2017), Govt of India.
Funding statement
The authors received no specific funding for this study.
Conflicts of interest
The authors declare that they have no conflicts of interest to report regarding the present study.
