A new classification system for autism based on machine learning of artificial intelligence

Abstract

BACKGROUND:

Autistic Spectrum Disorder (ASD) is a neurodevelopmental condition that is normally linked with substantial healthcare costs. Typical ASD screening techniques are time consuming, so the early detection of ASD could reduce such costs and help limit the development of the condition.

OBJECTIVE:

We propose an automated approach to detect autistic traits that replaces the scoring function used in current ASD screening with a more intelligent and less subjective approach.

METHODS:

The proposed approach employs deep neural networks (DNNs) to detect hidden patterns from previously labelled cases and controls, then applies the knowledge derived to classify the individual being screened. Specificity, sensitivity, and accuracy of the proposed approach are evaluated using ten-fold cross-validation. A comparative analysis has also been conducted to compare the DNNs’ performance with other prominent machine learning algorithms.

RESULTS:

Results indicate that deep learning technologies can be embedded within existing ASD screening to assist the stakeholders in the early identification of ASD traits.

CONCLUSION:

The proposed system will facilitate access to needed support for the social, physical, and educational well-being of the patient and family by making ASD screening more intelligent and accurate.

Keywords

Autism, ASD screening, detection systems, machine learning, medical screening, deep neural network

1. Background

Autistic Spectrum Disorder (ASD) is a neurodevelopmental condition typically described as impairments in the development of social, cognitive, and communication skills that are exhibited by social, repetitive, or restricted behaviours and interests. ASD is one of the fastest growing developmental conditions, with rates showing that 1.5% of the world’s population are on the autism spectrum and many remain undetected [1]. Usually, ASD is coupled with substantial healthcare costs that may hinder speedy detection [2]. On average, waiting time for a formal ASD diagnosis in the United Kingdom is over three years [3]. Despite receiving a diagnosis, some individuals exhibit above average scholastic or non-academic (e.g., artistic) capabilities, posing a challenge for professionals to rationalise a diagnosis [4].

The diagnosis of ASD is clinically based on observable and measurable behavioural indicators (e.g., social skills, engagement in age-appropriate play and leisure, behaviour excesses, communication skills) using diagnostic methods. Existing methods seem to subscribe to the idea that more questions translate to a more accurate classification of cases and controls – this is time consuming due to the large number of items the specialist must check. These have necessitated a change in the way diagnostics are coded and behave within ASD clinical tools for the classifying process.

Early detection of autistic traits can be accomplished by using screening methods such as the Autism Quotient (AQ) [5]. However, the number of items required to be checked by the user is still large, i.e. 50 items in the case of AQ. Attempts to reduce the number of items and improve the efficiency of screening and diagnostic tools have been observed over the last few years, for example by [6], and [11]. Nevertheless, all of the existing methods in ASD screening research are based on domain expert rules and simple scoring functions used to classify cases and controls [12]. Often, when the scores obtained exceed a certain threshold, the individuals are classified as being on the spectrum, or at least as having autistic traits. For instance, when the score obtained by an individual using AQ is larger than or equal to 32, the presence of autistic symptoms is indicated. In addition, the experience and knowledge of the clinician, or of the user undergoing the test (in the case of self-administered tests), plays a part in the performance of the outcome [13].

One promising direction to enhance ASD screening performance (accuracy, sensitivity, and specificity) is to replace the scoring function with an objective and intelligent mechanism such as machine learning. Using this approach, individuals undergoing the screening will be classified using predictive models derived by studying valuable hidden patterns from previously labelled cases and controls. Therefore, the sole decision of classification will be by the domain expert exploiting the predictive models. This process will minimise the subjectivity while assisting clinicians and other stakeholders (such as parents, patients, caregivers, etc.) by offering objectivity for the classification decisions. Physicians and other medical experts will be able to utilise these intelligent models to quickly refer cases for further in-depth clinical evaluation and provide a more accurate rationale for diagnoses of individuals exhibiting above average skills within a particular domain.

Autism screening can be considered as a binary classification since the goal is to forecast whether individuals exhibit autistic traits based on predefined behaviour features (questions/items in the screening method) as shown in Fig. 1. This research investigates the applicability of a machine learning mechanism for ASD traits detection, primarily DNNs. These are advanced and powerful machine learning techniques that have the ability to learn complex non-linear relationships between many dependent and independent variables [14]. Since DNNs have not been widely studied in ASD behavioural research (i.e. [6, 15]), the proposed methodology of utilising DNNs will not only make screening tools more accurate but will also dramatically change the design of future clinical tools. When the neural network is embedded in the self-assessment tool, valuable knowledge is derived for the users while guiding the process of correct classification selection decisions in a more efficient manner.

Figure 1.

Detecting autistic traits as a classification problem in machine learning.

Another contribution of the study is to better understand what components contribute to an efficient data-based ASD screening tool that may be used by health professionals. More specifically, the aim is to establish a self-administered ASD screening method that reliably and accurately provides relevant feedback to patients, caregivers, and medical professionals for professional diagnostic services. The main research question that this paper investigates is:

Can DNNs improve ASD screening in terms of accuracy, sensitivity, and specificity when compared to other intelligent machine learning methods?

This paper is structured as follows: Section 2 critically reviews research works in the literature that adopt machine learning technology in autism research. Section 3 explains data and behavioural features and describes the experimental setup, while results are provided in Section 4. Lastly the conclusions are provided in Section 5.

2. Literature review

Kosmicki et al. [11] investigated the impact of different machine learning techniques to improve the efficiency of conventional autism screening and diagnosis methods. They used data from the Simons Simplex Collection (SSC) version 15 [17] to differentiate between ADHD and autism cases. After processing the dataset, classifiers that were derived from the various machine learning techniques showed a reduction in the number of features – especially by Logistic Regression classifiers when compared to those of Random Forest. Despite the reduction of the features, there was no clear mechanism for distinguishing cases of ADHD from those of ASD. Likewise, Wall et al. [18] applied a number of machine learning algorithms to reduce the number of items in ADOS-R modules. After investigating the results of a decision tree algorithm, the authors claimed that out of the 29 items of ADOS-Revised (module 1), only eight features appeared in the classification system and therefore concluded that the 29 items could be replaced with just eight items. However, a later study by Bone and his colleagues [10] showed serious conceptual and implementation pitfalls with this study.

Kosmicki et al. [11] also presented an intelligent system to differentiate autism cases using a subset of autistic behaviours under the DSM-V. The study evaluated items within the ADOS diagnosis method to achieve its objectives. The intelligent system in this study consisted of four modules based on the language and developmental levels of the subjects. The initial assessment of the first module of ADOS, conducted by a certificated clinician in a clinical setting, suggested that approximately 27% of the subjects were left undiagnosed until the age of eight years old. Therefore, the researchers incorporated Support Vector Machines (SVM) on the second and the third modules of the ADOS with the use of stepwise backward feature selection. Data obtained from 4,240 individuals from several datasets were employed to evaluate the stepwise backward feature selection and the SVM algorithm. The results of the experiments demonstrated that 9 out of 28 behavioral dimensions can be captured through the ADOS module 2 incorporated with the SVM results, whereas module 3 can capture 12 out of 28 behaviours. Likewise, module 3 exhibited 96–97% accuracy in determining the risk of ASD in the considered sample population and reduced the number of individuals left undiagnosed by offering 97% sensitivity. Module 2 was also reported to have a sensitivity of 96.81% and specificity of 89.39%, after the SVM adaptation.

ASD can be characterised by neuroanatomical variations in different regions of the brain, but there have been few studies focused on individual subject features of neuropsychiatric disorders. Addressing this issue, [19] investigated machine learning usability to understand the relationship between structural covariance features and autism symptomology using the inter-regional thickness correlation of the individuals. Through advertising, they recruited 82 cases and 84 controls of autism who were examined at the Institute of Psychiatry, King’s College London, the Autism Research Centre, University of Cambridge, and the Autism Research Group, University of Oxford using the ADOS and Autism Diagnostic Interview-Revised (ADI-R ASD) diagnostic tools [20]. Magnetic Resonance Imaging (MRI) data required to identify the relevant regions of the brain were obtained from the 3T systems at the three designated centres. The FreeSurfer analysis suite [21] was used to obtain average inter-regional thickness values related to each weighted MRI. The results of the study demonstrated a clear relationship between structural covariance of different brain regions and the presence of autism symptomology. The relationship between structural covariance of the left hemisphere regions of the brain was revealed to have a stronger impact than the left hemisphere regions on autistic behaviours. Even though the study did not address the clinical suitability of the machine learning classifiers based on different functional regions of the brain, it provided some evidence on machine learning applicability in discriminating the symptoms of various neuroanatomical impairments.

ASD is heterogeneous in nature and the treatment requirements vary significantly between individuals. In this respect, Vellanki et al. [22] studied how to identify learning patterns of different autism cases to personalise the educational curriculum. This study presented a method of learning patterns through data collected using a TOBY play pad. Since there was a large number of variables covering different types of skills, simulation, and languages, understanding the number of patterns accurately was challenging. To overcome this issue, the study adopted the Linear Poisson Gamma Model (LPGM) [23], a Bayesian non-parametric factor analysis, and Indian Buffet Process (IBP) [24] to create intervention skill for subgroups of children. The authors evaluated the methodology on a dataset of 542 cases and derived 26 latent factors by repeating the same process 2,000 times. These latent factors established the relationship between individual children and their designated learning patterns. The investigation revealed similarities in the children based on their respective learning patterns that remained relatively stable over time. The findings of the study can help to personalise the computer-assisted learning syllabus with enhanced learning opportunities in children’s development areas, identifying the causes by grouping children with similar learning patterns.

Abbas et al. [6] developed an ASD screening instrument for young children ($<$ 5 years of age), powered by machine learning, to provide simpler and more cost-effective screening than the prevailing conventional methods. The proposed screening tool comprises two basic screeners: a parental questionnaire based on ADI-R with 93 different items, and a video screener based on ADOS with the ability to capture short home-made videos provided by the parents. The data required to construct the classifiers was obtained using the ADOS and ADI-R methods at different data centres including Boston Autism Consortium [25], Autism Treatment Network [26], and the Simons Simplex Collection [20]. The clinical validation of the tool was carried out using a sample population of 230 children aged between 18–72 months by assessing each child with different prevailing screening tools; the results were evaluated by a professional clinician. Different machine learning techniques, including Random Forest and Generic, with multiple model variants were incorporated. The classification of cases and controls was enhanced by using the multiple model variants when feature selection was employed. Finally, the screening process further improved when Logistic Regression was adopted to optimise the functionality.

Liu et al. [9] proposed a machine learning system to predict ASD symptomology through the eye movement patterns of individuals. Initial experiments were carried out on two target groups of Chinese children. The first group consisted of three subgroups: 20 ASD children, 21 age-matched typically developing (TD) children, and 20 IQ-matched TD children. Similarly, the second target group consisted of 19 ASD, 22 IQ-matched Intellectually Disabled (ID), and 28 age-matched TD young adults and adolescents. The eye movements and gazing patterns of each assessed subject were captured through a Tobii T60 Eye Tracker [27]. The images captured were analysed using k-means clustering to identify the eye gaze coordinates on the spatial domains and to divide the face into different regions. ASD cases are expected to be differentiated based on the magnitude and directions of both the eye gaze coordinates and eye motion.

George and Joseph [28] employed a model similar to ‘Bag of Words’ to document the sequence of coordinates per image per person. To divide the face into regions based on clusters of gazes, two types of histogram presentations were used. The first one was a hard histogram to capture the frequencies of the gazes, and the other was a soft histogram to capture the gazes falling on the border of two identified regions. Then the prediction models were developed using SVMs to avoid negative data and to identify linear decision boundaries. Finally, subject level predictions with a global threshold were enabled as a scoring framework to interpret functional margins and decision boundaries. The findings of the experiments showed greater potential and effectiveness in the proposed system for identifying symptoms of ASD.

Alwakeel et al. [8] introduced a machine learning-based electronic security alert system with wireless sensory devices that could recognise the child’s gestures and motions. This was to help parents protect their child from potential hazards since children diagnosed with autism often get injured or exposed to danger due to communication difficulties and/or neglect. The authors discussed the concepts, elements, functions, architecture, and operations of the proposed hazard control system. The system was named Autistic Child Sensor Network (ACSN) and comprised a wearable sensor, a mobile parental application, and a home automation system with an intelligent algorithm. Both the wearable sensory device and the parental app were connected with the home automation system, which was fixed at home and had the ability to facilitate communication between the devices. The home automation system can detect a hazardous situation and turn on any designated electronic device to distract the child away from his present position. To perform this task, ACSN’s three components were equipped with a GSM/SMS modem, GPS sensor, temperature sensor, heartbeat sensor, repetitive and undesired movement sensor, sound sensor, real time operating system, ambient controls, and an intelligent battery control with inductive charging and low battery special handling facilities. All these devices gather data to measure and define possible abnormal autistic events and process them through a machine learning algorithm to set threshold values. When an event violates the threshold values set by the machine learning algorithm, the sensors are triggered and warn the other devices through the operating system.

The autism and developmental disabilities network (ADDMN) [7] conducted an annual survey on 8-year-old children diagnosed with autism in the USA to determine whether they met the ASD surveillance criteria through the careful evaluation of obtained data by the clinicians. Since the sample population size kept increasing rapidly, it became more challenging for ADDMN to carry out this task manually. Therefore, [29] presented an intelligent-based automated system that embedded a modified Random Forest classifier for ADDMN to determine the surveillance criteria of the observed population. Data provided by [30] obtained from 1,162 children was used to train the Random Forest classifier. Autism criteria based on DSM-IV-TR [31] and its protocols were fed into the classifier. To maintain the reliability of the classifier’s output, a clinician was also employed for additional review. The classifier used words and phrases to identify the autism case status, and the Bag of Words approach was adopted to capture the words, phrases, and their frequencies. The Random Forest algorithm was then used to subgroup the identified words/phrases and to perform the actual classification. Finally, the developed system was tested on the dataset to determine the validity of the system in discriminating children who did not meet ASD surveillance criteria from children who met the criteria. The results indicated the validity and acceptability of the proposed system, offering 84% sensitivity and an 89.4% predictive rate.

Guillen et al. [32] applied a number of machine learning algorithms fed by survey questionnaire-based databases of the medical records of autistic individuals to provide a better insight on autistic traits and to discover the subtypes of autism. The Autism Research Institute’s (ARI) E2 survey database comprised the medical data of children born after 1960. The data were processed through a four-step procedure to find sub-categories of autism. The first step was modelling the data by adapting the text format such as Attribute Relation File Format (ARFF). The second step involved identifying the appropriate clustering machine learning techniques to simplify the database characteristics. Through a careful analysis of data and available machine learning techniques, Expectation Maximisation (EM) and Minimum Message Length (MML) algorithms were chosen to optimise the data clustering process. The EM algorithm was used for probabilistic clustering, maximising the marginal likelihood, and MML was integrated with the WEKA data mining tool to reveal the clustering performance. After obtaining the EM and MML algorithms’ results, the clusters were processed by applying the RIPPER classification algorithm [33] to extract simple rules that might explain the subtypes of autism. The results of the study identified nine different subtypes of autism through the RIPPER classification systems and the clusters obtained by the EM and MML methods.

Thabtah and Peebles [34] developed a classification algorithm based on the rule induction approach to detect autistic traits at preliminary stages. The authors implemented a classification algorithm that generates simple yet influential knowledge that can be used to reveal correlations among autistic symptoms and response variable (having potential autistic traits). The reported results showed that the classification algorithm was able to predict autistic symptoms with high accuracy when contrasted with classification models derived by decision trees and other rule induction approaches, at least on the datasets considered.

3. Methods

3.1 Datasets

ASD data related to children, adolescents, and adults was collected using a recently developed mobile application for autism screening called ASD Tests [35]. The data is publicly available from the University of California, Irvine (UCI) repository and was collected after obtaining ethical approval from the University of Huddersfield in the United Kingdom. The screening app used for collecting the dataset consists of four modules based on autism screening methods (Q-CHAT-10, AQ-10-Child, AQ-10-Adolescent, AQ-10-Adult) [36]. The app was developed to expedite the screening process and to cover a larger target audience. Here, individual experiments were conducted on adult, adolescent, and child datasets, excluding the toddler dataset since it has fewer than 2% of instances on the spectrum and was imbalanced with respect to the target variable (i.e. ASD traits).

The data were originally obtained after ethical approval from the University of Huddersfield by its respective authors. According to Thabtah [35], and prior to completing the assessment in the ASD Tests app, it was ensured that participants consented to a disclaimer regarding privacy policy, anonymity, research background, and the use of the data. All data collected are anonymous and there was no direct contact with participants. More importantly, the app clearly stated the terms of use of the data and all participants had to agree before submitting any answer to the behavioural questions.

Table 1 shows the main features for each dataset. In particular, features A1-A10 correspond to the actual questions in the AQ-10-Child, AQ-10-Adolescent, and AQ-10-Adult autism screening methods, respectively. Differences were found in A1-A10 features in each dataset since they belong to different screening methods and target a different audience.

Table 1
Features in the child, adolescent, and adult datasets

No	Independent variable	Data type	Comments
1.	A1	Binary	See Table 2
2.	A2	Binary	See Table 2
3.	A3	Binary	See Table 2
4.	A4	Binary	See Table 2
5.	A5	Binary	See Table 2
6.	A6	Binary	See Table 2
7.	A7	Binary	See Table 2
8.	A8	Binary	See Table 2
9.	A9	Binary	See Table 2
10.	A10	Binary	See Table 2
11.	Age	Continuous	Age of participant
12.	Gender	Binary	Male or female
13.	Ethnicity	Categorical	Chosen from a list of predefined values
14.	Jaundice	Binary	Yes or no
15.	Family History	Binary	Whether any family members diagnosed with autism
16.	User	Categorical	Person taking the test (parent, self, relative, caregiver, etc.)

Table 2
Details of variables mapping to the screening methods

Variable in dataset	Corresponding AQ-10-Adult features	Corresponding AQ-10-Adolescent features	Corresponding AQ-10-Child features
A1	I often notice small sounds when others do not	S/he notices patterns in things all the time	S/he often notices small sounds when others do not
A2	I usually concentrate more on the whole picture rather than the small details	S/he usually concentrates more on the whole picture rather than the small details	S/he usually concentrates more on the whole picture rather than the small details
A3	I find it easy to do more than one thing at once	In a social group, s/he can easily keep track of several people’s conversations	In a social group, s/he can easily keep track of several people’s conversations
A4	If there is an interruption, I can switch back to what I was doing very quickly	If there is an interruption, s/he can switch back to what s/he was doing very quickly	S/he finds it easy to go back and forth between different activities
A5	I find it easy to ‘read between the lines’ when someone is talking to me	S/he frequently finds that s/he doesn’t know how to keep a conversation going	S/he doesn’t know how to keep a conversation going with his/her peers
A6	I know how to tell if someone listening to me is getting bored	S/he is good at social chit-chat	S/he is good at social chit-chat
A7	When I’m reading a story I find it difficult to work out the characters’ intentions	When s/he was younger, s/he used to enjoy playing games involving pretending with other children	When s/he is read a story, s/he finds it difficult to work out the character’s intentions or feelings
A8	I like to collect information about categories of things (e.g. types of car, types of bird, types of train, types of plant, etc)	S/he finds it difficult to imagine what it would be like to be someone else	When s/he was in preschool, s/he used to enjoy playing pretending games with other children
A9	I find it easy to work out what someone is thinking or feeling just by looking at their face	S/he finds social situations easy	S/he finds it easy to work out what someone is thinking or feeling just by looking at their face
A10	I find it difficult to work out people’s intentions	S/he finds it hard to make new friends	S/he finds it hard to make new friends

Table 2 depicts the mapping between A1–10 features and their corresponding questions in the screening methods to reveal the differences and similarities of the features. The values in the A1–A10 variables in each dataset have been mapped to ‘0’ or ‘1’ depending on the actual values given during the screening process by the participants. In other words, during the screening using the AQ-10-Child method, ‘1’ was given for questions 1, 5, 7, and 10 if the participants answered any of them with ‘Definitely’ or ‘Slightly Agree’, and ‘0’ otherwise. For the rest of the questions ‘1’ was considered if the answer was ‘Definitely’ or ‘Slightly Disagree’; otherwise ‘0’ was assigned. For the AQ-10-Adolescent method, ‘1’ was allocated to questions 1, 5, 8 and 10 if the given answer was ‘Definitely’ or ‘Slightly Agree’ for each of those questions, whereas ‘1’ was allocated to ‘Definitely’ or ‘Slightly Disagree’ on the remaining questions. Lastly, for the AQ-10-Adult method, ‘1’ was given for ‘Definitely’ or ‘Slightly Agree’ answers for questions 1, 7, 8, and 10. For the rest of the questions in this method ‘1’ was allocated when ‘Definitely’ or ‘Slightly Disagree’ was chosen for questions 2, 3, 4, 5, 6, or 9. This representation of ‘1’ or ‘0’ per feature in the screening method can ease the process of data processing by the machine learning algorithms during the building of the classification systems.
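The mapping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the app's actual code: the function name `encode_answers`, the dictionary `AGREE_ITEMS`, and the exact answer strings (e.g. "Definitely Agree") are assumptions introduced here; only the question numbers per method come from the text.

```python
# Questions for which an "agree" answer is coded as 1 (taken from the text);
# every other question is coded as 1 on a "disagree" answer.
AGREE_ITEMS = {
    "child": {1, 5, 7, 10},
    "adolescent": {1, 5, 8, 10},
    "adult": {1, 7, 8, 10},
}

# Assumed answer labels; the app's exact wording may differ.
AGREE = {"Definitely Agree", "Slightly Agree"}
DISAGREE = {"Definitely Disagree", "Slightly Disagree"}


def encode_answers(method, answers):
    """Map ten Likert answers (question 1 first) to binary features A1..A10."""
    if len(answers) != 10:
        raise ValueError("expected exactly ten answers")
    agree_items = AGREE_ITEMS[method]
    encoded = []
    for question, answer in enumerate(answers, start=1):
        scoring_answers = AGREE if question in agree_items else DISAGREE
        encoded.append(1 if answer in scoring_answers else 0)
    return encoded
```

For example, `encode_answers("adult", ["Definitely Agree"] * 10)` sets only A1, A7, A8, and A10 to 1, because those are the adult items scored on agreement.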

Looking at the data, the adult, child, and adolescent datasets have 1,118, 509, and 249 instances, respectively. In the adult dataset, the majority of the instances belong to the ‘no ASD traits’ class, making the dataset imbalanced. Surprisingly, the child and adolescent datasets are balanced with respect to the number of instances that are linked with the class label (ASD traits), in which there are 127 out of 249 instances with ASD traits in the adolescent dataset and 257 out of 509 in the child dataset. Looking further into the instances with ASD symptoms in the adult and adolescent datasets, it was also a surprise to find that most of them are female, e.g. 69 in the adolescent and 185 in the adult, whereas the majority of instances with ASD traits in the child dataset are male.

3.2 Comparative study

To provide a comparative framework and baseline systems, an initial set of experiments was conducted incorporating different machine learning algorithms. In particular, well-known machine learning algorithms have been compared with DNNs, namely Bayes Net, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), PART, C4.5 (Decision Tree), and multilayer perceptron ANNs to produce automated ASD classifiers. These classifiers reveal the machine learning technology performance (advantages and disadvantages) in terms of accuracy, sensitivity, and specificity when compared with traditional screening (see Eqs (2)–(5)). The reason for employing four machine learning algorithms was to identify the best performing algorithm, especially when different learning mechanisms are adopted. For example, C4.5 employs entropy to build tree-based classifiers, whereas RIPPER uses search methods with a global optimisation procedure to extract simple rules. PART combines both C4.5 and RIPPER strategies to generate hybrid classifiers, and Bayes Net employs a probabilistic concept to build classifiers. All the chosen algorithms have been well-studied in machine learning and were applied to different application domains. In all the comparative experiments, 90% of the data was used for training and 10% for testing. The results of this comparative study are presented in Section 5.

3.3 DNN experimental setup

The next set of experiments was conducted on designing a customised DNN model to detect autistic traits compared to generic multilayer perceptron ANNs. The objective was to refine the evaluation strategy to better measure the generalisability of the proposed model by considering more testing data.

Separate experiments were conducted on each dataset since the independent variables A1 to A10 varied in each dataset. The experiments began by pre-processing the dataset to encode all categorical variables as explained in Tables 1 and 2, and to present them as binary data to facilitate the DNN learning procedure and improve convergence on the training data. Likewise, the ‘age’ variable was presented as bins of size 10 for adult data, 3 for child data, and 1 for adolescent data. The encoding procedure increased the dimensionality of the independent variables from 16 to 31.
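A minimal sketch of this kind of pre-processing is given below. The bin sizes are taken from the text, but everything else is an assumption made for illustration: the helper names `age_bin` and `one_hot`, the choice of bin edges starting at zero, and the use of one-hot vectors as the binary representation. The sketch is not claimed to reproduce the exact 31 input dimensions.

```python
# Bin sizes per dataset, as stated in the text.
AGE_BIN_SIZE = {"adult": 10, "child": 3, "adolescent": 1}


def age_bin(age, dataset):
    """Return the index of the age bin (assumed to start at age zero)."""
    return int(age) // AGE_BIN_SIZE[dataset]


def one_hot(value, categories):
    """Encode a categorical value as a binary vector over known categories."""
    return [1 if value == category else 0 for category in categories]
```

For instance, an adult aged 34 falls into bin `age_bin(34, "adult")`, and that bin index (like ethnicity or the user type in Table 1) can then be passed to `one_hot` together with the list of values observed in the dataset.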

The dependent variable (ASD traits) had two possible values (Yes, No) and was created during the data collection process by transforming the final score generated by the app and computed using the scoring function of the ASD screening methods. To be exact, if the final score obtained was larger than 6, then the class assigned was ‘Yes ASD traits’, otherwise, ‘No ASD traits’. The process of class assignment was automated and built within the screening app.
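The class assignment rule can be sketched as follows. The threshold of 6 and the two class labels come from the text; the function name `assign_class` and the assumption that the final score is the sum of the ten binary features A1–A10 are illustrative and not confirmed details of the app.

```python
def assign_class(binary_features, threshold=6):
    """Label a screening result from the ten binary features A1..A10."""
    # Assumption: the final score is the sum of the binary item scores.
    score = sum(binary_features)
    return "Yes ASD traits" if score > threshold else "No ASD traits"
```

Under these assumptions, a score of exactly 6 is labelled ‘No ASD traits’ because the rule requires the score to be strictly larger than 6.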

Next, each dataset was divided into training and testing datasets as shown in Table 3. The test/train ratios were different because each dataset has a different number of samples. In particular, a bigger proportion of the data was used for training the neural nets in smaller datasets.

Table 3
Training and testing data breakdown per dataset

Dataset	Total number of records	Train ratio	Number of training records	Test ratio	Number of testing records
Adolescent	249	85%	210	15%	38
Child	509	80%	407	20%	102
Adult	1118	70%	782	30%	336

The DNNs used in this study were dense models implemented in Keras [37] with a TensorFlow backend. The hyperparameters were selected using grid search [38] after different architectures and hyperparameters were tried. Table 4 describes the DNNs’ hyperparameters. The number of hidden neurons H was calculated as:

$\displaystyle H=\left|D\right|+\left(0.75\left|I\right|\right),$ (1)

in which $\left|D\right|$ is the number of dependent variables and $\left|I\right|$ is the number of independent variables, as explained by [39].
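As a quick check of Eq. (1), assume one dependent variable (the ASD traits class) and the 31 encoded independent variables, and assume the result is rounded to the nearest integer (the rounding rule is not stated in the text):

$\displaystyle H=1+\left(0.75\times 31\right)=24.25\approx 24,$

which agrees with the 24 hidden neurons reported in Table 4.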

Each experiment was repeated ten times to apply 10-fold cross-validation. New training and testing sets were generated randomly for each fold to maximise the usage of the limited data available, and the neural nets were retrained with this new data with a clean slate. It is important to highlight that the testing data of each fold was not given to the network during training of that fold.
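The resampling procedure described above can be sketched with the Python standard library. This is a hedged illustration, not the authors' code: the function name `random_splits`, the fixed seed, and the truncation used to turn a train ratio into a record count are all assumptions.

```python
import random


def random_splits(n_records, train_ratio, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for each repeated random split."""
    rng = random.Random(seed)
    # Assumption: the training size is the ratio truncated to an integer.
    n_train = int(n_records * train_ratio)
    for _ in range(repeats):
        indices = list(range(n_records))
        rng.shuffle(indices)  # a fresh random partition for every fold
        yield indices[:n_train], indices[n_train:]
```

With the child dataset (509 records, 80% for training) each of the ten splits contains 407 training and 102 testing records, as in Table 3. A new model would be trained from scratch on each training set and evaluated only on the matching test indices, so no test record is seen during training of that fold.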

The DNNs’ training performances (Loss, Accuracy, Mean Squared Error) are provided in Table 5 to demonstrate whether the DNNs converged successfully on the training data. The mean training accuracy over all datasets was 97.32% with an MSE of 0.027. Training was more successful with the adult dataset because more training data was available for the network to analyse.
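The overall figures quoted above can be reproduced as the unweighted mean of the per-dataset means in the last row of Table 5. The sketch below assumes an unweighted average across the three datasets; the variable names are illustrative.

```python
from statistics import fmean

# Per-dataset means taken from the "Mean" row of Table 5.
mean_accuracy = {"adolescent": 95.24, "child": 97.37, "adult": 99.35}
mean_mse = {"adolescent": 0.04060, "child": 0.03000, "adult": 0.01060}

overall_accuracy = round(fmean(mean_accuracy.values()), 2)
overall_mse = round(fmean(mean_mse.values()), 3)
print(overall_accuracy, overall_mse)  # 97.32 0.027
```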

Table 4

DNNs architecture

Number of input neurons	31
Number of hidden layers	2
Number of hidden neurons	24
Number of output neurons	1
Kernel	Uniform
Hidden neurons activation	ReLU
Output neuron activation	Sigmoid
Optimizer	Adam
Dropout	20%
Learning rate $\alpha$	0.001
Mini-batch	128
Loss function	Binary cross entropy

Table 5

DNNs’ training results

Fold	Adolescent data			Child Data			Adult data
	Loss	Accuracy (%)	MSE	Loss	Accuracy (%)	MSE	Loss	Accuracy (%)	MSE
1	0.172871	94.76	0.04500	0.127582	97.30	0.03190	0.044447	99.49	0.00958
2	0.163038	94.76	0.04189	0.125345	97.05	0.03063	0.043761	99.74	0.00895
3	0.161448	95.71	0.04126	0.116378	97.05	0.02896	0.052608	98.98	0.01210
4	0.185780	92.86	0.05078	0.112527	97.79	0.02767	0.046770	99.74	0.00963
5	0.157448	95.71	0.04083	0.127398	96.81	0.03255	0.086983	98.72	0.01703
6	0.149488	95.71	0.03777	0.118668	98.28	0.02843	0.045306	99.23	0.01017
7	0.166950	94.29	0.04242	0.126559	96.81	0.03187	0.041021	99.36	0.00837
8	0.158707	96.19	0.04026	0.125162	97.30	0.03108	0.050155	99.23	0.01104
9	0.131113	96.67	0.03011	0.113886	97.54	0.02839	0.045905	99.49	0.00990
10	0.144262	95.71	0.03525	0.116188	97.79	0.02846	0.043231	99.49	0.00928
Mean	0.159100	95.24	0.04060	0.121000	97.37	0.03000	0.050000	99.35	0.01060

3.4 Evaluation metrics

The following evaluation metrics were used:

1.
Accuracy (%): Accuracy was measured as the ratio of correct classifications to the number of total tests:

$\displaystyle\textit{Accuracy}=\frac{\textit{True Positives}+\textit{True Negatives}}{n}$ (2)

where $n$ is the number of total tests per fold.
2.
Root Mean Squared Error (RMSE %): RMSE indicates how close the DNNs’ predictions (i.e. actual outputs) are to the targets. For a binary classification task, it is an indication of the correctness of the generated probabilities. It was calculated as:

$\displaystyle\textit{RMSE}=\sqrt{\frac{\sum\nolimits_{i=1}^{n}(\textit{Target Output}_{i}-\textit{Actual Output}_{i})^{2}}{n}}$ (3)

Here, since the target output was either 0 or 1, RMSE was already normalised; it was used to present the testing error rate. This metric was only measured during the experiments with the customised DNNs.

3.
Sensitivity (%): sensitivity, like specificity, is a binary classification metric commonly used to validate medical tests and screening studies. It gives the proportion of actual positive cases that are correctly classified as positive; in other words, it is the ratio of subjects with ASD who are correctly identified. It was calculated as:

$\displaystyle\textit{Sensitivity}=\frac{\textit{True Positives}}{\textit{True Positives}+\textit{False Negatives}}$ (4)
4.
Specificity (%): similar to sensitivity, specificity gives the proportion of actual negative cases that are correctly classified as negative, i.e., the proportion of subjects without ASD who were correctly classified as such:

$\displaystyle\textit{Specificity}=\frac{\textit{True Negatives}}{\textit{True Negatives}+\textit{False Positives}}$ (5)
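The four metrics defined in Eqs (2) to (5) can be computed together from the targets and the network outputs. The sketch below is illustrative only: the function names, the 0.5 decision cutoff, and the example values are assumptions rather than details taken from the experiments, and it presumes each fold contains at least one positive and one negative case.

```python
import math


def confusion_counts(targets, predictions):
    """Count TP, TN, FP, FN for binary labels (1 = ASD traits, 0 = none)."""
    tp = sum(1 for t, p in zip(targets, predictions) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(targets, predictions) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(targets, predictions) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(targets, predictions) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(targets, probabilities, cutoff=0.5):
    """Return accuracy, RMSE, sensitivity, and specificity (Eqs 2-5)."""
    predictions = [1 if p >= cutoff else 0 for p in probabilities]
    tp, tn, fp, fn = confusion_counts(targets, predictions)
    n = len(targets)
    # RMSE is taken over the raw outputs, not the thresholded predictions.
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, probabilities)) / n)
    return {
        "accuracy": (tp + tn) / n,
        "rmse": rmse,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Made-up targets and network outputs, for illustration only.
targets = [1, 1, 1, 0, 0, 0, 0, 1]
probabilities = [0.9, 0.8, 0.4, 0.1, 0.2, 0.7, 0.3, 0.6]
print(evaluate(targets, probabilities))
```

In this toy example one positive case is missed and one negative case is flagged, so accuracy, sensitivity, and specificity all come out at 0.75.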

Figure 2.
Predictive accuracy generated by the machine learning algorithms on the ASD datasets.

4. Results

Figure 2 depicts the accuracy rates of the comparative experiments with the different machine learning algorithms explained in Section 3.2 on the adult, adolescent, and child datasets. The classification accuracy results show that the perceptron-based ANN outperformed the other considered algorithms on all three datasets. The differences in classification accuracy between ANN and C4.5, PART, RIPPER, and Bayes Net are (6.48%, 3.88%, 5.77%, 4.43%) on the adult dataset, (16.53%, 11.29%, 14.92%, 5.65%) on the adolescent dataset, and (10.61%, 8.45%, 12.97%, 5.31%) on the child dataset, respectively. The result on the adolescent dataset is particularly noticeable, since ANN outperformed the remaining algorithms by the widest margins there. It appears that the neural network algorithm was not highly sensitive to the low number of training instances in the adolescent dataset, whereas the Bayes Net, C4.5, RIPPER, and PART algorithms were.

Figure 3.

Sensitivity rates generated by the machine learning algorithms on the ASD datasets.

Figures 3 and 4 show the sensitivity and specificity rates produced by the considered machine learning algorithms. The sensitivity rates also show the superiority of the neural network when compared to C4.5, PART, RIPPER, and Bayes Net on all datasets. For instance, on the adult dataset ANN produced sensitivity rates that were 6.59%, 3.99%, 5.89%, and 4.59% higher than those of C4.5, PART, RIPPER, and Bayes Net, respectively. The differences became larger on the adolescent dataset because it has fewer instances than the adult dataset. To be exact, ANN achieved 16.50%, 11.30%, 14.90%, and 5.70% higher sensitivity rates on the adolescent dataset than the C4.5, PART, RIPPER, and Bayes Net algorithms, respectively. These higher sensitivity rates of ANN were consistent with the predictive accuracy results explained earlier.

Figure 4.

Specificity rates generated by the machine learning algorithms on the ASD datasets.

The specificity rates of the considered algorithms were consistent with the previous accuracy and sensitivity results: ANN showed higher specificity rates than the remaining algorithms on all datasets. One noticeable result was the specificity rate of the PART algorithm on the adult dataset, which is high when compared with Bayes Net, C4.5, and RIPPER. On this dataset, according to the confusion matrix (which consists of the true negatives, true positives, false negatives, and false positives), PART had the second-lowest number of false positives, i.e., 11 instances. In other words, 11 instances without ASD traits were wrongly classified as on the spectrum by the PART classifier. The numbers of false positives for Bayes Net, C4.5, RIPPER, and ANN were 14, 22, 18, and 2, respectively. These results explain why PART achieved a higher specificity than the remaining algorithms, except ANN, on the adult dataset.

Overall, the classification accuracy, sensitivity, and specificity rates generated by the machine learning algorithms on the adult, adolescent, and child datasets were at an acceptable level, and the ANN classifiers were superior in all experiments. The classifiers generated from the adult dataset by the considered machine learning algorithms showed good performance, mainly because this dataset contains more records. In the next sub-section, we provide an in-depth analysis of the customised DNN results. More specifically, we change the testing environment by considering a more realistic split during the training phase of the DNNs.

4.1 Detailed DNN experimental results

Before presenting the DNNs’ results, it is important to note that the algorithms used in the comparative study employed 90% of the data for training; hence, they are expected to produce more accurate classifiers, as more data was available to adjust the parameters. By contrast, as explained in Section 3.3, the number of test samples was increased in the experiments with the customised DNNs to ensure the validity of the testing procedure, so less data was available to train the DNNs in each fold. We therefore investigated the behaviour of the proposed model using a more realistic evaluation setup based on the existing data representations within the three considered datasets.

Table 6
Mean cross-validation testing results

Dataset	Mean RMSE %	Mean accuracy %	Mean sensitivity %	Mean specificity %
Adolescent	22.75	92.37	94.83	89.77
Child	19.14	95.69	97.75	93.70
Adult	11.93	98.69	97.59	99.21
Overall	17.94	95.58	96.72	94.22

Table 7

Type I and II errors per dataset

Dataset	False positive ratio (Type I error)	False negative ratio (Type II Error)
Adolescent	5%	3%
Child	3%	1%
Adult	1%	1%

Figure 5.

Root mean squared error rate cross-validation testing results.

Figure 6.

Accuracy rates cross-validation testing results.

Figure 7.

Sensitivity rates cross-validation testing results.

Figure 8.

Specificity rates cross-validation testing results.

Figure 9.

Specificity and sensitivity statistical distribution analysis.

Figures 5–8 depict the testing results of the experiments with the DNNs, while the cross-validation mean results are provided in Table 6. The highest accuracy rate was achieved on the adult dataset because more training and testing data were available. Likewise, the DNNs trained with the adolescent dataset performed worse than the other DNNs since there were not enough data samples available to properly model the data. This lack of sufficient data resulted in outliers during the experiments with the adolescent dataset, as can be seen in the results. This problem is expected to diminish when more screening data samples become available.

Type I and Type II errors are provided in Table 7. The data presented are based on the mean of the false reports obtained in all cross-validation folds per dataset. Both error types remained low on every dataset, at no more than 5% for Type I and 3% for Type II errors, falling to 1% each on the adult dataset, which supports the reliability of employing neural networks for detecting autistic traits.

Statistical analysis of specificity and sensitivity according to their standard deviations $\sigma$ is provided in Fig. 9. After removing one outlier observation from each metric (i.e. 3% of the overall observations), at least 89% of the observations of both metrics fall within $\pm 1\sigma$ and the remaining observations are within $\pm 2\sigma$. Thus, it can be concluded that the observations of both metrics follow a normal distribution.
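The share of observations falling within a given number of standard deviations can be computed as sketched below. The helper `share_within`, the use of the population standard deviation, and the per-fold values are assumptions made for illustration; they are not the observations plotted in Fig. 9.

```python
from statistics import mean, pstdev


def share_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mu, sigma = mean(values), pstdev(values)
    inside = [v for v in values if abs(v - mu) <= k * sigma]
    return len(inside) / len(values)


# Made-up per-fold sensitivity values (%), for illustration only.
sensitivities = [94.0, 95.5, 96.0, 96.5, 97.0, 97.0, 97.5, 98.0, 98.5, 100.0]
print(share_within(sensitivities, 1), share_within(sensitivities, 2))
```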

5. Conclusions

ASD is a growing neurodevelopmental condition worldwide that requires efficient and accurate self-administered screening tools. These tools can be utilised by parents, physicians, medical staff, caregivers, and teachers, among others, to screen for autistic traits at an early stage and expedite the referral process, potentially reducing further development of the condition. Existing screening tools typically rely on characterised behavioural features (items) and a scoring function that tallies the responses given to derive the screening decision. However, most existing tools are criticised because the scoring function is partly subjective and depends on the domain expert’s knowledge and skills. Therefore, a promising direction for improving autism screening is to build classification systems using machine learning that could potentially replace current scoring functions.

This paper has investigated the applicability of machine learning to the problem of detecting ASD traits and assessed its performance. We examined the possibility of using DNNs, instead of scoring functions, in autism screening methods and measured the ASD detection process with reference to different evaluation metrics. The DNN model was implemented and then evaluated on three new screening datasets (adult, adolescent, and child) that were collected using a screening tool called ASDtests. To measure the performance of the DNN model, we compared the classifiers generated from the adult, adolescent, and child datasets by different machine learning algorithms (Bayes Net, C4.5, PART, RIPPER, multilayer perceptron) with respect to sensitivity, specificity, and predictive accuracy. The ANN classifiers derived from the considered datasets outperformed those of Bayes Net, C4.5, PART, and RIPPER with respect to accuracy, sensitivity, and specificity rates.

These results indicate that neural networks can be embedded within existing ASD screening to act as a detection system for ASD traits. Such systems could assist physicians, parents, and medical staff, among others, in the early identification of ASD traits, thus facilitating access to the needed support systems for the social, physical, and educational well-being of the patient and family. In the near future, we intend to investigate autism diagnosis using machine learning, with the aim of developing a new knowledge base that can help clinicians, psychiatrists, and psychologists in the formal ASD diagnostic process.

A limitation of this study is that the impact of the DNN technique was not assessed in comparison to a clinical diagnosis. It would be helpful to analyse how consistent the specificity and sensitivity of this technique are with formal diagnosis approaches such as ADI-R, ADOS, 3DI, etc., and, where there are inconsistencies, whether the subjectivity of a formal diagnosis affected the outcomes.

Footnotes

Conflict of interest

The authors declare that they have no conflict of interest.

References

1. Fitzgerald. The clinical gestalts of autism: over 40 years of clinical experience with autism. In: Autism – Paradigms, Recent Research and Clinical Applications. InTech; 2017.

2. Weir, Allison, Baron-Cohen. Autism in children: improving screening, diagnosis and support. Prescriber. Jan 2020; 31(1): 20-24. doi: 10.1002/psb.1816.

3. Crane, Chester, Goddard, Henry, Hill. Experiences of autism diagnosis: a survey of over 1000 parents in the United Kingdom. Autism Int J Res Pract. Feb 2016; 20(2): 153-162. doi: 10.1177/1362361315573636.

4. Thabtah. Autism spectrum disorder screening: machine learning adaptation and DSM-5 fulfillment. In: Proceedings of the 1st International Conference on Medical and Health Informatics 2017 – ICMHI ’17, 2017; pp. 1-6. doi: 10.1145/3107514.3107515.

5. Marlow, Servili, Tomlinson. A review of screening tools for the identification of autism spectrum disorders and developmental delay in infants and young children: recommendations for use in low- and middle-income countries. Autism Research. 2019. doi: 10.1002/aur.2033.

6. Abbas, Garberson, Glover, Wall. Machine learning approach for early detection of autism by combining questionnaire and home video screening. J Am Med Informatics Assoc. Aug 2018; 25(8): 1000-1007. doi: 10.1093/jamia/ocy039.

7. Bakian, Bilder, Carbone, Hunt, Petersen, Rice. Brief report: independent validation of autism spectrum disorder case status in the Utah Autism and Developmental Disabilities Monitoring (ADDM) Network Site. J Autism Dev Disord. Mar 2015; 45(3): 873-80. doi: 10.1007/s10803-014-2187-6.

8. Alwakeel, Alhalabi, Aggoune, Alwakeel. A machine learning-based WSN system for autism activity recognition. In: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Dec 2015; pp. 771-776. doi: 10.1109/ICMLA.2015.46.

9. Liu, Raj, Zou. Efficient autism spectrum disorder prediction with eye movement: a machine learning framework. In: 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), Sep 2015; pp. 649-655. doi: 10.1109/ACII.2015.7344638.

10. Bone, Bishop, Black, Goodwin, Lord, Narayanan. Use of machine learning to improve autism screening and diagnostic instruments: effectiveness, efficiency, and multi-instrument fusion. J Child Psychol Psychiatry. Aug 2016; 57(8): 927-937. doi: 10.1111/jcpp.12559.

11. Kosmicki, Sochat, Duda, Wall. Searching for a minimal set of behaviors for autism detection through feature selection-based machine learning. Transl Psychiatry. Feb 2015; 5(2): e514-e514. doi: 10.1038/tp.2015.7.

12. Shahamiri, Thabtah. Autism AI: a new autism screening system based on artificial intelligence. Cognit Comput. Jul 2020; 12(4): 766-777. doi: 10.1007/s12559-020-09743-3.

13. Metcalfe, McKenzie, McCarty, Murray. Screening tools for autism spectrum disorder, used with people with an intellectual disability: a systematic review. Research in Autism Spectrum Disorders. Jun 2020; 74: 101549. doi: 10.1016/j.rasd.2020.101549.

14. Shahamiri, Thabtah. An investigation towards speaker identification using a single-sound-frame. Multimed Tools Appl. 2020; 7: 31265-31281. doi: 10.1007/s11042-020-09580-4.

15. Hosseinzadeh, Koohpayehzadeh, Omar Bali, Afshin Rad, Souri, Mazaherinezhad, et al. A review on diagnostic autism spectrum disorder approaches based on the Internet of Things and Machine Learning. J Supercomput. Jun 2020; pp. 1-19. doi: 10.1007/s11227-020-03357-0.

16. Abdelhamid, Thabtah. Associative classification approaches: review and comparison. J Inf Knowl Manag. Sep 2014; 13(3): 1450027. doi: 10.1142/S0219649214500270.

17. Fischbach, Lord. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron. Oct 2010; 68(2): 192-195. doi: 10.1016/J.NEURON.2010.10.006.

18. Wall, Kosmicki, DeLuca, Harstad, Fusaro. Use of machine learning to shorten observation-based screening and diagnosis of autism. Transl Psychiatry. Apr 2012; 2(4): e100-e100. doi: 10.1038/tp.2012.10.

19. Sato, Hoexter, Oliveira Jr., Brammer, MRC AIMS Consortium, Murphy, et al. Inter-regional cortical thickness correlations are associated with autistic symptoms: a machine-learning approach. J Psychiatr Res. Apr 2013; 47(4): 453-459. doi: 10.1016/j.jpsychires.2012.11.017.

20. Geschwind, Sowinski, Lord, Iversen, Shestack, Jones, et al. The autism genetic resource exchange: a resource for the study of autism and related neuropsychiatric conditions. Am J Hum Genet. Aug 2001; 69(2): 463-6. doi: 10.1086/321292.

21. Fischl. FreeSurfer. Neuroimage. Aug 2012; 62(2): 774-781. doi: 10.1016/j.neuroimage.2012.01.021.

22. Vellanki, Duong, Gupta, Venkatesh, Phung. Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data. Knowl Inf Syst. Apr 2017; 51(1): 127-157. doi: 10.1007/s10115-016-0971-7.

23. Gupta, Phung, Venkatesh. A slice sampler for restricted hierarchical beta process with applications to shared subspace learning. Oct 2012. Accessed: Oct 12, 2018. [Online]. Available: http://arxiv.org/abs/1210.4855.

24. Griffiths, Ghahramani. The Indian buffet process: an introduction and review. J Mach Learn Res. 2011; 12: 1185-1224. Accessed: Oct 12, 2018. [Online]. Available: https://dl.acm.org/citation.cfm?id=2021039.

25. Wolfson. Boston Autism Consortium searches for genetic clues to autism’s puzzle. Chem Biol. Feb 2007; 14(2): 117-8. doi: 10.1016/j.chembiol.2007.02.002.

26. Kuhlthau, Orlich, Hall, Sikora, Kovacs, Delahaye, et al. Health-related quality of life in children with autism spectrum disorders: results from the Autism Treatment Network. J Autism Dev Disord. Jun 2010; 40(6): 721-729. doi: 10.1007/s10803-009-0921-2.

27. Weigle, Banks. Analysis of eye-tracking experiments performed on a Tobii T60. Jan 2008; 6809: 680903. doi: 10.1117/12.768424.

28. George, Joseph. Text classification by augmenting Bag of Words (BOW) representation with co-occurrence feature. IOSR J Comput Eng. 2014; 16(1): 34-38. doi: 10.9790/0661-16153438.

29. Maenner, Yeargin-Allsopp, Van Naarden Braun, Christensen, Schieve. Development of a machine learning algorithm for the surveillance of autism spectrum disorder. PLoS One. Dec 2016; 11(12): e0168224. doi: 10.1371/journal.pone.0168224.

30. U.S. Department of Health and Human Services. Autism and Developmental Disabilities Monitoring (ADDM) Network. https://www.cdc.gov/ncbddd/autism/data.html (accessed Oct 12, 2018).

31. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Association; 2013.

32. Guillén, Jensen, Edelson. A machine learning approach for identifying subtypes of autism. In: Proceedings of the ACM International Conference on Health Informatics – IHI ’10, 2010; p. 620. doi: 10.1145/1882992.1883091.

33. Cohen, Singer. Context-sensitive learning methods for text categorization. ACM Trans Inf Syst. Apr 1999; 17(2): 141-173. doi: 10.1145/306686.306688.

34. Thabtah, Peebles. A new machine learning model based on induction of rules for autism detection. Health Informatics J. 2019. doi: 10.1177/1460458218824711.

35. Thabtah. An accessible and efficient autism screening method for behavioural data and predictive analyses. Health Informatics J. Sep 2018; p. 146045821879663. doi: 10.1177/1460458218796636.

36. Allison, Auyeung, Baron-Cohen. Toward brief ‘red flags’ for autism screening: the short autism spectrum quotient and the short quantitative checklist in 1,000 cases and 3,000 controls. J Am Acad Child Adolesc Psychiatry. Feb 2012; 51(2): 202-212.e7. doi: 10.1016/j.jaac.2011.11.003.

37. Chollet. Keras. GitHub; 2015.

38. LeCun, Bengio, Hinton. Deep learning. Nature. 2015; 521(7553): 436-444. doi: 10.1038/nature14539.

39. Shahamiri. Speech vision: an end-to-end deep learning-based dysarthric automatic speech recognition system. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2021; 29: 852-861. doi: 10.1109/TNSRE.2021.3076778.
