ADET MODEL: Real time autism detection via eye tracking model using retinal scan images

Abstract

Background

Deficits in concentration with social stimuli are more common in children affected by autism spectrum disorder (ASD). Developing visual attention is one of the most vital elements for detecting autism. Eye tracking technology is a potential method to identify an early autism biomarker based on children's abnormal visual patterns.

Objective

Eye-tracking retinal scan path images are generated from eyeball movements recorded while a child watches the screen; they capture the sequence of eye projections, which helps to analyze the child's behavior. The Shi-Tomasi corner detection method, implemented with OpenCV, is used to identify the corners of the eye gaze movement in the images.

Methods

In the proposed ADET model, the corner detection-based vision transformer (CD-ViT) technique is utilized to diagnose autism at an early stage. Generally, the transformer model divides the input images into patches, which can be fed into the transformer encoder process. The vision transformer is fine-tuned to resolve binary classification issues once the features are extracted via remora optimization. Specifically, the vision transformer model acts as the cornerstone of the proposed work with the help of the corner detection technique. This study uses a dataset with 547 eye-tracking retinal scan path images for both autism and non-autistic children.

Results

Experimental results show that the suggested ADET framework achieves a better classification accuracy of 38.31%, 23.71%, 13.01%, 1.56%, 18.26%, and 44.56% than RM3ASD, MLP, SVM, CNN, SVM, and our proposed ADET methods.

Conclusions

These results strongly suggest that the screening method can assist medical professionals in providing efficient and accurate autism detection.

Keywords

eye tracking; autism; vision transformer; eyeball movement image

1. Introduction

Autism is the most prevalent kind of Pervasive Developmental Disorder (PDD). ASD is characterized by communication and social difficulties, as well as the emergence of restricted and repetitive behavioral patterns.¹ An estimated 10 million individuals in India are believed to have ASD. According to WHO figures for 2022, one in every 100 children globally has ASD.² A greater likelihood of developing autism has been scientifically associated with a variety of factors, including preterm delivery, environmental factors, genetics, maternal health concerns, and advanced parental age.³ Some children start to show symptoms of autism as early as 12 months of age, while for other children it may take 24 months or more.⁴ Generally, the early symptoms of autism arise between the ages of 1 and 2. Lack of eye contact and reduced responsiveness to noise are some early indicators of autism in children younger than one year of age.⁵

Some two- and three-year-old children with ASD show the following symptoms: limited communication, hypersensitivity or decreased sensitivity to sensory stimuli, difficulty following simple instructions, rejection of affection, overactive or distracted behavior, and repetitive actions such as hand flapping, spinning, and making unusual noises. The typical co-occurrence of ASD with other neurodevelopmental problems and medical comorbidities makes it difficult to detect and diagnose accurately.^6,7 Several research investigations have demonstrated the positive and negative consequences of conducting evaluations without objective diagnostic testing.⁸ A lack of appropriate professional education and training has resulted in missed opportunities for detecting autism in its early stages.⁶ Research has used various input modes, such as brain signal analysis, brain image analysis, biometrics, sensory inputs, facial expression images, and eye tracking models.⁹

Eye tracking has been studied by many researchers and is a promising route to identifying ASD. It collects eye movement and eye gaze data from toddlers and preschool children and serves to determine the direction of eye gaze and the latency of eye movements.¹⁰ A child with ASD shows different eye gaze and movement latency in data collected from different devices. The eye movements recorded by the tracker device are passed on as diagnostic data.^11–14 Eye movements can be categorized into four types: scan paths, fixations, blink rate, and saccades. Recently, eye-tracking technology has been used with new techniques to diagnose ASD.¹⁵ Eye contact is one of the most important parts of non-verbal and social communication. Children diagnosed with ASD typically exhibit abnormal eye movements.¹⁶

Eye movement-based ASD detection has been extensively studied, yet certain issues remain. First, existing technology has mainly relied on statistical analysis of the eye gaze movements of autistic and non-autistic children.¹⁷ Although this gives a good classification result, the requirements of the analysis are too high. Second, many of the algorithms use only fixation values, which describe how many times the eyes are stationary.¹⁸ But the actual movement of the eyes is a dynamic process, and the relationship between the eyes' fixations also provides crucial data.¹⁹

Many authors have developed models for detecting ASD among children. In 2019, Y. Tao and M.L. Shyu²⁰ proposed SP-ASDNet, which utilizes both LSTM and CNN networks to detect ASD. It achieves 74.22% classification accuracy but requires more computational power. In 2019, M. Krol et al.²¹ created an algorithm that compares eye-movement sequences, statistically compares the cross-validated accuracies, and finds the differences between temporal scan path features. However, because there is no explicit dimension reduction function, new high-dimensional points cannot be added without re-running the entire analysis.

In 2019, Eraslan et al.,²² designed Scan path Trend Analysis (STA), which combines a collection of eye-movement pathways into a single representative path and determines the trending path of a group of users on a webpage. The dataset in this case only includes six web pages, which is insufficient to fully explore the impact of characteristics. In 2020, S. Zhang et al.,²³ established a strategy for the combined analysis of children's eye tracking and EEG. It analyzes the connection between eye-tracking and EEG records and emphasizes their functional relationship. The experimental result shows a classification accuracy of 95% for identifying the ASD children. Multi-model fusion analysis always chooses simple fusion strategies; thus, the categorization model's performance is very limited.

In 2020, Roth et al.²⁴ demonstrated an interactive dyadic system, which combines several communication channels for recording non-verbal behavior and finds the differences using computer-aided diagnosis. The limited study sample and random sampling with respect to age and gender distribution do not provide sustainable results. In 2021, Akter et al.²⁵ applied a k-means clustering algorithm to an eye-tracking dataset and reported stable results, with evaluation metrics that justified the performance of different classifiers. This is helpful for diagnosing ASD for better treatment. However, defining age boundaries for the children would help to extend the investigation and detect autism more precisely.

In 2022, Ahmed et al.¹⁹ used eye-tracking scan path images for diagnosing ASD by developing three artificial intelligence techniques: machine learning, deep learning, and hybrid models. The dataset was balanced, and the models were adjusted to extract deep features and solve the overfitting problem. Misclassification arises very rarely, but the computational cost of the hybrid model is too high. In 2022, Gaspar et al.²⁶ took the scan path images of 219 ASD and 328 normal children and used a metaheuristic approach, Giza Pyramids Construction, to optimize a kernel extreme learning machine model, reaching a classification accuracy of 98.8%. Including more participants and recorded eye movements is left for future work.

In 2022, G. Wan et al.²⁷ studied the fixation duration of typically developing (TD) and ASD children using 10-s videos of a female speaking. This approach does not identify functional-level indicators such as IQ and adaptive behavior. Moreover, the sample size is very small and does not target any specific age group. According to these previous studies, most existing techniques used eye tracking to show that children with autism had gaze patterns distinct from those of typical children. The approaches discussed above have certain shortcomings, including lower classification accuracy, high computational cost, and limited computational power in autism detection. Therefore, in this paper a novel ADET model is proposed to diagnose autism at an early stage using the corner detection-based vision transformer (CD-ViT) technique.

The primary contributions of this study are:

Eye-movement scan path images are collected from autistic and non-autistic children. Pre-processing, the most important part, balances the images in the dataset and enhances the eye-movement images.

The Shi-Tomasi corner detection method is applied using OpenCV to identify the corners of the eye gaze movement in the images.

A vision transformer model is trained on the dataset and shows promising results, demonstrating better performance on eye-based image classification tasks.

Finally, the model is compared with different algorithms and gives better performance.

The remaining portions of the paper are arranged as follows: Section 2 presents an overview of the pertinent research. Section 3 presents a description of the proposed technique. Section 4 presents the experimental evaluation. Section 5 presents a discussion of this study.

2. Proposed methodology

This section briefly describes the classification process for autism eye retinal scan path images. The input image is first preprocessed by adjusting its intensity level, discarding redundant images, and removing noise. The second stage identifies the corners of the images, which are marked separately; this helps to train the model easily and effectively.

The third stage is feature extraction. The features are extracted using the vision transformer model, which splits the images into patches. Each patch is converted into tokens, and the model is trained on them with self-attention-based approaches. Moreover, the corner vision transformer model differentiates the eye retinal scan path features and classifies the images. The corner detection-based vision transformer (CD-ViT) technique of the proposed ADET model is illustrated in Figure 1.

Figure 1.

The proposed ADET model architecture.

2.1 Splitting the dataset images

The dataset contains eye tracking and retinal scan path images, and it is implemented for detecting autism in children. Generally, there are 547 images, which are divided into autism and non-autism images. The split image is described in the following Table 1.

Table 1.
Splitting the eye-tracking retinal scan path image dataset.

		Training and Validation image (80%)
Eye Tracking retinal scan path image	Total images	Training Image (80%)	Validation Image (20%)	Testing Image (20%)
ASD	219	140	35	44
Non-ASD	328	207	55	66

2.1.1 Shi-Tomasi corner detection

The Shi-Tomasi (ST) algorithm is an enhancement of the traditional corner identification algorithm, the Harris algorithm. In general, it produces better corners than the Harris algorithm. This part describes the conceptual foundations of the ST algorithm. The Harris algorithm's primary method is to traverse the image with a local window and determine whether the grayscale (gs) values change significantly. If the gs values inside the window (as shown on the gradient map) exhibit notable differences, there is a corner in the area where the window is situated.

First, a mathematical model is established to identify the windows in which the grayscale values change significantly. When the window's centre is positioned at a given spot in the grayscale image, the grayscale values of the pixels at that spot are used as the starting values. If the window is moved slightly along the x and y axes, by s and t respectively, the difference in pixel gs values shows the change caused by the movement. In the simplest case, when the window is an average filtering kernel in which every pixel has a weight of 1, the variation in pixel gs values that arises from shifting the window in different directions is:

R (s, t) = \sum_{x, y} z (x, y) [r (x + s, y + t) - r (x, y)]^{2}

(1)

Following the expansion with Taylor's formula, the approximation is provided by:

R \approx [s, t] (\sum_{x, y} z (x, y) [\begin{array}{cc} r_{x}^{2} & r_{x} r_{y} \\ r_{x} r_{y} & r_{y}^{2} \end{array}]) (\begin{matrix} s \\ t \end{matrix})

(2)

The following expression can roughly be obtained for minor local displacements [s, t]:

R \approx [s, t] N (\begin{matrix} s \\ t \end{matrix})

(3)

N is a 2 × 2 matrix that was produced by taking the image's derivatives:

N = \sum_{x, y} z (x, y) [\begin{array}{cc} r_{x}^{2} & r_{x} r_{y} \\ r_{x} r_{y} & r_{y}^{2} \end{array}]

(4)

When the matrix N is diagonalized, the grayscale change rates along the X and Y axes are represented by the eigenvalues $γ_{1}$ and $γ_{2}$, respectively.

N = \sum_{x, y} z (x, y) [\begin{array}{cc} r_{x}^{2} & r_{x} r_{y} \\ r_{x} r_{y} & r_{y}^{2} \end{array}] = [\begin{array}{cc} γ_{1} & 0 \\ 0 & γ_{2} \end{array}]

(5)

The corner response function for the Harris corner identification technique is:

A = γ_{1} γ_{2} - G (γ_{1} + γ_{2})^{2}

(6)

As part of the Harris corner detection technique, the corner response function is thresholded (A > threshold) and the local maxima of A are taken as corners. Shi-Tomasi's approach is an enhancement of Harris's, in which a point is deemed a corner if the minimum eigenvalue, min($γ_{1}$, $γ_{2}$), is greater than a minimal threshold value.

The ST corner detection algorithm's corner response function is:

A = min (γ_{1}, γ_{2})

(7)
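
To make the difference between the two response functions concrete, the following minimal sketch evaluates Eq. (6) and Eq. (7) on a 2 × 2 symmetric matrix. The function names, the example matrix entries, and the constant G = 0.04 are assumptions chosen for illustration; the paper does not state the value of G.

```python
import math

def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + radius, mean - radius

def harris_response(a, b, c, g=0.04):
    """Eq. (6): product of the eigenvalues minus g times their squared sum."""
    g1, g2 = eigenvalues_2x2(a, b, c)
    return g1 * g2 - g * (g1 + g2) ** 2

def shi_tomasi_response(a, b, c):
    """Eq. (7): the smaller of the two eigenvalues."""
    return min(eigenvalues_2x2(a, b, c))

# Hypothetical window sums of r_x^2, r_x * r_y and r_y^2 (entries of N).
edge = (100.0, 0.0, 0.5)      # strong grayscale change along one axis only
corner = (100.0, 10.0, 80.0)  # strong grayscale change along both axes

print(shi_tomasi_response(*edge), shi_tomasi_response(*corner))
```

With these illustrative values the edge-like window has a small minimum eigenvalue, while the corner-like window has two large eigenvalues, which is the property the ST criterion thresholds on.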

Figure 2 indicates the functions of the ST corner detection algorithm. This function uses:

Initially, this method uses either Shi-Tomasi or Harris Corner to determine the corner quality score at each pixel.

Subsequently, a non-maximum suppression is carried out using this function, keeping the local maximums in the 3 × 3 neighborhood.

Subsequently, all corners with a quality score below $qualityLevel * \max_{x, y} qualityScore (x, y)$ are eliminated, where $\max_{x, y} qualityScore (x, y)$ is the best corner score. For example, if the best corner has a quality score of 1500 and the quality level is 0.01, all corners with a quality score of less than 15 are discarded.

At this point, the quality score is used to sort every remaining corner in descending order.

Finally, the function discards every corner for which a stronger corner is present at a distance shorter than maxDistance.
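
The filtering steps above can be sketched in plain Python as follows. This is an illustrative approximation, not the library routine: it assumes the per-pixel quality scores have already been computed and reduced to local maxima, and the function name, the `min_distance` parameter name, and the sample corners are hypothetical.

```python
import math

def select_corners(scored, quality_level=0.01, min_distance=10.0):
    """scored: list of (x, y, quality_score). Returns the corners that are kept."""
    if not scored:
        return []
    best = max(score for _, _, score in scored)
    threshold = quality_level * best
    # Discard corners whose score is below quality_level * best score.
    candidates = [c for c in scored if c[2] >= threshold]
    # Sort the remaining corners by quality score, strongest first.
    candidates.sort(key=lambda c: c[2], reverse=True)
    kept = []
    for x, y, score in candidates:
        # Drop a corner when a stronger, already kept corner lies too close.
        if all(math.hypot(x - kx, y - ky) >= min_distance for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept

# Best score 1500 and quality level 0.01 give a cut-off of 15, as in the text.
corners = [(10, 10, 1500.0), (12, 11, 900.0), (60, 40, 14.0), (80, 20, 300.0)]
kept = select_corners(corners, quality_level=0.01, min_distance=10.0)
print(kept)
```

In this example the corner with score 14 falls below the cut-off of 15, and the corner at (12, 11) is removed because the stronger corner at (10, 10) is closer than the distance limit.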

Figure 2.

Function of Shi-Tomasi corner detection algorithm.

2.1.2 Patch the dataset images

Corners are detected with a computer vision library, which provides a default function for this purpose. The Shi-Tomasi method detects corners by means of a scoring function R, calculated as:

R = min (λ_{1}, λ_{2})

A pixel is regarded as a corner if its R value is higher than the threshold value. The values $λ_{1}$ and $λ_{2}$ are the eigenvalues of the resultant matrix. The output image is named $E_{Corner} (x)$. After Shi-Tomasi corner detection, the eye image E (x) is converted to:

E (x) = E_{Corner} (x)

Figure 3 represents the ADET model corner detection of eye-tracking retinal scan path images. It identifies all the corners where the eyeball movement is moving.

Figure 3.

Eye tracking retinal scan path image after applying ADET model corner detection technique.

2.1.3 Positional embedding

Positional embedding keeps the positional information of a collective embedded patch and also indicates the sequential position of all the patches. Finally, it can create a positional number before every patch in a one-dimensional order. The resultant sequence of positional embedding gives the input to the transformer encoder.

2.1.4 Transformer encoder

A learnable class token $X_{c}$ is first prepended to the positionally embedded patches, so the learned embedding is a sequence of embedded patches together with the class token. The input to the first encoder layer, Z₀, is described by the following equation:

Z_{0} = [X_{c}; X_{p}^{1} E; X_{p}^{2} E; \dots; X_{p}^{N} E] + E_{P I}

(8)

where,

E \in R^{(P^{2} \cdot C) \times D}

(9)

E_{PI} \in R^{(N + 1) \times D}

(10)

Here, $X_{c}$ is the class label token, $X_{p}^{N}$ denotes the patch images with $N \in 1$ to $K$, and $E_{PI}$ denotes the positional information, which is stored in sequential order.

Every transformer encoder needs a class token at the 0th position when a pre-trained model is used. When the patch images are sent to the encoder as an input sequence, one class token must be placed as the first entry. The encoder consists of several identical layers. Each layer has two primary blocks: the Multi-head Self-Attention (MSA) block and the FFN block.
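
As a rough illustration of Eq. (8), the sketch below splits a small image into patches, projects each flattened patch with E, prepends the class token, and adds the positional embedding. All sizes and values are toy assumptions (a 4 × 4 single-channel image, P = 2, D = 3); the paper does not state its patch size or embedding dimension, and the helper names are hypothetical.

```python
def split_into_patches(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch blocks."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch) for dx in range(patch)])
    return patches

def matvec(vec, matrix):
    """Row vector of length P times a P x D matrix, giving a length-D vector."""
    return [sum(v * row[d] for v, row in zip(vec, matrix))
            for d in range(len(matrix[0]))]

def build_z0(image, patch, E, class_token, E_pos):
    """Eq. (8): [X_c; X_p^1 E; ...; X_p^N E] + E_PI."""
    tokens = [class_token] + [matvec(p, E) for p in split_into_patches(image, patch)]
    assert len(tokens) == len(E_pos)
    return [[t + e for t, e in zip(tok, pos)] for tok, pos in zip(tokens, E_pos)]

# Toy configuration (assumed): N = 4 patches, so the sequence has N + 1 = 5 tokens.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
P, D = 2, 3
E = [[0.1] * D for _ in range(P * P)]           # shape (P^2 * C) x D with C = 1
class_token = [0.0] * D
E_pos = [[float(i)] * D for i in range(4 + 1)]  # shape (N + 1) x D
Z0 = build_z0(image, P, E, class_token, E_pos)
print(len(Z0), len(Z0[0]))
```

For a 256 × 256 input and an assumed patch size of 16, the same procedure would give N = 256 patches and a sequence of 257 tokens.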

2.1.5 Feed-forward network (FFN)

This is the transformer encoder's second block, made up of two fully connected layers with a GELU activation function. Each of the two blocks in an encoder layer is preceded by layer normalization (LN). With residual connections applied, the output is calculated using the following formulas:

\begin{aligned} Z_{l}^{'} & = MSA (LN (Z_{l - 1})) + Z_{l - 1}, \quad l = 1 \dots L \end{aligned}

(11)

\begin{aligned} Z_{l} & = FFN (LN (Z_{l}^{'})) + Z_{l}^{'}, \quad l = 1 \dots L \end{aligned}

(12)
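
The residual structure of Eqs. (11) and (12) can be sketched as below. Only the wiring (layer normalization before each block, then a skip connection) follows the equations; `toy_msa` and `toy_ffn` are stand-ins introduced for illustration, since a real MSA block uses learned attention weights and a real FFN has two learned linear layers around the GELU.

```python
import math

def layer_norm(vec, eps=1e-6):
    """Normalize one token vector to zero mean and unit variance (no learned scale)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def gelu(x):
    """Exact GELU: x times the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def encoder_layer(tokens, msa, ffn):
    """Eq. (11): Z' = MSA(LN(Z)) + Z, then Eq. (12): Z_out = FFN(LN(Z')) + Z'."""
    normed = [layer_norm(t) for t in tokens]
    z_prime = [[a + b for a, b in zip(out, t)]
               for out, t in zip(msa(normed), tokens)]
    normed = [layer_norm(t) for t in z_prime]
    return [[a + b for a, b in zip(out, t)]
            for out, t in zip(ffn(normed), z_prime)]

def toy_msa(tokens):
    """Stand-in for attention: every token receives the mean of all tokens."""
    mean = [sum(col) / len(tokens) for col in zip(*tokens)]
    return [list(mean) for _ in tokens]

def toy_ffn(tokens):
    """Stand-in for the feed-forward block: element-wise GELU only."""
    return [[gelu(v) for v in t] for t in tokens]

out = encoder_layer([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]], toy_msa, toy_ffn)
print(out)
```

Because of the skip connections, a layer whose sub-blocks output zeros returns its input unchanged, which is one reason stacks of such layers remain trainable.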

2.1.6 Remora optimization

The remora optimization algorithm (ROA) is primarily inspired by the remora, which is a clever marine navigator. The algorithm consists of two stages: exploration and exploitation. Remora behaviors such as free travel and thoughtful eating are used to construct the numerical expressions. Mode switching is accomplished with a single short trial step, and the remora factor, which drives convergence, can be employed to increase the optimization's precision. Decisions on mode switching take into account phases such as experience attack, thoughtful eating, and free travel. These tactics aid the ROA algorithm's pursuit of optimal outcomes. The ROA algorithm's steps are listed below:

(i)
Initialization
Each remora is a candidate solution, and its current position represents the variables in the search space. Remora moves to different positions according to the size of the pool. The present position is $C_{j} = (C_{j1}, C_{j2}, \dots, C_{jd})$, where d represents the dimension of the space in which the remora swims and j is the remora's number. Similarly, $C_{op} = (C_{1}, C_{2}, \dots, C_{d})$ indicates the algorithm's optimal solution. Every candidate solution has its own fitness value, written as $E (C_{j}) = E (C_{j1}, C_{j2}, \dots, C_{jd})$, where E is the fitness function. The best fitness over the remora positions is given by $E (C_{op}) = E (C_{1}, C_{2}, \dots, C_{d})$.
SFO Strategy
When the remora is attached to the swordfish, its position is updated as follows:
$C_{j}^{s + 1} = C_{op}^{s} - [rand (0, 1) * (\frac{C_{op}^{s} + C_{rand}^{s}}{2}) - C_{rand}^{s}]$
(13)
Here, S is the maximum number of iterations, s is the current iteration, and $C_{j}^{s + 1}$ is the updated position of remora number j. $C_{op}^{s}$ is the best position found so far, and $C_{rand}^{s}$ represents a random position of the remora. The random position ensures that the algorithm can perform a global search. Moreover, the fitness value of the current iteration, obtained from the experience attack step, is the basis for the selection of the remora.
Experience Attack
This phase is used to decide whether the remora changes its host. It is expressed as follows:
$C_{att} = C_{j}^{s} + (C_{j}^{s} - C_{pre}) * randl$
(14)
$C_{pre}$ indicates the position of the previous generation, while $C_{att}$ indicates the tentative step. This is a movement for global search, and randl can be selected suitably. The purpose of this stage is to compare the fitness value of the attempted solution, $E (C_{att})$, with that of the current solution, $E (C_{j}^{s})$. If the value of $E (C_{att})$ is smaller, meaning that $E (C_{j}^{s}) > E (C_{att})$, the remora achieves local optimization by switching to a different feeding method. If the attempted solution's fitness value is greater than the current one, as stated below, the previous solution is used once more.
$E (C_{j}^{s}) < E (C_{att})$
(15)
(ii)
Eat Thoughtfully (Exploitation)
WOA Strategy

This strategy is based on the remora's attachment to the whale, and the position update is stated as follows:
$C_{j + 1} = M * e^{δ} * \cos (2 π δ) + C_{j}$
(16)

$δ = rand (0, 1) * (a - 1) + 1$
(17)

$a = - (1 + \frac{s}{S})$
(18)

$M = | C_{op} - C_{j} |$
(19)
The remora's position depends on the whale. M is the distance between the hunter and the prey (the current best solution), δ is the chosen random number, given as lying in [−1, 1], and a decreases linearly within [−2, −1].
Host Feeding
This step also belongs to the exploitation stage. The solution space at this stage can be compressed to the position space of the host, which is expressed as follows:
$C_{j}^{s} = C_{j}^{s} + L$
(20)

$L = T * (C_{j}^{s} - D * C_{op})$
(21)

$T = 2 * R * rand (0, 1) - R$
(22)

$R = 2 * (1 - \frac{s}{J_{Max}})$
(23)
The small movement step, denoted as L, is related to the volume spaces of the host and the remora. In order to locate the remora more precisely, T is used in the solution space. Figure 4 shows the flowchart of ROA.
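
The four position updates can be sketched as small Python functions to show how Eqs. (13), (14), (16)-(19) and (20)-(23) fit together. Several details are assumptions rather than statements from the paper: the operator between the best position and the bracket in Eq. (13) is taken as subtraction, the random factor in Eq. (14) is drawn from a standard normal distribution, the remora factor D is set to 0.1, and the maximum iteration count (written S in Eq. (18) and J_Max in Eq. (23)) is passed as a single argument `S`.

```python
import math
import random

def sfo_update(c_op, c_rand, rng):
    """Eq. (13): move relative to the best and a random position."""
    r = rng.random()
    return [op - (r * (op + rd) / 2.0 - rd) for op, rd in zip(c_op, c_rand)]

def experience_attack(c_j, c_pre, rng):
    """Eq. (14): tentative step based on the previous-generation position."""
    step = rng.gauss(0.0, 1.0)  # assumed distribution of the random factor
    return [cj + (cj - cp) * step for cj, cp in zip(c_j, c_pre)]

def woa_update(c_j, c_op, s, S, rng):
    """Eqs. (16)-(19): spiral-shaped move around the best position."""
    a = -(1.0 + s / S)
    delta = rng.random() * (a - 1.0) + 1.0
    return [abs(op - cj) * math.exp(delta) * math.cos(2.0 * math.pi * delta) + cj
            for cj, op in zip(c_j, c_op)]

def host_feeding(c_j, c_op, s, S, rng, remora_factor=0.1):
    """Eqs. (20)-(23): small step L = T * (C_j - D * C_op) added to C_j."""
    R = 2.0 * (1.0 - s / S)
    T = 2.0 * R * rng.random() - R
    return [cj + T * (cj - remora_factor * op) for cj, op in zip(c_j, c_op)]

rng = random.Random(0)
position, best = [0.5, -1.0], [1.0, 2.0]
print(woa_update(position, best, s=3, S=50, rng=rng))
print(host_feeding(position, best, s=3, S=50, rng=rng))
```

A full optimizer would wrap these updates in a loop that evaluates the fitness function E, applies the comparison of Eq. (15) to switch modes, and keeps track of the best position; that loop is omitted here.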

Figure 4.
Flow diagram of ROA.
2.1.7 Classification layer

The encoder's output is passed to the classification task, where the class label is determined from the class token by the softmax activation function of the model.

y = LN (Z_{L}^{0})

(24)

During pre-training, an FFN usually serves as the classification head, and it can be replaced at the fine-tuning stage. Finally, the softmax function gives the class probabilities used to classify non-autistic and autistic children from eye-tracking retinal scan path images.

Eye tracking retinal scan path images usually contain a sequence of consecutive fixations and saccades generated from the path of eye movement over a specific time. Corner detection with a computer vision technique extracts all the corners from the image and characterizes its contents. Detecting eye movement at various places helps to differentiate a child with autism from a typically developing child. Here, Shi-Tomasi corner detection identifies the eye movement image points by analyzing the variations. The corner points are often categorized by using the intensity values.

2.1.8 ASD and non-ASD

Here, two groups of participants are analyzed, and the model is trained on their retinal scan path eye movement images E, which combine both autism and non-autism images. The autism-based eye retinal scan path images are represented as A, and the non-autism-based eye retinal scan path images are represented as N.

E (x) = \begin{cases} A (x), & \text{for autism eye image} \\ N (x), & \text{for non-autism eye image} \end{cases}

(25)

where A (x) represents autism image and N (x) represents non-autism image.

Figure 5 shows the eye-tracking retinal scan path images of ASD and non-ASD. The retinal scan path images are usually a sequence of consecutive fixation points and eyeball movements through a specific period of time, and they cannot be overlapped by themselves.

Figure 5.

Eye-tracking retinal scan path images of ASD and non-ASD.

2.2 Pipeline process of preprocessing

The purpose of preprocessing is to enhance the quality and appearance of an image. Moreover, it eliminates unnecessary portions and thus reduces the size of the image. It consists of four distinct operations: conversion of the RGB image to grayscale, application of the contrast stretching (CS) operation, cropping, and down sampling, which resizes the image to the appropriate size. Figure 6 shows the pipeline of preprocessing stages.

Figure 6.

Pipeline process.

2.2.1 Step 1: original retinal scan path eye image

The original retinal scan path eye image dataset contains 219 autism images and 328 non-autism images. The dataset is symbolized as $E_{1}$, and each image in the dataset is symbolized as $e_{1} (x) \in E_{1}$, $x = 1, 2, \dots, | E | = 547$.

E_{1} = \{e_{1} (1), e_{1} (2), \dots, e_{1} (i), \dots, e_{1} (| E |)\}

(26)

The size of the eye retinal scan path image is

Size {e_{1} (i)} = {W_{1} * H_{1} * C_{1}}

Here, W₁= 640, H₁= 480, C₁= 3.

Generally, the original retinal scan path eye images are not appropriate for training neural networks, because they are color images and have redundant features. A color image is also large and takes up more storage during training. Any background noise can also be removed in the pre-processing stage.

2.2.2 Step 2: convert color images to grayscale

The eye retinal scan path color images are transformed into grayscale images, which maintain the brightness of the image. The grayscale image set is symbolized as $E_{2}$:

\begin{aligned} E_{2} & = GrayImage (E_{1}) \\ E_{2} & = \{e_{2} (1), e_{2} (2), \dots, e_{2} (i), \dots, e_{2} (| E |)\} \end{aligned}

(27)

Every image in $E_{2}$ is a grayscale image. The width and height of the image are unchanged, but the number of color channels has changed. After converting a color image to a grayscale image, the eye retinal scan path image's size is

Size {e_{2} (i)} = {W_{2} * H_{2} * C_{2}}

Here, W₂= 640, H₂= 480, C₂= 1.
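
A minimal sketch of the color-to-grayscale step is shown below. The paper does not state which conversion weights were used; the sketch assumes the common luma weights 0.299, 0.587 and 0.114 for the red, green and blue channels, and the function name is hypothetical.

```python
def to_grayscale(rgb_image, weights=(0.299, 0.587, 0.114)):
    """Convert an H x W x 3 nested list into an H x W list of luminance values."""
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in rgb_image]

# Tiny 1 x 3 example image: pure red, pure green, and white pixels.
gray = to_grayscale([[(255, 0, 0), (0, 255, 0), (255, 255, 255)]])
print(gray)
```

The width and height are preserved and only the channel dimension collapses, matching the change from C₁ = 3 to C₂ = 1.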

2.2.3 Step 3: apply contrast stretching (CS)

Contrast stretching (CS) is an image enhancement method that operates on intensity values. It is a kind of normalization that changes the range of pixel intensity values; the stretched image set is symbolized as $E_{3}$. When contrast stretching is applied to the ith image $e_{2} (i), i = 1, 2, \dots, | E |$, the lowest and highest intensity values of the image are calculated first.

The minimum grayscale value is:

μ_{min} (i) = \min_{x = 1}^{W_{2}} \min_{y = 1}^{H_{2}} e_{2} (i | x, y)

(4.A)

The maximum grayscale value is:

μ_{max} (i) = \max_{x = 1}^{W_{2}} \max_{y = 1}^{H_{2}} e_{2} (i | x, y)

(4.B)

Here, the pixel coordinates (x, y) range over the width and height of the image. After contrast stretching, the image $e_{3} (i)$ is obtained as follows:

e_{3} (i) = \frac{e_{2} (i) - μ_{min} (i)}{μ_{max} (i) - μ_{min} (i)}

(4.C)

All images are stretched by using contrast stretching:

\begin{aligned} E_{3} & = CS (E_{2}) \\ E_{3} & = \{e_{3} (1), e_{3} (2), \dots, e_{3} (i), \dots, e_{3} (| E |)\} \end{aligned}

(28)

The size of the image after contrast stretching has not changed. The size of the eye retinal scan path image is

Size {e_{3} (i)} = {W_{3} * H_{3} * C_{3}}

Here, W₃= 640, H₃= 480, C₃= 1.
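
The min-max normalization of Eqs. (4.A) to (4.C) can be sketched as follows. The handling of a completely flat image, where the denominator would be zero, is an assumption added for robustness and is not discussed in the paper.

```python
def contrast_stretch(image):
    """Min-max normalize an H x W image to the range [0, 1], as in Eq. (4.C)."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        # Flat image: the formula would divide by zero, so return zeros (assumption).
        return [[0.0 for _ in row] for row in image]
    return [[(v - lo) / (hi - lo) for v in row] for row in image]

stretched = contrast_stretch([[50, 100], [150, 200]])
print(stretched)
```

In the example, the darkest pixel (50) maps to 0 and the brightest (200) maps to 1, so the full intensity range is used regardless of the original contrast.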

2.2.4 Step 4: crop the image

Cropping eliminates the undesirable portion of an image. The cropped image set is symbolized as $E_{4}$.

E_{4} = Crop (E_{3})

The image is cropped by removing pixels from the left, top, right, and bottom, denoted as $c_{l}$, $c_{t}$, $c_{r}$, and $c_{b}$ respectively. The crop values are set in units of pixels:

\begin{aligned} c_{t} & = c_{b} = 60; c_{l} = c_{r} = 140 \\ E_{4} & = \{e_{4} (1), e_{4} (2), \dots, e_{4} (i), \dots, e_{4} (| E |)\} \end{aligned}

(29)

The size of the image after cropping has changed. The size of the eye retinal scan path image is,

Size {e_{4} (i)} = {W_{4} * H_{4} * C_{4}}

Here, W₄= H₄= 360, C₄= 1.
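
The crop margins can be checked with a short sketch: removing 140 pixels from the left and right of a 640-pixel-wide image and 60 pixels from the top and bottom of a 480-pixel-high image leaves 360 × 360 pixels. The function name and the blank placeholder image are illustrative assumptions.

```python
def crop(image, left, top, right, bottom):
    """Remove the given number of pixels from each side of an H x W image."""
    height, width = len(image), len(image[0])
    return [row[left:width - right] for row in image[top:height - bottom]]

# Blank 480-row x 640-column image standing in for a contrast-stretched scan path.
image = [[0.0] * 640 for _ in range(480)]
cropped = crop(image, left=140, top=60, right=140, bottom=60)
print(len(cropped), len(cropped[0]))
```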

2.2.5 Step 5: down sampling an image

Down sampling reduces the size of each image by resizing it. The down sampled image set is symbolized as E₅.

\begin{aligned} E_{5} & = DownSampled (E_{4}) \\ & = (E_{4}, [256, 256]) \\ E_{5} & = \{e_{5} (1), e_{5} (2), \dots, e_{5} (i), \dots, e_{5} (| E |)\} \end{aligned}

(30)

Here, DownSampled(E₄): E₄→E₅ denotes the down sampling function, E₄ is the cropped image set, and E₅ is the down sampled image set. In this study, the size of the image changes after down sampling. The size of the eye retinal scan path image is

Size {e_{5} (i)} = {W_{5} * H_{5} * C_{5}}

Here, W₅= H₅= 256, C₅= 1

Generally, down sampling saves storage space. Larger images can also bring in overfitting problems, which decrease performance.
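
The resizing from 360 × 360 to 256 × 256 can be sketched with nearest-neighbor sampling. The interpolation method is an assumption, since the paper does not state which one was used, and the function name and synthetic test image are hypothetical.

```python
def downsample_nearest(image, out_h, out_w):
    """Resize an H x W image to out_h x out_w by nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[(y * in_h) // out_h][(x * in_w) // out_w]
             for x in range(out_w)]
            for y in range(out_h)]

# Synthetic 360 x 360 image whose pixel value encodes its own position.
image = [[float(r * 360 + c) for c in range(360)] for r in range(360)]
small = downsample_nearest(image, 256, 256)
print(len(small), len(small[0]))
```

Smoother interpolation schemes (for example bilinear or area averaging) would reduce aliasing of thin scan path lines, at the cost of slightly more computation.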

The following Algorithm 1 describes the main steps for training and testing the proposed ADET model.

3. Results and discussion

The efficacy of the ADET framework for detecting autism is evaluated through quantitative measurements. The evaluation parameters are accuracy, specificity, sensitivity, and precision. They are computed from the confusion matrix, which records the true and false classifications of the tested eye-based images.

Figure 7 represents the experiment's result of strong corner detection. The first column represents the original image from the retinal scan path images dataset.

Figure 7.

Experiment results of strong corner detection.

The second column represents the image with detected corners using the Shi-Tomasi corner detection algorithm. The third column represents the patches of the corner detection image. Finally, the fourth column represents the selected strong corners of the original image.

The proposed ADET model is trained to distinguish autistic from non-autistic children using eye-tracking retinal scan path images. The Adam optimizer is used, and it achieves good recognition accuracy. The model is trained for 50 epochs. Batch normalization is also used to normalize the training images. Table 2 describes the parameter setup.

Table 2.

Parameter configuration.

Configuration	Value
Optimizer	Adam
Epoch	50
Batch Size	16
Learning Rate	1 × 10⁻⁴
Batch Normalization	True
Execution Environment	GPU

The dataset contains eye tracking retinal scan path images and it is implemented for detecting autism in children. Generally, there are 547 images which are divided into autism and non-autism images. The split image is described in the following Table 3.

Table 3.

Splitting the eye-tracking retinal scan path image dataset.

		Training and Validation image (80%)
Eye Tracking retinal scan path image	Total images	Training Image (80%)	Validation Image (20%)	Testing Image (20%)
ASD	219	140	35	44
Non-ASD	328	207	55	66

3.1 Performance analysis

Precision, accuracy, AUC, sensitivity, and specificity were computed from the confusion matrix, which contains all correctly and incorrectly classified images, using the following equations:

\begin{aligned} Accuracy & = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \end{aligned}

\begin{aligned} Precision & = \frac{TP}{TP + FP} \times 100 \end{aligned}

\begin{aligned} Specificity & = \frac{TN}{TN + FP} \times 100 \end{aligned}

\begin{aligned} Sensitivity & = \frac{TP}{TP + FN} \times 100 \end{aligned}

\begin{aligned} AUC & = \int_{0}^{1} TPR \, d(FPR), \quad TPR = Sensitivity, \quad FPR = 1 - Specificity \end{aligned}

Here, TP (true positive) is the number of ASD children who are correctly classified as ASD; TN (true negative) is the number of non-ASD children who are correctly labeled as non-ASD; FP (false positive) is the number of non-ASD children who are incorrectly classified as ASD; and FN (false negative) is the number of ASD children who are incorrectly classified as non-ASD.
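The four count-based measures translate directly into code. The sketch below is illustrative only: the function name `confusion_metrics` is hypothetical and the counts passed to it are made-up numbers, not results from this study.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, precision, specificity and sensitivity as percentages."""
    total = tp + tn + fp + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "precision": 100.0 * tp / (tp + fp),
        "specificity": 100.0 * tn / (tn + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
    }


# Hypothetical confusion-matrix counts, chosen only to exercise the formulas.
metrics = confusion_metrics(tp=40, tn=50, fp=5, fn=5)
```

The sketch assumes every denominator is non-zero. A classifier that never predicts the positive class would need a guard around the precision term.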

The above equations are used to evaluate the suggested method, and the resulting values are given in Table 4.

Table 4.
Performance measures of the proposed ADET model.

Dataset	Measure	ADET model (Proposed)
Eye-Tracking retinal scan path dataset (Detecting autism and non-autism children)	Accuracy	97.27%
	Precision	95.55%
	Specificity	66.7%
	Sensitivity	98.7%

The transformer model is among the most effective models for training on and classifying medical images, and its performance depends heavily on the visibility and clarity of the images. The images are divided into training, validation, and testing sets. Each training image is split into patches, from which the features are gathered.
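The patch-splitting step can be sketched in a few lines. The function name `split_into_patches`, the 4×4 toy image, and the patch size of 2 are assumptions for illustration; the patch size used in this study is not restated here.

```python
def split_into_patches(img, patch):
    """Cut a 2-D image into non-overlapping patch x patch blocks, each flattened
    row by row, in left-to-right, top-to-bottom order."""
    h, w = len(img), len(img[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([img[y][x]
                            for y in range(top, top + patch)
                            for x in range(left, left + patch)])
    return patches


# Toy 4x4 image whose pixel values equal their row-major index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
```

Each flattened patch becomes one input token after a linear projection. For reference, a 224×224 image with 16×16 patches (a common vision transformer setting, not necessarily the one used here) yields 196 patches.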

The features are gathered and trained using the corner detection process. The encoder mechanism extracts the features, and the output layer classifies each eye-tracking retinal scan path image into one of two classes, indicating whether the child has autism or not. The accuracy and loss curves of the suggested method during training and validation are shown in Figures 8 and 9.

Figure 8.

Accuracy versus epoch.

Figure 9.

Loss versus epoch.

The receiver operating characteristic (ROC) curve measures the algorithm's performance during the evaluation phase. The closer the curve gets to the top-left corner, the better the algorithm performs. The performance of the suggested method is depicted in Figure 10, with the x-axis representing the FP rate (1 − specificity) and the y-axis representing the TP rate (sensitivity).

Figure 10.

ROC graph of the proposed ADET model.
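For readers who want to reproduce such a curve, the sketch below shows one common way to obtain ROC points and the area under the curve from classifier scores. It is a generic illustration, not the evaluation code of this study. The function names and the six labels and scores are hypothetical, with label 1 standing for ASD and label 0 for non-ASD.

```python
def roc_points(labels, scores):
    """Return (FP rate, TP rate) pairs obtained by lowering the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every sample that shares this score so ties give one point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical ground-truth labels and predicted ASD probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
auc = auc_trapezoid(points)
```

In this toy example, eight of the nine ASD/non-ASD pairs are ranked correctly, so the area is 8/9. A curve that hugs the top-left corner approaches an area of 1.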

Table 5 compares the performance of the existing models with that of the proposed algorithm. The proposed model achieves higher accuracy and better overall performance than the existing models. Even though the number of images is small, the vision transformer model achieves better classification results.

Table 5.

Comparison table of the performance result analysis.

Methodology	Accuracy (%)	Precision (%)	Sensitivity (%)	Specificity (%)
Mazumdar et al.⁷	60	58	43.18	78.6
Akter et al.²⁴	74.20	71.03	45.5	80.7
Zhao et al.⁹	84.62	82.51	50	84.2
Raj et al.¹⁰	95.75	90	42.9	94.8
Oliveira et al.¹	79.50	75.14	48.8	82.4
Proposed ADET Model	97.27	95.55	66.7	98.7

Table 5 indicates that the proposed ADET model achieves a classification accuracy that is 38.31%, 23.71%, 13.01%, 1.56%, and 18.26% better than the RM3ASD,⁷ MLP,²⁴ SVM,⁹ CNN,¹⁰ and SVM¹ methods, respectively.

The proposed ADET model achieves a classification precision that is 39.28%, 25.66%, 13.64%, 5.11%, and 21.36% better than the RM3ASD,⁷ MLP,²⁴ SVM,⁹ CNN,¹⁰ and SVM¹ methods, respectively.

The proposed ADET model achieves a sensitivity that is 35.26%, 31.78%, 25.04%, 35.68%, and 26.83% better than the RM3ASD,⁷ MLP,²⁴ SVM,⁹ CNN,¹⁰ and SVM¹ methods, respectively.

The proposed ADET model achieves a specificity that is 20.36%, 18.24%, 14.69%, 3.95%, and 16.51% better than the RM3ASD,⁷ MLP,²⁴ SVM,⁹ CNN,¹⁰ and SVM¹ methods, respectively.
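The improvement percentages quoted for accuracy appear to be consistent with a relative gain computed against the proposed model's own score, that is, (ADET − baseline) / ADET × 100. This reading is an inference from the numbers in Table 5 rather than a formula stated in the text, and the function name `relative_gain` is hypothetical. A short check under that assumption:

```python
ADET_ACCURACY = 97.27  # proposed model, Table 5

# Baseline accuracies (%) copied from Table 5.
BASELINE_ACCURACY = {
    "Mazumdar et al.": 60.0,
    "Akter et al.": 74.20,
    "Zhao et al.": 84.62,
    "Raj et al.": 95.75,
    "Oliveira et al.": 79.50,
}

# Improvement figures (%) quoted in the text for the same five methods.
REPORTED_GAIN = {
    "Mazumdar et al.": 38.31,
    "Akter et al.": 23.71,
    "Zhao et al.": 13.01,
    "Raj et al.": 1.56,
    "Oliveira et al.": 18.26,
}


def relative_gain(proposed, baseline):
    """Gain of the proposed score over a baseline, as a percentage of the proposed score."""
    return 100.0 * (proposed - baseline) / proposed


gains = {name: relative_gain(ADET_ACCURACY, acc)
         for name, acc in BASELINE_ACCURACY.items()}
max_gap = max(abs(gains[name] - REPORTED_GAIN[name]) for name in gains)
```

Under this assumption the recomputed gains agree with the quoted accuracy figures to within about 0.01 percentage points, which is within rounding.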

3.2 Statistical significance test

Following the classification procedure, the performance of the classifiers was re-evaluated using statistical techniques to support these results. In this work, the statistical significance of each individual classifier was tested using the Wilcoxon Signed-Rank (WSR) approach. We applied this technique to the results of the various assessment indicators for each age group. A succinct explanation of the WSR method follows:

The WSR test is a non-parametric statistical test used to compare two paired (dependent) samples. It is regarded as a substitute for the paired t-test when the data cannot be assumed to be normally distributed. The test statistic is computed as follows:

T = \sum_{j = 1}^{S} [sf(y_{1 j}, y_{2 j}) \cdot K_{j}]

where T is the test statistic, S is the sample size, sf is the sign function, y_{1 j} and y_{2 j} are the ranked pairs of the two distributions, and K_{j} is the rank.
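The statistic can be computed as in the sketch below. It follows the usual conventions of dropping zero differences and assigning average ranks to tied absolute differences; these conventions, the function name `signed_rank_statistic`, and the six paired scores are assumptions for illustration, not values from this study.

```python
def signed_rank_statistic(y1, y2):
    """Sum of signed ranks T for paired samples y1 and y2."""
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(y1, y2) if a != b]
    order = sorted(range(len(diffs)), key=lambda j: abs(diffs[j]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        k = i
        # Extend k over every difference tied in absolute value with position i.
        while k + 1 < len(order) and abs(diffs[order[k + 1]]) == abs(diffs[order[i]]):
            k += 1
        average_rank = (i + k) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. k+1
        for j in order[i:k + 1]:
            ranks[j] = average_rank
        i = k + 1
    return sum((1 if d > 0 else -1) * r for d, r in zip(diffs, ranks))


# Hypothetical paired scores (%) of two classifiers on six evaluation runs.
scores_a = [90, 85, 88, 92, 80, 87]
scores_b = [88, 86, 84, 92, 83, 82]
t_stat = signed_rank_statistic(scores_a, scores_b)
```

A value of T far from zero indicates that one classifier tends to outperform the other across the pairs. Turning T into a p-value requires the null distribution of the statistic, which is outside the scope of this sketch.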

4 Conclusion

Autism is a neurodevelopmental disorder that affects children all over the world from an early age. In the proposed work, an eye-tracking retinal scan path image dataset is evaluated using a vision transformer. The accuracy of the proposed ADET model depends on the consistency and quality of the eye-tracking data. The experimental results indicate that the proposed approach is effective. The corner detection technique gives the ADET model a higher level of feature quality, and the Remora optimization algorithm improves the accuracy of feature selection. The vision transformer model is fine-tuned and shows statistically better performance. The classification accuracy reaches 97.27%, which suggests that the model generalizes well in binary classification when compared with the previous approaches. The approach has also proven to be resilient with a small amount of training data.

A significant limitation of the proposed approach is that it relies heavily on eye-tracking data, which is not always available or simple to gather, particularly in environments with limited resources or diversity. The model's usability and scalability in real-world applications may be limited by its dependence on specific technology and controlled conditions, especially for large-scale screening operations. To address this issue, future work could improve the accuracy and generality of eye-tracking models by implementing reliable data augmentation techniques and sophisticated preprocessing approaches that manage variability in eye-tracking data. Additionally, a mobile application based on this screening methodology could incorporate eye-tracking data from children aged 6 to 24 months and screen toddlers independently on mobile devices. Moreover, electrooculograms (EOG) could be used to create a dataset for detecting autism in children at an early stage with the artificial intelligence framework.

Footnotes

Acknowledgements

The authors would like to thank the National Engineering College, K.R. Nagar, Kovilpatti for their support by providing fellowship and constructive suggestions that have helped to publish this research paper.

Ethical approval

My research guide reviewed and ethically approved this manuscript for publishing in this Journal.

Informed consent

I certify that I have explained the nature and purpose of this study to the above-named individual, and I have discussed the potential benefits of this study participation. The questions the individual had about this study have been answered, and we will always be available to address future questions.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Availability of data and material

Data sharing is not applicable to this article as no new data were created or analyzed in this Research.

Human and animal rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

References

1. Oliveira, Franco, Revers, et al. Computer-aided autism diagnosis based on visual attention models using eye tracking. Sci Rep 2021; 11: 1–11.
2. Hus, Segal. Challenges surrounding the diagnosis of autism in children. Neuropsychiatr Dis Treat 2021; 17: 3509–3529.
3. Davidson, Turner, Gillberg, et al. Using the live assessment to discriminate between autism spectrum disorder and disinhibited social engagement disorder. Res Dev Disabil 2023; 134: 104415.
4. Jonsdottir, Saemundsen, Gudmundsdottir, et al. Implementing an early detection program for autism in primary healthcare: screening, education of healthcare professionals, referrals for diagnostic evaluation, and early intervention. Res Autism Spectr Disord 2020; 77: 101616.
5. Jeyarani, Senthilkumar. Eye tracking biomarkers for autism spectrum disorder detection using machine learning and deep learning techniques. Res Autism Spectr Disord 2023; 108: 102228.
6. Xia, Chen, et al. Identification of autism spectrum disorder via an eye-tracking based representation learning model. In: Proceedings of the 7th International Conference on Bioinformatics Research and Applications, 2020, pp.59–65.
7. Mazumdar, Arru, Battisti. Early detection of children with autism spectrum disorder based on visual exploration of images. Signal Process Image Commun 2021; 94: 116184.
8. Alcaniz, Chicchi-Giglioli, Carrasco-Ribelles. Eye gaze as a biomarker in the recognition of autism spectrum disorder using virtual reality and machine learning: A proof of concept for diagnosis. Autism Res 2021; 15: 131–145.
9. Zhao, Tang, Zhang, et al. Classification of children with autism and typical development using eye-tracking data from face-to-face conversations: machine learning model development and performance evaluation. J Med Internet Res 2021; 23: e29328.
10. Raj, Masood. Analysis and detection of autism spectrum disorder using machine learning techniques. Procedia Comput Sci 2020; 167: 994–1004.
11. Ramji, Palagan, Nithya, et al. Soft computing based color image demosaicing for medical image processing. Multimed Tools Appl 2020; 79: 10047–10063.
12. Safdar, Cheng. Brain aneurysm classification via whale optimized dense neural network. Int J Data Sci Artif Intell 2024; 02: 63–67.
13. Hemamalini, Anand, Nachiyappan, et al. Integrating bio medical sensors in detecting hidden signatures of COVID-19 with artificial intelligence. Measurement (Mahwah N J) 2022; 194: 111054.
14. Jegatheesh, Kopperundevi, Anlin Sahaya Infant Tinu. Brain aneurysm detection via firefly optimized spiking neural network. Int J Current Bio-Med Eng 2023; 01: 23–29.
15. Kanhirakadavath, Chandran MSM. Investigation of eye-tracking scan path as a biomarker for autism screening using machine learning algorithms. Diagnostics 2022; 12: 518.
16. Kollias, Syriopoulou-Delli, Sarigiannidis, et al. The contribution of machine learning and eye-tracking technology in autism spectrum disorder research: a systematic review. Electronics (Basel) 2021; 10: 2982.
17. Zammarchi, Conversano. Application of eye tracking technology in medicine: a bibliometric analysis. Vision 2021; 5: 56.
18. Tahri Sqalli, Aslonov, Gafurov, et al. Eye tracking technology in medical practice: a perspective on its diverse applications. Front Med Technol 2023; 5: 1253001.
19. Solovyova, Danylov, Oleksii, et al. Early Autism Spectrum Disorders Diagnosis Using Eye-Tracking Technology. arXiv preprint arXiv:2008.09670. 2020.
20. Roth, Jording, Schmee, et al. Towards computer aided diagnosis of autism spectrum disorder using virtual environments. In: 2020 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), 2020, pp.115–122. IEEE.
21. Wan, Kong, Sun. Applying eye tracking to identify autism spectrum disorder in children. J Autism Dev Disord 2019; 49: 209–215.
22. Krol. A novel eye movement data transformation technique that preserves temporal information: A demonstration in a face processing task. Sensors (Basel) 2019; 19: 2377.
23. Eraslan, Yesilada, Yaneva, et al. Autism detection based on eye movement sequences on the web: a scanpath trend analysis approach. In: Proceedings of the 17th International Web for All Conference, 2020, pp.1–10.
24. Akter, Ali, Khan, et al. Machine learning model to predict autism investigating eye-tracking dataset. In: Proceedings of the 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), vol. 5–7, Dhaka, Bangladesh, 2021, pp.383–387.
25. Zhang, Chen, Tang, et al. Children ASD evaluation through joint analysis of EEG and eye-tracking recordings with graph convolution network. Front Hum Neurosci 2021; 15: 651349.
26. Ahmed, Senan, Rassem, et al. Eye tracking-based diagnosis and early detection of autism spectrum disorder using machine learning and deep learning techniques. Electronics (Basel) 2022; 11: 530.
27. Gaspar, Oliva, Hinojosa, et al. An optimized kernel extreme learning machine for the classification of the autism spectrum disorder by using gaze tracking images. Appl Soft Comput 2022; 120: 108654.
