Abstract
Autism Spectrum Disorder (ASD) is increasing at a rapid rate, which has prompted many organizations to contribute to ASD coaching and training. The greatest difficulty lies in the early identification of ASD to support training. Many researchers have shown that early identification of autism and appropriate coaching for children with ASD can improve the quality of the child’s life. Early identification of autism can be screened through stages involving the evaluation of eye gaze, emotion, expression, linguistic ability and responsiveness. This paper focuses on analyzing the facial expressions of children with autism in a contactless environment, since children may otherwise respond to the target object they face. The paper identifies the various facial expressions of autistic children independent of time and event occurrence. These expressions are used as an early screening method to identify children who may show autistic characteristics in the near future. Moreover, the facial expressions could be analyzed in a live video environment as an emotional sequence of the stimuli, which leads to the next level of screening. The paper also analyzes the major facial expressions shown by the children, along with the variation in facial expression dynamics relative to a normal Toddler (TD).
Keywords
Introduction
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that results in cognitive impairments in children. A major problem is that people in rural areas are often unaware of the factors, causes and behavioral changes associated with ASD in a child. This leads to poor and misleading training and guidance for the children's future. In training children with autism, the first priority is early identification, followed by analysis of functional complexity (low- versus high-functioning autism), and last comes the appropriate coaching. Clinical evaluation and identification of autism is carried out through regular dyadic interaction. Clinical practice has shown that children respond to interaction according to their neurodevelopment. Neurodevelopmental disorders are reflected explicitly in the child's facial expressions. Hence, evaluating the various facial expressions, the response factor and the disengagement behavior could contribute to the analysis of neural development in a child, which in turn contributes to the early identification of autism.
Though this contributes to the screening of children possessing ASD, it cannot conclude that a child with an unmatched facial expression is autistic; rather, it indicates the possibility in the near future.
These facial expressions differ from child emotions, as the expressions are categorized in a static, neutral, contactless environment. The static inbuilt expressions in children can then be analyzed in a time-variant and event-driven manner as emotions of the children, which contributes to the next stage of autism screening. The inbuilt expression is useful for analysis because a random capture of the children in an event-free environment reflects the original expression inherent in the children.
With the use of facial expressions in real-time applications, there arises a need for automatic recognition mechanisms. Mechanisms for automated recognition of facial expressions normally depend on the movement of the eyes, on facial muscle movements, on relationships among the different shapes of the face, or on the variety of emotions. This information can be gained from a varied sequence of images that depicts the drive of emotions, and this paper mainly contributes to the analysis of expressions shown by ASD-positive and ASD-negative children.
The paper is organized as follows: Section 2 describes the literature survey related to this work, Section 3 elaborates the methodology involved in the implementation, and Section 4 provides the insights gained from the facial expression results.
State of the art
It is well established that autism spectrum disorder can be identified and diagnosed within the age of two to three years. This diagnosis can be achieved through regular clinical assessment, which includes micro observations and evaluation of characteristic and behavioral patterns through questionnaire reports. Many researchers have worked to reduce the number of questions involved in the questionnaire interaction with both the parents and the trainer.
Even so, this is considered a time-consuming screening mechanism, and the observations resulting from the questionnaire cannot strictly establish that a child is autism positive; rather, they serve as a simple pre-indicator of autistic characteristics in a child. Many other research studies have failed to evaluate and conclude the expected impairments in the cognitive behavior of the children, namely whether the recognition of facial expression is an emotion type specifically observed in people with ASD [9]. Hence the current paper aims to bridge the gap between identifying autistic characteristics in children and exploring their facial expressions.
Since facial expressions are one of the primary signals used to detect a person's internal intentions, exploring them could support the assessment of ASD positive at a somewhat lower age. Also, attention bias towards faces is critical for the development of the social brain during infancy and childhood in humans [18]. This paper focuses on analyzing the internal intentions of the child in a contactless environment where the child is not biased by any external object. Children with high-functioning autism spectrum disorder have significantly impaired communication abilities in social environments. Their expressions are difficult to perceive and are more ambiguous than those of their regularly developing TD counterparts, as discussed by Tanaya Guha et al. [1].
High-functioning autism is characterized by poor visual processing, disengagement behavior, low linguistic ability, repetitive behavior and higher reactive rates. Besides these characteristics, local and global visual processing differ to a great extent. These insights indicate that children with autism show expressions based on the event and the routine [2–14].
Though the characteristics and the diagnosis probability vary between genders based on sensory symptoms, the basic characteristics of an autistic child remain the same at the early identification level [3]. Thus it is necessary to consider the high-level expression and emotional behaviors of children before the start of clinical analysis. Also, identification of autism at sub-clinical levels, along with evaluation of the absence of compensatory ability in children, may expose autism positive in autistic children as an early identification biomarker in the near future [3].
There is notable research on facial expression recognition in children with and without autism. The major difference is that those studies use the child's recognition and identification of the facial expression shown by the target object as the evaluation tool, rather than exploring the facial expression of the stimuli. The result could even be influenced by the object, and children may show disengaged behavior when facing angry or aggressive faces [28]. Facial expression recognition is found to contribute most to the initial screening of autism when analyzed on the stimuli.
Such compensatory abilities arise through the neurodevelopment of the child, which varies with age. The sensory nerves react to the state of the children, and this is reflected in the expressions and emotions they show. These emotions can be labeled as confused, sad, happy, angry, etc. [4]. Intervention on this behavioral expression is possible in the age group of 2 to 5 years. However, the emotional behavior of a child starts from the age of 8 months in direct dyad with the mother. Such interventions could be screened at an early stage, which feeds supportive information to emotion analysis in the later screening method.
The early screening method begins with face detection and feature extraction. Viola et al. proposed the Viola Jones algorithm for face detection, which uses Haar classifiers for face detection and feature identification [5]. The algorithm performs Haar cascade classification. Jing-Wein Wang et al. suggested an algorithm for facial feature specification that categorizes the face into a T-shaped structure, extracting the eyes, nose and position as three feature dimensions [6].
In an experiment [4], it was observed that the target object presented to the stimuli might influence the emotional behavior of the children, whether ASD or TD. Such emotions must be carefully noted to improve the efficiency of the screening results, and they affect the analysis made in the contactless environment.
Furthermore, Lydia R. Whitaker et al. classified the facial expressions of children when the target object showed angry and happy emotions. The difference in the variance of the emotion boundary suggests that the target object might be an influencing factor for an ASD-positive child [4]. Exploring such emotions through face detection using machine learning algorithms could result in early identification of autism before clinical analysis. However, to further improve classification accuracy and the reliability of the screening mechanism, deeper feature identification and feature analysis should be involved.
Thus basic engagement behaviors and facial expressions observed on a standard time variant, without any target object intervention, could lead to early screening of ASD.
Methodology
Facial expression identification in children with autism begins with preprocessing the dataset and proceeds through face detection, feature identification, feature analysis and classification based on the landmark features detected in the faces. The flow model of the entire work is described in Fig. 1.

Flow diagram of ASD analysis using facial expression analysis.
Face detection is a prominent area of study for many applications, and better techniques continue to evolve. However, the most standard, accurate and efficient face detection algorithm of recent decades is identified as the Viola Jones algorithm. This paper analyzes facial expressions in faces detected using the Viola Jones algorithm. Face detection with this algorithm has three major stages: transformation of the input image into an integral image, the AdaBoost technique, and the cascade classifier.
Transformation of image
The input image from the dataset is converted into an integral image, in which each entry holds the summation of pixel values over a rectangular piece of the image. The value of the integral image at a location (x,y) is the sum of all pixels above and to the left of (x,y), inclusive.
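The integral image can be illustrated with a minimal pure-Python sketch. The function names `integral_image` and `rect_sum` are hypothetical and are not taken from the paper's implementation; the sketch only assumes a grayscale image stored as a list of rows.

```python
def integral_image(image):
    """Return ii where ii[y][x] is the sum of image[0..y][0..x], inclusive."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(cols):
            row_sum += image[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in the inclusive rectangle, using at most four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total


pixels = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
ii = integral_image(pixels)
print(rect_sum(ii, 1, 1, 2, 2))  # sum of the lower-right 2x2 block
```

Once the integral image is built, the sum over any rectangle costs a constant number of lookups, which is what makes Haar-like features cheap to evaluate at many positions and scales.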
The AdaBoost technique identifies weak classifiers from the analysis of facial features in order to reject negative inputs. The term boosted implies that the classifiers at every stage of the cascade are themselves complex and are built from basic classifiers by means of different boosting techniques. In the Viola Jones algorithm, each weak classifier thresholds the value of a single Haar-like feature.
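A thresholded weak classifier and the weighted vote that boosting builds from it can be sketched as follows. This follows the standard Viola Jones formulation (a stump on one feature value with a polarity, combined by weights alpha); the feature values, thresholds and weights below are hypothetical and are not values from the paper.

```python
from typing import List, Sequence, Tuple

# (feature_index, threshold, polarity, alpha): hypothetical stump parameters
Stump = Tuple[int, float, int, float]


def weak_classify(features: Sequence[float], index: int,
                  threshold: float, polarity: int) -> int:
    """Return 1 if polarity * feature < polarity * threshold, else 0."""
    return 1 if polarity * features[index] < polarity * threshold else 0


def strong_classify(features: Sequence[float], stumps: List[Stump]) -> int:
    """Weighted vote: 1 if sum(alpha * h) >= half of sum(alpha), else 0."""
    total_alpha = sum(alpha for _, _, _, alpha in stumps)
    score = sum(alpha * weak_classify(features, i, t, p)
                for i, t, p, alpha in stumps)
    return 1 if score >= 0.5 * total_alpha else 0


# Hypothetical feature vector and two stumps, for illustration only.
stumps = [(0, 0.5, 1, 1.0), (1, 0.5, 1, 0.4)]
print(strong_classify([0.2, 0.9], stumps))
```

The polarity simply flips the direction of the inequality, so a single stump form covers both "feature below threshold" and "feature above threshold" decisions.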
The result of the AdaBoost classifier, i.e., the strong classifiers, is composed into stages to form the cascade classifier. The term cascade implies that the resulting classifier consists of several simpler classifiers, or stages, that are applied to the region of interest until the candidate is rejected or all the stages are passed.
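The early-rejection behavior of a cascade can be sketched in a few lines. The stages below are hypothetical stand-ins (simple predicates on a list of numbers), not trained boosted classifiers; the point is only the control flow.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[Sequence[float]], bool]


def run_cascade(window: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Apply stages in order; return (accepted, number_of_stages_evaluated)."""
    for evaluated, stage in enumerate(stages, start=1):
        if not stage(window):
            # The candidate is rejected; later stages are never evaluated.
            return False, evaluated
    return True, len(stages)


# Hypothetical stages, cheapest first, for illustration only.
stages = [
    lambda w: sum(w) > 1.0,
    lambda w: max(w) > 0.8,
    lambda w: w[0] < w[-1],
]

print(run_cascade([0.1, 0.2, 0.3], stages))  # rejected by the first stage
print(run_cascade([0.5, 0.6, 0.9], stages))  # passes every stage
```

Because most candidate regions in an image are not faces, rejecting them in the first one or two stages keeps the average cost per region low.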
The cascade classifier divides the classification work into two stages: the training stage and the detection stage. The training stage involves the collection of samples, which are categorized as positive and negative. The cascade classifier uses auxiliary utilities to create a training dataset and to assess the quality of the classifiers. Table 1 explains the steps involved in cascade classification that support maximized detection performance and accuracy.
Algorithm for Cascade classification technique
In this work, we used the utility program opencv_createsamples to produce the positive samples for opencv_traincascade. The output file of this utility is used as input to opencv_traincascade to train the face detector. Initially, the classifier was trained with relatively few views of positive samples and with negative samples, which are random images; both kinds of samples are scaled to the same size. The classifier outputs “1” if the region is likely to contain a face and “0” otherwise. The classifier is designed to locate faces of interest at different sizes, which is more efficient than resizing the input image itself.
Feature extraction from the facial images is performed through the identification of facial landmarks. The facial landmarks are 68 points on the frontal face that sketch the facial alignment and can be used for any application analysis. In this paper, we detect the face landmarks using Dlib together with OpenCV. Dlib automatically detects the face and identifies the facial landmarks. These points are then used as input data to feed the classifier. Though Dlib automatically identifies the face in any frame, face detection through Viola Jones gives much higher accuracy in facial classification with reduced processing in the feature extraction process. Figure 2 explains the various stages involved in the feature extraction that are required for further processing.

Block diagram of Facial Landmark Identification.
The landmark identification technique plots 68 points on the input face [28]. Each point on the face has three basic characteristics to analyze:
The central point is positioned at the nose, which determines the angle of face viewing
The distance from the center of the nose to the point (x,y) is d
The angle at which the point lies from the center is Θ
The central angular nose points are considered first so as to minimize the false detection rate when the face is analyzed in a side posture rather than in the frontal structure. The angular nose points form the base for computation and are correlated with the other facial points to classify the most accurate emotion shown by the children. Table 2 describes the algorithm for facial landmark identification and position evaluation in the frontal face.
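The distance d and angle Θ of each landmark relative to the nose center can be computed as in the following sketch. The function name `polar_features` is hypothetical, and which landmark serves as the nose center is left to the caller as an assumption; the coordinates below are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polar_features(landmarks: Sequence[Point], center: Point) -> List[Tuple[float, float]]:
    """Return (distance, angle_in_degrees) of every landmark from the center."""
    cx, cy = center
    features = []
    for x, y in landmarks:
        dx, dy = x - cx, y - cy
        distance = math.hypot(dx, dy)
        # atan2 handles all quadrants; in image coordinates y grows downward.
        angle = math.degrees(math.atan2(dy, dx))
        features.append((distance, angle))
    return features


# Hypothetical landmark coordinates relative to an assumed nose center at (0, 0).
for d, theta in polar_features([(3.0, 4.0), (0.0, 5.0)], (0.0, 0.0)):
    print(round(d, 2), round(theta, 2))
```

Expressing every point relative to the nose makes the features independent of where the face sits in the frame, which is one plausible reason for anchoring the computation at the nose.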
Figure 3 shows sample images of children's faces with landmark identification. These 68 points have a unique frontal arrangement on the detected face.

Resulting Images of Landmark Identification with 68 points.
The landmarks are considered as features and are given as input to the classifier, which evaluates the distance between each vectorized landmark point. Table 3 gives the facial locations corresponding to the vectorized landmark points.
68 Landmark Feature Points and their facial location
As mentioned earlier, Dlib itself is sufficient for face detection, and thus a separate face detection algorithm need not be implemented beforehand. However, in this paper we use separate face detection because the accuracy of feature extraction using Dlib falls when the face is not detected in the capture, and because a medical application requires a trustworthy and closely accurate solution. This step was taken with the goal of live facial expression detection and analysis for Indian children with ASD and normal TD.
During live analysis, the face may not be detected in a particular frame. However, when proper GPU processing is adopted for live detection, prior face detection is not needed; rather, capture and analysis become faster with high computing capacity, and face detection is then not expected to fail in live frames. Hence, to improve the analysis in our paper, we first pass the input image to the Viola Jones Haar cascade classifier to detect the face and then process the landmarks using the OpenCV and Dlib Python libraries to analyze the detected face.
In this paper, we classify the facial expressions using an SVM classifier with a linear kernel. The facial landmark points are analyzed to classify the expression as anger, disgust, fear, happy, sadness, surprise or neutral in both children with ASD and TD. The basic concept behind the SVM classifier is that it implicitly embeds the data into a high-dimensional feature space. In this feature space, linear algebraic and geometric functions are used to separate data that could be separated only with nonlinear rules in the input space. In the SVM linear kernel classifier, kernel functions are applied to efficiently compute the inner products directly in feature space, without the need for explicit embedding [10]. The linear classifier classifies the expression based on
$$f(x) = w^{T}x + b$$
where x is the feature, w is the weight and b is the bias. The images in ASD_POSITIVE and ASD_NEGATIVE are each split in an 80:20 ratio for training and testing.
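The linear decision rule with weight w and bias b can be illustrated with a short sketch. The multi-class step shown here (one linear model per expression, choosing the largest decision value) is an assumption made for illustration; the paper does not state how the seven expressions are combined, and the weights below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def decision_value(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Linear decision function: dot product of w and x, plus the bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(models: Dict[str, Tuple[List[float], float]], x: Sequence[float]) -> str:
    """Assumed one-vs-rest rule: pick the label with the largest decision value."""
    return max(models, key=lambda label: decision_value(models[label][0], x, models[label][1]))


# Hypothetical per-expression weights and biases, for illustration only.
models = {
    "happy": ([1.0, 0.0], 0.0),
    "neutral": ([0.0, 1.0], -0.5),
}
print(predict(models, [0.2, 1.0]))
```

In practice the weights and biases would be learned from the landmark features by an SVM training routine rather than written by hand.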
The dataset consists of two categories of images of children who were found to be ASD_POSITIVE and ASD_NEGATIVE. The two categories contained nearly 304 and 390 images respectively, which were preprocessed to remove duplicate images. In our paper we considered 193 images under ASD_POSITIVE and 359 images under ASD_NEGATIVE. In both categories, 80% of the images were used as the training dataset and 20% as the test dataset. Since dataset acquisition is the major challenge, the paper also tried a 2-fold method as a trial to maximize the training dataset. As this could reduce the trustworthiness of the system in the medical field, the paper includes only the originally acquired dataset.
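A per-category 80/20 split can be sketched as below. The shuffling, the fixed seed and the rounding convention (truncating the training count) are assumptions; the paper does not state how the split was drawn, and the function name `split_train_test` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(items: Sequence, train_fraction: float = 0.8,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the items and cut it at the training fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability (assumption)
    cut = int(len(shuffled) * train_fraction)  # truncation is an assumed convention
    return shuffled[:cut], shuffled[cut:]


# Image counts taken from the text; the integers stand in for image identifiers.
asd_positive_train, asd_positive_test = split_train_test(range(193))
asd_negative_train, asd_negative_test = split_train_test(range(359))
print(len(asd_positive_train), len(asd_positive_test))
print(len(asd_negative_train), len(asd_negative_test))
```

Splitting each category separately keeps the ASD_POSITIVE to ASD_NEGATIVE ratio the same in the training and test sets.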
Results & analysis
The results and analysis in this paper focus on exploring the facial expressions seen in ASD-positive and ASD-negative children. The analysis was made on the dataset described above. Though the number of images used for training the ASD positive case is small, with an average of 22 images contributing to each facial expression, the exploration of facial expressions is close to accurate given the processing capacity. The analysis clearly identifies the most frequent facial expression shown by each group of children.
Classification of facial expressions
The classification of facial expressions is intended to serve as a basic screening mechanism, an indicator of autistic nature in the near future.
Table 4 shows the probabilities of the facial expressions identified from the ASD_POSITIVE and ASD_NEGATIVE image datasets. From the experimental analysis, it is observed that ASD_POSITIVE children showed neutral as the majority expression and disgust as the next most frequent. This indicates that autistic children are neutral like any TD when there is no interference from a target object. When not neutral, or under error-prone conditions, the children are likely to show disgust. Building on this analysis, the study of the children's emotions could be enhanced by identifying the target object that creates emotion in them and is reflected as facial expression. However, this analysis is made as a basic screening and analysis of facial expression in a contactless environment.
Probability of facial expression in ASD+ve and ASD–ve
In Fig. 4, the graph indicates the probabilities of the various facial expressions observed in both ASD and TD. The probabilities are calculated from the number of occurrences of each expression, as categorized by the SVM linear kernel classifier, relative to the total number of images in each dataset.
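The probability of each expression, taken as its share of the images in a dataset, can be computed as in this sketch. The label list is hypothetical and stands in for the classifier's predictions on one dataset.

```python
from collections import Counter
from typing import Dict, Sequence


def expression_probabilities(predicted_labels: Sequence[str]) -> Dict[str, float]:
    """Occurrences of each expression divided by the total number of images."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical predictions for four images, for illustration only.
labels = ["neutral", "neutral", "disgust", "happy"]
print(expression_probabilities(labels))
```

The same function would be applied separately to the ASD_POSITIVE and ASD_NEGATIVE predictions so that each group's probabilities sum to one.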

Comparative graph of various facial expressions in ASD+ve and ASD–ve.
The insights were obtained only with the SVM linear classifier so as to ensure strong insights for a medical diagnosis procedure. These insights suggest that parents and trainers pay extra attention to children who normally show disgust, or even neutral, as their facial expression, either when exposed to an intervening object or in a contactless environment.
Table 5 presents the overall probabilistic variations across all seven identified facial expressions. Each image in the dataset undergoes expression evaluation using the identified landmark points, and the differences among the expression probabilities suggest the closest reaction that the children could express. The table indicates that for every facial expression there could be a minimally differentiated expression from the children. This inference is helpful in analyzing the performance of facial expression analysis and in guiding improvements in detection.
Comparison of difference in probabilities across various facial expressions in ASD+ve and ASD–ve
From Table 5, it can also be inferred that the identified expressions in ASD, namely anger, disgust, fear, happy, sadness, neutral and sleep, could be interpreted as happy, neutral, sadness, sleep, disgust and happy respectively, as the probability differences between these expressions are minimal. Similarly, the expressions in TD could be interpreted as sleep, happy, sadness, neutral, fear, happy and anger. This implies that ASD varies among neutral, disgust and anger as the most distracted expressions, while normal TD shows variations revolving around happiness and neutral. Disgust in TD is observed only under maximum accuracy and not under any fall in the probabilistic facial expression value.
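One way to read the notion of a closest expression is to pair each expression with the other expression whose probability differs from it the least. The sketch below implements that reading; it is an assumption about how such a comparison could be made, not the paper's documented procedure, and the probabilities are hypothetical rather than values from Table 5.

```python
from typing import Dict


def closest_expressions(probabilities: Dict[str, float]) -> Dict[str, str]:
    """Map each expression to the other expression with the smallest probability gap."""
    closest = {}
    for label, p in probabilities.items():
        gaps = {other: abs(p - q)
                for other, q in probabilities.items() if other != label}
        closest[label] = min(gaps, key=gaps.get)
    return closest


# Hypothetical probabilities, for illustration only.
example = {"neutral": 0.40, "disgust": 0.35, "happy": 0.10, "anger": 0.15}
print(closest_expressions(example))
```

With these made-up numbers, neutral pairs with disgust and happy pairs with anger, since those pairs have the smallest gaps.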
Figure 5 shows the distraction probability of anger, which indicates that the most distracted expression for anger is disgust and then neutral, both of which fall under the observed facial expressions of ASD-positive children.

Distraction Probability of Anger in ASD+ve and ASD–ve.
Similarly, Fig. 6 shows that the distraction expression of disgust is anger, which in turn supports the facial expression observed.

Distraction Probability of Disgust in ASD+ve and ASD–ve.
The distraction probability of neutral, shown in Fig. 7, is highest for anger in ASD-positive children, while for ASD-negative children it indicates anger and sleep expressions.

Distraction Probability of Neutral in ASD+ve and ASD–ve.
Most importantly, the distraction probability of the happy expression in ASD positive is observed to have neutral at the maximum probabilistic difference, as shown in Fig. 8. Such distractions in probabilities may indicate either an acceptable misclassification of the expression category or a partially influencing facial expression shown by the children.

Distraction Probability of Happy in ASD+ve and ASD–ve.
The accuracy of the facial expression analysis in children with ASD and TD is evaluated across training and test data. The facial expression analysis undergoes two runs of the SVM linear kernel classifier, and the following accuracy was observed.
On the first run, linear classifier (i), the accuracy is 0.775362318841, and in linear classifier (ii), the accuracy is 0.847826086957, as shown in Fig. 9. On average, the accuracy of the analysis is the mean value of the linear SVM classifier = 0.811594202899, i.e., approximately 81%. The accuracy could be further improved by increasing the dataset size and training the classifier with more varied feature combinations.
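The reported mean can be checked directly from the two run accuracies quoted in the text. The function name `mean_accuracy` is hypothetical.

```python
from typing import Sequence


def mean_accuracy(accuracies: Sequence[float]) -> float:
    """Arithmetic mean of the per-run accuracies."""
    return sum(accuracies) / len(accuracies)


# The two accuracies reported for linear classifier (i) and (ii).
runs = [0.775362318841, 0.847826086957]
print(f"{mean_accuracy(runs):.4f}")
```

The result agrees with the stated mean of 0.811594202899, which rounds to approximately 81%.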

Accuracy analysis of the classifier.
Identification of autism spectrum disorder is an important factor in planning the training, coaching and proper handling of ASD-positive children. Although identification works well in dyadic analysis, many system-based techniques support the early detection of autism. There are many screening techniques to determine whether an individual child is autistic. This paper aims to identify basic autistic character through facial expression analysis without the interference of a target object. It is observed that the majority of children with ASD showed neutral and disgust as the facial expression, whereas the normal TD expressed happy, neutral and disgust. The probability analysis indicates that in a contactless environment, a TD shows happy and neutral as major expressions by nature of neurodevelopment, whereas the ASD-positive child expresses disgust and neutral as the major expressions. The paper also compares the difference in probability among the expressions to identify the closest and the next predominant facial expression shown by the children, along with the distraction probabilities among various expressions. This analysis shows that children with these facial expressions may fall under the category of autism in the near future; it does not guarantee that a child is autistic. This expression analysis could be further extended to identify the child’s emotion during a particular event within a random time period as the next level of screening for autistic character.
