Abstract
Facial expressions are a significant indicator in non-verbal communication between individuals. The task of facial emotion recognition is intricate for two main reasons. The first is the lack of a large database of training images; the second is the classification of emotions, which can be complex depending on whether the input image is static or not. Hence, this paper proposes a model with two contributions, one in feature extraction and one in classification. In the feature extraction process, SIFT features are extracted, as they are closely associated with facial emotion features. However, the numerous key points produced during SIFT extraction tend to provide redundant information. Hence, the key points are optimized using the Whale Optimization Algorithm (WOA). Subsequently, the features extracted from the selected keypoints are multiplied by a weight factor, which is also optimized by WOA. The resulting features are given to a Deep Belief Network (DBN) for the classification of face emotion, in which the number of hidden neurons is optimized by WOA along with the feature weights. This is motivated by the complexity that arises from using both forward and reverse training in the standard DBN model. After feature extraction and classification, the face emotion is recognized accurately, as shown by comparing the proposed Adaptive DBN (ADBN) using WOA with traditional classifiers and other optimization algorithms.
Introduction
Facial emotion recognition [1, 2], which plays a significant role in pattern recognition, human-computer interaction, and computer vision, is extensively used in surveillance systems, personalised healthcare, video games, multimedia, and humanoid service robots. In recent studies, numerous algorithms for face recognition, age and gender estimation, and facial emotion classification have been developed [3, 4, 5, 45]. However, high-dimensional recognition is still a demanding problem for such applications. Facial expression is the most significant mode of human emotion expression [6]. For the past couple of decades, its recognition has been an important research field in image recognition and computer vision. However, facial expression recognition remains a demanding task [7, 8], primarily because of changes in environment, pose, and lighting.
Facial emotion recognition [9, 10] in uncontrolled surroundings is an extremely demanding task owing to large intra-class variations caused by factors such as occlusion, illumination, head movement, and pose changes [11]. The accuracy of a facial emotion recognition system usually depends on two characteristics: (a) extraction of facial features that are robust to intra-class variations (for example, pose variations) yet distinctive across emotions [12, 13, 53, 54, 55], and (b) a classifier model that is capable of differentiating facial emotions from imperfect and noisy data (for example, under occlusion and illumination variations) [14, 15, 16].
In recent years, DBN [17, 18, 19, 44] has received increasing attention in artificial intelligence and machine learning, and numerous DBN-related algorithms have been successfully applied to image recognition tasks. Unlike shallow learning architectures for nonlinear transformation [20], DBN techniques seek to discover high-level abstract features in data by employing hierarchical architectures [21], which have turned out to be an efficient approach for obtaining high-level characteristics from data. Nevertheless, the conventional DBN suffers from limited learning efficiency and high computational complexity [22, 23, 47].
Robust, unbiased face recognition in real time remains a major challenge for a variety of facial expression recognition techniques based on supervised learning [24, 46]. This is because supervised schemes cannot capture, from a restricted quantity of training data, the entire appearance variability across faces with regard to lighting, race, facial biases, pose, etc. [25, 26, 43, 47]. In addition, processing every frame to categorize emotions is not necessary, as the user remains neutral most of the time in common applications such as photo albums, video chat, or web browsing. Identifying the neutral state at an early phase, and thus bypassing those frames in emotion classification, would save computational power [27, 28, 48, 49].
This paper contributes a facial emotion recognition model with two contributions, in the feature extraction and classification processes. First, in the feature extraction process, SIFT features are extracted. However, the many key points obtained during SIFT extraction tend to provide redundant information. Therefore, the key points are optimized using the WOA scheme. The features extracted from the chosen keypoints are then multiplied by a weight factor that is optimized by WOA. The optimal features are then passed to a DBN for the classification of face emotion, where the number of hidden neurons is optimized by WOA together with the feature weights. The proposed ADBN method using WOA is compared with other optimization algorithms, namely Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC), FireFly (FF), and Grey Wolf Optimization (GWO). In addition, the proposed classification scheme is compared with conventional schemes, namely DBN, Neural Network (NN), Support Vector Machine (SVM), and Naive Bayes (NB), in terms of several performance measures. The paper is organized as follows. Section 2 reviews the related works on this topic. Section 3 describes the modeling of face emotion recognition, and Section 4 presents the results and discussion. Finally, Section 5 concludes the paper.
Literature review
Related works
Morita et al. [1] have introduced a scheme for detecting emotional faces presented for 100 ms to 34 patients with MDD and matched controls. Stimulus faces were surrounded by a scarf and cap or by an Islamic headdress. The investigations were performed based on speed and accuracy. Results demonstrate that, across groups, fearful faces were identified faster and with higher accuracy in the out-group than in the in-group condition. In addition, sadness was recognized more precisely in the out-group condition. In contrast, happy faces were recognized more accurately (and tended to be recognized more quickly) in the in-group condition.
Liedtke et al. [2] have established an approach known as extreme sparse learning that can jointly learn a dictionary (a set of basis elements) and a nonlinear classification model. The scheme combines the discriminative power of the extreme learning machine with the reconstruction property of sparse representation to enable accurate classification when presented with noisy signals and imperfect data recorded in natural settings. Moreover, the paper offers a novel local spatiotemporal descriptor that is distinctive and pose-invariant. The framework achieved state-of-the-art recognition accuracy on both spontaneous and acted facial emotion databases.
Shojaeilangari et al. [3] have presented a study in which coupling was related to the functional connectivity between the medial prefrontal cortex and the anterior cingulate. The outcomes suggested that the reduced impact of observation on embarrassment induced by self-face images in individuals with ASD was associated with impairment in the right anterior insula, which is involved in producing subjective feelings, and in the anterior cingulate cortex, which acts as a hub for integrating information from others during self-face evaluation.
Wang et al. [4] have implemented a new intelligent emotion recognition system. The proposed model employed stationary wavelet entropy to extract the features and used a hidden-layer feedforward NN as the classifier. To prevent the training of the classifier from falling into local optima, the Jaya algorithm was employed. The results over a 20-subject 700-image dataset demonstrated that the model attained an overall accuracy of 96.80
Chen et al. [5] have proposed a Softmax regression-based deep sparse autoencoder network (SRDSAN) to identify facial emotion in human-robot interaction. It aims to handle large data at the output of deep learning by means of softmax regression (SR). In addition, to overcome gradient diffusion and local extrema problems in the training procedure, the overall network weights were fine-tuned to reach the global optimum, which makes the whole deep NN more robust, thereby improving the performance of facial emotion recognition. Results illustrate that the average recognition accuracy of SRDSAN was superior to that of the convolutional NN and the SR.
Happy and Routray [6] have suggested a novel framework for expression recognition using appearance features of selected facial patches. A few salient facial patches, which are active during emotion elicitation, were extracted depending on the location of facial landmarks. A one-against-one classification technique was implemented using these features. Furthermore, an automated learning-free facial landmark detection method was suggested that achieves performance comparable to that of various conventional landmark detection techniques.
Chiranjeevi et al. [7] have suggested a light-weight neutral versus emotion classification mechanism that acts as a pre-processor to conventional supervised emotion classification techniques. It dynamically learns neutral appearance at key emotion (KE) points by means of a statistical texture model, built from a set of reference frames for every user. The system was made robust to a variety of user head motions by accounting for affine deformations based on a statistical texture model. The technique consequently improves emotion recognition (ER) accuracy and simultaneously reduces the computational complexity of the ER system, as validated on numerous databases.
Zhang et al. [8] have implemented a facial expression recognition scheme with a variant of the evolutionary firefly algorithm for feature optimization. Initially, a modified Local Binary Pattern descriptor was used to generate an initial discriminative face representation. A variant of the firefly algorithm was then used to carry out feature optimization. In the evaluation, the system attained superior performance and outperformed other traditional feature optimization techniques and related facial expression recognition models by a significant margin.
Review
Table 1 shows the methods, features, and challenges of conventional face emotion recognition techniques. Pattern recognition was suggested in [1], where there was no correlation between accuracy of performance and the outcomes were stable. However, personality styles were not considered in the emotion platform. Extreme Learning Machine (ELM) was proposed in [2], which performs better in challenging scenarios with reduced classification cost; however, motion exaggeration schemes were not considered. NN was suggested in [3], which improves the coupling strength with minimized impact on embarrassment. However, the sample size was comparatively small, and there was gender variation in social sensitivities. Further, the Jaya algorithm was suggested in [4], which offers increased accuracy and supports various industrial and academic applications, but distortions arise if the photograph is taken imperfectly. DNN was implemented in [5], which provides increased robustness with better efficiency and reduced complexity. However, gradient diffusion issues are possible. LBP was suggested in [6], which offers accurate classification with reduced computational cost, but real-time recognition was not achieved. In addition, LBP was presented in [7], which reduces performance complexity and minimizes the classification error. However, there are issues related to the non-availability of neutral frames. Finally, Moth-firefly optimization was proposed in [8], which handles illumination variations better and increases the convergence speed. However, varied multimodal optimization problems were not addressed. These limitations motivate the enhancement of face emotion recognition algorithms in the present work.
Features and challenges of face recognition using various techniques
Overall architecture of the proposed face emotion recognition model.
Proposed architecture
The overall framework of the proposed face emotion recognition model is given in Fig. 1. An image of an individual is taken for face emotion recognition analysis. The face in the image is detected by means of a face detector, and the keypoints are obtained using the SIFT technique. The keypoints of the image are optimally selected by means of WOA, and features are then extracted from the optimized keypoints. The features obtained from the key points are multiplied by weight factors, whose number is equal to the length of the features. The weighted features of the image are given to the DBN for classifying the respective emotions: normal, happy, sad, surprise, angry, fear, and disgust. The weight factors, along with the number of hidden neurons, are optimized by WOA, which results in optimal classification of emotions. Thus the classifier recognizes the emotion of the testing image from a set of training images.
SIFT extraction
SIFT [32] is a method for extracting and identifying local feature descriptors that are invariant to image contrast, rotation, and scaling. The SIFT technique was first introduced by Lowe [31]. In recent years, it has been refined to offer better feature extraction from images. SIFT features have a number of advantages, as listed below.
Natural features can be obtained from images. They are invariant to uniform scaling and orientation, and moderately invariant to contrast variations. They offer improved error tolerance with fewer matches. They are efficient and fast to compute. They are suitable for merging to produce useful information.
Scale-space extrema detection: Initially, in this stage, the forgery image
The difference between the two scales is called the Difference of Gaussians (DoG).
Keypoint localization: Following the initial process, interest points, also known as keypoints, are identified as local minima or maxima across scales in the DoG images. Each pixel in the DoG images is compared with its eight neighbours at the same scale, as given by Eq. (3), where
Orientation allocation: With the intention of attaining invariance to orientation, the gradient magnitude
Keypoint descriptor production: After selecting the keypoint orientation, the feature descriptor is evaluated on 4
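The first two stages above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows, a single pair of scales, and the common convention that the DoG is the coarser blur minus the finer blur; the function names (`blur`, `difference_of_gaussians`, `local_extrema`), the scale ratio `k`, and the 3-sigma kernel truncation are illustrative assumptions, not the implementation used in this work. Only the eight same-scale neighbours are compared, as described in the text.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur; border pixels are replicated."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    rows, cols = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    horizontal = [[sum(kernel[k + radius] * image[r][clamp(c + k, cols)]
                       for k in range(-radius, radius + 1))
                   for c in range(cols)] for r in range(rows)]
    return [[sum(kernel[k + radius] * horizontal[clamp(r + k, rows)][c]
                 for k in range(-radius, radius + 1))
             for c in range(cols)] for r in range(rows)]

def difference_of_gaussians(image, sigma, k=math.sqrt(2)):
    """DoG = blur at scale k*sigma minus blur at scale sigma (assumed convention)."""
    fine, coarse = blur(image, sigma), blur(image, k * sigma)
    return [[coarse[r][c] - fine[r][c] for c in range(len(image[0]))]
            for r in range(len(image))]

def local_extrema(dog):
    """Interior pixels strictly above or below all eight same-scale neighbours."""
    found = []
    for r in range(1, len(dog) - 1):
        for c in range(1, len(dog[0]) - 1):
            neighbours = [dog[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if dog[r][c] > max(neighbours) or dog[r][c] < min(neighbours):
                found.append((r, c))
    return found

# Toy input: a single bright pixel in the middle of a 9x9 image.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
dog = difference_of_gaussians(image, sigma=1.0)
keypoint_candidates = local_extrema(dog)
```

On this toy input the bright pixel shows up as a DoG minimum, so its position is among the candidates. A complete SIFT pipeline would additionally compare across neighbouring scales, reject low-contrast and edge responses, and then assign orientations and descriptors.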
DBN [29, 30] consists of multiple layers, in which the visible neurons form the input layer and the hidden neurons constitute the output layer. Every hidden neuron is connected to the input neurons; however, there are no connections among the hidden neurons and no connections among the visible neurons. The connections between the visible and hidden neurons are restricted and symmetric. The stochastic neuron model gives the output for a particular input. As the output of a stochastic neuron in a Boltzmann network is probabilistic, Eq. (6) points out the output and Eqs (6) and (7) indicate the sigmoid-shaped probability function. In addition, the deterministic representation of the stochastic model is specified in Eq. (8), in which
In DBN, feature extraction is carried out by a stack of Restricted Boltzmann Machine (RBM) layers, and classification is carried out by multilayer perceptron (MLP) layers. The mathematical model gives the Boltzmann machine energy for the configuration of neuron state
The energy of the joint configuration of the visible and hidden neurons
The probability distribution of the input data is encoded into the weight parameters, which constitutes the learning of the RBM. Training of the RBM makes use of the assigned probabilities, and the weight assignment is described by Eq. (13). For every possible pair of hidden and visible vectors, the probability assigned by the RBM model is given in Eq. (14), in which
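As an illustration of the energy-based formulation described above, the following sketch evaluates the textbook energy of a binary RBM and the probability it assigns to a visible/hidden pair. The symbols used here (`W` for weights, `a` and `b` for visible and hidden biases) are assumptions based on the standard RBM definition and may differ from the notation of the equations referenced in the text. The normalising constant is computed by brute-force enumeration, which is only feasible for toy sizes.

```python
import math
from itertools import product

def rbm_energy(v, h, W, a, b):
    """Standard RBM energy: E(v, h) = -a.v - b.h - v^T W h."""
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + interaction)

def joint_probability(v, h, W, a, b):
    """p(v, h) = exp(-E(v, h)) / Z, with Z summed over all binary states."""
    n_visible, n_hidden = len(a), len(b)
    partition = sum(math.exp(-rbm_energy(vv, hh, W, a, b))
                    for vv in product((0, 1), repeat=n_visible)
                    for hh in product((0, 1), repeat=n_hidden))
    return math.exp(-rbm_energy(v, h, W, a, b)) / partition

# Toy RBM with two visible neurons and one hidden neuron (values are arbitrary).
W = [[2.0], [0.5]]
a = [0.1, 0.2]
b = [0.3]
energy = rbm_energy((1, 0), (1,), W, a, b)
prob = joint_probability((1, 0), (1,), W, a, b)
```

Lower energy corresponds to higher probability, which is why training adjusts the weights so that observed inputs receive low energy.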
As it is difficult to sample the expectations under the distribution defined by the model, a learning method called Contrastive Divergence (CD) is used. The steps of CD are summarized below.
Choose the training samples
Compute the probabilities of the hidden neurons
Sample the hidden states
Determine the outer product of the vectors
Sample the reconstruction of the visible states
Compute the outer product of
Find the updated weight as given in Eq. (18), in which
The weights are then updated with the new values as specified in Eq. (19).
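The CD steps listed above can be sketched as a single CD-1 update on one binary training vector. This is a minimal illustration under stated assumptions: the learning rate `lr`, the bias updates, and the use of hidden probabilities (rather than sampled states) in the two outer products are common practical choices and are not taken from Eqs (18) and (19); all function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_bernoulli(probs, rng):
    """Draw a binary state for each neuron from its activation probability."""
    return [1 if rng.random() < p else 0 for p in probs]

def hidden_probs(v, W, b):
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def visible_probs(h, W, a):
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]

def cd1_update(v0, W, a, b, lr, rng):
    """One CD-1 step; returns updated copies of the weights and biases."""
    p_h0 = hidden_probs(v0, W, b)          # hidden probabilities given the data
    h0 = sample_bernoulli(p_h0, rng)       # sampled hidden states
    v1 = sample_bernoulli(visible_probs(h0, W, a), rng)  # reconstruction
    p_h1 = hidden_probs(v1, W, b)          # hidden probabilities given the reconstruction
    # Weight change: positive outer product minus negative outer product.
    new_W = [[W[i][j] + lr * (v0[i] * p_h0[j] - v1[i] * p_h1[j])
              for j in range(len(b))] for i in range(len(a))]
    new_a = [a[i] + lr * (v0[i] - v1[i]) for i in range(len(a))]
    new_b = [b[j] + lr * (p_h0[j] - p_h1[j]) for j in range(len(b))]
    return new_W, new_a, new_b

# Toy usage: three visible neurons, two hidden neurons, zero initial parameters.
rng = random.Random(0)
W0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
W1, a1, b1 = cd1_update([1, 0, 1], W0, [0.0] * 3, [0.0] * 2, lr=0.1, rng=rng)
```

In practice the update is averaged over a mini-batch and repeated for many epochs; a single step on one vector is shown only to make the order of operations explicit.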
To initiate the procedure of learning dependent on MLP system, let us regard the patterns of training
Thus, Eq. (21) offers the pattern
The procedure for training the DBN, combining pre-training (RBM) with standard training (MLP), is given in the following steps.
The DBN model is initialized with randomly chosen weights, biases, and other related parameters. First, the RBM is initialized with the input data, which provides the potentials of its visible neurons, and unsupervised learning is performed. The input to the following layer is obtained by sampling the potentials produced in the hidden neurons of the preceding layer, and unsupervised learning is again performed. These steps are repeated for a given number of layers. The RBM pre-training phase is thus carried out until the MLP layer is reached. The MLP stage then performs supervised learning and is continued until the target error rate is attained.
Solution encoding
The solution encoding involves two phases. Initially, the keypoints from the SIFT technique are taken as the solution for optimal selection, as shown in Fig. 2. Here,
Solution encoding (Phase 1).
Solution encoding (Phase 2).
The proposed face recognition model includes two objective functions as mentioned below.
The objective function of phase 1 is the minimization of the correlation among the keypoints that are optimized by WOA, as given in Eq. (23). The correlation between the two key points
The objective function of phase 2 concerns the optimal features that are multiplied by the weight factors, whose number is equal to the length of the features. The solution, comprising the weight factors and the number of hidden neurons, is evaluated by the objective of maximizing classification accuracy, as shown by Eq. (25), where
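The two objectives can be illustrated with a short sketch. Everything below is an assumption made for illustration: the phase 1 solution is taken to be a binary mask over keypoints and the correlation is measured as the mean absolute Pearson correlation between the selected descriptors, while the phase 2 solution is taken to be a vector of feature weights followed by one value in [0, 1] that is mapped to a hidden-neuron count. The exact forms of Eqs (23) to (25) are not reproduced here, and the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def phase1_objective(mask, descriptors):
    """Mean absolute pairwise correlation of the selected keypoints (to minimise)."""
    chosen = [d for keep, d in zip(mask, descriptors) if keep]
    if len(chosen) < 2:
        return float("inf")  # assumption: at least two keypoints must be kept
    pairs = [(i, j) for i in range(len(chosen))
             for j in range(i + 1, len(chosen))]
    return sum(abs(pearson(chosen[i], chosen[j])) for i, j in pairs) / len(pairs)

def decode_phase2(solution, n_features, max_hidden):
    """Split a phase 2 solution into feature weights and a hidden-neuron count."""
    weights = solution[:n_features]
    hidden = 1 + int(round(solution[n_features] * (max_hidden - 1)))
    return weights, hidden

def apply_weights(features, weights):
    """Element-wise product of a feature vector and its weight factors."""
    return [f * w for f, w in zip(features, weights)]

def accuracy(predicted, actual):
    """Fraction of correctly classified samples (the phase 2 objective to maximise)."""
    return sum(p == t for p, t in zip(predicted, actual)) / len(actual)

# Toy descriptors: the first two are perfectly correlated, the third is uncorrelated.
descriptors = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1]]
redundant = phase1_objective([1, 1, 0], descriptors)
diverse = phase1_objective([1, 0, 1], descriptors)
weights, hidden = decode_phase2([0.5, 2.0, 1.0], n_features=2, max_hidden=11)
```

In this toy case the mask that keeps the two correlated descriptors scores worse than the mask that keeps the uncorrelated pair, which is the behaviour the phase 1 objective is meant to reward. In phase 2, the decoded weights would be applied to the features, a classifier with the decoded number of hidden neurons would be trained, and its accuracy returned to the optimizer.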
The solutions in phases 1 and 2 are given to WOA for selecting the optimal solution. In general, whales are capable of emotions, judgment, and social behaviour, as humans are [33]. The most interesting thing about humpback whales is their special hunting method. Their foraging behaviour is known as the bubble-net feeding method, a behaviour observed only in humpback whales.
Sample image data for (a) Normal (b) Smile (c) Sad (d) Surprise (e) Angry (f) Fear (g) Disgust.
Humpback whales can identify the position of prey and encircle it. As the position of the optimum in the search space is not known a priori, WOA assumes that the current best candidate solution is the target prey or is close to the optimum. Once the best search agent is defined, the other search agents attempt to update their positions towards it. This behaviour is given by Eqs (26) and (27), in which
It is important to note that
Exploitation phase: Shrinking encircling mechanism: This action is accomplished by minimizing the value of
Spiral updating position: Initially
It can be further modeled as given in Eq. (31) where
Exploration phase: In this phase, the position of a search agent is updated with respect to a randomly selected search agent rather than the best search agent found so far. Such mechanism and
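The encircling, spiral, and exploration updates described above can be combined into a compact sketch of WOA for a generic minimisation problem. The update rules follow the commonly published form of the algorithm; because Eqs (26) to (31) are not reproduced here, the symbol names (`a`, `A`, `C`, `b`, `l`), the use of one scalar `A` and `C` per agent, the immediate update of the best solution, and all default parameter values are assumptions. The sphere function stands in for the phase 1 or phase 2 objective.

```python
import math
import random

def woa_minimize(objective, dim, bounds, n_agents=20, n_iter=100, b=1.0, seed=0):
    """Minimal Whale Optimization Algorithm sketch for box-constrained minimisation."""
    rng = random.Random(seed)
    lo, hi = bounds
    agents = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    best = min(agents, key=objective)[:]
    best_score = objective(best)
    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)              # decreases linearly from 2 towards 0
        for idx, x in enumerate(agents):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: shrink towards the best agent (exploitation);
                # otherwise move relative to a random agent (exploration).
                ref = best if abs(A) < 1.0 else agents[rng.randrange(n_agents)]
                new = [ref[d] - A * abs(C * ref[d] - x[d]) for d in range(dim)]
            else:
                # Spiral update around the best agent.
                l = rng.uniform(-1.0, 1.0)
                spiral = math.exp(b * l) * math.cos(2.0 * math.pi * l)
                new = [abs(best[d] - x[d]) * spiral + best[d] for d in range(dim)]
            agents[idx] = [min(max(v, lo), hi) for v in new]  # keep inside bounds
            score = objective(agents[idx])
            if score < best_score:
                best, best_score = agents[idx][:], score
    return best, best_score

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)

best_position, best_value = woa_minimize(sphere, dim=3, bounds=(-5.0, 5.0), n_iter=50)
```

For the binary keypoint selection of phase 1, the continuous positions would additionally need to be mapped to a 0/1 mask, for example by thresholding; that mapping is not specified here.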
Experimental setup
Classification analysis of proposed and conventional methods measuring (a) Accuracy (b) Sensitivity (c) Specificity (d) Precision (e) FPR (f) FNR (g) NPV (h) FDR (i) F1-score (j) MCC.
The proposed face emotion recognition model was implemented in MATLAB using the JAFFE database. The analysis was based on seven face emotion classes, namely normal, happy, sad, surprise, angry, fear, and disgust, as shown in Fig. 4. The proposed ADBN method using WOA was compared with conventional algorithms such as GA [38], PSO [39], ABC [40], FF [41] and GWO [42]. In addition, the implemented classification scheme was compared with conventional schemes such as DBN [34], NN [35], SVM [36] and NB [37] in terms of accuracy, sensitivity, specificity, precision, False Positive Rate (FPR), False Negative Rate (FNR), Negative Predictive Value (NPV), False Discovery Rate (FDR), F1-score, and Matthews Correlation Coefficient (MCC).
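For reference, the ten measures listed above can be computed from the counts of true positives, true negatives, false positives, and false negatives using their standard definitions, as in the sketch below. The function name and the example counts are illustrative and are not taken from the experiments; how the counts are aggregated over the emotion classes is not specified here, and all denominators are assumed to be non-zero.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix measures; assumes every denominator is non-zero."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),       # true positive rate
        "specificity": tn / (tn + fp),       # true negative rate
        "precision": tp / (tp + fp),
        "FPR": fp / (fp + tn),               # 1 - specificity
        "FNR": fn / (fn + tp),               # 1 - sensitivity
        "NPV": tn / (tn + fn),
        "FDR": fp / (fp + tp),               # 1 - precision
        "F1": 2 * tp / (2 * tp + fp + fn),
        "MCC": (tp * tn - fp * fn) / mcc_denominator,
    }

# Illustrative counts only.
measures = binary_metrics(tp=8, tn=9, fp=1, fn=2)
```

Accuracy, sensitivity, specificity, precision, NPV, F1-score, and MCC are better when higher, whereas FPR, FNR, and FDR are better when lower.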
Impact of whale optimization over conventional optimization algorithms for face emotion recognition
The classification analysis for proposed ADBN using WOA for face emotion recognition has been specified by Fig. 5. From Fig. 5a, the accuracy of the proposed scheme for SIFT is 16.67% better than DBN, 6.25% better than NN, 1.04% better than SVM, and 11.46% better than NB methods. Also, the suggested scheme for OSIFT is 16.33% superior to DBN, 8.16% superior to NN, 9.18% superior to SVM, and 13.27% superior to NB models. Also, from Fig. 5b, the sensitivity of the implemented method for SIFT is 50% better than DBN, 12.82% better than NN, 5.12% better than SVM, and 23.07% better than NB methods. As well, the presented system for OSIFT is 47.05% superior to DBN, 17.65% superior to NN, 15.29% superior to SVM, and 23.53% superior to NB designs. From Fig. 5c, the specificity of SIFT is 6.25% better than DBN, 1.04% better than NN, 1.04% better than SVM, and 4.16% better than NB techniques. Similarly, OSIFT is 4.21% superior to DBN, 1.05% superior to NN, 2.1% superior to SVM, and 3.16% superior to NB schemes. In addition, from Fig. 5d, the precision of SIFT is 50% better than DBN, 8.98% better than NN, 3.85% better than SVM, and 3.85% better than NB approaches. Similarly, the proposed OSIFT is 47.05% superior to DBN, 11.76% superior to NN, 23.53% superior to SVM, and 35.29% superior to NB algorithms. From Fig. 5e, FPR of the implemented scheme for SIFT is 62.38% better than DBN, 10.53% better than NN, 10.53% better than SVM, and 57.78% better than NB models. Also, the proposed OSIFT is 64.70% superior to DBN, 40% superior to NN, 51.61% superior to SVM, and 66.67% superior to NB schemes. From Fig. 5f, the FNR of SIFT is 63.93% better than DBN, 45.45% better than NN, 18.18% better than SVM, and 72.72% better than NB approaches. In addition, from Fig. 5g, NPV of SIFT is 6.25% superior to DBN, 1.04% superior to NN, 1.04% superior to SVM, and 4.16% superior to NB algorithms. 
Also, NPV of proposed OSIFT is 7.14% better than DBN, 2.04% better than NN, 3.06% better than SVM, and 7.14% better than NB algorithms. From Fig. 5h, FDR of SIFT is 63.93% superior to DBN, 45.45% superior to NN, 18.18% superior to SVM, and 72.72% superior to NB approaches and the proposed OSIFT is 65.38% better than DBN, 55.55% better than NN, 50% better than SVM, and 83.33% better than NB schemes. From Fig. 5i, the F1-score of the implemented scheme is 50% superior to DBN, 12.82% superior to NN, 5.12% superior to SVM, and 23.07% superior to NB systems and the suggested OSIFT is 47.05% better than DBN, 17.65% better than NN, 15.29% better than SVM, and 23.53% better than NB models. Finally, from Fig. 5j, the MCC of modelled scheme for SIFT is 60% superior to DBN, 7.14% superior to NN, 7.14% superior to SVM, and 28.57% superior to NB techniques and the proposed OSIFT is 52.5% better than DBN, 18.75% better than NN, 21.25% better than SVM, and 35% better than NB algorithms. Thus the enhancement of the implemented classification technique has been confirmed successfully from the obtained results.
Impact of whale optimization
The performance analysis of the proposed face recognition model using WOA is compared over several conventional optimization algorithms, which is shown in Table 2, where the accuracy of the suggested scheme is 2.57% better than GA, 3% better than PSO, 0.864% better than ABC, 0.43% better than FF and 0.43% better than GWO techniques. Also the sensitivity of the presented method is 10.44% superior to GA, 12.06% superior to PSO, 3.45% superior to ABC, 1.73% superior to FF and 1.73% superior to GWO methods. The specificity of the implemented method is 1.47% better than GA, 1.72% better than PSO, 0.846% better than ABC, 0.25% better than FF and 0.25% better than GWO models. Also, the precision of the suggested scheme is 10.44% superior to GA, 12.06% superior to PSO, 3.45% superior to ABC, 1.73% superior to FF and 1.73% superior to GWO methods. The FPR of the presented mechanism is 49.8% better than GA, 57.5% better than PSO, 15.5% better than ABC, 8.15% better than FF and 8.15% better than GWO algorithms. Moreover, the FNR of the proposed model is 49.9% superior to GA, 57.49% superior to PSO, 16.67% superior to ABC, 7.92% superior to FF and 7.92% superior to GWO designs. The NPV of the suggested method is 1.47% better than GA, 1.72% better than PSO, 0.46% better than ABC, 0.25% better than FF and 0.25% better than GWO techniques. The FDR of presented scheme is 49.9% superior to GA, 57.49% superior to PSO, 16.67% superior to ABC, 7.92% superior to FF and 7.92% superior to GWO approaches. Similarly, the F1-score of implemented scheme is 10.44% better than GA, 12.06% better than PSO, 3.45% better than ABC, 1.73% better than FF and 1.73% better than GWO models. Finally, the MCC of implemented scheme is 12.5% superior to GA, 14.59% superior to PSO, 4.25% superior to ABC, 2.09% superior to FF and 2.09% superior to GWO approaches. Thus the improved mechanism of the proposed method has been verified successfully.
Conclusion
This paper has presented a technique with two contributions, in the feature extraction and classification processes. First, SIFT features were extracted in the feature extraction process. However, the numerous key points obtained during SIFT extraction provide redundant information. Hence, the key points were optimized using the WOA approach. The features extracted from the chosen keypoints were then multiplied by a weight factor optimized by the WOA scheme. The optimal features were passed to the DBN for the classification of face emotion, where the number of hidden neurons was optimized by WOA along with the feature weights. Finally, the proposed ADBN scheme using WOA was compared with existing classifiers and other optimization algorithms. From the classification results, the accuracy of the suggested method for SIFT was 16.67% better than DBN, 6.25% better than NN, 1.04% better than SVM, and 11.46% better than NB methods. The presented scheme for OSIFT was 16.33% superior to DBN, 8.16% superior to NN, 9.18% superior to SVM, and 13.27% superior to the NB approach. In addition, the overall accuracy of the suggested method was 2.57% better than GA, 3% better than PSO, 0.864% better than ABC, 0.43% better than FF and 0.43% better than GWO techniques. These results confirm the improved performance of the implemented face emotion recognition model.
