Abstract
This paper presents an automatic facial expression analysis method named Local Directional Stigma Mean Patterns (LDSMP) for facial expression analysis and content-based facial expression image retrieval, combined with a convolutional neural network (CNN). Traditional local patterns such as Local Binary Patterns (LBP) and Local Ternary Patterns (LTP), which are computed from the relationship between the center pixel and its neighboring pixels, have been applied to face recognition and expression analysis. The proposed method calculates the differences in eight directions and then divides them into three ranges based on threshold values. Thus, the values are substituted with three basic positive values (
Keywords
Introduction
Facial expression analysis (FEA) is a challenging task in the fields of Artificial Intelligence (AI) and computer vision. FEA is in demand in many applications, such as analysis of criminals' expressions, entertainment, surveillance in public transportation, analysis of patients' moods, monitoring of students' interest in online classes and customer satisfaction. Statistical methods have typically been applied in combination with classification algorithms such as ANN and SVM, but more accurate results are still needed.
The exponential growth of digital data, driven by internet usage (especially online services) and by digital devices such as digital cameras and mobile phones, is producing data of daunting size, which makes such databases very hard and inefficient to manage with human annotations alone. Text-based systems were used in the early 1970s: images were searched based on human annotations, but these systems suffered from disadvantages such as the staff needed to provide annotations and inaccuracy caused by wrongly annotated images. Managing this copious data is therefore a tedious task for the administrator, and there is an acute demand for an efficient, automatic framework, namely Content-Based Image Retrieval (CBIR). Its key step is feature extraction, whose effectiveness depends on the technique developed to extract features from an image. Features fall into two categories: (i) low-level features and (ii) high-level features. CBIR uses low-level features such as color, texture, spatial information and shape, taken from the image itself. Generating a common representation of an image based on its perceptual content is difficult, because a user may take photographs under widely varying situations, illumination, orientation and so on. Illustrative, comprehensive and updated surveys covering feature extraction, multidimensional aspects of CBIR and future directions were given by Yong Rui, Liu and Kokare [1, 2, 3]. The retrieval accuracy of CBIR generally depends on efficient feature extraction followed by a similarity measurement method. Recent applications of Convolutional Neural Networks (CNN) to image classification have shown that they provide better results, which motivated us to feed the local directional stigma patterns to a CNN as an input feature vector array to recognize and analyze facial expressions.
Texture is one of the most important basic low-level features of an image, and it has been used extensively in many CBIR applications because of its discriminative potential. Muller et al. [4] presented an exclusive review of generic content-based image retrieval and of the technologies used for medical diagnostic images, especially heart imagery applications, along with future directions. Moghaddam et al. [7] proposed a new algorithm called the wavelet correlogram for image retrieval, based on the color correlogram and a multiresolution decomposition using Daubechies wavelets, followed by quantization. Moghaddam and Saadatmand developed an extended wavelet correlogram, the Gabor wavelet correlogram, which extracts rotation-invariant features using Gabor wavelets with an optimized weighted distance to enhance accuracy [8]. Zhang et al. [9] proposed a hybrid method that gathers global features with the training-free LBP variance (LBPV) and uses a dissimilarity metric for dimensionality reduction, combined with nearest-neighbor classification and chi-square distance to build the model. Kokare et al. [10] introduced DT-RCWF and DT-RCT, which decompose the image in 12 directions to retrieve texture features. Zhen et al. [14] proposed a hybrid method combining space, scale and orientation using Gabor filters and LBP for face recognition. Second-order derivatives of local ternary patterns have also been combined with KPCA (kernel principal component analysis) to capture the traces of median filtering [17].
Local binary patterns and their extensions (uLBP, CLBP, etc.), gradient-based patterns and histogram-based patterns have become popular because of their simplicity and efficiency. Nevertheless, LBP methods have a disadvantage when encoding large and small intensity differences: as shown in Fig. 1, both can produce the same pattern, so positive and negative features cannot be retrieved separately from the prominent positions of an image, particularly in facial expressions. The proposed method addresses these variations.
Example of the same LBP pattern (00100111) generated for textures with large and with small intensity differences.
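To make the Fig. 1 limitation concrete, the following Python sketch computes the 8-bit LBP string of two hypothetical 3×3 patches, one with small and one with large intensity differences around the same center value 50. The clockwise neighbor ordering starting at the top-left corner is an assumption chosen so that both patches yield the string in the caption; the pixel values are illustrative and are not taken from the paper.

```python
import numpy as np

# Clockwise neighbour order starting at the top-left corner of a 3x3 patch.
# (Assumed ordering; Fig. 1 does not fix it.)
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_bits(patch: np.ndarray) -> str:
    """Return the 8-bit LBP string of a 3x3 patch: '1' where neighbour >= centre."""
    centre = patch[1, 1]
    return "".join("1" if patch[r, c] >= centre else "0" for r, c in NEIGHBOURS)


# Hypothetical patch with small intensity differences around the centre value 50.
small = np.array([[49, 48, 51],
                  [50, 50, 49],
                  [51, 52, 48]])

# Hypothetical patch with large intensity differences around the same centre.
large = np.array([[10, 5, 200],
                  [120, 50, 20],
                  [180, 250, 0]])

print(lbp_bits(small), lbp_bits(large))  # 00100111 00100111
```

Both patches produce `00100111`, although the second differs from its center by up to 200 gray levels: the sign-only comparison discards the magnitude of each difference.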
The local binary pattern operator, which became an emerging method in the texture feature extraction branch of content-based image retrieval, was introduced by Ojala et al. [5], who later extended it into a gray-scale and rotation-invariant operator that detects uniform patterns in a multiresolution, multidirectional manner to retrieve spatial information [18]. Liao et al. [19] proposed dominant local binary patterns (DLBP) for local texture classification, additionally using circularly symmetric Gabor filters (CSGF) to retrieve global information and improve accuracy. Tan and Triggs [20] introduced a generalized feature descriptor, local ternary patterns, for face recognition; it is less sensitive to noise and more discriminative in uniform regions than LBP. Murala et al. [21] proposed local tetra patterns (LTrPs), which use second-order derivatives in the vertical and horizontal directions combined with an additional magnitude pattern, and compared the results with the Gabor transform; they also proposed directional local extrema patterns, which retrieve image edge information in four directions such as 0
Basic patterns such as LBP and LTP extract local information that depends on the edge distribution, encoded in either the positive or the negative direction. These methods can therefore be improved by considering more than two directions. Our work evaluates the available directional information at each pixel as first-order derivatives; second-order derivatives are then evaluated by quantizing these with the proposed threshold values. The result, referred to as local directional stigma mean patterns (LDSMP), is used for texture feature extraction and classification.
The paper is organized as follows. Sections 1 and 2 give a brief introduction and a concise literature survey. Section 3 presents the proposed feature extraction method; the framework and the convolutional neural network are described in Sections 4 and 5; results and analysis are presented in Section 6; Section 7 concludes the work and outlines possible future work; and the references are listed in Section 8.
Local binary patterns
Ojala et al. [5] first introduced the LBP operator for texture classification and feature extraction. The LBP operator has since been used successfully in various application areas, especially face recognition [18], expression analysis, biometric finger recognition and object tracking [23].
Given center pixel ‘
As shown in Fig. 2, ‘
(a) 3 
Figure 2 shows an example of the LBP calculation; histograms of the extracted LBP patterns are then evaluated to represent the image edge distribution.
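Because the LBP equation itself is missing from this section, the sketch below uses the standard definition by Ojala et al., LBP_{P,R} = sum over p = 0..P−1 of s(g_p − g_c)·2^p, with s(x) = 1 for x ≥ 0 and 0 otherwise, for P = 8 neighbors at radius R = 1, and then builds the 256-bin histogram used as the texture feature vector. The neighbor ordering, the skipping of border pixels and the random test image are assumptions of this sketch.

```python
import numpy as np

# Offsets (dy, dx) of the 8 neighbours at radius 1, counter-clockwise from the
# right-hand neighbour (assumed ordering).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def lbp_image(img: np.ndarray) -> np.ndarray:
    """LBP_{8,1} code for every interior pixel: sum_p s(g_p - g_c) * 2**p."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    codes = np.zeros_like(centre)
    for p, (dy, dx) in enumerate(OFFSETS):
        # Shifted view holding neighbour p of every interior pixel.
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= centre).astype(np.int32) << p
    return codes


def lbp_histogram(img: np.ndarray) -> np.ndarray:
    """256-bin histogram of the LBP codes, used as the feature vector."""
    return np.bincount(lbp_image(img).ravel(), minlength=256)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))
hist = lbp_histogram(image)
print(hist.shape, hist.sum())  # (256,) 3844
```

Border pixels are skipped, so a 64×64 image contributes 62×62 = 3844 codes to the histogram.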
LTP, introduced by Tan and Triggs [20], is an improved version of LBP. Gray values are quantized based on a threshold of width ‘
A histogram can be built for an image using Eqs (4) and (5) by computing the LTP of each pixel.
To compute the LTP operator, the ternary pattern is split into an upper pattern and a lower pattern by substituting ‘1’ for ‘
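A minimal NumPy sketch of LTP following the Tan and Triggs definition: each neighbor g_p is coded +1 if g_p ≥ g_c + t, −1 if g_p ≤ g_c − t, and 0 otherwise; the upper pattern sets bit p where the code is +1, the lower pattern where it is −1, and the two 256-bin histograms are concatenated. The threshold name `t`, its default of 5 and the neighbor ordering are assumptions, since the threshold symbol is cut off above.

```python
import numpy as np

# Neighbour offsets (dy, dx), counter-clockwise from the right (assumed ordering).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def ltp_codes(img, t=5):
    """Upper and lower LTP codes for every interior pixel.

    Each neighbour is coded +1 if g_p >= g_c + t, -1 if g_p <= g_c - t, else 0.
    The upper code sets bit p where the code is +1; the lower code where it is -1.
    """
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    upper = np.zeros_like(centre)
    lower = np.zeros_like(centre)
    for p, (dy, dx) in enumerate(OFFSETS):
        diff = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - centre
        upper += (diff >= t).astype(np.int32) << p
        lower += (diff <= -t).astype(np.int32) << p
    return upper, lower


def ltp_histogram(img, t=5):
    """Concatenated 256-bin histograms of the upper and lower patterns."""
    upper, lower = ltp_codes(img, t)
    return np.concatenate([np.bincount(upper.ravel(), minlength=256),
                           np.bincount(lower.ravel(), minlength=256)])


patch = [[52, 40, 50], [52, 50, 60], [52, 52, 52]]
upper, lower = ltp_codes(patch, t=5)
print(upper[0, 0], lower[0, 0])  # 1 4
```

Here only the right-hand neighbor (60) exceeds 50 + 5 and only the upper neighbor (40) falls below 50 − 5; the remaining differences of 2 or 0 lie inside the dead zone and set no bit in either pattern.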
The Local Adaptive Image Descriptor was proposed by Zahid et al. [11]. It is a variant of the LTP operator that uses a dynamic threshold, calculated from the given image itself using Eqs (6) and (7).
Here,
The basic idea of these patterns is inspired by local texture patterns such as LBP, LTP, LDP and DBC [5, 19, 17, 23]. The pattern describes the spatial and temporal structure of the local texture feature based on the directions of the centered gray pixel value ‘
Numerical illustration of the proposed method with static threshold values, and generation of six binary patterns by substituting each threshold value in turn with ‘1’ and all other values, including 0’s, with ‘0’.
Using the above equations, the differences between an image pixel and its neighbors can be calculated in all 8 directions using Eq. (3). The differences are then substituted with the given threshold values using Eq. (9).
The second order ‘
Next, each threshold value in turn is substituted with binary ‘1’ and all other values with ‘0’, as shown in Fig. 4. Repeating this for the remaining five values yields six patterns, using Eq. (3).
Upper and lower patterns are generated using Eqs (3)–(3)
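The sketch below illustrates only the first-order part of LDSMP with the static thresholds high = 15, medium = 6 and low = 1 given later in this section: each of the 8 directional differences g_p − g_c is replaced by the signed threshold of the range it falls into, and six 8-bit binary patterns are formed by setting ‘1’ for one of the six values (+15, +6, +1, −1, −6, −15) at a time. Several details are assumptions because the corresponding equations are missing here: the exact range boundaries, mapping a zero difference to 0 so that it appears in no pattern, the neighbor ordering, and concatenating six 256-bin histograms into a 1536-dimensional feature. The second-order step and the "mean" component of LDSMP are not reproduced, and the function names are hypothetical.

```python
import numpy as np

# Neighbour offsets (dy, dx), counter-clockwise from the right (assumed ordering).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def quantize(diff, high=15, medium=6, low=1):
    """Replace each difference by the signed threshold of its range (assumed bounds).

    |d| >= high -> +/-high, medium <= |d| < high -> +/-medium,
    low <= |d| < medium -> +/-low, |d| < low -> 0.
    """
    mag = np.abs(diff)
    level = np.where(mag >= high, high,
                     np.where(mag >= medium, medium,
                              np.where(mag >= low, low, 0)))
    return np.sign(diff) * level


def ldsmp_codes(img, high=15, medium=6, low=1):
    """Six 8-bit binary codes per interior pixel, shape (6, H-2, W-2)."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    values = [high, medium, low, -low, -medium, -high]
    codes = np.zeros((6,) + centre.shape, dtype=np.int32)
    for p, (dy, dx) in enumerate(OFFSETS):
        # First-order difference in direction p, then quantization.
        q = quantize(img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - centre,
                     high, medium, low)
        for k, v in enumerate(values):
            # Pattern k gets bit p = 1 where direction p took value v.
            codes[k] += (q == v).astype(np.int32) << p
    return codes


def ldsmp_feature(img, high=15, medium=6, low=1):
    """Concatenated 256-bin histograms of the six patterns (1536 values)."""
    codes = ldsmp_codes(img, high, medium, low)
    return np.concatenate([np.bincount(c.ravel(), minlength=256) for c in codes])


small = [[49, 48, 51], [50, 50, 49], [51, 52, 48]]
large = [[10, 5, 200], [120, 50, 20], [180, 250, 0]]
print(ldsmp_codes(small)[:, 0, 0])  # patterns for +15, +6, +1, -1, -6, -15
print(ldsmp_codes(large)[:, 0, 0])
```

For these two hypothetical patches with the same center value 50, the small-difference patch sets bits only in the +1 and −1 patterns, while the large-difference patch sets bits only in the +15 and −15 patterns, so the two textures receive different descriptors even though their signs agree in every direction.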
Determining the threshold is a challenging task when preprocessing the image. Thresholds partition the data into specific ranges so that the required features can be retrieved, whereas methods such as LBP fail to cover the desired intensity levels, especially where ‘0’ is assigned. The threshold can be determined in two ways:
static threshold and dynamic threshold.
Query image (top left) and feature images of LBP, uniform LBP, circular LBP, LTP, DBC, Gabor filtered transform (GT), LGMMEPOP (local Gabor magnitude maximum edge positioned octal patterns), LGSMEPOP (local Gabor signed maximum edge positioned octal patterns) and the proposed method LDSMP.
A static threshold is user defined, so the user has to change it repeatedly until a satisfactory result is obtained for the given query. This is time consuming and does not generalize to different image categories. A dynamic threshold is calculated from the pixel intensity data of the image itself, so it adapts to each image. The proposed method uses static thresholds, as depicted in Fig. 4: high, medium and low, set to 15, 6 and 1 according to the differences in the given image, so that the data are quantized into 6 groups covering three positive ranges and three negative ranges. A dynamic threshold can also be obtained from an image using Eqs (17)–(19): the maximum value of the surrounding pixels is taken as the high threshold, the minimum as the low threshold, and the median of the set of 8 differences as the medium threshold.
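A sketch of the dynamic threshold rule of Eqs (17)–(19), assuming it is applied per pixel to the absolute values of the 8 directional differences: the maximum gives the high threshold, the median the medium threshold and the minimum the low threshold. Whether the paper uses absolute differences, raw neighbor intensities, or one threshold set per image cannot be confirmed from this section, so treat these choices and the helper names as hypothetical.

```python
import numpy as np

# Neighbour offsets (dy, dx), counter-clockwise from the right (assumed ordering).
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def directional_differences(img):
    """Stack of the 8 first-order differences g_p - g_c, shape (8, H-2, W-2)."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    return np.stack([img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - centre
                     for dy, dx in OFFSETS])


def dynamic_thresholds(img):
    """Per-pixel (high, medium, low) = (max, median, min) of the 8 |differences|."""
    mag = np.abs(directional_differences(img))
    return mag.max(axis=0), np.median(mag, axis=0), mag.min(axis=0)


patch = [[49, 48, 51], [50, 50, 49], [51, 52, 48]]
high, medium, low = dynamic_thresholds(patch)
print(high[0, 0], medium[0, 0], low[0, 0])
```

With 8 values, NumPy's median is the mean of the 4th and 5th sorted values, so the medium threshold can be fractional; in this patch the absolute differences are {0, 1, 1, 1, 1, 2, 2, 2}, giving thresholds 2, 1.0 and 0.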
The advantages of the proposed patterns over familiar texture patterns such as LBP, LTP and DBC can be validated as follows:
LBP, LTP and DBC encode information in binary form, i.e. ‘1’ and ‘0’, or, in the case of LTP, as ‘1’, ‘0’ or ‘−1’, using the differences between the center and neighboring pixels. LDSMP, in contrast, applies a second-order substitution that compares the differences with threshold values, dividing them into valid ranges to cover nearby features. LDSMP generalizes well: it is robust to background noise (as shown in Fig. 5) and more illumination invariant than the other patterns, and it extracts local textural features that give better results. Combining it with a CNN further improves performance.
Framework for generating the proposed feature extraction pattern, local directional stigma mean patterns (LDSMPs).
Algorithm:
1. Load the image and convert it to a grayscale pixel array.
2. Calculate the first-order derivatives in eight directions and construct the difference matrix.
3. Quantize the differences into 6 different values (
4. Divide the quantized values into 3 positive and 3 negative ranges.
5. Evaluate the stigma patterns and split them into 6 binary patterns by setting ‘1’ for one value at a time and ‘0’ for all others.
6. Give the constructed feature list as input to the CNN algorithm.
7. Classify the images based on the extracted features.
8. Match the given query image against the images in the dataset.
9. Retrieve the top-ranked images based on the best matches.
Figure 6 illustrates the algorithmic steps of the proposed method.
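Steps 8 and 9 of the algorithm (matching the query and retrieving the top-ranked images) can be sketched as a nearest-neighbor ranking over feature vectors. The similarity measure is not specified in this section, so the L1 (Manhattan) distance used here, the `top_k` parameter and the function name are assumptions; the feature vectors could be LDSMP histograms or CNN outputs.

```python
import numpy as np


def retrieve(query_feature, db_features, top_k=5):
    """Indices of the top_k database images closest to the query (L1 distance)."""
    query = np.asarray(query_feature, dtype=float)
    db = np.asarray(db_features, dtype=float)
    distances = np.abs(db - query).sum(axis=1)
    # A stable sort keeps database order among equally distant images.
    return np.argsort(distances, kind="stable")[:top_k]


db = [[0, 0], [5, 5], [1, 0], [10, 10]]
print(retrieve([0, 1], db, top_k=2))  # [0 2]
```

The L1 distance is a common, cheap choice for comparing histogram features; any other distance could be substituted in the single line that computes `distances`.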
Multilayer convolutional neural network.
A CNN is a deep learning algorithm (deep learning being a subfield of machine learning), and it has proved efficient for image classification, recognition and related tasks in various applications. A CNN is a multilayer network consisting of an input layer, hidden layers and an output layer, as shown in Fig. 7. The input layer takes its input from the feature database built by the proposed LDSMP method without supervision (content based rather than human annotated); the hidden layers process the images; and the output layer is a fully connected layer that predicts the result from the recognition probabilities of all images in the DB. The maximum probability is taken as the prediction.
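The final stage described above, a fully connected output layer whose highest-probability class is taken as the prediction, can be sketched with NumPy as a linear layer followed by a softmax and an argmax. This deliberately omits the convolutional and hidden layers; the weights, bias and three-class setup in the usage example are hypothetical (DB2, for instance, has 7 expression classes).

```python
import numpy as np


def predict(features, weights, bias):
    """Fully connected output layer followed by softmax; returns (probs, labels)."""
    logits = np.asarray(features, dtype=float) @ weights + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    # The class with maximum probability is taken as the prediction.
    return probs, probs.argmax(axis=-1)


features = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
weights = np.array([[2.0, 0.0, 0.0], [0.0, 0.0, 3.0], [0.0, 0.0, 0.0]])
bias = np.zeros(3)
probs, labels = predict(features, weights, bias)
print(labels)  # [0 2]
```

Subtracting the row maximum before exponentiating does not change the softmax output but prevents overflow for large scores.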
The performance of the proposed system is evaluated using conventional measures such as precision, recall and average retrieval rate (ARR).
Precision and recall are defined using Eqs (4) and (4) as follows:
where,
Recall is described as
Comparison of different local patterns with the proposed method in terms of average retrieval precision, average retrieval recall and average retrieval rate on DB1 (on 
Average precision and average retrieval rate can be calculated using Eqs (22) and (23) as
Here, ‘
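Since the precision and recall equations are missing above, the sketch below uses the definitions that are standard in CBIR evaluation: precision is the number of relevant images retrieved divided by the total number retrieved, recall is the number of relevant images retrieved divided by the number of relevant images in the database, and the average retrieval precision (ARP) and average retrieval rate (ARR) are their means over all queries. Treating "relevant" as "same category label as the query" is an assumption, as are the function names and the toy labels.

```python
from collections import Counter


def precision_recall(retrieved_labels, query_label, n_relevant):
    """Precision and recall of a single query."""
    hits = sum(label == query_label for label in retrieved_labels)
    return hits / len(retrieved_labels), hits / n_relevant


def average_rates(results, db_labels):
    """ARP and ARR: mean precision and mean recall over all queries.

    results is a list of (query_label, retrieved_labels) pairs.
    """
    counts = Counter(db_labels)  # relevant images per category
    scores = [precision_recall(ret, q, counts[q]) for q, ret in results]
    arp = sum(p for p, _ in scores) / len(scores)
    arr = sum(r for _, r in scores) / len(scores)
    return arp, arr


db_labels = ["a"] * 6 + ["b"] * 2
results = [("a", ["a", "a", "b", "a"]), ("b", ["b", "b", "a", "a"])]
print(average_rates(results, db_labels))  # (0.625, 0.75)
```

In the toy example the first query has precision 3/4 and recall 3/6, the second precision 2/4 and recall 2/2, which average to 0.625 and 0.75.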
Experiment 1
The COREL-10K (DB1) database consists of a large volume of images from various categories such as humans, nature, sports, animals and buildings. We collected 10000 images from 100 different categories, such as elephants, Africans, buildings, dinosaurs, beaches, horses, buses, flowers, food and mountains, for DB1. Each category has 192
Various local patterns with proposed method in terms of average retrieval precision, average retrieval recall and average retrieval rate on DB1
Comparison of proposed method (LDSMP) and CNN with existing methods in terms of average retrieval recall, average retrieval precision and average retrieval rate (on 
Confusion matrix for facial expression image retrieval out of 18 images
Confusion matrix for facial expression recognition out of 100 images
Facial expression recognition rate (FERR) out of 100 images
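Assuming the rows of the confusion matrices index the true expression and the columns the predicted one, a per-expression recognition rate can be read off as the diagonal entry divided by its row sum, and an overall rate as the trace divided by the total count. The paper's own FERR definition is not given in this section, so this is only an illustrative sketch with a hypothetical two-class matrix.

```python
import numpy as np


def recognition_rates(confusion):
    """Per-class rate (diagonal / row sum) and overall rate (trace / total)."""
    cm = np.asarray(confusion, dtype=float)
    per_class = np.diag(cm) / cm.sum(axis=1)
    overall = np.trace(cm) / cm.sum()
    return per_class, overall


per_class, overall = recognition_rates([[8, 2], [1, 9]])
print(per_class, overall)
```

For this matrix, 8 of 10 samples of the first class and 9 of 10 of the second are recognized, giving per-class rates 0.8 and 0.9 and an overall rate of 17/20 = 0.85.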
Facial expression retrieval using proposed (LDSMP) method with CNN (on 
DB2 (JAFFE) is used in experiment 2; it contains 213 images of 7 facial expressions (happiness, surprise, sadness, fear, anger, disgust and neutral) posed by 10 different Japanese female models in different poses. Each image is 256
Facial expression (disgust) images retrieved from DB2 using the proposed method.
Facial expression recognition using LDSMP with CNN.
We presented a novel method named LDSMP for image recognition, retrieval and facial expression analysis in CBIR under varying intensity differences. LDSMP encodes features based on the differences in 8 directions and then divides the range of values into 6 categories, so that the values are substituted with quantized values based on thresholds. We also proposed dynamic thresholds, computed from the image itself, to divide the range and improve accuracy across various images. In addition, a CNN (Convolutional Neural Network) is used for image classification and recognition based on the feature vectors for DB1 and DB2. We observed that the performance of the proposed system improved markedly in terms of precision, recall and ARR (Average Retrieval Rate) compared to existing methods such as LBP, LTP and DBC. The method performs similarly on larger datasets (sample expression recognition images are shown in Fig. 12) such as Cohn-Kanade (CK
