Abstract
The texture features of salient facial patches are closely related to facial expression recognition. To obtain these features, we apply Gabor wavelets to the whole face and to important regions such as the eyes, nose, and mouth, and assign different weights to them according to their recognition effectiveness. Since the LBP operator depends heavily on the center pixel and is easily affected by lighting conditions, an Around Center Instable Local Binary Pattern (ACI-LBP) operator is applied in this research. The technique considers the relationship between the center point and its adjacent points, and thus extends the representation of the features in the local region and is more robust to noise and illumination. To obtain the ACI-LBP, the LBP value is calculated first, and the Near Local Binary Pattern (N-LBP) value is then calculated from the difference between each neighborhood point and its adjacent neighborhood point in the clockwise direction. Where the LBP and N-LBP values disagree at corresponding positions, the value is decided by comparing the absolute values of the underlying differences. In addition, a multi-scale histogram statistics method is adopted in the ACI-LBP extraction. Finally, the two kinds of features, Gabor and ACI-LBP, are merged into an integrated feature vector to classify and recognize the facial expression. Experimental results on the JAFFE and CK facial databases show that the method effectively improves the accuracy of facial expression recognition.
Introduction
Psychological research shows that when people communicate with each other, language accounts for 7%, tone for 38%, and facial expression for 55% of the information conveyed. Expression is thus a fundamental mode of communicating human emotions and plays an important role in human communication. Facial expression recognition has been applied to human-computer interfaces (HCI), video surveillance, driver safety monitoring, emotional robotics, etc. With the development of artificial intelligence and pattern recognition, computer-based facial expression recognition has received increasing attention from researchers in computer vision.
A computer-based facial expression recognition process consists of digital image preprocessing, feature extraction, and pattern classification. Effective facial expression analysis largely depends on an accurate representation of the facial features in the images, so feature extraction is an essential step in the process. At present, facial expression feature extraction methods fall into two categories: those based on statistical measurements and those based on frequency domain evaluations. Common algorithms based on statistical features include Linear Discriminant Analysis (LDA) [1], Principal Component Analysis (PCA) [2], Independent Component Analysis (ICA) [3], etc. A major problem with these algorithms is that the recognition results are easily influenced by the illumination conditions of the images and the poses of the faces. Research in facial expression feature extraction has therefore focused more on frequency domain evaluations. It is also known that facial expressions involve changes in the local texture of the face. Measurements that address both the whole face and its local regions are often applied to encode the texture features of facial expressions, such as Gabor wavelets [4, 5], wavelet analysis [6], histograms of oriented gradients [7], Local Binary Pattern (LBP), sparse representation [8], etc.
Among the texture features, Gabor wavelet representations have been widely adopted in face image analysis due to their superior performance [4, 5]. The Gabor wavelet technique extracts relevant features at different scales and in different directions, and adapts well to illumination changes in the images. The response of Gabor wavelet functions is also similar to that of the human visual system. Extracting facial features by dividing the face region into several blocks achieved better accuracy, as reported in [9, 10]. However, most of these studies selected different sizes and positions of the facial patches for different databases. Our research attempts to identify salient facial regions that possess generalized features for expression recognition. For the purpose of methodological generalization, the size and location of the patches are kept the same for the two databases used in the experiments. In our method, the eye, nose, and mouth regions of the facial image are treated as the important regions, since they are most closely related to facial expression. Gabor wavelet representations are applied to both the whole face and the important regions, and different weights are assigned to these two sets of texture features.
It was reported in [11] that the LBP algorithm can effectively describe the local information of a face image with low computational complexity. LBP has recently become a research hotspot in facial expression recognition. However, the original LBP operator is too dependent on the center pixel, and the local features of the image are not completely exposed by the operation. Moreover, any intensity change of the center pixel due to lighting conditions causes a serious change of the entire LBP value. Therefore, a number of improved LBP algorithms have been proposed by researchers. Zhu et al. [12] proposed an Orthogonal Combination of Local Binary Patterns (OCLBP) approach, which drastically reduced the dimensionality of the original LBP histogram while keeping its discriminative power. Specifically, given P neighboring pixels equally located on a circle of radius R around a central pixel c, OCLBP was obtained by combining the histograms of [P/4] different 4-orthogonal-neighbor operators, each of which consisted of turning the previous four orthogonal neighbors by one position in a clockwise direction. OCLBP improves robustness to face pose, but its robustness to illumination variation and noise needs to be strengthened. Abdullah et al. proposed a Symmetric Local Graph Structure (SLGS) [13] approach, which made proper use of the details of the human face image to improve noise robustness. However, the method suffered from high time complexity due to its larger feature dimension. In [14], a Noise-resistant Local Binary Patterns (NRLBP) approach was reported, which assigned the values of uncertain bits so as to form uniform patterns. A non-uniform pattern was generated only if no uniform pattern could be formed. NRLBP improves noise robustness to some extent, but it is not suitable for recognizing images with high-contrast illumination.
Zhao et al. [15] proposed the Completed Robust Local Binary Pattern (CRLBP) approach, which performed well on facial expression recognition with noise, but it needed to calculate the average gray level of the image, and its time complexity was also high. An image indexing method, Local Mesh Patterns (LMeP), was introduced in [16]; it is computed from the relationship among the surrounding neighbors of a given center pixel in an image. The effectiveness of the algorithm was confirmed by combining it with the Gabor transform. It can effectively improve the recognition rate, but its high computational cost is again an obstacle to real-time applications. In [17], Dan et al. proposed a Joint Local Binary Patterns (JLBP) approach that made better use of the local information by computing the differences among the pixel points to improve the recognition rate. However, it ignored the overall information of the image. Heikkilä et al. [18] proposed the Center-symmetric Local Binary Pattern (CSLBP) approach, which reduced the feature dimension and significantly increased the time complexity, but without significant improvement in the recognition rate. In [19], Liao et al. proposed the Multi-scale Block Local Binary Pattern (MBLBP) approach, which made better use of the whole information of the image, but it needed to calculate the average pixel value, and the computation cost was too large. In [20], Guo et al. proposed an Extended Local Binary Pattern (ELBP) approach that enhanced facial expression recognition. However, the robustness of the algorithm to both lighting and noise still needed improvement.
In this paper, an Around Center Instable Local Binary Pattern (ACI-LBP) operator is proposed. For local feature extraction, the ACI-LBP operator considers both the difference between the center pixel and each neighborhood pixel and the difference between each neighborhood pixel and its adjacent neighborhood pixel in the clockwise direction. Where the two resulting codes are inconsistent at corresponding positions, the value is decided by the absolute values of these differences. In addition, a multi-scale histogram statistical method is applied in the extraction of the ACI-LBP features. In the final step of the process, the Gabor features of the whole face and the salient patches, together with the ACI-LBP features, are integrated as the facial expression features for classification. The new method thus takes into account both the local and the overall texture information of the image.
The advantages of the proposed method are twofold, namely the improved robustness and efficiency that result from combining two kinds of features, Gabor and ACI-LBP. 1) The Gabor features are extracted from the whole face image and three important patches, with different weights reflecting their different roles, which makes the method more robust to image noise; and 2) the ACI-LBP is an improved LBP algorithm. Its value is calculated from the difference between each pixel point and its neighborhood points in the clockwise direction, which extends the reach and representation of the features to a larger local region. The experiments conducted on two databases show that the method effectively improves the accuracy of facial expression recognition and is robust to lighting variations and noise in the face images.
Image pre-processing
In a facial expression recognition system, face images are usually preprocessed by geometric normalization, histogram equalization, and gray-level normalization to reduce image noise and to guarantee uniform size and position as well as consistent image quality.
Geometric normalization is the first step in our process; it starts by locating the centers of the eyes in the original facial expression image. Let C_l and C_r denote the central positions of the two eyes, d the distance between them, and O the midpoint of C_l and C_r. Based on these facial feature points and a geometric cropping model, we obtain a rectangular facial expression region as shown in Fig. 1. The height of the rectangle is 2.2d and the width is 1.8d, with point O fixed at (0.6d, 0.9d). The geometric normalization is achieved by a linear mapping of the original image to a cropped face image I of M × N pixels. Histogram equalization is then performed on the cropped face image I, which automatically adjusts the contrast of the facial image, and the resulting facial expression image is recorded as I′. The histogram equalization operation is performed again on I′ for gray-level normalization, which further weakens the effect of lighting and light intensity discrepancies.

Facial geometry model.
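The cropping and equalization steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the position (0.6d, 0.9d) of O is read as 0.6d below the top edge and 0.9d from the left edge, and a nearest-neighbor resize stands in for the linear mapping to M × N pixels.

```python
import numpy as np

def crop_face(image, left_eye, right_eye):
    """Crop the 2.2d x 1.8d face rectangle around the eye midpoint O.

    left_eye, right_eye: (row, col) eye centers C_l and C_r.
    Assumption: O lies 0.6d below the top edge and 0.9d from the left edge.
    """
    (r1, c1), (r2, c2) = left_eye, right_eye
    d = float(np.hypot(r2 - r1, c2 - c1))            # inter-ocular distance d
    o_row, o_col = (r1 + r2) / 2.0, (c1 + c2) / 2.0  # midpoint O
    top = max(int(round(o_row - 0.6 * d)), 0)        # clip at the image border
    left = max(int(round(o_col - 0.9 * d)), 0)
    height, width = int(round(2.2 * d)), int(round(1.8 * d))
    return image[top:top + height, left:left + width]

def resize_nearest(image, m, n):
    """Map the crop to M x N pixels (nearest neighbor, a simple stand-in)."""
    rows = np.arange(m) * image.shape[0] // m
    cols = np.arange(n) * image.shape[1] // n
    return image[rows[:, None], cols]

def equalize_histogram(gray):
    """Histogram equalization of an 8-bit grayscale image (dtype uint8)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]                        # CDF at the darkest level present
    denom = cdf[-1] - cdf_min
    if denom == 0:                                   # constant image: nothing to do
        return gray.copy()
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255.0), 0, 255)
    return lut.astype(np.uint8)[gray]
```

Applying `equalize_histogram` twice to the resized crop mirrors the two equalization passes described in the text.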
The preprocessing results of some facial expression images are shown in Fig. 2.

Results of the image preprocessing. (a) Original facial expression images. (b) Facial expression images after preprocessing.
The results show that, after preprocessing, the face images have the same size, retain the useful facial information, are free of interference, and have enhanced gray-level contrast. The facial expression images and the important regional images of the face are also more uniform and moderate in brightness.
Gabor wavelet filter
A two-dimensional Gabor filter is a band-pass filter, which is defined in Equation (1) as an amplitude function modulated by a Gaussian function [21].
And
where (x, y) is the location of a pixel in the spatial domain; ω0 is the central frequency of the filter; θ is the direction of the Gabor wavelet; and σ is the standard deviation of the Gaussian function along the two coordinate axes. The relationship between σ and ω0 is:
Among the parameters above, W_p is the time-domain window of the Gabor wavelet. When the parameter σ is determined, the time-domain window width is inversely proportional to the center frequency. The characteristics of the Gabor filter are determined by the central frequency ω0 and the direction θ. In our experimental comparison on images from two human face expression databases, the features with 5 scales

Gabor texture feature of face image. (a) Original image. (b) Gabor texture feature real part. (c) Convolution effect image.
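As an illustration of how a Gabor filter bank parameterized by ω0, θ, and σ can be built, the sketch below uses the widely used complex Gabor form with DC compensation. Several choices are assumptions rather than values taken from Equations (1) to (3): the kernel form itself, the relation σ = κ/ω0 with κ = π, the frequency spacing of the 5 scales, and the use of 8 orientations. The function names are hypothetical.

```python
import numpy as np

def gabor_kernel(omega0, theta, kappa=np.pi):
    """Complex 2D Gabor kernel; sigma = kappa / omega0 is an assumed relation."""
    sigma = kappa / omega0
    half = int(np.ceil(3 * sigma))                   # cover about +/- 3 sigma
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    # Complex carrier minus a DC term so that flat regions give a small response.
    carrier = np.exp(1j * omega0 * xr) - np.exp(-(omega0 * sigma) ** 2 / 2)
    return envelope * carrier

def gabor_features(image, n_scales=5, n_orient=8):
    """Magnitude responses for every scale and orientation (both counts assumed)."""
    img = np.asarray(image, dtype=np.float64)
    spectrum = np.fft.fft2(img)
    responses = []
    for s in range(n_scales):
        omega0 = (np.pi / 2) / (np.sqrt(2) ** s)     # assumed frequency spacing
        for k in range(n_orient):
            kern = gabor_kernel(omega0, k * np.pi / n_orient)
            # Circular convolution via FFT; the kernel is zero-padded
            # (or cropped, if larger) to the image size.
            resp = np.fft.ifft2(spectrum * np.fft.fft2(kern, s=img.shape))
            responses.append(np.abs(resp))
    return np.stack(responses)                       # shape: (scales * orientations, H, W)
```

The FFT-based convolution is circular, so responses near the image border wrap around; a padded spatial convolution would avoid this at a higher cost.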
Facial feature extraction is usually based on the whole facial expression image. When the facial expression changes, however, the changes in the eyes, nose, and mouth, which form the salient patches of facial expressions, are more noticeable. Extracting texture features from these key regions of the face images is therefore more favorable for facial expression recognition. In this research, we extract Gabor features from both the whole face and the salient patches of the facial image and assign different weights to them, so as to obtain a better recognition result. The process is as follows:
Step 1. Extract the Gabor features of the whole facial expression image; the feature matrix is denoted as Gb1.
Step 2. Extract the Gabor features of the three important regions, i.e., the eyes, nose, and mouth; the feature matrix is denoted as Gb2.
To extract the facial features from the key regions, the facial feature points must be located first. Conventional feature point location algorithms such as template matching, interactive feature location [22], etc., have the following deficiencies: 1) strict requirements on lighting conditions, and 2) quadratic time complexity of the iterative operation. Inspired by the work in [23], we use the DMF_Mean_shift algorithm to locate the facial feature points. The algorithm introduces a prediction mechanism for the change of the target position to reduce the search time as well as the amount of computation. At the same time, the algorithm converges to the exact position of the target through iterative computation of the Mean-Shift vector. The DMF_Mean_shift algorithm determines 66 facial feature points, whose distribution is shown in Fig. 4. The eye, nose, and mouth regions were selected as the main areas of the face.

Facial feature point distribution.
Step 3. Assign different weights w1 and w2 to the two sets of features above, and merge them in series into a feature matrix, as shown in Fig. 5.

Gabor feature extraction process according to the important regional distribution weights.
Step 4. Finally, obtain the Gabor features of the facial expression image. The resulting feature matrix reflects the information of both the whole image and the critical areas of the expression image, which helps to improve the expression recognition rate.
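Steps 1 to 4 amount to a weighted merge of two feature sets. A minimal sketch is given below, assuming that "merging" means flattening Gb1 and Gb2 and concatenating them after scaling by w1 and w2; the function name is hypothetical and the actual layout of the feature matrix may differ.

```python
import numpy as np

def fuse_gabor_features(gb1, gb2, w1=1.0, w2=1.0):
    """Weight and concatenate whole-face (Gb1) and salient-region (Gb2) features.

    Assumption: the merge is a serial concatenation of the flattened features.
    """
    gb1 = np.asarray(gb1, dtype=np.float64).ravel()
    gb2 = np.asarray(gb2, dtype=np.float64).ravel()
    return np.concatenate([w1 * gb1, w2 * gb2])
```

With w1 = w2 = 1 this reduces to plain concatenation, so the weights only matter when the two feature sets should contribute unequally to the classifier.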
LBP operator
The LBP operator generates a binary-coded image feature by comparing the neighboring pixel values with the center pixel value [11]. The pattern with 8 neighbors is given in Equation (4):
$$LBP_{P,R}(x_c, y_c) = \sum_{i=0}^{P-1} S(g_i - g_e)\, 2^{i} \qquad (4)$$
and
$$S(g_i - g_e) = \begin{cases} 1, & g_i \ge g_e \\ 0, & g_i < g_e \end{cases} \qquad (5)$$
where (x_c, y_c) is the coordinate of the center pixel C of the 8-neighborhood and g_e is its pixel value; i indexes the neighborhood pixels, and g_i is the value of the i-th neighborhood pixel. If the pixel value g_i is greater than or equal to the center pixel value g_e, the value of S(g_i − g_e) is 1; otherwise, it is 0. R is the radius of the neighborhood, and P is the number of pixels on the circle of radius R. The standard LBP definition uses R = 1 and P = 8. The coding process of the standard LBP is shown in Fig. 6.

Schematic diagram of the basic LBP operator.
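The basic LBP coding of a single 3 × 3 neighborhood can be sketched as follows. The neighbor ordering (clockwise, starting at the upper-left pixel) and the assignment of weight 2^i to the i-th neighbor are assumptions made for illustration; other orderings only permute the bits.

```python
import numpy as np

# Clockwise neighbor offsets (row, col), starting at the upper-left pixel (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_bits(patch):
    """Return the 8 LBP bits of a 3x3 patch: 1 where neighbor >= center."""
    patch = np.asarray(patch, dtype=np.int64)
    center = patch[1, 1]
    return [1 if patch[1 + dr, 1 + dc] >= center else 0 for dr, dc in OFFSETS]

def lbp_code(patch):
    """Weight bit i by 2**i and sum, giving a code in 0..255."""
    return sum(bit << i for i, bit in enumerate(lbp_bits(patch)))

example = [[6, 5, 2],
           [7, 6, 1],
           [9, 8, 7]]
print(lbp_bits(example), lbp_code(example))  # [1, 0, 0, 0, 1, 1, 1, 1] 241
```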
The LBP operator can efficiently extract certain features of a pixel neighborhood and has a low computational complexity. However, because the LBP value is determined only by the relation between the center pixel and its neighborhood, the overall information of the image is not fully utilized, and the operator is susceptible to noise in the image. Moreover, the LBP operator can only extract neighborhood information at one particular scale.
Based on the above analysis, the around center instable local binary pattern (ACI-LBP) operator is proposed in this research as an improved LBP operator. The ACI-LBP uses “0” and “1” to represent two stable states and X_n to represent the pending state. The value X_n is determined from the relationship between the center pixel and the neighborhood pixel, as well as the relationship between the neighborhood pixel and its adjacent neighborhood pixel in the clockwise direction. The computational process is as follows. First, calculate the eight-bit binary LBP value according to Equation (4). Then, starting from the upper-left neighborhood pixel, compare each neighborhood point with its next point in the clockwise direction. This computation cycles once around the circle of eight neighborhood pixels and results in an eight-bit binary sequence, which is recorded as the near local binary pattern (N-LBP). The calculation process is shown in Equation (6).
And
Next, compare LBP and N-LBP bit by bit. If the values at a corresponding position are equal, that position of the ACI-LBP takes this value. If the values differ, calculate the absolute difference between the neighborhood pixel at that position and the center pixel (the difference underlying the LBP bit), and the absolute difference between the same neighborhood pixel and its adjacent pixel (the difference underlying the N-LBP bit). If the former is larger, the corresponding position of the ACI-LBP takes the value of the LBP; otherwise, it takes the value of the N-LBP, as shown in Equation (8).
And
A specific example is shown in Fig. 7. When the values of LBP and N-LBP at a corresponding position are the same, the ACI-LBP takes this value at that position. When the values differ, a comparison of the absolute differences is needed; in Fig. 7, X1, X2 and X3 are such pending positions. In this case, the ACI-LBP takes the bit, from LBP or N-LBP, whose underlying absolute difference is larger.

ACI-LBP coding map.
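The ACI-LBP rule described above can be sketched for one 3 × 3 neighborhood as follows. Since the text leaves some details open, the sketch makes explicit assumptions: neighbors are visited clockwise from the upper-left pixel, the N-LBP bit is 1 when a neighbor is greater than or equal to its next clockwise neighbor, ties in the absolute differences fall to the N-LBP bit, and bit i is weighted by 2^i. The function names are hypothetical.

```python
import numpy as np

# Clockwise neighbor offsets (row, col), starting at the upper-left pixel (assumed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def aci_lbp_bits(patch):
    """ACI-LBP bits of a 3x3 patch, under the assumptions stated in the lead-in."""
    patch = np.asarray(patch, dtype=np.int64)        # avoid uint8 wrap-around
    center = patch[1, 1]
    g = [patch[1 + dr, 1 + dc] for dr, dc in OFFSETS]
    bits = []
    for i in range(8):
        nxt = g[(i + 1) % 8]                         # next neighbor, clockwise
        lbp = 1 if g[i] >= center else 0             # LBP bit
        nlbp = 1 if g[i] >= nxt else 0               # N-LBP bit (assumed direction)
        if lbp == nlbp:                              # stable state
            bits.append(lbp)
        elif abs(g[i] - center) > abs(g[i] - nxt):   # pending: center contrast wins
            bits.append(lbp)
        else:                                        # pending: neighbor contrast wins
            bits.append(nlbp)
    return bits

def aci_lbp_image(gray):
    """ACI-LBP code (0..255) for every interior pixel of a grayscale image."""
    gray = np.asarray(gray, dtype=np.int64)
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            bits = aci_lbp_bits(gray[r - 1:r + 2, c - 1:c + 2])
            out[r - 1, c - 1] = sum(b << i for i, b in enumerate(bits))
    return out
```

For the patch [[6, 5, 2], [7, 6, 1], [9, 8, 7]], positions 1, 2, 4 and 5 are pending under these assumptions, and the resulting bits are [1, 1, 0, 0, 0, 1, 1, 1].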
Note that the ACI-LBP value adopts the bits on which the LBP and N-LBP values agree, and therefore reflects the local characteristics of the image. For the bits that differ, the value is determined by comparing the absolute differences underlying LBP and N-LBP, so it represents the image information whose contrast is more evident. The ACI-LBP value reduces the dependence of the LBP value on the center pixel and strengthens the relationship between the surrounding pixels and their neighboring pixels, and thus improves the robustness to illumination and noise.
Multi-scale histogram statistical method
To make full use of the facial features extracted from the whole and local images, this research applies a multi-scale histogram statistics method to extract the ACI-LBP features. The method first subdivides the facial expression image into a series of sub-regions at different scales. The facial expression image is refined into N layers. In layer i (i = 1, …, N), the image is divided into H_i × H_i sub-regions, where H_i is the number of sub-regions along each side in layer i. The ACI-LBP feature histogram of a layer is obtained by computing the ACI-LBP histogram of each sub-region of that layer and arranging these histograms in sequence. The ACI-LBP histograms of all layers are then concatenated in series to form the N-scale ACI-LBP histogram. The principle is shown in Fig. 8. This method makes adequate use of the image information because it considers both the local and the whole information of the facial expression image.

Multi-scale histogram statistical method.
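The multi-scale histogram statistics can be sketched as below for an image of ACI-LBP codes. The function name, the 256-bin histograms, and the way sub-region boundaries are rounded are assumptions; the default grid sizes (4, 6) correspond to the H1 and H2 values reported as best in the experiments.

```python
import numpy as np

def multiscale_histogram(codes, grid_sizes=(4, 6), n_bins=256):
    """Concatenate per-sub-region code histograms over several layers.

    codes: 2D integer array of pattern codes in 0..n_bins-1.
    grid_sizes: H_i for each layer; layer i is split into H_i x H_i sub-regions.
    """
    codes = np.asarray(codes)
    h, w = codes.shape
    parts = []
    for hi in grid_sizes:
        row_edges = np.linspace(0, h, hi + 1).astype(int)
        col_edges = np.linspace(0, w, hi + 1).astype(int)
        for a in range(hi):
            for b in range(hi):
                block = codes[row_edges[a]:row_edges[a + 1],
                              col_edges[b]:col_edges[b + 1]]
                parts.append(np.bincount(block.ravel(), minlength=n_bins)[:n_bins])
    return np.concatenate(parts)   # length: n_bins * sum(H_i ** 2)
```

With grid sizes (4, 6) and 256 bins, the feature vector has 256 × (16 + 36) = 13312 entries, which illustrates why the number of layers and sub-regions has to be chosen with the feature dimension in mind.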
In this research, experiments were carried out on two widely used facial expression databases, i.e., the Japanese Female Facial Expression (JAFFE) database and the Cohn-Kanade (CK) database. The JAFFE database contains 213 frontal face images of Japanese women. The images were taken under a frontal light source, but with different light intensities. Each image is labeled with its facial expression. The database covers 10 people, each showing 7 expressions: neutral, happiness, sadness, surprise, anger, disgust, and fear. We selected 137 images from the JAFFE database for training and 76 images for testing. The CK database contains facial expressions of participants from different countries and regions. Sixty-five percent were women, fifteen percent were African American, and three percent were Asian or Latino. The facial expression images were taken under complex illumination. We selected 315 expressions from the CK database, each of which contributes 4 images from its facial expression sequence, for a total of 1260 images; 238 of the expressions are used for training and 77 for testing.
Weight parameter selection for the important image regions
The Gabor feature extraction is performed on the whole expression image and on its important regions. Since the whole image and the important regions may not contribute equally to the recognition rate, different weights are assigned to their Gabor features. The results of the experiment are shown in Table 1, where the values of w1 and w2 are recorded as (x, y) in the first row. The experimental results show that the weights affect the recognition rate, and that the best recognition result is obtained with w1 = 1 and w2 = 1, i.e., (1, 1), which is therefore taken as the optimal weight coefficient.
Expression recognition rate under different Weights
With w1 = 1 and w2 = 1 chosen as the optimal weights, the Gabor features are extracted from the whole image and the important regions, and the ACI-LBP features are then obtained using multi-scale histogram statistics. Table 2 shows the total recognition rate of the different expressions at different scales. The parameters H1 and H2 are expressed as (x, y) in the first row. The experimental results show that the best recognition result is obtained when the multi-scale values are H1 = 4 and H2 = 6.
Recognition rate under different stratification
Through the above experiments, we obtained the best weight coefficients w1, w2 and the best stratification coefficients H1, H2. With these two pairs of coefficients plugged into the whole algorithm, the results of our method were compared with the previous results of OCLBP [12], SLGS [13], NRLBP [14], CRLBP [15], LMeP [16], JLBP [17], CSLBP [18], MBLBP [19] and ELBP [20]. To ensure the stability of the experimental results, the average recognition rate of each expression was calculated by six-fold cross-validation on the CK database and the JAFFE database. The experiments use the SVM classifier, and the recognition rates are shown in Tables 3 and 4. Table 3 shows that on the CK database, the results of our method are significantly better than the average level of the other methods, and the improvement is most pronounced for the happy and fear expressions. Table 4 shows that on the JAFFE database, the recognition rates of our method for five expressions (angry, fear, happy, neutral, and surprise) are improved significantly. The recognition rates for the various facial expressions on these two databases show that our method improves facial expression recognition in general.
Comparison of different methods of expression recognition rate on CK database
Comparison of different methods of expression recognition rate on JAFFE database
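The six-fold cross-validation protocol used to average the recognition rates can be sketched as follows. The helper names and the `fit_predict` callback are hypothetical; the nearest-centroid classifier is only a lightweight stand-in so that the sketch runs without extra dependencies, and it is not the SVM used in the experiments.

```python
import numpy as np

def cross_validate(features, labels, fit_predict, k=6, seed=0):
    """Average recognition rate over k folds.

    fit_predict(train_x, train_y, test_x) must return predicted labels;
    any classifier (e.g. an SVM) can be plugged in through this callback.
    """
    features, labels = np.asarray(features), np.asarray(labels)
    order = np.random.default_rng(seed).permutation(len(labels))
    folds = np.array_split(order, k)
    rates = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(features[train_idx], labels[train_idx], features[test_idx])
        rates.append(np.mean(pred == labels[test_idx]))
    return float(np.mean(rates))

def nearest_centroid(train_x, train_y, test_x):
    """Stand-in classifier: assign each sample to the closest class mean."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]
```

Calling `cross_validate(X, y, nearest_centroid)` on a feature matrix `X` and label vector `y` returns the mean accuracy over the six folds; per-expression rates would require accumulating a confusion matrix instead of a single accuracy.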
During the classification stage, two classifiers, SVM [24] and ELM, are used to identify the facial expressions from the extracted features. The final recognition rates are shown in Figs. 9 and 10, respectively. The results show that on both the CK database and the JAFFE database, with either the ELM or the SVM classifier, the recognition rate of the ACI-LBP method (labelled the “paper method”) is considerably higher than that of the other methods, and that SVM performs better than ELM.

Comparison of recognition rate of different methods on CK database.

Comparison of recognition rate of different methods on JAFFE database.
Experimental results on the JAFFE and CK databases show that the method can use the ACI-LBP texture information of the facial expression image more efficiently to achieve the goal of improving the recognition accuracy.
This paper has proposed an integrated method that merges the ACI-LBP features with the Gabor features of the salient patches and the whole-face image, using different weights, for the recognition of different facial expressions. The salient patches, which are responsible for the face deformation caused by an expression, are extracted from the face images. The position and size of these key regions are predefined. Analysis of the results shows that the features of the salient patches play an important role in facial expression recognition. The method better reflects the contribution of the important parts of the face to facial expression recognition. In addition, the ACI-LBP operator is computed in a way that adequately reflects the correlation between the center pixel and its neighborhood points. The multi-scale histogram statistical method is used in the ACI-LBP feature extraction to further enhance the performance. The method considers both the local and the whole image information to make adequate use of the image information. The image texture description is more objective and effective through the fusion of the two kinds of features. Experiments on the JAFFE and CK databases show that the proposed algorithm can effectively improve the recognition accuracy of facial expressions.
Acknowledgments
This work was supported by the Tianjin Sci-tech Planning Projects (Grant No. 14RCGFGX00846), the Natural Science Foundation of Hebei Province, China (Grant No. F2015202239) and the Tianjin Sci-tech Planning Projects (Grant No. 15ZCZDNC00130).
