Abstract
The changing process of a facial expression contains dynamic texture information, and extracting this information accurately is crucial to expression recognition. In this paper, a weighted adaptive symmetric center binary pattern from three orthogonal planes (ASCBP-TOP) is proposed for facial expression recognition. First, facial expression image sequences are partitioned into sub-blocks on different scales to establish a multi-scale space. ASCBP-TOP is then used to extract the dynamic texture features of the image sequences in each scale space and obtain the corresponding feature histograms. Next, these feature histograms are concatenated with different weights to obtain an overall feature histogram that describes the dynamic texture features of the facial expression sequences. Finally, a support vector machine (SVM) is used to classify the facial expressions. Experiments on the Cohn-Kanade and JAFFE databases show that the proposed method outperforms state-of-the-art methods and extracts dynamic texture information more effectively. In addition, the proposed method is more robust to illumination, expression and pose variations, and it achieves a higher expression recognition rate.
Introduction
With the development of intelligent human-computer interaction, facial expression recognition has become a hot research topic in image processing and pattern recognition. A facial expression recognition system includes image preprocessing, face detection, expression feature extraction and expression recognition. Expression feature extraction is the most important part, and extracting expression features effectively is crucial to facial expression recognition [1].
Common facial expression recognition approaches are classified into global feature-based and local feature-based methods. Global methods obtain the feature space of facial expressions by mapping and depend on the correlation between image pixels; examples are principal component analysis (PCA) [2], linear discriminant analysis (LDA) [3, 4] and independent component analysis (ICA) [5]. Local methods include the local binary pattern (LBP) and the scale-invariant feature transform (SIFT). SIFT [6] is stable with respect to translation and rotation and can extract abundant feature information. However, it often produces many extreme points, and the dimensionality of the feature vectors it generates is quite high. Ojala et al. first proposed the LBP operator, which is simple and efficient to compute and offers gray-scale invariance and rotation invariance [7, 8]. The LBP operator has been widely used in many fields, such as texture classification, object detection and image analysis.
The original LBP produces high-dimensional feature vectors, which has a negative impact on recognition efficiency [9]. Because it does not consider the effect of the center pixel on the surrounding pixels, local structure information is missed under certain circumstances, which lowers the recognition rate. In addition, the binary codes computed by the LBP operator are sensitive to noise and have poor robustness [10]. To solve these problems, Tan and Triggs proposed the local ternary pattern (LTP) [11], which differs from LBP in its quantization: differences within a tolerance threshold of the center value are quantized to zero, so the binary quantization function is extended to a three-valued function. Local quinary patterns (LQP) [12] modify the quantization function of the LTP operator so that the neighborhood points around the center pixel are quantized by a five-valued function. This method fully reflects the differences between pixels, but it also incurs a large computational cost. In [13], Jia et al. proposed the idea of weighted blocks, in which texture features are extracted from each sub-image of a facial expression image and different weights are assigned to the feature histograms of the sub-patches; multi-layer sparse representation (MLSR) is then used for facial expression recognition. The local directional pattern (LDP) [14] uses local edge gradient information to describe facial expression features and is more robust to noise. The orthogonal combination of local binary patterns (OC-LBP) [15] was proposed to extract texture features. This operator is composed of two ordinary 4-neighbor LBPs: one is calculated by comparing the horizontal and vertical neighborhood points with the center pixel, and the other by comparing the diagonal neighborhood points with the center pixel. The compound local binary pattern (CLBP) [16] uses two bits for each neighborhood point to encode the feature value. The first bit is coded in the same way as in the basic LBP; the second bit is coded by comparing the magnitude of the difference between the neighborhood point and the center pixel with a threshold, which is set to the average magnitude of these differences.
In [17], a facial expression image is cut into eight different images by bit plane slicing, and these are further encoded by the local bit plane code (LBPC), which reduces the influence of illumination and noise. In [18], volume local binary patterns (VLBP) and local binary patterns from three orthogonal planes (LBP-TOP) were proposed to extract dynamic texture features and analyze image sequences or videos. VLBP extends the basic LBP from two-dimensional to three-dimensional space, so that the center pixel and the neighborhood points are compared in three-dimensional space; LBP-TOP extracts feature values on three orthogonal planes to effectively obtain the spatial-temporal feature information of facial expression image sequences. Because Gabor wavelets have good frequency and orientation selectivity [19], radial encoding was used in [20] to obtain local Gabor features in each sub-block of the facial expression image. Local Gabor binary patterns from three orthogonal planes (LGBP-TOP) [21] were proposed to extract the dynamic spatial-temporal texture features of facial expression image sequences on three orthogonal planes. In [22], a sparse coding algorithm was used to extract features, and spatial pyramid matching (SPM) was introduced to obtain the motion characteristics and dynamic texture information of facial expression image sequences.
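To make the two-bit CLBP coding of [16] concrete, the following minimal sketch encodes a single 3 × 3 patch. It assumes a square 8-neighborhood read clockwise from the top-left corner and a strict `>` comparison for the magnitude bit; the function names are illustrative, not taken from [16].

```python
# Minimal sketch of compound LBP (CLBP) coding for a single 3x3 patch.
# Assumptions: square 8-neighborhood, neighbors read clockwise from the
# top-left corner, magnitude bit set when |difference| > average magnitude.

NEIGHBOR_COORDS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def neighbors_3x3(patch):
    """Return the 8 neighbors of a 3x3 patch, clockwise from the top-left."""
    return [patch[r][c] for r, c in NEIGHBOR_COORDS]


def clbp_bits(patch):
    """Return one (sign_bit, magnitude_bit) pair per neighbor."""
    center = patch[1][1]
    diffs = [g - center for g in neighbors_3x3(patch)]
    # Threshold: average magnitude of the neighbor-center differences.
    m_avg = sum(abs(d) for d in diffs) / len(diffs)
    return [(1 if d >= 0 else 0,          # first bit: same rule as basic LBP
             1 if abs(d) > m_avg else 0)  # second bit: magnitude vs. threshold
            for d in diffs]


patch = [[10, 20, 30],
         [40, 25, 5],
         [60, 25, 0]]
print(clbp_bits(patch))
# → [(0, 0), (0, 0), (1, 0), (0, 1), (0, 1), (1, 0), (1, 1), (1, 0)]
```

Because each neighbor contributes two bits, an 8-neighbor CLBP code has 16 bits, which is why implementations commonly split it into two 8-bit sub-codes.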
The centralized binary pattern (CBP) [23] operator considers the effect of the center pixel and compares center-symmetric pairs of pixels in the circular neighborhood to compute the feature values; neighborhood points that do not fall exactly on pixel positions are estimated by bilinear interpolation. Heikkilä et al. proposed the center-symmetric local binary pattern (CS-LBP) operator, which introduces the idea of central symmetry, so the feature histogram generated by CS-LBP has a much lower dimensionality [24]. However, this operator ignores the effect of the center pixel, as well as the relative magnitudes of the center pixel and the surrounding pixels. Moreover, its threshold for extracting expression features cannot be selected automatically, so CS-LBP is less robust to noise [25].
In this paper, considering the advantages and shortcomings of the CS-LBP operator, we take the effect of the center pixel into account based on CBP and propose a novel facial expression recognition method called the weighted adaptive symmetric center binary pattern from three orthogonal planes (ASCBP-TOP). Experiments on the Cohn-Kanade (CK) [26] and JAFFE facial expression databases show that the proposed weighted multi-scale ASCBP-TOP method outperforms LBP-TOP, CSLBP-TOP, CBP-TOP and LQP-TOP and achieves a higher expression recognition rate.
The main contributions of this paper are as follows:
(1) The proposed ASCBP operator considers the effect of the center pixel, and its threshold is selected adaptively. This overcomes the tendency to ignore the contrast between the center pixel and the surrounding pixels, as well as the texture denseness caused by a fixed threshold, so ASCBP has better noise immunity and improved discrimination capability.
(2) The time dimension is introduced: ASCBP-TOP extends the ASCBP operator from two-dimensional to three-dimensional space to effectively extract the dynamic texture features of facial expression image sequences.
(3) A weighted multi-scale form is proposed in which facial expression sequence images are partitioned into multi-scale blocks. The multi-scale space highlights the detailed texture information in local areas, and different weights are assigned to the scale spaces to reflect the uniqueness of the texture features in different areas. Therefore, the dynamic texture features of facial expression image sequences are described more fully.
The rest of this paper is organized as follows. Section 2 introduces three typical expression feature extraction algorithms, and Section 3 presents the ASCBP-TOP operator for facial expression image sequences. Section 4 describes the proposed facial expression recognition method, and Section 5 presents the experimental results and a detailed analysis. Section 6 concludes the paper and discusses future work.
Expression feature extraction
Among facial expression feature extraction methods, both CBP and CS-LBP compare pairs of pixels that are center-symmetric in the neighborhood, so the dimensionality of the feature histograms generated by the two algorithms is reduced. CBP considers the effect of the center pixel on the surrounding pixels, while CS-LBP ignores the role of the center pixel. Besides, the thresholds of both algorithms are fixed and cannot be selected automatically. Therefore, the ASCBP operator is proposed to extract the texture features of facial expression images.
CBP
The original LBP encodes every pixel of an image by comparing the surrounding pixels in a neighborhood with the center pixel [27]. The dimensionality of the feature histogram generated by LBP is rather high, and the operator does not take the effect of the center pixel into account; CBP was proposed to address these problems. Figure 1 illustrates the calculation process of the CBP operator.
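As a baseline for the operators that follow, here is a minimal sketch of the basic LBP on a 3 × 3 neighborhood. It assumes radius 1 without interpolation, neighbors read clockwise from the top-left corner, and bit i set when neighbor i is not smaller than the center. With eight neighbors the code has 8 bits, so the histogram has 256 bins; for comparison, the commonly cited CBP and CS-LBP formulations need 5 bits (32 bins) and 4 bits (16 bins).

```python
# Minimal sketch of the basic 3x3 LBP operator and its 256-bin histogram.
# Assumptions: square 8-neighborhood (radius 1, no interpolation), neighbors
# read clockwise from the top-left corner, bit i set when neighbor i >= center.

NEIGHBOR_COORDS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(patch):
    """Return the 8-bit LBP code (0..255) of the center of a 3x3 patch."""
    center = patch[1][1]
    code = 0
    for i, (r, c) in enumerate(NEIGHBOR_COORDS):
        if patch[r][c] >= center:
            code |= 1 << i
    return code


def lbp_histogram(image):
    """256-bin LBP histogram over all interior pixels of a 2-D list."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


patch = [[10, 20, 30],
         [40, 25, 5],
         [60, 25, 0]]
print(lbp_code(patch))  # → 228
```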

CBP operator.
The CBP operator compares center-symmetric pairs of pixels in the circular neighborhood to compute the feature values; neighborhood points that do not fall exactly on pixel positions are calculated by bilinear interpolation. In addition, the center pixel is compared with the average value of all the neighborhood points, including the center pixel itself. The CBP operator is calculated according to Equation (1).
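The following minimal sketch follows the commonly cited CBP formulation of [23]: with P = 8 neighbors, the four center-symmetric pairs contribute bits 0 to 3, and comparing $g_c$ with the mean of the neighbors and itself contributes bit 4, the largest weight. A comparison yields 1 when the magnitude of the difference is at least a threshold t (the redefined sign function). The square 3 × 3 neighborhood, which avoids bilinear interpolation, and the example thresholds are assumptions for illustration.

```python
# Minimal sketch of the centralized binary pattern (CBP) for one 3x3 patch.
# Assumptions: square 8-neighborhood (no bilinear interpolation), neighbors
# read clockwise from the top-left corner, and a user-chosen threshold t > 0.

NEIGHBOR_COORDS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def neighbors_3x3(patch):
    """Return the 8 neighbors of a 3x3 patch, clockwise from the top-left."""
    return [patch[r][c] for r, c in NEIGHBOR_COORDS]


def cbp_code(patch, t):
    """Return the 5-bit CBP code (0..31) of the center of a 3x3 patch."""
    g = neighbors_3x3(patch)
    center = patch[1][1]
    half = len(g) // 2

    def s(x):
        # Redefined sign function: only differences with |x| >= t give a 1.
        return 1 if abs(x) >= t else 0

    code = 0
    for i in range(half):
        code |= s(g[i] - g[i + half]) << i       # center-symmetric pairs
    mean_all = (sum(g) + center) / (len(g) + 1)  # mean including the center
    code |= s(center - mean_all) << half         # center bit, largest weight
    return code


patch = [[10, 20, 30],
         [40, 25, 5],
         [60, 25, 0]]
print(cbp_code(patch, t=8), cbp_code(patch, t=1))  # → 13 31
```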
The dimensionality of CS-LBP is lower than that of CBP. CS-LBP introduces the idea of central symmetry to encode images: instead of comparing each neighborhood pixel with the center pixel, the operator compares pairs of pixels that are centrally symmetric about the center pixel. Specifically, for each center-symmetric pair, the result is encoded as 1 if the difference between the two pixels is not less than the threshold; otherwise, it is encoded as 0. A binary number is then generated by concatenating these binary values clockwise, and it is converted into a decimal value, which is the CS-LBP code [28]. The CS-LBP operator can be expressed as follows:
Figure 2 illustrates the calculation of the CS-LBP operator. With eight neighborhood points, CS-LBP compares only four center-symmetric pairs, so its feature histogram has $2^4 = 16$ bins. Therefore, this operator greatly reduces the computational complexity [29].
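The CS-LBP coding described above can be sketched as follows. The sketch assumes a square 8-neighborhood read clockwise from the top-left corner, a fixed user-chosen threshold t, and the "not less than" rule (`>=`) stated in the text. Only four comparisons are made per pixel, so codes lie in 0..15.

```python
# Minimal sketch of CS-LBP for one 3x3 patch and its 16-bin histogram.
# Assumptions: square 8-neighborhood, neighbors read clockwise from the
# top-left corner, bit i set when g_i - g_{i+4} >= t ("not less than").

NEIGHBOR_COORDS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def neighbors_3x3(patch):
    """Return the 8 neighbors of a 3x3 patch, clockwise from the top-left."""
    return [patch[r][c] for r, c in NEIGHBOR_COORDS]


def cslbp_code(patch, t):
    """Return the 4-bit CS-LBP code (0..15) of a 3x3 patch."""
    g = neighbors_3x3(patch)
    half = len(g) // 2
    code = 0
    for i in range(half):
        if g[i] - g[i + half] >= t:  # compare the center-symmetric pair
            code |= 1 << i
    return code


def cslbp_histogram(image, t):
    """16-bin CS-LBP histogram over all interior pixels of a 2-D list."""
    hist = [0] * 16
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[cslbp_code(patch, t)] += 1
    return hist


patch = [[50, 40, 30],
         [5, 25, 45],
         [0, 10, 20]]
print(cslbp_code(patch, t=3), cslbp_code(patch, t=35))  # → 15 8
```

Note that the center value (25 here) never enters the code, which is exactly the limitation of CS-LBP discussed in this section.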

CS-LBP operator.
One of the differences between CBP and CS-LBP is that CBP considers the effect of the center pixel on the surrounding pixels and assigns the largest weight to the center pixel, which benefits the description of texture information. Meanwhile, CBP redefines the sign function to reduce the influence of noise on feature extraction, which makes the operator robust to noise.
Building on the CBP operator, ASCBP considers the effect of the center pixel on the surrounding pixels and assigns the largest weight to the center pixel. The neighborhood points are divided into two parts according to the idea of Fourier parity decomposition, and the center pixel $g_c$ is compared with the average of all pixel points at the odd locations and with the average of those at the even locations, respectively. The odd operator $\mathrm{ASCBP}_o$ and the even operator $\mathrm{ASCBP}_e$ of ASCBP can be described by
The threshold T is selected adaptively according to the surrounding pixels, which overcomes the tendency to ignore the contrast between the center pixel and the surrounding pixels, as well as the texture denseness caused by a fixed threshold. T is calculated as the average magnitude of the differences between the center-symmetric pixel pairs in the neighborhood, as shown in Equation (4).
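The quantities described in the two preceding paragraphs can be computed directly. This minimal sketch evaluates the adaptive threshold T as the average magnitude of the four center-symmetric differences, i.e. $T = \frac{2}{P}\sum_{i=0}^{P/2-1} |g_i - g_{i+P/2}|$ with P = 8, together with the averages of the odd-position and even-position neighbors that $g_c$ is compared against. It does not reproduce the complete $\mathrm{ASCBP}_o$ and $\mathrm{ASCBP}_e$ codes, whose bit layout is defined by the paper's own equations. The 0-based, clockwise-from-top-left indexing that decides which neighbors count as "odd" and "even" is an assumption.

```python
# Minimal sketch of two ingredients of ASCBP for one 3x3 patch:
# the adaptive threshold T and the odd/even neighbor averages.
# Assumptions: square 8-neighborhood, neighbors g_0..g_7 read clockwise from
# the top-left corner; "odd"/"even" refer to these hypothetical 0-based indices.

NEIGHBOR_COORDS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def neighbors_3x3(patch):
    """Return the 8 neighbors of a 3x3 patch, clockwise from the top-left."""
    return [patch[r][c] for r, c in NEIGHBOR_COORDS]


def adaptive_threshold(patch):
    """T = average of |g_i - g_{i+P/2}| over the center-symmetric pairs."""
    g = neighbors_3x3(patch)
    half = len(g) // 2
    return sum(abs(g[i] - g[i + half]) for i in range(half)) / half


def parity_means(patch):
    """Return (mean of odd-position neighbors, mean of even-position neighbors)."""
    g = neighbors_3x3(patch)
    odd, even = g[1::2], g[0::2]
    return sum(odd) / len(odd), sum(even) / len(even)


patch = [[10, 20, 30],
         [40, 25, 5],
         [60, 25, 0]]
print(adaptive_threshold(patch), parity_means(patch))  # → 20.0 (22.5, 25.0)
```

Because T is recomputed for every neighborhood, flat regions get a small threshold and high-contrast regions a large one, which is the behavior the fixed thresholds of CBP and CS-LBP lack.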
The detailed calculation of the ASCBP operator is shown in Fig. 3. The two feature histograms of the facial expression image … This single histogram …
ASCBP operator.
Principle of ASCBP features.

ASCBP-TOP introduces the time dimension into ASCBP and extends the ASCBP operator from two-dimensional to three-dimensional space to describe the dynamic features of facial expressions. A facial expression image sequence has three orthogonal planes, as shown in Fig. 5: the XY plane supplies spatial information, and the XT and YT planes supply spatial-temporal information. Fig. 6 illustrates the relationship among these three orthogonal planes. To extract the features of a facial expression image sequence, we take the middle frame of the sequence as the reference. We then use each pixel of this frame as the center pixel, together with the neighboring pixels in the circular neighborhood of radius R, to calculate the ASCBP feature values in the XY, XT and YT planes.
Facial expression image sequence diagram: (a) on XY plane; (b) on YT plane; (c) on XT plane.
Three orthogonal planes XY, XT, YT.
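The three-plane extraction can be sketched as follows, in the style of LBP-TOP [18]: for every interior pixel of the middle frame, a 3 × 3 patch is taken from the XY, XT and YT planes through that pixel, each patch is coded, and the three per-plane histograms are concatenated. The sketch assumes a volume stored as `volume[t][y][x]` with at least three frames and a radius of 1 in space and time (square rather than circular neighborhoods). A basic LBP code stands in for ASCBP, and `top_histogram` is a hypothetical name.

```python
# Minimal sketch of histogram extraction on three orthogonal planes (TOP).
# Assumptions: volume[t][y][x] is a list of equally sized grayscale frames
# (at least 3), radius 1 in space and time, the middle frame supplies the
# center pixels, and a basic LBP code stands in for the ASCBP operator.

NEIGHBOR_COORDS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(patch):
    """Basic LBP code of a 3x3 patch (stand-in for ASCBP in this sketch)."""
    center = patch[1][1]
    return sum(1 << i for i, (r, c) in enumerate(NEIGHBOR_COORDS)
               if patch[r][c] >= center)


def top_histogram(volume, code_fn=lbp_code, n_bins=256):
    """Concatenate XY, XT and YT histograms computed around the middle frame."""
    t = len(volume) // 2
    frames = volume[t - 1:t + 2]                 # previous, middle, next frame
    height, width = len(volume[0]), len(volume[0][0])
    hist_xy, hist_xt, hist_yt = [0] * n_bins, [0] * n_bins, [0] * n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            xy = [row[x - 1:x + 2] for row in volume[t][y - 1:y + 2]]     # space only
            xt = [f[y][x - 1:x + 2] for f in frames]                       # rows: time, cols: x
            yt = [[f[yy][x] for yy in (y - 1, y, y + 1)] for f in frames]  # rows: time, cols: y
            hist_xy[code_fn(xy)] += 1
            hist_xt[code_fn(xt)] += 1
            hist_yt[code_fn(yt)] += 1
    return hist_xy + hist_xt + hist_yt


volume = [[[5] * 4 for _ in range(4)] for _ in range(3)]  # 3 flat 4x4 frames
hist = top_histogram(volume)
print(len(hist), hist[255], hist[256 + 255], hist[512 + 255])  # → 768 4 4 4
```

In every patch the center element `patch[1][1]` is the same voxel of the middle frame, so the XT and YT codes describe how its neighborhood changes over time.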

This single histogram …
Procedure of ASCBP-TOP histogram.
Flowchart of facial expression recognition method with weighted multi-scale ASCBP-TOP.

Different scale spaces contain different texture information of facial expression images: the richer the texture of an image region, the more information it contains, and vice versa. We therefore adopt a weighted multi-scale space and assign different weights to the scale spaces to reflect the uniqueness of the texture features in different regions. Figure 8 shows the flowchart of the facial expression recognition method with weighted multi-scale ASCBP-TOP. The steps are as follows:
(1) Facial expression image sequences are partitioned into sub-blocks on different scales to establish a multi-scale space.
(2) ASCBP-TOP is applied in each scale space to obtain the corresponding feature histograms of the facial expression image sequences.
(3) Scales with more sub-blocks yield richer texture feature information. Different weights are therefore assigned to the feature histograms of the scale spaces according to the richness of their texture information, and the weighted histograms are concatenated to obtain an overall feature histogram describing the dynamic texture features.
(4) The facial expression image sequences are divided into training and test sets, and the extracted feature histograms are used as the input of a support vector machine (SVM) classifier, which is trained and tested to classify and recognize facial expressions.
Weighted multi-scale ASCBP-TOP feature descriptor
A multi-scale block partition is applied to the expression sequence images. Suppose that the facial expression images are divided into N scales and that, in the mth scale space, the expression sequence images are divided into $2^{m+1} \times 2^{m+1}$ non-overlapping sub-blocks [30]. The $\mathrm{ASCBP}_o(m,b,j)$ and $\mathrm{ASCBP}_e(m,b,j)$ operators are then used to compute statistics over all pixel points on the three orthogonal planes in the bth sub-block, as shown in Equation (7).
The feature histograms
Then these three feature histograms
In the mth scale space, a feature histogram is extracted from each sub-block of the facial expression image sequence, and the feature histograms of all sub-blocks are concatenated to obtain the overall feature histogram of the sequence at that scale:
Meanwhile, different weights are assigned to the feature histograms of the facial expression image sequences in different scale spaces. The weight of the mth scale space is $w_m = 2^{-(N-1-m)}$. The weighting principle is that the feature histogram of a scale with large blocks is given a smaller weight, whereas the feature histogram of a scale with small blocks is given a larger weight. The weighted multi-scale ASCBP-TOP feature histogram of a facial expression image sequence is therefore obtained by concatenating the weighted feature histograms of all N scale spaces.
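The weighting and concatenation scheme can be sketched as follows, with $w_m = 2^{-(N-1-m)}$ and $2^{m+1} \times 2^{m+1}$ blocks at scale m. For brevity, the sketch applies the scheme to a single 2-D image and takes the per-block histogram function `hist_fn` as a parameter; with sequences, `hist_fn` would be the ASCBP-TOP histogram of a block's sub-volume. All function names are hypothetical, and image sides are assumed divisible by $2^N$ (any remainder pixels are dropped).

```python
# Minimal sketch of the weighted multi-scale concatenation.
# Assumptions: a single 2-D image stands in for a sequence, its height and
# width are divisible by 2^N, and hist_fn maps one block to a histogram list.


def split_blocks(image, n):
    """Split a 2-D list into n x n non-overlapping blocks, row-major order."""
    bh, bw = len(image) // n, len(image[0]) // n
    return [[row[j * bw:(j + 1) * bw] for row in image[i * bh:(i + 1) * bh]]
            for i in range(n) for j in range(n)]


def scale_weight(m, n_scales):
    """w_m = 2^-(N-1-m): coarse scales (large blocks) get smaller weights."""
    return 2.0 ** (-(n_scales - 1 - m))


def weighted_multiscale_feature(image, hist_fn, n_scales=4):
    """Concatenate the weighted block histograms of scales m = 0..N-1."""
    feature = []
    for m in range(n_scales):
        w = scale_weight(m, n_scales)
        for block in split_blocks(image, 2 ** (m + 1)):  # 2x2, 4x4, 8x8, ...
            feature.extend(w * v for v in hist_fn(block))
    return feature


print([scale_weight(m, 4) for m in range(4)])  # → [0.125, 0.25, 0.5, 1.0]
```

With N = 4 this reproduces the weights 1/8, 1/4, 1/2 and 1 for the 2 × 2, 4 × 4, 8 × 8 and 16 × 16 partitions, and the final descriptor contains 4 + 16 + 64 + 256 = 340 block histograms.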
Facial expression recognition is a nonlinear multi-class classification problem, so an SVM classifier is adopted. The extracted feature histograms of the facial expression image sequences are input to the SVM classifier for training and testing. The training set matrix consists of the feature vectors of all facial expression image sequences in the training set, and the test set matrix consists of those in the test set. The kernel function implicitly maps the input features into a high-dimensional space, where problems that are not linearly separable in the original space become linearly separable. Common kernel functions include the linear, RBF, polynomial and sigmoid kernels. The RBF kernel requires few parameters, handles linearly non-separable problems well, and keeps the computational load and complexity low, so it is selected in this paper.
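For reference, the standard RBF kernel is $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2)$, equivalently $\gamma = 1/(2\sigma^2)$ in the width parameterization; the paper's exact parameterization and $\gamma$ value are not given here. The minimal sketch below computes the kernel and a Gram matrix over feature vectors in pure Python; the `gamma=0.5` value and the toy vectors are illustrative assumptions.

```python
import math


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(rows, gamma):
    """Kernel matrix between all pairs of feature vectors in rows."""
    return [[rbf_kernel(a, b, gamma) for b in rows] for a in rows]


features = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0]]  # toy feature histograms
for row in gram_matrix(features, gamma=0.5):
    print([round(v, 4) for v in row])
```

In practice a library SVM would be used; in scikit-learn, for example, `sklearn.svm.SVC(kernel="rbf", gamma=...)` uses this kernel and handles multi-class problems with a one-vs-one scheme internally.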
Expression recognition is a multi-class problem, whereas the SVM is designed for two-class problems. Therefore, a multi-class classifier is built by combining several two-class classifiers. We adopt the one-versus-one strategy: a two-class classifier is trained for every pair of classes, so for k classes, $k(k-1)/2$ two-class SVM classifiers must be constructed [31].
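The one-versus-one strategy can be sketched as pair enumeration plus majority voting. Here `decide(a, b)` is a hypothetical stand-in for a trained two-class SVM that returns the winning label, and breaking ties in favor of the class listed first is an assumed rule. With the six CK classes this gives 6 × 5 / 2 = 15 classifiers; with the seven JAFFE classes, 21.

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(classes):
    """All class pairs; one binary SVM is trained per pair: k(k-1)/2 of them."""
    return list(combinations(classes, 2))


def ovo_predict(classes, decide):
    """Majority vote over pairwise decisions.

    decide(a, b) stands in for a trained binary SVM and must return a or b.
    Ties go to the class listed first (an assumed tie-break rule).
    """
    votes = Counter(decide(a, b) for a, b in ovo_pairs(classes))
    return max(classes, key=lambda c: votes[c])


CK_CLASSES = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise"]
JAFFE_CLASSES = CK_CLASSES + ["Neutral"]
print(len(ovo_pairs(CK_CLASSES)), len(ovo_pairs(JAFFE_CLASSES)))  # → 15 21
```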
In this paper, the CK and JAFFE databases are used. The CK database contains 210 adults aged from 18 to 50 years (69% female and 31% male; 81% Euro-American, 13% Afro-American and 6% other). Each person has several image sequences, including neutral and specific expressions. The image sequences consist of 8-bit grayscale or 24-bit color images of 640 × 480 or 640 × 490 pixels, and there are nearly 2000 grayscale images [32].
This database is based on action unit coding, and each expression is described by a dynamic image sequence. The influences of illumination and pose are also taken into account. Some image sequences contain atypical expressions. If a person has several sequences of the same expression, only one of them is chosen. In total, we select 340 expression sequences (45 Angry, 49 Disgust, 56 Fear, 66 Happy, 58 Sad and 66 Surprise). Of these, 246 randomly selected sequences are used as the training set, and the remaining 94 sequences are used as the test set. Each expression sequence contains ten images varying from neutral to extreme expression, for a total of 3400 images. The JAFFE database contains ten Japanese females, each posing seven expressions, including the neutral expression (Angry, Fear, Disgust, Happy, Surprise, Sad and Neutral), for a total of 213 images. For each expression, each person has about three images of 256 × 256 pixels. We select one or two images of each person for each expression from this database, 70 images in total, as the training set, and the remaining 143 images form the test set.
Samples in CK and JAFFE databases.
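The CK subset sizes above can be checked, and a reproducible random split drawn, with a few lines of code. The class counts come from the text; the fixed seed and the plain (unstratified) shuffle are assumptions, since the text only states that 246 sequences are selected at random.

```python
import random

# Class counts of the CK subset as reported in the text.
CK_COUNTS = {"Angry": 45, "Disgust": 49, "Fear": 56,
             "Happy": 66, "Sad": 58, "Surprise": 66}


def random_split(n_items, n_train, seed=0):
    """Shuffle item indices with a fixed (assumed) seed and cut train/test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


n_sequences = sum(CK_COUNTS.values())               # 340 sequences
train_idx, test_idx = random_split(n_sequences, 246)
print(n_sequences, len(train_idx), len(test_idx))   # → 340 246 94
print(n_sequences * 10)                             # → 3400 (ten images per sequence)
```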
In the weighted multi-scale feature extraction described above, facial expression images are divided into non-overlapping sub-blocks to build a multi-scale space. The number of blocks affects recognition performance. If the sub-blocks are too large, they cannot fully reflect the detailed texture information of local regions; if they are too small (in the extreme case, single pixels), the features of the eyes, nose and mouth are ignored, the details become too fine, and the computational complexity increases. Moreover, feature extraction is then more easily disturbed by image noise. Therefore, the images in each scale space should be partitioned into sub-blocks reasonably to obtain effective texture features and establish the best multi-scale space. The more sub-blocks a scale space has, the richer the texture information it captures.
We compare the proposed ASCBP-TOP operator with four dynamic texture feature extraction methods, namely LBP-TOP [18], CSLBP-TOP [24], CBP-TOP [23] and LQP-TOP [12], and discuss the effect of the number of blocks on the CK database. Table 1 lists the expression recognition rates of these methods with different numbers of blocks on the CK database. We also compare the proposed ASCBP operator with LBP [7], CS-LBP [24], CBP [23] and LQP [12] and discuss the effect of the number of blocks on the JAFFE database. Table 2 lists the corresponding expression recognition rates on the JAFFE database.
Tables 1 and 2 show that the recognition results with blocking are superior to those without blocking. With more sub-blocks, each block is smaller and contains more local detail, resulting in a higher recognition rate. The recognition rate is highest when the number of blocks is 16 × 16; beyond 16 × 16, the recognition rate decreases and the running time increases. The recognition rate of ASCBP-TOP is clearly higher than those of LBP-TOP, CSLBP-TOP, CBP-TOP and LQP-TOP, which indicates that the proposed ASCBP-TOP extracts richer dynamic texture information from image sequences. Compared with the LBP, CS-LBP, CBP and LQP operators, the proposed ASCBP achieves the best facial expression recognition results. ASCBP considers the effect of the center pixel on the surrounding pixels to obtain richer expression features. Meanwhile, its threshold is selected adaptively according to the surrounding pixels, i.e., the average magnitude of the differences between the center-symmetric pixel pairs in the neighborhood is used as the threshold. This overcomes the tendency to ignore the contrast between the center pixel and the surrounding pixels, as well as the texture denseness caused by a fixed threshold. As a result, the ASCBP operator has better noise immunity and improved discrimination capability. Besides, the proposed operator is more robust to illumination and pose variations.
Recognition rates (%) with different numbers of blocks on CK database
Recognition rates (%) with different numbers of blocks on JAFFE database
Dividing facial expression images into different numbers of scales yields different recognition rates. If the scale parameter is too small, the facial expression image sequences are effectively represented at a single scale, so no multi-scale space is established and the rich texture information of a multi-scale space cannot be reflected. If the scale parameter is too large, the multi-scale space contains redundant texture information and the computational complexity becomes very large. Therefore, an appropriate scale parameter must be selected to establish the best multi-scale space.
We compare the proposed ASCBP-TOP operator with four dynamic texture feature extraction methods, namely LBP-TOP [18], CSLBP-TOP [24], CBP-TOP [23] and LQP-TOP [12], and discuss the effect of the scale parameter on the CK database. Table 3 lists the expression recognition rates of these methods with different scale parameters on the CK database. We also compare the proposed ASCBP operator with four static feature extraction methods, namely LBP [7], CS-LBP [24], CBP [23] and LQP [12], and discuss the effect of the scale parameter on the JAFFE database. Table 4 lists the corresponding expression recognition rates on the JAFFE database.
Recognition rates (%) with different scale parameters on CK database
Recognition rates (%) with different scale parameters on JAFFE database
Tables 3 and 4 show that the recognition rate is highest when the scale parameter is 4, at which point the multi-scale space contains abundant texture feature information. When the scale parameter is larger than 4, the recognition rate decreases and the computational complexity increases. Therefore, the facial expression images are divided into four scales, i.e., m = 0, 1, 2, 3.
In the facial expression recognition process, different block sizes are used to build a multi-scale space so as to avoid emphasizing global information while ignoring local details. Based on the above experimental results, we select four numbers of blocks, i.e., 2 × 2, 4 × 4, 8 × 8 and 16 × 16. In other words, the facial expression sequence images are divided into $2^{m+1} \times 2^{m+1}$ non-overlapping sub-blocks on the mth scale to establish a multi-scale space (m = 0, 1, 2, 3). Different scale spaces contain different image texture information: the richer the texture of an image region, the more information it contains, and vice versa. If different spatial regions are treated indiscriminately, the uniqueness of the texture features in different regions is easily neglected, which degrades the recognition result. Therefore, we adopt a weighted multi-scale space in which different weights are assigned to the feature histograms of the scale spaces, and the weighted histograms are concatenated to obtain an overall feature histogram describing the facial expression features.
Recognition rates (%) under different weighted multi-scale situations on CK database
Table 5 lists the expression recognition rates of different methods based on image sequences under different weighted multi-scale settings on the CK database. Table 6 lists the expression recognition rates of different methods based on static images under different weighted multi-scale settings on the JAFFE database. In both tables, the four weights correspond to the scale spaces with 2 × 2, 4 × 4, 8 × 8 and 16 × 16 blocks, respectively.
Recognition rates (%) under different weighted multi-scale situations on JAFFE database
In Tables 5 and 6, all methods achieve their best recognition rates when the weights 1/8, 1/4, 1/2 and 1 are assigned to the four scale spaces with 2 × 2, 4 × 4, 8 × 8 and 16 × 16 blocks, respectively, i.e., when the weight of the mth scale space is $w_m = 2^{-(N-1-m)}$ (m = 0, 1, 2, 3). The recognition rate of weighted multi-scale ASCBP-TOP reaches 94.68% on the CK database, and that of weighted multi-scale ASCBP reaches 98.57% on the JAFFE database. The weighted multi-scale approach thus achieves better recognition results than the unweighted one. By assigning different weights, different spatial regions are treated discriminatively to reflect the uniqueness of the texture features in different regions, so the facial expression features can be described more fully. Besides, texture complexity is important for describing facial features: the higher the weight, the more prominent the detailed texture information. Therefore, assigning larger weights yields a higher recognition rate; however, if the weight of the fourth scale is increased further, the recognition rate drops.
Table 7 lists the recognition results of each expression with weighted multi-scale ASCBP-TOP on the CK database. Angry and Disgust are easily confused because the two expressions involve similar, subtle changes; Happy and Surprise are sometimes misrecognized as Fear because these three expressions are very similar, especially in the mouth region. The facial changes of Happy, Disgust and Surprise are obvious, so their recognition rates are higher than those of the other three expressions.
Recognition rates (%) with weighted multi-scale ASCBP-TOP on CK database
Recognition rates (%) with weighted multi-scale ASCBP on JAFFE database
Comparison between different methods on CK database
Comparison between different methods on JAFFE database
Table 8 shows the recognition results of each expression with weighted multi-scale ASCBP on the JAFFE database. The recognition rates of most expressions reach 100%; only Disgust and Sad are misidentified. The main reason is that Disgust and Sad have similar detailed changes around the eyes, nose and mouth. Besides, some expression images show only small facial changes, so these expressions are easily confused, which lowers their recognition rates.
We compare the proposed weighted multi-scale ASCBP-TOP method with state-of-the-art expression recognition methods on the CK database; Table 9 lists the comparative results. The first four methods are static feature extraction methods based on single images, and the last five methods, including the proposed method, are dynamic feature extraction methods based on image sequences. Among the static methods, the OC-LBP operator is composed of two ordinary 4-neighbor LBPs: one is calculated by comparing the horizontal and vertical neighborhood points with the center pixel, and the other by comparing the diagonal neighborhood points with the center pixel. CLBP uses two bits for each neighborhood point to encode the feature value. Dynamic approaches based on image sequences describe local facial variations and effectively extract the motion features and dynamic texture information of image sequences, so they outperform static approaches. The recognition rate of the proposed weighted multi-scale ASCBP-TOP reaches 94.68%, the best result on the CK database.
We compare the proposed weighted multi-scale ASCBP method with state-of-the-art expression recognition methods on the JAFFE database; Table 10 lists the comparative results. PCA and LDA are global feature-based methods, which obtain the feature space of facial expressions by mapping, whereas the other methods, including the proposed method, are local feature-based. LTP, LDP, OC-LBP and CLBP are improved versions of the LBP operator. OC-LBP is composed of two ordinary 4-neighbor LBPs: one is calculated by comparing the horizontal and vertical neighborhood points with the center pixel, and the other by comparing the diagonal neighborhood points with the center pixel. CLBP uses two bits for each neighborhood point to encode the feature value. The local bit plane code (LBPC) cuts a single facial expression image into eight different images by bit plane slicing and encodes these eight images. Local feature-based approaches offer rotation, shift and direction invariance, which benefits the description of image texture features. Table 10 shows that the local feature-based approaches outperform the global feature-based ones. The recognition rate of the proposed weighted multi-scale ASCBP reaches 98.57%, higher than those of the other methods on the JAFFE database.
Conclusion
Considering that the CS-LBP operator ignores the effect of the center pixel as well as the relative magnitudes of the center pixel and the surrounding pixels, we propose a novel facial expression recognition method based on weighted multi-scale ASCBP-TOP. Instead of extracting features from the whole facial expression image sequence, the proposed method partitions the expression sequences into sub-blocks on different scales to establish a multi-scale space. The features of each sub-block in each scale space are extracted, and different weights are assigned to the feature histograms of the scale spaces according to the richness of their texture information, to reflect the uniqueness of the texture features in different areas. The ASCBP operator considers the effect of the center pixel on the surrounding pixels and assigns the largest weight to the center pixel. Meanwhile, its threshold is selected adaptively according to the surrounding pixels, i.e., the average magnitude of the differences between the center-symmetric pixel pairs in the neighborhood is used as the threshold. ASCBP-TOP extends the ASCBP operator from two-dimensional to three-dimensional space to effectively extract dynamic feature information. The proposed method achieves higher recognition rates on the CK and JAFFE databases, is more robust to illumination, expression and pose variations, and effectively describes the dynamic texture information of facial expression sequences. However, the method mainly targets frontal faces with little occlusion. Future work will aim to improve the recognition rate for heavily occluded and side-view faces.
Acknowledgments
We thank the Psychology Department of Kyushu University for providing the JAFFE expression database.
This work was financially supported by the Tianjin Sci-tech Planning Projects of China (17ZLDZF00040, 15ZCZDNC00130 and 14RCGFGX00846) and the Natural Science Foundation of Hebei Province of China (F2015202239).
