Facial expression analysis using local directional stigma mean patterns and convolutional neural networks

Abstract

This paper presents an automatic facial expression analysis method, Local Directional Stigma Mean Patterns (LDSMP), for facial expression analysis and content-based facial expression image retrieval combined with a Convolutional Neural Network (CNN). Traditional local patterns such as Local Binary Patterns (LBP) and Local Ternary Patterns (LTP), which are applied to face recognition and expression analysis, are calculated from the relationship between the center pixel and its neighboring pixels. The proposed method calculates the difference values in eight directions and divides them into three ranges based on threshold values. The differences are then substituted with three positive values ( $+$ 3, $+$ 2, $+$ 1) and three negative values ( $-$ 3, $-$ 2, $-$ 1) to capture more sensitive information from an image than the aforementioned methods. The threshold can be either static, selected by the user, or dynamic, evaluated from the image itself, which helps to improve efficiency. The performance of the proposed method is further improved by giving these patterns as input to the CNN. It is compared with the existing methods LBP, LTP, and Directional Binary Code (DBC) in terms of Average Precision (AP), Average Recall (AR), and Average Retrieval Rate (ARR) on the standard databases COREL 10K (DB1), the Japanese Female Facial Expression database (JAFFE, DB2), and the Extended Cohn-Kanade dataset (CK $+$ , DB3).

Keywords

Local directional stigma mean patterns (LDSMP), convolutional neural networks (CNN), facial expression analysis, content-based image retrieval (CBIR), feature extraction, dynamic threshold

1. Introduction

Facial expression analysis (FEA) is a challenging task in the fields of Artificial Intelligence (AI) and computer vision. FEA is in demand in many applications, such as analysis of criminals' expressions, entertainment, surveillance in public transportation, patient mood analysis, assessment of student interest in online classes, and customer satisfaction. Statistical methods have commonly been applied in combination with classification algorithms such as ANN and SVM, but more accurate results are still needed.

The exponential growth of digital data, driven by internet usage (especially online services) and by digital equipment such as digital cameras and mobile phones, is generating databases so large that handling them with human annotations alone is very hard and inefficient. Text-based systems were used in the 1970s, with images searched on the basis of human annotations. These systems suffered from disadvantages: staff were required to produce the annotations, and wrong annotations led to inaccurate results. Managing such copious data is a tedious task for the administrator, so there is an acute demand for a proficient, automatic approach, namely Content-Based Image Retrieval (CBIR). Its most prominent step is feature extraction, whose impact depends on the technique used to extract the features from an image. There are two categories of features: (i) low-level features and (ii) high-level features. CBIR uses low-level features such as color, texture, spatial information, and shape, taken from the image itself. Generating a common representation of an image from its perceptual content is difficult because a user may take photographs under varying conditions, illumination, and orientation. Illustrative, comprehensive, and updated surveys of feature extraction, multidimensional indexing in CBIR, and future directions were given by Rui, Liu, and Kokare [1, 2, 3]. The retrieval accuracy of CBIR generally depends on efficient feature extraction followed by similarity measurement. Recent applications of Convolutional Neural Networks (CNN) to image classification have shown better results, which motivated us to feed the local directional stigma mean patterns to a CNN as the input feature vector array to recognize and analyze facial expressions.

Texture is one of the most important of the basic low-level features of an image, and texture analysis has been used extensively in many CBIR applications because of its potential. Muller et al. [4] presented an extensive review of generic content-based image retrieval and of the technologies used for medical diagnostic images, especially heart imaging applications, together with future directions. Moghaddam et al. [7] proposed an algorithm called the wavelet correlogram for image retrieval, based on the color correlogram and multiresolution analysis using Daubechies wavelets followed by quantization. Moghaddam and Saadatmand developed an extended wavelet correlogram, the Gabor wavelet correlogram, which extracts rotation-invariant features using Gabor wavelets with an optimized weighted distance to enhance accuracy [8]. Zhang et al. [9] proposed a hybrid method that gathers global features with the training-free LBP variance (LBPV) and uses a dissimilarity metric for dimensionality reduction, combined with nearest-neighbor classification and the chi-square distance, to build the model. Kokare et al. [10] introduced DT-RCWF and DT-RCT to retrieve texture features in 12 directions by decomposing the image. Zhen et al. [14] proposed a hybrid method that explores space, scale, and orientation using Gabor filters and LBP for face recognition. Second-order derivatives of local ternary patterns have been combined with KPCA (kernel principal component analysis) to reveal the traces of median filtering [17].

Local binary patterns and their extensions (uLBP, CLBP, etc.), gradient-based patterns, and histogram-based patterns have become popular because of their simplicity and efficiency. Nevertheless, LBP methods have a drawback when encoding large and small intensity differences: as shown in Fig. 1, both can produce the same code, so positive and negative features at the prominent positions of an image, particularly in facial expressions, cannot be retrieved separately. The proposed method addresses these variations.

Figure 1.

Example of the same LBP pattern (00100111) being generated for two textures with large and small intensity differences.

2. Literature survey

The local binary pattern operator, introduced by Ojala et al. [5], became a prominent method for texture feature extraction in content-based image retrieval. It was further developed into a combined grayscale- and rotation-invariant operator that detects uniform patterns at multiple resolutions and in multiple directions to retrieve spatial information [18]. Liao et al. [19] proposed dominant local binary patterns (DLBP) for local texture classification and additionally used circularly symmetric Gabor filters (CSGF) to retrieve global information and improve accuracy. Tan and Triggs [20] introduced a generalized feature descriptor, local ternary patterns, for face recognition, which is less sensitive to noise and more discriminative in uniform regions than LBP. Murala et al. [21] proposed local tetra patterns (LTrPs), which use second-order derivatives in the vertical and horizontal directions combined with an additional magnitude pattern, and compared the results with the Gabor transform. They also proposed directional local extrema patterns to retrieve image edge information in four directions, 0 ${}^{\circ}$ , 45 ${}^{\circ}$ , 90 ${}^{\circ}$ and 135 ${}^{\circ}$ [22], and local maximum edge binary patterns (LMEBPs) for local region-based retrieval and object tracking [23]. Vipparthi et al. [24] proposed color directional local quinary patterns (CDLQP) using DBC and quantization values for the individual colors of the RGB model. Furthermore, local Gabor maximum edge position octal patterns were introduced by combining sign and magnitude maximum edge octal pattern features [25]. Chen et al. [26] devised dynamic cluster-based image retrieval using unsupervised learning based on image features. Manjunath et al. [27] proposed a clustered color space representation to retrieve a color space descriptor for large image databases. Fernando et al. [28] built a system that combines a K-NN classifier, color space representations (grayscale, RGB, CMY, etc.), and combined color and texture features (mean, standard deviation, etc.) for automatic classification of smooth surface images such as ceramic tiles and textile surfaces. Mitra et al. [29] proposed a feature similarity measure, the maximal information compression index, based on redundancy reduction for multiscale dataset representation to improve speed. Zia Uddin et al. [34] combined LDRHP (Local Directional Rank Histogram Patterns) and LDSP (Local Directional Strength Patterns) to extract features from facial expression images and used a CNN with three layers for expression classification.

Basic patterns such as LBP and LTP extract local information that depends on the edge distribution, encoded in either the positive or the negative direction. These methods can therefore be advanced by considering more than two directions. Our work evaluates the directional information for each pixel as first-order derivatives; second-order values are then obtained by quantizing these derivatives against the proposed threshold values. The resulting patterns are referred to as local directional stigma mean patterns (LDSMP) and are used for texture feature extraction and classification.

The paper is organized as follows. Sections 1 and 2 give a brief introduction and a concise literature survey. Section 3 presents the proposed feature extraction method and its framework, Section 4 describes the convolutional neural network and the evaluation measures, Section 5 presents the results and analysis, and Section 6 concludes with the proposed work and probable future work, followed by the references.

2.1 Local binary patterns

Ojala et al. [5] first introduced the LBP operator for texture classification and feature extraction. The LBP operator has since been used and proven in different application areas, especially face recognition [18], expression analysis, biometric finger recognition, and object tracking [23].

Given a center pixel ‘ $g_{c}$ ’ in a 3 $\times$ 3 pixel matrix, the LBP value is calculated from the differences between the center pixel $g_{c}$ and the neighboring pixels $g_{p}$ , where $p=1,2,3,\ldots,8$ , using Eqs (1) and (2).

$\displaystyle\textit{LBP}_{P,\mathcal{R}}=\sum^{P}_{p=1}2^{p-1}\times f(I(g_{p})-I(g_{c})),\quad p=1,2,\ldots,8$ (1)

$\displaystyle f(x)=\begin{cases}1&x\geqslant 0\ (\text{positive})\\ 0&\text{otherwise}\end{cases}$ (2)

As shown in Fig. 2, ‘ $I(g_{c})$ ’ is the gray value of the center pixel and ‘ $I(g_{p})$ ’ is the gray value of a neighboring pixel. ‘ $P$ ’ is the number of surrounding pixels and ‘ $\mathcal{R}$ ’ is the radius from the center pixel to the neighboring pixels.

Figure 2.

(a) 3 $\times$ 3 pixel neighborhood with the center pixel and its 8 neighboring pixels, (b) differences between the center pixel and the surrounding pixels, (c) differences substituted with binary (0, 1) values.

Figure 2 shows an example of the LBP calculation. Histograms of the extracted LBP patterns are then evaluated to describe the image edge distribution.
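The LBP computation of Eqs (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the neighbor ordering (clockwise from the top-left pixel) and the gray values of both patches are assumptions made only for this example. The second patch has much larger intensity differences than the first, yet it yields the same code, which is the limitation described in the Introduction.

```python
import numpy as np

# Assumed neighbor order: clockwise, starting at the top-left pixel.
# The text does not fix an ordering, so this choice is illustrative only.
NEIGHBOR_OFFSETS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def lbp_code(patch):
    """Return the LBP code of the center pixel of a 3x3 patch (Eqs 1 and 2)."""
    patch = np.asarray(patch, dtype=np.int64)
    center = patch[1, 1]
    code = 0
    for p, (r, c) in enumerate(NEIGHBOR_OFFSETS):
        bit = 1 if patch[r, c] - center >= 0 else 0  # f(x) of Eq. (2)
        code += bit << p                             # weights 1, 2, 4, ..., 128
    return code

# Hypothetical gray values with small intensity differences.
small_diff_patch = [[6, 5, 2],
                    [7, 6, 1],
                    [9, 8, 7]]

# Hypothetical gray values with large differences but the same sign pattern.
large_diff_patch = [[60, 5, 2],
                    [100, 60, 1],
                    [250, 210, 200]]

print(lbp_code(small_diff_patch), lbp_code(large_diff_patch))
```

Because only the sign of each difference is kept, both patches map to one code even though their contrast differs strongly.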

2.2 Local ternary patterns (LTPs)

LTP is an extension of LBP introduced by Tan and Triggs [20]. Gray values are quantized based on a zone of width ‘ $\tau$ ’ around ‘ $g_{c}$ ’: values within this zone are quantized to zero, values at or above the upper bound ( $g_{c}+\tau$ ) are quantized to ‘ $+$ 1’, and values at or below the lower bound ( $g_{c}-\tau$ ) are quantized to ‘ $-$ 1’. That is, $f(x)$ in Eq. (2) is replaced with the three-valued function in Eq. (3).

$\displaystyle\overline{f}(x,g_{c},\tau)=\left.\begin{cases}+1&x\geqslant g_{c}+\tau\\ 0&|x-g_{c}|<\tau\\ -1&x\leqslant g_{c}-\tau\end{cases}\right|_{x=g_{p}}$ (3)

A histogram can be built for an image using Eqs (4) and (5) by computing the LTP of each pixel $(m,n)$ , as depicted in Fig. 3.

$\displaystyle H_{\text{LTP}}(\ell)=\sum^{\mathbb{N}_{1}}_{m=1}\sum^{\mathbb{N}_{2}}_{n=1}f(\textit{LTP}(m,n),\ell),\quad\ell\in[0,2^{p}-1]$ (4)

$\displaystyle f(i,j)=\left.\begin{cases}1&i=j\\ 0&\text{otherwise}\end{cases}\right|_{\mathbb{N}_{1}\times\mathbb{N}_{2}\text{ is image size}}$ (5)

Figure 3.

Computation of the LTP operator. The ternary pattern is split into an upper and a lower pattern: ‘ $+$ 1’ (positive) is replaced by ‘1’ to obtain the upper binary pattern, and ‘ $-$ 1’ (negative) is replaced by ‘1’ to obtain the lower binary pattern; in both cases all other entries become ‘0’.
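The ternary quantization and the split into upper and lower binary patterns can be sketched as follows. This is a hedged illustration under stated assumptions: the clockwise neighbor ordering, the patch values, and the threshold of 5 are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def ltp_codes(patch, tau):
    """Return (ternary pattern, upper code, lower code) for a 3x3 patch."""
    patch = np.asarray(patch, dtype=np.int64)
    center = patch[1, 1]
    # Eight neighbors, clockwise from the top-left pixel (assumed ordering).
    neighbors = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    diff = neighbors - center
    # Three-valued function: +1 above the zone, -1 below it, 0 inside it.
    ternary = np.where(diff >= tau, 1, np.where(diff <= -tau, -1, 0))
    weights = 1 << np.arange(8)
    upper = int(np.sum((ternary == 1) * weights))   # '+1' entries become 1
    lower = int(np.sum((ternary == -1) * weights))  # '-1' entries become 1
    return ternary.tolist(), upper, lower

# Hypothetical gray values and threshold.
patch = [[54, 57, 63],
         [50, 54, 61],
         [47, 49, 58]]
ternary, upper, lower = ltp_codes(patch, tau=5)
print(ternary, upper, lower)
```

Each of the two binary codes can then be histogrammed separately, in the manner of Eqs (4) and (5).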

2.3 Local adaptive image descriptor (LAID)

The Local Adaptive Image Descriptor was proposed by Zahid et al. [11]. It is a variant of the LTP operator that uses a dynamic threshold calculated from the given image itself, as defined in Eqs (6) and (7).

$\displaystyle\text{LAID}^{N}_{R}=\sum^{N}_{p=1}u(p_{l}-p_{c})\times 3^{p}$ (6)

Here,

$\displaystyle u(x)=\begin{cases}+1&x\geqslant\omega\\ -1&x\leqslant-\omega\\ 0&\text{else}\end{cases}$ (7)

$p_{l}$ and $p_{c}$ are the pixel intensities of the neighbor and center pixels, and ‘ $\omega$ ’ is the dynamic threshold, i.e. $\omega=\text{median}\{|p_{l}-p_{c}|\}$ .
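A small numerical sketch may help to see how the median-based threshold of LAID behaves. The code below follows Eqs (6) and (7) literally for a single 3 $\times$ 3 neighborhood; the neighbor ordering and the gray values are assumptions introduced for illustration, not taken from the cited work.

```python
import numpy as np

def laid_code(patch):
    """Return (omega, LAID code) for the center of a 3x3 patch (Eqs 6 and 7)."""
    patch = np.asarray(patch, dtype=np.int64)
    center = patch[1, 1]
    # Eight neighbors, clockwise from the top-left pixel (assumed ordering).
    neighbors = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    diff = neighbors - center
    omega = float(np.median(np.abs(diff)))          # dynamic threshold
    u = np.where(diff >= omega, 1, np.where(diff <= -omega, -1, 0))
    powers = 3 ** np.arange(1, 9)                   # 3^p for p = 1..8
    return omega, int(np.sum(u * powers))

# Hypothetical gray values.
patch = [[54, 57, 63],
         [50, 54, 61],
         [47, 49, 58]]
omega, code = laid_code(patch)
print(omega, code)
```

Here the absolute differences are 0, 3, 9, 7, 4, 5, 7, 4, so the median is 4.5 and the threshold adapts to the local contrast without any user input.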

3. Feature extraction using proposed method local directional stigma mean patterns (LDSMPs)

The basic idea of these patterns is inspired by local textural patterns such as LBP, LTP, LDP, and DBC [5, 19, 17, 23]. They depict the spatial and temporal structure of the local texture based on the directions around the center gray pixel value ‘ ${g}_{c}$ ’. Given an image ‘ $I$ ’, the first-order derivatives are calculated along the $0^{\circ},{180}^{\circ},\pm{45}^{\circ},\pm 90^{\circ}$ and $\pm 135^{\circ}$ directions and are denoted ${I}^{\prime}_{\alpha}({g}_{c})$ with $\alpha=0^{\circ},{180}^{\circ},\pm{45}^{\circ},\pm 90^{\circ},\pm 135^{\circ}$ . Let ${g}_{c}$ denote the center pixel in $I$ and let ${g}_{\text{Dir}}$ denote its neighbor in direction Dir; the first-order derivatives at the center pixel ‘ ${g}_{c}$ ’ are then:

$\displaystyle I^{\prime}_{0^{\circ}}(g_{c})=I(g_{0^{\circ}})-I(g_{c})$

$\displaystyle I^{\prime}_{+45^{\circ}}(g_{c})=I(g_{+45^{\circ}})-I(g_{c})$

$\displaystyle I^{\prime}_{+90^{\circ}}(g_{c})=I(g_{+90^{\circ}})-I(g_{c})$

$\displaystyle I^{\prime}_{+135^{\circ}}(g_{c})=I(g_{+135^{\circ}})-I(g_{c})$

$\displaystyle I^{\prime}_{180^{\circ}}(g_{c})=I(g_{180^{\circ}})-I(g_{c})$

$\displaystyle I^{\prime}_{-45^{\circ}}(g_{c})=I(g_{-45^{\circ}})-I(g_{c})$

$\displaystyle I^{\prime}_{-90^{\circ}}(g_{c})=I(g_{-90^{\circ}})-I(g_{c})$

$\displaystyle I^{\prime}_{-135^{\circ}}(g_{c})=I(g_{-135^{\circ}})-I(g_{c})$

Figure 4.

Numerical illustration of the proposed method with static threshold values: six binary patterns are generated by substituting ‘1’ for one quantized value at a time and ‘0’ for all other values, including 0’s.

The equations above can be written compactly as Eq. (8), which gives the differences in all eight directions for an image pixel. The differences are then quantized against the given threshold values using Eq. (9).

$\displaystyle I^{\prime}_{\text{Dir}}(g_{c})=I(g_{\text{Dir}})-I(g_{c}),\quad\forall\,\text{Dir}=0^{\circ},{180}^{\circ},\pm{45}^{\circ},\pm 90^{\circ}\text{ and }\pm 135^{\circ}$ (8)

The second-order pattern ‘ $\text{LDSP}^{2}(g_{c})$ ’ is obtained by quantizing each first-order difference, written $p_{c}$ in Eq. (9), as follows:

$\displaystyle f(p_{c},\tau_{3},\tau_{2},\tau_{1})=\begin{cases}+3,&p_{c}\geqslant\tau_{3}\\ +2,&\tau_{3}>p_{c}\geqslant\tau_{2}\\ +1,&\tau_{2}>p_{c}\geqslant\tau_{1}\\ 0,&-\tau_{1}<p_{c}<\tau_{1}\\ -1,&-\tau_{2}<p_{c}\leqslant-\tau_{1}\\ -2,&-\tau_{3}<p_{c}\leqslant-\tau_{2}\\ -3,&p_{c}\leqslant-\tau_{3}\end{cases}$ (9)

Each quantized value is then substituted with binary ‘1’ in turn, with all other values set to ‘0’, as shown in Fig. 4. Repeating this for the remaining five values yields six patterns, each encoded using Eq. (10).

$\displaystyle P_{\text{Dir}}=\sum^{P}_{p=1}2^{p-1}\times f(\text{LDSP}^{2}(g_{c}))|_{\text{Dir}=0^{\circ},{180}^{\circ},\pm{45}^{\circ},\pm 90^{\circ},\pm 135^{\circ}}$ (10)

The upper (positive) and lower (negative) patterns are generated using Eqs (11)–(16).

$\displaystyle\text{LDSP\_UP}_{3}(I)=\sum^{n}_{p=0}{s(p_{l}-p_{c})}\times 2^{p},\quad\text{where } s(x)=\begin{cases}1,&x\geqslant\tau_{3}\\ 0,&\text{else}\end{cases}$ (11)

$\displaystyle\text{LDSP\_UP}_{2}(I)=\sum^{n}_{p=0}{s(p_{l}-p_{c})}\times 2^{p},\quad\text{where } s(x)=\begin{cases}1,&\tau_{3}>x\geqslant\tau_{2}\\ 0,&\text{else}\end{cases}$ (12)

$\displaystyle\text{LDSP\_UP}_{1}(I)=\sum^{n}_{p=0}{s(p_{l}-p_{c})}\times 2^{p},\quad\text{where } s(x)=\begin{cases}1,&\tau_{2}>x\geqslant\tau_{1}\\ 0,&\text{else}\end{cases}$ (13)

$\displaystyle\text{LDSP\_UP}_{-1}(I)=\sum^{n}_{p=0}{s(p_{l}-p_{c})}\times 2^{p},\quad\text{where } s(x)=\begin{cases}1,&-\tau_{2}<x\leqslant-\tau_{1}\\ 0,&\text{else}\end{cases}$ (14)

$\displaystyle\text{LDSP\_UP}_{-2}(I)=\sum^{n}_{p=0}{s(p_{l}-p_{c})}\times 2^{p},\quad\text{where } s(x)=\begin{cases}1,&-\tau_{3}<x\leqslant-\tau_{2}\\ 0,&\text{else}\end{cases}$ (15)

$\displaystyle\text{LDSP\_UP}_{-3}(I)=\sum^{n}_{p=0}{s(p_{l}-p_{c})}\times 2^{p},\quad\text{where } s(x)=\begin{cases}1,&x\leqslant-\tau_{3}\\ 0,&\text{else}\end{cases}$ (16)
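The quantization of Eq. (9) and the split into six binary patterns of Eqs (11)–(16) can be sketched for a single pixel as follows. This is an illustrative sketch rather than the authors' code. The mapping of directions to neighbor positions, the bit weights, and the patch values are assumptions; the thresholds 15, 6, and 1 are the static values the paper uses for Fig. 4.

```python
import numpy as np

# Assumed (row, col) positions in a 3x3 patch for the directions
# 0, +45, +90, +135, 180, -135, -90, -45 degrees, in that order.
DIRECTION_POSITIONS = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
LEVELS = (3, 2, 1, -1, -2, -3)

def quantize(d, t1, t2, t3):
    """Map one directional difference to a level in {-3, ..., +3} (Eq. 9)."""
    mag = abs(d)
    level = 3 if mag >= t3 else 2 if mag >= t2 else 1 if mag >= t1 else 0
    return level if d >= 0 else -level

def ldsmp_pixel(patch, t1=1, t2=6, t3=15):
    """Return (levels, six binary codes) for the center of a 3x3 patch."""
    patch = np.asarray(patch, dtype=np.int64)
    center = int(patch[1, 1])
    levels = [quantize(int(patch[r, c]) - center, t1, t2, t3)
              for r, c in DIRECTION_POSITIONS]
    # One binary pattern per level: '1' where the level matches, else '0'.
    codes = {v: sum(1 << p for p, lv in enumerate(levels) if lv == v)
             for v in LEVELS}
    return levels, codes

# Hypothetical gray values around a center pixel of 50.
patch = [[50, 52, 58],
         [48, 50, 70],
         [40, 30, 56]]
levels, codes = ldsmp_pixel(patch)
print(levels, codes)
```

With these values the differences 20, 8, 2, 0, -2, -10, -20, 6 fall into all seven ranges of Eq. (9), so each of the six binary patterns receives at least one bit.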

3.1 Ascertaining threshold

Determining the threshold is a challenging task in image preprocessing. A suitable threshold partitions the data into specific ranges so that the required features can be retrieved, whereas methods such as LBP fail to cover the desired intensity levels, especially in the cases assigned ‘0’. The threshold can be determined in two ways:

(i) Static threshold and
(ii) Dynamic threshold.

Figure 5.

Query image (top left) and feature images of LBP, uniform LBP, circular LBP, LTP, DBC, Gabor filtered transform (GT), LGMMEPOP (local Gabor magnitude maximum edge positioned octal patterns), LGSMEPOP (local Gabor signed maximum edge positioned octal patterns), and the proposed method LDSMP.

A static threshold is user defined, so the user has to change it repeatedly until a satisfactory result is obtained for the given query. This is time consuming and does not generalize across image categories. A dynamic threshold is calculated from the pixel intensity data of the image itself and is therefore appropriate to that image. For the static case, the proposed method uses high, medium, and low thresholds of 15, 6, and 1, as depicted in Fig. 4, chosen according to the differences in the given image, so that the data are quantized into 6 groups covering three positive and three negative ranges. The dynamic thresholds are obtained from the image using Eqs (17)–(19): the maximum of the eight absolute differences around a pixel is taken as the high threshold, the minimum as the low threshold, and the median as the medium threshold.

$\displaystyle\tau_{3}=\max(\{f(x):x=1,2,3,\ldots,8\})$ (17)

$\displaystyle\tau_{2}=\text{median}(\{f(x):x=1,2,3,\ldots,8\})$ (18)

$\displaystyle\tau_{1}=\min(\{f(x):x=1,2,3,\ldots,8\})$ (19)

Here, $f(x)=|I^{\prime}_{\text{Dir}}(g_{c})|$ is the absolute difference in the $x$ -th direction from Eq. (8).
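The dynamic thresholds of Eqs (17)–(19) reduce to three order statistics of the eight absolute differences. A minimal sketch, with hypothetical difference values, is given below.

```python
import numpy as np

def dynamic_thresholds(differences):
    """Return (tau1, tau2, tau3) from the eight directional differences."""
    magnitudes = np.abs(np.asarray(differences, dtype=np.float64))
    tau3 = float(magnitudes.max())       # Eq. (17): high threshold
    tau2 = float(np.median(magnitudes))  # Eq. (18): medium threshold
    tau1 = float(magnitudes.min())       # Eq. (19): low threshold
    return tau1, tau2, tau3

# Hypothetical directional differences around one pixel.
diffs = [20, 8, 2, 0, -2, -10, -20, 6]
tau1, tau2, tau3 = dynamic_thresholds(diffs)
print(tau1, tau2, tau3)
```

One consequence worth noting, under this literal reading of the equations, is that the low threshold becomes 0 whenever any neighbor equals the center pixel, in which case the zero band of Eq. (9) is empty.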
3.2 Advantages of the proposed method (LDSMP) over other patterns

The advantages of the proposed patterns compared with the familiar texture patterns LBP, LTP, and DBC are as follows:

1) LBP and DBC encode the extracted information in binary form, i.e. ‘1’ and ‘0’, and LTP encodes it as ‘1’, ‘0’ or ‘ $-$ 1’. LDSMP, in contrast, extracts the differences in 8 directions and substitutes them with 3 positive and 3 negative values, so that it covers the contrast features of an image.
2) LBP and LTP are encoded directly from the differences between the center and neighboring pixels, whereas LDSMP applies a second-order substitution that compares the differences with threshold values and divides them into valid ranges to capture nearby features.
3) LDSMP generalizes well: it is more immune to background noise (as shown in Fig. 5) and more illumination invariant than the other patterns, and it extracts local textural features for better results.
4) The CNN further improves the efficiency.

3.3 Framework for proposed method

Figure 6.

Framework for generating proposed feature extraction pattern local directional stigma mean patterns (LDSMPs).

Algorithm:

Load the image and convert it into a grayscale pixel matrix.

Calculate the first-order derivatives in eight directions and construct the difference matrix.

Quantize the differences into 6 different values ( $+$ 3, $+$ 2, $+$ 1, $-$ 1, $-$ 2, $-$ 3) by comparing them with three threshold values (minimum ( $\tau_{1}$ ), medium ( $\tau_{2}$ ), maximum ( $\tau_{3}$ )), as shown in Fig. 4.

Divide the patterns into 3 positive and 3 negative patterns.

Evaluate the stigma patterns and split them into 6 binary patterns by substituting ‘1’ for one value at a time and leaving the others as ‘0’.

Give the constructed feature list as input to the CNN algorithm.

Classify the images based on the extracted features.

Match the given query image against the images in the dataset.

Retrieve the top-ranked images based on the best matches.

Figure 6 illustrates the algorithmic steps of the proposed method.
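The feature extraction steps of the algorithm (difference matrix, quantization, and split into six binary patterns) can be sketched for a whole grayscale image with NumPy. This is a sketch under stated assumptions, not the authors' implementation: the direction-to-offset mapping and the bit weights are assumed, border pixels are simply skipped, and the per-pattern 256-bin histogram is only one plausible way to build the feature list that is passed to the CNN. The CNN, matching, and retrieval steps are not covered here.

```python
import numpy as np

# Assumed (row, col) offsets for 0, +45, +90, +135, 180, -135, -90, -45 degrees.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
LEVELS = (3, 2, 1, -1, -2, -3)

def ldsmp_maps(gray, t1=1, t2=6, t3=15):
    """Return six pattern maps of shape (6, H-2, W-2) for a grayscale image."""
    gray = np.asarray(gray, dtype=np.int64)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    maps = np.zeros((len(LEVELS), h - 2, w - 2), dtype=np.int64)
    for p, (dr, dc) in enumerate(OFFSETS):
        neighbor = gray[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        diff = neighbor - center                      # first-order difference
        mag = np.abs(diff)
        level = (mag >= t1).astype(np.int64) + (mag >= t2) + (mag >= t3)
        level = np.where(diff < 0, -level, level)     # quantized value
        for k, v in enumerate(LEVELS):
            maps[k] += (level == v).astype(np.int64) << p
    return maps

def ldsmp_histograms(maps):
    """Return one 256-bin histogram per pattern map (assumed feature form)."""
    return np.stack([np.bincount(m.ravel(), minlength=256) for m in maps])

# Hypothetical 3x3 image, so there is exactly one interior pixel.
image = [[50, 52, 58],
         [48, 50, 70],
         [40, 30, 56]]
maps = ldsmp_maps(image)
features = ldsmp_histograms(maps)
print(maps[:, 0, 0], features.shape)
```

The static thresholds default to the values 1, 6, and 15 mentioned for Fig. 4; dynamic thresholds would have to be computed per pixel and passed in as arrays of matching shape.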

Figure 7.

Convolutional neural network with multiple layers.

4. Convolutional neural networks (CNN)

A CNN is a deep learning algorithm, a subfield of machine learning, that has proven efficient for image classification and recognition in various applications. It is a multilayer technique consisting of input layers, hidden layers, and output layers, as shown in Fig. 7. The input layer takes its input from the feature database produced by the proposed LDSMP method in an unsupervised manner (content based rather than human annotated), the hidden layers process the images, and the output layer is a fully connected layer that predicts the required result from the recognition probabilities over all images in the DB. The class with the maximum probability is taken as the prediction.
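The final decision step, picking the class with the maximum probability, can be illustrated as follows. The sketch assumes that the fully connected output layer produces one score per expression and that a softmax turns the scores into probabilities; the paper does not state the activation, so the softmax and the score values are assumptions for illustration only.

```python
import numpy as np

# The seven expression classes used in the experiments.
EXPRESSIONS = ["neutral", "happy", "surprise", "sad", "anger", "fear", "disgust"]

def predict_expression(scores):
    """Return (predicted label, class probabilities) from output-layer scores."""
    scores = np.asarray(scores, dtype=np.float64)
    shifted = scores - scores.max()          # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return EXPRESSIONS[int(np.argmax(probs))], probs

# Hypothetical output-layer scores for one face image.
label, probs = predict_expression([0.1, 2.5, 0.3, -1.0, 0.0, 0.2, 0.4])
print(label, probs.round(3))
```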

The performance of the proposed system is measured using conventional metrics such as precision and recall, which are defined using Eqs (20) and (21) as follows:

$\displaystyle\text{Precision}(I_{Q},N)=\frac{1}{N}\sum_{i=1}^{|\textit{DB}|}\Delta(\lambda(I_{i}),\lambda(I_{Q}))|R(I_{i},I_{Q})\leqslant N$ (20)

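where $I_{Q}$ is the query image and $I_{i}$ is the $i^{\text{th}}$ image in the database, $|\textit{DB}|$ is the total number of images in the database, ‘ $N$ ’ is the number of top-ranked images retrieved, ‘ $N_{g}$ ’, used for recall in Eq. (21), is the number of relevant images in the database (those in the same category as the query), ‘ $\lambda(I)$ ’ is the category of an image, ‘ $R(I_{i},I_{Q})$ ’ is the rank of image $I_{i}$ among the $|\textit{DB}|$ images, and ‘ $\Delta(\lambda(I_{i}),\lambda(I_{Q}))$ ’ is defined as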

$\displaystyle\Delta(\lambda(I_{i}),\lambda(I_{Q}))=\begin{cases}1,&\lambda(I_{i})=\lambda(I_{Q})\\ 0,&\text{else}\end{cases}$

Recall is described as

$\displaystyle\text{Recall}(I_{Q},N)=\frac{1}{N_{g}}\sum_{i=1}^{|\textit{DB}|}\Delta(\lambda(I_{i}),\lambda(I_{Q}))|R(I_{i},I_{Q})\leqslant N$ (21)

Figure 8.

Comparison between the different local patterns with proposed method in terms of average retrieval precision, average retrieval recall and average retrieval rate on DB1 (on $x$ -axis no. of images retrieved).

Average precision and average retrieval rate can be calculated using Eqs (22) and (23) as

$\displaystyle\text{Precision}_{\text{Avg}}=\frac{1}{|\textit{DB}|}\sum_{i=1}^{|\textit{DB}|}\text{Precision}(I_{i},N)|_{N\leqslant\lambda(I_{i})}$ (22)

$\displaystyle\text{ARR}=\frac{1}{|\textit{DB}|}\sum_{i=1}^{|\textit{DB}|}\text{Recall}(I_{i},N)|_{N\leqslant\lambda(I_{i})}$ (23)

Here, ‘ $\lambda(I_{i})$ ’ is the no. of images of each category.
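For a single query, the precision and recall of Eqs (20) and (21) amount to counting how many of the top $N$ ranked images share the query's category. A minimal sketch follows; the function name, the ranked list, and the category sizes are hypothetical and serve only as a worked example.

```python
def precision_recall_at_n(ranked_labels, query_label, n, n_relevant):
    """Return (precision, recall) over the top-n ranked images for one query.

    ranked_labels: categories of the database images, best match first.
    n_relevant:    number of database images in the query's category.
    """
    hits = sum(1 for label in ranked_labels[:n] if label == query_label)
    return hits / n, hits / n_relevant

# Hypothetical ranking for a "horse" query; 10 horse images exist in total.
ranked = ["horse", "horse", "bus", "horse", "flower"]
precision, recall = precision_recall_at_n(ranked, "horse", n=4, n_relevant=10)
print(precision, recall)
```

Three of the top four results are relevant, so the precision is 3/4 and the recall is 3/10. Averaging these values over all query images gives the quantities in Eqs (22) and (23).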

5. Results and analysis

5.1 Experiment 1

The COREL-10K (DB1) database consists of a large volume of images in various categories such as humans, nature, sports, animals, and buildings. For DB1 we collected 10000 images from 100 different categories, including elephants, Africans, buildings, dinosaurs, beaches, horses, buses, flowers, food, and mountains. Each category has 100 images of size 192 $\times$ 128 or 128 $\times$ 192. The database is divided into training and test datasets, and every test image is tested against the training dataset. If the retrieved results match the expected category, the proposed system is considered efficient. The results, compared in terms of average retrieval precision, average retrieval recall, and average retrieval rate, are listed in Tables 1 and 2 and depicted in Figs 8–10.

Table 1
Various local patterns with proposed method in terms of average retrieval precision, average retrieval recall and average retrieval rate on DB1

	Average retrieval precision (%)	Average retrieval recall (%)	Average retrieval rate (%)
LBP	37.19	14.97	45.05
CSLBP	26.45	10.15	34.72
UCLBP	27.46	10.18	41.56
LTP	40.8	16.35	48.17
LMEBP	40.25	15.47	46.07
DLEP	40	15.74	42.71
DBC	42.86	18.59	45.54
LTrP	40.1	16.54	45.41
SMEPOP	45.72	17.94	46.75
MMEPOP	48.19	19.22	48.65
Proposed	52.54	21.25	53.25

Figure 9.

Comparison of proposed method (LDSMP) and CNN with existing methods in terms of average retrieval recall, average retrieval precision and average retrieval rate (on $x$ -axis No. of images retrieved).

Table 2

Confusion matrix for facial expression image retrieval out of 18 images

	Neutral	Happy	Surprise	Sad	Anger	Fear	Disgust
Neutral	17	0	0	1	0	0	0
Happy	0	16	0	0	2	0	0
Surprise	0	0	16	0	0	0	2
Sad	2	0	0	15	0	1	0
Anger	0	0	1	0	16	1	0
Fear	1	0	0	1	0	16	0
Disgust	0	2	0	0	0	1	15

Table 3

Confusion matrix for facial expression recognition out of 100 images

	Neutral	Happy	Surprise	Sad	Anger	Fear	Disgust
Neutral	100	0	0	0	0	0	0
Happy	0	100	0	0	0	0	0
Surprise	0	0	100	0	0	0	0
Sad	2	0	0	98	0	0	0
Anger	0	0	1	0	99	0	0
Fear	0	0	0	2	0	98	0
Disgust	0	0	0	0	0	0	100

Table 4

Facial expression recognition rate (FERR) out of 100 images

	Facial expression recognition rate (FERR) in %
Neutral	100
Happy	100
Surprise	100
Sad	98
Anger	99
Fear	98
Disgust	100
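The recognition rates in Table 4 follow directly from the confusion matrix in Table 3: each rate is the diagonal entry divided by the row total. The short check below reproduces them, assuming that each row of Table 3 corresponds to the true expression and each column to the predicted one.

```python
import numpy as np

LABELS = ["Neutral", "Happy", "Surprise", "Sad", "Anger", "Fear", "Disgust"]

# Confusion matrix copied from Table 3 (rows assumed to be true classes).
confusion = np.array([
    [100,   0,   0,   0,   0,   0,   0],
    [  0, 100,   0,   0,   0,   0,   0],
    [  0,   0, 100,   0,   0,   0,   0],
    [  2,   0,   0,  98,   0,   0,   0],
    [  0,   0,   1,   0,  99,   0,   0],
    [  0,   0,   0,   2,   0,  98,   0],
    [  0,   0,   0,   0,   0,   0, 100],
])

# Per-class recognition rate in percent: correct / total for each row.
ferr = 100.0 * np.diag(confusion) / confusion.sum(axis=1)
for name, rate in zip(LABELS, ferr):
    print(f"{name}: {rate:.0f}")
```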

Figure 10.

Facial expression retrieval using proposed (LDSMP) method with CNN (on $x$ -axis no. of images retrieved and on $y$ -axis percentage of recognized expression).

5.2 Experiment 2

DB2 (JAFFE) is used in experiment 2. It contains 213 images of 7 facial expressions (happiness, surprise, sadness, fear, anger, disgust, and neutral) of 10 different Japanese female models in different poses. Each image is 256 $\times$ 256 pixels. We thank Michael Lyons, who created this database, which has been used for facial expression identification and face recognition. In this experiment the performance of the proposed system is measured by the Facial Expression Recognition Rate (FERR), tabulated in Table 4, and the Average Retrieval Rate (ARR), taking every image as a query. The results are shown in Fig. 11 and Table 3.

Figure 11.

Facial expression (disgust) retrieval images using proposed method from DB2.

Figure 12.

Facial expression recognition using LDSMP with CNN.

6. Conclusion

We presented a novel method named LDSMP for image retrieval and facial expression analysis in CBIR under varying intensity differences. LDSMP encodes features from the differences in 8 directions and divides the range of values into 6 categories; the differences are then substituted with quantized values based on thresholds. We also proposed dynamic thresholds, derived from the image itself, to divide the range and improve accuracy across various images. In addition, a CNN (Convolutional Neural Network) was used for image classification and recognition based on the feature vectors for DB1 and DB2. We observed that the performance of the proposed system improved markedly in terms of precision, recall, and ARR (Average Retrieval Rate) compared with the existing methods LBP, LTP, and DBC. The method works in the same way on large datasets such as Cohn-Kanade (CK $+$ ); sample expression recognition images are shown in Fig. 12. In future work, the results may be improved further by applying filters such as Gabor or Sobel in preprocessing.

References

1. Rui and Huang T.S., Image retrieval: Current techniques, promising directions and open issues, J Visual Commun Image Represent 10(1) (1999), 39–62.

2. Liu, Zhang and Ma W.Y., A survey of content-based image retrieval with high-level semantics, Pattern Recogn 40(1) (2007), 262–282.

3. Kokare, Chatterji B.N. and Biswas P.K., A survey on current content based image retrieval methods, IETE J Res 48(3–4) (2002), 261–271.

4. Muller, Michoux, Bandon and Geissbuhler, A review of content-based image retrieval systems in medical applications – clinical benefits and future directions, International Journal of Medical Informatics (2004), 1–23.

5. Ojala, Pietikainen and Harwood, A comparative study of texture measures with classification based on feature distributions, Pattern Recogn 29(1) (1996), 51–59.

6. Saadatmand Tarzjan and Moghaddam H.A., A novel evolutionary approach for optimizing content based image retrieval, IEEE Trans Syst, Man, Cybern B, Cybern 37(1) (2007), 139–153.

7. Moghaddam H.A., Khajoie T.T. and Rouhi A.H., A new algorithm for image indexing and retrieval using wavelet correlogram, in: Proc ICIP (2003), III-497–III-500.

8. Moghaddam H.A. and Saadatmand Tarzjan, Gabor wavelet correlogram algorithm for image indexing and retrieval, in: Proc ICPR (2006), 925–928.

9. Guo, Zhang and Zhang, Rotation invariant texture classification using LBP variance with global matching, Pattern Recogn 43(3) (Mar 2010), 706–719.

10. Kokare, Biswas P.K. and Chatterji B.N., Texture image retrieval using new rotated complex wavelet filters, IEEE Trans Syst, Man, Cybern B, Cybern 35(6) (2005), 1168–1178.

11. Pietikainen, Ojala, Scruggs, Bowyer K.W., Jin, Hoffman, Marques, Jacsik and Worek, Rotational invariant texture classification using feature distributions, Pattern Recogn 33(1) (2000), 43–52.

12. Ahonen, Hadid and Pietikainen, Face description with local binary patterns: Applications to face recognition, IEEE Trans Pattern Anal Mach Intell 28(12) (2006), 2037–2041.

13. Lei, Liao, Pietikäinen and Li S.Z., Face recognition by exploring information jointly in space, scale and orientation, IEEE Trans Image Process 20(1) (2011), 247–256.

14. Zhao and Pietikainen, Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Trans Pattern Anal Mach Intell 29(6) (2007), 915–928.

15. Zhang, Gao, Zhao and Liu, Local derivative pattern versus local binary pattern: Face recognition with higher-order local pattern descriptor, IEEE Trans Image Process 19(2) (2010), 533–544.

16. Zhang, Wang and Shi Y.Q., Revealing the traces of median filtering using high-order local ternary patterns, IEEE Signal Processing Letters 21(3) (2014), 275–279.

17. Ojala, Pietikainen and Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans Pattern Anal Mach Intell 24(7) (2002), 971–987.

18. Liao, Law M.W.K. and Chung A.C.S., Dominant local binary patterns for texture classification, IEEE Transactions on Image Processing 18(5) (2009), 1107–1118.

19. Tan X.Y. and Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, IEEE Transactions on Image Processing 19(6) (2010), 1635–1650.

20. Murala, Maheshwari R.P. and Balasubramanian, Local tetra patterns: A new feature descriptor for content-based image retrieval, IEEE Transactions on Image Processing 21(5) (2012), 2874–2886.

21. Murala, Maheshwari R.P. and Balasubramanian, Directional local extrema patterns: A new descriptor for content based image retrieval, International Journal of Multimedia Information Retrieval 1(3) (2012), 191–203.

22. Subrahmanyam, Maheshwari R.P. and Balasubramanian, Local maximum edge binary patterns: A new descriptor for image retrieval and object tracking, Signal Processing (2012), 1467–1479.

23. Vipparthi S.K. and Sagar S.K., Color directional local quinary patterns for content based indexing and retrieval, Human-centric Computing and Information Sciences (2014), 1–13.

24. Vipparthi S.K., Murala and Sagar S.K., Local Gabor maximum edge position octal patterns for image retrieval, Neurocomputing 167 (2015), 336–345.

25. Chen, Wang J.Z. and Krovetz, CLUE: Cluster-based retrieval of images by unsupervised learning, IEEE Trans Image Processing (2005), 1187–1201.

26. Deng, Manjunath B.S., Kenney, Moore M.S. and Shin, An efficient color representation for image retrieval, IEEE Trans Image Processing (2001), 140–147.

27. Manjunath B.S., Ohm J.R., Vasudevan V.V. and Yamada, Color and texture descriptors, IEEE Trans Circuits and Systems for Video Technology (2001), 703–715.

28. Mitra, Murthy C.A. and Pal S.K., Unsupervised feature selection using feature similarity, IEEE Trans Pattern Analysis and Machine Intelligence (2002), 301–312.

29. Zhang and Shen L.L., Directional binary code with application to PolyU near-infrared face database, Pattern Recognition Letters 31 (2010), 2337–2344.

30. Jagadeesh H.S., Suresh Babu and Raja K.B., DBC based face recognition using DWT, Signal and Image Processing: An International Journal 3(2) (2012).

31. M.N. and Vetterli, Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance, IEEE Trans Image Processing (2002), 146–158.

32. Varaprasad and Sundra Murthy S.B., Detection of potholes in autonomous vehicle, IET Intelligent Transport Systems (2013), 543–549.

33. Viswanadha Raju and Sreedhar, Query processing for content based image retrieval, International Journal of Soft Computing and Engineering 1(5) (2011), 122–131.

34. Zia Uddin M.D., Khaksar and Torresen, Facial expression recognition using salient features and convolutional neural network, IEEE Access, Special Section on Visual Surveillance and Biometrics: Practices, Challenges, and Possibilities 5(4) (2017), 26146–26161.

35. Mohamed, Asnaoui Khalid and Mohammed, Content-based image retrieval using convolutional neural networks, (2018), 463–476.
