Abstract
The probabilistic neural network (PNN) is simple and easy to implement. It learns quickly, and its outputs are posterior probabilities, which facilitates the combination of classifiers with the fuzzy integral. In this paper, we propose a face recognition algorithm named EPNN, which combines PNN classifiers with the fuzzy integral and makes full use of the advantages of both PNN and ensemble learning. The proposed method includes three stages: (1) incomplete wavelet packet decomposition of the face images; (2) training of PNN classifiers on the wavelet sub-images that contain low-frequency components; (3) combination of the trained PNN classifiers by the fuzzy integral. Compared with four matrix subspace algorithms, the proposed method obtains competitive performance; for example, it improves the accuracy of face recognition with less CPU time. The experimental results on JAFFE, YALE, ORL and FERET confirm that the proposed method outperforms the four matrix subspace algorithms.
Introduction
As a hot research topic, face recognition [1] has been widely investigated and has been successfully applied in security detection, person identification, video surveillance, etc. For static face images, roughly speaking, the process of face recognition consists of two phases: feature extraction, and classification or matching. In the framework of subspace face recognition, the feature extraction methods can be roughly categorized into two classes: linear subspace methods and nonlinear subspace methods. Eigenface [2] is the pioneering work among the linear subspace methods. It uses PCA (Principal Component Analysis) [3] to represent face images efficiently. Since Turk's seminal work, PCA has become one of the most successful approaches in face recognition. As pointed out in [4], in the Eigenface method the 2D face image matrices must first be transformed into 1D image vectors, which are usually high dimensional. It is then difficult, and even impracticable, to calculate the eigenvalues of the covariance matrix because of the small sample size problem [5]. To deal with this problem, Yang et al. [4] proposed the 2DPCA algorithm, which constructs the covariance matrix directly from the original image matrices. The covariance matrix constructed in 2DPCA is much smaller than the one in PCA. However, 2DPCA has two drawbacks [6]: (1) it needs more coefficients than PCA to represent a face image, in other words, it needs much more storage space; (2) it extracts features in only one direction of the images rather than in two directions. An improved algorithm named (2D)²PCA, which overcomes these two shortcomings, was developed in [6]. Along the same technical route, based on linear discriminant analysis (LDA), also known as Fisher discriminant analysis (FDA), Belhumeur et al. [7] proposed the Fisherfaces method, Li et al. proposed 2DLDA, also named 2DFDA [8], and Noushath et al. proposed (2D)²LDA [9].
Based on these basic linear subspace methods, other competitive algorithms have been proposed by different authors. For instance, by considering the statistical uncorrelation between the extracted features, a discriminant subspace learning method constrained by locally statistical uncorrelation was presented in [10]. By introducing the semi-random technique into subspace methods for face recognition, Zhu et al. [11] proposed a novel method that simultaneously addresses the small sample size problem and the sensitivity problem. A unified framework for linear subspace methods, including PCA and LDA, has been proposed by Wang and Tang [12].
The nonlinear subspace methods mainly include kernel transform based approaches, Fourier transform based approaches and wavelet transform based approaches. The kernel transform based approaches combine kernel methods with linear subspace methods. For example, Chu et al. [13] presented a kernel discriminant transformation approach for face recognition, and Kim et al. [14] applied kernel principal component analysis to face recognition. The basic ideas of the Fourier and wavelet transform based approaches are the same: the face images are first mapped into different subspaces with the Fourier transform or the wavelet transform, and then one or several subspace images are selected for face recognition. Representative works include the Laplacianfaces method [15], the waveletfaces method [16] and the multiresolution analysis method [17], among others. A hybrid approach of linear and nonlinear subspaces has been proposed in [18], and an excellent survey of face recognition in subspaces can be found in [19].
To improve the accuracy of face recognition, several ensemble face recognition methods based on ensemble learning have been proposed. Owusu et al. [20] proposed a neural ensemble method that uses AdaBoost to integrate neural networks for face recognition. A feature fusion method for face recognition has been proposed in [21]. Kwak and Pedrycz [22] used the fuzzy integral and the wavelet decomposition method for face recognition. In their method [22], all wavelet sub-images are used for integration. However, our experimental studies show that the wavelet sub-images with bidirectional high-frequency components contribute little to the accuracy of face recognition. To deal with this problem, an improved algorithm named EPNN is presented in this paper. The proposed method consists of three steps: first, the face images are decomposed with the incomplete wavelet packet transform; second, PNN classifiers are trained on the selected wavelet sub-images, which include low-frequency components; third, the trained PNN classifiers are fused by the fuzzy integral. Compared with four matrix subspace algorithms, the proposed method obtains competitive performance.
The paper is organized as follows. The preliminaries used in this paper are given in Section 2. The proposed method is presented in Section 3. Experimental results and analysis are presented in Section 4. Section 5 concludes the paper.
Preliminaries
In this section, we briefly review the preliminaries used in this paper: the 2D wavelet transform and the 2D wavelet packet transform, the probabilistic neural network, and the fuzzy integral.
2D Wavelet Transform and 2D Wavelet Packet Transform
The 2D wavelet transform [23, 24] of an image $f(x, y)$ includes two steps. First, each row of $f(x, y)$ is filtered with the low-pass filter L and the high-pass filter H and down-sampled by 2, which yields two transformed images $f_L(x, y)$ and $f_H(x, y)$. Second, each column of $f_L(x, y)$ and $f_H(x, y)$ is filtered again with L and H and down-sampled by 2, which finally yields four sub-images $f_{LL}(x, y)$, $f_{LH}(x, y)$, $f_{HL}(x, y)$ and $f_{HH}(x, y)$; see Fig. 1, where l and h are the impulse responses of the filters L and H, respectively. The process depicted in Fig. 1 is called the one-level wavelet transform. If we repeat this process only on $f_{LL}(x, y)$, which contains the low-frequency components in both directions (row and column), we obtain the two-level wavelet transform of $f(x, y)$. If we repeat this process on all four sub-images, the transform is called the 2D wavelet packet transform. The wavelet transform and the wavelet packet transform are two types of multiresolution analysis.
Let $f(x, y)$ be a face image of size $M \times N$. Let $l(i)$, $i = 0, 1, \ldots, N_l - 1$, be the low-pass coefficients of a wavelet basis function, where $N_l$ is the support length of the low-pass filter L, and let $h(j)$, $j = 0, 1, \ldots, N_h - 1$, be the high-pass coefficients, where $N_h$ is the support length of the high-pass filter H. Then the wavelet transform of $f(x, y)$ can be formulated as follows.
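The two filtering steps described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the two-tap Haar filters instead of the BIOR 3.1 basis used in the experiments, it extends the image periodically at the borders, and the function names (`filter_and_downsample`, `wavelet_level`) are hypothetical rather than taken from the paper.

```python
import math

# Haar analysis filters (an assumption for brevity; the paper uses BIOR 3.1).
LOW = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HIGH = [1 / math.sqrt(2), -1 / math.sqrt(2)]


def filter_and_downsample(signal, taps):
    """Filter a 1D sequence with `taps` (periodic extension), keep every 2nd sample."""
    n = len(signal)
    return [
        sum(taps[i] * signal[(2 * x + i) % n] for i in range(len(taps)))
        for x in range(n // 2)
    ]


def transform_rows(image, taps):
    """Apply the 1D step to every row of the image."""
    return [filter_and_downsample(row, taps) for row in image]


def transform_columns(image, taps):
    """Apply the 1D step to every column of the image."""
    columns = [list(col) for col in zip(*image)]
    filtered = [filter_and_downsample(col, taps) for col in columns]
    return [list(row) for row in zip(*filtered)]


def wavelet_level(image):
    """One-level 2D transform: rows first, then columns; returns the four sub-images."""
    f_l = transform_rows(image, LOW)
    f_h = transform_rows(image, HIGH)
    return {
        "LL": transform_columns(f_l, LOW),
        "LH": transform_columns(f_l, HIGH),
        "HL": transform_columns(f_h, LOW),
        "HH": transform_columns(f_h, HIGH),
    }
```

Calling `wavelet_level` again on the `"LL"` entry gives the two-level wavelet transform, while calling it on all four entries gives the wavelet packet transform.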
Probabilistic Neural Network
The probabilistic neural network (PNN) [25] was originally developed by Specht in 1990. A PNN consists of three layers: the input layer, the pattern layer and the category layer. Suppose we are given a normalized training set $D = \{(x_j, y_j) \mid j = 1, 2, \ldots, n\}$, where $x_j \in R^d$ and $y_j \in \{1, 2, \ldots, k\}$, i.e. the samples are classified into k categories $\omega_1, \omega_2, \ldots, \omega_k$, and let $n_i$ be the number of samples in class $\omega_i$ ($i = 1, 2, \ldots, k$). The PNN trained with D is shown in Fig. 2.
In Fig. 2, there are d units in the input layer, n units in the pattern layer and k units in the category layer. The activation functions of the input layer and the output layer are linear; the activation function of the pattern layer is , where σ is the smoothing parameter. Each input unit is connected to each of the n pattern units; each pattern unit is, in turn, connected to one and only one of the k category units. The connections from the input units to the pattern units carry modifiable weights, which are set by normalizing each pattern $x_j$ to unit length, i.e. $w_{jk} = x_{jk}$ ($j = 1, 2, \ldots, n$; $k = 1, 2, \ldots, d$). Each link from a pattern unit to its associated category unit has the constant weight $\beta_{jk} = 1$; for every other link, $\beta_{jk} = 0$. The training process of a PNN is therefore just the process of normalizing the patterns.
Given a normalized testing set $T = \{(x_t, y_t) \mid t = 1, 2, \ldots, m\}$, for each $x_t \in T$, the corresponding output of unit j in pattern layer is
The output of unit s in category layer is
Accordingly, the output of the PNN is $p_t = (p_{1t}, \ldots, p_{kt})$, where $p_{st}$ ($s = 1, 2, \ldots, k$; $t = 1, 2, \ldots, m$) is the posterior probability that sample $x_t$ belongs to category $\omega_s$, which facilitates the combination of classifiers with the fuzzy integral [26]. Let $s^* = \arg\max_{1 \le s \le k} p_{st}$; then sample $x_t$ is classified into category $\omega_{s^*}$.
We select PNN as the component classifier of the ensemble for two reasons. (1) PNN is a suitable candidate for an ensemble over wavelet sub-images combined by the fuzzy integral, because its outputs are posterior probabilities, which facilitates the combination of classifiers with the fuzzy integral. (2) The fuzzy integral is a good approach for modeling the interactions among the component classifiers, i.e. the PNNs trained on different wavelet sub-images, whereas most other existing combination approaches do not model this kind of interaction.
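The PNN described in this subsection can be sketched as follows. The sketch rests on two assumptions that are not stated in the text above: the pattern-layer activation is taken to be the common form $\exp((net - 1)/\sigma^2)$, which for unit-length vectors equals a Gaussian kernel of the distance between input and stored pattern, and the category sums are divided by their total so that they can be read as posterior estimates. The class name `PNN` and its methods are hypothetical.

```python
import math


def normalize(x):
    """Scale a vector to unit Euclidean length, as the PNN training step requires."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


class PNN:
    def __init__(self, sigma):
        self.sigma = sigma  # smoothing parameter

    def fit(self, patterns, labels):
        # "Training" only stores the normalized patterns as pattern-layer weights.
        self.weights = [normalize(x) for x in patterns]
        self.labels = list(labels)
        self.classes = sorted(set(labels))
        return self

    def posteriors(self, x):
        x = normalize(x)
        sums = {c: 0.0 for c in self.classes}
        for w, label in zip(self.weights, self.labels):
            net = sum(wi * xi for wi, xi in zip(w, x))  # inner product w^T x
            # Assumed activation; each pattern unit feeds only its own category unit.
            sums[label] += math.exp((net - 1.0) / self.sigma ** 2)
        total = sum(sums.values())
        # Assumed normalization so that the outputs sum to one.
        return {c: s / total for c, s in sums.items()}

    def predict(self, x):
        p = self.posteriors(x)
        return max(p, key=p.get)
```

For very small values of σ all activations can underflow to zero, so a practical implementation would need to guard the final division.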
Fuzzy Integral
In this section, we present the notation related to the fuzzy integral [26]. Let $D = \{(x_i, y_i) \mid x_i \in R^d, y_i \in R^k, i = 1, 2, \cdots, n\}$ be a training set, $\Omega = \{\omega_1, \omega_2, \cdots, \omega_k\}$ be a set of class labels, and $L = \{L_1, L_2, \cdots, L_l\}$ be a set of classifiers trained on different subsets of D; each $L_i$ is called a component classifier. For $\forall x \in R^d$, $L_i$ assigns a class label to x from Ω. As given by Kuncheva [27], we may define the classifier output to be a k-dimensional vector of support degrees for the classes, i.e.
The Proposed Method
In this section, we first present the idea of this paper and then the proposed algorithm, denoted EPNN (Ensemble of Probabilistic Neural Networks). The idea of the algorithm is simple. Specifically, the proposed algorithm includes three stages: (1) incomplete wavelet packet decomposition; (2) training of the PNN classifiers; (3) combination of the trained PNN classifiers with the fuzzy integral. In our experimental studies, we find that some wavelet sub-images contribute little to the accuracy of face recognition; for example, at the different levels, the bidirectional high-frequency sub-images make almost no contribution to the recognition. Accordingly, in the first stage of the proposed algorithm, we apply an incomplete wavelet packet decomposition to the face images. The component classifiers are trained on different multiresolution wavelet sub-images that contain more low-frequency components. In this paper, we use the probabilistic neural network as the component classifier. The way we select the multiresolution wavelet sub-images used for training the component classifiers is depicted in Fig. 3.
After the component classifiers are trained, we integrate them with the fuzzy integral. Because the component classifiers are trained by PNN on different multiresolution wavelet sub-images, the trained classifiers can be expected to be diverse to some extent. Moreover, because these sub-images are all transformed from the same original image, interactions among the component classifiers are unavoidable. The fuzzy integral is a good approach for modeling such interactions, which is why we choose it as the ensemble tool for combining the component classifiers. In the combination of classifiers, the determination of the fuzzy densities of the component classifiers plays a crucial role. In this paper, we use an adaptive method to determine the fuzzy densities [28, 29], which takes into account both the training behavior and the testing behavior of each component classifier. Specifically, the initial fuzzy densities are first determined from the confusion matrices $CM_i$ of the component classifiers, which embed their training performance; then the differentiating capability $\alpha_i$ and the support capability $\beta_i$ [28] of the component classifiers are determined from the decision matrix, which embeds their testing performance. Finally, the fuzzy densities of the component classifiers are determined by $\alpha_i$ and $\beta_i$ as follows [29].
Obviously, the smaller $p_{ij}$ is, the more likely it is that x will be misclassified by $L_i$; the larger $p_{ij}$ is, the more likely it is that x will be correctly classified by $L_i$.
For the component classifier $L_i$ ($1 \le i \le l$), its support capability $\beta_i$ is defined as
The details of the proposed algorithm EPNN are described in Algorithm 1.
The proposed algorithm EPNN consists of three phases: (1) incomplete wavelet packet decomposition, i.e. steps 1 to 2 of Algorithm 1; (2) training of the PNN classifiers, i.e. steps 3 to 5 of Algorithm 1; (3) combination of the trained PNN classifiers with the fuzzy integral, i.e. steps 6 to 26 of Algorithm 1. Fig. 4 shows an illustrative example of the process of the proposed algorithm. In Fig. 4, the inputs are 4 face images represented by 4 rectangles. There are 6 component classifiers trained on 6 wavelet sub-images. The sub-images marked with the symbol "×" are discarded, i.e. they are not used to train the PNNs.
Algorithm 1: EPNN
Input: FDB = {$f_1(x, y), \cdots, f_n(x, y)$}, face database.
Output: Recognition rules.
1: For (i ← 1 to n)
2:   Do m-level incomplete wavelet packet transform of $f_i(x, y)$;
3: Select the wavelet sub-images in the way depicted in Fig. 3, and let l be the number of selected wavelet sub-images;
4: Train PNN classifiers $L_1, L_2, \cdots, L_l$ with the selected wavelet sub-images;
5: Return L = {$L_1, L_2, \cdots, L_l$}.
6: For (i ← 1 to l)
7:   Compute its confusion matrix $CM_i$;
8: For (each testing image f)
9:   Compute its decision matrix DP(f);
10: For (i ← 1 to l)
11:   For (j ← 1 to k)
12:     Let $h_j(L_i) = p_{ij}(x)$;
13: For (j ← 1 to k)
14:   Compute $\lambda_j$ with (4);
15: For (i ← 1 to l)
16:   For (j ← 1 to k)
17:     Compute $g_j(A_i)$ with (3);
18: For (i ← 1 to l)
19:   Compute $\alpha_i$ with (11);
20:   Compute $\beta_i$ with (12);
21: For (i ← 1 to l)
22:   For (j ← 1 to k)
23:     Compute with (6);
24: For (j ← 1 to k)
25:   Compute $o_j(x)$ as
26: Compute .
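The fusion steps of Algorithm 1 can be illustrated with the following sketch. It assumes that (3) and (4) are the standard recursion and characteristic equation of the Sugeno λ-fuzzy measure, and that the classifiers are combined with the Sugeno fuzzy integral; the text above does not confirm either point, and a Choquet integral would be an equally plausible reading. The fuzzy densities are taken as given inputs here, since the adaptive rule based on $\alpha_i$ and $\beta_i$ is not reproduced. All function names are hypothetical.

```python
def solve_lambda(densities, iterations=200):
    """Root lambda > -1, lambda != 0 of prod(1 + lambda * g_i) = 1 + lambda."""
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities, each in (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # the measure is additive

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0 + 1e-12, -1e-12  # root lies in (-1, 0)
    else:
        lo, hi = 1e-12, 1.0  # root lies in (0, +inf)
        while f(hi) < 0.0:
            hi *= 2.0
    lo_positive = f(lo) > 0.0
    for _ in range(iterations):  # plain bisection
        mid = (lo + hi) / 2.0
        if (f(mid) > 0.0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def measure_chain(sorted_densities, lam):
    """g(A_1) = g^1 and g(A_i) = g^i + g(A_{i-1}) + lam * g^i * g(A_{i-1})."""
    chain, g_prev = [], 0.0
    for g in sorted_densities:
        g_prev = g + g_prev + lam * g * g_prev
        chain.append(g_prev)
    return chain


def sugeno_integral(supports, densities):
    """Fuse the supports of all classifiers for one class."""
    lam = solve_lambda(densities)
    order = sorted(range(len(supports)), key=lambda i: supports[i], reverse=True)
    chain = measure_chain([densities[i] for i in order], lam)
    return max(min(supports[i], g) for i, g in zip(order, chain))


def fuse(decision_profile, densities):
    """decision_profile[i][j]: support of classifier i for class j;
    densities[i][j]: fuzzy density of classifier i with respect to class j."""
    k = len(decision_profile[0])
    scores = [
        sugeno_integral([row[j] for row in decision_profile],
                        [row[j] for row in densities])
        for j in range(k)
    ]
    return max(range(k), key=lambda j: scores[j]), scores
```

In this sketch `fuse` returns the index of the winning class together with the fused support of every class, so the final decision is simply the class with the largest integral value.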
Experimental Results and Analysis
The proposed algorithm can deal directly with matrix data. Two experiments are conducted to verify the effectiveness of the proposed algorithm EPNN. The first analyzes the influence of the parameter σ on the recognition accuracy; the second compares EPNN with four matrix subspace algorithms on four face databases in two respects: recognition accuracy and CPU time. The four face databases are JAFFE [30], YALE [31], ORL [32] and FERET [33], which include 213, 165, 400 and 1400 face images of size 256 × 256, 320 × 243, 92 × 112 and 80 × 80, covering 10, 15, 40 and 200 people, respectively. JAFFE, YALE and ORL are chosen to allow a fair comparison with the four matrix subspace algorithms, namely 2DPCA [4], 2DLDA [8], WT+2DPCA [22] and WT+2DLDA [22]. FERET is selected because it includes many classes, as many as 200 categories. The 10-fold cross-validation method is employed in our experiments. The experiments run on a PC with a 2.5 GHz CPU and 4 GB of memory; the programming language is MATLAB 8.0, and BIOR 3.1 is used as the wavelet basis function.
The parameter σ in the activation function of the PNN has a great influence on the recognition accuracy. In this section, we explore the relation between the value of σ and the recognition accuracy of EPNN. We vary σ from 0 to 50 with a step size of 5. For each value of σ, we record the corresponding recognition accuracy of EPNN; the resulting curves are shown in Fig. 5. From the curves, we can see that the appropriate value of σ differs from one face database to another. For FERET, a suitable value of σ should be taken from the interval [23, 50], while for the other three face databases YALE, ORL and JAFFE, it should be taken from the intervals [17, 20], [4, 12] and [27, 35], respectively. In the next experiment, we set σ to 18, 10, 28 and 25 for YALE, ORL, JAFFE and FERET, respectively.
In the second experiment, we compare the proposed algorithm EPNN with 2DPCA [4], 2DLDA [8], WT+2DPCA [22] and WT+2DLDA [22] in two respects: recognition accuracy and CPU time. We randomly select 90 percent of the samples in each face database as the training set and the other 10 percent as the testing set. For each face database, we run 10-fold cross-validation ten times, and the reported results are the averages over the 10 runs. The results of the comparison with 2DPCA [4], 2DLDA [8], WT+2DPCA [22] and WT+2DLDA [22] are listed in Table 1, Table 2, Table 3 and Table 4, respectively. It can be seen from the experimental results that the proposed algorithm outperforms the four matrix subspace algorithms in both recognition accuracy and CPU time. The high recognition accuracy of EPNN is due to the combination of basic classifiers trained on wavelet sub-images, while its fast learning speed is due to the PNN. We also conduct a statistical analysis of the experimental results with the paired T-test at significance level 0.05 [34, 35]. For each data set and each algorithm, we run 10-fold cross-validation 10 times and thus obtain five 100-dimensional statistics $X_i$ ($i = 1, 2, \cdots, 5$), which correspond to 2DPCA, 2DLDA, WT+2DPCA, WT+2DLDA and the proposed algorithm, respectively. The paired T-test is then applied to the experimental results by evaluating the MATLAB function ttest2($X_i$, $X_5$) ($i = 1, 2, 3, 4$). Because of the page limit, we list only the results (i.e. the p-values) of the statistical analysis of the testing accuracy in Table 5. The p-values listed in Table 5 further confirm that the proposed algorithm outperforms the four matrix subspace algorithms in testing accuracy.
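The statistic behind such a comparison can be sketched as follows. In MATLAB, the paired-sample test is provided by `ttest(x, y)`, whereas `ttest2(x, y)` performs a two-sample test on independent samples, so the sketch computes both statistics to make the difference visible. The accuracy values are hypothetical and serve only as an illustration; they are not results from the paper. Converting a statistic into a p-value requires the Student t distribution, which is omitted here.

```python
import math
from statistics import mean, variance


def paired_t(x, y):
    """Paired t statistic: mean of the differences over its standard error."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / math.sqrt(variance(d) / len(d))


def two_sample_t(x, y):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


# Hypothetical fold accuracies of a baseline and of the ensemble on the same folds.
baseline = [0.90, 0.92, 0.88, 0.91]
ensemble = [0.93, 0.94, 0.92, 0.93]

t_paired = paired_t(baseline, ensemble)
t_two_sample = two_sample_t(baseline, ensemble)
```

Because both algorithms are evaluated on the same folds, the paired form removes the fold-to-fold variation, which is why its statistic is larger in magnitude for these illustrative numbers.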
Conclusions
This paper proposed a face recognition algorithm that combines PNN classifiers with the fuzzy integral. The proposed algorithm has the following merits: (1) it makes full use of the advantages of PNN and ensemble learning; (2) it makes full use of the interactions between the basic classifiers used for combination; (3) it improves the recognition accuracy while keeping a fast learning speed. These advantages have been verified by the experiments conducted in this paper. Regarding future work, the following three problems are worth studying. (1) For a given face database, how can the optimal wavelet basis function be found by means of statistical analysis? (2) Regarding scalability, can the algorithm be extended to big data environments by distributing the computation of the fuzzy integral over different cloud computing nodes? (3) In recent years, deep learning has become a very hot research topic and has been successfully applied to computer vision, natural language processing and speech recognition. Is it feasible to replace the PNN with a deep neural network, such as a deep convolutional neural network?
Acknowledgments
This research is supported by the National Natural Science Foundation of China (61170040, 71371063), by the Natural Science Foundation of Hebei Province (F2013201110, F2013201220), by the Key Scientific Research Foundation of Education Department of Hebei Province (ZD20131028) and by the Opening Fund of Zhejiang Provincial Top Key Discipline of Computer Science and Technology at Zhejiang Normal University, China.
