Ensemble of multiresolution probabilistic neural network classifiers with fuzzy integral for face recognition

Abstract

The probabilistic neural network (PNN) is simple and easy to implement. It learns quickly, and its outputs are posterior probabilities, which facilitates combining classifiers with the fuzzy integral. In this paper, we propose a face recognition algorithm named EPNN, which combines PNN classifiers with the fuzzy integral and thereby exploits the strengths of both PNN and ensemble learning. The proposed method has three stages: (1) incomplete wavelet packet decomposition of the face images; (2) training of PNN classifiers on the wavelet sub-images that contain low-frequency components; (3) combination of the trained PNN classifiers by the fuzzy integral. Compared with four matrix subspace algorithms, the proposed method achieves competitive performance; in particular, it improves recognition accuracy while requiring less CPU time. Experimental results on JAFFE, YALE, ORL and FERET confirm that the proposed method outperforms the four matrix subspace algorithms.

Keywords

Probabilistic neural networks; ensemble learning; face recognition; fuzzy integral; wavelet transform

1 Introduction

As a hot research topic, face recognition [1] has been widely investigated and successfully applied in security detection, person identification, video surveillance, etc. For static face images, face recognition roughly consists of two phases: feature extraction and classification (or matching). In the framework of subspace face recognition, feature extraction methods can be roughly divided into two classes: linear subspace methods and nonlinear subspace methods. Eigenface [2] is the pioneering work among linear subspace methods. It uses PCA (Principal Component Analysis) [3] to represent face images efficiently. Since Turk and Pentland's seminal work, PCA has become one of the most successful approaches in face recognition. As pointed out in [4], the Eigenface method must first transform the 2D face image matrices into 1D image vectors, which are usually high dimensional. Because of the small sample size problem [5], it is difficult or even impracticable to compute the eigenvalues of the resulting covariance matrix. To deal with this problem, Yang et al. [4] proposed 2DPCA, which constructs the covariance matrix directly from the original image matrices. The covariance matrix constructed in 2DPCA is much smaller than the one in PCA. However, 2DPCA has two drawbacks [6]: (1) it needs more coefficients than PCA to represent a face image, i.e., 2DPCA needs much more storage space than PCA; (2) it extracts features in only one direction of the image rather than in both directions. An improved algorithm named (2D)²PCA, which overcomes these two shortcomings, was developed in [6]. Along this line, based on linear discriminant analysis (LDA), also called Fisher discriminant analysis (FDA), Belhumeur et al. [7] proposed the Fisherfaces method, Li et al. proposed 2DLDA, also named 2DFDA [8], and Noushath et al. proposed (2D)²LDA [9].
Based on these basic linear subspace methods, other competitive algorithms have been proposed. For instance, by taking into account the statistical uncorrelation between the extracted features, a discriminant subspace learning method constrained by locally statistical uncorrelation was presented for face recognition in [10]. By introducing the semi-random technique into subspace methods, Zhu et al. [11] proposed a method that simultaneously addresses the small sample size problem and the sensitivity problem. A unified framework for linear subspace methods, including PCA and LDA, was proposed by Wang and Tang [12].

Nonlinear subspace methods mainly include kernel-based, Fourier-transform-based and wavelet-transform-based approaches. Kernel-based approaches combine kernel methods with linear subspace methods. For example, Chu et al. [13] presented a kernel discriminant transformation approach for face recognition, and Kim et al. [14] applied kernel principal component analysis to face recognition. Fourier- and wavelet-transform-based approaches share the same basic idea: the face images are first mapped into different subspaces by the Fourier or wavelet transform, and then one or several subspace images are selected for recognition. Representative works include the Laplacianfaces method [15], the waveletfaces method [16] and the multiresolution analysis method [17]. A hybrid of linear and nonlinear subspace approaches has been proposed in [18], and an excellent survey of face recognition in subspaces can be found in [19].

To improve the accuracy of face recognition, several face recognition methods based on ensemble learning have been proposed. Owusu et al. [20] proposed a neural ensemble method that uses AdaBoost to integrate neural networks for face recognition. A feature fusion method for face recognition was proposed in [21]. Kwak and Pedrycz [22] used the fuzzy integral and wavelet decomposition for face recognition; in their method, all wavelet sub-images are used for integration. However, our experimental studies show that the wavelet sub-images containing high-frequency components in both directions contribute little to the accuracy of face recognition. To address this problem, we present an improved algorithm named EPNN. The proposed method consists of three steps: first, the face images are decomposed by the incomplete wavelet packet transform; second, PNN classifiers are trained on the selected wavelet sub-images, which contain low-frequency components; third, the trained PNN classifiers are fused by the fuzzy integral. Compared with four matrix subspace algorithms, the proposed method achieves competitive performance.

The paper is organized as follows. Section 2 reviews the preliminaries used in this paper. The proposed method is presented in Section 3. Experimental results and analysis are presented in Section 4, and Section 5 concludes the paper.

2 Preliminaries

In this section, we briefly review the preliminaries used in this paper, including the 2D wavelet transform and 2D wavelet packet transform, the probabilistic neural network and the fuzzy integral.

2.1 2D Wavelet Transform and 2D Wavelet Packet Transform

The 2D wavelet transform [23, 24] of an image f (x, y) consists of two steps. First, each row of f (x, y) is filtered with the low-pass filter L and the high-pass filter H and down-sampled by 2, giving two transformed images f_L (x, y) and f_H (x, y). Second, each column of f_L (x, y) and f_H (x, y) is filtered again with L and H and down-sampled by 2, finally giving four sub-images f_LL (x, y), f_LH (x, y), f_HL (x, y) and f_HH (x, y); see Fig. 1. Here, l and h are the impulse responses of filters L and H, respectively. The process depicted in Fig. 1 is called the one-level wavelet transform. If we repeat this process only on f_LL (x, y), which contains the low-frequency components in both directions (row and column), we obtain the two-level wavelet transform of f (x, y). If we repeat the process on all four sub-images, the transform is called the 2D wavelet packet transform. The wavelet transform and the wavelet packet transform are two types of multiresolution analysis.

Let f (x, y) be a face image of size M × N, let l (i), i = 0, 1, ⋯, N_l − 1, be the low-pass coefficients of a wavelet basis function, where N_l is the support length of the low-pass filter L, and let h (j), j = 0, 1, ⋯, N_h − 1, be the high-pass coefficients, where N_h is the support length of the high-pass filter H. The one-level wavelet transform of f (x, y) can then be formulated as follows.

$\begin{aligned} f_{L} (x, y) &= \frac{1}{N_{l}} \sum_{i = 0}^{N_{l} - 1} l (i) f ((2 x + i) \bmod M, y) \\ f_{H} (x, y) &= \frac{1}{N_{h}} \sum_{j = 0}^{N_{h} - 1} h (j) f ((2 x + j) \bmod M, y) \\ f_{LL} (x, y) &= \frac{1}{N_{l}} \sum_{i = 0}^{N_{l} - 1} l (i) f_{L} (x, (2 y + i) \bmod N) \\ f_{LH} (x, y) &= \frac{1}{N_{h}} \sum_{j = 0}^{N_{h} - 1} h (j) f_{L} (x, (2 y + j) \bmod N) \\ f_{HL} (x, y) &= \frac{1}{N_{l}} \sum_{i = 0}^{N_{l} - 1} l (i) f_{H} (x, (2 y + i) \bmod N) \\ f_{HH} (x, y) &= \frac{1}{N_{h}} \sum_{j = 0}^{N_{h} - 1} h (j) f_{H} (x, (2 y + j) \bmod N) \end{aligned}$
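The six equations above translate almost line by line into NumPy. The sketch below is only an illustration, assuming periodic extension (the mod M and mod N indexing) and, for the demo, the 2-tap Haar-like filters l = (1, 1) and h = (1, −1) instead of the BIOR 3.1 filters used in Section 4; the function names `analysis_1d` and `dwt2_one_level` are hypothetical.

```python
import numpy as np


def analysis_1d(f, taps, axis):
    """Filter `f` along `axis` with `taps` and keep every second output sample.

    Implements g(x) = (1/len(taps)) * sum_i taps[i] * f((2x + i) mod M)
    with periodic extension, as in the formulas above.
    """
    f = np.asarray(f, dtype=float)
    n = f.shape[axis]
    positions = 2 * np.arange(n // 2)  # output sample x reads input 2x + i
    out = sum(c * np.take(f, (positions + i) % n, axis=axis)
              for i, c in enumerate(taps))
    return out / len(taps)


def dwt2_one_level(f, low, high):
    """One-level 2D transform: first along x (mod M), then along y (mod N)."""
    f_L = analysis_1d(f, low, axis=0)
    f_H = analysis_1d(f, high, axis=0)
    f_LL = analysis_1d(f_L, low, axis=1)
    f_LH = analysis_1d(f_L, high, axis=1)
    f_HL = analysis_1d(f_H, low, axis=1)
    f_HH = analysis_1d(f_H, high, axis=1)
    return f_LL, f_LH, f_HL, f_HH


# Demo with hypothetical Haar-like taps (not the BIOR 3.1 filters of Section 4).
low, high = [1.0, 1.0], [1.0, -1.0]
image = np.arange(16, dtype=float).reshape(4, 4)  # f(x, y) = 4x + y
f_LL, f_LH, f_HL, f_HH = dwt2_one_level(image, low, high)
print(f_LL)  # values [[2.5, 4.5], [10.5, 12.5]]
```

For this linear ramp image, f_HH is identically zero, which illustrates why the bidirectional high-frequency band often carries little information for smooth image regions.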

2.2 Probabilistic Neural Network

The probabilistic neural network (PNN) [25] was originally developed by Specht in 1990. A PNN consists of three layers: an input layer, a pattern layer and a category layer. Consider a normalized training set D = {(x_j, y_j) | j = 1, 2, …, n}, where x_j ∈ R^d, $\sum_{i = 1}^{d} x_{ji}^{2} = 1$ and y_j ∈ {1, 2, …, k}, i.e., the samples belong to k categories ω₁, ω₂, …, ω_k; let n_s be the number of samples in class ω_s (s = 1, 2, …, k). The PNN trained with D is illustrated in Fig. 2.

In Fig. 2, there are d units in the input layer, n units in the pattern layer and k units in the category layer. The input and category layers use linear activation functions, while the activation function of the pattern layer is $g (z) = \exp (\frac{z - 1}{σ^{2}})$, where σ is the smoothing parameter. Each input unit is connected to each of the n pattern units, and each pattern unit is in turn connected to exactly one of the k category units. The connections from the input units to pattern unit j carry modifiable weights, which are trained by normalizing the pattern x_j to unit length, i.e., w_ji = x_ji (j = 1, 2, …, n; i = 1, 2, …, d). The link from pattern unit j to the category unit of its own class ω_s has the constant weight β_js = 1, and all other links from that unit have weight β_js = 0. Training a PNN therefore amounts to normalizing the training patterns.

Given a normalized testing set T = {(x_t, y_t) | t = 1, 2, …, m}, for each x_t ∈ T the output of unit j in the pattern layer is $o_{jt} = \exp\left(\frac{w_{j}^{T} x_{t} - 1}{σ^{2}}\right) = \exp\left(\frac{\sum_{i = 1}^{d} w_{ji} x_{ti} - 1}{σ^{2}}\right)$.
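To see why this activation is a Gaussian kernel, note that both w_j and x_t have unit length, so their squared Euclidean distance reduces to an inner product:

$\| w_{j} - x_{t} \|^{2} = \| w_{j} \|^{2} - 2 w_{j}^{T} x_{t} + \| x_{t} \|^{2} = 2 (1 - w_{j}^{T} x_{t})$

and therefore

$o_{jt} = \exp\left(\frac{w_{j}^{T} x_{t} - 1}{σ^{2}}\right) = \exp\left(-\frac{\| w_{j} - x_{t} \|^{2}}{2 σ^{2}}\right)$

Each pattern unit thus evaluates a Gaussian Parzen window of width σ centred on its stored training pattern, so σ controls how quickly the influence of a training pattern decays with distance.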

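The output of unit s in the category layer is $p_{st} = β_{s}^{T} o_{t} = \sum_{j = 1}^{n} β_{js} o_{jt} = \sum_{j:\, y_{j} = s} o_{jt}$, i.e., the sum of the n_s pattern outputs belonging to class ω_s.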

Accordingly, the output of the PNN is p_t = (p_1t, …, p_kt). Each p_st (s = 1, 2, …, k; t = 1, 2, …, m) is proportional to an estimate of the posterior probability that x_t belongs to category ω_s (with class priors estimated by n_s / n); dividing by $\sum_{s = 1}^{k} p_{st}$ gives the estimate itself, which facilitates combining classifiers with the fuzzy integral [26]. Let $s_{t}^{*} = \arg\max_{1 \leq s \leq k} p_{st}$; then sample x_t is assigned to category $ω_{s_{t}^{*}}$.
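A minimal NumPy sketch of the PNN described above, assuming samples are stored as rows: `fit` only normalizes and stores the training patterns (the pattern-layer weights w_j) together with the class-membership weights β_js, and `predict_proba` evaluates the pattern and category layers. The class name and its methods are hypothetical. Two details are our choices rather than the paper's: the category outputs are divided by their sum (the posterior normalization mentioned above), and the largest exponent of each test sample is subtracted before `exp`, which cancels in that division and only guards against underflow for small σ.

```python
import numpy as np


class PNN:
    """Probabilistic neural network (Specht, 1990) on unit-normalized inputs."""

    def __init__(self, sigma):
        self.sigma = sigma  # smoothing parameter of the pattern layer

    @staticmethod
    def _normalize(X):
        X = np.asarray(X, dtype=float)
        return X / np.linalg.norm(X, axis=1, keepdims=True)

    def fit(self, X, y):
        # "Training" = storing unit-length patterns as pattern-layer weights.
        self.W = self._normalize(X)                      # n x d
        self.classes = np.unique(y)                      # k labels
        # beta[j, s] = 1 iff training sample j belongs to class s
        self.beta = (np.asarray(y)[:, None] == self.classes[None, :]).astype(float)
        return self

    def predict_proba(self, X):
        Xt = self._normalize(X)                          # m x d
        S = (self.W @ Xt.T - 1.0) / self.sigma ** 2      # exponents of o_jt, n x m
        O = np.exp(S - S.max(axis=0, keepdims=True))     # shift cancels after normalizing
        P = self.beta.T @ O                              # category outputs p_st, k x m
        P = P / P.sum(axis=0, keepdims=True)             # posterior estimates per sample
        return P.T                                       # m x k

    def predict(self, X):
        return self.classes[np.argmax(self.predict_proba(X), axis=1)]


X_train = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]])
y_train = np.array([0, 0, 1, 1])
model = PNN(sigma=0.3).fit(X_train, y_train)
print(model.predict(np.array([[1.0, 0.0], [0.0, 1.0]])))  # [0 1]
```

Because training is a single normalization pass, the cost of "learning" is negligible; the price is paid at test time, where every test sample is compared with all n stored patterns.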

We select PNN as the component classifier of the ensemble for two reasons. (1) The outputs of a PNN are posterior probabilities, which makes PNN a suitable candidate for combining classifiers trained on wavelet sub-images with the fuzzy integral. (2) The fuzzy integral is a good approach for modeling the interactions among component classifiers, here the PNNs trained with different wavelet sub-images, which simple combination rules such as averaging do not model.

2.3 Fuzzy Integral

In this section, we present the notation related to the fuzzy integral [26]. Let D = {(x_i, y_i) | x_i ∈ R^d, y_i ∈ R^k, i = 1, 2, ⋯, n} be a training set, Ω = {ω₁, ω₂, ⋯, ω_k} a set of class labels, and L = {L₁, L₂, ⋯, L_l} a set of classifiers trained on different subsets of D; each L_i is called a component classifier. For any x ∈ R^d, L_i assigns a class label from Ω to x. Following Kuncheva [27], we define the classifier output as a k-dimensional vector of support degrees for the classes, i.e.,
$L_{i} (x) = (p_{i1} (x), p_{i2} (x), …, p_{ik} (x))$ (1)
where p_ij (x) ∈ [0, 1] (1 ≤ i ≤ l; 1 ≤ j ≤ k) denotes the support given by classifier L_i to the hypothesis that x comes from class ω_j. In this paper, p_ij (x) is an estimate of the posterior probability p (ω_j | x). Some related definitions are given below.

Definition 1. Given L = {L₁, L₂, ⋯, L_l}, Ω = {ω₁, ω₂, ⋯, ω_k} and an arbitrary testing sample x, the following matrix is called the decision matrix:
$DP (x) = \begin{bmatrix} p_{11} (x) & \cdots & p_{1j} (x) & \cdots & p_{1k} (x) \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ p_{i1} (x) & \cdots & p_{ij} (x) & \cdots & p_{ik} (x) \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ p_{l1} (x) & \cdots & p_{lj} (x) & \cdots & p_{lk} (x) \end{bmatrix}$ (2)
where the i-th row of the matrix is the output of classifier L_i and the j-th column contains the supports given by classifiers L₁, L₂, …, L_l to class ω_j.

Definition 2. Given L = {L₁, L₂, ⋯, L_l}, let P (L) be the power set of L. A fuzzy measure on L is a set function g : P (L) → [0, 1] such that (1) g (∅) = 0 and g (L) = 1; (2) for all A, B ⊆ L, if A ⊆ B, then g (A) ≤ g (B).

Definition 3. A fuzzy measure g on L is called a λ-fuzzy measure if, for all A, B ⊆ L with A ∩ B = ∅,
$g (A \cup B) = g (A) + g (B) + λ g (A) g (B)$ (3)
where λ > −1 and λ ≠ 0. The value of λ is determined by the following equation:
$λ + 1 = \prod_{i = 1}^{l} (1 + λ g^{i})$ (4)

Definition 4. Given L = {L₁, L₂, ⋯, L_l}, for each L_i ∈ L let gⁱ = g ({L_i}); gⁱ is called the fuzzy density of classifier L_i.
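Equation (4) always admits the trivial root λ = 0, so the λ of Definition 3 is the other root. Below is a minimal bisection sketch, assuming every fuzzy density gⁱ lies strictly in (0, 1) and l ≥ 2 (the function name `solve_lambda` is ours, not from the paper). If Σ gⁱ < 1 the root is positive, if Σ gⁱ > 1 it lies in (−1, 0), and if Σ gⁱ = 1 the measure is additive and λ = 0.

```python
import math


def solve_lambda(densities, tol=1e-12):
    """Non-trivial root of lambda + 1 = prod_i (1 + lambda * g^i), Eq. (4).

    Assumes 0 < g^i < 1 for every density and at least two classifiers.
    """
    g = [float(gi) for gi in densities]
    total = sum(g)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # densities already additive: probability measure

    def f(lam):
        return math.prod(1.0 + lam * gi for gi in g) - (1.0 + lam)

    if total < 1.0:
        # f < 0 on (0, root) and f > 0 beyond it; grow hi until the root is bracketed
        lo, hi, sign_left = 0.0, 1.0, -1.0
        while f(hi) <= 0.0:
            hi *= 2.0
    else:
        # f > 0 on (-1, root) and f < 0 on (root, 0)
        lo, hi, sign_left = -1.0, 0.0, 1.0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) * sign_left > 0.0:
            lo = mid  # mid is still on the left of the root
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(solve_lambda([0.6, 0.7]))       # close to -5/7
print(solve_lambda([0.3, 0.3, 0.3]))  # positive, about 0.3576
```

For two classifiers, (4) reduces to λ = (1 − g¹ − g²) / (g¹g²), which gives a quick check: with g¹ = 0.6 and g² = 0.7 this is −0.3 / 0.42 = −5/7.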

Definition 5. Given L = {L₁, L₂, ⋯, L_l}, let g be a fuzzy measure on L and h : L → [0, 1] a function defined on L. Without loss of generality, suppose that 1 ≥ h (L₁) ≥ h (L₂) ≥ ⋯ ≥ h (L_l) ≥ 0. The Sugeno fuzzy integral of h with respect to g is defined as
$(S) \int h (\cdot) \circ g (\cdot) = \bigvee_{i = 1}^{l} \left( h (L_{i}) \wedge g (A_{i}) \right)$ (5)
where A_i = {L₁, L₂, ⋯, L_i}, ∨ denotes the maximum and ∧ the minimum.
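With a λ-fuzzy measure, (5) can be evaluated without enumerating the power set: after sorting the classifiers by decreasing h, g (A_i) follows from Eq. (3) applied to the disjoint sets {L_i} and A_{i−1}, i.e. g (A_i) = gⁱ + g (A_{i−1}) + λ gⁱ g (A_{i−1}). A minimal sketch; the function name is hypothetical and λ is assumed to have been obtained from Eq. (4) beforehand.

```python
def sugeno_integral(h, densities, lam):
    """Sugeno fuzzy integral (5) of h w.r.t. the lambda-fuzzy measure of `densities`.

    h[i] is the support h(L_i), densities[i] the fuzzy density g^i and lam the
    root of Eq. (4). Classifiers are sorted by decreasing h, as Definition 5 requires.
    """
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    result, g_A = 0.0, 0.0
    for i in order:
        g_i = densities[i]
        g_A = g_i + g_A + lam * g_i * g_A   # g(A_i) from Eq. (3); g(A_1) = g^(1)
        result = max(result, min(h[i], g_A))
    return result


# Two classifiers with g = (0.6, 0.7); lambda = -5/7 solves Eq. (4) for them.
print(sugeno_integral([0.2, 0.8], [0.6, 0.7], lam=-5 / 7))  # 0.7
```

With a correctly solved λ, the last value of g (A_i) equals g (L) = 1 up to rounding, which is a convenient sanity check on the measure.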

3 The Proposed Algorithm

In this section, we first present the idea of this paper and then the proposed algorithm, denoted EPNN (Ensemble of Probabilistic Neural Networks). The idea is simple: the algorithm consists of three stages: (1) incomplete wavelet packet decomposition, (2) training of PNN classifiers, and (3) combination of the trained PNN classifiers with the fuzzy integral. In our experimental studies, we found that some wavelet sub-images contribute little to the accuracy of face recognition; in particular, at every level the sub-images with high-frequency components in both directions contribute almost nothing. Therefore, the first stage of the proposed algorithm applies an incomplete wavelet packet decomposition to the face images, and the component classifiers are trained on different multiresolution wavelet sub-images that contain more low-frequency components. In this paper, the probabilistic neural network is used as the component classifier. The scheme for selecting the multiresolution wavelet sub-images used to train the component classifiers is depicted in Fig. 3.
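Fig. 3 is not reproduced here, so the sketch below encodes one possible selection rule purely as an assumption: at every level the HH sub-image is discarded, and only the bands listed in the hypothetical `expand` parameter are decomposed further. With `expand=("LL",)` a two-level decomposition yields six sub-images, which happens to match the six component classifiers of Fig. 4; other choices of `expand` give other incomplete packet trees. For brevity, the Haar-like taps l = (1, 1) and h = (1, −1) (normalized by 2, as in the formulas of Section 2.1) stand in for BIOR 3.1, and odd-sized images are cropped to even size instead of being periodically extended.

```python
import numpy as np


def haar_level(f):
    """One analysis level with taps l = (1, 1), h = (1, -1), normalized by 2."""
    f = f[: f.shape[0] // 2 * 2, : f.shape[1] // 2 * 2]  # crop to even size (assumption)
    f_L = (f[0::2, :] + f[1::2, :]) / 2.0   # low-pass along the first index
    f_H = (f[0::2, :] - f[1::2, :]) / 2.0   # high-pass along the first index
    return {
        "LL": (f_L[:, 0::2] + f_L[:, 1::2]) / 2.0,
        "LH": (f_L[:, 0::2] - f_L[:, 1::2]) / 2.0,
        "HL": (f_H[:, 0::2] + f_H[:, 1::2]) / 2.0,
        "HH": (f_H[:, 0::2] - f_H[:, 1::2]) / 2.0,
    }


def incomplete_packet(image, levels, expand=("LL",)):
    """Return (path, sub-image) pairs; HH bands are always dropped (assumed rule)."""
    selected = []
    frontier = [((), np.asarray(image, dtype=float))]
    for _ in range(levels):
        next_frontier = []
        for path, img in frontier:
            bands = haar_level(img)
            for name in ("LL", "LH", "HL"):          # HH is discarded at every level
                sub_path = path + (name,)
                selected.append(("/".join(sub_path), bands[name]))
                if name in expand:                   # recurse only into chosen bands
                    next_frontier.append((sub_path, bands[name]))
        frontier = next_frontier
    return selected


face = np.random.default_rng(0).random((64, 64))  # stand-in for a face image
subs = incomplete_packet(face, levels=2)
print([(name, s.shape) for name, s in subs])
# six sub-images: LL, LH, HL at 32 x 32, then LL/LL, LL/LH, LL/HL at 16 x 16
```

Each returned sub-image would then be vectorized and used to train one component PNN.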

After the component classifiers are trained, we integrate them with the fuzzy integral. Because the component classifiers are PNNs trained on different multiresolution wavelet sub-images, they are, to some extent, guaranteed to be diverse. Moreover, although they are trained on different sub-images, these sub-images are all transformed from the same original image, so interactions among the component classifiers are unavoidable. Since the fuzzy integral is a good approach for modeling such interactions, we choose it as the ensemble tool for combining the component classifiers. In the combination, the determination of the fuzzy densities of the component classifiers plays a crucial role. In this paper, we use an adaptive method to determine the fuzzy densities [28, 29], which takes both the training behavior and the testing behavior of each component classifier into account. Specifically, the initial fuzzy densities $g_{j}^{i}$ are first determined from the confusion matrices CMⁱ of the component classifiers, which embed their training performance; then the differentiating capability α_i and the support capability β_i [28] of each component classifier are determined from the decision matrix, which embeds its testing performance. Finally, the fuzzy densities $G_{j}^{i}$ of the component classifiers are determined from α_i and β_i as follows [29].
$G_{j}^{i} = \frac{α_{i} + β_{i}}{2} g_{j}^{i}$ (6)
The initial fuzzy density of component classifier L_i with respect to class ω_j is
$g_{j}^{i} = \left( \frac{1}{k - 1} \sum_{s = 1, s \neq j}^{k} (1 - r_{sj}^{i}) \right) r_{jj}^{i}$ (7)
where
$r_{jj}^{i} = \frac{n_{jj}^{i}}{\sum_{t = 1}^{k} n_{jt}^{i}}$ (8)
and
$r_{sj}^{i} = \frac{n_{sj}^{i}}{\sum_{t = 1}^{k} n_{st}^{i}}$ (9)
Here $r_{jj}^{i}$ is the fraction of samples in class ω_j that are correctly classified by L_i, and $r_{sj}^{i}$ (s ≠ j) is the fraction of samples in class ω_s that are incorrectly classified as class ω_j by L_i. The $n_{st}^{i}$ (1 ≤ i ≤ l; 1 ≤ s, t ≤ k) are the elements of the confusion matrix
${CM}^{i} = \begin{bmatrix} n_{11}^{i} & n_{12}^{i} & \cdots & n_{1k}^{i} \\ n_{21}^{i} & n_{22}^{i} & \cdots & n_{2k}^{i} \\ \vdots & \vdots & \ddots & \vdots \\ n_{k1}^{i} & n_{k2}^{i} & \cdots & n_{kk}^{i} \end{bmatrix}$ (10)
For component classifier L_i (1 ≤ i ≤ l), the differentiating capability α_i is defined as
$α_{i} = 1 + \frac{1}{2 \log_{2} k} \sum_{j = 1}^{k} p_{ij}^{'} \log_{2} p_{ij}^{'}$ (11)
where p_ij are the entries of the decision matrix DP (x), $p_{ij}^{'} = \frac{p_{ij}}{\sum_{t = 1}^{k} p_{it}}$ (1 ≤ i ≤ l; 1 ≤ j ≤ k), and $p_{ij}^{'} \log_{2} p_{ij}^{'}$ is set to 0 when $p_{ij}^{'} = 0$.

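Since $\sum_{j} p_{ij}^{'} \log_{2} p_{ij}^{'}$ lies between $-\log_{2} k$ and 0, α_i ranges from 1/2 (when L_i spreads its support uniformly over all classes) to 1 (when L_i puts all its support on a single class). Hence, the smaller α_i is, the more likely x is to be misclassified by L_i; the larger α_i is, the more likely x is to be correctly classified by L_i.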
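Eqs. (7) to (9) and (11) can be sketched in NumPy as follows. The function names are hypothetical; the confusion matrix is assumed to have true classes as rows (consistent with (8) and (9)), and every class is assumed to have at least one training sample.

```python
import numpy as np


def initial_densities(cm):
    """Initial fuzzy densities g_j^i of one classifier L_i, Eqs. (7)-(9).

    cm[s, t] counts training samples of class s assigned to class t (rows = true class).
    Returns a length-k vector whose j-th entry is g_j^i.
    """
    cm = np.asarray(cm, dtype=float)
    k = cm.shape[0]
    r = cm / cm.sum(axis=1, keepdims=True)   # r[s, j] = n_sj / sum_t n_st
    off = 1.0 - r
    np.fill_diagonal(off, 0.0)                # drop the s = j terms of the sum
    return off.sum(axis=0) / (k - 1) * np.diag(r)


def differentiating_capability(DP):
    """alpha_i of Eq. (11) for each row (classifier) of the l x k decision matrix DP."""
    DP = np.asarray(DP, dtype=float)
    k = DP.shape[1]
    P = DP / DP.sum(axis=1, keepdims=True)    # p'_ij
    safe = np.where(P > 0, P, 1.0)            # log2(1) = 0 implements 0 log 0 := 0
    return 1.0 + (P * np.log2(safe)).sum(axis=1) / (2.0 * np.log2(k))


cm = [[8, 1, 1], [2, 6, 2], [0, 0, 10]]
print(initial_densities(cm))              # approximately [0.72, 0.57, 0.85]
DP = [[1.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3]]
print(differentiating_capability(DP))     # approximately [1.0, 0.5]
```

The two rows of the example DP show the extremes of α_i: a classifier that is certain about one class gets 1, a classifier with uniform support gets 1/2.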

For component classifier L_i (1 ≤ i ≤ l), the support capability β_i is defined as
$β_{i} = 1 - \frac{1}{2} TD_{i}$ (12)
where
$TD_{i} = \frac{1}{2 (l - 1)} \sum_{i' = 1}^{l} d_{ii'}^{2}$ (13)
and
$d_{ii'} = \left( \sum_{j = 1}^{k} (p_{ij} - p_{i'j})^{2} \right)^{1/2}$ (14)
The smaller β_i is, the more L_i differs from the other classifiers, and accordingly the less important L_i is in the ensemble. Conversely, the larger β_i is, the more closely L_i agrees with the other classifiers, and the more important it is in the ensemble.
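Eqs. (12) to (14) and the final adjustment (6) in NumPy; as above, the function names are hypothetical, and α is passed in as an argument (for example, computed with Eq. (11)) so that this sketch stands alone. At least two component classifiers are assumed.

```python
import numpy as np


def support_capability(DP):
    """beta_i of Eqs. (12)-(14) for each row of the l x k decision matrix DP (l >= 2)."""
    DP = np.asarray(DP, dtype=float)
    l = DP.shape[0]
    d2 = ((DP[:, None, :] - DP[None, :, :]) ** 2).sum(axis=2)  # d_ii'^2, zero when i' = i
    TD = d2.sum(axis=1) / (2.0 * (l - 1))                       # Eq. (13)
    return 1.0 - 0.5 * TD                                       # Eq. (12)


def adjusted_densities(g_init, alpha, beta):
    """G_j^i = (alpha_i + beta_i) / 2 * g_j^i, Eq. (6); g_init has shape l x k."""
    scale = (np.asarray(alpha, dtype=float) + np.asarray(beta, dtype=float)) / 2.0
    return scale[:, None] * np.asarray(g_init, dtype=float)


DP = [[1.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3]]
beta = support_capability(DP)   # both entries equal 5/6 for this DP
alpha = [1.0, 0.5]              # alpha_i of Eq. (11) for the same DP
G = adjusted_densities([[0.72, 0.57, 0.85]] * 2, alpha, beta)
print(G)
```

Since both α_i and β_i lie in [1/2, 1], Eq. (6) can only shrink the initial densities, by at most a factor of 2.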

The details of the proposed algorithm EPNN are given in Algorithm 1.

The proposed algorithm EPNN consists of three phases: (1) incomplete wavelet packet decomposition, i.e., steps 1 to 2 of Algorithm 1; (2) training of the PNN classifiers, i.e., steps 3 to 5; (3) combination of the trained PNN classifiers with the fuzzy integral, i.e., steps 6 to 26. Fig. 4 shows an illustrative example of the process. In Fig. 4, the inputs are 4 face images, represented by 4 rectangles, and 6 component classifiers are trained with 6 wavelet sub-images. The sub-images marked with the symbol “×” are discarded, i.e., they are not used to train the PNNs.

Algorithm 1 EPNN Algorithm

Input:
FDB = {f₁ (x, y), ⋯, f_n (x, y)}, the face database.
Output:
Recognition rules.
1: For i ← 1 to n
2:     Do an m-level incomplete wavelet packet transform of f_i (x, y);
3: Select the wavelet sub-images as depicted in Fig. 3; let l be the number of selected wavelet sub-images;
4: Train PNN classifiers L₁, L₂, ⋯, L_l with the selected wavelet sub-images;
5: Return L = {L₁, L₂, ⋯, L_l}.
6: For i ← 1 to l
7:     Compute the confusion matrix CMⁱ of L_i;
8: For each testing image x
9:     Compute its decision matrix DP (x);
10:    For i ← 1 to l
11:        For j ← 1 to k
12:            Let h_j (L_i) = p_ij (x);
13:    For j ← 1 to k
14:        Compute λ_j with (4);
15:    For i ← 1 to l
16:        For j ← 1 to k
17:            Compute g_j (A_i) with (3), with the classifiers sorted so that h_j (L₁) ≥ ⋯ ≥ h_j (L_l);
18:    For i ← 1 to l
19:        Compute α_i with (11);
20:        Compute β_i with (12);
21:    For i ← 1 to l
22:        For j ← 1 to k
23:            Compute $G_{j}^{i}$ with (6);
24:    For j ← 1 to k
25:        Compute $o_{j} (x) = \max_{1 \leq i \leq l} \min \{h_{j} (L_{i}), g_{j} (A_{i})\}$;
26:    Compute $j^{*} = \arg\max_{1 \leq j \leq k} o_{j} (x)$ and assign x to class $ω_{j^{*}}$.
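Steps 12 to 26 for a single test image can be sketched as below. Algorithm 1 does not spell out which densities enter Steps 14 and 17; this sketch assumes the adjusted densities G_j^i of Eq. (6), builds one λ-measure per class j, and sorts the classifiers by h_j as Definition 5 requires. `solve_lambda` and `fuse_decision_matrix` are hypothetical names, and all densities are assumed to lie in (0, 1).

```python
import math

import numpy as np


def solve_lambda(g, tol=1e-12):
    """Non-trivial root of lambda + 1 = prod(1 + lambda * g_i), Eq. (4)."""
    g = [float(gi) for gi in g]
    total = sum(g)
    if abs(total - 1.0) < 1e-12:
        return 0.0

    def f(lam):
        return math.prod(1.0 + lam * gi for gi in g) - (1.0 + lam)

    if total < 1.0:
        lo, hi, sign_left = 0.0, 1.0, -1.0   # f < 0 between 0 and the root
        while f(hi) <= 0.0:
            hi *= 2.0
    else:
        lo, hi, sign_left = -1.0, 0.0, 1.0   # f > 0 between -1 and the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) * sign_left > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fuse_decision_matrix(DP, G):
    """Fuse one l x k decision matrix with per-class densities G (l x k).

    Returns (j_star, o), where o[j] is the Sugeno integral for class j.
    """
    DP, G = np.asarray(DP, dtype=float), np.asarray(G, dtype=float)
    l, k = DP.shape
    o = np.zeros(k)
    for j in range(k):
        lam = solve_lambda(G[:, j])                        # Step 14
        g_A = 0.0
        for i in np.argsort(-DP[:, j]):                    # h_j(L_1) >= ... >= h_j(L_l)
            g_A = G[i, j] + g_A + lam * G[i, j] * g_A      # Step 17, Eq. (3)
            o[j] = max(o[j], min(DP[i, j], g_A))           # Step 25, Eq. (5)
    return int(np.argmax(o)), o                            # Step 26


DP = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.6, 0.3, 0.1]]
G = np.full((3, 3), 0.4)
j_star, o = fuse_decision_matrix(DP, G)
print(j_star, o)  # class 0 wins; o is close to [0.6, 0.4, 0.1]
```

Note that classifier L₂ favors class 1 on its own, yet the fuzzy integral still selects class 0 because the two classifiers that agree on it jointly carry a larger measure.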

4 Experimental Results and Analysis

The proposed algorithm works directly on matrix (image) data. Two experiments are conducted to verify its effectiveness. The first analyzes the influence of the parameter σ on the recognition accuracy; the second compares EPNN with four matrix subspace algorithms on four face databases in terms of recognition accuracy and CPU time. The four face databases are JAFFE [30], YALE [31], ORL [32] and FERET [33], which contain 213, 165, 400 and 1400 face images, respectively; the image sizes are 256 × 256, 320 × 243, 92 × 112 and 80 × 80, and the numbers of subjects are 10, 15, 40 and 200, respectively. We choose JAFFE, YALE and ORL to allow a fair comparison with the four matrix subspace algorithms, namely 2DPCA [4], 2DLDA [8], WT+2DPCA [22] and WT+2DLDA [22]. We include FERET because it contains many classes (200 subjects). The 10-fold cross-validation method is employed in our experiments. The experiments run on a PC with a 2.5 GHz CPU and 4 GB of memory, the programming language is MATLAB 8.0, and BIOR 3.1 is used as the wavelet basis function.

Experiment 1: the influence of the parameter σ on the recognition accuracy

The parameter σ in the activation function of the PNN has a great influence on the recognition accuracy. In this experiment, we explore the relation between the value of σ and the recognition accuracy of EPNN. We vary σ from 0.0 to 50 with a step size of 5 and record the corresponding recognition accuracies of EPNN; the resulting curves are shown in Fig. 5. The curves show that the appropriate value of σ differs between face databases: for FERET, σ should be taken from the interval [23, 50], while for YALE, ORL and JAFFE it should be taken from [17, 20], [4, 12] and [27, 35], respectively. In the next experiment, we set σ to 18, 10, 28 and 25 for YALE, ORL, JAFFE and FERET, respectively.

Experiment 2: comparison with four matrix subspace algorithms

In this experiment, we compare the proposed algorithm EPNN with 2DPCA [4], 2DLDA [8], WT+2DPCA [22] and WT+2DLDA [22] in terms of recognition accuracy and CPU time. We randomly select 90 percent of the samples in each face database as the training set and the remaining 10 percent as the testing set. For each face database, we run 10-fold cross-validation ten times and report the average of the 10 outputs. The comparisons with 2DPCA [4], 2DLDA [8], WT+2DPCA [22] and WT+2DLDA [22] are listed in Tables 1, 2, 3 and 4, respectively. The experimental results show that the proposed algorithm outperforms the four matrix subspace algorithms in both recognition accuracy and CPU time. We attribute the higher recognition accuracy of EPNN to the combination of basic classifiers trained with wavelet sub-images, and its fast learning speed to the PNN. We also analyze the experimental results statistically with a t-test at significance level 0.05 [34, 35]. For each data set and each algorithm, we run 10-fold cross-validation 10 times and obtain five 100-dimensional statistics X_i (i = 1, 2, ⋯, 5), corresponding to 2DPCA, 2DLDA, WT+2DPCA, WT+2DLDA and the proposed algorithm, respectively. The t-test is then applied by computing the MATLAB function ttest2 (X_i, X₅) (i = 1, 2, 3, 4), which performs a two-sample t-test. Owing to space limitations, we only list the p-values of the statistical analysis on testing accuracy in Table 5. The p-values in Table 5 further confirm that the proposed algorithm outperforms the four matrix subspace algorithms in testing accuracy.
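For readers reproducing the statistical analysis, the sketch below contrasts the two t statistics involved. MATLAB's ttest2 computes a two-sample t-test that by default assumes equal variances (pooled variance), whereas a paired t-test works on the per-run differences. The function names are hypothetical; turning t into a p-value requires the Student t distribution with the returned degrees of freedom (for example SciPy's `scipy.stats.t.sf`, not used here to keep the block dependency-free).

```python
import numpy as np


def pooled_two_sample_t(x, y):
    """Two-sample t statistic with pooled variance; df = n_x + n_y - 2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    t = (x.mean() - y.mean()) / np.sqrt(sp2 * (1.0 / nx + 1.0 / ny))
    return t, nx + ny - 2


def paired_t(x, y):
    """Paired t statistic on the per-run differences x - y; df = n - 1."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    return t, len(d) - 1


# Toy accuracy vectors, for illustration only (not results from this paper).
print(pooled_two_sample_t([1, 2, 3], [4, 5, 6]))          # t about -3.674, df = 4
print(paired_t([0.9, 0.8, 0.85], [0.8, 0.7, 0.8]))        # t about 5.0, df = 2
```

When the same cross-validation splits are reused for all algorithms, the paired statistic exploits the pairing and is usually the more sensitive of the two.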

5 Conclusions

This paper proposed a face recognition algorithm that combines PNN classifiers with the fuzzy integral. The proposed algorithm has the following merits: (1) it makes full use of the strengths of PNN and ensemble learning; (2) it exploits the interactions between the basic classifiers used for combination; (3) it improves recognition accuracy while retaining a fast learning speed. These advantages have been verified by the experiments in this paper. Regarding future work, the following three problems are worth studying. (1) For a given face database, how can the optimal wavelet basis function be found by statistical analysis? (2) Scalability: can the algorithm be extended to big data environments by distributing the computation of the fuzzy integral over different cloud computing nodes? (3) Deep learning has recently become a very active research topic and has been successfully applied to computer vision, natural language processing and speech recognition. Is it feasible to replace PNN with deep neural networks, such as deep convolutional neural networks?


Acknowledgments

This research is supported by the National Natural Science Foundation of China (61170040, 71371063), by the Natural Science Foundation of Hebei Province (F2013201110, F2013201220), by the Key Scientific Research Foundation of Education Department of Hebei Province (ZD20131028) and by the Opening Fund of Zhejiang Provincial Top Key Discipline of Computer Science and Technology at Zhejiang Normal University, China.

References

1. S.Z. and Jain A.K., Handbook of Face Recognition, Springer Science+Business Media, Inc., (2005).
2. Turk and Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3(1) (1991), 71–86.
3. Kirby and Sirovich, Application of the KL procedure for the characterization of human faces, IEEE Transactions on Pattern Analysis and Machine Intelligence 12(1) (1990), 103–108.
4. Yang and Zhang, Two-dimensional PCA: A new approach to appearance-based face representation and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(1) (2004), 131–137.
5. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, (1990).
6. Zhang and Zhou, (2D)²PCA: Two-directional two-dimensional PCA for efficient face representation and recognition, Neurocomputing 69(1–3) (2005), 224–231.
7. Belhumeur P.N., Hespanha J.P. and Kriegman D.J., Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7) (1997), 711–720.
8. and Yuan, 2D-LDA: A novel statistical linear discriminant analysis for image matrix, Pattern Recognition Letters 26(5) (2005), 527–532.
9. Noushath, Kumar G.H. and Shivakumar, (2D)²LDA: An efficient approach for face recognition, Pattern Recognition 39(7), 1396–1400.
10. Chen, Zheng W.S., Xu X.H. and Lai J.H., Discriminant subspace learning constrained by locally statistical uncorrelation for face recognition, Neural Networks 42 (2013), 28–43.
11. Zhu, Liu and Chen, Semi-random subspace method for face recognition, Image and Vision Computing 27(9) (2009), 1358–1370.
12. Wang and Tang, A unified framework for subspace face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(9) (2004), 1222–1228.
13. Chu W.S., Chen J.C. and Lien J.J., Kernel discriminant transformation for image set-based face recognition, Pattern Recognition 44(8) (2011), 1567–1580.
14. Kim K.I., Jung and Kim H.J., Face recognition using kernel principal component analysis, IEEE Signal Processing Letters 9(2) (2002), 40–42.
15. , Yan, Hu, et al., Face recognition using Laplacianfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence 27(3) (2005), 328–340.
16. Chien J.T. and Wu C.C., Discriminant waveletfaces and nearest feature classifiers for face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(12) (2002), 1644–1649.
17. Ekenel H.K. and Sankur, Multiresolution face recognition, Image and Vision Computing 23(5) (2005), 469–477.
18. Franco, Lumini, Maio, et al., An enhanced subspace method for face recognition, Pattern Recognition Letters 27(1) (2006), 76–84.
19. Shakhnarovich and Moghaddam, Face recognition in subspaces, in: Handbook of Face Recognition, Springer, London, 2011, pp. 19–29.
20. Owusu, Zhan and Mao Q.R., A neural-AdaBoost based facial expression recognition system, Expert Systems with Applications 41(7) (2014), 3383–3390.
21. Pong K.H. and Lam K.M., Multi-resolution feature fusion for face recognition, Pattern Recognition 47(2) (2014), 556–567.
22. Kwak K.C. and Pedrycz, Face recognition using fuzzy integral and wavelet decomposition method, IEEE Transactions on Systems, Man, and Cybernetics, Part B 34(4) (2004), 1666–1675.
23. Mallat S.G., A theory for multiresolution signal decomposition: The wavelet representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 11(7) (1989), 674–693.
24. Pajares and Cruz M.J., A wavelet-based image fusion tutorial, Pattern Recognition 37(9) (2004), 1855–1872.
25. Specht D.F., Probabilistic neural networks, Neural Networks 3(1) (1990), 109–118.
26. Ralescu and Adams, The fuzzy integral, Journal of Mathematical Analysis and Applications 75(2) (1980), 562–570.
27. Kuncheva L.I., Combining classifiers: Soft computing solutions, in: Pal S.K. and Pal (Eds.), Pattern Recognition: From Classical to Modern Approaches, World Scientific, 2001, pp. 427–451.
28. Zhan Y.Z., Zhang and Mao Q.R., Fusion recognition algorithm based on fuzzy density determination with classification capability and supportability, Pattern Recognition and Artificial Intelligence 25(2) (2012), 346–351.
29. Chibelushi C.C., Deravi and Mason J.S.D., Adaptive classifier integration for robust pattern recognition, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 29(6) (1999), 902–907.
30. Lyons M.J., Budynek and Akamatsu, Automatic classification of single facial images, IEEE Transactions on Pattern Analysis and Machine Intelligence 21(12) (1999), 1357–1362.
31. Georghiades A.S., Belhumeur P.N. and Kriegman D.J., From few to many: Illumination cone models for face recognition under variable lighting and pose, IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6) (2001), 643–660.
32. Samaria and Harter, Parameterization of a stochastic model for human face identification, Proceedings of the 2nd IEEE Workshop on Applications of Computer Vision, Sarasota, FL, 1994.
33. Phillips P.J., Wechsler, Huang and Rauss, The FERET database and evaluation procedure for face recognition algorithms, Image and Vision Computing 16(5) (1998), 295–306.
34. Janez, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7(1) (2006), 1–30.
35. Hastie, Tibshirani and Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag, New York, 2009.