Illumination invariant face recognition using contourlet transform and convolutional neural network

Abstract

The face is a dominant biometric for recognizing a person. However, face recognition becomes challenging when there are severe changes in lighting conditions, i.e., illumination variations, which have been shown to have a more severe effect on recognition performance than the inherent differences between individuals. Most of the existing methods for tackling the problem of illumination variation assume that illumination lies in the large-scale component of a facial image; as such, the large-scale component is discarded, and features are extracted from small-scale components. Recently, it has been shown that large-scale component is also important; in addition, small-scale component contains detrimental noise features. Keeping this in view, we introduce a method for illumination invariant face recognition that exploits large-scale and small-scale components by discarding the illumination artifacts and detrimental noise using ContourletDS. After discarding the unwanted components, local and global features are extracted using a convolutional neural network (CNN) model; we examined three widely employed CNN models: VGG-16, GoogLeNet, and ResNet152. To reduce the dimensions of local and global features and fuse them, we employ linear discriminant analysis (LDA). Finally, ridge regression is used for recognition. The method was evaluated on three benchmark datasets; it achieved accuracies of 99.7%, 100%, and 79.76% on Extended Yale B, AR, and M-PIE, respectively. The comparison reveals that it outperforms the state-of-the-art methods.

Keywords

Face recognition deep learning convolutional neural network (CNN)

1 Introduction

Face recognition is a key biometric technology that has attracted the attention of a large research community for decades. The performance of a face recognition system is influenced by many factors [1]; one critical factor among them is the variation in lighting conditions, i.e., illumination variations. As illustrated in Fig. 1, the appearance of facial images for the same person is greatly altered when illumination varies. The variations caused by changing illumination effects have been shown to have a more severe effect on face recognition algorithms than the variations due to facial characteristics of individuals [2]. A great deal of methods has been proposed based on various approaches [3]. Though the existing methods deal with different types of variations, severe illumination variation is still a challenging problem. Many efforts based on several approaches have been made to solve this problem. The methods based on traditional image processing techniques such as histogram equalization (HE) [4, 5] and log transform [6] can account for mild light variations only.

Fig. 1

Effect of the varying illumination on facial images of the same person.

The most effective approach to tackle severe lighting conditions is based on Lambertian reflectance model I (x, y) = ρ (x, y) S (x, y), where ρ (x, y) is small scale component (surface albedo), and S (x, y) is large scale component (reflectance component). In this approach, illumination invariant representation is extracted from face images. GradientFaces [7] uses gradient-domain, and WeberFace [8] employs Weber’s law to extract illumination invariant features. Kim et al. [9] introduced SVDFace and Hu et al. [10] proposed HFSVDFace based on SVD decomposition of face images to compensate for illumination variations. Hu et al. [11] and Lia et al. [12] extended the idea of WeberFace to the logarithmic domain to propose illumination invariant descriptors. All these methods offer good tolerance to illumination variations. However, these methods assume that large-scale component contains light intensity and exploit small-scale component; however, this component is sensitive to detrimental noise [13].

Keeping in view the strengths and limitations of the existing methods for illumination robust face recognition under severe variations of lighting conditions, we felt motivated to propose a new method that exploits discriminative information embedded in small-scale and large-scale components. For this purpose, we employed an improvement of contourlet transform (ControuletDS) [14] in the logarithmic domain, which allows decomposing a face image into a fine-grained hierarchy of multi-scale and multi-directional sub-bands and yields perfect reconstruction. It allows to retain the important information from small-scale and large-scale components and remove the detrimental noise and illumination effects. We extract discriminative features using pre-trained CNN models because of the limited amount of labeled data, and finally, we employ ridge regression for identification. The main contributions of this work are as follows:

Proposed an illumination-invariant face recognition method that normalizes the illumination effect using ContourletDS and extracts discriminative features from face images using deep learning. The method considers the important information embedded in small-scale and large-scale components, unlike most existing methods. It fuses local and global features using a class discriminative approach. It does not need to solve optimization problems for varying lighting conditions, unlike DMI [15]. The proposed method gives promising results under severe illumination variations.

Performed detailed experiments and evaluated the state-of-the-art best CNN models. It has been observed that ResNet152 is the best model for extracting features for illumination invariant face recognition.

Evaluated the method thoroughly on three benchmark datasets, and its performance has been compared with the state-of-the-art methods. Our proposed deep learning approach outperforms the other methods.

The rest of this paper is organized as follows: In Section 2, we discuss the literature review. In Section 3, we describe the proposed method for illumination invariant face recognition. In Section 4, we discuss the dataset used for validation and evaluation protocol. In Section 5, we present the detailed experiments and show the results. Then, we discuss the obtained results and compare the performance of the proposed method with the recent works. In Section 6, we conclude our work.

2 Literature review

In this section, we discuss the existing state-of-the-art methods for Illumination Invariant Face Recognition. One such approach is the use of texture descriptors which have also been widely used for face recognition. Tan and Triggs [16] employed local ternary patterns (LTP), Lei et al. [17] proposed discriminant face descriptor, Ren et al. [18] used Gabor filters, Hu et al. [19] introduced diagonal symmetric pattern (DSP) to tackle the problem of light variations. These methods also exploit small-scale components and perform well when the lighting condition is not severe. Their performance degrades when the illumination variations become severe.

It is not only a small-scale component but also a large-scale component that contains important information for illumination invariant face recognition [15, 20]. Multilayer decomposition-based method – multi-layer illumination-robust feature extraction (DMI) – was proposed by Yu et al. [15] to exploit both small-scale and large-scale components for dealing with difficult light conditions. In this method, the face image is decomposed into multiple layers using the total variation model (TV-L₁), and then the illumination-robust representation is recovered as a weighted linear sum of the layers. The optimization problem is solved to optimize the weights. Though this method gives very good results, the procedure of decomposition and reconstruction can lose some important information, and further improvement is needed.

Kumar et al. [21] presented an assessment of the light source direction from an image. The proposed technique detects the forged images by using the inconsistencies in the light source direction. Their proposed technique works on images that have homogenous illuminated surfaces. CASIA1 dataset was used for evaluation. Kumar et al. [22] presented an image forgery detection method based on using multiple light sources for synthetic images. They estimated the source light vector using pixel patches from the image regions. They used images where more than one light sources were present and were able to detect the forged images by using the angle α, which was obtained from the source of light and surface normal.

Aggarwal et al. [23] proposed a convolutional neural network (CNN) model based on deep learning for the image surface texture analysis and their classification. They designed two sub-models and used customized parameters for the classification of surface textures. Rani et al. [24] presented an image forgery detection system that detected the copy-move and image splicing-based forgeries using a pixel-based forgery detection approach. The authors proposed an enhanced version of the speed-up robust feature (SURF) and template matching for the detection of forged images. CASIA is used for the evaluation of the approach, and overall accuracy of 97.5% is obtained. Aggarwal et al. [25] presented an analysis by using three techniques; HMM’s, ANN, and PCA using eigenfaces. The authors present their findings using these methodologies on the ORL and Yale Face datasets and find that ANN performs the best among other techniques.

Kumar et al. [26] also presented a lighting estimation technique for image forensics where they used a physics-based approach. Their proposed approach is robust as it works for all types of images. Manipulated objects are identified by the lighting parameters, and the incidence angle is returned. Aggarwal et al. [27] discussed the use of segmentation and ROI for the authentication of license plates of cars. They presented a three-phase system for the recognition of the license plate numbers and reported a detection rate of 93.34% and a false positive rate of 6.65%.

Nowadays, it has been shown that deep learning extracts the best features for face recognition [28 –31]. However, these features do not exhibit satisfactory performance when the illumination variations are severe [32]. The development of an illumination robust face recognition method needs a huge volume of labeled faces images captured under a variety of severe lighting conditions, but it is not practical.

The overview of the state-of-the-art methods given above indicates that most of the existing methods do not give good results, especially in poor lighting conditions [32]. Thus there is a need for the development of a robust model for illumination robust face recognition.

3 Proposed method

This section introduces the proposed method for illumination invariant face recognition. First, the overview and motivation for the proposed solution are presented. Then, we give the detail of how features are extracted and how recognition is performed.

It is commonly assumed that illumination variations are embedded in large-scale components (i.e., low-frequency components) of an image [33]. In view of this, most of the techniques [8 –12] for illumination invariant face recognition decompose a face image into large-scale and small-scale (high-frequency) components using different techniques. The large-scale component is discarded, and features are extracted from the small-scale component. Recently, it has been shown that large-scale component also contains discriminative information and cannot be discarded [15]. On the other hand, small-scale component contains detrimental features [13], such as noise which must be removed. Motivated by this observation, we decompose a face image into a hierarchy of multi-scale and multi-directional sub-bands, discard the highest and the lowest scales to eliminate the effects of illumination and noise and for illumination invariant face recognition, extract discriminative features from the image reconstructed from the remaining large-scale and small-scale components.

Based on this idea, the designs of the training and testing phases of the proposed method are shown in Figs. 2 and 3; the pseudo-codes of the training and testing phases are given in Algorithm-1 and Algorithm-2. First, an image is transformed to a log domain using log transform. Then it is processed using ContourletDS [14], an edge-preserving version of contourlet transform, to remove the illimitation effect and noise and retain the large-scale and small components. After that, we extract discriminative features using a convolutional neural network (CNN) and project them on a low dimensional space using linear discriminant analysis. Finally, the classification is performed using ridge regression. The detail and motivation of each step are given in the following subsections.

Fig. 2

Face Recognition System (training phase).

Fig. 3

Face Recognition System (testing phase).

Algorithm 1: Training phase

Input: (1) Training Data (Face Images): I_1,I_2,…, I_n

(2) labels Y = [y₁, y₂, …, y_n] (3) Decomposition

levels nd (4) number of classes nC (5) Regularization

parameter λ

Output: LDA models [M_PCA, M_LDA] and RR model W

Processing:

1. Fori = 1, 2, . . . , n

2. lI_i = log(I_i + 1)

3. CI_i = ContourletSD (lI_i, nd)

4. DCI_i = Decimate (CI_i)

5. RI_i = ContourletSDRec (DCI_i)

6. EndFor

7. Fori = 1, 2, . . . , n

[x_{i}^{G}, x_{i}^{L}] = CNN ({RI}_{i})

9. X^G (i)=

x_{i}^{G}

10. X^L (i)=

x_{i}^{L}

11. EndFor

12.

\overset{=}{X} Cocatenate (X^{G}, X^{L})

13.

M_{PCA} = ComputePCAMatrix (\overset{,}{X} k)

14.

X^{PCA} = ProjectUsingPCA (\overset{,}{X} M_{PCA})

15. M_LDA = ComputeLDAMatrix (X^PCA, Y, nC)

16. X = ProjectUsingLDA (X^PCA, M_LDA)

17. W = (XX^T + λI)^-1XY^T

Algorithm 2: Testing phase

Input: (1) Probe Image I, and the label matrix Y

LDA models [M_PCA, M_LDA] and RR model W

Output: Predicted label

\hat{y}

of I

Processing:

1. lI = log(I + 1)

2. CI = ContourletSD (lI, nd)

3. DCI = Decimate (CI)

4. RI = ContourletSDRec (DCI)

5. [x^G, x^l] = CNN (RI)

\overset{=}{x} Cocatenate (x^{G}, x^{L})

x^{PCA} = ProjectUsingPCA (\overset{,}{x} M_{PCA})

8. x = ProjectUsingLDA (x^PCA, M_LDA)

9. d = Y^TW^Tx

10.

\hat{y} = max_{i} {d_{i}}_{i = 1}^{n}

3.1 Logarithm transform

The logarithmic domain is more effective in dealing with illumination variations [11, 12]. Because of this, the first step is to transform the image into a logarithmic domain by applying log transform (step 2 in algorithm 1 and step 1 in algorithm 2) shown in Eq 1, which turns the multiplication in the reflectance model into addition, shown in Eq 2 [20]: $I (x, y) = R (x, y) L (x, y)$ (1) $logI (x, y) = logR (x, y) + logL (x, y) = u + v .$ (2)

Log transform preserves the structures, which means that u has the same characteristics of reflectance R, and v has the same characteristics as luminance L. Another advantage of using log transform is that it reduces the effect of illumination [20].

3.2 Decomposition using ContourletDS

To keep the large-scale and small-scale components and discard the illumination effect and noise, the image must be decomposed into multi-scale and multi-directional sub-bands so that the illumination effect (the highest scale sub-band, i.e., the lowest frequency sub-band) and noise (the lowest scale sub-band, i.e., the highest frequency sub-band) can be eliminated. There are many possible options to accomplish this task. However, we employ ContourletDS [14] (step 3 in algorithms 1 and step 2 in algorithm 2), a variant of contourlet transform CT), due to the reasons that (i) it decomposes an image into a fine-grained hierarchy of multi-dimensional and multi-scale sub-bands, which allow to remove the illumination effect and noise without losing the discrimination information embedded in large-scale and small components, (ii) it preserves edges by employing basis images in the frequency domain and (iii) significantly outperforms CT in image denoising [34]. As edges play a key role in discrimination and noise deteriorates the discrimination, ContourletDS is a suitable choice for decomposing a face image. It allows to keep edges and other discriminative information and remove detrimental features such as noise.

Though CT has already been used for illumination invariant face recognition [35, 36], it has been employed for a different purpose, i.e., for feature extraction. On the other hand, we do not use CT, but ContourletDS, and our purpose is not to extract features using ContourletDS.

3.3 Decimation and reconstruction

Using contourletDS, an image is decomposed into a fine-grained hierarchy of nd decomposition levels, lower levels represent small-scale components, and higher levels encode large-scale components. The lowest-frequency sub-band is at the highest level among the higher levels and embeds illumination variations. To remove the illumination component, the lowest frequency sub-band is eliminated by setting its coefficients to zero. On the other hand, the highest frequency sub-bands are at the lowest scale among lower levels and mainly encode noise. To get rid of detrimental noisy features, the highest frequency sub-bands are eliminated by setting their coefficients to zero. Step 4 in algorithm 1 and step 3 in algorithm 2 accomplish this task. After this, the image is reconstructed, which now contains the discriminative information of large- and small-scale components and is devoid of detrimental illumination and noisy features. It is obtained with step 5 in algorithm 1 and step 4 of algorithm 2. In the case of RGB, images, decomposition and reconstruction are applied on each of the R, G, andB planes.

3.4 CNN model

The next important step is to extract discriminative features from the reconstructed image. Motivated by the outstanding performance of deep learning and in particular, convolutional neural networks (CNN) for various computer vision problems, we employ CNN for feature extraction. It has been shown that CNN does not exhibit satisfactory performance when the illumination variations are severe [32] because for developing an illumination invariant face recognition system, the available data is limited, and it is not practical to acquire training data under all possible illumination variation. However, it can be employed as a feature extractor after removing the illumination effects from an image; we use a pre-trained CNN model as a featureextractor.

After preprocessing the image using ContourletDS, it is passed to a CNN model to extract features (step 8 in algorithm 1 and step 5 in algorithm 2). Different CNN models exist in the literature. We examine three widely used CNN pre-trained models, which are VGG-16 [37] (winner of ILSVRC’s localization task in 2014), GoogLeNet [38] (winner of the classification task in ILSVRC 2014), and ResNet-152 [39] (winner of ILSVRC 2015). We extract local features (X^L) from the last convolution (CONV) layers, which preserve the spatial arrangement of features and global features (X^G) from the fully connected (FC) layers, which encode the global information of the input image. We employed pre-trained CNN models available at [40]; these models are pre-trained on challenging ImageNet database.

3.5 Fusion using LDA model

Two issues must be addressed to employ the extracted global (X^G) and local (X^L) features (i) their dimensions are very high, and (ii) the features must be fused. We address these issues using linear discriminant analysis (LDA); it enhances the discriminability of features by projecting the high dimensional features to lower dimensional space by maximizing the inter-class separation and minimizing the intra-class variation [41]. Two approaches can be used to achieve this goal. The first approach is to apply LDA separately on global (X^G) and local (X^L) features, reduce their dimensions and fuse them. The second is to concatenate global and local features [X^G, X^L], then apply LDA to reduce the dimension (steps 12–16 in algorithm 1 and steps 6–8 in algorithm 2). We found that the second approach is more effective; see the results. In each case, the transformation matrix M_LDA of LDA is learned using training data. The dimensions of features X^GandX^L are very large. LDA faces the problem of singular within-class scatter matrices when the dimension of the feature space is excessively large as compared to the number of training images. To overcome this problem, first, we apply principal component analysis (PCA) to reduce the dimension of features [42] and the number of principal components (PCs) that preserve the maximum energy is selected for projection.

LDA has already been used to extract class-discriminative features from face images for face recognition [43]. However, we use it to fuse and project deep features.

3.6 Classification using ridge regression

The final step is to determine the identity of a subject using the extracted features; we employ ridge regression (RR) for this purpose. RR is very simple, fast, and has been shown to give good performance [44, 45]. Let X ∈ R^d×n, the matrix of n training d-dimensional vectors extracted from the training face images and Y ∈ { 0, 1 }^c×n, the matrix of the corresponding labels in on-hot-encoding i.e., in the ith column of Y, Y_ij = 1 if the ith column of X belongs to the jth subject and 0 otherwise. Ridge regression uses a matrix W ∈ R^d×c to model the linear dependencies between X and Y, and calculates it by solving the optimization problem with the following objective function shown in Eq 3: $C (W) = \sum_{i = 1}^{n} ∥ y_{i} - W^{T} x_{i} ∥^{2} + λ ∥ W ∥^{2}$ (3)

where λ is the regularization parameter that is used to balance the bias and variance of the solution. The solution to this problem is shown in Eq 4: $W = {({XX}^{T} + λ I)}^{- 1} {XY}^{T} .$ (4)

In order to learn the classifier parameters W for all c classes, we only need to invert a d×d matrix once. The matrix of parameters is learned using training data, and the regularization parameter ρ is calculated using cross-validation. Step 17 of algorithm 1 computes W using Eq. 4. After learning the model, the label of an unknown face image x (i.e., the d-dimensional feature vector of the image) is predicted as follows in Eq 5: $\hat{y} = max_{i} {d_{i}}_{i = 1}^{n}, d = Y^{T} W^{T} x,$ (5)

$\hat{y}$ is the label of the ith class for which d_i has the maximum value. Step 10 of algorithm 2 calculates the predicted label using Eq. 5.

4 Evaluation protocol

In this section, first, we discuss the databases that were used for evaluation and then we present the detail of how the performance of the method is evaluated and which measures are employed.

4.1 Datasets

The experiments were conducted on three benchmark face databases: Extended Yale B (EYB) [46], AR [47], and M-PIE [48]. The characteristics of each database are shown in Table 1. The Extended Yale B database is divided into 5 subsets based on the lighting conditions. Among these subsets, subset 1 is the set with the slight lighting variation effects, while subset 5 contains the images with large variations. Figure 4 shows some sample images from the three databases.

Table 1
AR, Extended Yale B, and M-PIE databases characteristics

Database Subjects Images illumination Conditions

Extended Yale B 38 2414 64

AR 100 800 8

M-PIE 250 2294 15

Database	Subjects	Images	illumination Conditions
Extended Yale B	38	2414	64
AR	100	800	8
M-PIE	250	2294	15

Fig. 4

Sample images from: a) Extended Yale B b) AR database c) M-PIE database.

4.2 Evaluation method and performance measures

To evaluate the proposed method, we used the evaluation protocols of the datasets. The recognition performance of the method is measured in terms of classification accuracy. It is computed as the fraction of correctly classified samples [49]. Given the test set X ={ (x₁, y₁) , …, (x_N, y_N) }, that contains N pairs of test samples where x_i represents the face image and y_i is its label, the classification accuracy is computed in Eq 6: $Accuracy = \frac{1}{N} \sum_{i = 1}^{N} x . {\begin{matrix} x = 1, y_{i} = = {\hat{y}}_{i} x > 0 \\ x = 0, otherwise y > 0 \end{matrix}$ (6)

where y_i and ${\hat{y}}_{i}$ are the true and predicated labels of the i^th sample x_i in X.

5 Results and discussion

In this section, we present and discuss the results of the experiments conducted on the three datasets to evaluate the proposed method.

5.1 Experimental results on Extended Yale B database

For experiments on the Extended Yale B database (EYB), subset1 is used as a gallery (i.e., for training), and the other subsets are used for testing. The subsets 1–5 contain 263, 454, 521, 456, and 714 images, respectively, of 38 subjects. On this database, four main experiments were conducted. Experiment 1 is performed to test the effect of CT decomposition levels. Experiment 2 is performed to test the effect of the CNN model. Experiment 3 is performed to test the effect of the number of principal components of PCA and the fusion approach of LDA. Finally, experiment 4 is performed to test the effect of the regularization parameter λ of the ridge regression.

5.1.1 Experiment 1

In order to demonstrate the effect of CT decomposition levels, the other parameters are fixed. The POOL5 (#31 and FC-6 (#34) layers of VGG-16 are used to extract local and global features (X^G, X^L), respectively. The second approach for fusing features using LDA is applied as mentioned in the system design, and the number of PCs is fixed at 100. The regularization parameter λ of the ridge regression is set equal to 20.

The results of this experiment on EYB datasets are shown in Fig. 5. It is noted that using larger number of decomposition levels achieves better accuracy. The maximum accuracy on each subset is achieved by using 5 levels of decomposition.

Fig. 5

Effect of decomposition levels on EYB database.

5.1.2 Experiment 2

This experiment is performed to select the most suitable CNN model. We considered three widely adopted CNN models: VGG-16, GoogLeNet, and ResNet-152. In each model, different combinations of layers are used to extract global and local features. In this experiment, the number of CT decomposition levels is fixed at 5, the number of PCs is fixed at 100 and λ = 20. The fusion of LDA is done using the second approach.

Results of this experiment on the EYB database are shown in Table 2. From the table, we see that the highest accuracy for each subset is achieved using the ResNet-152 model for feature extraction. The FC layer fc1000 is used to extract global features, and the CONV layer res3b7 is used to extract local features. It is important to note that low-level layers (24, Icp3_out, res3b7) of all models give better recognition accuracies than high-level layers (31, icp9_out, res5b). The reason is that the low-level layers encode the low-level features which are not specific to any particular types of images.

Table 2
Effect of pre-trained CNN models on EYB database

Layer for local features Layer for global features Accuracy

Subset2 Subset3 Subset4 Subset5

VGG-16 31 34 100 91.93 78.5 71.56

24 34 100 98.27 94.51 90.61

GoogLeNet icp9_out cls3_fc 100 90.21 75.65 67.78

Icp3_out cls3_fc 100 99.23 94.29 90.89

ResNet-152 res5b fc1000 100 94.43 82.67 79.69

res3b7 fc1000 100 99.8 99.78 98.73

	Layer for local features	Layer for global features	Accuracy
VGG-16	31	34	100	91.93	78.5	71.56
	24	34	100	98.27	94.51	90.61
GoogLeNet	icp9_out	cls3_fc	100	90.21	75.65	67.78
	Icp3_out	cls3_fc	100	99.23	94.29	90.89
ResNet-152	res5b	fc1000	100	94.43	82.67	79.69
	res3b7	fc1000	100	99.8	99.78	98.73

5.1.3 Experiment 3

In this experiment, the effects of two factors are explored: (i) the approach for the fusion of the local and global features using LDA, (ii) the choice of the number of PCs in each fusion approach. In this experiment, we used the best number of decomposition levels, which is 5 levels. The CNN model used in this experiment is ResNet-152 with the layers fc1000 and res3b7. Also, the λ parameter is set equal to 20.

The setup and results of this experiment on EYB are shown in Table 3. In this experiment, the best accuracy for subset2 has reached 100% using all the tested #PCs. For subset3, the maximum reached accuracy is 99.8%, which is achieved using 100, 150, and 200 #PCs for both approaches. Subset4 also reached 100% accuracy using 100, 150 #PCs for the first approach and 150, 200 #PCs for the second approach. On subset5 the maximum accuracy which is 99.01% is achieved only using the second approach with 150 PCs. From that, we can see that the best choice is the second approach with 150 PCs.

Table 3
Effect of features fusion approach and the number of PCs on EYB

First approach Second approach

#PCs 50 100 150 200 50 100 150 200

Subset2 100 100 100 100 100 100 100 100

Subset3 99.42 99.8 99.8 99.8 99.61 99.8 99.8 99.8

Subset4 98.46 100 100 99.78 97.8 99.78 100 100

Subset5 97.19 98.45 98.31 98.45 97.75 98.73 99.01 98.73

	First approach	Second approach
Subset2	100	100	100	100	100	100	100	100
Subset3	99.42	99.8	99.8	99.8	99.61	99.8	99.8	99.8
Subset4	98.46	100	100	99.78	97.8	99.78	100	100
Subset5	97.19	98.45	98.31	98.45	97.75	98.73	99.01	98.73

5.1.4 Experiment 4

This experiment is performed to find out the best value of the parameter λ. It is performed using the best choice of decomposition levels, which is 5 on EYB, layers fc1000, and res3b7 of ResNet-152 for feature extraction and 150 PCs. The results are shown in Table 4, which indicate that the best value for λ is 20 because it achieves the best accuracy for all the subsets of the EYB database.

Table 4
Effect of p parameter on EYB

λ 1 20 50 100 120

Accuracy subset2 100 100 100 100 100

subset3 99.8 99.8 99.8 99.8 99.8

subset4 99.78 100 100 100 99.56

subset5 98.87 99.01 98.73 98.31 98.31

λ	1	20	50	100	120
Accuracy	subset2	100	100	100	100	100
	subset3	99.8	99.8	99.8	99.8	99.8
	subset4	99.78	100	100	100	99.56
	subset5	98.87	99.01	98.73	98.31	98.31

From the results of the experiments performed on EYB, the best accuracies achieved for subset2, subset3, subset4, and subset5 are 100%, 99.8%, 100%, and 99.01%, respectively.

The accuracies are achieved using five decomposition levels of CT, ResNet-152 for feature extraction, the second approach to fuse the features employing LDA with 150 PCs, and λ equal to 20. In ResNet-152, FC layer fc1000 extracts global features and CONV layer res3b7 extracts local features. The overall accuracy of the method on EYB is 99.07%. The best accuracies are achieved due to three reasons: (i) the method extracts the discriminative information embedded in small scale and large scale components, (ii) the hierarchical features are extracted using deep learning, (iii) deep local and global features are fused using LDA, a class discriminative approach.

5.2 Experimental results of AR database

For each subject of the AR database, 3 images are used for training, and 5 images are used for testing. Three experiments were conducted. In Experiment 1, the effect of decomposition levels on the AR database is examined. The second experiment is conducted to demonstrate the impact of a CNN model and its layers for feature extraction. The focus of the third experiment is to test the effect of the LDA fusion method and the number of PCs, along with the effect of the λ parameter of ridge regression.

5.2.1 Experiment 1

This experiment is set up the same way as experiment 1 on EYB. Different levels of decomposition are tested. The CNN model used in this experiment is VGG-16 (layers 34 and 31). The #PCs is fixed at 100, and λ = 20. Figure 6 presents the results of Experiment 1 on the AR database. It shows that decomposing images using 5 levels achieves the best accuracy.

Fig. 6

Effect of decomposition levels on AR database.

5.2.2 Experiment 2

In this experiment, we test the effect of VGG-16, GoogLeNet, and ResNet-152 models. To do that, decomposition with 5 levels is used. The fusion approach used is the second approach for all the tested models. The #PCs is fixed at 100 and λ = 20. The results of this experiment are shown in Table 5.

Table 5
Effect of CNN pre-trained models on AR database

Local Global Accuracy

VGG-16 31 34 83.4

24 34 92.4

GoogLeNet icp9_out cls3_fc 83.6

Icp3_out cls3_fc 92.6

ResNet-152 res5b fc1000 87.6

res3b7 fc1000 96

	Local	Global	Accuracy
VGG-16	31	34	83.4
	24	34	92.4
GoogLeNet	icp9_out	cls3_fc	83.6
	Icp3_out	cls3_fc	92.6
ResNet-152	res5b	fc1000	87.6
	res3b7	fc1000	96

From Table 5, we see that using the ResNet-152 model for features extraction achieves the best accuracy by extracting global features from layer fc1000 and local features from layer res3b7. It is in agreement for experiment 2 on the EYB database.

5.2.3 Experiment 3

Experiment 3 was performed on AR to demonstrate the effect of the fusion approach of LDA, number of PCs, and λ parameter. The parameters chosen for this experiment are the parameters that result in the best accuracies in experiments 1 and 2, which are 5 decomposition levels, ResNet-152 with fc1000 and res3b7 layers. The accuracy results obtained from this experiment are shown in Table 6. The second approach of fusion achieves the maximum accuracy (100%) with 200 PCs, and the value of λ does not affect the accuracy with this number of PCs, whereas the effect of λ value is different on other #PCs. Averaging the accuracies achieved using the same λ value for all #PCs on both approaches, we can see that p = 1 achieves the maximum average.

Table 6
Effect of features fusion approach and number of PCs on the accuracy of the method on AR database

First Approach Second Approach

#PCs 50 100 150 200 50 100 150 200 AVG

λ = 1 95.4 98.6 99.4 95.2 92.6 99.4 100 100 97.58

λ = 20 95 98.8 99.2 96.6 87.4 96 98.8 100 96.48

λ = 50 92.8 98.2 98.8 97.2 82.2 95 96.8 100 95.13

λ = 100 90.2 96.6 98 97.8 77.4 91.8 94.4 100 93.28

λ = 120 88.8 96 97.4 97.6 76.4 90.6 93.6 100 92.55

First Approach	Second Approach
λ = 1	95.4	98.6	99.4	95.2	92.6	99.4	100	100	97.58
λ = 20	95	98.8	99.2	96.6	87.4	96	98.8	100	96.48
λ = 50	92.8	98.2	98.8	97.2	82.2	95	96.8	100	95.13
λ = 100	90.2	96.6	98	97.8	77.4	91.8	94.4	100	93.28
λ = 120	88.8	96	97.4	97.6	76.4	90.6	93.6	100	92.55

From the above experimental results on the AR database, the maximum reached accuracy is 100%. This accuracy is a result of using 5 decomposition levels, fc1000 and res3b7 layers from the ResNet-152 model for feature extraction, second fusion approach with 200 PCs and λ = 1.

5.3 Experimental Results on M-PIE database

For each subject of the M-PIE database, 4 images are used for training, and 11 images are used for testing. Three experiments were conducted like the experiments on the AR database. Experiment 1 is to test the effect of decomposition levels on the M-PIE database. Experiment 2 demonstrates the impact of the CNN model and the layers for feature extraction. Experiment 3 is performed to test the effect of the LDA fusion method and the number of PCs, along with the effect of the λ parameter of ridge regression.

5.3.1 Experiment 1

In this experiment, in order to test the effect of decomposition levels, the VGG-16 model is used to extract global features using layer 34 and local features using layer 31. Fusion of these features is done using the second approach. The number of PCs is fixed at 250 and λ = 20. The results of this experiment are shown in Fig. 7. The best accuracy is achieved using 4 levels, while 2 levels give the worst accuracy.

Fig. 7

Effect of decomposition levels on M-PIE database.

5.3.2 Experiment 2

This experiment demonstrates the effect of using VGG-16, GoogLeNet, and ResNet-152 CNN models. For each tested model, 4 levels of decomposition are used. The second LDA fusion approach is used. The number of PCs is fixed at 250 and λ = 20. The setup and results are shown in Table 7.

Table 7
Effect of CNN pre-trained models on M-PIE database

Local Global Accuracy

VGG-16 31 34 84.68

24 34 89.69

GoogLeNet icp9_out cls3_fc 84.54

Icp3_out cls3_fc 90.4

ResNet-152 res5b fc1000 87.08

res3b7 fc1000 93.43

	Local	Global	Accuracy
VGG-16	31	34	84.68
	24	34	89.69
GoogLeNet	icp9_out	cls3_fc	84.54
	Icp3_out	cls3_fc	90.4
ResNet-152	res5b	fc1000	87.08
	res3b7	fc1000	93.43

5.3.3 Experiment 3

In this experiment, different numbers of PCs are tested for both LDA fusion approaches, along with different values of λ. The decomposition with 4 levels is used. Features are extracted using ResNet-152 fc1000 and res3b7 layers. The accuracies of this experiment are shown in Table 8. The best accuracy is achieved using the second fusion approach with 600 PCs and λ = 20. It is noticeable that the best accuracy for each #PCs is achieved using different values of λ. Therefore, we show the average accuracies achieved using each value λ. The maximum average is using λ = 20.

Table 8
Effect of features fusion approaches and number of PCs on M-PIE database

First approach Second approach

#PCs 400 450 500 550 600 650 400 450 500 550 600 650 AVG

λ = 1 94.7 94.7 94.28 94.21 91.6 85.39 95.97 96.18 96.47 96.25 96.4 96.25 94.37

λ = 20 95.62 95.48 95.62 95.97 93.78 88.77 95.34 95.62 95.69 95.97 96.54 96.25 95.05

λ = 50 95.27 95.41 95.13 94.7 93.93 90.96 94.56 95.27 95.41 95.55 96.25 96.25 94.89

λ = 100 94.77 94.91 94.84 94.49 94 92.59 94.28 95.2 95.2 95.34 95.55 96.25 94.79

λ = 120 94.7 94.84 94.84 94.49 93.57 93.22 94.14 94.84 95.2 94.98 95.41 96.25 94.71

First approach	Second approach
λ = 1	94.7	94.7	94.28	94.21	91.6	85.39	95.97	96.18	96.47	96.25	96.4	96.25	94.37
λ = 20	95.62	95.48	95.62	95.97	93.78	88.77	95.34	95.62	95.69	95.97	96.54	96.25	95.05
λ = 50	95.27	95.41	95.13	94.7	93.93	90.96	94.56	95.27	95.41	95.55	96.25	96.25	94.89
λ = 100	94.77	94.91	94.84	94.49	94	92.59	94.28	95.2	95.2	95.34	95.55	96.25	94.79
λ = 120	94.7	94.84	94.84	94.49	93.57	93.22	94.14	94.84	95.2	94.98	95.41	96.25	94.71

From experimental results on the M-PIE database, the best-achieved accuracy is 96.54%. This accuracy is a result of using 4 levels of decomposition, and then extracting global and local features using ResNet-152 fc1000 and res3b7 layers respectively. The fusion of these extracted features is done using the second approach with 600 PCs and λ = 20.

5.4 Discussion and comparison

To assess the performance of the proposed face recognition method, we conducted different experiments on Extended Yale B (EYB), AR, and M-PIE databases. Different factors affect the performance of the method: the number of decomposition levels of CT, the choice CNN model, the fusion approach, the number of PCs, and the value of the regularization parameter of the RR method. The experiments revealed that 5 levels of decomposition are suitable for EYB and AR, while this number is 4 for the M-PIE database. This difference in the number of decomposition levels is probably due to the reason that the first two databases consist of grayscale images while M-PIE contains RGB images. The results show that larger levels of decomposition produce a fine-grained hierarchy of multi-scale and multi-directional features, which help us to keep the discriminative information at large and small scales. Using two or three decomposition levels implies that the large-scale (low-frequency) sub-band still contains distinctive features that cannot be discarded. As we expected, ContourletDS allows us to separate the multi-directional features of an image at different scales and remove the features which are due to the illumination variation and noise and preserve the large-scale and small-scale features, which are important for discrimination. This is validated by the results on the three databases.

Figure 8 gives a comprehensive view of the performance of different CNN models and their layers on the three datasets. The CNN model which yields the discriminative features from all databases is ResNet152, and the features from CONV layer res3b7 and FC layer fc1000 are the most discriminative. As the CNN models are trained on imageNet database, and the low-level layers learn the features which are independent of the domain of the database, the higher level CONV layer res5b does not give a result in the discriminative features but the lower level CONV layer res3b7.

Fig. 8

Effect of CNN different models on EYB, AR and M-PIE databases.

The experiments of all the three databases revealed that the second approach for the fusion of local and global features is more effective, but the number of PCs is different for different datasets; for EYB and AR, it is 150 and 200, respectively, but for M-PIE it is 600. For the first two datasets, the difference is not big, but for M-PIE, the difference is big. This difference can be attributed to the format of images. Also, the experiments have shown that the best value of the regularization parameter λ of RR is 20 for the three datasets. To establish the usefulness of the proposed method, we compare it with state-of-the-art methods. For a fair comparison, we use the EYB database due to two reasons: (i) it involves face images with severe illumination variations and validates the performance of an algorithm to tackle severe lighting conditions, (ii) in this database, the protocol is fixed, whereas in other databases there can be the differences in the choice of gallery and probe samples. The comparison results are shown in Table 9. We have taken the results, which are reported in the original papers, to avoid any bias due to parameter tuning.

Table 9

Comparison between some related work and proposed method on EYB database w.r.t Accuracy (%)

Ref.	Extended Yale B Database
	Subset1	Subset2	Subset3	Subset4	Subset5
Xie et al. [20], 2011	–	100	86	85.3	84.8
Lai et al. [33], 2014	–	89.29	88.07	88.39	91.69
Ajaya et al. [50], 2014	–	99.46	98.3	98.7	93.23
Lai et al. [12], 2015	81.3	53.35	81.45
Singh et al. [51], 2016	–	90
Tu et al. [52], 2017	–	100	90.7	80.5	75.6
Yu et al. [15], 2017	–	100	99.12	96.99	97.74
Hu et al. [10], 2017	94.92	83.02	97.97
Hu et al. [11], 2019	96.31	78.53	97.30
Deng et al. [31], 2019	86.31	47.14	27.67
Changhui et al. [19], 2020	90.58	90.79	97.74
Proposed method	–	100	99.8	100	99.01

It is noted that the proposed method outperforms the other state-of-the-art methods on all of the EYB subsets. The reason is that it employs contourletDS to decompose an image into a fine-grained hierarchy of multi-dimensional and multi-scale sub-bands, which allow getting rid of illumination variations and noise without losing the discriminative information, and CNN for discriminative feature extraction.

The fair comparison of the computational complexity of the state-of-the-art methods is possible if they are implemented and run on the same platform. As the implementation of the state-of-the-art methods is not available, it is not possible to compare with them. However, we give an idea of the computational complexity of our method. The main steps of the algorithm are (i) extracting small-scale and large scale components, which takes 0.167284 seconds; (ii) computation of local and global features, which costs 6.193673 seconds with ResNet, (iii) fusion of local and global features, it takes 0.0370 seconds and (iv) finally matching, which consumes 0.007468 seconds. Thus the total cost for a single query is 6.405425 seconds on an ordinary system equipped with an I7 processor. This computational cost with enhanced accuracy is acceptable for real-time applications. The bottleneck is the computation of the local and global features using CNN models. There are two possible avenues to explore to overcome this issue. First, we make two separate forward possesses, which take twice the time. An intelligent implementation can overcome this issue which allows computing local and global features in one pass. Second, to explore the CNN models with less computational cost.

6 Conclusion

We have proposed an illumination invariant face recognition system using ContourletDS and CNN models. Based on the Lambertian reflectance model, a face image can be decomposed into large-scale and small-scale components. The small-scale component contains the illumination invariant features, while the illumination variations affect the large-scale features. To retain the discriminative information from large-scale and small-scale components, and discard the illumination effect from large-scale components and detrimental noise from small components, the system employs ContourletDS. Then, the discriminative features are extracted from the preprocessed image using a CNN model. Global and local features are extracted using pre-trained models (VGG-16, GoogLeNet, and ResNet-152). After that LDA is used to fuse these features and reduce the dimension using two different approaches. For classification, ridge regression is used. The larger decomposition levels of ContourletDS help in a better way to separate and eliminate illumination variation effects and detrimental noise and lead to better results. The ResNet-152 model with layer CONV layer res3b7 and FC layer fc1000 beats the VGG-16 and GoogLeNet models. The proposed method achieved 99.7%, 100%, and 96.54% accuracies on EYB, AR, and M-PIE databases, respectively, and it outperforms the state-of-the-art methods. The proposed method uses ContourletDS to eliminate the illumination variation and noise. The alternative is to use generative adversarial learning to remove the illumination effects; it is our future work.

Footnotes

Acknowledgments

The research was supported under Researchers Supporting Project number (RSP-2019/109) King Saud University, Riyadh, Saudi Arabia.

References

S.Z.

, Jain

A.K.

, Handbook of Face Recognition, New York: Springer, 2005.

Adini

, Moses

, Ullman

, Face recognition: the problem of compensating for changes in illumination direction, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7) (1997), 721–732.

Turk

, Pentland

, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3(1) (1991), 71–86.

Shan

S.G.

, Gao

, Cao

, Zhao

D.B.

, Illumination normalization for robust face recognition against varying lighting conditions, in IEEE International Workshop on Analysis and Modeling of Faces and Gestures, 2003, pp. 157–164.

, Ward

R.K.

, Adaptive region-based image enhancement method for robust face recognition under variable illumination conditions, IEEE Transactions on Circuits and Systems for Video Technology 20(9) (2010), 1165–1175.

Savvides

, Kumar

B.V.

, Illumination normalization using logarithm transforms for face authentication, in International Conference on Audio- and Video-Based Biometric Person Authentication, 2003, pp. 549–556.

Zhang

T.P.

, Tang

Y.Y.

, Fang

, Shang

Z.W.

, Liu

X.Y.

, Face recognition under varying illumination using gradientfaces, IEEE Transactions on Image Processing 18(11) (2009), 2599–2606.

Wang

, Li

W.F.

, Yang

W.M.

, Liao

Q.M.

, Illumination normalization based on weber’s law with application to face recognition, IEEE Signal Processing Letters 18(8) (2011), 462–465.

Kim

, Suh

, Hwang

, Han

, Svd face: Illumination-invariant face representation, IEEE Signal Processing Letters 21(11) (2014), 1336–1340.

10.

, Lu

, Ye

, Zeng

, Singular value decomposition and local near neighbors for face recognition under varying illumination, Pattern Recognition 64 (2017), 60–83.

11.

C.-H.

, Zhang

, Wu

, Lu

X.-B.

, Liu

, Jing

X.-Y.

, Toward driver face recognition in the intelligent traffic monitoring systems, IEEE Trans Intell Transp Syst, early access, Oct. 14, 2019, doi: 10.1109/TITS.2019.2945923.

12.

Lai

Z.-R.

, Dai

D.-Q.

, Ren

C.-X.

, Huang

K.-K.

, Multi-scale logarithm difference edgemaps for face recognition against varying lighting conditions, IEEE Trans Image Process 24(6) (2015), 1735–1747.

13.

Lai

Z.R.

, Dai

D.Q.

, Ren

C.X.

, Huang

K.K.

, Multi-layer surface albedo for face recognition with reference images in bad lighting conditions, IEEE Transactions on Image Processing 23(11) (2014), 4709–4723.

14.

, Do

M.N.

, A New Contourlet Transform with Sharp Frequency Localization, in 2006 International Conference on Image Processing, Atlanta, GA, USA, 2006.

15.

Y.-F.

, Dai

D.-Q.

, Ren

C.-X.

, Huang

K.-K.

, Discriminative multi-layer illumination-robust feature extraction for face recognition, Pattern Recognition 67 (2017), 201–212.

16.

Tan

X.Y.

, Triggs

, Enhanced local texture feature sets for face recognition under difficult lighting conditions, IEEE Transactions on Image Processing 564 19(6) (2010), 1635–1650.

17.

Lei

, Pietikainen

, Li

S.Z.

, Learning discriminant face descriptor, IEEE Transactions on Pattern Analysis and Machine Intelligence 36(2) (2014), 289–302.

18.

Ren

C.X.

, Dai

D.Q.

, Li

X.X.

and Lai

Z.R.

, Band-reweighted gabor kernel embedding for face image representation and recognition, IEEE Transactions on Image Processing 23(2) (2014), 725–740.

19.

, Wu

, Yu

, Jing

, Lu

, Liu

, Diagonal Symmetric Pattern-Based Illumination Invariant Measure for Severe Illumination Variation Face Recognition, IEEE Access 8 (2020), 63202–63213.

20.

Xie

, Zheng

W.-S.

, Lai

, Yuen

P.C.

, Suen

C.Y.

, Normalization of Face Illumination Based on Large-and Small-Scale Features, IEEE Transactions on Image Processing 20(7) (2011), 1807–1821.

21.

Kumar

, Srivastava

, Image authentication by assessing manipulations using illumination, Multimed Tools Appl 78 (2019), 12451–12463. https://doi.org/10.1007/s11042-018-6775-x

22.

Kumar

, Srivastava

, Uddin

, Forgery detection using multiple light sources for synthetic images, Australian Journal of Forensic Sciences 51(3) (2019), 243–250. DOI: 10.1080/00450618.2017.1356871.

23.

Aggarwal

, Kumar

, Image surface texture analysis and classification using deep learning, Multimed Tools Appl 80 (2021), 1289–1309. https://doi.org/10.1007/s11042-020-09520-2

24.

Rani

, Kumar

, Identification of copy-move and splicing based forgeries using advanced SURF and revised template matching, Multimedia Tools and Applications 80 (2021). 10.1007/s11042-021-10810-6

25.

Aggarwal

, Alshehri

, Kumar

, Sharma

, Alfarraj

, Deep

, Principal component analysis, hidden Markov model, and artificial neural network inspired techniques to recognize faces, Concurrency and Computation: Practice and Experience 33(9) (2021), e6157.

26.

Kumar

, Rani

, Srivastava

, Image forensics based on lighting estimation, International Journal of Image and Graphics 19(03) (2019), 1950014.

27.

Aggarwal

, Rani

, Kumar

, A robust method to authenticate car license plates using segmentation and ROI based approach, Smart and Sustainable Built Environment 9(4) (2020), 737–747. https://doi.org/10.1108/SASBE-07-2019-0083

28.

Parkhi

O.M.

, Vedaldi

, Zisserman

, Deep face recognition, in Proc. Brit. Mach. Vis. Conf., Sep. 2015, pp. 1–12.

29.

Schroff

, Kalenichenko

, Philbin

, FaceNet: A unified embedding for face recognition and clustering, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 815–823.

30.

Wang

, Wang

, Zhou

, Ji

, Gong

, Zhou

, Li

, Liu

, CosFace: Large margin cosine loss for deep face recognition, in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit, Jun. 2018, pp. 5265–5274.

31.

Deng

, Guo

, Xue

, Zafeiriou

, ArcFace: Additive angular margin loss for deep face recognition, in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 4690–4699.

32.

C.-H.

, Lu

X.-B.

, Liu

, Jing

X.-Y.

, Yue

, Single sample face recognition under varying illumination via QRCP decomposition, IEEE Trans Image Process 28(5) (2019), 2624–2638.

33.

Lai

Z.-R.

, Dai

D.-Q.

, Ren

C.-X.

, Huang

K.-K.

, Multilayer Surface Albedo for Face Recognition With Reference Images in Bad Lighting Conditions, IEEE Transactions on Image Processing 23(11) (2014).

34.

Bamberger

R.H.

, Smith

M.J.T.

, A filter bank for the directional decomposition of images: theory and design, IEEE Trans Signal Proc 40(4) (1992), 882–893.

35.

Mian

, Face Recognition Using Contourlet Transform and Multi-directional Illumination from a Computer Screen, in Advanced Concepts for Intelligent Vision Systems (ACIVS 2010), Lecture Notes in Computer Science, vol. 6475, pp. 332–344, Springer, Berlin, Heidelberg, 2010.

36.

Biswas

, Sil

, An efficient face recognition method using contourlet and curvelet transform, J King Saud University – Computer and Information Sciences 32 (2020), 718–729.

37.

Simonyan

, Zisserman

, Very Deep Convolutional Networks for Large-Scale Image Recognition, in International Conference on Learning Representations, 2015.

38.

Szegedy

, Liu

, Jia

, Sermanet

, Reed

, Anguelov

, Erhan

, Vanhoucke

, Rabinovich

, Going deeper with convolutions, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015.

39.

, Zhang

, Ren

, SunDeep

, Residual Learning for Image Recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016.

40.

https://www.vlfeat.org/matconvnet/pretrained/

41.

Togaçar

, Ergen

and Comert

, Application of breast cancer diagnosis based on a combination of convolutional neural networks, ridge regression and linear discriminant analysis using invasive breast cancer images processed with autoencoders, Medical Hypotheses 135 (2020), 109503.

42.

, Yuan

, Dimension reduction of image deep feature using PCA, Journal of Visual Communication and Image Representation 63 (2019), 102578.

43.

Belhumeur

P.N.

, Hespanha

J.P.

, Kriegman

D.J.

, Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7), july 1997.

44.

Gong

, Lazebnik

, Comparing data-dependent and data-independent embeddings for classification and ranking of Internet images, CVPR 2011, Providence, RI, pp. 2633–2640, (2011). doi: 10.1109/CVPR.2011.5995619

45.

Shen

, Shen

, Zhou

, Yang

, Shen

H.T.

, Face image classification by pooling raw features, Pattern Recognition 54 (2016), 94–103.

46.

Georghiades

, Belhumeur

, Kriegman

, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6) (2001), 643–660.

47.

Martinez

A.M.

, Benavente

, The AR Face Database, CVC Technical Report #24, June 1998.

48.

Sim

, Baker

, Bsat

, The CMU Pose, Illumination, and Expression (PIE) database, in Proceedings of Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002, Proceedings, DC, USA, 2002.

49.

Aghdam

H.H.

, Heravi

E.J.

, Guide to Convolutional Neural Networks, Springer International Publishing, 2017, pp, 106–108.

50.

Ajaya

H.S.

, Gowda

M.N.E.

, Manikantan

, Ramachandran

, Face recognition by optimal band selection in contourlet transform domain, in 2014 9th International Conference on Industrial and Information Systems (ICIIS), Gwalior, India, 2014.

51.

Singh

C.R.

, Patil

H.Y.

, A shearlet transform based illumination invariant 2-D face recognition, in International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), Chennai, India, 2016.

52.

, Gao

, Xie

, Qi

, Ma

, Illumination normalization based on correction of large-scale components for face recognition, Neurocomputing 266 (2017), 29.

	First approach				Second approach
#PCs	50	100	150	200	50	100	150	200
Subset2	100	100	100	100	100	100	100	100
Subset3	99.42	99.8	99.8	99.8	99.61	99.8	99.8	99.8
Subset4	98.46	100	100	99.78	97.8	99.78	100	100
Subset5	97.19	98.45	98.31	98.45	97.75	98.73	99.01	98.73

First Approach					Second Approach
#PCs	50	100	150	200	50	100	150	200	AVG
λ = 1	95.4	98.6	99.4	95.2	92.6	99.4	100	100	97.58
λ = 20	95	98.8	99.2	96.6	87.4	96	98.8	100	96.48
λ = 50	92.8	98.2	98.8	97.2	82.2	95	96.8	100	95.13
λ = 100	90.2	96.6	98	97.8	77.4	91.8	94.4	100	93.28
λ = 120	88.8	96	97.4	97.6	76.4	90.6	93.6	100	92.55

Illumination invariant face recognition using contourlet transform and convolutional neural network

Abstract

Keywords

1 Introduction

3 Proposed method

3.3 Decimation and reconstruction

3.4 CNN model

3.5 Fusion using LDA model

3.6 Classification using ridge regression

4.1 Datasets

Table 1 AR, Extended Yale B, and M-PIE databases characteristics Database Subjects Images illumination Conditions Extended Yale B 38 2414 64 AR 100 800 8 M-PIE 250 2294 15

5.1 Experimental results on Extended Yale B database

5.1.1 Experiment 1

Table 4 Effect of p parameter on EYB λ 1 20 50 100 120 Accuracy subset2 100 100 100 100 100 subset3 99.8 99.8 99.8 99.8 99.8 subset4 99.78 100 100 100 99.56 subset5 98.87 99.01 98.73 98.31 98.31

5.2.1 Experiment 1

Table 5 Effect of CNN pre-trained models on AR database Local Global Accuracy VGG-16 31 34 83.4 24 34 92.4 GoogLeNet icp9_out cls3_fc 83.6 Icp3_out cls3_fc 92.6 ResNet-152 res5b fc1000 87.6 res3b7 fc1000 96

5.3.1 Experiment 1

Table 7 Effect of CNN pre-trained models on M-PIE database Local Global Accuracy VGG-16 31 34 84.68 24 34 89.69 GoogLeNet icp9_out cls3_fc 84.54 Icp3_out cls3_fc 90.4 ResNet-152 res5b fc1000 87.08 res3b7 fc1000 93.43

Footnotes

Acknowledgments

References

Table 1
AR, Extended Yale B, and M-PIE databases characteristics

Database Subjects Images illumination Conditions

Extended Yale B 38 2414 64

AR 100 800 8

M-PIE 250 2294 15

Table 4
Effect of p parameter on EYB

λ 1 20 50 100 120

Accuracy subset2 100 100 100 100 100

subset3 99.8 99.8 99.8 99.8 99.8

subset4 99.78 100 100 100 99.56

subset5 98.87 99.01 98.73 98.31 98.31

Table 5
Effect of CNN pre-trained models on AR database

Local Global Accuracy

VGG-16 31 34 83.4

24 34 92.4

GoogLeNet icp9_out cls3_fc 83.6

Icp3_out cls3_fc 92.6

ResNet-152 res5b fc1000 87.6

res3b7 fc1000 96

Table 7
Effect of CNN pre-trained models on M-PIE database

Local Global Accuracy

VGG-16 31 34 84.68

24 34 89.69

GoogLeNet icp9_out cls3_fc 84.54

Icp3_out cls3_fc 90.4

ResNet-152 res5b fc1000 87.08

res3b7 fc1000 93.43