ConvSRC: SmartPhone-based periocular recognition using deep convolutional neural network and sparsity augmented collaborative representation

Abstract

Smartphone-based periocular recognition (SPR) has gained significant attention because of the limitations of face and iris biometric modalities. For this problem, most of the existing methods employ hand-crafted features. On the other hand, deep convolutional neural networks (CNN), which learn features automatically, have shown outstanding performance for many visual recognition tasks over hand-crafted features. In view of this paradigm shift, we propose an SPR method based on CNN model. A CNN model needs a huge volume of data, but for periocular recognition problem only limited amount of data is available. One solution for this issue is to use a CNN model pre-trained on the dataset from a related domain, but this raises the questions of how to extract discriminative features from a pre-trained CNN model and classify them. We introduce a simple, efficient and compact image representation method based on a pre-trained CNN model (VGG-Net). This method employs the wealth of information and sparsity existing in the activations of convolutional layers of a CNN model. For recognition, we use an efficient and robust Sparse Augmented Collaborative Representation based Classification (SA-CRC) technique. For a thorough evaluation of ConvSRC (the proposed system), experiments were carried out on the VISOB database, which was presented as the challenge dataset in ICIP2016. The results show the superiority of ConvSRC over the state-of-the-art methods; it obtains a GMR of more than 99% at FMR = 10^-3 and outperforms the first winner of the ICIP2016 challenge by 10%.

Keywords

Periocular biometric mobile biometrics deep learning convolutional neural network sparse collaborative representation

1 Introduction

With the development of smart devices equipped with digital cameras and significant computational power, the biometrics research has turned its focus to mobile-based biometrics. Face is one of the most popular biometric modality, which performs well; however, the performance declines if the face is partially hidden [1]. The whole face of criminals often does not appear in surveillance videos. In some situations, the face is either covered by helmets, hair, glasses or skiing masks. Furthermore, due to cultural and religious reasons, women cover their faces partially in some countries. In these scenarios, the region around the eyes (periocular) is the only visible trait which can be used as biometric (see Fig. 1). Also, the acquisition of the image of the periocular region does not require high user cooperation and close capture distance unlike other ocular biometrics (e.g., iris, retina, and sclera). The periocular region gives a trade-off between the whole face and the iris alone.

Fig. 1

Examples of some cases where using periocular biometric is effective.

Several efforts have been made on periocular recognition [2 –4]. Most of the existing techniques use hand-crafted features such as Block HOG [5], HOG [6] and Color histogram [7], SURF [8], LBP [9, 10], Phase Intensive Local Pattern (PILP) [11] and SIFT [12]. The state-of-the-art techniques include the deep sparse representation [13], BSIF [14], Local phase Quantization (LPQ) features [15], Block BSIF [16], Phase Intensive Local Pattern (PILP) [40], Hybrid deep learning [41] and Deep Sparse Filters [17]. Only few works used learned features [18–22 , 41]. For recognition, mostly KNN with distance metrics such as city block [9 , 24], Euclidean [1, 25], chi-square [12 , 27] and mean square error [28] has been used. Some works used learning based techniques like SVM [18, 29], and Neural Networks [19, 17].

Recent studies have shown that learned features are more effective than hand-crafted features [2, 30]. Currently, though CNNs, which learn features automatically, have attracted a lot of attention for visual recognition tasks due to their outstanding performance [31], a huge volume of data is needed for their learning. For periocular recognition, a huge amount of data is not available and it is hard to learn descriptive features using CNNs. According to some studies, when a CNN model, pre-trained on a large dataset from a related domain, is employed for a visual classification task with small dataset, it results in impressive performance [32]. Most of the methods based on pre-trained CNN models use the activations of a fully connected layer as a representation (global features). The recent studies have shown that there is a wealth of information encoded in convolutional layers (local features), which results in better performance [33, 34]. In view of this, we examined the potential of convolutional layers for periocular recognition.

Recently, sparse and collaborative representation based classification techniques have shown promising results [35]. In a recent work, Naveed et al. [36] proposed Sparsity Augmented Collaborative Representation based Classification (SA-CRC) technique, which is the augmentation of a dense collaborative representation with a sparse representation and improves the recognition performance. In this study, we examined its usefulness for periocular recognition.

Motivated by the effectiveness of convolutional layers and SA-CRC, we propose a simple technique for extracting local features from convolutional layers using a pre-trained CNN model, and a periocular recognition method based on local CNN features and SA-CRC, we call this method as ConvSRC. The development of the method raises many questions: (1) how to extract discriminative features from convolutional layers for periocular recognition? (2) can deep features extracted using CNN models pre-trained on a dataset from a related domain (e.g. face dataset) be generalized to periocular recognition task? (3) are the features extracted using CNN model pre-trained on a dataset of natural images (e.g. ImageNet dataset) effective for this task? (4) which features (i.e. local or global) are more suitable for this task? (5) Is SA-CRC effective with deep features?

To answer these questions, we thoroughly analysed the performance of different feature representations extracted from convolutional and fully connected layers using the pre-trained VGG-Face [37], which is learned on a dataset from a closely related domain, and VGG-16 trained on ImageNet dataset. For classification, we investigated the impact of using SA-CRC [36] compared to KNN, ridge regression and different fusing methods of the collaborative (CRC) and sparse (SRC) representations. Extensive experiments conducted on VISOB, the ICIP2016 challenge dataset for smartphone periocular recognition [30], show that ConvSRC gives promising results; it outperforms the state-of-the-art methods and the first winner of the ICIP2016 challenge (up to 10%).

The Novelty of the proposed method is to use a pre-trained CNN model to overcome the small data size issue for periocular recognition problem and the procedure to extract discriminative features from the CONV layers of the pre-trained CNN model. The main contributions of this paper are summarized as follows:

Proposes an efficient and robust periocular recognition method, which is based on the discriminative feature extraction technique from a pre-trained CNN model and SA-CRC classifier.

Demonstrates that for periocular recognition problem, local and global periocular CNN features, even with a simple KNN classifier, result in remarkable performance improvement over the state-of-the-art hand crafted and learned features.

Thoroughly examines the effectiveness of local and global features and shows that for periocular recognition, local features extracted from a convolutional layer encode more discriminative information than global features determined from a fully connected layer.

Empirically, it was found that VGG-16 pre-trained on ImageNet [38] results in almost similar features extracted using VGG-Face model. This outcome reveals the strength of VGG architecture.

The rest of the paper is organized as follows. Section 2 provides the detail of the proposed method. The evaluation protocol is described in Section 3. The model selection is discussed in Section 4, while the results and discussion are presented in Section 5. Finally, Section 6 concludes the paper.

2 Periocular recognition method based on CNN model and SA-CRC

A systematic diagram of the proposed method (ConvSRC) is shown in Fig. 2. First, the input image is resized to 224x224 and passed to VGG Net to obtained the activations of convolutional and fully connected layers. These activations are sparse and include a wealth of discriminative information, and principal component analysis (PCA) is employed to extract discriminative features from them. Finally, SA-CRC is used for decision making.

Fig. 2

The systematic diagram of ConvSRC.

2.1 Deep CNN Features

For extracting CNN features, the first task is to select the suitable CNN model. VGG Net is a commonly used model [39], which have shown promising results for different Computer Vision applications. We adopted very deep VGG-Face model, which is a VGG-16 model and is trained on face image dataset, a domain related to periocular recognition. Our selection is based on the observation that periocular region is a part of face and CNN learns hierarchy of features in such a way that higher level layers encode object parts.

The structure of Very deep VGG-Face model is shown in Fig. 3; it comprises 5 blocks of convolutional (CONV) layers, each followed by max pooling layer and three fully connected (FC) layers. Each CONV layer generates feature maps (activations) by convolving the input with a bank of linear filters, learned during training and is followed by a rectification layer (ReLU), which applies ReLU non-linearity on the features maps. The feature maps of low CONV layers are large in size but small in number but those of higher CONV layers are smaller in size but larger in number. The size of feature maps of first CONV layer is 224x224 and their number is 64, whereas the size of feature maps in last CONV layer is 14x14 and their number is 512. This structure encodes hierarchy of features. The last three layers are FC layers, which process the features with linear operation followed by ReLU non-linearity. The output of the first two FC layers is a 4,096-dimensional vector each and the last FC layer yields a vector of dimension 2,622, which is passed to a softmax layer to compute the class posterior probabilities. The input to the network is an RGB image of size 224×224. This network was trained using a huge dataset (2.6M face images of 2.6K people).

Fig. 3

The architecture of VGG-Face.

Activations of different layers of a CNN model encode very rich information and can be employed for extracting discriminative features. CONV layers encode low, medium and high level features along with their spatial information, whereas fully connected (FC) layers embed global features discarding the spatial information. Layer selection is the key success factor to extract discriminative features for a certain application. We propose to extract two types of features: (1) global, extracted from an FC layer (2) local, extracted from a CONV layer.

2.1.1 Local Deep CNN Features

Most of the pre-trained CNN based feature extraction methods use the activations of FC layers as representation; it has been shown that the activations of CONV layers are more informative [33, 34]. Unlike FC layers, CONV layers encode hierarchy of local features and retain the spatial information and can lead to more discriminative representations.

For local features, we focus on CONV layers. In very deep VGG-Face, low level CONV layers encode low level facial features such as micro-texture features, and high level CONV layers encode facial parts like lips, nose and different parts of periocular region. As such high CONV layers (Conv5_1, Conv5_2, Conv5_3) are the best choices for extracting local deep features. One simple way is to vectorise each feature map p_i (i = 1, 2, ... , n) of a CONV layer to x _i∈R^d (d = n₁×n₂, the size of p_i), and then concatenate them into a feature vector x = ${[x_{1}^{T}, x_{2}^{T}, \dots, x_{n}^{T}]}^{T} \in R^{D}$ (D = n₁×n₂×n, n being the number of feature maps in the CONV layer), as shown in Fig. 4(a). This leads to the curse of dimensionality problem; the dimension of the feature space becomes excessively large, e.g. in Conv5_2, the total number of feature maps is 512 and the size of each feature map is 14×14, this results in a feature vector of dimension 100,352. We observe that there is a large amount of sparsity in feature maps. The Singular Value Decomposition (SVD) of the matrix P = [x₁, x₂, ⋯ , x₅₁₂] ∈ R^196×512, where the dimension of each vector is 196 (14×14), for Conv5_2 gives the singular values which are shown in Fig. 4(b); the majority of the singular values are almost zero which is an indication of the sparsity in feature maps. It follows from this observation that the feature vector x can be compressed significantly without losing noticeable amount of information. One simple and effective approach for compression is Principle Component Analysis (PCA). As in this case, the dimension of feature vectors is 100,352 but only a small number (≤2000) of eigenvalues of covariance matrix C ∈ R^D×D (D = 100,352 in this case) are significant, see Fig. 5, so we employ the trick used in eigenfaces approach to compute the principal eigenvalues and the corresponding principal eigenvectors to compute the PCA transformation matrix M ∈ R^D×K, where K is the number of features in the compressed feature vector $\hat{x} \in R^{K}$ . In this case the computational complexity to compute depends on the number of training images, as the number of training images (e.g. 1582 images for left eyes of Samsung Daylight) is much less than the dimension (e.g. 100,352 for Conv5_2) of the local deep features, so the computation of M is efficient in respect of time and space complexity.

Fig. 4

(a) Computation of local deep CNN features. (b) The singular value decomposition of the feature maps of an image.

Fig. 5

The proportion of the variance of the principle components for Samsung, iphone and oppo with day light condition from VISOB dataset.

2.1.2 Global Deep CNN Features

The activations of FC layers of VGG-Face model are taken as global features, the spatial information is not retained in FC layers. There are three FC layers in the model, we examined the effect of each layer, and the detail is provided in Section 4.1.1.

2.2 Sparsity Augmented Collaborative Representation based Classifier (SA-CRC)

Using the extracted features (global/local) from the enrolment set, we form the dictionary φ = [φ₁, φ₂, …, φ_c] ∈ R^K×N, where the sub-matrix φ_i ∈ R^K×n_i is computed from feature vectors (normalized using l₂-norm) corresponding to n_i examples of the ith class, c is the total number of classes and $N = \sum_{i = 1}^{c} n_{i}$ . In sparse representation based classification (SRC) approach, for a test example y ∈ R^K, first a collaborative representation vector α ∈ R^N is computed such that y ≈ φα, then class-specific residuals r_i (y), (i = 1, 2, ... , c) are computed and finally y is assigned the label of the class for which the residual r_i(y) is minimum [35]. According to a study, the success of this approach is due to the collaboration of the training examples from all the classes. A recent work by Naveed et al. [36] analysed that though the real gain comes from collaboration, the sparseness cannot be ignored, it further improves the classification performance. Based on this analysis they proposed SA-CRC method, which employs both sparseness and collaboration. For classification, we employed this method. In this method, first, $\overset{ˇ}{α}$ is computed by solving the problem (collaboration problem [36]) $\overset{ˇ}{α} = min_{α} | | y - φ α | |_{2}^{2} + λ | | α | |_{2},$ (1)

Then $\hat{α}$ is calculated as the solution of the problem (sparseness problem [36]) $\hat{α} = min_{α} | | y - φ α | |_{2}^{2} s . t . | | α | |_{0} \leq k,$ (2) where || α ||₀ denotes the l₀-norm, which counts the number of non-zero coefficients in φ. Though l₀-minimization is NP-hard and can be approximated with l₁-minimization [42], l₀-minimization allows to define explicitly the sparsity level k and can be solved efficiently using the Orthogonal Matching Pursuit (OMP), which is a greedy algorithm for approximating the solution of sparse representation with l₀-minimization. Since we are using l₀-norm, the sparse representation has at most k (sparsity threshold) non-zero coefficients.

Finally, these solutions are augmented as follows [36]: $α = (\overset{ˇ}{α} + \hat{α}) / | | \overset{ˇ}{α} + \hat{α} | |_{2}$ (3) and y is assigned the label of ith class for which the component q_i of the vector Q = L_α attains maximum value, where L is the label matrix, for detail, see [36]. The collaborative representation $\overset{ˇ}{α}$ is computed using the regularized least squares method, while the sparse representation $\overset{ˇ}{α}$ is obtained by solving Equation 2 using OMP.

Algorithms 1, 2 and 3 present the detailed description of the steps involved in enrolment, verification and identification processes; respectively.

Algorithm 1: Enrolment Process of ConvSRC

Algorithm 2: Verification Process of ConvSRC
Input: Probe image I and label l.
Global: PCA matrix M, mean vector $\bar{x}$ , dictionary Φ and label
matrix L.
Output: Matching score s.
Processing:
1. Conv5-2 =VGGNet (I)
2. x = vectorize(Conv5-2)
3. $y = M^{T} (x - \bar{x})$
4. $\overset{ˇ}{α} = min_{α} \| \| y - φ α \| \|_{2}^{2} + λ \| \| α \| \|_{2}$
5. $\hat{α} = min_{α} \| \| y - φ α \| \|_{2}^{2} s . t . \| \| α \| \|_{0} \leq k$
6. $α = (\overset{ˇ}{α} + \hat{α}) / \| \| \overset{ˇ}{α} + \hat{α} \| \|_{2}$
7. s = (Lα) _l/n_l, where n_l is the number of enrolled images of
subject l

Algorithm 3: Identification Process of ConvSRC
Input: Probe image I.
Global: PCA matrix M, mean vector $\bar{x}$ , dictionary Φ and label
matrix L.
Output: Label l.
Processing:
1. Conv5-2 =VGGNet (I)
2. x = vectorize(Conv5-2)
3. $y = M^{T} (x - \bar{x})$
4. $\overset{ˇ}{α} = min_{α} \| \| y - φ α \| \|_{2}^{2} + λ \| \| α \| \|_{2}$
5. $\hat{α} = min_{α} \| \| y - φ α \| \|_{2}^{2} s . t . \| \| α \| \|_{0} \leq k$
6. $α = (\overset{ˇ}{α} + \hat{α}) / \| \| \overset{ˇ}{α} + \hat{α} \| \|_{2}$
7. Label $l = arg max_{i} {q_{i}}$ , where q_i denotes the ith coefficients
of Q = Lα

3 Evaluation protocol

We implemented the system using MATLAB R2015a on a PC with Intel (R) Core ™ i7-3610QM CPU @ 2.30 GHz and 12 GB RAM. The experiments were conducted on VISIT 1 of VISOB dataset [30], which is available in public domain. The identification performance of the system is reported in terms of accuracy, whereas the performance of verification is reported in terms of Genuine Match Rate (GMR) at False Acceptance Rate (FAR)=10^-2 and Equal Error Rate (EER). Further, the performance of the system for verification has also been presented in terms of Receiver Operating Curves (ROC).

Visible Light Mobile Ocular Biometric Database (VISOB) contains ocular images from 550 healthy adult volunteers acquired using front facing cameras of three different smartphones i.e. Samsung Note 4, iPhone 5 s and Oppo N1. The Oppo and Samsung devices captured images at 1080p resolution, while the images captured by iPhone are at 720p resolution. The dataset is divided into enrollment and validation sets (see Table 1). It was presented as a challenge dataset in ICIP2016 for periocular recognition, the detail can be found in [30]. For evaluation, we used the recommended protocol.

Table 1
Detail of the enrolment and validation sets of VISOB Dataset

Mobile device Number of images

Enrollment set Validation set

iPhone 14077 13208

Oppo 21976 21349

Samsung 12197 12240

Mobile device	Number of images
iPhone	14077	13208
Oppo	21976	21349
Samsung	12197	12240

4 Model selection

The final design of ConvSRC involves various design decisions: global vs local deep features and the number of principal components (PCs). For these decisions, the experiments are conducted on one subset of VISOB dataset i.e. Samsung left periocular images under day light condition.

4.1 Global vs local deep features

First, we address two design questions: which CONV/FC layer is suitable for extracting local/global deep CNN features? Which type of features (local/global) is more discriminative for periocular recognition? To answer these questions, we conducted extensive experiments for identification and verification tasks.

4.1.1 Global deep CNN features

VGG-Face contains three fully connected layers: FC6, FC7 and FC8. The FC6 and FC7 layers yield 4096 features whereas the FC8 layer extract only 2622 features. The verification and identification results shown in Tables 2 and 3, respectively, indicate that the features extracted from FC6 provide the best results. Also, we notice from Table 3 that PCA reduces the dimension significantly (from 4096 to 500), but the decrease in accuracy is negligible. We tested with fixed number (500) of principal components (PCs) and the number of PCs, which save 99% of the information.

Table 2
Verification results using features extracted from FC6, FC7 and FC8 and SA_CRC

Layer Size-FV Results

EER GMR at FAR = 0.1 GMR at FAR = 0.01 GMR at FAR = 0.001

FC6 4096 0.88 99.13 98.32 94.42

FC7 4096 1.21 98.66 96.77 85.47

FC8 2622 1.89 97.65 96.1 77.81

Layer	Size-FV	Results
FC6	4096	0.88	99.13	98.32	94.42
FC7	4096	1.21	98.66	96.77	85.47
FC8	2622	1.89	97.65	96.1	77.81

Table 3

Identification accuracies (%) using features extracted from FC6, FC7 and FC8

Layer	Size-FV	Accuracy (%)
		KNN(Euc,1)	SA-CRC
FC6	PCA-500	94.21	97.11
FC6	PCA-394 (save 99%)	94.21	96.83
FC6	4096	94.21	97.17
FC7	4096	93.54	95.96
FC8	2622	92.62	93.96

4.1.2 Local deep CNN features (ConvSRC)

There are thirteen CONV layers in VGG-Face architecture. We selected the last three CONV layers (i.e. conv5_1, conv5_2 and conv5_3) for our experiments on local deep CNN features. The reason for this selection is that the first layers encode texture information whereas last layers capture higher level features which represent the rich object based structural information in a better way and result in discriminative description. Each of these CONV layers contains 512 feature maps of size 14×14 units. The results obtained with local features extracted using the local deep CNN feature extraction technique described in Section 2.1.1 are shown in Tables 4 and 5; these results indicate that conv5_2 outperforms the other two layers in terms of EER and GMR, there is significant difference, especially, in EER and GMR at FAR = 0.001. PCA reduces the dimension of the feature space significantly without declining the verification and identification performance as is obvious from Table 5, the feature vectors of dimensions 100352 and 500 (reduced using PCA) from Conv5_2 give almost the same rank-1 identification result.

Table 4
Verification results of features extracted from conv5_1, conv5_2 and conv5_3

Layer Size-FV Results

EER GMR@ FAR = 0.1 GMR@ FAR = 0.01 GMR@ FAR = 0.001

Conv5_3 PCA-980 0.74 99.66 98.79 92.74

Conv5_2 PCA-1300 0.27 100 99.46 96.08

Conv5_1 PCA-876 0.81 99.33 98.59 91.12

Layer	Size-FV	Results
Conv5_3	PCA-980	0.74	99.66	98.79	92.74
Conv5_2	PCA-1300	0.27	100	99.46	96.08
Conv5_1	PCA-876	0.81	99.33	98.59	91.12

Table 5

Identification accuracies (%) of features extracted from conv5_1, conv5_2 and conv5_3

Layer	Size-FV	Accuracy (%)
		KNN(Euc,1)	SA_CRC
Conv5_3	100352	96.10	97.98
Conv5_2	100352	96.16	98.18
Conv5_2	PCA-500	96.01	97.91
Conv5_1	100352	95.49	97.91

There is significant difference in the performance of the three CONV layers. As we move higher in the hierarchy of the CONV layers, the low level features are composed into higher level abstractions, the results indicate that the composition of features at Conv5_2 results in the most discriminative features, which are further combined into higher level abstractions by Conv5_3 that defuses the features relevant to periocular regions. This effect further spreads up to FC layers, and one can see that the performance of FC layers is even worse that Conv5_3, look at EER values yielded by Conv5_3 in Table 4 and those by the FC6, FC7 and FC8 layers in Table 2. It indicates that Conv5_2 composes the lower level features into the representation that keep the structural information, which is important for discrimination of periocular region. From now onward, ConvSRC means the periocular recognition method that employs Conv5_2 for feature extraction.

4.2 Combining global and local features

In the previous sections, we presented and discussed the individual effects of global and local deep CNN features. The discussion revealed that local features form better representation than global features. Next question is whether the fusion of local and global features can improve the recognition performance. To address this question, we tested the effect of fusing local and global features. As FC6 and Conv5_2 result in the best performances among FC and CONV layers, we fused the features from the FC6 layer and the local features from the Conv5_2 layer. The features are fused after standardizing them so that each feature has zero mean and unit variance. The results are shown in Tables 6 and 7, which indicate that there is no improvement; instead, there is some deterioration in performance, which is due to the reason that the FC6 layer defuses the relevant features. A comparison between global, local and their fusion for verification and identification is given in Fig. 6. It further validates that fusion of global and local features does not improve the performance.

Table 6
Verification results of the combination of local and global features

Global feature Local feature Results

EER GMR at FAR = 0.1 GMR at FAR = 0.01 GMR at FAR = 0.001

FC6 ConvSRC 0.81 98.66 97.76 97.23

Global feature	Local feature	Results
FC6	ConvSRC	0.81	98.66	97.76	97.23

Table 7

Identification accuracies (%) of the combination of local and global features

Global feature	Local feature	Size-FV	Accuracy (%)
			KNN(Euc,1)	SA_CRC
FC6	ConvSRC	PCA-500	96.444	97.646

Fig. 6

Comparison of the verification (a) and identification (b) results using different feature extraction methods.

4.3 Tuning the number of principle components in PCA

The proposed approach involves dimension reduction using PCA. In PCA, the critical point is the selection of principal components (PCs) corresponding to the largest eigenvalues. Many criteria have been proposed in the literature in order to find the optimal number of dimensions in PCA. Due to the computational complexity of such techniques, we used a simple method which results in good result. We tried different numbers of components using three light conditions of the left eye of Samsung dataset. Figure 7 gives the detail; as the number of PCs increases EER decreases and it continues to decrease until the number of PCs is 1300, where it attains minimum value for all the three cases. It indicates that this number is a suitable choice for PCs, for onward experiments, we fix the number of PCs to be 1300.

Fig. 7

Equal Error Rate (EER) results of using different PCs on Samsung left eye dataset.

4.4 The level of sparsity

OMP algorithm is used to solve the optimization problem of equation (2) to get the sparse representation. OMP iteratively selects k dictionary vectors to represent the test samples and the sparse representation has at most k non-zero coefficients. To fine-tune the value of the sparsity level k, we performed experiments with different values of k on Samsung left periocular images under office light condition. The verification results and the time complexity are shown in Table 8. This table indicates that as the sparsity level increases (the sparsity decereases), the time required by OMP algorithm increases, while the performance in terms of EER and GMR at FAR = 0.01 decreases; the best value of GMR at FAR = 0.01 is achieved with k = 50. In view of this observation, the best time-performance trade-off is given by k = 50 and we used this value of k in our experiments.

Table 8
Verification results and the average time needed by OMP algorithm using different sparsity level k

Sparsity level (k) EER GMR @ FAR = 0.01 Time in Milliseconds

10 0.493 98.489 0.697

30 0.503 98.662 0.828

50 0.530 98.705 1.185

70 0.544 98.662 1.624

90 0.575 98.533 2.262

110 0.592 98.489 3.116

130 0.649 98.576 4.097

150 0.662 98.576 5.263

Sparsity level (k)	EER	GMR @ FAR = 0.01	Time in Milliseconds
10	0.493	98.489	0.697
30	0.503	98.662	0.828
50	0.530	98.705	1.185
70	0.544	98.662	1.624
90	0.575	98.533	2.262
110	0.592	98.489	3.116
130	0.649	98.576	4.097
150	0.662	98.576	5.263

5 Experimental results and discussion

In the previous section, we presented the discussion about model selection; based on this discussion the model of the proposed method consists of local deep CNN features extracted from Conv5_2 with 1300 PCs and sparsity threshold k = 50; and SA_CRC (ConvSRC). Using the model, this section reports results for verification and identification problems and discuss the performance of the proposed method in comparison with the existing methods.

5.1 Verification problem

First, we report the verification results of the proposed system; Table 9 shows the performance in terms of GMR and Figs. 8, 9 and 10 show the ROC curves for the left and right periocular regions captured by three types of smartphones under different lighting conditions. The results show the outstanding performance of the proposed method; it can be noted from Figs. 8 to 10 that it gives better results for day-light for the three devises and these results are almost similar. The performance is slightly poor for dim-light in case of iPhone, and for office-light in case of Samsung and Oppo. Almost similar results can be noted from Table 9 in terms of GMR.

Table 9
Verification performance (GMR at FAR = 10^–2) on VISOB dataset

Feature type GMR @ FMR = 10^-2 p-value

IPhone Oppo Samsung iPhone Oppo Samsung iPhone Oppo Samsung

Left Right Left Right Left Right Left Right Left Right Left Right Left Right Left Right Left Right

Capture condition: Day light Capture condition: Dim light Capture condition: Office short light

Block BSIF 45.77 42.69 46.22 49.40 47.63 48.56 40.05 35.93 26.62 51.77 44.44 48.56 29.47 30.71 26.62 23.30 24.45 30.27 0.0001070

Block HoG 0.11 0.18 0.35 0.51 0.09 0.16 1.19 1.07 0.49 0.54 0.22 0.16 0.36 0.54 0.49 0.77 0.10 0.51 0.0001068

BSIF 60.11 61.52 54.40 53.45 54.39 62.93 43.74 48.61 28.00 56.76 54.82 62.93 42.14 44.45 28.00 30.68 34.29 39.64 0.0001070

HoG 0.04 0.03 0.19 0.18 0.06 0.13 0.41 0.53 0.32 0.30 0.36 0.13 0.28 0.33 0.32 0.37 0.24 0.34 0.0001064

L PQ 1.65 1.99 2.75 2.65 1.41 9.88 6.70 7.73 2.88 5.15 8.70 9.88 3.53 2.71 2.88 3.95 1.96 1.73 0.0001070

DSparse Filters 93.04 86.26 96.64 97.56 90.22 95.03 89.63 89.47 87.49 87.08 91.06 93.18 88.62 86.62 87.49 80.09 79.72 90.92 0.0001070

MR Filters 92.04 91.34 92.55 92.70 93.14 92.29 92.15 92.92 93.85 93.98 93.38 92.64 88.83 90.08 93.57 92.49 89.94 90.63 0.0001068

Hybrid 95.98 94.82 94.21 94.81 92.44 92.97 96.09 96.14 96.31 96.15 93.12 93.61 93.54 93.89 90.79 90.23 90.45 91.53 0.0001070

Method in[43] 96.36 96.06 97.66 97.25 97.15 98.47 96.69 96.54 97.28 97.07 97.19 98.04 94.97 95.88 91.55 90.75 94.30 94.71 0.0001068

Proposed 99.88 99.84 98.91 99.43 99.46 99.31 99.58 99.65 99.59 99.05 99.59 99.63 99.78 99.60 99.01 99.60 98.71 99.01

Feature type	GMR @ FMR = 10^-2	p-value
Block HoG	0.11	0.18	0.35	0.51	0.09	0.16	1.19	1.07	0.49	0.54	0.22	0.16	0.36	0.54	0.49	0.77	0.10	0.51	0.0001068
BSIF	60.11	61.52	54.40	53.45	54.39	62.93	43.74	48.61	28.00	56.76	54.82	62.93	42.14	44.45	28.00	30.68	34.29	39.64	0.0001070
HoG	0.04	0.03	0.19	0.18	0.06	0.13	0.41	0.53	0.32	0.30	0.36	0.13	0.28	0.33	0.32	0.37	0.24	0.34	0.0001064
L PQ	1.65	1.99	2.75	2.65	1.41	9.88	6.70	7.73	2.88	5.15	8.70	9.88	3.53	2.71	2.88	3.95	1.96	1.73	0.0001070
DSparse Filters	93.04	86.26	96.64	97.56	90.22	95.03	89.63	89.47	87.49	87.08	91.06	93.18	88.62	86.62	87.49	80.09	79.72	90.92	0.0001070
MR Filters	92.04	91.34	92.55	92.70	93.14	92.29	92.15	92.92	93.85	93.98	93.38	92.64	88.83	90.08	93.57	92.49	89.94	90.63	0.0001068
Hybrid	95.98	94.82	94.21	94.81	92.44	92.97	96.09	96.14	96.31	96.15	93.12	93.61	93.54	93.89	90.79	90.23	90.45	91.53	0.0001070
Method in[43]	96.36	96.06	97.66	97.25	97.15	98.47	96.69	96.54	97.28	97.07	97.19	98.04	94.97	95.88	91.55	90.75	94.30	94.71	0.0001068
Proposed	99.88	99.84	98.91	99.43	99.46	99.31	99.58	99.65	99.59	99.05	99.59	99.63	99.78	99.60	99.01	99.60	98.71	99.01

Fig. 8

ROC curve of the proposed method on iPhone data.

Fig. 9

ROC curve of the proposed method on Samsung data.

Fig. 10

ROC curve of the proposed method on Oppo data.

The comparison of the verification performance of ConvSRC with the state-of-the-art periocular verification methods is given in Table 9. It can be observed that ConvSRC performs better than the state-of-the-art methods for the three mobile devices and all light conditions. To check whether the difference in the performance of the proposed method and the state-of-the-art methods is statistically significant, we used the Wilcoxon signed rank test with 95% confidence level with null hypothesis that the performance of the proposed method is equal to that of a state-of-the-art method and alternative hypothesis that the proposed method is better than other methods. It can be noticed from last column of Table 9 that the proposed method is better and the difference is significant at 95% level (p-values<0.05 for methods).

ConvSRC significantly outperforms MR Filters [19] and Deep Sparse Filters [17], which are the first and the second winners, respectively, of ICIP2016 periocular competition on VISOB dataset. MR Filters, Deep Sparse Filters and ConvSRC employ learning based feature extraction methods. These results show that learning based features outperform the hand-engineered features, and among learned features, the features based on Deep CNN (ConvSRC) performs better because of its deeper structure, which captures the hierarchy of features in a better way and composes the discriminative description.

Moreover, we compare the performance of ConvSRC with that of MR Filters (i.e. the first winner) in detail. Figure 11 shows the detailed comparison of the three devices and the three light conditions of VISOB dataset. For fair comparison with MR filters, we reported the results in terms of GMR at FAR = 10^-3 similar to that used for MR Filters. It can be observed that there is a large gap of performance that the proposed method achieved over MR Filters; the reason for better performance is that the deeper architecture of CNN captures the more discriminative description.

Fig. 11

Comparison of the verification performance of the proposed method with MR Filters (1^st winner of ICIP2016 competition)

5.2 Identification results

The VISOB dataset was prepared for verification purpose. To adapt it for identification, we only consider the images of the subjects that are available in both verification and enrolment sets. We observe from Fig. 12 that the proposed method performs well for identification task. Also, we notice that the office light condition is the most challenging set in identification in case of all of the three mobile devices.

Fig. 12

Identification performance of the proposed method on VISOB dataset.

5.3 Effectiveness of CNN features

The question arises whether the performance of the proposed method is due to SA-CRC classification. To this end, we passed pixel values of the original image as features to SA-CRC and compared the results with those obtained using CNN features. For this experiment, we used iPhone left under day light condition dataset. The results have been shown in Table 10; there is a significant difference between the results, especially EER and GMR at FAR = 10^-3. It indicates that CNN features are effective in performance improvement.

Table 10
Effectiveness of CNN

Extraction method Results

EER GMR @ FAR = 0.1 GMR @ FAR = 0.01 GMR @ FAR = 0.001

ConvSRC 0.08 99.96 99.88 97.25

Original image 0.7 99.47 98.69 88.57

Extraction method	Results
ConvSRC	0.08	99.96	99.88	97.25
Original image	0.7	99.47	98.69	88.57

5.4 Effectiveness of SA-CRC

The next question is about the impact of SA-CRC. For this purpose, we compare the results obtained using SA-CRC with those achieved employing KNN and ridge regression. Also, we investigate the effect of fusing the collaborative (CRC) and sparse (SRC) representations in different ways to highlight the effectiveness of the augmentation method used in SA-CRC. The identification results on the Samsung left under office light condition (see Table 11) show that SA-CRC achieved the best result.

Table 11
Effectiveness of SA-CRC

Identification accuracy (%)

KNN Ridge regression SRC*CRC Max(SRC,CRC) SA-CRC

93.48 96.37 96.97 96.20 96.98

Identification accuracy (%)
93.48	96.37	96.97	96.20	96.98

5.5 The effect of the domain of the pre-trained model

We employed pre-trained VGG-16 model, which was trained on face dataset and so it encodes the periocular regions. The question arises whether we can get the similar performance using a pre-trained VGG-16 model, which is trained on a dataset of natural images from different domain. For this, we used VGG16 model that was pre-trained on ImageNet dataset [38]. A comparison of the results obtained with VGG16-Face and VGG16-ImageNet is depicted in Fig. 13; it is surprising to note that the results are almost similar; it indicates that ConvSRC is independent of the pre-trained model domain. For further investigation, we randomly selected 2 subjects and 10 features created with VGG16-Face and VGG16-ImageNet models, and created the boxplots of features generated by the two models corresponding to each subject, see Fig. 14. The boxplots corresponding to subject 1 indicate that the distributions of features generated by VGG16-Face are different and those of features generated by VGG-ImageNet are also different. Similar is the observation about the features of subject 2. It indicates that though the features created by each model do not have similar distributions, they are discriminative in each case. It demonstrates that the domain of the pre-trained model is not important for extracting discriminative features for periocular recognition as long the domain consists of natural images.

Fig. 13

Comparison of the verification performance of ConvSRC using VGG16-Face and VGG16-imagnet pre-trained models.

Fig. 14

Comparison of the features extracted from two subjects with pre-trained (a) VGG16-Face and (b) VGG16-imagnet models.

5.6 Cross-light and cross-device authentication

To further investigate the consistency and generalization ability of ConvSRC, we performed cross-light and cross-device experiments. In cross-device experiments, the gallery and probe sets are captured under different light conditions, whereas in cross-devise experiments, the gallery and probe sets are captured with different devices.

The results of cross-light experiments are shown in Table 12. Overall, the results for native-light (the same light conditions for gallery and probe sets) and cross-light authentication are similar. There is no significant degradation in the results for cross-light cases in terms of GMR@FAR=0.1. However, there is a slight decrease in performance in terms of GMR@FAR=0.001, but it is not due to cross-light, but due to light condition whether bright or dim. When day light (bright light) is used for gallery or probe, it gives better results for both native-light and cross-light authentication and vice versa for office light.

Table 12
The verification results of Same-Light versus Cross-Light on Samsung left

Gallery Probe Results

EER GMR @ FAR = 0.1 GMR @ FAR = 0.01 GMR @ FAR = 0.001

Office Day 0.58 99.56 99.17 98.60

Dim 0.71 99.30 98.79 97.48

Office 0.53 99.53 99.01 97.02

Day Day 0.27 100 99.46 96.08

Dim 0.94 99.13 97.84 88.81

Office 1.01 98.99 97.07 85.75

Dim Day 0.47 99.73 99.33 98.80

Dim 0.4 99.9 99.59 97.76

Office 0.83 99.21 98.47 83.28

Gallery	Probe	Results
Office	Day	0.58	99.56	99.17	98.60
Dim	0.71	99.30	98.79	97.48
Office	0.53	99.53	99.01	97.02
Day	Day	0.27	100	99.46	96.08
Dim	0.94	99.13	97.84	88.81
Office	1.01	98.99	97.07	85.75
Dim	Day	0.47	99.73	99.33	98.80
Dim	0.4	99.9	99.59	97.76
Office	0.83	99.21	98.47	83.28

Table 13 shows the results of cross-device experiments. There is no significant difference between native-device (the same device is used for gallery and probe) and cross-devise authentication in terms of GMR@FAR=0.1. However, there is a degradation in the performance of cross-device cases (Samsung, Oppo), (Samsung, iPhone), (iPhone, Samsung) and (iPhone, Oppo) in terms of GMR@FAR=0.001 i.e. when Samsung and iPhone are used for gallery. It is due to the reason that manufacturing noises of the three devises are different and ConvSRC is not completely insensitive to these noises.

Table 13

The verification results of Same-Device versus Cross-Device on left Day condition

Gallery	Probe	Results
		EER	GMR @ FAR = 0.1	GMR @ FAR = 0.01	GMR @ FAR = 0.001
Oppo	Samsung	0.60	99.40	98.86	97.59
iPhone	0.17	99.91	99.83	97.78
Oppo	0.13	99.74	98.91	98.75
Samsung	Samsung	0.27	100	99.46	96.08
iPhone	0.28	99.91	99.31	88.90
Oppo	0.75	99.37	98.22	88.33
iPhone	Samsung	1.36	98.57	98.08	88.45
iPhone	0.09	99.96	99.88	97.25
Oppo	0.47	99.60	98.85	87.38

The above discussion indicates that CovSRC gives equally good performance for native-light and cross-light as well as native-device and cross-devise verification with relaxed condition on FAR (0.1). However, there is a slight degradation in results for cross-light and cross-device verification when strong condition on FAR (0.001) is imposed.

5.7 Time complexity

To give an idea of the computational time of the proposed method, we computed the average time required for feature extraction and matching. The time in seconds is reported in Table 14, it shows the efficiency of the proposed method, the total time needed for a query is about half second.

Table 14
Average time needed by the proposed method

Time in seconds

Pre-processing CNN features PCA Matching total

0.045 0.437 0.0047 0.0363 0.5663

Time in seconds
0.045	0.437	0.0047	0.0363	0.5663

6 Conclusion

We proposed a deep learning based periocular recognition, which employs a pre-trained CNN model for discriminative feature extraction and Sparsity Augmented Collaborative Representation based Classifier (SA-CRC). Taking into account the wealth of information and sparsity embedded in the activations of the convolutional layers and using principle component analysis, an efficient and robust method has been proposed for feature extraction. We evaluated the performance of the system using convolutional layers and fully connected layers for feature extraction and found that features extracted from convolutional layers are more discriminative and robust than those obtained from FC layers for periocular recognition; the convolutional layers at the last level result in the most discriminative representation. Through extensive experiments, we have shown that pre-trained CNN features can be generalized well to periocular recognition tasks (verification and identification). To determine the impact of domain on pertained CNN model, we examined the performance of the system using two different models VGG-Face (pertained on face data) and VGG-Net (pertained on ImageNet data); the results indicate that the system gives equally good performance when we use a CNN model pre-trained on any related domain i.e. ConvSRC is independent of the domain of the trained CNN model. The use of SA-CRC classifier plays a vital rule in the performance of the proposed method. We compared usefulness of SA-CRC method with the state-of-the-art KNN, ridge regression methods; the results pointed out that SA-CRC stands out in recognition performance. The comparison with the state-of-the-art methods reveals that ConvSRC outperforms significantly even the winner of ICIP2016 completion with GMR of over 99% at FMR = 10^-3.

Although OMP algorithm solves the optimization problem to approximate the sparse representation effectively, it needs the sparsity level which is not known in advance and finding its suitable value empirically is not guaranteed to be the best value for this problem. In future work, to overcome this problem, we will consider alternative approaches which do not need a prior knowledge on the sparsity level to obtain the sparse solution such as sparse Bayesian learning (SBL) [44, 46] and Basis Pursuit (BP) [45].

Conflict of interest statement

There is no conflict of interests.

Footnotes

Acknowledgments

The research was supported under Researchers Supporting Project number (RSP-2019/109) King Saud University, Riyadh, Saudi Arabia.

References

Park

, Jillela

, Ross

and Jain

, Periocular biometrics in the visible spectrum, IEEE Transactions on Information Forensics and Security 6(1) (2011), 96–106.

Rattani

and Derakhshani

, “Ocular biometrics in the visible spectrum: A survey,” Image Vis. Comput. 59 (2017), 1–16.

Nigam

, Vatsa

and Singh

, Ocular biometrics: A survey of modalities and fusion approaches, Information Fusion 26(0) (2015), 1–35.

Alonso-Fernandez

and Bigun

, A survey on periocular biometrics research, Pattern Recognition Letters, 2015.

Dalal

Navneet

and Triggs

Bill

, “Histograms of oriented gradients for human detection,” in IEEE, Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on 1 (2005), 886–893.

Park

, Ross

and Jain

(2009), Periocular biometrics in the visible spectrum: A feasibility study. In: IEEE 3rd Intl Conf. on Biometrics: Theory, Applications, and Systems. 2009, pp. 1–6.

Woodard

D.L.

, Pundlik

S.J.

, Lyle

J.R.

and Miller

(2010), Periocular region appearance cues for biometric identification. In: IEEE Computer Vision and Pattern Recognition Biometric Workshop (CVPRW). 2010, pp. 162–169.

, Cha

, Heyman

J.L.

, Venugopalan

, Abiantun

and Savvides

(2010), Robust local binary pattern feature sets for periocular biometric identification. In: IEEE Intl Conf. on Biometrics: Theory Applications and Systems (BTAS). 2010, pp. 1–8.

Bakshi

, Sa

P.K.

and Majhi

, Phase intensive global pattern for periocular recognition, in Proc. INDICON 2014, pp. 1–5.

10.

Kishore Kumar

and Pavani

, “Periocular Region-Based Biometric Identification Using Local Binary Pattern and Its Variants,” in Information and Communication Technology for Competitive Strategies, 2019, pp. 581–590.

11.

Bakshi

, Sa

P.K.

, Wang

, Barpanda

S.S.

and Majhi

, “Fast periocular authentication in handheld devices with reduced phase intensive local pattern,” Multimed. Tools Appl. 77(14) (2018), 17595–17623.

12.

Alonso-Fernandez

and Bigun

, Best regions for periocular recognition with nir and visible images. In: IEEE Intl. Conf. on Image Processing, 2014, pp. 4987–4991.

13.

Raja

Kiran B

, Raghavendra

Ramachandra

, Vemuri

Vinay Krishna

and Busch

Christoph

, “Smartphone based visible iris recognition using deep sparse filtering,” Pattern Recognition Letters 57 (2015), 33–42.

14.

Kannala

Juho

and Rahtu

Esa

, “Bsif: Binarized statistical image features,” in Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012, pp. 1363–1366.

15.

Ojansivu

Ville

and Heikkilä

Janne

, “Blur insensitive texture classification using local phase quantization,” in Image and signal processing, 2008, pp. 236–243.

16.

Raja

Kiran B

, Raghavendra

Ramachandra

, Stokkenes

Martin

and Busch

Christoph

, “Smartphone authentication system using periocular biometrics,” in Biometrics Special Interest Group (BIOSIG), 2014 International Conference of the, 2014, pp. 1–8.

17.

Raja

K.B.

, Raghavendra

and Busch

, “Collaborative representation of deep sparse filtered features for robust verification of smartphone periocular images,” in 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp. 330–334.

18.

Nie

, Kumar

and Zhan

, “Periocular recognition using unsupervised convolutional RBM feature learning,” in 2014 22nd International Conference on Pattern Recognition, 2014, pp. 399–404.

19.

Raghavendra

and Busch

, “Learning deeply coupled autoencoders for smartphone based robust periocular verification,” in 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp. 325–329.

20.

Luz

, Moreira

, Zanlorensi Junior

L.A.

and Menotti

, “Deep periocular representation aiming video surveillance,” Pattern Recognit. Lett. 114 (2018), 2–12.

21.

Kim

M.C.

, Koo

J.H.

, Cho

S.W.

, Baek

N.R.

and Park

K.R.

, “Convolutional neural network-based periocular recognition in surveillance environments,” IEEE Access 6 (2018), 57291–57310.

22.

Zhao

and Kumar

, “Improving periocular recognition by explicit attention to critical regions in deep neural network,” IEEE Trans. Inf. Forensics Secur. 13(12) (2018), 2937–2952.

23.

Santos

and Hoyle

, A fusion approach to unconstrained iris recognition, Pattern Recognition Letters, Noisy Iris Challenge EvaluationII – Recognition of Visible Wavelength Iris Images Captured At-a-distance and On-the-move 33(8) (2012), 984–990.

24.

Padole

and Proenca

, Periocular recognition: Analysis of performance degradation factors. In: 5th IAPR Intl Conf. on Biometrics (ICB). March 2012, pp. 439–445.

25.

Miller

P.E.

, Rawls

A.W.

, Pundlik

S.J.

and Woodard

D.L.

, Personal identification using periocular skin texture. In: ACM Symposium on Applied Computing 2010, pp. 1496–1500.

26.

Bharadwaj

, Bhatt

H.S.

, Vatsa

and Singh

, Periocular biometrics: When iris recognition fails. In: 4th IEEE Intl Conf. on Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth. pp. 1–6.

27.

Akhtar

, Micheloni

and Foresti

G.L.

, Mobile ocular biometrics in visible spectrum using local image descriptors: A preliminary study. In: IEEE Intl Conf. on Image Processing (ICIP) 2016, Challenge Session on Mobile Ocular Biometric Recognition, 2016.

28.

Smereka

J.M.

, Kumar

B.V.K.V.

and Rodriguez

, Selecting discriminative regions for periocular verification. In: IEEE Int. Conf. on Identity, Security and Behavior Analysis, 2016, pp. 1–8.

29.

Dong

and Woodard

D.L.

, Eyebrow shape-based features for biometric recognition and gender classification: A feasibility study. In: 2011 Intl Joint Conf. on Biometrics (IJCB). 2011, pp. 1–8.

30.

Rattani

, Derakhshani

, Saripalle

S.K.

and Gottemukkula

, “ICIP 2016 competition on mobile ocular biometric recognition,” in 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp. 320–324.

31.

Krizhevsky

, Sutskever

and Hinton

, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural In! Process. Syst. 2012, pp. 1106–1114.

32.

Razavian

A.S.

, Azizpour

, Sullivan

and Carlsson

, “CNN features off-the-shelf: An astounding baseline for recognition,” Proc.Workshop of IEEE Can! Camp.Vis. Patt. Recogn., 2014.

33.

Liu

, Shen

and van den Hengel

, “The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification,” ArXiv14117466 Cs, Nov. 2014.

34.

Agrawal

, Girshick

and Malik

, “Analyzing the performance of multilayer neural networks for object recognition,” in Proc. Eur. Can! Camp. Vis, 2014.

35.

Deng

, Hu

and Guo

, In defense of sparsity based face recognition. In: 2013 IEEE Conference on ComputerVision and Pattern Recognition (CVPR), pp. 399–406, 2013.

36.

Akhtar

Naveed

, Shafiat

Faisal

and Mian

Ajmal

, Efficient Classification with Sparsity Augmented Collaborative Representation, Pattern Recognition, https://dx-doi-org.web.bisu.edu.cn/10.1016/j.patcog.2016.12.017

37.

Deep face recognition, Parkhi

O.M.

, Vedaldi

and Zisserman

, Proceedings of the British Machine Vision Conference (BMVC), 2015.

38.

Simonyan

and Zisserman

, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” ArXiv14091556 Cs, Sep. 2014.

39.

, Xia

G.-S.

, Hu

and Zhang

, “Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery,” Remote Sens. 7(11) (2015), 14680–14707.

40.

Bakshi

, Sa

P.K.

, Wang

, Barpanda

S.S.

and Majhi

, “Fast periocular authentication in handheld devices with reduced phase intensive local pattern”, Multimedia Tools and Applications 77(14) (2018), 17595–17623.

41.

Ahuja

, Islam

, Barbhuiya

F.A.

and Dey

, “Convolutional neural networks for ocular smartphone-based biometrics,”, Pattern Recognit. Lett. 91 (2017), 17–26, 2017.

42.

Wright

, Yang

A.Y.

, Ganesh

, Sastry

S.S.

and Ma

, “Robust Face Recognition via Sparse Representation,” IEEE Trans. Pattern Anal. Mach. Intell 31(2) (2009), 210–227.

43.

Garg

, Baweja

, Ghosh

, Singh

, Vatsa

and Ratha

, “Heterogeneity Aware Deep Embedding for Mobile Periocular Recognition,” 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), Redondo Beach, CA, USA, 2018, pp. 1–7. doi: 10.1109/BTAS.2018.869855.

44.

Chen

and Donoho

, October. Basis pursuit. In Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers 1994, (Vol. 1, pp. 41–44). IEEE.

45.

Tipping

M.E.

, “Sparse Bayesian Learning and the Relevance Vector Machine,” J Mach Learn Res 1 (2001), 211–244.

46.

Shekaramiz , et al., Bayesian compressive sensing of sparse signals with unknown clustering patterns, Entropy 21(3) (2019), 247.

Mobile device	Number of images
	Enrollment set	Validation set
iPhone	14077	13208
Oppo	21976	21349
Samsung	12197	12240

Identification accuracy (%)
KNN	Ridge regression	SRC*CRC	Max(SRC,CRC)	SA-CRC
93.48	96.37	96.97	96.20	96.98

Time in seconds
Pre-processing	CNN features	PCA	Matching	total
0.045	0.437	0.0047	0.0363	0.5663

ConvSRC: SmartPhone-based periocular recognition using deep convolutional neural network and sparsity augmented collaborative representation

Abstract

Keywords

1 Introduction

2.2 Sparsity Augmented Collaborative Representation based Classifier (SA-CRC)

Table 1 Detail of the enrolment and validation sets of VISOB Dataset Mobile device Number of images Enrollment set Validation set iPhone 14077 13208 Oppo 21976 21349 Samsung 12197 12240

4.1 Global vs local deep features

4.1.1 Global deep CNN features

Table 2 Verification results using features extracted from FC6, FC7 and FC8 and SA_CRC Layer Size-FV Results EER GMR at FAR = 0.1 GMR at FAR = 0.01 GMR at FAR = 0.001 FC6 4096 0.88 99.13 98.32 94.42 FC7 4096 1.21 98.66 96.77 85.47 FC8 2622 1.89 97.65 96.1 77.81

Table 4 Verification results of features extracted from conv5_1, conv5_2 and conv5_3 Layer Size-FV Results EER GMR@ FAR = 0.1 GMR@ FAR = 0.01 GMR@ FAR = 0.001 Conv5_3 PCA-980 0.74 99.66 98.79 92.74 Conv5_2 PCA-1300 0.27 100 99.46 96.08 Conv5_1 PCA-876 0.81 99.33 98.59 91.12

Table 6 Verification results of the combination of local and global features Global feature Local feature Results EER GMR at FAR = 0.1 GMR at FAR = 0.01 GMR at FAR = 0.001 FC6 ConvSRC 0.81 98.66 97.76 97.23

5.1 Verification problem

Table 10 Effectiveness of CNN Extraction method Results EER GMR @ FAR = 0.1 GMR @ FAR = 0.01 GMR @ FAR = 0.001 ConvSRC 0.08 99.96 99.88 97.25 Original image 0.7 99.47 98.69 88.57

Table 11 Effectiveness of SA-CRC Identification accuracy (%) KNN Ridge regression SRC*CRC Max(SRC,CRC) SA-CRC 93.48 96.37 96.97 96.20 96.98

Table 14 Average time needed by the proposed method Time in seconds Pre-processing CNN features PCA Matching total 0.045 0.437 0.0047 0.0363 0.5663

Conflict of interest statement

Footnotes

Acknowledgments

References

Table 1
Detail of the enrolment and validation sets of VISOB Dataset

Mobile device Number of images

Enrollment set Validation set

iPhone 14077 13208

Oppo 21976 21349

Samsung 12197 12240

Table 2
Verification results using features extracted from FC6, FC7 and FC8 and SA_CRC

Layer Size-FV Results

EER GMR at FAR = 0.1 GMR at FAR = 0.01 GMR at FAR = 0.001

FC6 4096 0.88 99.13 98.32 94.42

FC7 4096 1.21 98.66 96.77 85.47

FC8 2622 1.89 97.65 96.1 77.81

Table 4
Verification results of features extracted from conv5_1, conv5_2 and conv5_3

Layer Size-FV Results

EER GMR@ FAR = 0.1 GMR@ FAR = 0.01 GMR@ FAR = 0.001

Conv5_3 PCA-980 0.74 99.66 98.79 92.74

Conv5_2 PCA-1300 0.27 100 99.46 96.08

Conv5_1 PCA-876 0.81 99.33 98.59 91.12

Table 6
Verification results of the combination of local and global features

Global feature Local feature Results

EER GMR at FAR = 0.1 GMR at FAR = 0.01 GMR at FAR = 0.001

FC6 ConvSRC 0.81 98.66 97.76 97.23

Table 10
Effectiveness of CNN

Extraction method Results

EER GMR @ FAR = 0.1 GMR @ FAR = 0.01 GMR @ FAR = 0.001

ConvSRC 0.08 99.96 99.88 97.25

Original image 0.7 99.47 98.69 88.57

Table 11
Effectiveness of SA-CRC

Identification accuracy (%)

KNN Ridge regression SRC*CRC Max(SRC,CRC) SA-CRC

93.48 96.37 96.97 96.20 96.98

Table 14
Average time needed by the proposed method

Time in seconds

Pre-processing CNN features PCA Matching total

0.045 0.437 0.0047 0.0363 0.5663