Occluded face recognition using NonCoherent dictionary

Abstract

Growing security threats, including terrorism, call for stronger security systems. Biometric systems and surveillance cameras are now widely deployed, but they need robust automatic face recognition that can identify a person in an unconstrained environment. In such environments, images are affected by occlusion such as a scarf, goggles, or a random object, and these variations reduce recognition performance. The accuracy of face recognition also depends on the number of labeled samples and on the variation available in the training dataset. Some applications, such as passport verification and identification, have only a few training samples with little or no occlusion, which is not enough to handle unconstrained conditions. Many researchers have addressed this problem with occlusion-based training datasets in which the training and testing sets share common variation. This paper tackles occlusion by designing a NonCoherent dictionary. The dictionary is built in two steps: first, the occlusion is extracted from the face image; second, NonCoherent samples are created. Extensive experiments on benchmark face databases compare state-of-the-art SRC methods using the NonCoherent dictionary and a normal dictionary, and also compare the sparse coefficients of each method. The results show the effectiveness of the proposed model.

Keywords

Face recognition, sparse representation, occlusion, dictionary

1 Introduction

Over the past three decades, face recognition has attracted growing attention from computer vision researchers because of its practical importance. Face recognition based biometric systems and surveillance cameras are deployed in homes and corporate offices for authentication. Where surveillance cameras are installed in unconstrained environments, it is difficult to acquire images of the quality obtained in constrained environments. In unconstrained conditions, images are affected by occlusion such as a scarf, goggles, or a random object. Occlusion changes the subspace of the same person, so identifying a person from an image acquired by a surveillance camera becomes a complex problem.

Researchers have published numerous feature extraction methods [1–4] and classification methods [6–9] for face recognition and have achieved state-of-the-art results in restricted conditions. These methods work well in constrained environments but poorly in unconstrained conditions. The performance of face recognition in unconstrained environments still needs improvement and remains an open research issue, because basic face recognition is highly sensitive to occlusion and needs a large number of training samples with variation to perform well. Undersampled face recognition, where very few training samples of each person are available, covers some of the most important real-world applications, such as law enforcement, passport identification, and authentication. Undersampled methods fall into three categories. The first is patch-based representation [18], which divides the image into a number of patches. The second creates virtual images from the training samples. The third [19] uses a generic training dataset.

Recently, representation models have gained much attention in face recognition because of their ability to handle occlusion and noise. These models include sparse representation based classification (SRC) [9], collaborative representation based classification (CRC) [10], and linear regression based classification (LRC) [11]. SRC [9] represents a test sample as a linear combination of dictionary atoms from all classes and selects the class that produces the minimum residual. The residual of a representation model is measured using the ℓ₁-norm or the ℓ₂-norm. Among the representation models, sparse representation has achieved the best performance. However, SRC performs poorly when each class has only a few samples to represent a test sample, and several researchers have tried to resolve this problem. In Extended SRC (ESRC) [12], the authors created an intra-variant dictionary by subtracting the centroid image of each class from the samples of that class. Other SRC based methods, such as SSRC [13], SDR [14], and S³RC [15], also create an intra-variant dictionary. But these methods fail under unconstrained conditions such as occlusion, expression, and illumination variation.

This paper proposes a new NonCoherent dictionary that incorporates samples with occlusion variation. Existing studies have not addressed occlusions of different colors and shapes; they assume occlusion of a single color. They also reconstruct images by eliminating the occlusion, but reconstructed images can lose discriminative features, which results in a poisoning effect. We tackle these issues by designing a NonCoherent dictionary in two steps: first, the occlusion is extracted from the face image; second, NonCoherent samples are created. The objective of this paper is to extract the occlusion from a face image and to design a NonCoherent dictionary that handles occlusions of different colors and shapes. Substantial experiments on benchmark face databases compare state-of-the-art SRC methods using the NonCoherent dictionary and a normal dictionary, and also compare the sparse coefficients of each method. The experimental results show that the NonCoherent dictionary performs better than the existing dictionary.

The remainder of the paper is organized as follows. Section 2 reviews existing sparse representation classification methods. Section 3 discusses the design of the NonCoherent dictionary. Section 4 gives detailed experimental results and analysis. Finally, Section 5 draws the conclusion of the paper.

2 Related work

SRC [9] was introduced for face recognition by J. Wright et al. in 2009. SRC finds the sparsest representation of a test image over a dictionary constructed from all the training samples and assigns the test image to the class that gives the smallest residual error. SRC performs well on corrupted and occluded images, but only if samples with these variations are present in the training data set. SRC uses the training samples as dictionary atoms, and its time complexity depends on the size of the dictionary. Let $A = [A_{1}, A_{2}, \ldots, A_{k}] \in R^{m \times n}$ be the set of training samples of all classes, called the dictionary. Each dictionary column is referred to as an atom, and the dictionary has n atoms in total. k is the number of classes, and $A_{i} = [A_{i,1}, A_{i,2}, \ldots, A_{i,N}]$ is the set of N samples that belong to the i^th class. $x \in R^{n}$ is a sparse vector with very few nonzero elements. A test sample y can be represented linearly in terms of all training samples by solving Equation 1 [9].

$\hat{x}_{1} = \arg \min_{x} \| x \|_{1} \quad \text{s.t.} \quad \| y - Ax \|_{2} \leq \varepsilon$ (1)
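As an illustration of how Equation 1 can be solved in practice, the following minimal sketch applies the iterative shrinkage-thresholding algorithm (ISTA) to the Lagrangian form $0.5 \| y - Ax \|_{2}^{2} + \lambda \| x \|_{1}$, a common relaxation of the constrained problem. ISTA, the weight `lam`, and the function names are assumptions made for this example; the solver actually used is not stated here. Plain Python lists keep the sketch self-contained.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the l1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_sparse_code(A, y, lam=0.01, iters=500):
    """Approximately minimise 0.5*||y - A x||_2^2 + lam*||x||_1.

    A is a list of m rows with n entries each (the columns are atoms).
    This Lagrangian form is an assumed relaxation of Equation 1.
    """
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue
    # of A^T A, so 1/L is a safe (conservative) step size.
    L = sum(a * a for row in A for a in row)
    step = 1.0 / L
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - y.
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the fidelity term, g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft thresholding.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x
```

On a toy dictionary with orthonormal atoms, for example the 2 × 2 identity with y = [0, 2] and `lam=0.1`, the coefficient of the unused atom stays exactly zero and the other converges to y shrunk by `lam`, which shows the sparsity-inducing effect of the ℓ₁ term.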

SRC needs many training samples covering different variations to achieve high accuracy. The sparsity constraint $\| x \|_{1}$ and the signal fidelity term $\| y - Ax \|_{2}$ are the two important terms in Equation 1, and both can affect the performance of sparse representation based face recognition. Several extended versions of SRC have been developed by modifying the sparsity constraint, the signal fidelity term, or the dictionary, or by fusing new methods into SRC: the methods in [16–18] modify the sparse coefficients, [19] modifies the dictionary, and [5] fuses existing methods into SRC. The residual e = y - Ax may not follow a Gaussian or Laplacian distribution when the images are corrupted or contain other variations. Robust sparse coding (RSC) [22] introduces a weight on the signal fidelity term $\| y - Ax \|_{2}$ of Equation 1 and shows better performance on images with variation. Still, RSC and many other SRC based methods require a large number of training samples with all the required variation. Thus, to apply SRC to undersampled face recognition, new algorithms need to be developed for undersampled dictionary-based applications. There are two ways to construct an auxiliary dictionary: the first subtracts the centroid of each class from the samples of the same class or from external samples [12, 13], and the second learns the dictionary with a learning algorithm [19].

Too few training samples can lead to misclassification of a disguised test sample. The Extended SRC (ESRC) method proposed by Deng et al. [12] tries to overcome the undersampled problem. ESRC introduces an additional intraclass variant dictionary, created by subtracting the prototype or centroid of a class from the samples of the same class. The linear representation of Equation 1 is modified by introducing the intraclass dictionary G in Equation 2 [12]; this intraclass variant matrix represents the local expression, environmental illumination, and occlusion of the face image. $\begin{pmatrix} \hat{x}_{1} \\ \hat{\beta}_{1} \end{pmatrix} = \arg \min_{x, \beta} \left\| \begin{pmatrix} x \\ \beta \end{pmatrix} \right\|_{1} \quad \text{s.t.} \quad \left\| \begin{pmatrix} A & G \end{pmatrix} \begin{pmatrix} x \\ \beta \end{pmatrix} - y \right\|_{2} \leq \varepsilon$ (2)
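The construction of the intraclass variant dictionary G described above (each sample minus the centroid of its class) can be sketched as follows. The function name and the list-of-lists data layout are assumptions for illustration; images are taken to be already flattened into vectors.

```python
def intraclass_variant_dictionary(classes):
    """Build variation atoms by subtracting each class centroid.

    classes: list of classes, each a list of sample vectors
    (lists of floats of equal length). Returns one atom per sample.
    """
    atoms = []
    for samples in classes:
        dim = len(samples[0])
        # Centroid (prototype) of the class: mean over its samples.
        centroid = [sum(s[d] for s in samples) / len(samples)
                    for d in range(dim)]
        # Each atom keeps only the deviation from the class centroid.
        for s in samples:
            atoms.append([s[d] - centroid[d] for d in range(dim)])
    return atoms
```

Because every atom is a deviation from its own class centroid, the atoms of one class sum to zero, which is why a class with little variation contributes atoms that are close to zero.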

ESRC represents a test sample using two dictionaries, A and G. Only the first dictionary A takes part in the classification of a test sample. The second dictionary is not used for classification because its atoms are not associated with any class label, but it helps to produce a more stable sparse vector. The extra variant dictionary is constructed from the variation among samples of the same class, so when the training samples contain little variation, a test sample may be assigned to the wrong class. ESRC needs many training samples with wide variation; otherwise the intraclass variant dictionary may collapse. The same authors addressed this lack of variation with the superposed SRC (SSRC) method [13]. In this method, the dictionary atoms of each class are replaced by a single prototype or centroid of the class, and the intraclass variant dictionary is constructed from a set of generic samples from another dataset. However, this method does not provide reliable classification, because only the single prototype or centroid of each class is used to represent a test sample and determine its label. Iliadis et al. [21] combined the sparsity approach of ESRC with least squares and proposed SR_RLS, which first finds the sparse coefficient vector x using ESRC and then constructs a new, smaller dictionary from the atoms whose sparse coefficients are nonzero. The authors used two classification mechanisms to find the correct class instead of learning a proper prototype for each class. SR_RLS improves on the performance of ESRC but does not resolve its issues.

Jiang et al. addressed the issues of ESRC with the SDR method [14], which represents a test sample by a combination of a class-specific dictionary and a dense combination of a non-class-specific dictionary (the intraclass variation dictionary). ESRC takes the variation from the same class to represent a test sample, whereas SDR takes the collaborative variation of different classes. In SDR, the class-specific component is first extracted from natural images. The methods in [12, 21] need to know the variation of the test sample in advance and require more variation in the samples to create an intra-variant dictionary. SSRC uses a linear operation to construct the prototype of each class and is therefore not effective against the nonlinear variation encountered in face recognition. For a single labeled sample per person (SLSPP), the existing methods have not reached benchmark performance. Gao et al. [15] proposed the S³RC method, which uses the ESRC framework with a modified gallery dictionary. S³RC uses two dictionaries: a gallery dictionary and an intra-variant dictionary. The intra-variant dictionary helps to rectify linear variation, and the gallery dictionary is used to reduce nonlinear variation. ESRC and SSRC do not use any learning mechanism to construct the gallery dictionary and so cannot resolve the nonlinear issue. S³RC uses a probability-based Gaussian mixture model (GMM) to find a single prototype for each class from labeled and unlabeled samples; the GMM prototype of each class is learned with a semi-supervised EM algorithm. GMMs were applied to face recognition long ago but did not attract much attention in the field, because they model the distribution of pixel values and this distribution changes with the variation of the samples. Hence, it may not be possible to create a single discriminative prototype for each class.

So far we have reviewed existing methods based on sparse representation. Other researchers have worked on detecting and eliminating occlusion, which plays a major role in reconstructing a new image. Y. Li et al. proposed a technique [17] to eliminate occlusion from the face image in two steps: occlusion detection followed by image reconstruction. Downsampled SRC is used to detect the occlusion, and the image is reconstructed with a linear reconstructive subspace. The reconstructed samples are then used for face recognition. S. Zhao et al. proposed the DLSR method [16] to tackle occlusion. This method divides an image into four blocks, so that the dictionary is split into four subdictionaries. A unit dictionary is added to each subdictionary to estimate the occluded pixels, and the occlusion is detected in the block with the maximum value. The authors of DLSR do not describe how to reconstruct the image when the occlusion spans two blocks, nor how the occluded pixels are estimated. D. Lin et al. proposed a method [20] that detects facial occlusion by measuring the skin color area ratio (SCAR). This method is not effective because skin color can change under different illumination and lighting effects. To tackle occluded face recognition with undersampled data, this paper proposes an SRC based method that can deal with occlusion of different colors.

3 Methodology

As observed in the review, existing SRC based methods need more variation in the training samples, or external samples, to create an intraclass variant dictionary. In most existing methods, researchers create common intra-variant samples for all classes. These samples are shared by all training samples, so individuals with similar features may be misclassified. To resolve this problem, a NonCoherent dictionary is created. The proposed approach is depicted in Fig. 3 and consists of three parts: 1. Extraction of occlusion 2. Design of NonCoherent dictionary and 3. Recognition.

3.1 Extraction of occlusion

The face is a complex image in which detecting and extracting occlusion is a difficult task. Occlusion of up to 5% does not noticeably affect face recognition performance, but larger occlusion can reduce it. Occlusion can be handled in two ways: by reconstructing the image after removing the occlusion, or by adding occlusion variation to the training samples. This paper uses the second approach. The face is mostly occluded by goggles or a scarf, but it may sometimes be occluded by a random object.

3.1.1 Occlusion due to scarf and goggle

Goggles and a scarf occupy almost 20% and 40% of the face, respectively. The mouth and eyes in the face image are detected by the Viola-Jones algorithm [23] to identify whether the test image is occluded. Existing work [26–29] assumes occlusion of a single color, which may not hold in practice, so the occlusion color is detected from the pixel frequency. The extraction of goggles from a face image is shown in Fig. 1 and is carried out using Algorithm 1 (Section 3.1.3). Goggles reflect the source light, so some parts are missing after extraction, as shown in the third row of Fig. 1. Morphological operations are used to fill the missing parts and remove unwanted parts using Equations 3, 4, and 5. $I \oplus S = \{ z \mid (\hat{S})_{z} \cap I \neq \emptyset \}$ (3) $I \ominus S = \{ z \mid (S)_{z} \subseteq I \}$ (4) $I \bullet S = (I \oplus S) \ominus S$ (5) where I is an image, S is a structuring element, $\hat{S}$ is the reflection of S, and $(S)_{z}$ is S translated by z, a pixel position in the image I.
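Equations 3 to 5 can be sketched on a binary mask stored as a set of foreground pixel coordinates. The set representation, the function names, and the 3 × 3 square structuring element are assumptions for illustration; the structuring element used in the experiments is not specified. Erosion is written in its standard form, which keeps the positions z at which the translated element fits entirely inside the image.

```python
def dilate(image, selem):
    """Dilation (Eq. 3): every foreground pixel shifted by every offset."""
    return {(r + dr, c + dc) for (r, c) in image for (dr, dc) in selem}


def erode(image, selem):
    """Erosion (Eq. 4): positions where the translated element fits in image."""
    # Any valid position z satisfies z + s0 in image for a fixed offset s0,
    # so candidates can be generated from the foreground pixels alone.
    dr0, dc0 = next(iter(selem))
    candidates = {(r - dr0, c - dc0) for (r, c) in image}
    return {(r, c) for (r, c) in candidates
            if all((r + dr, c + dc) in image for (dr, dc) in selem)}


def close(image, selem):
    """Closing (Eq. 5): dilation followed by erosion; fills small holes."""
    return erode(dilate(image, selem), selem)


# Assumed 3 x 3 square structuring element centred on the origin.
SQUARE = {(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)}
```

For example, a 3 × 3 block of foreground pixels whose centre pixel is missing (as happens where goggles reflect light) is restored to the full block by `close(block, SQUARE)`.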

Fig. 1

Extraction of goggle from the face image.

3.1.2 Random occlusion

Random occlusion is caused by various obstacles and is difficult to locate in the face image. Previous work [26–29] did not consider the shape and color of the occlusion, although both can vary in practice. Generally, a face image has only two dominant colors, hair and skin, so its pixels form two clusters. If random occlusion is present in the image, the pixels form three clusters, as shown in Fig. 2.

Fig. 2

Clustering of a face image with random occlusion. a. Input image b. Clusters c. Object in cluster 1 d. Object in cluster 2 e. Object in cluster 3.

Figure 2 shows how the occlusion is extracted from the face image. For random occlusion, input images with occlusions of different colors and shapes are used. Algorithm 1 (Section 3.1.3) is then applied to extract the occluded segment or component from the face image, as shown in Fig. 4.d.

Fig. 3

Model to design a NonCoherent dictionary.

Fig. 4

AR face dataset samples of one individual a. Neutral face b. Illumination c. Goggle occlusion d. Scarf occlusion.

3.1.3 Occlusion extraction algorithm

The occlusion is extracted based on pixel values. First, k-means clustering is applied to the image to create the clusters. $dist(p, q) = \sqrt{(p - q)^{2}}$ (6) $mask_{i} = \arg \min_{1 \leq i \leq k} dist(c_{i}, x_{j}), \quad 1 \leq j \leq d$ (7)

Let $c_{1}, \ldots, c_{k}$ be the initial cluster centers; each pixel of the image is assigned to its closest cluster center c_i. k is the number of clusters to be formed, and d is the number of pixels in the image. mask_i holds the pixels of cluster i, and a unique label is assigned to all of its pixels using Equation 8. For each cluster c_i, the center is updated to the mean of all the points x_j assigned to it. $\underset{1 \leq i \leq k}{mask_{i}} = \begin{cases} 1 & x_{j} > 0 \\ \vdots \\ k & x_{j} > 0 \end{cases}$ (8) where mask_i is the i^th cluster of the image, in which all nonzero pixels are labeled with the value i, and x_j is a pixel value of cluster mask_i.
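The clustering step of Equations 6 and 7, followed by the center update described above, can be sketched for scalar pixel intensities as follows. The function name, the fixed iteration count, and the explicit initial centers are assumptions for illustration; in one dimension the distance of Equation 6 reduces to an absolute difference.

```python
def kmeans_pixels(pixels, centers, iters=20):
    """Cluster scalar pixel values around k centers.

    Returns (labels, centers), where labels[j] is the 1-based index of
    the center closest to pixels[j], matching the labels 1..k.
    """
    centers = list(centers)
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: each pixel goes to its closest center (Eq. 7).
        for j, x in enumerate(pixels):
            dists = [abs(c - x) for c in centers]  # Eq. 6 in one dimension
            labels[j] = dists.index(min(dists)) + 1
        # Update step: each center becomes the mean of its assigned pixels.
        for i in range(len(centers)):
            members = [x for x, lab in zip(pixels, labels) if lab == i + 1]
            if members:  # an empty cluster keeps its previous center
                centers[i] = sum(members) / len(members)
    return labels, centers
```

With three well separated intensity groups (for instance dark hair, an occluding object, and bright skin) and k = 3, each group ends up with its own label, which is the situation illustrated in Fig. 2.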

$\underset{1 \leq i \leq k}{mask_{i}} (x \pm j, y \pm j)_{-1 \leq j \leq 1} = \begin{cases} \text{set flag},\ x = x_{1} \text{ and } y = y_{1} & \text{if } mask_{i}(x, y) = i \\ Count\_Component{+}{+} & \text{if } mask_{i}(x, y) = 0 \end{cases}$ (9) where (x, y) is the coordinate of the current pixel of cluster mask_i and (x₁, y₁) is the coordinate of the next pixel of the component. The flag marks a pixel as already used, so that it is not considered for another component. Count_Component counts the number of components in each cluster. A cluster can have several components, and each component can have a different number of pixels. A component that contains less than 5% of the pixels of the image is rejected using Equation 10. $mask_{i} = \begin{cases} 0,\ Count\_Component{-}{-} & Count\_Pixel\_Component \leq 5\% \\ 1 & \text{otherwise} \end{cases}$ (10) where Count_Pixel_Component is the number of pixels in the component and 5% is a percentage of the pixels of the input image.
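The component search of Equation 9 and the 5% rejection rule of Equation 10 can be sketched with a breadth-first traversal over the 8-neighbourhood (offsets of -1, 0, and 1 in each direction). The function names and the use of a queue are assumptions for illustration; the `visited` array plays the role of the flag that prevents a pixel from being counted in two components.

```python
from collections import deque


def connected_components(mask):
    """Return the 8-connected components of a 2-D 0/1 mask.

    Each component is a list of (row, col) coordinates.
    """
    rows, cols = len(mask), len(mask[0])
    visited = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not visited[r][c]:
                visited[r][c] = True  # flag: pixel already used
                queue = deque([(r, c)])
                comp = []
                while queue:
                    cr, cc = queue.popleft()
                    comp.append((cr, cc))
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            nr, nc = cr + dr, cc + dc
                            if (0 <= nr < rows and 0 <= nc < cols
                                    and mask[nr][nc]
                                    and not visited[nr][nc]):
                                visited[nr][nc] = True
                                queue.append((nr, nc))
                components.append(comp)
    return components


def drop_small_components(mask, min_fraction=0.05):
    """Keep only components larger than min_fraction of the image (Eq. 10)."""
    total = len(mask) * len(mask[0])
    return [comp for comp in connected_components(mask)
            if len(comp) > min_fraction * total]
```

On a 5 × 5 mask containing a 2 × 2 block and one isolated pixel, two components are found, and the isolated pixel (4% of the image) is rejected while the block (16%) is kept.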

Algorithm 1: Extraction of occlusion from face image.
1:	Initialize the number of clusters and their centroids, either randomly or with the help of a frequency count.
2:	Apply the clustering technique using Equation 7.
3:	Assign a unique label to each cluster using Equation 8.
4:	Each cluster may have several components; remove the components that have less than 5% of the pixels using Equation 10.
5:	Find and count the connected components of each cluster using Equation 9.
6:	if Count_Component == 1 && Count_Pixel_Component ≤ 30% then
	Return cluster
7:	else
	Repeat steps (1)-(4) for each segment.
8:	end if
9:	The extracted occlusion is used to create NonCoherent samples.

where Count_Component is the number of segments present in each cluster and Count_Pixel_Component is the number of pixels available in a segment.
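The acceptance test in step 6 of the algorithm can be written as a small predicate. This is a minimal sketch: the function name and the argument layout (a list of component sizes per cluster) are assumptions, and the 30% threshold is the one stated in the algorithm.

```python
def is_occlusion_cluster(component_sizes, image_pixels, max_fraction=0.30):
    """Step 6: accept a cluster that consists of exactly one component
    covering at most max_fraction of the image pixels."""
    if len(component_sizes) != 1:
        return False
    return component_sizes[0] <= max_fraction * image_pixels
```

For a 50 × 50 image (2500 pixels), a cluster with a single 500-pixel component is returned as the occlusion, whereas a cluster with a 1000-pixel component or with two components is sent back for another round of clustering.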

3.2 Design NonCoherent dictionary

Let A = [A₁, A₂, . . . , A_k] be a dictionary that contains all samples of the training data set, where k is the number of persons (classes) in the training data set. This section designs a NonCoherent dictionary from NonCoherent samples that contain goggle, scarf, and random occlusion.

3.2.1 Design NonCoherent samples for each class

Let A_i = [A_i,1, A_i,2, . . . , A_i,N] represent the set of samples that belong to the i^th class of the training data set. Each class has N samples. NonCoherent samples are created from each sample of every class using the occlusion extracted by Algorithm 1 (Section 3.1.3). Let $D_{i, j} = [A_{i, j}^{1}, A_{i, j}^{2}, \ldots, A_{i, j}^{p}]$ where D_i,j is the set of NonCoherent samples created from the j^th sample of the i^th class, and p is the number of NonCoherent samples (scarf, goggle, and random occluded samples) created from the training sample A_i,j. Since each class has N samples, a class has (N * p + N) samples in total once the NonCoherent samples are added. As the number of samples increases, the size of the dictionary and hence the time complexity also increase. We therefore select 2N of the (N * p + N) samples, choosing those most similar to the test image y, as discussed in Section 3.2.2.
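One plausible way to generate the p NonCoherent samples of a training image is to paste each extracted occlusion onto a copy of the neutral face. The sketch below illustrates that reading only; the overlay rule, the function names, and the representation of an occlusion as a (mask, value) pair are assumptions, since the exact generation procedure is not spelled out here.

```python
def overlay_occlusion(face, mask, value):
    """Return a copy of face in which masked pixels are set to value."""
    return [[value if mask[r][c] else face[r][c]
             for c in range(len(face[0]))]
            for r in range(len(face))]


def noncoherent_samples(face, occlusions):
    """Create one occluded copy of face per (mask, value) pair.

    occlusions: list of p pairs, e.g. a scarf, a goggle, and a random
    occlusion, each with its own binary mask and fill intensity.
    """
    return [overlay_occlusion(face, mask, value) for mask, value in occlusions]
```

With N = 3 training samples per class and p = 2 occlusions per sample, this produces N * p = 6 occluded samples, giving N * p + N = 9 samples per class before the 2N most similar ones are selected.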

3.2.2 Measure similarity between test image and NonCoherent sample

The dictionary should be compact and discriminative, but creating NonCoherent samples increases its size. To reduce the size of the dictionary, the 2N most similar samples are selected from the (N * p + N) NonCoherent samples. Euclidean distance is used to measure the similarity between the test image y and the NonCoherent samples of a class. Let $DNC_{i} = [A_{i, 1}^{1}, A_{i, 2}^{2}, \ldots, A_{i, N}^{N}]$ where DNC_i is the NonCoherent dictionary of the i^th class. $DNC_{i} = \min \| y - A_{i, j}^{p} \|_{2} \quad \text{s.t.} \quad size(DNC_{i}, 2) < N$ (11)
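The selection of the most similar samples by Euclidean distance can be sketched as a sort followed by truncation. The function name and the `keep` parameter are assumptions for illustration; `keep` would be set to 2N for a class with N training samples.

```python
import math


def select_most_similar(candidates, y, keep):
    """Return the `keep` candidate vectors closest to y (Euclidean distance)."""
    def dist(a):
        return math.sqrt(sum((ai - yi) ** 2 for ai, yi in zip(a, y)))
    # Sorting by distance puts the most similar samples first.
    return sorted(candidates, key=dist)[:keep]
```

Because the selection depends on the test image y, the per-class dictionary is rebuilt for each test sample, which trades some computation at test time for a smaller dictionary.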

3.2.3 Create NonCoherent dictionary for training data set

The NonCoherent dictionary is created by combining the 2N most similar samples of each class, as estimated by Equation 11. Let $D_{ncoh} = [DNC_{1}, DNC_{2}, \ldots, DNC_{k}]$ be the NonCoherent dictionary. The sparse representation over this dictionary is obtained by solving Equation 12. $\hat{x} = \arg \min_{x} \| x \|_{1} \quad \text{s.t.} \quad \| y - D_{ncoh} x \|_{2} \leq \varepsilon$ (12) where x = [0, 0, 0, α_i,1, α_i,2, . . . , α_i,n, 0, 0, 0, 0]^T is a sparse vector with very few nonzero elements.

3.3 Recognition

A test sample is assigned to the class that minimizes the residual error. Sparse representation assumes that samples from the same class lie on the same subspace. Hence, the test sample y is classified by the minimal residual, or by the class that has the largest number of nonzero entries in the sparse coefficient vector $\hat{x} \in R^{n}$. The residual is computed by Equation 13. $\min_{i} r_{i}(y) = \| y - D_{ncoh} \delta_{i}(\hat{x}) \|_{2}$ (13)

where r_i (y) is the residual of the test sample y with respect to the i^th class and $\delta_{i}(\hat{x})$ contains the sparse coefficients of the i^th class.
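The residual rule of Equation 13 can be sketched as follows, given a sparse vector that has already been computed. The function name, the row-major dictionary layout, and the `labels` list (the class of each atom) are assumptions for illustration.

```python
import math


def classify_by_residual(D, labels, x_hat, y):
    """Assign y to the class with the smallest reconstruction residual.

    D: list of m rows with n entries (columns are atoms).
    labels[j]: class of atom j. x_hat: sparse coefficients (length n).
    Returns (best_class, residuals_by_class).
    """
    residuals = {}
    for cls in sorted(set(labels)):
        # delta_i(x_hat): keep the coefficients of class cls, zero the rest.
        delta = [xj if lab == cls else 0.0 for xj, lab in zip(x_hat, labels)]
        recon = [sum(row[j] * delta[j] for j in range(len(delta)))
                 for row in D]
        residuals[cls] = math.sqrt(
            sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
    best = min(residuals, key=residuals.get)
    return best, residuals
```

When most of the energy of the sparse vector falls on the atoms of one class, that class reconstructs y almost exactly and its residual is much smaller than the others, which is what makes the minimum-residual rule discriminative.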

4 Experimental results

This section presents experimental results on the benchmark face databases AR [24] and LFW [25] to demonstrate the performance of the NonCoherent dictionary. These two databases are used because they contain more occlusion variation than other databases. Results are compared between the NonCoherent dictionary and the normal dictionary on state-of-the-art SRC methods: SRC, ESRC, SSRC, SR_RLS, and SDR. These methods have reported good results on the AR and LFW databases because they use half of the images for training and the rest for testing, so the training and test sets share common variation. In our experiments, neutral images are used for training and occluded images for testing, so the two sets do not share common variation. Hence, the results of these methods are lower than their published results. The NonCoherent dictionary is validated on samples with scarf, goggle, mixed scarf and goggle, and random occlusion of different shapes and colors. An image resolution of 50 × 50 is used for the experiments.

Section 4.1 introduces the publicly available face databases. Sections 4.2, 4.3, 4.4, and 4.5 report the experimental results for scarf, sunglass, combined sunglass and scarf, and random occlusion, respectively. Section 4.5.1 analyzes the experimental results.

4.1 Face database

4.1.1 AR database

The AR database [24] contains 4000 color images of 126 individuals, with 26 samples per individual. The images were captured under varying illumination, occlusion, and facial expression. Samples of one individual are shown in Fig. 4.

4.1.2 LFW

The LFW face database [25] contains images of 1680 individuals collected from the web. The images vary in illumination, pose, expression, and background. The LFW database is a benchmark for face recognition applications. Samples of one individual are shown in Fig. 5.

Fig. 5

LFW face dataset samples of one individual.

4.2 Face recognition with scarf occlusion

We use 100 individuals from the AR database, each with 12 samples. For each individual, 2, 3, 4, 5, or 6 neutral samples are selected randomly for training, and 6 samples occluded with a scarf are used for testing. The results for scarf occlusion are reported in Table 1.

Table 1

Experimental results on AR database with scarf occlusion (columns give the number of training samples per individual)

Tested by	Normal dictionary					NonCoherent dictionary
	2	3	4	5	6	2	3	4	5	6
SRC	13.0	15.7	15.1	15.3	16.8	25.3	32.3	33.6	35.2	37.3
ESRC	14.2	14.8	14.7	16.2	18.8	32.6	35.6	37.3	40.7	41.3
SSRC	14.7	14.5	15.0	17.3	19.7	33.3	35.9	37.9	41.2	42.5
SR_RLS	10.6	15.1	18.0	17.0	17.6	23.7	30.3	34.4	36.5	37.7
SDR_SLR	28.0	33.8	34.4	38.1	32.6	40.3	45.3	47.5	53.7	56.3

Table 2

Experimental results on AR database with goggle occlusion (columns give the number of training samples per individual)

Tested by	Normal dictionary					NonCoherent dictionary
	2	3	4	5	6	2	3	4	5	6
SRC	25.0	23.9	25.6	26.4	25.8	33.7	37.4	39.4	42.5	45.3
ESRC	22.5	21.4	22.0	22.2	22.8	37.5	41.3	45.5	47.3	54.3
SSRC	22.2	21.2	22.3	21.2	20.4	37.7	43.4	46.7	48.4	55.6
SR_RLS	22.0	19.8	18.9	19.3	19.7	31.5	35.7	41.5	43.8	46.3
SDR_SLR	16.5	18.4	19.0	18.1	18.9	41.5	43.6	47.4	51.5	54.8

A scarf occludes almost 40% of the face, so recognition must rely on the remaining 60% of the pixels, which is difficult. SRC based methods require similar occlusion samples in the dictionary, so their results degrade with the normal dictionary. The NonCoherent dictionary provides the variations that SRC methods need to perform better.

4.3 Face recognition with sunglass occlusion

We use 100 individuals from the AR database, each with 12 samples. For each individual, 2, 3, 4, 5, or 6 neutral samples are selected randomly for training, and 6 samples occluded with sunglasses are used for testing. The results for sunglass occlusion are reported in Table 2.

Table 3

Experimental results on AR database with scarf and goggle occlusion (columns give the number of training samples per individual)

Tested by	Normal dictionary					NonCoherent dictionary
	2	3	4	5	6	2	3	4	5	6
SRC	16.9	19.1	20.1	20.7	21.4	29.5	34.9	36.5	38.9	41.3
ESRC	17.6	18.3	19.7	19.9	20.8	35.1	38.5	41.4	44.0	47.8
SSRC	17.3	17.8	18.8	19.4	19.7	35.5	39.6	42.3	44.8	49.1
SR_RLS	16.7	17.3	18.5	19.0	19.9	27.6	33.0	38.0	40.2	42.0
SDR_SLR	21.5	25.1	26.4	26.4	26.3	40.9	44.5	47.5	52.6	55.6

Sunglasses occlude almost 20% of the face and change the overall pixel variation, so the results of SRC based methods degrade. The NonCoherent dictionary provides the variations that SRC methods need to perform better.

4.4 Face recognition with sunglass and scarf occlusion

We use 100 individuals from the AR database, each with 12 samples. For each individual, 2, 3, 4, 5, or 6 neutral samples are selected randomly for training, and 12 samples occluded with sunglasses and a scarf are used for testing. The results for combined sunglass and scarf occlusion are reported in Table 3.

Table 4

Experimental results on AR dataset with random occlusion (columns give the number of training samples per individual)

Tested by	Normal dictionary						NonCoherent dictionary
	2	3	4	5	6	7	2	3	4	5	6	7
SRC	51.9	58.2	63.9	67.1	70.1	72.1	57.0	62.2	65.7	70	73.6	75.6
ESRC	54.5	61.6	67.9	71.2	75.5	76.7	62.0	67.8	75.2	80.6	84.9	86.1
SSRC	54.5	62.2	67.8	70.5	74.5	76.3	62.5	67.7	75.8	80.8	84.8	86.3
SR_RLS	51.3	58.5	64.5	66.8	71.7	74.4	61.2	66.1	74.2	79.6	84.5	87
SDR_SLR	59	68.4	71.6	70.2	73.2	78.3	71.5	74.7	83.1	88.2	92.5	93.1

Occlusion changes the discriminative features of an image. Existing work on SRC methods concludes that performance increases with the number of training samples. With occluded test images and a normal dictionary, however, performance appears stable and is not affected by the number of training samples. The proposed dictionary provides the variations that SRC methods need to perform better, and its results also improve as the number of training samples increases.

4.5 Face recognition by random occlusion with different shape and color

We use 100 individuals from the AR database, each with 14 samples. For each individual, 2, 3, 4, 5, 6, or 7 neutral samples are selected randomly for training, and 7 samples, occluded manually with different colors and shapes, are used for testing. The results for random occlusion are reported in Table 4.

Table 5

Experimental results on LFW database with random occlusion (columns give the number of training samples per individual)

Tested by	Normal dictionary						NonCoherent dictionary
	2	3	4	5	6	7	2	3	4	5	6	7
SRC	18.7	23.3	28.0	31.9	36.9	37.6	25.6	30.2	34.6	41.6	46.7	54.6
ESRC	20.9	27.1	31.8	36.2	42.3	44.5	26.3	32.4	37.6	45.3	51.4	59.7
SSRC	21.2	28.3	32.0	38.6	43.7	42.7	27.0	31.3	38.7	46.7	53.3	60.3
SR_RLS	18.3	23.3	28.6	31.7	34.4	36.1	23.3	27.7	32.7	43.2	44.3	50.3
SDR_SLR	27.6	35.0	41.1	42.4	46.8	48.4	32.9	42.3	48.7	56.7	63.4	67.8

LFW database

A total of 40 individuals from the LFW database are used, each with 9 samples. For each individual, 2, 3, 4, 5, 6, or 7 neutral samples are randomly selected for training and the remaining samples are used for testing; the test samples are occluded manually with different colors and shapes. The performance on randomly occluded samples is reported in Table 6.

Occlusion in a test image increases misclassification because its position, color, and shape are not known in advance. Test images are occluded manually with different colors and shapes covering 5%–30% of the image, as shown in Fig. 6. The occlusion is detected by Algorithm 3.1.3. The NonCoherent dictionary gives notably better results, as shown in Tables 5 and 6.
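The manual occlusion of test images can be imitated programmatically. The following is a minimal sketch rather than the procedure used in the experiments: it assumes NumPy, uses a single solid-color rectangle as the occluder, and treats the 5%–30% figure as the fraction of image area covered. The function name `occlude_randomly` and its parameters are hypothetical.

```python
import numpy as np


def occlude_randomly(image, fraction, rng):
    """Paste one solid-color rectangle covering roughly `fraction` of the
    image area at a random position. Returns (occluded_image, mask)."""
    h, w = image.shape[:2]
    target_area = fraction * h * w
    # A random aspect ratio gives differently shaped rectangles.
    aspect = rng.uniform(0.5, 2.0)
    block_h = int(round(np.sqrt(target_area * aspect)))
    block_w = int(round(np.sqrt(target_area / aspect)))
    # Keep the block inside the image.
    block_h = min(max(block_h, 1), h)
    block_w = min(max(block_w, 1), w)
    top = rng.integers(0, h - block_h + 1)
    left = rng.integers(0, w - block_w + 1)
    # One random value for grayscale images, one per channel for color images.
    color = rng.integers(0, 256, size=image.shape[2:])
    occluded = image.copy()
    occluded[top:top + block_h, left:left + block_w] = color
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + block_h, left:left + block_w] = True
    return occluded, mask


# Usage: occlude a placeholder gray image at several assumed occlusion levels.
rng = np.random.default_rng(0)
face = np.full((100, 100, 3), 128, dtype=np.uint8)
for fraction in (0.05, 0.10, 0.20, 0.30):
    _, mask = occlude_randomly(face, fraction, rng)
    print(f"requested {fraction:.2f}, covered {mask.mean():.3f}")
```

The returned mask is convenient for checking how much of the image was actually covered, since rounding and clipping make the covered area only approximately equal to the requested fraction.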

Fig. 6

Random occlusion.

4.5.1 Result analysis

As the percentage of occlusion in the test images increases, the performance of the SRC methods decreases. Occlusion of less than 5% does not noticeably affect face recognition performance.

The NonCoherent dictionary is more effective for both random and contiguous occlusion, and Algorithm 3.1.3 (Section 3.1.3) successfully identifies occlusions of different shapes and colors.
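The size of the improvement can be read directly from Table 5. The short script below is only an illustration of that arithmetic: the recognition rates are copied from Table 5, while the names `TABLE5` and `gains` are hypothetical.

```python
# Recognition rates (%) copied from Table 5.
# Each entry: (normal dictionary, NonCoherent dictionary) for 2..7 training samples.
TRAIN_SIZES = [2, 3, 4, 5, 6, 7]
TABLE5 = {
    "SRC": ([18.7, 23.3, 28.0, 31.9, 36.9, 37.6], [25.6, 30.2, 34.6, 41.6, 46.7, 54.6]),
    "ESRC": ([20.9, 27.1, 31.8, 36.2, 42.3, 44.5], [26.3, 32.4, 37.6, 45.3, 51.4, 59.7]),
    "SSRC": ([21.2, 28.3, 32.0, 38.6, 43.7, 42.7], [27.0, 31.3, 38.7, 46.7, 53.3, 60.3]),
    "SR_RLS": ([18.3, 23.3, 28.6, 31.7, 34.4, 36.1], [23.3, 27.7, 32.7, 43.2, 44.3, 50.3]),
    "SDR_SLR": ([27.6, 35.0, 41.1, 42.4, 46.8, 48.4], [32.9, 42.3, 48.7, 56.7, 63.4, 67.8]),
}


def gains(normal, noncoherent):
    """Per-column gain (percentage points) of the NonCoherent dictionary."""
    return [round(b - a, 1) for a, b in zip(normal, noncoherent)]


for method, (normal, noncoherent) in TABLE5.items():
    g = gains(normal, noncoherent)
    print(f"{method:8s} gains for {TRAIN_SIZES} samples: {g}  mean: {sum(g) / len(g):.1f}")
```

Every gain computed this way is positive, so in Table 5 the NonCoherent dictionary improves each method at each training-set size.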

Fig. 7 compares the sparse vectors produced by the SRC method on a randomly occluded sample. SRC predicts the correct class with the NonCoherent dictionary but misclassifies the sample with the normal dictionary. With the NonCoherent dictionary, the coefficients of the predicted class are large and the sparse vector has very few nonzero coefficients. The square box in Fig. 7 marks the sparse coefficient used to predict the class.
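For readers unfamiliar with how a sparse vector leads to a class prediction, the following sketch shows the commonly used SRC decision rule: code the test sample over the dictionary, then choose the class whose coefficients alone give the smallest reconstruction residual. It is an illustration under stated assumptions, not the implementation used in the experiments: the l1 solver is a plain ISTA loop chosen for brevity, and the names `sparse_code_ista`, `src_classify`, and the parameter `lam` are hypothetical.

```python
import numpy as np


def sparse_code_ista(D, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))  # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return x


def src_classify(D, labels, y, lam=0.01):
    """Return (predicted_label, sparse_vector) by minimum class-wise residual."""
    x = sparse_code_ista(D, y, lam)
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)  # keep only the coefficients of class c
        residuals[c] = np.linalg.norm(y - D @ x_c)
    return min(residuals, key=residuals.get), x


# Toy usage: 3 classes with 4 unit-norm atoms each; the test sample mixes two class-1 atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((30, 12))
D /= np.linalg.norm(D, axis=0)
labels = np.repeat([0, 1, 2], 4)
y = 0.6 * D[:, 4] + 0.4 * D[:, 5]
predicted, x = src_classify(D, labels, y)
print("predicted class:", predicted)
```

In this toy case the large coefficients fall on the atoms of the true class, which is the same pattern the comparison in Fig. 7 looks for.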

Figs. 8 and 9 compare the sparse vectors of ESRC and SSRC, respectively. Both methods classify the test sample correctly with either dictionary. With the normal dictionary, the sparse vector is dense in the intra-class variant region, whereas with the NonCoherent dictionary it is very sparse, which suggests that the variation in the NonCoherent samples more closely matches that of the test sample. The coefficient values of the predicted class are also higher with the NonCoherent dictionary.

The SR_RLS method estimates the coefficients on a newly constructed face dictionary, which is built using the nonzero sparse coefficients of ESRC. With the NonCoherent dictionary the correct class is predicted, whereas the normal dictionary leads to misclassification. In this method, however, the coefficients are spread across all classes when the proposed dictionary is used. Fig. 10 compares the sparse coefficients of this method on a randomly occluded sample and shows that the NonCoherent dictionary yields greater sparsity.

Similarly, SDR predicts the correct class with the NonCoherent dictionary. With the NonCoherent dictionary, the sparse coefficient of the correct class is higher than those of the incorrect classes, whereas the opposite holds for the normal dictionary. Fig. 11 compares the sparse coefficients of this method on a randomly occluded sample and shows that the NonCoherent dictionary gives the correct result.
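The two properties discussed for each method, how many coefficients are nonzero and which class holds the largest coefficients, can be summarized numerically. The helper below is a hypothetical sketch: the name `summarize_coefficients`, the tolerance `tol`, and the toy vector are all assumptions made for illustration and are not taken from the experiments.

```python
def summarize_coefficients(x, labels, tol=1e-6):
    """Return (number of nonzero coefficients, {class: sum of |coefficients|})."""
    nonzeros = sum(1 for v in x if abs(v) > tol)
    mass = {}
    for v, c in zip(x, labels):
        mass[c] = mass.get(c, 0.0) + abs(v)
    return nonzeros, mass


# Toy sparse vector over 3 classes with 3 atoms each (values are made up).
x = [0.0, 0.02, 0.0, 0.7, 0.25, 0.0, 0.0, 0.0, -0.03]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
nonzeros, mass = summarize_coefficients(x, labels)
dominant = max(mass, key=mass.get)
print(f"nonzero coefficients: {nonzeros}, dominant class: {dominant}")
```

Applied to the sparse vectors from both dictionaries, a lower nonzero count together with a dominant class equal to the true class would correspond to the behavior attributed to the NonCoherent dictionary.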

Fig. 7

Comparison of sparse coefficients using the normal dictionary and the NonCoherent dictionary on SRC.

Fig. 8

Comparison of sparse coefficients using the normal dictionary and the NonCoherent dictionary on ESRC.

Fig. 9

Comparison of sparse coefficients using the normal dictionary and the NonCoherent dictionary on SSRC.

Fig. 10

Comparison of sparse coefficients using the normal dictionary and the NonCoherent dictionary on SR_RLS.

Fig. 11

Comparison of sparse coefficients using the normal dictionary and the NonCoherent dictionary on SDR.

5 Conclusion

This paper proposed an algorithm for occlusion extraction and designed a NonCoherent dictionary that incorporates different occlusion variations. This dictionary improves the performance of occluded face recognition. The dictionary is designed in two steps: the first step detects and extracts the occlusion from the test image using the proposed occlusion extraction algorithm, and the second step creates the NonCoherent samples from which the NonCoherent dictionary is built. The proposed dictionary was tested on different occlusion databases and showed a marked improvement over the existing dictionary. State-of-the-art SRC methods performed better with the NonCoherent dictionary, and the sparse coefficients of occluded samples were also compared: with the NonCoherent dictionary, the sparse coefficient of the correct class is higher and the correct class is predicted. Future work will aim to extract occlusions whose color is similar to skin or hair and to reduce time complexity.

Funding

This study was funded by the Ministry of Electronics and Information Technology (India) (Grant No.: MLA/MUM/GA/10(37)B).

Conflict of interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgment

This work was supported by the Visvesvaraya PhD Scheme, Government of India.
