Hybrid feature descriptors to detect face spoof attacks

Abstract

Face recognition is widely used in applications such as mobile phone unlocking, credit card authentication and person authentication in airports. A face biometric authentication system can be easily spoofed by a printed photograph, a replayed video of the legitimate user or a 3D face mask. This paper proposes hybrid feature descriptors to detect face spoofing attacks (printed photograph and replay video attacks). The proposed method extracts three different feature descriptors: Color moment, Haralick texture and Color Local Binary Pattern (CLBP). The extracted features are concatenated and classified by Logistic Regression. The performance of the proposed method is evaluated on the Michigan State University Mobile Face Spoofing Database (MSU-MFSD) and found to achieve better results than state-of-the-art methods.

Keywords

Face recognition, spoof detection, color moment, Local Binary Pattern (LBP), Haralick texture

1 Introduction

An automatic Face Recognition System (FRS) is used for user authentication in many access control applications such as mobile phone unlocking, smart attendance and surveillance. Unlike other biometric systems, an FRS does not require any additional sensors; all it needs is a good-quality frontal camera that can capture the face image. A frontal-camera-based 2D face recognition system has to address various challenges such as occlusion, facial expression, illumination and variation in pose. Although many researchers have proposed methods to address these issues, the face spoofing attack remains a major vulnerability of face biometric systems. The issue was formally demonstrated by the Security and Vulnerability Research Team at Black Hat [1], a well-known security conference, where researchers showed how the face of a legitimate user can be spoofed with fake facial photographs of the same user. According to Boulkenafet et al. [7], 39% of the images available on social media can be used for print and replay attacks. Hence, a novel scheme needs to be developed to counter the spoofing attack.

Face spoofing is a deception attack in which the attacker uses fake facial information, such as images or videos of a legitimate user, to fool the security system. For example, printed photographs, videos, masks, screenshots and images shown on mobile devices can be used for face spoofing. In general, spoof attacks against a face recognition system consist of i) printed image attacks, ii) replay video attacks and iii) mask attacks. The printed image attack is a 2D face spoof attack performed using printed photographs. Replay video attacks belong to the same 2D category and are launched by displaying a photograph or video of the target subject on a smartphone. Mask attacks are highly sophisticated and require a 3D fabricated mask of the target face.

A typical countermeasure against spoofing attacks is face liveness detection, which aims at detecting signs of life in the face such as eye blinks, changes of expression and mouth movements. Eye blink detection is one of the key cues in face liveness detection, and many methods have been reported for the automatic detection of eye blinks from video frames. In general, the Viola-Jones [2] detector is applied to detect the face and eye landmarks, followed by adaptive thresholding to estimate the optical flow in the eye area. Finally, the motion of the eye is estimated by correlation matching against templates of the open and closed eye.

In this paper, a novel approach for face spoof detection is proposed by concatenating three different feature descriptors, namely Haralick texture, Color moment and Color Local Binary Pattern (CLBP). To capture the difference in color distribution between genuine and spoof faces, color moment features are extracted in the HSV color model. To distinguish real faces from printed photographs, LBP histograms are extracted from the Cb and Cr channels. Finally, Haralick features are extracted from the gray scale image, so that spoof detection draws on several different color models.

The paper is organized as follows: Section 2 reviews the state of the art methodologies. Section 3 describes the proposed hybrid face spoof detection technique in detail. Section 4 presents the results and discussion. Finally, Section 5 concludes the paper.

2 Related works

Face recognition is the second most widely deployed biometric system across the globe [3]. Like fingerprint, the face is also one of the most studied biometric traits in spoofing related research. This section presents the state of the art methods recently proposed in the field of face anti-spoofing. Figure 1 presents the taxonomy of face liveness detection.

Fig.1

Taxonomy of Face Liveness detection.

A.K. Singh et al. [4] proposed a face recognition system with liveness detection using eye and mouth movement. The methodology is divided into two parts, namely liveness detection and face recognition. The system first checks the liveness of the user and, if the person is live, then recognizes the person. The system fails to detect liveness when the attacker uses a replay video of the legitimate user.

A. Agarwal et al. [5] proposed face anti-spoofing using Haralick texture features. The cropped face region is split into the individual RGB channels, and each channel is divided into non-overlapping blocks. For each block, the Redundant Discrete Wavelet Transform (RDWT) is performed to obtain four sub-bands: approximation (Ha), horizontal (Hh), vertical (Hv) and diagonal (Hd). Haralick texture features are extracted from the four sub-bands of all the blocks and also from the original cropped face image. The final feature vector is formed by concatenating the Haralick features of the original image with those of the four sub-bands of all the blocks. The concatenated features are classified by a Support Vector Machine (SVM).

K. Patel et al. [6] described a face spoof detection scheme for smartphones. Two different features, Local Binary Patterns (LBP) and Color Moments, are extracted from the normalized face image. The extracted features are concatenated and classified by SVM.

Z. Boulkenafet et al. [7] presented a face spoofing detection method using colour texture analysis. The cropped RGB face image is converted into the YCbCr color model and split into individual channels. Color Local Binary Pattern (CLBP) texture features are extracted from each channel of the YCbCr color model. The extracted features are concatenated and classified by SVM.

Di Wen et al. [8] introduced face spoof detection with image distortion analysis. Four different features are extracted from the normalized face image, namely specular reflection, blurriness, chromatic moment and color diversity features. The extracted features are concatenated and classified by an ensemble of classifiers.

The literature survey shows that a significant amount of work has been done on anti-spoofing techniques, and the field has seen many advances in both attack methodologies and detection strategies. Still, some challenges, such as the replay video attack, are yet to be fully addressed. In this paper, face spoofing with replay video is detected using three different feature descriptors, namely Color Moment features from the HSV color model, Haralick texture features from the gray scale image and CLBP features from the YCbCr color model.

3 Ensemble of face spoof detection method

This paper proposes an ensemble of feature descriptors to detect spoof attacks on face biometric authentication systems. Histogram of Oriented Gradients (HOG) [9] based face detection is applied to every frame in the video. Face alignment is performed using the facial feature points localized on the detected face image by an Ensemble of Regression Trees (ERT) [10], followed by an Affine transformation. The ensemble of feature descriptors is extracted from the aligned face image and classified by logistic regression. The architecture of the proposed face liveness detection is shown in Fig. 2.

Fig.2

Proposed face liveness detection.

3.1 Face detection

HOG is a feature descriptor widely used in object detection and related computer vision tasks. HOG based descriptors generate a robust feature set that allows the human face to be accurately discriminated in the input video, even in unconstrained environments. HOG with a Linear SVM is applied to each frame of a video to detect the face region. In the HOG descriptor, the first step is to compute the image gradient in both the x and y directions using Equations (1) and (2).

$G_{x} = I * D_{x}$ (1)

$G_{y} = I * D_{y}$ (2)

where I is the input image, D_x is the filter for the x direction and D_y is the filter for the y direction. The magnitude of the gradient image is calculated using Equation (3).

$G (x, y) = \sqrt{G_{x}^{2} + G_{y}^{2}}$ (3)

The orientation of each pixel of the gradient image is calculated using Equation (4).

$θ = \tan^{- 1} (\frac{G_{y}}{G_{x}})$ (4)

The gradient magnitude and orientation images are divided into cells and blocks. For each cell, a histogram is constructed from the gradient magnitudes |G| and orientations θ: the orientations determine the bins of the histogram and the gradient magnitudes give the weight added to each bin. The cells are grouped into blocks. The histogram of each block is formed by concatenating the gradient histograms of the cells in the block. All the block histograms are concatenated to form the final feature vector. The extracted features are classified using a Linear SVM to detect the faces in the video stream or images. Figure 3 shows the HOG feature descriptor of a face.

Fig.3

HOG descriptor of a face.
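The gradient and histogram steps of Equations (1) to (4) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the central-difference filter [-1, 0, 1], the 9 unsigned orientation bins and the function names `gradients` and `cell_histogram` are assumptions chosen for illustration, and the image is a small nested list rather than a video frame.

```python
import math

def gradients(image):
    """Return per-pixel gradient magnitude and unsigned orientation in degrees.

    Assumes a central-difference filter [-1, 0, 1] for D_x and its transpose
    for D_y (the text does not specify the filters). Border pixels stay zero.
    """
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # Eq. (1)
            gy = image[y + 1][x] - image[y - 1][x]   # Eq. (2)
            mag[y][x] = math.hypot(gx, gy)           # Eq. (3)
            # Eq. (4), folded into [0, 180) for an unsigned orientation
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def cell_histogram(mag, ang, n_bins=9):
    """Orientation histogram of one cell, each pixel weighted by its magnitude."""
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for row_m, row_a in zip(mag, ang):
        for m, a in zip(row_m, row_a):
            hist[int(a // bin_width) % n_bins] += m
    return hist
```

For a cell containing a vertical edge, all of the gradient weight falls into the first bin (orientations near 0 degrees); for a horizontal edge it falls into the bin around 90 degrees. Block grouping, normalization and the Linear SVM are omitted here.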

3.2 Face alignment

For each detected face in the image, the pre-trained facial landmark detector of Dlib [11] library is used to estimate the location of 68 coordinates that map to facial structures on the face. The facial landmark detector used in Dlib library is based on Ensemble of Regression Trees (ERT) [10]. The positions of the 68 coordinates are shown in Fig. 4.

Fig.4

68 Facial landmarks coordinate.

Among the 68 facial landmark points, the 12 eye points (6 for the left eye and 6 for the right eye) are used for face alignment. The left and right eye centers are calculated using Equations (5) and (6).

$lefteye\_center = \frac{1}{6} \sum_{i = 37}^{42} p_{i}$ (5)

$righteye\_center = \frac{1}{6} \sum_{i = 43}^{48} p_{i}$ (6)

where p_i is the (x, y) coordinate of the i-th facial landmark.

The angle of the face image is computed from the difference between the left and right eye centers using Equations (7), (8) and (9).

$dx = lefteye\_center . x - righteye\_center . x$ (7)

$dy = lefteye\_center . y - righteye\_center . y$ (8)

$angle = \tan^{- 1} (\frac{dy}{dx}) * \frac{180}{π}$ (9)

The face image is then rotated by the calculated angle using an Affine transformation.
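As a minimal sketch of Equations (5) to (9), the eye centers and the rotation angle can be computed from a list of 68 (x, y) landmark tuples. The function names `eye_center` and `in_plane_angle_degrees` are hypothetical, and the landmark list is assumed to be ordered so that list position i - 1 holds landmark p_i.

```python
import math

def eye_center(landmarks, first, last):
    """Mean (x, y) of landmarks first..last, using the 1-based indices of Eqs. (5)-(6)."""
    pts = landmarks[first - 1:last]   # 1-based inclusive range to a Python slice
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def in_plane_angle_degrees(landmarks):
    """Angle of the line joining the eye centers, in degrees (Eqs. (7)-(9))."""
    left = eye_center(landmarks, 37, 42)     # Eq. (5)
    right = eye_center(landmarks, 43, 48)    # Eq. (6)
    dx = left[0] - right[0]                  # Eq. (7)
    dy = left[1] - right[1]                  # Eq. (8)
    if dx == 0:
        raise ValueError("eye centers are vertically aligned; Eq. (9) is undefined")
    return math.degrees(math.atan(dy / dx))  # Eq. (9)
```

The rotation itself is not shown. With OpenCV, one common choice (an assumption, not stated in the paper) is to build the rotation matrix with `cv2.getRotationMatrix2D` and apply it with `cv2.warpAffine`.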

3.3 Color Local Binary Pattern (CLBP) feature descriptor

The aligned RGB face image is converted to the YCbCr color model, and CLBP texture features are extracted from each channel. LBP is a texture descriptor introduced by Ojala et al. [12] that computes a local representation of texture by comparing each pixel with its surrounding neighbourhood of pixels. It is used to characterize the texture and patterns of an image. For each pixel in the Y, Cb and Cr channels, a neighbourhood of P pixels is selected on a circle of radius R around the center pixel. The LBP image is computed for each channel based on Equations (10) and (11).

${LBP}_{P, R} (x_{c}, y_{c}) = \sum_{p = 0}^{P - 1} s (g_{p} - g_{c}) 2^{p}$ (10)

$s (x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$ (11)

where g_p is the value of the p-th neighbourhood pixel, g_c is the center pixel value, R is the radius of the circle and P is the number of neighbourhood pixels.

Figure 5 shows the computation of the Local Binary Pattern (LBP). The LBP code of a pixel is obtained by comparing its value with those of its neighbouring pixels: if the intensity of a neighbour is greater than or equal to that of the center pixel, the corresponding bit is set to 1; otherwise it is set to 0.

Fig.5

LBP Computation.
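Equations (10) and (11) can be sketched for the simplest case of 8 neighbours at radius 1 on a single channel. The neighbour ordering (clockwise from the top-left pixel), the 256-bin histogram and the function names are assumptions made for illustration; the paper does not state its neighbourhood size or histogram layout.

```python
def lbp_code(patch):
    """LBP code of the center pixel of a 3x3 patch (8 neighbours, radius 1)."""
    gc = patch[1][1]
    # Neighbour order is an assumption; any fixed order gives a valid code.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for p, (r, c) in enumerate(offsets):
        if patch[r][c] - gc >= 0:   # s(g_p - g_c), Eq. (11)
            code += 1 << p          # weight 2^p, Eq. (10)
    return code

def lbp_histogram(channel):
    """256-bin histogram of LBP codes over the interior pixels of one channel."""
    hist = [0] * 256
    h, w = len(channel), len(channel[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = [row[x - 1:x + 2] for row in channel[y - 1:y + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

Under these assumptions, the color variant would apply `lbp_histogram` to each of the Y, Cb and Cr channels separately and concatenate the three histograms.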

3.4 Haralick texture feature descriptor

Haralick features are used to describe the texture of an image [13]. They are derived from the Gray-Level Co-occurrence Matrix (GLCM), which characterizes the texture by recording how often pairs of adjacent pixels with specific values occur in an image. Four different directions of adjacency are recorded for the Haralick texture features: left to right, top to bottom, top-left to bottom-right, and top-right to bottom-left. Figure 6 shows the four directions of the GLCM. In total, 13 features are extracted for Haralick texture, as shown in Table 1.

Fig.6

Levels of adjacency in GLCM.

Table 1

Haralick texture feature descriptors

Features	Equations
Contrast	$\sum_{i, j = 0}^{N - 1} p_{i, j} (i - j)^{2}$
Dissimilarity	$\sum_{i, j = 0}^{N - 1} p_{i, j} | i - j |$
Homogeneity (Inverse Difference Moment)	$\sum_{i, j = 0}^{N - 1} \frac{p_{i, j}}{1 + (i - j)^{2}}$
Angular Second Moment (ASM)	$\sum_{i, j = 0}^{N - 1} p_{i, j}^{2}$
Energy	$\sqrt{ASM}$
Entropy	$\sum_{i, j = 0}^{N - 1} p_{i, j} (- \ln p_{i, j})$
GLCM Mean (μ)	$μ_{i} = \sum_{i, j = 0}^{N - 1} i (p_{i, j})$
	$μ_{j} = \sum_{i, j = 0}^{N - 1} j (p_{i, j})$
Variance (σ²)	$σ_{i}^{2} = \sum_{i, j = 0}^{N - 1} (p_{i, j}) (i - μ_{i})^{2}$
	$σ_{j}^{2} = \sum_{i, j = 0}^{N - 1} (p_{i, j}) (j - μ_{j})^{2}$
Standard deviation (σ)	$σ_{i} = \sqrt{σ_{i}^{2}}$
	$σ_{j} = \sqrt{σ_{j}^{2}}$
Correlation	$\sum_{i, j = 0}^{N - 1} (p_{i, j}) \frac{(i - μ_{i}) (j - μ_{j})}{\sqrt{σ_{i}^{2}} \sqrt{σ_{j}^{2}}}$

where p_{i,j} is the (i, j)-th entry of the normalized GLCM, i and j are the row and column indices of the GLCM, and N is the number of rows and columns of the GLCM (the number of gray levels).
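A few of the Table 1 features can be sketched in plain Python for a single adjacency direction. The symmetric counting of pixel pairs, the left-to-right default offset and the function names `glcm` and `haralick_subset` are assumptions for illustration; the input is assumed to be already quantized to integer gray levels 0..levels-1 and to contain at least one valid pixel pair.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric GLCM for the pixel offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                i, j = image[y][x], image[y2][x2]
                counts[i][j] += 1
                counts[j][i] += 1   # symmetric counting (assumption)
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def haralick_subset(p):
    """Contrast, dissimilarity, homogeneity, ASM, energy and entropy of a GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    asm = sum(v * v for _, _, v in cells)
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "asm": asm,
        "energy": math.sqrt(asm),
        # zero entries are skipped, since p * ln(p) tends to 0 as p tends to 0
        "entropy": sum(-v * math.log(v) for _, _, v in cells if v > 0),
    }
```

Calling `glcm` with offsets (1, 0), (0, 1), (1, 1) and (-1, 1) would cover the four directions described above; how the paper combines the directions is not stated, so that step is left out.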

3.5 Chromatic moment features

Recaptured face images tend to show a different color distribution from genuine face images. Chromatic moment features [8], following the work of D. Wen et al., are used to capture this difference. The aligned RGB face image is converted to the HSV (Hue, Saturation, Value) color model, and the mean and standard deviation are extracted from each channel. Figure 7 shows the HSV histograms of genuine and spoof face images. The first column shows the Hue, Saturation and Value channel images of the spoof (photo printed) face, and the second column shows the corresponding histograms. The third and fourth columns show the channel images of the real face and their histograms.

Finally, the extracted Color Local Binary Pattern, Haralick texture and Color moment features are concatenated and classified by logistic regression. Table 2 presents the proposed face liveness detection algorithm.

Fig.7

Difference between genuine and spoof face for HSV color model.
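The per-channel mean and standard deviation described above can be sketched with the Python standard library. The function name `hsv_color_moments`, the use of the population standard deviation and the plain arithmetic mean of hue (which ignores the circular nature of hue) are simplifying assumptions, not details given in the paper.

```python
import colorsys
import statistics

def hsv_color_moments(rgb_pixels):
    """Mean and standard deviation of H, S and V for (r, g, b) pixels in 0..255.

    Returns [mean_H, std_H, mean_S, std_S, mean_V, std_V], each in [0, 1].
    """
    hsv = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
           for r, g, b in rgb_pixels]
    features = []
    for channel in zip(*hsv):   # iterate over the H, S and V value sequences
        features.append(statistics.fmean(channel))
        features.append(statistics.pstdev(channel))   # population std (assumption)
    return features
```

A genuine and a recaptured image of the same face would each yield one such six-value vector, and the difference between the two vectors is what the classifier can exploit.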

Table 2

Proposed Face Liveness Detection Algorithm

Input: Video frames F = {f1, f2, ..., fn}
Output: Spoof or Genuine face
For each frame f in F
    Apply HOG and Linear SVM to f to detect the face
    If (face detected == True)
        Localize the 68 facial feature points P = {P1, P2, ..., P68}
        Compute the mean eye center points (left eye, right eye)
        Calculate the distance between the left and right eye centers
            dx = lefteye_center.x - righteye_center.x
            dy = lefteye_center.y - righteye_center.y
            Distance = sqrt(dx² + dy²)
        Calculate the angle (θ)
            $θ = \tan^{- 1} (\frac{dy}{dx}) * \frac{180}{π}$
        Perform face alignment (θ)
        Extract features ← (Color Local Binary Pattern ⌢ Haralick texture ⌢ Color moments)
        Apply Logistic Regression ← ∀ extracted features
    End
End
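The last two steps of Table 2, feature concatenation and classification, can be illustrated with a small logistic regression trained by batch gradient descent. This is a sketch under stated assumptions: the paper does not describe its solver, regularization or label convention, so the learning rate, the epoch count, the label 1 for genuine faces and all function names are hypothetical.

```python
import math

def concatenate(clbp, haralick, color_moments):
    """Join the three descriptors into one feature vector."""
    return list(clbp) + list(haralick) + list(color_moments)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (genuine) or 0 (spoof); this label convention is an assumption."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0
```

In practice a library implementation with feature scaling and regularization would normally replace this hand-written loop; the sketch only shows how a single concatenated vector per face leads to one spoof or genuine decision.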

4 Results and discussion

The proposed system is implemented on an HP Z420 workstation with an Intel Xeon processor and 28 GB of RAM. The OpenCV and Dlib Python libraries are used to implement the face spoof detection algorithm. The MSU-MFSD dataset is used to validate the proposed hybrid face spoof detection method. Figure 8 shows sample face images from MSU-MFSD. The first two rows show spoof face images and the last two rows show real face images.

Fig.8

Sample face images of MSU-MFSD.

Figure 9 shows the color moment features estimated for real and spoof face images. The first row gives the mean values of the HSV channels for the real face image, and the second row gives the mean values for the photo printed face image. The third row shows the standard deviation values of the HSV channels for the real face image, and the fourth row shows the standard deviation values for the photo printed face image. From Figure 9, it is observed that the mean and standard deviation of the HSV channels differ between genuine and spoof face images, which supports their use for detecting spoof images.

Fig.9

Color moment features for real and spoof face image.

Table 3

Recognition accuracy of MSU-MFSD

Methods	Precision (%)	Recall (%)	F1-score (%)
Haralick	92	90	90
CLBP	99	99	99
Proposed (Haralick + CLBP + Color Moments)	99.45	99.45	99.45

Figure 10 shows the difference between genuine and spoof face images in the YCbCr color model. From Fig. 10, it can be seen that the recaptured face image carries less color information than the genuine face image. The performance of the proposed ensemble of feature descriptors in detecting spoof attacks is measured with precision, recall and F1-score. Table 3 shows the recognition results on MSU-MFSD. The proposed face spoof detection method outperforms the Haralick-only and CLBP-only descriptors on all three measures.

Fig.10

Difference between genuine and spoof face for YCbCr color model.
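The three measures reported in Table 3 follow the standard definitions, which can be sketched as below. The paper does not state whether the figures are computed for one class or averaged over both, so the choice of a single positive class and the function name `precision_recall_f1` are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1-score for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

When precision and recall are equal, as in the last row of Table 3, the F1-score takes the same value, since the harmonic mean of two equal numbers is that number.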

5 Conclusion

This paper proposed an ensemble of feature descriptors to detect face spoof attacks, using Color moments, Haralick texture and CLBP, resulting in a 93-dimensional feature vector. The experimental results on MSU-MFSD show that the proposed algorithm with a Logistic Regression classifier achieves 99.45% precision, recall and F1-score in classifying real and spoofed faces. These results suggest that the proposed algorithm is suitable for deployment in real-time face biometric authentication systems.

Acknowledgments

This research project is supported by DAE-BRNS, Department of Atomic Energy (Sanction No: 2013/36/41), Government of India. The authors would like to extend their sincere thanks to DAE-BRNS and DST - PURSE for their support.

References

1. Duc N.M. and Minh B.Q., Your face is not your password, in Proc Black Hat Conf 1 (2009), 1–16.

2. Viola and Jones, Robust real-time face detection, International Journal of Computer Vision 57(2) (2004), 137–154.

3. Galbally, Marcel and Fierrez, Biometric antispoofing methods: A survey in face recognition, IEEE Access 2 (2014), 1530–1552.

4. Singh A.K., Joshi and Nandi G.C., Face recognition with liveness detection using eye and mouth movement, 2014 International Conference on Signal Propagation and Computer Technology (ICSPCT 2014), Ajmer, 2014, pp. 592–597. doi: 10.1109/ICSPCT.2014.6884911

5. Agarwal, Singh and Vatsa, Face anti-spoofing using Haralick features, in Proceedings of IEEE International Conference on Biometrics: Theory, Applications and Systems, 2016.

6. Patel, Han and Jain A.K., Secure face unlock: Spoof detection on smartphones, IEEE Transactions on Information Forensics and Security 11(10) (2016), 2268–2283.

7. Boulkenafet, Komulainen and Hadid, Face spoofing detection using colour texture analysis, IEEE Transactions on Information Forensics and Security 11(8) (2016), 1818–1830.

8. Wen, Han and Jain A.K., Face spoof detection with image distortion analysis, IEEE Transactions on Information Forensics and Security 10(4) (2015), 746–761.

9. Dalal and Triggs, Histograms of oriented gradients for human detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, pp. 886–893.

10. Kazemi and Sullivan, One millisecond face alignment with an ensemble of regression trees, CVPR, 2014.

11. King D.E., Dlib-ml: A machine learning toolkit, Journal of Machine Learning Research 10 (2009), 1755–1758.

12. Ojala, Pietikäinen and Mäenpää, Multiresolution gray scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002), 971–987.

13. Haralick R.M., Shanmugam and Dinstein, Textural features for image classification, IEEE Transactions on Systems, Man, and Cybernetics SMC-3(6) (1973), 610–621. doi: 10.1109/TSMC.1973.4309314