Abstract
Face detection has been widely studied. Detecting and extracting individual facial features is equally important, as it plays a vital role in a variety of applications involving automated face processing. This article focuses on extracting face parts such as eyes, nose, lips, mustache, and beard from images of Indian people, for which we prepared our own face dataset containing a variety of faces from both urban and rural areas. The study examines how a detected face part can be used to detect other face parts. We implement our approaches to face part detection and evaluate them on our dataset. The approaches exploit the YCbCr color model, the Viola Jones technique, landmark detection, and the level set evolution technique. We found that our approaches are effective at extracting the face boundary, eyes, nose, and lips, and that they give results comparable to those of existing methods.
Introduction
Computer based automatic extraction of human facial features is an important research topic due to its widespread applications, such as face recognition, face grouping, and facial expression recognition. Extracting facial features [1] involves discovering the exact locations of different face parts, such as eyes, nose, mouth, eyebrows, beard, mustache, and chin, and then separating these portions for further processing. Extraction of human face parts plays an important role in human face analysis [2], visual interpretation, and human face recognition [3, 4]. Face detection has attracted much interest for a long time and has progressed drastically over the past few decades [5, 6, 7]; however, detection of human face parts is of prime importance in a wide variety of applications such as computer vision, facial animation, face recognition, facial expression detection, and face image database management. A human being can identify almost a thousand faces over a lifetime and can recognize faces without any trouble. For computer programs, however, it is a challenging task.
Many researchers, e.g., in [8], have attempted face detection and have shown its applications in various systems [9, 10, 11, 12]. Various researchers have also studied detection and extraction of individual face parts. Berbar et al. [13] carried out detection of faces in color images. Oravec et al. [14] used skin color based segmentation for face extraction and proposed methods for eye localization and mouth localization. Mahoor et al. [15] improved the Active Shape Model (ASM), which can be used for extraction of facial features. Shih et al. [16] segmented face candidates using a Gaussian skin-color model and detected them by size and shape. Their work [16] created an ellipse model to roughly locate the eyes and mouth, and then used an SVM to classify them. All these studies are limited to two or three face parts. Moreover, calculating accurate distance measures among face parts is also essential for practical use, e.g., in a query driven face retrieval system.
Extracting face features involves two major steps: (1) locating the approximate position of the desired face part and (2) extracting the precise shape of the face part. Many image processing operations are available for processing color images. However, choosing an appropriate technique to identify exact face parts such as eyes and lips requires a practical study on a variety of face images. In this article, we study and try to improve the extraction of useful face parts. Furthermore, our study is on faces of Indian people, whose skin color differs from that of the faces already studied in the literature.
This article studies the available, applicable techniques for extraction of face parts on our face dataset of 221 frontal face images, comprising 163 male and 58 female faces (13 of them children). Various researchers have studied extraction of individual face parts. Our study, however, focuses on how to utilize a detected face part in detecting and extracting other face parts, e.g., using the nose to locate the lips. In addition to the face boundary, we aim at extracting five face parts: (1) eyes, (2) nose, (3) lips, (4) beard, and (5) mustache. We also detect eye points, the nose tip, and the lip center to measure the distances between the two eyes, between nose and eyes, between eyes and lips, and between nose and lips. These five face parts and the distance measures can provide substantial feature information for face classification, clustering, and query based face retrieval, which might be useful to crime departments, various government organizations, matrimonial organizations, etc.
Background and related work
Background
Today, large amounts of human face image data need to be maintained and used. Organizations such as crime departments and government departments hold many human images that must be managed efficiently and retrieved according to requirements. A smart system could manage and retrieve human images using specified face features. A human face contains many face features or face parts, including eyes, eye color, eyebrows, chin, nose, lips, beard, mustache, hair, hair color, skin color, spectacles, etc. Extraction of such face parts from a face image requires image processing operations. Generally, in image processing [17], an image passes through stages such as image acquisition, image pre-processing, image segmentation, and feature extraction. Extraction of face features involves two important operations: face detection and face part detection. Face detection is the process of detecting a face in an image. Techniques such as Viola Jones [18], skin color detection, and morphological operations are used for detecting the face region.
Face part detection is the process of detecting facial parts from face region. For extracting different face parts, different techniques are available. Generally, any image processing technique falls into two domains: spatial domain and transform domain. Spatial domain techniques directly deal with the image pixels. Spatial domain techniques used for facial feature extractions are Local Derivative Patterns (LDP), Local Binary Patterns (LBP), edge detection techniques, smoothing techniques for preprocessing, gray scale manipulation, etc. For face part extraction, the following techniques of spatial domain are used: Principal Component Analysis (PCA), edge detection, skin color segmentation, Active Shape Models (ASM), and Gabor features.
Transformation or frequency domain techniques are based on the manipulation of the orthogonal transform of the image rather than the image itself. Transformation domain techniques are suited for processing the image according to the frequency content. Frequently used techniques for feature detection of face are Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), 2D-DCT, DT-DWT and Hough transform. Circular Hough transform is used for detecting iris from eye window. This transform technique finds circular shape in eye region and locates iris. The circular Hough transform gives accurate result for iris detection.
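As a rough illustration of the voting idea behind the circular Hough transform, the following Python sketch searches for the center of a circle of known radius among a set of edge points. The function name, the fixed radius, the `slack` tolerance, and the toy edge points are assumptions made for this example only; a practical iris detector would typically also search over a range of radii and work on an edge map of the eye window.

```python
from collections import Counter


def hough_circle_center(edge_points, radius, width, height, slack=0):
    """Vote for circle centers of a known radius and return the best one.

    edge_points: iterable of integer (x, y) edge pixels.
    slack: allowed deviation of the squared distance from radius**2
           (hypothetical tolerance; 0 means an exact match).
    Returns ((cx, cy), votes) for the accumulator cell with the most votes.
    """
    votes = Counter()
    target = radius * radius
    for x, y in edge_points:
        # Every candidate center at distance `radius` from this edge pixel
        # receives one vote in the accumulator.
        for cy in range(height):
            for cx in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                if abs(d2 - target) <= slack:
                    votes[(cx, cy)] += 1
    return votes.most_common(1)[0]


# Toy example: twelve integer points on a circle of radius 5 around (10, 10).
offsets = [(5, 0), (-5, 0), (0, 5), (0, -5),
           (3, 4), (3, -4), (-3, 4), (-3, -4),
           (4, 3), (4, -3), (-4, 3), (-4, -3)]
edges = [(10 + dx, 10 + dy) for dx, dy in offsets]
center, count = hough_circle_center(edges, radius=5, width=21, height=21)
print(center, count)
```

The brute-force loop over all candidate centers is chosen for readability; it is far slower than the parametric voting used in optimized implementations.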
Figure 1. Proposed work for extraction of human face features.
Khan et al. in [19] proposed an algorithm that performs multiclass semantic segmentation of face parts using Conditional Random Fields (CRF). They segment six regions, including hair, eyes, nose, mouth, skin, and background using position, HSV color, and shape information in CRF model. Work of Happy and Routray in [20] recognized six universal facial expressions automatically using features of salient facial patches. They localize face as well as the facial landmark points and extract all major active regions on the face. Their work employed SVM with one-against-one classification method to classify an expression as Anger, Fear, Disgust, Happiness, Sadness, or Surprise. They exploited localization of facial landmark such as eyes, nose, lip corners, and eyebrow corners.
In recent years, researchers have also attempted to apply deep learning in the area of face and facial parts detection. For example, Ranjan et al. [21] proposed an algorithm using deep convolutional neural networks (CNN) that allows multi-task learning. They proposed a single CNN model, called HyperFace, that allows simultaneous face detection, landmarks localization, pose estimation, and gender recognition. Similarly, Yang et al. [22] proposed a deep CNN, called Faceness-Net, that can detect faces even under severe occlusion and unconstrained pose variations. Their work employs local facial parts based supervision and computes faceness score based on spatial arrangement of face parts. However, for face detection, the work of Ranjan et al. [21] and Yang et al. [22] do not focus on extracting exact face boundary. Another work by Zhang et al. in [23] proposed Tasks-Constrained Deep Convolutional Network (TCDCN) that uses auxiliary information such as head pose estimation, gender classification, age estimation, and facial expression recognition to optimize facial landmark detection. Their work focuses on land-marking eye, mouth, and nose.
Researchers have also tried to improve 3D facial recognition; however, facial recognition in 2D is more challenging than in 3D. Boukamcha et al. in [24] presented a method using landmark point detection for 3D frontal face. Their work exploits face segmentation and surface curvature information. Another work by Dhahri and Belaid in [25] attempted to remove beard from 3D human face model. Their work segments the face to locate beard area and then estimate whether beard is present or not based on their proposed measure, compute of angle between face normal vectors (SANV). Once the beard is located, it is removed by applying Taubin smoothing with regression of SANV.
In our work, we consider the extraction of five face parts: eyes, nose, mouth, beard, and mustache. Prior to this work, we carried out a survey of techniques for extracting these same five facial features [26]. The survey covers four approaches to face part detection: the geometry based approach, the appearance based approach, the color based approach, and the template based approach. It compares the techniques on criteria such as the dataset used, the approach to feature detection, and the accuracy of the results. The detailed survey and analysis are presented in [26].
Through the survey of literature [26], we conclude that geometry and color based approaches are more useful for finding face features. For image pre-processing, techniques such as image normalization, noise removal, and background removal are used. Face detection techniques such as Viola Jones, skin color detection, and morphological operations are used to detect a face in an image. For eye detection, the circular Hough transform, the YCbCr color model, the Gabor filter, and the Viola Jones technique are used. In mouth detection, techniques such as Snake models, ASM, the Viola Jones technique, and the multi-state mouth model are used to detect the mouth region and shape. For nose detection, Viola Jones, edge detection, and geometric templates are used. Active shape models and skin color information are used to detect beard and mustache.
Amongst the various methods, the Viola Jones (VJ) algorithm detects the face and face parts on frontal view images with high accuracy and low computational cost. Face detection with skin color information gives good results, but it sometimes treats other skin colored portions, such as the neck and hands, as part of the face region. PCA, neural networks, and eigenfaces give very good results, but they are all appearance based methods that require high computational power and a lot of training data. Thus, we prefer the VJ algorithm for face detection and face part detection over the other techniques because of its low computational cost and accurate results on frontal-view color images.
Proposed work to extract human face features
Figure 1 shows the architecture of our proposed work to extract face features from frontal face images. Furthermore, we propose to demonstrate how a detected face part becomes useful in detecting other face parts. The detailed methodology and the steps to extract the various face parts are discussed in Section 4. This section discusses the higher-level steps, which are as follows: image acquisition, image pre-processing, face region detection, detection of each face part, and feature extraction for each face part.
Image acquisition
There are various types of human images, such as posed and un-posed images, frontal and non-frontal view images, and images with uniform and non-uniform backgrounds. For this research, we use human face images that show a frontal view against a uniform background. As we aimed at extracting face features of Indian people, we collected passport size photographs from a local photographer (from Kalol city of Gujarat, India). For face recognition and facial feature extraction, researchers have used datasets such as CVL [27], AR [28], LFW [29], and FERET [30]; we included some images from the CVL dataset to maintain variation in skin color in our database. Furthermore, we also selected some images available on the Internet. For detection of face parts, we have a total of 221 images with a frontal face and a uniform background, taken under different lighting conditions. Our dataset contains 163 male and 58 female images (13 of them children). The images in our dataset are in .jpg or .png format.
Image pre-processing
Image pre-processing is required to get better results in the later steps. In our work, images were pre-processed by resizing, or normalizing, them to a common size. We used the built-in imresize() and imcrop() functions of Matlab [31] to resize and crop the images. We also used a Lighting Compensation (LC) technique, such as the one used in [32], to deal with different lighting conditions.
Face region detection and face parts detection
Different algorithms are available for detecting a human face, such as the Viola Jones (VJ) algorithm [18], edge detection techniques like Canny and Sobel edge detection, skin color information, morphological operations, geometric templates, etc. However, VJ gives only a bounding box, whereas skin color detection finds the face region but also includes the neck portion in the skin region. To overcome this problem, we propose to combine skin color detection and landmark detection to get an exact face boundary. After getting the face boundary from an image, the next step is to find the face parts such as eyes, nose, lips, beard, and mustache. To locate the eyes, nose, and mouth, we used the VJ technique. To locate the mustache and beard, we propose a simple approach of our own.
Figure 2. Results of Lighting Compensation (LC) technique.
Figure 3. Block diagram of skin detection process.
To extract features of each face part, different techniques can be applied. For eye point detection, we used a method based on color segmentation in the YCbCr color space. The nose region is detected using the Viola-Jones technique, which provides a bounding box for it; the nose tip is taken to be at the center of the bounding box. To detect the lip point from the mouth region, we apply face landmark detection. However, in many images, the technique fails to provide the correct boundary of the lips. To improve the detection accuracy, we use the mouth bounding box obtained by applying Viola Jones, which localizes the lip landmarks. To detect the mustache area, we select the area that lies between the detected mouth and the nose tip. Next, we search for pixels of skin color and treat the remaining pixels as mustache and beard candidates. Based on this non-skin color, we can extract any mustache from a face. To detect the beard, we set a rule that a beard can only occupy the area below the mouth. Therefore, any non-skin area below the mouth corresponds to beard. We also apply a filter that removes any blobs smaller than a fixed threshold.
Figure 4. Result of skin detection from a sample image.
In this section, we detail our methodology and present results on our dataset. We implemented our approaches to face and face part detection in Matlab.
Image pre-processing
Image resizing and cropping
We used imresize() and imcrop() functions of Matlab to normalize the image and to crop the image, respectively.
Figure 5. Result of landmark detection and face boundary extraction.
The images in our dataset have varying lighting conditions. The lighting compensation (LC) algorithm is effective at enhancing and restoring the natural colors of images taken in dark or varying lighting conditions. Lighting compensation has therefore been used in skin and face detection, and it is indispensable for robust skin-tone color detection [33]. Figure 2 shows the result of the LC technique on a sample image.
Figure 6. Results of face boundary extraction on sample images.
Figure 7. Process of detecting eye centers, adapted from [36].
Figure 8. Results of eye point detection and distance between two eyes.
To get the exact boundary of the face from an image, we used a combination of two techniques: 1) Skin detection and 2) Landmark detection.
Skin detection
Human skin segmentation aims to locate skin regions in an unconstrained input image. Most existing skin segmentation approaches use skin color as the basis of segmentation. In frontal face images, a substantially large portion of the image has skin color. Therefore, for face detection, many authors have used skin color based methods due to their suitability and quick detection. The technique helps to detect faces under different environmental variations. Figure 3 shows the steps to detect skin color.
Figure 9. Diagram of process of nose tip detection.
We use color based segmentation for skin detection. In this research, we used YCbCr color space as a basis to detect human skin, as used in [34]. We use skin-color map on the chrominance components of the input image and use it to detect pixels that fall in the range of skin color. In the YCbCr color space, we use the following ranges, shown in Eq. (1), of Cb and Cr, as used by [34], that are illustrative for the skin:
We also add
The detected skin is then converted to a binary image, where white pixels indicate skin and black pixels represent non-skin. Morphological operations such as dilation and erosion are applied next, and small blobs are removed using a filter. Finally, we use the biggest blob as a mask on the original image to get the skin region. Figure 4 shows the result of skin detection on a sample image.
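The thresholding and largest-blob steps can be sketched in Python as follows (the experiments in this article were done in Matlab, so this is only an illustration). The chrominance ranges below are the widely cited values of Chai and Ngan and are an assumption: they are not necessarily the ranges of Eq. (1). The RGB to CbCr conversion uses the full-range BT.601 (JPEG) formulas, which is also an assumption, and the morphological dilation and erosion steps are left out for brevity.

```python
from collections import deque

# Assumed chrominance ranges (Chai and Ngan's skin map); placeholders only,
# not necessarily the values used in Eq. (1) of this article.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to Cb and Cr (full-range BT.601, as used by JPEG)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(image):
    """Binary mask (1 = skin) for an image given as rows of (R, G, B) tuples."""
    mask = []
    for row in image:
        out = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            is_skin = (CB_RANGE[0] <= cb <= CB_RANGE[1]
                       and CR_RANGE[0] <= cr <= CR_RANGE[1])
            out.append(1 if is_skin else 0)
        mask.append(out)
    return mask


def largest_blob(mask):
    """Keep only the largest 4-connected component of a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected component.
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out
```

Applying `largest_blob(skin_mask(image))` gives a mask that can be multiplied with the original image to keep only the main skin region.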
After skin detection, we applied the landmark detection technique [35] to the face region obtained from skin detection. The landmark detection method gives 66 face landmarks that outline the lower face boundary and facial features such as the eyes, nose, mouth, and eyebrows. We found that landmarks are not detected successfully on the original images. To improve the result, we feed only the skin portion detected in the previous step to this method. Since the neck region, which has a similar skin color, can only appear at the bottom of the face, we use only the bottom landmarks of the face to create the face boundary.
Figure 5 shows result of landmark detection and face boundary extraction. Figure 6 shows results of face boundary extraction on sample images of our database.
Feature extraction
We extracted facial features such as eye centers, distance between two eyes, height and width of a face, nose, lip shape, distances between nose to eyes, distances between eyes to mouth, beard, and mustache region.
Figure 10. Results of nose detection.
For detection of eye centers, we adapt the method of Nasiri et al. [36], removing the overhead of its geometric tests, and exploit the Viola-Jones method, which can detect a bounding box of the eye pair. Their method [36] finds candidate eye pairs on the whole face and then selects the appropriate one based on four geometric tests: (1) eye-pair distance, (2) eye-center distance, (3) eye-angle test, and (4) eye-shape test. We do not need such tests in our approach: because we detect the eye bounding box directly, we never obtain multiple candidate eye pairs. Figure 7 shows our way of detecting eye centers. We first detect a bounding box of the eyes using the Viola-Jones method. We then divide this bounding box into two parts: the right eye and left eye portions. For each eye portion, we follow the steps indicated in [36] with our modifications. Each portion is converted from the RGB to the YCbCr color space. Next, as per [36], we make two eye maps, EyeMapC (from the chrominance components) and EyeMapL (from the luminance component), and merge them with an AND operation to obtain the final map, EyeMap. The details and formulas are omitted for simplicity; interested readers can refer to [36]. Next, we apply morphological operations such as dilation on the binary image to get the exact locations of the eye centers. Figure 8 shows the results of eye detection and the distance between the two eyes.
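The box-splitting step and the final center computation can be sketched in Python as follows. The sketch assumes that a binary eye map is already available (the EyeMap construction of [36] is not reproduced), that boxes are given as (x, y, width, height), and that the center of each eye is taken as the centroid of the map pixels in its half; all function names are hypothetical.

```python
import math


def split_eye_pair_box(box):
    """Split an eye-pair box (x, y, width, height) into two side-by-side halves."""
    x, y, w, h = box
    half = w // 2
    return (x, y, half, h), (x + half, y, w - half, h)


def centroid_in_box(mask, box):
    """Centroid (x, y) of the 1-pixels of a binary mask inside a box, or None."""
    x0, y0, w, h = box
    points = [(x, y)
              for y in range(y0, y0 + h)
              for x in range(x0, x0 + w)
              if mask[y][x]]
    if not points:
        return None
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def eye_centers(eye_map, eye_pair_box):
    """Eye centers for the image-left and image-right halves of the eye-pair box."""
    first_half, second_half = split_eye_pair_box(eye_pair_box)
    return centroid_in_box(eye_map, first_half), centroid_in_box(eye_map, second_half)


def eye_distance(center_a, center_b):
    """Euclidean distance in pixels between two eye centers."""
    return math.hypot(center_a[0] - center_b[0], center_a[1] - center_b[1])
```

Note that the half on the left of the image holds the subject's right eye in a frontal photograph.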
Figure 11. Flowchart of process of lip shape extraction.
Detection of the nose tip is conducted by using Viola-Jones without any additional method. In many cases, this method can detect nose area with high accuracy as long as we can feed the method with appropriate input. In this research, we use image portion below center of eyes. Figure 9 shows process of nose tip detection and Fig. 10 shows the result of nose tip detection, which is used to measure distances between eyes and nose tip.
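A minimal Python sketch of this step is given below. It assumes that a nose detector (Viola-Jones in this article) has already returned a bounding box relative to the cropped region below the eye centers; the helper names and the numeric values are hypothetical.

```python
def region_below(y_start, image_width, image_height):
    """Box (x, y, w, h) covering the image rows from y_start downwards."""
    return (0, y_start, image_width, image_height - y_start)


def box_center(box):
    """Center point (x, y) of a box given as (x, y, w, h)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def to_image_coords(point, region):
    """Shift a point found inside a cropped region back to full-image coordinates."""
    px, py = point
    rx, ry, _, _ = region
    return (px + rx, py + ry)


# Hypothetical values: eye centers at row 40 of a 100 x 120 image, and a nose
# box reported by the detector relative to the cropped search region.
search_region = region_below(40, 100, 120)
nose_box_in_region = (35, 10, 30, 20)
nose_tip = to_image_coords(box_center(nose_box_in_region), search_region)
print(nose_tip)
```

Restricting the detector to the region below the eyes is what lets the previously detected eye centers help the nose detection.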
Lip point detection and shape extraction
Figure 11 shows major steps of lip extraction.
In detecting the mouth location, the existing landmark detection method applied to the whole face image fails to give the correct boundary of the lips in many images. To improve the detection accuracy, we assume that the mouth lies in the area below the nose tip, and therefore we take the image portion below the nose. We then apply the Viola-Jones method on the extracted image portion to detect the mouth. Next, we apply face landmark detection on the mouth bounding box to get the lip boundary. The landmark detector fits several lip models to the face to see which one best suits the mouth form. Next, we find the centroid of the mouth using Matlab's regionprops() function, and then find the mouth corners and calculate the height and width of the mouth.
Figure 12 shows the result of lip point detection, lip shape extraction, and distance between different face features on a sample image.
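The mouth measurements can be illustrated with a short Python sketch that computes, from a binary lip mask, roughly the same quantities as the centroid and bounding box properties of a region. The mask input, the choice of the leftmost and rightmost pixels as mouth corners, and the function name are assumptions of this example.

```python
def mouth_properties(lip_mask):
    """Centroid, corners, width, and height of the 1-pixels in a binary lip mask."""
    pixels = [(x, y)
              for y, row in enumerate(lip_mask)
              for x, value in enumerate(row)
              if value]
    if not pixels:
        return None
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return {
        "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
        "left_corner": min(pixels),    # leftmost pixel (smallest x, then smallest y)
        "right_corner": max(pixels),   # rightmost pixel (largest x, then largest y)
        "width": max(xs) - min(xs) + 1,
        "height": max(ys) - min(ys) + 1,
    }
```

The width and height are measured in pixels along the image axes, so a tilted mouth would need its rotation removed first.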
Mustache region extraction
Figure 13 shows the steps of mustache region extraction. The detection of the mustache is based on the method available in [37], which is a segmentation method. It segments the image according to the initial input given to it. If we input a skin area, the method searches for pixels that have a color similar to skin, so that at the end we get the skin-like areas and the non-skin areas. In this research, we automatically generate the input for the method from a skin area on the nose. The program then searches for skin color, puts all such pixels in the skin portion, and puts all other pixels, such as mustache, beard, and hair, into the non-skin portion. From the non-skin portion, we can easily extract any mustache or beard from the face. To select the mustache area, we select the non-skin area that lies between the detected mouth and the nose tip. Figure 14 shows the results of mustache region extraction.
Figure 12. Results of lip point detection and lip shape extraction on a sample image.
In some cases, the mustache detected by the previous steps overlaps the previously detected mouth portion. The mustache then affects the measurement of mouth properties such as width, height, and center. Therefore, based on the region of the detected mustache, we refine the mouth area with a new rule: the mouth cannot lie over the mustache region, so we lower the upper boundary of the mouth to the lower edge of the mustache.
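The region selection and the mouth refinement rule can be sketched in Python as follows. The sketch assumes that a binary non-skin mask is already available from the segmentation step and takes the row limits as parameters, since the exact limits are not specified here; the function names and the (x, y, w, h) box convention are hypothetical.

```python
def mustache_region(non_skin_mask, nose_tip_y, lower_limit_y):
    """Keep non-skin pixels in rows nose_tip_y .. lower_limit_y - 1, clear the rest."""
    return [row[:] if nose_tip_y <= y < lower_limit_y else [0] * len(row)
            for y, row in enumerate(non_skin_mask)]


def lowest_row(mask):
    """Index of the lowest row that contains any set pixel, or None."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    return max(rows) if rows else None


def refine_mouth_box(mouth_box, mustache_mask):
    """Lower the top edge of the mouth box (x, y, w, h) to just below the mustache."""
    x, y, w, h = mouth_box
    bottom = y + h
    low = lowest_row(mustache_mask)
    if low is None or low < y:
        return mouth_box                 # mustache does not overlap the mouth box
    new_top = min(low + 1, bottom)
    return (x, new_top, w, bottom - new_top)
```

After the refinement, mouth width, height, and center would be recomputed on the smaller box.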
Figure 13. Steps of mustache region extraction.
Figure 14. Results of mustache region detection on a few sample images.
Figure 15. Steps of beard detection.
When we apply the method to detect the mustache, we also get the beard area at the same time. Therefore, we use the same approach to get the area of the beard. We set a rule that a beard can only occupy the area below the mouth, so any non-skin area below the mouth corresponds to beard. We also add a rule that a beard region must be more than 15 pixels high, based on observations of beard images in our dataset. Finally, we add a morphological filter that removes any blobs (non-skin areas) smaller than a threshold area. Figure 15 shows the steps of beard region detection and Fig. 16 shows its results on two sample images.
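The three beard rules (below the mouth, more than 15 pixels high, minimum blob area) can be combined as in the following Python sketch. The 15 pixel height comes from the text; the area threshold value, the use of 4-connectivity, and the function names are assumptions, since no values are given for them.

```python
from collections import deque

MIN_BEARD_HEIGHT = 15   # from the text: a beard must be more than 15 pixels high
MIN_BLOB_AREA = 50      # hypothetical area threshold; the text gives no value


def connected_blobs(mask):
    """Return the 4-connected components of a binary mask as lists of (y, x)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append(blob)
    return found


def beard_mask(non_skin_mask, mouth_bottom_y,
               min_height=MIN_BEARD_HEIGHT, min_area=MIN_BLOB_AREA):
    """Non-skin blobs strictly below the mouth that pass the height and area rules."""
    w = len(non_skin_mask[0])
    below = [row[:] if y > mouth_bottom_y else [0] * w
             for y, row in enumerate(non_skin_mask)]
    out = [[0] * w for _ in non_skin_mask]
    for blob in connected_blobs(below):
        rows = [y for y, _ in blob]
        height = max(rows) - min(rows) + 1
        if height > min_height and len(blob) >= min_area:
            for y, x in blob:
                out[y][x] = 1
    return out
```

Raising either threshold would trade false positive beards for missed short beards.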
Evaluation
Figure 16. Results of beard region detection.
Figure 17. Results of detection of all facial features.
This section presents quantitative results on the accuracy of detection and extraction of different face parts. We performed all the experiments using MATLAB 2015a. Figure 17 shows images of different people with all extracted features, such as eyes, nose, lips, beard, mustache, and the distances between face parts. Table 1 shows the accuracies of detection of the face parts. We observed that our results are comparable with others’ results; the comparison is given in Table 4. For face extraction, our dataset contains 270 images of frontal human faces. The face boundary was extracted accurately on 221 of these images, so the accuracy of face boundary detection was 81.85%. We then performed detection of face parts on these 221 images with an extracted face boundary, as per the methodology presented in Section 4.
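The face boundary accuracy follows directly from the counts above, as this small Python check illustrates (the function name is hypothetical).

```python
def accuracy_percent(correct, total):
    """Percentage of correctly processed images."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Face boundary extraction: 221 correct out of 270 images.
print(f"{accuracy_percent(221, 270):.2f}%")  # prints 81.85%
```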
Table 1. Accuracy of detection of different face features.
Table 2. Different features (various distances) of a few face images.
Figure 18. Faces that we considered as without mustache.
Figure 19. Faces for which we get False Positive (FP) beard.
Table 2 shows the extracted features of face parts. We use the following notation in the table: LE stands for Left Eye, RE stands for Right Eye, NTip stands for Nose Tip, MT stands for mouth, W stands for width, and H stands for height. The first column indicates the name of an image in our dataset, and all other columns represent various distances, in terms of number of pixels.
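Distance features of this kind can be computed from the detected points as in the following Python sketch. The set of point pairs and the dictionary keys, which reuse the abbreviations LE, RE, NTip, and MT, are assumptions about how such a table could be filled; the coordinates in the usage example are made up.

```python
import math


def distance(p, q):
    """Euclidean distance in pixels between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def face_distances(left_eye, right_eye, nose_tip, mouth_center):
    """Pairwise distances between the detected face points, in pixels."""
    return {
        "LE-RE": distance(left_eye, right_eye),
        "LE-NTip": distance(left_eye, nose_tip),
        "RE-NTip": distance(right_eye, nose_tip),
        "LE-MT": distance(left_eye, mouth_center),
        "RE-MT": distance(right_eye, mouth_center),
        "NTip-MT": distance(nose_tip, mouth_center),
    }


# Made-up coordinates for illustration only.
features = face_distances((30, 40), (90, 40), (60, 80), (60, 120))
print(features["LE-RE"], features["LE-NTip"], features["NTip-MT"])
```

Because the distances are in pixels, they are only comparable across images after the images have been normalized to a common size.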
Our database includes a variety of people from both rural and urban areas. Generally, villagers do not pay much attention to keeping a clean shave. Therefore, whether to count short stubble in the mustache region as a mustache was a question for us, and its answer is subjective. Fig. 18 shows a few faces that we considered to be without a mustache. We faced the same problem for the beard.
Table 3 shows the results of beard and mustache detection and extraction. We observed that for mustache detection we could get comparable results; for beard detection, however, we could get only 73.30% accuracy. The lower accuracy for beard is due to a high false positive rate. Fig. 19 shows a few faces for which we got a False Positive beard.
Table 4 shows the comparison of our results with other researchers’ work. For eye detection, the authors of [38] used the circular Hough transform with unsupervised k-means for eyeball detection. They used 415 eye images from the FERET database and got 90% accuracy on testing. Vukadinovic and Pantic [39] used the VJ method to detect the eye region and a Gabor filter for locating the eye box. They used the Cohn-Kanade database and got 93% accuracy. A recent work by Yang et al. [22] used a convolutional neural network (CNN) for eye detection and achieved 95.87% accuracy on a cropped dataset and 97.19% on an uncropped dataset. Khan et al. in their recent work [19] used conditional random fields (CRF) for eye detection and got 91.87% and 84.56% accuracies on the FASSEG V2 and FASSEG V4 databases, respectively. We used 221 color images of human faces and achieved 94.57% accuracy on eyeball and eye center detection.
Table 3. Performance measure of beard and mustache detection.
Table 4. Comparison between results of our approaches with others’ methods for eye detection, nose detection, mouth/lip detection, mustache detection, and beard detection.
For nose detection, a work by Yin and Basu [40] used Geometric template and region growing method with 270 frames from real video sequences and got 93.33% accuracy. We used VJ technique to detect nose and midpoint of bounding box to detect nose tip and got 98.99% accuracy. For lip detection, Le and Savvides in [41] used active contour method on MBGC dataset and obtained 90% accuracy. Use of Gabor filter and VJ for mouth/lip detection in [39] could achieve 93.00% accuracy. Recently, for mouth detection, Yang et al. [22] used CNN and achieved 94.17% and 93.55% accuracies for cropped dataset and uncropped datasets, respectively. We used landmark detection and VJ method to detect lips shape and points and got 96.38% accuracy.
For beard and mustache detection, Le et al. [42] used a modified active shape model on two datasets: MBGC and FERET. For mustache detection, they achieved 98.80% and 97.00% accuracies on the MBGC and FERET databases, respectively. For beard detection, they achieved 96.20% and 95.80% accuracies on the MBGC and FERET databases, respectively. Wang and Yau [43] used beard detection to recognize gender. They used a geometric template and image binarization and achieved 89% accuracy on the FERET dataset. We used the level set evolution method to differentiate skin and non-skin regions and morphological operations to detect the beard and mustache regions. We could achieve 92.30% accuracy for mustache detection, but for beard detection, accuracy was comparatively low, 73.30%.
From the extracted face parts, features can be built, e.g., face color, face shape, eye-ball color, nose shape, and lip shape, that can become useful in query driven face retrieval. Such extracted features of human face images can be used to design a system where a user inputs a query in the form of expected face features and the system retrieves the matching faces, which can be useful in a variety of applications, e.g., in crime departments. We are experimenting with improving mustache and beard detection and extraction.
During this research, we first studied the different methods used for extracting each facial feature: eyes, nose, mouth, beard, and mustache. After analyzing the available techniques for face part detection, we concluded that techniques from the geometry based and color based approaches can give accurate results for frontal-face images. In this article, we implemented the proposed work in Matlab. We prepared our own face dataset containing a total of 270 color images of human faces. For face boundary extraction, we used a combination of skin color detection and landmark detection. Skin color detection finds the skin region, and the landmark points at the bottom of the face help to remove the neck portion that skin color detection includes.
Out of 270 images, we extracted the face boundary correctly for 221 images. For detecting further features, we used the Viola Jones method to get the regions of the eyes, mouth, and nose. We used the YCbCr color space and morphological operations to detect eye centers. We used the midpoint of the nose bounding box to detect the nose tip. For lip shape detection, we used a combination of landmark points and the VJ technique. For mustache and beard detection, we used the level set evolution technique to detect the non-skin region of the face, and applied some morphological operations and the connected component technique. We achieved 81.85% accuracy on face boundary extraction. We achieved 94.57%, 98.99%, 96.38%, and 80.54% accuracy for eye, nose, lip, and lip shape detection, respectively. Thus, we demonstrated that a detected face part can provide substantial input for detecting other face parts accurately. In future, we plan to study retrieving images based on the face features, which is useful in query driven image retrieval.
