Abstract
To develop a human-centric driver fatigue monitoring system that automatically understands and characterizes the driver’s condition, a novel and efficient feature extraction approach, named Local Multiresolution Derivative Pattern (LMDP), is proposed to describe the driver’s fatigue expression images. An Intersection Kernel Support Vector Machines (IKSVMs) classifier is then used to recognize three pre-defined classes of fatigue expressions: awake expressions, moderate fatigue expressions and severe fatigue expressions. With features extracted from a fatigue expressions dataset created at Southeast University, holdout and cross-validation experiments on fatigue expression classification are conducted with the IKSVMs classifier and compared with three commonly used classification methods: the k-nearest neighbor classifier, the multilayer perceptron classifier and the dissimilarity-based classifier. The experimental results show that LMDP performs better than Local Derivative Pattern, and that the second-order LMDP outperforms LMDP of other orders. With the second-order LMDP and the IKSVMs classifier, the classification accuracy for severe fatigue is over 90% in both the holdout and cross-validation experiments, demonstrating the effectiveness of the proposed feature extraction method for automatically understanding the driver’s condition in a human-centric driver fatigue monitoring system.
Introduction
The European Transport Safety Council (ETSC) defines four levels of sleepiness in behavioral terms: completely awake, moderate sleepiness, severe sleepiness, and sleep [1]. Studies show that 25%–30% of driving accidents are fatigue related, and that most sleepy drivers try to fight off sleep while in a state of moderate or severe sleepiness [2]. When a driver is fatigued, certain physical and physiological phenomena can be observed, including changes in brain waves (EEG), eye activity, facial expressions, sagging body posture, gripping force on the steering wheel, and other changes in body activities. Recently, Human-centric Driver Fatigue Monitoring Systems (DFMS) have been developed for monitoring the attention status of the driver, and different countermeasures, depending on the type and level of fatigue, should be taken to maintain driving safety.
The physical measures of driver fatigue include blink frequency, eye closure duration (ECD), nodding frequency, fixed gaze, and frontal face pose. In [3], linguistic terms and their corresponding fuzzy sets were distributed over each of the inputs using induced knowledge based on the hierarchical fuzzy partitioning method, and three variables (fixed gaze, PERCLOS, and ECD) were determined to be crucial cues for detecting a driver’s fatigue. Suzuki et al. derived three factors from the blinking waveform [4], and these factors were then weighted using a multiple regression analysis for each individual to calculate the drowsiness level. Eskandarian et al. used an artificial neural network to analyze vehicle parameter data and eye-closure data to infer driver fatigue and to identify the potential variables that were correlated with drowsiness [5]. Orazio et al. used a Gaussian mixture model to model the “normal behavior” statistics of the ECD and the frequency of eye closure for each person in order to identify anomalous behaviors [6]. Barr et al. reviewed and evaluated noninvasive, vehicle-based technologies for monitoring operator alertness, drowsiness and vigilance [7], such as the DD850 Driver Fatigue Monitor (DFM) designed by Attention Technology Inc., the Driver State Monitor (DSM) developed by Delphi Inc., Seeing Machines faceLAB™, Smart Eye AB, InSight™ developed by SensoMotoric Instruments GmbH, the video-based eye tracking system ETS-PC II developed by Applied Science Laboratories, the Eyegaze Analysis System developed by LC Technologies Inc., the Drowsy Driver Detection System designed by Johns Hopkins University Applied Physics Laboratory, the RPI computer vision system for monitoring driver vigilance developed at Rensselaer Polytechnic Institute, and the Drowsiness Detecting System based on an Artificial Neural Network developed by George Washington University.
Friedrichs and Yang explored 18 eye movement features for drowsiness detection [8], and used the sequential floating forward selection algorithm to select the most promising features for constructing a classifier. Fan et al. used a Gabor feature representation of the face for fatigue detection [9]; the AdaBoost algorithm was then used to extract the most critical features from the dynamic feature set and construct a strong classifier. It was also reported in [10] that sleep-deprived drivers have a lower frequency of steering reversals, a deterioration of steering performance, a decrease in the steering-wheel reversing rate, more frequent steering maneuvers during wakeful periods, no steering correction for a prolonged period of time followed by a jerky motion during drowsy periods, low-velocity steering, large-amplitude steering-wheel movements, and large standard deviations in the steering-wheel angle.
The biological measures of driver fatigue include Electroencephalography (EEG), Electrocardiogram (ECG), Electro-oculography (EOG), and Surface Electromyogram (SEMG). These signals can be collected through electrodes in contact with the skin of the human body. Recent research has proposed various methods of extracting features from a segment of raw EEG data for fatigue detection. Lin et al. established a linear regression model to estimate the drowsiness level from the independent component analysis of 33-channel EEG signals, reaching 87% accuracy [11, 12], and then implemented a real-time embedded EEG-based driver drowsiness estimation system that used only four channels of EEG data. Damousis et al. selected eight eye activity features, extracted from EOG, to develop a fuzzy expert system for the detection of hypovigilance [13]. Yeo et al. trained an SVM to classify EEG signals into four principal frequency bands and then to predict the transition from alertness to drowsiness [14]. Jap et al. assessed four EEG activities for 52 subjects during a monotonous driving session, and the results showed an increase in the ratio of slow-wave to fast-wave EEG activities over time [15]. Hu and Zheng employed a support vector machine to perform drowsiness prediction with 11 eyelid-related features extracted from EOG [16]. Yang et al. employed a dynamic BN with EEG and ECG to estimate fatigue, and a first-order HMM was employed to compute the dynamics of the BN at two different time slices [17]. In [18], a kernel principal component analysis algorithm was employed to extract nonlinear features from the complexity parameters of EEG and improve the generalization performance of an HMM. Sibsambhu et al. presented a method based on a class of entropy measures on the recorded EEG signals of human subjects for relative quantification of fatigue during driving [19]. Rami et al. developed an efficient fuzzy mutual-information-based wavelet packet transform feature extraction method for classifying the driver drowsiness state into one of several predefined drowsiness levels [20].
Recently, a powerful operator called Local Derivative Pattern (LDP) was proposed by Zhang for face recognition [21]. LDP encodes directional pattern features based on local derivative variations, and captures more detailed discriminative information of the face. However, LDP cannot capture the relationships between local high-order patterns that describe the intrinsic geometrical structures (locations and smoothed contours of mouth, eyes and eyebrows). This limitation of LDP on fatigue expression images indicates that more powerful representations are needed in higher dimensions.
To explore the intrinsic geometrical structure of fatigue expression images, we propose a novel and efficient feature extraction approach, named Local Multiresolution Derivative Pattern (LMDP), for describing the fatigue expressions of vehicle drivers. The paper is organized as follows. In Section 2, the background of the SEU (Southeast University) fatigue facial expression dataset is outlined. In Section 3, the Local Multiresolution Derivative Pattern (LMDP) for image feature description of fatigue expressions is introduced. Intersection Kernel Support Vector Machines (IKSVMs), together with the k-Nearest Neighbor (kNN), Multilayer Perceptron (MLP) and Dissimilarity-based classifiers used for comparison, are presented in Section 4. Section 5 details the experiments and reports the classification results for the fatigue expressions of vehicle drivers. Section 6 presents our conclusions.
Fatigue expression data acquisition
Traffic accidents are often related to moderate sleepiness and severe sleepiness. The sign of moderate sleepiness is that the driver yawns repeatedly, and the sign of severe sleepiness is that the driver has difficulty keeping the eyes open and nods off at the wheel [1]. We created a fatigue expressions dataset, named the Southeast University (SEU) fatigue expressions dataset, which consists of three kinds of fatigue expression images: awake expressions, moderate fatigue expressions and severe fatigue expressions. The SEU fatigue expression dataset was created using a side-mounted Logitech C905 CCD camera and includes 120 fatigue expression images. There are 20 male drivers and 20 female drivers in the dataset, and the lighting conditions varied naturally, as the car was in an outdoor parking lot. In this paper, the Viola-Jones face detection algorithm was used to detect the drivers’ faces [22]. The major contributions of the Viola-Jones algorithm are the use of Haar-like features and AdaBoost learning. In our data acquisition and normalization, the Viola-Jones algorithm performed well in detecting the driver’s face even when the driver turned to look at the mirrors or lowered his or her head to operate the shift lever. In total, the SEU fatigue expression dataset consists of 40 subjects with 3 different sessions per subject, giving 120 sessions. Fig. 1(a) shows an example image of a driver inside the vehicle, and the corresponding face detection result is shown in Fig. 1(b).
Local Multiresolution Derivative Pattern
In this section, a brief review of Local Derivative Pattern (LDP) is presented, and then Local Multiresolution Derivative Pattern (LMDP) is introduced in detail.
Local Derivative Pattern
Local Derivative Pattern (LDP) encodes the higher-order derivative information of an image [21], which is derived from a general definition of texture in a local neighbourhood. Given an image $I(Z)$, the first-order derivative is denoted as $I'(Z)$. Let $Z_0$ be a point in $I(Z)$, and let $Z_i$, $i = 1, \ldots, 8$, be the neighboring points around $Z_0$. The first-order derivatives at $Z = Z_0$ can be written as
The second-order local derivative pattern, $\mathrm{LDP}^2$, at $Z = Z_0$ is defined as
From Equation 3, it can be seen that the second-order LDP encodes the change of the neighbourhood derivative directions, which represents the second-order pattern information in the local region. In general, the nth-order LDP is a binary string describing gradient trend changes in a local region of the directional (n–1)th-order derivative images $I^{n-1}(Z)$, as
The high-order local patterns provide a stronger discriminative capability in describing detailed texture information than the first-order local pattern. The higher the order is, the more details the local pattern operator can extract from the image, but over-detailed patterns tend to be noise instead of identity information.
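As an illustration of the second-order encoding described above, the following minimal sketch computes a second-order LDP code map along the horizontal (0°) direction. It is not the authors' implementation: the neighbour ordering (clockwise from the top-left), the use of the right-hand neighbour for the first-order derivative, and the sign-change rule (bit 1 when the product of the two derivatives is non-positive) are assumptions based on the usual description of LDP, and the function names are hypothetical.

```python
import numpy as np

# Offsets (row, col) of the 8 neighbours, clockwise from the top-left.
# This ordering is an assumption made for the sketch.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]

def first_derivative_0deg(img):
    """First-order derivative: I(Z) minus its right-hand neighbour.
    The last column has no right-hand neighbour and is left at zero."""
    img = np.asarray(img, dtype=np.int64)
    d = np.zeros_like(img)
    d[:, :-1] = img[:, :-1] - img[:, 1:]
    return d

def ldp2_0deg(img):
    """Second-order LDP codes (8 bits per pixel) for the interior pixels."""
    d = first_derivative_0deg(img)
    h, w = d.shape
    centre = d[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for dr, dc in NEIGHBOURS:
        neigh = d[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # Bit is 1 when the derivative changes sign (or is zero), else 0.
        bit = (centre * neigh <= 0).astype(np.uint8)
        codes = (codes << 1) | bit
    return codes

# A horizontal ramp has the same derivative sign everywhere,
# so interior codes away from the right border are all zero.
ramp = np.tile(np.arange(6), (5, 1))
codes = ldp2_0deg(ramp)
```

Higher orders would repeat the same comparison on the derivative image of the previous order, which is why each additional order amplifies fine detail and, eventually, noise.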
For a human visual system to capture the essential information of a natural scene, it is well known that a computational image representation based on a local, directional and multiresolution expansion is efficient. With this insight, a sparse expansion of a fatigue expression image can be obtained by applying a Laplacian Pyramid (LP) with orthogonal filters [23], followed by a Directional Filter Bank (DFB) [24]. The LP with orthogonal filters is used to capture the point discontinuities, and the DFB is used to link point discontinuities into linear structures. In the frequency domain, the structure of an LP with orthogonal filters followed by a two-dimensional DFB provides a multiscale and directional decomposition that yields sparse expansions for images having smooth contours. A high-order Local Multiresolution Derivative Pattern (LMDP) can then be used to capture the detailed discriminative information of the contours of mouth, eyes and eyebrows in the sparse representation images of fatigue expressions. The structure of the LMDP with 2 levels is presented in Fig. 2. Bandpass images from the LP are fed into a DFB so that directional information can be captured. The scheme can be iterated on the coarse images, which can be decomposed into directional subbands at multiple scales. With orthogonal filters, the DFB is an orthogonal transform.
The Laplacian Pyramid (LP) decomposition at each level generates a downsampled lowpass version of the original and the difference between the original and the prediction, resulting in a bandpass image. In this paper, the “9-7” biorthogonal filters are adopted in the LP stage. Figure 3 depicts this decomposition process, where H, G and M denote the (lowpass) analysis filter, the synthesis filter and the sampling matrix, respectively. This process can be iterated on the coarse (downsampled lowpass) image, and the outputs are a coarse approximation a[n] and a difference b[n] between the original image and the prediction.
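The one-level decomposition described above can be sketched as follows. This is only an illustration under stated assumptions: a simple separable [1, 2, 1]/4 kernel stands in for both the analysis and synthesis filters (not the “9-7” filters used in the paper), the sampling matrix is taken to be downsampling by two in each dimension, and the function names are hypothetical.

```python
import numpy as np

def smooth(x):
    """Separable [1, 2, 1]/4 lowpass filter with reflective borders
    (a stand-in for the paper's filters, chosen for brevity)."""
    k = np.array([0.25, 0.5, 0.25])
    pad = np.pad(x, 1, mode="reflect")
    rows = k[0] * pad[:, :-2] + k[1] * pad[:, 1:-1] + k[2] * pad[:, 2:]
    return k[0] * rows[:-2, :] + k[1] * rows[1:-1, :] + k[2] * rows[2:, :]

def predict(a, shape):
    """Upsample the coarse image by zero insertion and filter it."""
    up = np.zeros(shape)
    up[::2, ::2] = a
    return 4.0 * smooth(up)  # gain of 4 compensates for the inserted zeros

def lp_analysis(x):
    """Return the coarse approximation a and the bandpass difference b."""
    x = np.asarray(x, dtype=float)
    a = smooth(x)[::2, ::2]
    b = x - predict(a, x.shape)
    return a, b

def lp_synthesis(a, b):
    """Invert one LP level: prediction from a plus the stored difference b."""
    return predict(a, b.shape) + b

x = np.random.default_rng(0).random((8, 8))
a, b = lp_analysis(x)
x_rec = lp_synthesis(a, b)
```

Because b stores exactly what the prediction misses, reconstruction is exact regardless of the filter choice; iterating `lp_analysis` on `a` gives the multilevel pyramid.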
The orthogonal lowpass filter G in a multilevel LP defines an orthogonal scaling function $\varphi(t) \in L^2(\mathbb{R})$ that generates a multiresolution analysis (MRA) represented by a sequence of nested subspaces $\{V_j\}_{j \in \mathbb{Z}}$, $\ldots V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset V_{-2} \ldots$, with $\mathrm{Closure}\left(\bigcup_{j \in \mathbb{Z}} V_j\right) = L^2(\mathbb{R})$ and $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$. The scaling function $\varphi$ is specified from the filter G via the two-scale equation:
Denote $F_i$ (0 ≤ i ≤ |
These functions also generate families of scaled and translated functions as
Using the two-scale equations for $\varphi(t)$ and $\psi(t)$, the coarse image $c^{(j)}[n]$ and the detail image $d^{(j)}[n]$ at the j-th scale of the LP decomposition with the “9-7” symmetric biorthogonal filters are inner products with the input sequence $c^{(j-1)}[n]$ at scale j – 1, and can be written as
Directionality, a crucial feature for efficient image representation, is supported by recent studies that identify the sparse components of natural images. A simplified construction of the two-dimensional Directional Filter Bank (DFB) was proposed in [24]; it is efficiently implemented via an l-level binary tree decomposition that leads to $2^l$ subbands with wedge-shaped frequency partitioning. The simplified DFB is constructed from two building blocks: a two-channel quincunx filter bank with fan filters, and a shearing operator. Using multirate identities, it is instructive to view an l-level tree-structured DFB equivalently as a $2^l$ parallel-channel filter bank with overall sampling matrices, as shown in Fig. 4.
Consider a Multi-channel Directional Filter Bank (MDFB), which results from $2^l$ channels with equivalent filters and diagonal sampling matrices $S_k$. All the channels in the MDFB have the same sampling density, which is equal to the number of channels, as $\det(S_k) = 2^l$ for $k = 0, \ldots, 2^l - 1$. Each bandpass image $d^{(j)}[n]$ is further decomposed by the $2^j$ level DFB into the bandpass directional images, $k = 0, 1, \ldots, 2^j - 1$. The “23–45” biorthogonal quincunx filters are adopted in the DFB stage.
The family is obtained from the time-reversed versions of the analysis filters $E_k$, $k = 0, \ldots, 2^l - 1$. The directional filter bank is a powerful mechanism for decomposing images into local and directional expansions, and it is implemented efficiently with a tree structure.
Consider a center point in the low-frequency content of the LP image together with its 8 neighboring points, indexed by i = 1, …, 8, one of which is the point immediately to the right of the center. Likewise, consider a center point in the high-frequency content of the DFB image in direction k together with its 8 neighboring points, indexed by i = 1, …, 8, one of which is the point immediately to the right of that center. The first-order Derivative Patterns (DP) of the 8 neighboring points along the horizontal direction at the two center points can be written as
The first-order Low Frequency Derivative Pattern (LFDP) at the low-frequency center point and the first-order High Frequency Derivative Pattern (HFDP) at the high-frequency center point are defined as the concatenation of the first-order derivatives at the 8 neighboring points
The second-order LFDP and the second-order HFDP are defined as
In a general formulation, the nth-order $\mathrm{LFDP}^n$ and the nth-order $\mathrm{HFDP}^n$ are binary strings describing gradient trend changes in a local region of the directional (n – 1)th-order derivative images $\mathrm{LFDP}^{n-1}$ and $\mathrm{HFDP}^{n-1}$
To extract the discriminative LMDP features of a fatigue expression image, spatial histograms can be used to model the distribution of the high-order LMDP, because they are more robust against variations in illumination than holistic methods. In this paper, the spatial histograms of the subregions are concatenated into an enhanced feature vector that serves as the fatigue expression image descriptor. The spatial histogram of a fatigue expression image is represented as
To balance the identification accuracy and feature length, we selected 16×16 sub-regions with 32 histogram bins for representing the LMDP images, so the joint histogram feature has 1×8192 (16×16×32) dimensions. LMDP offers a much richer set of directions and shapes, and thus it is more effective in capturing smooth contours and geometric structures in images. The visualized feature results of the second-order LDP and the second-order LMDP for the fatigue expression image in Fig. 1(b) are shown in Fig. 5.
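The feature length stated above can be checked with a short sketch of the spatial histogram. It assumes that “16×16 sub-regions” means a 16×16 grid of blocks over the code image and that codes are 8-bit values quantized into 32 equal-width bins; both are interpretations rather than details given in the paper, and the function name is hypothetical.

```python
import numpy as np

def spatial_histogram(code_img, grid=(16, 16), bins=32, max_code=256):
    """Concatenate per-block histograms of a pattern-code image.
    The image must have at least grid[0] rows and grid[1] columns."""
    code_img = np.asarray(code_img)
    row_blocks = np.array_split(np.arange(code_img.shape[0]), grid[0])
    col_blocks = np.array_split(np.arange(code_img.shape[1]), grid[1])
    feats = []
    for rb in row_blocks:
        for cb in col_blocks:
            block = code_img[rb[0]:rb[-1] + 1, cb[0]:cb[-1] + 1]
            hist, _ = np.histogram(block, bins=bins, range=(0, max_code))
            feats.append(hist)
    return np.concatenate(feats)

# Example on a synthetic 64x64 code image (not real LMDP output).
rng = np.random.default_rng(0)
codes = rng.integers(0, 256, size=(64, 64))
feature = spatial_histogram(codes)
```

Each block contributes 32 counts, so the descriptor has 16 × 16 × 32 = 8192 entries, and the counts sum to the number of pixels in the code image.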
Support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyse data and recognize patterns, and are used for classification [25]. The k-nearest neighbour algorithm (k-NN) is a method for classifying objects based on the closest training examples in feature space [29]. The multilayer perceptron (MLP) classifier is a modification of the standard linear perceptron [30] that can distinguish data that are not linearly separable. In the dissimilarity-based classifier [31], the dissimilarities computed between an object and a set of prototypes are used for the object’s classification. These four classifiers are among the most commonly used in pattern classification, and are adopted to classify fatigue expressions in our study.
Intersection Kernel Support Vector Machines
Support Vector Machines (SVMs) were originally designed for binary classification problems, and the basic principle of binary SVMs is to find an optimal separating hyperplane (OSH) that separates two classes of patterns based on the training set and the decision boundary [25]. To solve the multi-class problem, a variety of schemes have been proposed in the literature [26, 27], such as One Against All (OAA), One Against One (OAO), Directed Acyclic Graph (DAG), Error Correcting Output Coding (ECOC), and a multiclass objective function obtained by adding bias to the objective function. Recently, Maji et al. proposed a fast IKSVM with an approximation scheme whose time and space complexity is O(n), independent of the number of support vectors, and it has been shown to be successful for object detection and recognition [28]. In this paper, we adopt the fast IKSVMs for the classification of fatigue expressions of vehicle drivers. The key idea of the fast IKSVMs proposed by Maji et al. is that, for a class of kernels including the intersection kernel, the classifier can be decomposed as a sum of functions, one for each histogram bin, each of which can be efficiently computed. For feature vectors
and classification is based on evaluating
Thus the runtime complexity of computing h (
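The decomposition into one function per histogram bin can be illustrated with the following sketch. It assumes the standard histogram intersection kernel, K(x, z) = Σᵢ min(xᵢ, zᵢ), and a decision function of the usual kernel-SVM form with combined coefficients αₗyₗ; the class and function names are hypothetical. The sketch shows the exact evaluation with sorted support values and binary search, not the further approximation of Maji et al. that removes the dependence on the number of support vectors.

```python
import numpy as np

def intersection_kernel(x, z):
    """Histogram intersection kernel: sum of element-wise minima."""
    return np.minimum(x, z).sum()

def decision_naive(x, support_vectors, coef, bias):
    """Direct evaluation; coef[l] is alpha_l * y_l. Cost grows as m * n."""
    return sum(c * intersection_kernel(x, s)
               for c, s in zip(coef, support_vectors)) + bias

class FastIntersectionSVM:
    """Evaluates the same decision value as a sum of per-dimension functions."""

    def __init__(self, support_vectors, coef, bias):
        sv = np.asarray(support_vectors, dtype=float)   # shape (m, n)
        coef = np.asarray(coef, dtype=float)            # shape (m,)
        order = np.argsort(sv, axis=0)
        self.sorted_sv = np.take_along_axis(sv, order, axis=0)
        c = coef[order]                                 # coefficients in sorted order
        zeros = np.zeros((1, sv.shape[1]))
        # below[r, i]: sum of c * value over the r smallest values in dimension i
        self.below = np.vstack([zeros, np.cumsum(c * self.sorted_sv, axis=0)])
        # above[r, i]: sum of c over the remaining (larger) values in dimension i
        self.above = c.sum(axis=0) - np.vstack([zeros, np.cumsum(c, axis=0)])
        self.bias = float(bias)

    def decision(self, x):
        total = self.bias
        for i, s in enumerate(np.asarray(x, dtype=float)):
            # r = number of support values in dimension i that are <= s
            r = np.searchsorted(self.sorted_sv[:, i], s, side="right")
            total += self.below[r, i] + s * self.above[r, i]
        return total

rng = np.random.default_rng(0)
sv, coef, x = rng.random((6, 4)), rng.normal(size=6), rng.random(4)
fast_value = FastIntersectionSVM(sv, coef, 0.3).decision(x)
naive_value = decision_naive(x, sv, coef, 0.3)
```

Each per-dimension function is piecewise linear in the input value, which is what makes a cheap table-based approximation possible.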
The k-nearest neighbor (kNN) classifier [29] classifies objects based on the closest training examples in the feature space. kNN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-nearest neighbor algorithm is amongst the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its nearest neighbor. The kNN rule is optimal in the asymptotic case, i.e., the error tends to the Bayes error when the size of the training set tends to infinity. The major drawback of the kNN algorithm is its computational complexity, caused by the large number of distance computations.
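The majority-vote rule described above can be written in a few lines. The sketch assumes the Euclidean distance, which the paper does not specify for kNN, and uses toy two-dimensional points rather than LMDP features; the function name is hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Assign the query to the most common class among its k nearest
    training examples (Euclidean distance is an assumption here)."""
    diffs = np.asarray(train_x, dtype=float) - np.asarray(query, dtype=float)
    dists = np.linalg.norm(diffs, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

train_x = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = ["awake", "awake", "severe", "severe"]
label = knn_predict(train_x, train_y, [0.2, 0.1], k=3)
```

Every prediction computes a distance to every training example, which is the computational drawback noted in the text.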
Multilayer perceptron (MLP) classifier
A multilayer perceptron (MLP) classifier is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs, and consists of a set of source nodes forming the input layer, one or more hidden layers of computation nodes, and an output layer of nodes [30]. The MLP uses a supervised learning technique called backpropagation for training the network and constructs input-output mappings that are a nested composition of nonlinearities with the form
In dissimilarity-based classification, the dissimilarity measure $D(x_i, R) = \{d(x_i, p_1), \ldots, d(x_i, p_r)\}$ between an object $x_i \in T$, $1 \le i \le n$, and the prototypes $R = \{p_1, \ldots, p_r\}$ is a vector of r distances that relates $x_i$ to all objects in the representation set R [31]. Therefore, the proximity $D(T, R)$ is a dissimilarity matrix of size n × r, which relates the objects in the training set to all objects in the representation set. Given a test set S, its representation $D(S, R)$ is obtained by calculating the distances between its objects and the prototypes in R. The dissimilarity measure is small when the objects $x_i$ and $p_h$ are similar, and larger when they are more different. The distance $d(x_i, p_h) = 0$ when $x_i$ and $p_h$ are identical. In this paper, Euclidean distances are used for the dissimilarity representation between the objects and prototypes.
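A minimal sketch of the dissimilarity representation with Euclidean distances is given below. The toy objects and prototypes are illustrative only, the function name is hypothetical, and how the prototypes are chosen and which classifier is trained on the resulting matrix are not covered.

```python
import numpy as np

def dissimilarity_matrix(objects, prototypes):
    """Return D with D[i, h] = Euclidean distance between object i and
    prototype h; the shape is (number of objects, number of prototypes)."""
    x = np.asarray(objects, dtype=float)[:, None, :]
    p = np.asarray(prototypes, dtype=float)[None, :, :]
    return np.sqrt(((x - p) ** 2).sum(axis=2))

T = [[0, 0], [3, 4], [1, 1]]   # n = 3 training objects
R = [[0, 0], [3, 0]]           # r = 2 prototypes
D = dissimilarity_matrix(T, R)
```

The same function applied to a test set S and the prototypes R gives D(S, R), so training and test objects are described in the same r-dimensional dissimilarity space.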
Experiments
Two standard experimental procedures, the holdout approach and the cross-validation approach, are used to evaluate the performance of LMDP versus LDP with the IKSVMs classifier, compared with three other commonly used classifiers, namely the kNN classifier, the MLP classifier and the Dissimilarity-based classifier. The SEU fatigue facial expression dataset, shown in Fig. 6, is used for the performance evaluation of the two feature extraction approaches and four classifiers in the holdout and cross-validation experiments. There are 3 classes of fatigue expressions in the dataset: awake, moderate fatigue and severe fatigue. Each class consists of 40 images, captured at different times under natural lighting conditions, as the car was in an outdoor parking lot.
Holdout experiments
Holdout experiments are based on randomly dividing the feature vectors extracted from the images in the SEU fatigue facial expression dataset into a training dataset (80% of the feature vectors) and a test dataset (the remaining 20%). In the holdout approach, only the test dataset is used to estimate the generalization error. We repeated the holdout experiment 100 times with random splits of the fatigue expression dataset, and recorded the classification results.
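The repeated random split can be sketched as follows, assuming one feature vector per image (120 in total). The paper does not state whether the split is stratified by class or how the random numbers are seeded, so a plain random permutation with a fixed seed is used here as an assumption; the function name is hypothetical.

```python
import numpy as np

def holdout_splits(n_samples=120, train_fraction=0.8, repeats=100, seed=0):
    """Yield (train_indices, test_indices) for each repeated random split."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * n_samples))
    for _ in range(repeats):
        perm = rng.permutation(n_samples)
        yield perm[:n_train], perm[n_train:]

splits = list(holdout_splits())
train_idx, test_idx = splits[0]
```

With 120 feature vectors, each repetition trains on 96 and tests on 24; the reported accuracy is the average over the 100 repetitions.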
In the first holdout experiment, LMDP is compared with LDP. The classification rates for fatigue expressions using LMDP versus LDP with the IKSVMs classifier are displayed in the bar plots of Fig. 7(a) and the box plots of Fig. 7(b). Fig. 7 shows that the average recognition accuracy of fatigue expressions is higher with LMDP than with LDP. The results in Fig. 7 also show that the second-order LMDP performs better than the first-order LMDP, but that performance drops when the order of LMDP reaches the third and fourth order. This indicates that high-order local patterns, such as the second-order LMDP, extract more detailed information than the first-order pattern, but that the further detail captured by the third-order and fourth-order LMDP does not improve recognition, consistent with the observation that over-detailed patterns tend to encode noise.
In the second holdout experiment, the same training and test sets are applied to IKSVMs and to the kNN, MLP and Dissimilarity-based classifiers, and their classification performances are compared. The average classification rates for fatigue expressions obtained by IKSVMs and the other three classifiers are displayed in the bar plots of Fig. 8(a) and the box plots of Fig. 8(b). The average classification accuracies of the IKSVMs, kNN, MLP and Dissimilarity-based classifiers are 92.92%, 81.83%, 92.79% and 89.08%, respectively. Fig. 8 shows that the IKSVMs classifier offers the best performance among the four classifiers in the second holdout experiment.
The confusion matrix, which represents the proportion of examples from one class classified into another class, is often used to measure classification performance in more detail, as it relates actual and predicted classifications. For the holdout experiment, the confusion matrix that summarizes the detailed performance of the second-order LMDP with the IKSVMs classifier is shown in Table 1. In the confusion matrix, the rows and columns indicate the true and predicted classes, respectively. The diagonal entries represent correct classifications, while the off-diagonal entries represent incorrect ones. The rows and columns correspond to the classes of awake expressions, moderate fatigue expressions, and severe fatigue expressions. The corresponding classification accuracies are 88.24%, 98.77% and 90.46%, respectively. The confusion matrix therefore shows that the moderate fatigue class has the highest recognition accuracy of the three classes in the holdout experiments.
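The per-class accuracies read from the diagonal can be computed as in the sketch below, with rows as true classes and columns as predicted classes. The labels in the example are made up for illustration and are not the paper's results; classes are encoded as 0 (awake), 1 (moderate fatigue) and 2 (severe fatigue), and the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes=3):
    """Count matrix with rows = true class and columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm):
    """Diagonal entry divided by the row total for each true class
    (every class must occur at least once among the true labels)."""
    return np.diag(cm) / cm.sum(axis=1)

true = [0, 0, 1, 1, 2, 2, 2, 2]
pred = [0, 1, 1, 1, 2, 2, 2, 0]
cm = confusion_matrix(true, pred)
acc = per_class_accuracy(cm)
```

Dividing each row by its total turns the counts into the proportions that a table such as Table 1 reports.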
Cross-validation experiments
The k-fold cross-validation approach is another commonly used technique that takes a set of m examples and randomly partitions them into k folds of size m/k. For each fold, the classifier is tested on that fold (consisting of m/k examples) and trained on the other k–1 folds (consisting of m(1–1/k) examples) [32]. The procedure is thus repeated k times, with each of the k sub-samples used exactly once as the validation dataset, and the k results from the folds are then averaged to produce a single classification rate. In this paper, 5-fold cross-validation was used to compare LMDP with LDP using the IKSVMs classifier and the other three classifiers, i.e., the kNN, MLP and Dissimilarity-based classifiers. One feature vector corresponds to one fatigue expression image, so there are 120 feature vectors extracted from the images of the SEU fatigue facial expression dataset. The 120 feature vectors are randomly divided into 5 disjoint subsets of equal size, each consisting of 24 feature vectors. The cross-validation experiments were repeated 100 times with random splits of the fatigue expression dataset, and the average classification results were recorded.
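The partition described above (120 feature vectors, 5 folds of 24) can be sketched as follows. As with the holdout sketch, a plain random permutation without class stratification and a fixed seed are assumptions, and the function name is hypothetical.

```python
import numpy as np

def kfold_indices(n_samples=120, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

folds = list(kfold_indices())
all_test = np.concatenate([test for _, test in folds])
```

Every feature vector appears in exactly one test fold, so each of the k runs trains on 96 vectors and tests on the remaining 24.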
In the first cross-validation experiment, the classification rates for fatigue expressions using LMDP versus LDP with the IKSVMs classifier are displayed in the bar plots of Fig. 9(a) and the box plots of Fig. 9(b), which again show that the average recognition accuracy of fatigue expressions is higher with the proposed LMDP than with LDP. The results in Fig. 9 also show that the second-order LMDP extracts more detailed information than the first-order LMDP, but that the performance drops when the order reaches the third and fourth order. The third-order and fourth-order LMDPs are therefore not capable of exploiting the further detailed information contained in a fatigue expression image.
In the second cross-validation experiment, the same training and test sets are applied to the second-order LMDP with IKSVMs and with the kNN, MLP and Dissimilarity-based classifiers, and their classification performances are recorded. The average classification accuracies of the 100 cross-validation experiments using the proposed LMDP with the IKSVMs classifier, compared with the kNN, MLP and Dissimilarity-based classifiers, are displayed in the bar plots of Fig. 10(a) and the box plots of Fig. 10(b). Fig. 10 shows that the IKSVMs classifier outperforms the other three classifiers, achieving the highest classification rate in the second cross-validation experiment.
For the cross-validation experiment, the confusion matrix that summarizes the detailed performance of the proposed second-order LMDP with the IKSVMs classifier is shown in Table 2. The accuracies of the three classes (awake, moderate fatigue and severe fatigue expressions) are 89.80%, 99.57% and 90.02%, respectively, and once again the moderate fatigue class has the highest recognition accuracy of the three classes.
Discussions
Most current techniques require sensor devices to be attached to the driver’s clothing or body, which is not a natural way to monitor a driver’s activities. It is unlikely that drivers would accept any tethered sensing solution, i.e., wired or wireless sensors attached to their bodies. Existing noninvasive approaches for detecting driver fatigue are based only on local features of driver images, such as eye closure, eyelid movement, blink measurement, head pose and gaze direction. The DD850 Driver Fatigue Monitor (DFM), designed by Attention Technology Inc., is a video-based drowsiness detection system for measuring slow eyelid closure, and its field of view is large enough to accommodate normal head movement. The disadvantage of DFM is that eye characteristics differ between drivers, so it is hard to determine a universal drowsiness threshold. The Driver State Monitor (DSM), developed by Delphi Inc., consists of the ForeWarn Drowsy Driver Alert system and the ForeWarn Driver Distraction Alert system, and analyses eye closures and head pose to infer the fatigue or distraction level. InSight™, developed by SensoMotoric Instruments GmbH (SMI), is a noninvasive computer-vision-based operator monitoring system that measures head position and orientation, gaze direction, eyelid opening, and pupil position and diameter. InSight™ calculates PERCLOS to determine a driver’s state of alertness. The video-based eye tracking system ETS-PC II, developed by Applied Science Laboratories, and the Eyegaze Analysis System, developed by LC Technologies Inc., are both eye-tracker-based systems that use the pupil reflection technique for measuring eye movements. The Drowsy Driver Detection System, designed by Johns Hopkins University Applied Physics Laboratory, can monitor and quantitatively measure the speed, frequency, and duration of eyelid closure, the rate of heartbeat and respiration, and the pulse rate by analyzing the Doppler components in the reflected signal.
The RPI computer vision system for monitoring driver vigilance, developed at Rensselaer Polytechnic Institute, simultaneously and unobtrusively monitors in real time several behaviors that typically characterize a driver’s level of alertness; these visual cues include eyelid and gaze movements, pupil movement, head movement and facial expression. The Drowsiness Detecting System based on an Artificial Neural Network (ANN), developed by George Washington University (GWU), observes steering angle patterns and classifies them into drowsy and non-drowsy driving intervals. GWU researchers trained and tested the ANN in a driving simulator experiment. Eye blinking differs from driver to driver and is also affected by environmental factors such as road lighting and oncoming headlights. It is therefore hard to preset a drowsiness threshold on eye blinking.
Image-based fatigue detection is more applicable in a consumer car because it requires no special markers or user intervention. A decisive step in developing an image-based driver fatigue degree recognition system is to extract suitable features from the driver’s images that characterize the differences between driver fatigue expressions. When fatigued, drivers yawn repeatedly and have difficulty keeping their eyes open. Fatigue expression images contain intrinsic geometrical structure, namely the locations and contours of the mouth, eyes and eyebrows, which are localized in both location and direction and are the key visual features for fatigue expression recognition. In this paper, we presented a more reliable fatigue monitoring system based on a comprehensive characterization of the driver’s facial expressions, including eye closure, eyelid position, cheek muscle motion and mouth movement. An efficient feature extraction approach, named Local Multiresolution Derivative Pattern (LMDP), was proposed to describe the fatigue expressions of vehicle drivers. The holdout and cross-validation experiments show that the average recognition rates of driver fatigue expressions are above 92%.
Severe fatigue is the most dangerous situation, since the driver may lose control of the vehicle and the chance of an accident increases dramatically. Although our fatigue monitoring system obtains high average recognition rates for driver fatigue expressions, the classification accuracy for severe fatigue, at 90%, is lower than that for moderate fatigue expressions. We observe that the main difference between the awake expressions and the severe fatigue expressions in the SEU image database is the openness of the eyes, so these two types of expressions are likely to be misclassified as each other. In future work, we will continue to investigate feature extraction approaches for driver fatigue expressions that enhance the features around and related to the eyes, in order to further improve the detection of severe fatigue. The efficiency of the proposed system is also very important in real situations, and improving the efficiency of driver fatigue recognition deserves study in future research.
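The per-class comparison discussed above can be read directly from a confusion matrix. The following is a minimal sketch of that calculation, assuming rows index the true class and columns the predicted class. The function names and the counts in `toy_counts` are hypothetical values chosen for illustration and are not results from the experiments reported in this paper.

```python
def per_class_accuracy(confusion):
    """Fraction of each true class that was predicted correctly.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    rates = []
    for i, row in enumerate(confusion):
        total = sum(row)
        rates.append(row[i] / total if total else float("nan"))
    return rates


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts for illustration only, NOT the paper's results.
class_names = ["awake", "moderate fatigue", "severe fatigue"]
toy_counts = [
    [18, 0, 2],   # true awake
    [1, 19, 0],   # true moderate fatigue
    [3, 0, 17],   # true severe fatigue
]

for name, rate in zip(class_names, per_class_accuracy(toy_counts)):
    print(f"{name}: {rate:.2f}")
print(f"overall: {overall_accuracy(toy_counts):.2f}")
```

In this toy matrix the off-diagonal mass sits between the awake and severe fatigue rows, which is the pattern one would expect if two classes differ mainly in eye openness.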
Conclusions
Fatigue expression recognition of vehicle drivers is investigated, and three contributions are presented in this paper. Firstly, to investigate recognizing the driver’s fatigue degree from a comprehensive characterization of the driver’s facial expressions, a fatigue expression dataset, named the Southeast University (SEU) fatigue expressions dataset, is presented; it consists of 40 subjects with 3 different sessions per subject, for 120 sessions in total. Secondly, we proposed a novel local descriptor, named Local Multiresolution Derivative Pattern (LMDP), to describe fatigue expression images, and the experimental results show that high-order LMDP is more feasible and effective than high-order LDP for fatigue expression recognition. Thirdly, the confusion matrices of the holdout and cross-validation experiments show that the class of moderate fatigue expressions has the highest recognition accuracy of the three classes, and that the classification accuracy for severe fatigue, at 90%, is lower than that for moderate fatigue expressions.
Acknowledgments
This project is funded by the National Natural Science Foundation of China (Project No. 51078087).
