Driver drowsiness detection system based on infinite feature selection algorithm and support vector machine

Abstract

Driver drowsiness is one of the major causes of road accidents, which lead to severe physical injuries, deaths and significant economic losses. Hence, a driver drowsiness detection system is required as a countermeasure device for preventing sleepiness-related accidents. This research paper aims to perform drowsiness detection with the help of the driver’s eye state, head pose, and mouth state information. Initially, the input data were collected from the public drowsy driver database. Then, the Camera Response Model (CRM) was applied to improve the quality of the collected data. Also, the Viola-Jones and Kanade-Lucas-Tomasi (KLT) approaches were used to detect and track the driver’s face, eye, and mouth regions in the input video. The Online Region-Based Active Contour Model (ORACM) algorithm was used to segment the driver’s mouth region in order to obtain the threshold value. Subsequently, two feature extraction methods, Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP), were applied to extract features from the detected eye region. The extracted features of the eye region were combined with the threshold value of the mouth region and the head pose angle. After extracting the feature vectors, the infinite feature selection algorithm was utilized to choose the relevant feature vectors. Finally, the selected features were classified using a Support Vector Machine (SVM) to determine the stage of drowsiness. Simulation outcomes illustrated that the proposed system improved the classification accuracy by up to 5.52% compared with a hybrid Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) model.

Keywords

Histogram of oriented gradients, infinite feature selection algorithm, Kanade-Lucas-Tomasi, local binary pattern, online region based active contour model, support vector machine

1. Introduction

In recent decades, driver drowsiness has been one of the biggest safety issues in road transportation. In order to prevent on-road or run-off-road accidents, an on-board driver drowsiness detection system in vehicles is necessary [1, 2]. Drowsiness detection assesses different measures such as visual features, vehicle behaviour, physiological features, etc. In vehicle-based measures, a number of metrics are used for detecting driver drowsiness, such as steering wheel movement, lane departure and pressure on the acceleration pedal [3]. The main issue with this technique is that the accuracy depends on the individual properties of the vehicle and driver. Therefore, the techniques based on visual features like yawning, facial expressions, head movement and eye state have shown an effective performance in driver drowsiness detection, because of their non-contact nature [4, 5, 6]. The visual feature based techniques have emerged as a promising field of research for driver drowsiness detection. The techniques based on yawning cannot predict the onset of drowsiness, because this feature does not represent drowsiness [7, 8, 9]. In contrast, eye state information (eye closed/eye open), head pose, and mouth state are well suited for a drowsiness detection system, because the unusual eye blinking pattern and the opening and closing of the eyes directly indicate the onset of drowsiness [10, 11, 12]. This paper attempts driver drowsiness detection using behavioural methods based on machine learning approaches to classify the sub-stages of the driver’s drowsiness.

In this work, a new system is proposed to improve the accuracy of driver drowsiness detection. Initially, the data were collected from the public drowsy driver dataset of the ACCV 2016 competition. After data collection, CRM was applied to enhance the quality of the collected data. Then, the Viola-Jones and KLT methods were used for detecting and tracking the driver’s face, mouth and eye regions in the input video. In addition, the ORACM algorithm was utilized for segmenting the driver’s mouth region to obtain the threshold value (height and width of the mouth region). Then, the feature extraction methods HOG and LBP were applied to extract the local and global features from the detected eye region. The extracted features of the eye region were combined with the threshold value of the mouth region (the detection threshold value is 90 on the input frame) and the head pose angle. After obtaining the feature and threshold values, the infinite algorithm was used for diminishing the dimension of the extracted features by eliminating the irrelevant features. The selected feature values were classified by the SVM classification approach into the stages of the driver’s drowsiness: drowsy or non-drowsy. Finally, the performance of the infinite algorithm with SVM was compared to hybrid CNN-LSTM in light of specificity, f-score, sensitivity and classification accuracy.

A few research papers on driver drowsiness detection are surveyed in Section 2. In Section 3, the proposed system is explained briefly with mathematical expressions. The simulation results of the proposed system are stated in Section 4. The conclusion is detailed in Section 5.

2. Literature survey

Researchers have carried out numerous studies on different stages of driver drowsiness detection. Here, some key contributions from the existing literature are presented.

de Naurois et al. [13] utilized Artificial Neural Networks (ANNs) for detecting the driver’s drowsiness level and for predicting the onset of an impaired driving state. Two ANN-based approaches were utilized: one for predicting the level of drowsiness and one for predicting how long the driver takes to reach a moderately drowsy state. The drowsiness detection performance of the developed methodology was improved by approximately 80% in detection and 40% in prediction compared to the existing systems. The subject-specific adaptation of the driver’s data delivers a better response to the issues of high inter- and intra-individual variability. However, the developed research work did not concentrate on different road conditions and times.

Panicker and Nair [14] developed a new drowsiness detection system that comprises three main phases. The first phase was face detection, which was accomplished using a template matching approach and an elliptical approximation technique. In the second phase, iris-sclera pattern analysis was used to detect the open eye. In the third phase, the PERCLOS measure was used to determine the driver drowsiness state. The developed system was independent of any dataset for eye or face detection. In this study, the developed system uses morphological and Laplacian operations for open eye detection. However, when the driver looks sideways, the iris lies at the extreme right or left within the eye; in such conditions, the developed system failed to detect the sclera symmetry.

McDonald et al. [15] evaluated temporal and contextual algorithms for detecting drowsiness-related lane departures. The developed method uses pedal input, steering angle, acceleration, and vehicle speed as input. In this research study, acceleration and speed were utilized for developing a real-time measure of driving context. These measures were combined with a dynamic Bayesian network, which considered the time dependencies in the transition between the awake state and drowsiness. This research study has a few limitations: the scope of the ground truth for drowsiness, the use of a driving simulator, and the size of the test dataset.

Guo and Markoni [16] developed a hybrid classifier, CNN and LSTM, for driver drowsiness prediction. The hybrid model performed with low computational cost and better classification accuracy. The developed model was tested on the public drowsy driver database [12], and its performance was investigated in light of classification accuracy. If the dimension of the extracted features was high, the classification was quite difficult. Additionally, the CNN model performance depends on the amount of input data; if the data are few, the CNN model performs poorly.

Zhao et al. [17] developed a new system for recognizing the driver drowsiness expression utilizing a Deep Belief Network (DBN) and facial dynamic fusion information. Initially, the textures and landmarks of the facial regions were extracted from the videos, which were captured using a high-definition camera. Then, the DBN was utilized for classifying the driver’s facial drowsiness expressions. The experimental outcome exhibited the superiority of this system. This research study did not concentrate on occlusion and large head rotation, which significantly diminish the efficiency of the developed system.

A new system is proposed in this paper to address the above-mentioned issues and to improve the detection of driver drowsiness.

3. Proposed system

In this research, the proposed system contains six stages: data collection, pre-processing, object detection and tracking, feature extraction, selection of optimal features and classification of the driver’s drowsiness. Figure 1 shows the flow diagram of the proposed system, and each stage is briefly explained below.

Figure 1.

Flow diagram of infinite algorithm with SVM classifier.

3.1 Data acquisition

At first, the data are acquired from the public drowsy driver database of the ACCV 2016 competition [12]. In this dataset, the video frames were captured using a D-Link DCS-932L camera with a resolution of 640 $\times$ 480 pixels, and the size of the dataset is 5.246 Gb. Here, the videos were captured for several people performing different activities like talking, remaining still, yawning, etc. In addition, this dataset comprises two sets: a validation set and a training set. The video length in the validation set ranges from 1 to 10 minutes. Correspondingly, the video length in the training set ranges from 1 to 1.5 minutes. In addition, the collected dataset comprises four types of ground truth: eye ground truth, mouth ground truth, head ground truth, and drowsiness ground truth. Eye ground truth indicates whether the eyes of the individual are sleepy or normal. Mouth ground truth specifies whether the individual’s mouth is yawning, talking, or still (closed). Head ground truth represents the individual’s head action (looking aside, nodding or stillness). At last, the drowsiness ground truth indicates whether the individual is drowsy or not. Sample images of the public drowsy driver dataset are shown in Fig. 2.

Figure 2.

Sample images of public drowsy driver dataset.

3.2 Pre-processing of collected data

After data acquisition, pre-processing is performed using CRM in order to reduce the noise. Normally, camera manufacturers apply a few nonlinear operations in the camera, for instance, demosaicing and white balance, for enhancing the visual quality of the images. The CRM contains two major components: the Brightness Transform Function (BTF) and the Camera Response Function (CRF). The parameters of the CRF are determined only by the camera, whereas the BTF is determined by the exposure ratio and the camera. Initially, the BTF is calculated based on the observation of two images with different exposures. Then, the corresponding CRF is derived by solving the comparametric equation. These two functions are mathematically described in Eqs (1) and (2).

3.2.1 Brightness transform function

At first, two frames with different exposures, $P_{0}$ and $P_{1}$, are selected to calculate the BTF. Then, a histogram is constructed for every colour channel of the input image. According to the histogram plot, an under-exposed image is highly concentrated in the low-brightness area. After gamma correction and linear amplification, the pixels of the resultant image are close to those of the real well-exposed image, which is mathematically described in Eq. (1).

$\displaystyle P_{1}=g({P_{0},k})=\beta P_{0}^{\gamma}$ (1)

Where, $\beta$ and $\gamma$ are the parameters of the BTF, which are related to the exposure ratio $k$. Observation shows that different colour channels have similar model parameters, because the response curves of the colour channels are nearly similar.
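As an illustration of Eq. (1), the following minimal Python sketch applies the BTF to a small normalised frame. The function name `apply_btf`, the parameter values ($\beta = 1.2$, $\gamma = 0.5$) and the clipping step are assumptions made only for this example; they are not values reported in this work.

```python
import numpy as np

def apply_btf(p0, beta, gamma):
    """Map frame P0 to P1 = beta * P0**gamma, as in Eq. (1).

    p0 is assumed to be a float array scaled to [0, 1].
    """
    p0 = np.asarray(p0, dtype=np.float64)
    p1 = beta * np.power(p0, gamma)
    # Clipping is an assumption of this sketch, so that the
    # result stays a valid normalised image.
    return np.clip(p1, 0.0, 1.0)

# Hypothetical under-exposed frame and hypothetical BTF parameters.
frame = np.array([[0.04, 0.16],
                  [0.36, 0.64]])
brightened = apply_btf(frame, beta=1.2, gamma=0.5)
```

With $\gamma < 1$ the dark pixels are lifted more strongly than the bright ones, which matches the intent of correcting an under-exposed frame.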

3.2.2 Camera response function

The CRF calculates the relationship between BTF parameters such as $\beta$ and $\gamma$ . The CRF is derived by solving the following comparametric Eq. (2).

$\displaystyle f({kE})=\beta f(E)^{\gamma}$ (2)

If $\gamma=1$, the CRF becomes a power function and the BTF becomes a simple linear function. This case fits cameras whose manufacturers design $f$ to be a gamma curve. If $\gamma\neq 1$, the CRF becomes a two-parameter function and the BTF becomes a non-linear function. Sample pre-processed images of the public drowsy driver dataset are shown in Fig. 3.

Figure 3.

Sample pre-processed images of public drowsy driver dataset.
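As a short check of the $\gamma=1$ case of Eq. (2), suppose the CRF is a power function $f(E)=E^{a}$; the exponent $a$ is a symbol introduced only for this illustration. Substituting into the comparametric equation shows that the BTF is then linear with $\beta=k^{a}$:

```latex
f(E) = E^{a}
\;\Rightarrow\;
f(kE) = (kE)^{a} = k^{a}E^{a} = \beta\, f(E)^{\gamma},
\qquad \beta = k^{a},\quad \gamma = 1 .
```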

3.3 Object detection and tracking

After pre-processing the collected data, the Viola-Jones and KLT methodologies are used for detecting and tracking the driver’s face, mouth and eye regions in the input video. Then, the head pose angle is estimated from the driver’s face region for every video frame. In addition, the ORACM algorithm is utilized to segment the driver’s mouth region for obtaining the threshold value on the basis of the height and width of the mouth region. ORACM is a region-based active contour method that does not require additional parameters, and its segmentation accuracy compares favourably with the conventional ACM approaches. In every iteration, ORACM performs a sort of block thresholding procedure. However, such a thresholding procedure generates many minor particles and rigid boundaries. Hence, a morphological operation is applied to obtain a proper and smooth object contour and to eliminate the minor particles.

Similar to other ACM algorithms, ORACM utilizes a user-defined active contour at the initialization step and then continuously updates it. The level set function $\varphi({x,y})$ takes values of different sign, $-1$ and $+1$, which represent the inside and outside of the contour. Unlike other ACM algorithms, ORACM utilizes a simple level set updating formulation, which is specified in Eq. (3).

$\displaystyle\frac{\partial\varphi}{\partial({x,y})}=H({\textit{spf}({I({x,y})})})\times\varphi({x,y})$ (3)

Where, $H(\cdot)$ represents the Heaviside function, $\textit{spf}(\cdot)$ is the signed pressure force function, $I({x,y})$ is the input image (video frame) and $\varphi({x,y})$ denotes the current level set. Subsequently, feature extraction (LBP and HOG) is applied on the detected eye region for extracting the feature values. Figures 4 and 5 represent the extracted regions from the original video.

Figure 4.

a) Pre-processed image, b) extracted face region, c) extracted eye region, and d) extracted mouth region.

Figure 5.

a) Face image, b) eye blink, c) yawning, and d) head bending.

3.4 Feature extraction

After eye region detection, feature extraction is performed on the detected eye regions. In this study, high-level texture features (HOG and LBP) are utilized for extracting the features from the detected eye regions. Brief descriptions of HOG and LBP are given below.

3.4.1 Histogram of oriented gradients

In the HOG descriptor, a gradient operator $O$ is utilized for evaluating the gradient values in the input images. The gradients of the video frames are denoted as $G$ and the input video frames (images) are indicated as $I$. The gradients of the image are calculated using Eq. (4).

$\displaystyle G_{x}=O\times I({x,y})\text{ and }G_{y}=O^{T}\times I({x,y})$ (4)

Then, the windows in the input images are divided into several spatial regions, which are called cells. In the HOG feature descriptor, the gradient magnitude of each pixel is associated with the orientation of the edge. The gradient magnitude of the pixel $({x,y})$ is calculated by Eq. (5), and the edge orientation of the pixel $({x,y})$ is calculated using Eq. (6).

$\displaystyle G({x,y})=\sqrt{G_{x}({x,y})^{2}+G_{y}({x,y})^{2}}$ (5)

$\displaystyle\theta({x,y})=\tan^{-1}\frac{G_{y}({x,y})}{G_{x}({x,y})}$ (6)

Where, $G_{y}$ is the gradient in the vertical direction and $G_{x}$ is the gradient in the horizontal direction. After calculating the histogram values, normalization is accomplished to improve the invariance to illumination conditions and noise. The normalization is used to improve the contrast of the image and to measure the local histogram values. In the HOG feature descriptor, four normalization techniques are available: L2-Hys, L1-norm, L1-Sqrt and L2-norm. Among these techniques, L2-norm is efficient and delivers good performance in driver drowsiness detection; it is given in Eq. (7).

$\displaystyle L_{\textit{2-norm}}:f=\frac{h}{\sqrt{||h||_{2}^{2}+e^{2}}}$ (7)

Where, $||h||_{2}^{2}$ is the squared 2-norm of the HOG vector, $f$ denotes the extracted (normalized) feature vector, $h$ is the non-normalized vector, and $e$ is a small positive value. The output of HOG is graphically represented in Fig. 6.

Figure 6.

Output image of HOG.
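The steps of Eqs (4)–(7) can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the implementation used in this work: the centred operator $O=[-1, 0, 1]$, the 9 unsigned orientation bins, the single-cell histogram and all function names are assumptions of the sketch.

```python
import numpy as np

def gradients(img):
    """Gradients with the assumed centred operator O = [-1, 0, 1] (Eq. 4)."""
    img = np.asarray(img, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # O applied along each row
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # O^T applied along each column
    return gx, gy

def magnitude_orientation(gx, gy):
    mag = np.sqrt(gx ** 2 + gy ** 2)                  # Eq. (5)
    theta = np.degrees(np.arctan2(gy, gx)) % 180.0    # Eq. (6), unsigned angle
    return mag, theta

def cell_histogram(mag, theta, bins=9):
    """Orientation histogram of one cell, weighted by gradient magnitude."""
    hist, _ = np.histogram(theta, bins=bins, range=(0.0, 180.0), weights=mag)
    return hist

def l2_normalise(h, e=1e-6):
    h = np.asarray(h, dtype=np.float64)
    return h / np.sqrt(np.sum(h ** 2) + e ** 2)       # Eq. (7)

# Hypothetical 4 x 4 cell: a horizontal intensity ramp.
cell = np.tile(np.arange(4.0), (4, 1))
gx, gy = gradients(cell)
mag, theta = magnitude_orientation(gx, gy)
hist = cell_histogram(mag, theta)
feature = l2_normalise(hist)
```

For this ramp all gradient energy falls into the first orientation bin, and the normalized vector has unit length up to the small constant $e$.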

3.4.2 Local binary pattern

On the basis of luminance value, LBP converts the image into labels. Hence, grey-scale invariance is a vital property of LBP, which is based on texture and local patterns. In every frame, for the pixel at position $({x,y})$, the central pixel value $x_{c}$ is used as the threshold for its $m$ neighbouring pixels. The resulting binary values are weighted by powers of 2 and summed to create a decimal number that is stored at the location of the central pixel $x_{c}$, as given in Eq. (8).

$\displaystyle\textit{LBP}({x,y})=\mathop{\sum}\limits_{i=0}^{m-1}f(x_{i}-x_{c})\,2^{i},\quad f(x)=\begin{cases}1, & x\geqslant 0\\ 0, & x<0\end{cases}$ (8)

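Where, $x_{i}$ is the $i$-th neighbouring pixel in the local neighbourhood of the central pixel $x_{c}$ and $m$ is the number of neighbours. The basic binary pattern operator thresholds each neighbour against the central pixel and reads the resulting bits as a binary number.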
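As a small worked instance of Eq. (8), the sketch below computes the LBP code of the centre pixel of a 3 $\times$ 3 patch with $m=8$ neighbours. The neighbour ordering (clockwise from the top-left pixel) and the patch values are assumptions chosen for illustration.

```python
def lbp_code(patch):
    """LBP code of the centre pixel of a 3x3 patch (Eq. 8, m = 8)."""
    xc = patch[1][1]
    # Assumed neighbour order: clockwise, starting at the top-left pixel.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i, xi in enumerate(neighbours):
        bit = 1 if xi - xc >= 0 else 0   # f(x_i - x_c)
        code += bit * (2 ** i)           # weight by the power of 2
    return code

# Hypothetical grey-level patch; the centre value 6 is the threshold.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)   # bits 1,0,0,0,1,1,1,1 -> 1 + 16 + 32 + 64 + 128 = 241
```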

In LBP, a $p$-neighbourhood delivers $2^{p}$ outputs, which leads to a large number of possible patterns. When the texture area is small, the LBP descriptor is ineffective. The uniform model of LBP restricts the jumping time, i.e. the number of transitions between 0 and 1 in the circular binary pattern, which is calculated utilizing Eq. (9).

$\displaystyle U({\textit{LBP}({x,y})})=|{f({x_{m-1}-x_{c}})-f({x_{0}-x_{c}})}|+\mathop{\sum}\limits_{i=1}^{m-1}|{f({x_{i}-x_{c}})-f({x_{i-1}-x_{c}})}|$ (9)

Where, $U$ is the jumping time of the pattern, which is limited by the maximum jumping time $u$ in the uniform model. The extracted features of the eye region, the head pose angle and the threshold value of the mouth region are combined. This information is given as the input to the infinite feature selection algorithm in order to perform feature selection.
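The jumping time described by Eq. (9) can be sketched as a count of 0/1 transitions in the circular pattern of thresholded bits $f(x_{i}-x_{c})$. The function name and the two example patterns below are illustrative assumptions.

```python
def jumping_time(bits):
    """Number of 0/1 transitions in a circular bit pattern (Eq. 9)."""
    m = len(bits)
    # Wrap-around term: last neighbour compared with the first one.
    u = abs(bits[m - 1] - bits[0])
    # Transitions between consecutive neighbours.
    for i in range(1, m):
        u += abs(bits[i] - bits[i - 1])
    return u

smooth = jumping_time([1, 0, 0, 0, 1, 1, 1, 1])   # 2 transitions
noisy = jumping_time([1, 0, 1, 0, 1, 0, 1, 0])    # 8 transitions
```

A pattern with few transitions, such as the first one, corresponds to an edge-like or flat local structure, whereas the alternating pattern has the largest possible jumping time for eight neighbours.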

3.5 Feature selection

In this study, the infinite feature selection algorithm is employed to choose the optimal features. Consider a set of feature distributions $F=\{{f^{(1)},\ldots,f^{(n)}}\}$, where $x\in R$ signifies a sample of the generic distribution $f$. Then, an undirected fully connected graph $G=({V,E})$ is built, where $V$ is the set of vertices corresponding to the feature distributions and $E$ codifies the edges, i.e. the pairwise relations between feature distributions. $G$ is represented as an adjacency matrix $A$ that specifies the nature of the weighted edges: every element $a_{ij}$ of $A$, $1\leqslant i,j\leqslant n$, states a pairwise energy term. Energies are represented as a weighted linear combination of two simple pairwise measures linking $f^{(i)}$ and $f^{(j)}$, as defined in Eqs (10)–(12).

$\displaystyle a_{ij}=\alpha\sigma_{ij}+({1-\alpha})c_{ij}$ (10)

Where,

$\displaystyle\sigma_{ij}=\max({\sigma^{(i)},\sigma^{(j)}})$ (11)

$\displaystyle c_{ij}=1-|{\textit{Spearman}({f^{(i)},f^{(j)}})}|$ (12)

Where, $\alpha$ is a loading coefficient in the range $[0,1]$, $\sigma^{(i)}$ is the standard deviation of the samples $\{x\}\in f^{(i)}$, and Spearman denotes Spearman’s rank correlation coefficient.

In practice, $a_{ij}$ connects two feature distributions, accounting for the maximal feature dispersion and correlation. Note that the standard deviation is normalized by the maximum standard deviation over the set $F$ and $|{\textit{Spearman}({f^{(i)},f^{(j)}})}|\in[{0,1}]$, so the two measures are comparable in terms of magnitude. The idea is that, supposing $\alpha=0.5$, a high $a_{ij}$ indicates that at least one feature among $f^{(i)}$ and $f^{(j)}$ could be discriminant, since it covers a large feature space, and that $f^{(i)}$ and $f^{(j)}$ are not redundant. Then, the pairwise analysis of features is extended to individuate the energy associated with sets larger than two feature distributions.
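The construction of the adjacency matrix in Eqs (10)–(12) can be sketched as follows. This is a minimal illustration, not the implementation used in this work. It reads $\sigma_{ij}$ as the larger of the two normalized standard deviations (an assumption consistent with the "maximal feature dispersion" wording), computes Spearman's coefficient as the Pearson correlation of simple ranks without tie averaging, and uses a hypothetical data matrix `X_demo`.

```python
import numpy as np

def rank(v):
    """Ranks 0..n-1 of a 1-D array (ties are not averaged in this sketch)."""
    order = np.argsort(v, kind="stable")
    r = np.empty(len(v), dtype=np.float64)
    r[order] = np.arange(len(v))
    return r

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return np.corrcoef(rank(a), rank(b))[0, 1]

def adjacency(X, alpha=0.5):
    """A with a_ij = alpha*sigma_ij + (1 - alpha)*c_ij; X is (samples, features)."""
    n = X.shape[1]
    sigma = X.std(axis=0)
    sigma = sigma / sigma.max()            # normalise by the maximum std
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sigma_ij = max(sigma[i], sigma[j])                  # Eq. (11), assumed reading
            c_ij = 1.0 - abs(spearman(X[:, i], X[:, j]))        # Eq. (12)
            A[i, j] = alpha * sigma_ij + (1.0 - alpha) * c_ij   # Eq. (10)
    return A

# Hypothetical data: feature 1 is exactly twice feature 0 (fully redundant).
X_demo = np.array([[1.0, 2.0, 5.0],
                   [2.0, 4.0, 3.0],
                   [3.0, 6.0, 4.0],
                   [4.0, 8.0, 1.0],
                   [5.0, 10.0, 2.0]])
A_demo = adjacency(X_demo, alpha=0.5)
```

Because features 0 and 1 are perfectly rank-correlated, their correlation term $c_{01}$ vanishes and the edge weight $a_{01}$ is determined by the dispersion term alone.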

Let $\gamma=\{{v_{0}=i,v_{1},\ldots,v_{l-1},v_{l}=j}\}$ denote a path of length $l$ between the vertices (features) $i$ and $j$. Suppose that the length $l$ of the path is less than the total number of features $n$ and that the path has no cycles, so that no feature is visited more than once. The energy of $\gamma$ is defined in Eq. (13).

$\displaystyle\varepsilon_{\gamma}=\mathop{\prod}\limits_{k=0}^{l-1}a_{v_{k},v_{k+1}}$ (13)

Where, $\varepsilon_{\gamma}$ accounts for the pairwise energies of all the feature pairs that compose the path, and it is assumed as the joint energy of the subset of features. Then, the set $P_{i,j}^{l}$ is defined as containing all the paths of length $l$ between $i$ and $j$; the energy of all the paths of length $l$ is obtained by summing over this set, as given in Eq. (14).

$\displaystyle R_{l}({i,j})=\mathop{\sum}\limits_{\gamma\in P_{i,j}^{l}}\varepsilon_{\gamma}$ (14)

By standard matrix algebra, $R_{l}$ is obtained as the $l$-th power of $A$, as given in Eq. (15).

$\displaystyle R_{l}({i,j})=A^{l}({i,j})$ (15)

Computed in this way, $R_{l}$ also accounts for paths that contain cycles. By extending the path length to infinity, the probability of being part of a cycle becomes uniform for all the features and is taken into account in $R_{l}$, so a sort of normalization comes into play. Then, the energy score of a single feature at a given path length $l$ is obtained as in Eq. (16).

$\displaystyle s_{l}(i)=\mathop{\sum}\limits_{j\in V}R_{l}({i,j})=\mathop{\sum}\limits_{j\in V}A^{l}({i,j})$ (16)

Therefore, a first idea for the feature selection method is to order the features by decreasing $s_{l}(i)$ and keep the highest-scoring ones in order to obtain a non-redundant feature subset. Unfortunately, the computation of $s_{l}$ is expensive $({O({({l-1})\cdot n^{3}})})$; since $l$ is of the same order as $n$, the computation turns out to be $O({n^{4}})$ and becomes impractical for a large set of features. Infinite feature selection addresses this problem by expanding the path length to infinity, $l\to\infty$, using algebra notions that simplify the calculations in the infinite case.
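The relation between the path energies of Eqs (13)–(14) and the matrix power of Eq. (15) can be checked numerically on a small example. In the sketch below, the 3 $\times$ 3 matrix `A_small` holds hypothetical edge weights, and the brute-force enumeration allows cycles, which is what makes the sum equal to $A^{l}$.

```python
import itertools
import numpy as np

def energy_by_enumeration(A, i, j, l):
    """Sum of path energies over all length-l paths from i to j (Eqs. 13-14).

    Intermediate vertices may repeat, i.e. cycles are allowed.
    """
    n = A.shape[0]
    total = 0.0
    for middle in itertools.product(range(n), repeat=l - 1):
        path = (i,) + middle + (j,)
        energy = 1.0
        for k in range(l):
            energy *= A[path[k], path[k + 1]]   # product of edge weights
        total += energy
    return total

def scores_at_length(A, l):
    """s_l(i) = sum over j of A^l(i, j), as in Eq. (16)."""
    return np.linalg.matrix_power(A, l).sum(axis=1)

# Hypothetical symmetric edge weights for three features.
A_small = np.array([[0.25, 0.50, 0.60],
                    [0.50, 0.50, 0.90],
                    [0.60, 0.90, 0.25]])
brute = energy_by_enumeration(A_small, 0, 2, 3)
power = np.linalg.matrix_power(A_small, 3)[0, 2]
s3 = scores_at_length(A_small, 3)
```

The enumeration visits $n^{l-1}$ paths per vertex pair, which illustrates why a direct evaluation quickly becomes impractical as $l$ and $n$ grow.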

3.5.1 Infinite set of features

The passage to infinity implies calculating a new type of single feature score, which is mathematically given in Eq. (17).

$\displaystyle s(i)=\mathop{\sum}\limits_{l=1}^{\infty}s_{l}(i)=\mathop{\sum}\limits_{l=1}^{\infty}\left({\mathop{\sum}\limits_{j\in V}R_{l}({i,j})}\right)$ (17)

Let, $S$ be the geometric series of matrix $A$ that is given in the Eq. (18).

$\displaystyle S=\mathop{\sum}\limits_{l=1}^{\infty}A^{l}$ (18)

Where, $S$ is utilized to obtain $s(i)$ as given in the Eq. (19).

$\displaystyle s(i)=\mathop{\sum}\limits_{l=1}^{\infty}s_{l}(i)=\left[\left({\mathop{\sum}\limits_{l=1}^{\infty}A^{l}}\right)e\right]_{i}=[Se]_{i}$ (19)

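Where, $e$ is a column vector of ones. Since the series in Eq. (18) does not converge in general, the contribution of each path length is regularized as $\check{s}(i)=\sum_{l=1}^{\infty}r^{l}s_{l}(i)$, where $r$ is a real-valued regularization factor and $r^{l}$ is interpreted as the weight for paths of length $l$. From an algebraic point of view, $\check{s}(i)$ is efficiently computed by utilizing the convergence property of the geometric power series of a matrix, which holds when $0<r<1/\rho(A)$ with $\rho(A)$ the spectral radius of $A$, as given in Eq. (20). Matrix $\check{S}$ encodes all the information about the energy of the feature subsets.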

$\displaystyle\check{S}=({I-rA})^{-1}-I$ (20)

Then, obtain final energy scores for each feature by marginalizing the quantity as given in the Eq. (21).

$\displaystyle\check{s}(i)=[\check{S}e]_{i}$ (21)

The features are ranked in decreasing order of their energy scores $\check{s}(i)$. The ranking is utilized for determining the number of features $m$ to be selected: the SVM classifier is fed with subsets of the ranked features, starting from the most energetic one downwards, and the $m$ that ensures the highest classification score is kept. The selected optimal feature vectors are given as the input to the SVM classifier.
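The closed form of Eqs (20)–(21) can be sketched in a few lines. The choice $r=0.9/\rho(A)$, with $\rho(A)$ the spectral radius of $A$, is an assumption of this sketch that keeps the geometric series convergent; the matrix `A_fs` holds hypothetical edge weights.

```python
import numpy as np

def inf_fs_scores(A, factor=0.9):
    """Energy scores [((I - rA)^-1 - I) e]_i, as in Eqs. (20)-(21)."""
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius of A
    r = factor / rho                             # assumed choice, so that r * rho < 1
    S = np.linalg.inv(np.eye(n) - r * A) - np.eye(n)
    return S.sum(axis=1), r                      # row sums equal S @ e

def rank_features(scores):
    """Feature indices in decreasing order of energy score."""
    return np.argsort(-scores, kind="stable")

# Hypothetical symmetric edge weights for three features.
A_fs = np.array([[0.25, 0.50, 0.60],
                 [0.50, 0.50, 0.90],
                 [0.60, 0.90, 0.25]])
scores, r = inf_fs_scores(A_fs)
ranking = rank_features(scores)

# Cross-check against a long truncated series sum_l r^l A^l e.
series = sum((r ** l) * np.linalg.matrix_power(A_fs, l)
             for l in range(1, 400)).sum(axis=1)
```

The single matrix inversion replaces the summation over all path lengths; the truncated series is included only to confirm that both routes give the same scores.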

3.6 Classification

After the selection of optimal features, classification is accomplished by utilizing SVM to distinguish the non-drowsy and drowsy driver. By developing a relaxed classification error bound, the SVM classifier reduces the size of the resulting dual problem. In addition, the SVM classifier speeds up the testing and training processes while preserving a competitive classification accuracy. The SVM is a discriminative classification approach, which is represented by a separating hyper-plane. In recent decades, the SVM classification methodology has been extensively utilized in many applications such as signal processing, bio-informatics, computer vision, etc., because it has the ability to perform well on high dimensional data. The SVM classifier does well in solving the two-class problem, and it is associated with Vapnik-Chervonenkis theory and the structural risk minimization principle. The linear discriminant function is represented as $w\cdot x+b=0$. In the SVM classifier, an optimum hyper-plane is placed between the two classes (drowsy and non-drowsy) in order to separate the samples without error, which is mathematically represented in Eq. (22).

$\displaystyle p_{i}[{w\cdot x_{i}+b}]-1\geqslant 0,\quad i=1,2,\ldots,N$ (22)

Where, $p_{i}\in\{-1,+1\}$ is the class label of the training sample $x_{i}$ and $N$ is the number of training samples. Then, $||w||^{2}$ is minimized subject to Eq. (22), and the optimization problem is solved at the saddle point of a Lagrange function with Lagrange multipliers $\alpha_{i}$. The optimal discriminant function is given in Eq. (23).

$\displaystyle f(x)=\textit{sign}\left\{(w^{*}\cdot x)+b^{*}\right\}=\textit{sign}\left\{\mathop{\sum}\limits_{i=1}^{N}\alpha_{i}^{*}p_{i}({x_{i}\cdot x})+b^{*}\right\}$ (23)

Finally, the inner product $({x_{i}\cdot x})$ in Eq. (23) is replaced by a linear kernel function $k({x,x_{i}})$ to diminish the computational complexity for higher dimensional data. In this manner, the linear separability of the samples is improved and the discriminant function is rewritten as in Eq. (24).

$\displaystyle f(x)=\textit{sign}\left\{\mathop{\sum}\limits_{i=1}^{N}\alpha_{i}^{*}p_{i}\,k({x,x_{i}})+b^{*}\right\}$ (24)
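The kernel form of the discriminant function in Eq. (24) can be sketched as below. The support vectors, labels, multipliers and bias are hypothetical values for a two-point toy problem, and the mapping of $+1$ to drowsy and $-1$ to non-drowsy is an assumption made for illustration; they are not parameters learned in this work.

```python
import numpy as np

def linear_kernel(x, xi):
    """Linear kernel k(x, x_i) = x . x_i."""
    return float(np.dot(x, xi))

def decide(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """sign( sum_i alpha_i * p_i * k(x, x_i) + b ), as in Eq. (24)."""
    total = sum(a * p * kernel(x, sv)
                for a, p, sv in zip(alphas, labels, support_vectors))
    return 1 if total + b >= 0 else -1

# Hypothetical toy model: two support vectors, one per class.
support_vectors = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
labels = [+1, -1]          # assumed: +1 = drowsy, -1 = non-drowsy
alphas = [0.25, 0.25]
b = 0.0

pred_a = decide(np.array([2.0, 0.5]), support_vectors, labels, alphas, b)
pred_b = decide(np.array([-3.0, 1.0]), support_vectors, labels, alphas, b)
```

Only the support vectors enter the sum, so the cost of classifying a new sample grows with the number of support vectors rather than with the size of the training set.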

4. Experimental study

For experimental investigation, MATLAB 2018a was used on a system with a 3.2 GHz Intel Core i5 processor and the Windows 10 operating system. In this research work, the performance of the proposed infinite algorithm with SVM classifier was compared with hybrid CNN-LSTM [16] to analyse the efficiency of the proposed system. The proposed infinite algorithm with SVM classifier was analysed in light of f-score, specificity, accuracy and sensitivity on a reputed database: the public drowsy driver dataset from the ACCV 2016 competition. Mathematical expressions of f-score, specificity, accuracy and sensitivity are given in Eqs (25)–(28).

$\displaystyle\textit{F-score}=\frac{2TP}{2TP+FP+FN}\times 100$ (25)

$\displaystyle\textit{Specificity}=\frac{TN}{TN+FP}\times 100$ (26)

$\displaystyle\textit{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\times 100$ (27)

$\displaystyle\textit{Sensitivity}=\frac{TP}{TP+FN}\times 100$ (28)

Where, false negative is denoted as $FN$, false positive is represented as $FP$, true negative is indicated as $TN$ and true positive is specified as $TP$.
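Eqs (25)–(28) translate directly into code. The confusion-matrix counts used below are hypothetical and serve only to show the arithmetic.

```python
def metrics(tp, tn, fp, fn):
    """F-score, specificity, accuracy and sensitivity in percent (Eqs. 25-28)."""
    return {
        "f_score": 100.0 * 2 * tp / (2 * tp + fp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100.0 * tp / (tp + fn),
    }

# Hypothetical counts for 100 test frames.
result = metrics(tp=45, tn=40, fp=10, fn=5)
```

For these counts the sketch gives 90% sensitivity, 80% specificity, 85% accuracy and an f-score of about 85.71%.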

Table 1

Performance investigation of the proposed system with dissimilar classifiers

Classifier	Subjects ID	Sensitivity (%)	Specificity (%)	F-score (%)	Accuracy (%)
Random forest	004	76	80	76.67	67.89
	022	78.90	80.45	79.09	76.92
	026	71.20	80.12	73.34	60.80
	030	79	84.44	75.55	60.60
KNN	004	89.98	82.22	85.39	76.51
	022	80.80	83.86	85	80.98
	026	85.4	84.43	83.09	83.98
	030	85	84.84	83	87.77
NN	004	78.15	81.78	82.54	80.22
	022	72.45	73.22	80.14	73.14
	026	79.32	81.47	84.23	81.24
	030	76.45	79.69	83.78	77.12
SVM	004	90	89.62	87.87	89.10
	022	92.56	90.59	90.44	88.13
	026	92.81	87.42	92.23	92.58
	030	91.45	91.57	91.90	91.67

4.1 Quantitative investigation on public drowsy driver database

In this scenario, the public drowsy driver database is applied to analyse the performance of hybrid CNN-LSTM [16] and the proposed system. The public drowsy driver database comprises 22 subjects, of which four subjects (004, 022, 026, and 030) are utilized for testing and the remaining subjects are used for training.

The mean classification accuracy of SVM is 90.37%, whereas the other classification methodologies, random forest, K-Nearest Neighbour (KNN), and Neural Network (NN), deliver 66.55%, 82.31%, and 77.93% of mean classification accuracy, respectively. In addition, the mean sensitivity of SVM is 91.70%, whereas the other classifiers attain 76.27%, 85.29%, and 76.59% of mean sensitivity. Similarly, the mean specificity of SVM is 89.80%, whereas the other classifiers deliver 81.25%, 83.83%, and 79.04% of mean specificity. Additionally, the mean f-score of SVM is 90.61%, whereas random forest, KNN, and NN deliver 76.16%, 84.12%, and 82.67% of mean f-score. Tables 1 and 2 show that the infinite algorithm with SVM performs effectively in driver drowsiness detection compared with the other classifiers on the ACCV 2016 dataset. The graphical analysis of the proposed system performance is shown in Fig. 7, and the graphical comparison of the proposed system with different classifiers is shown in Fig. 8.

Table 2
Mean value of the proposed system with different classifiers

Mean value
Classifier	Sensitivity (%)	Specificity (%)	F-score (%)	Accuracy (%)
Random forest	76.275	81.25	76.16	66.55
KNN	85.295	83.83	84.12	82.31
NN	76.59	79.04	82.67	77.93
SVM	91.70	89.80	90.61	90.37
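The mean accuracies in Table 2 follow from the per-subject accuracies in Table 1 by simple averaging over the four test subjects, as the short sketch below reproduces. The variable names are illustrative.

```python
from statistics import mean

# Per-subject accuracy (%) from Table 1, subjects 004, 022, 026 and 030.
accuracy = {
    "Random forest": [67.89, 76.92, 60.80, 60.60],
    "KNN": [76.51, 80.98, 83.98, 87.77],
    "NN": [80.22, 73.14, 81.24, 77.12],
    "SVM": [89.10, 88.13, 92.58, 91.67],
}

# Mean over the four test subjects, rounded to two decimals as in Table 2.
mean_accuracy = {name: round(mean(vals), 2) for name, vals in accuracy.items()}
```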

Figure 7.

Graphical representation of the proposed system performance.

Figure 8.

Graphical comparison of proposed system with dissimilar classifier.

Table 3 indicates the performance of the proposed system with and without the infinite algorithm. With the infinite algorithm, the SVM improved the average classification accuracy in driver drowsiness detection by up to 2.86% compared with the system without the infinite algorithm. In this work, HOG and LBP effectively find the linear and non-linear properties of the driver’s face, mouth and eye regions and also preserve the relation between lower and higher level features. The performance measures (f-score, specificity, accuracy and sensitivity) confirm that the proposed infinite algorithm with SVM classifier performs well in driver drowsiness detection compared with the existing system.

Table 3

Accuracy evaluation of proposed infinite algorithm with SVM classifier

Pre-processing	Feature extraction	Selection	Classifier	Subjects ID	Accuracy (%)
CRM	Combination of HOG, LBP, head pose angle, and threshold value of mouth region	Without infinite algorithm	SVM	004	87.22
				022	86.11
				026	89.78
				030	86.96
		With infinite algorithm		004	89.10
				022	88.13
				026	92.58
				030	91.67

Table 4

Cross validation of proposed infinite algorithm with SVM classifier

Parameters	Validation
Database	Public drowsy driver database
Pre-processing	CRM
Feature extraction	Combination of HOG, LBP, head pose angle, and threshold value of mouth region
Feature selection	Infinite algorithm
Classifier	SVM
Total samples	22 samples
Training samples	18 samples
Testing samples	4 samples
Mean accuracy (%)	90.37
Mean sensitivity (%)	91.70
Mean specificity (%)	89.80
Mean f-score (%)	90.61

The cross validation of the proposed infinite algorithm with SVM classifier is stated in Table 4. In this study, the infinite algorithm with SVM classifier attained 89.80% specificity and 91.70% sensitivity on average. Correspondingly, the average classification accuracy and f-score of the proposed system in driver drowsiness detection are 90.37% and 90.61%. These simulation results indicate that the infinite algorithm with SVM classifier can contribute to active safety systems.

4.2 Comparative investigation

Table 5 compares the proposed system with an existing system. Guo and Markoni [16] developed a hybrid CNN-LSTM model for driver drowsiness detection and evaluated it on the public drowsy driver database, where it attained an accuracy of 84.85%. On the same database, the proposed infinite algorithm with SVM achieved a classification accuracy of 90.37%, which is 5.52 percentage points higher than the hybrid CNN-LSTM model. In the proposed research, selecting the optimal features is a vital part of driver drowsiness detection. Every video sequence yields many feature vectors, which leads to the “curse of dimensionality” problem [18]. The infinite algorithm is therefore needed to reduce the extracted feature vectors to the relevant ones, which in turn improves drowsiness detection. In addition, HOG and LBP effectively capture the linear and non-linear properties of the video frames and preserve the relation between lower- and higher-level feature vectors. The benefit of the infinite algorithm is shown in Table 3.

Table 5

Comparative investigation of the proposed and existing system

Method	Dataset	Classification accuracy (%)
Hybrid CNN-LSTM [16]	Public drowsy driver dataset	84.85
Infinite algorithm with SVM (proposed)	Public drowsy driver dataset	90.37

4.3 Contribution of the research

As discussed in Section 3, feature selection is an integral part of driver drowsiness detection. Because HOG and LBP produce numerous feature values, the infinite algorithm is applied to choose the relevant feature values for driver drowsiness detection. The effect of feature selection is shown in Table 3, where the accuracy with the infinite algorithm is 2.86% higher than without it. The proposed model attained better classification performance than the existing model in terms of f-score, specificity, accuracy, and sensitivity. By detecting drowsiness reliably, the proposed model has the advantage of helping to prevent major and minor run-off-road accidents.
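To make the selection step concrete, the sketch below ranks features in the style of the infinite feature selection algorithm as it is commonly described: features are nodes of a graph, edge weights mix feature spread and redundancy, and the scores aggregate paths of every length through a matrix inverse. This is an illustration under stated assumptions, not the implementation used in this work. The mixing weight `alpha`, the scaling factor `r_factor`, the number of retained features, and the random input matrix are all assumptions, and the rank correlation is approximated by the Pearson correlation of per-feature ranks.

```python
import numpy as np


def inf_fs_scores(X, alpha=0.5, r_factor=0.9):
    """Score the columns of X with an Inf-FS-style graph weighting."""
    X = np.asarray(X, dtype=float)

    # Spread term: pairwise maximum of the normalised standard deviations.
    sigma = X.std(axis=0)
    sigma = sigma / sigma.max()
    E = np.maximum.outer(sigma, sigma)

    # Redundancy term: 1 - |rank correlation| between each pair of features.
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    C = 1.0 - np.abs(np.corrcoef(ranks, rowvar=False))

    # Weighted adjacency matrix of the feature graph.
    A = alpha * E + (1.0 - alpha) * C

    # Scale A so that the series sum_{l>=1} (rA)^l converges.
    rho = np.max(np.abs(np.linalg.eigvalsh(A)))
    r = r_factor / rho

    # Closed form of the series: (I - rA)^-1 - I.
    n = A.shape[0]
    S = np.linalg.inv(np.eye(n) - r * A) - np.eye(n)

    # A feature's score is the total weight of all paths that start from it.
    return S.sum(axis=1)


# Hypothetical stand-in for the combined feature matrix (samples x features).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))

scores = inf_fs_scores(X)
ranking = np.argsort(scores)[::-1]   # best-scoring feature first
selected = ranking[:3]               # keep the top three (assumed cut-off)
X_selected = X[:, selected]
print("feature ranking:", ranking)
```

The reduced matrix `X_selected` is what would then be passed to the SVM classifier. Keeping only the top-ranked columns is how the selection step counters the dimensionality problem described above.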

5. Conclusion

In this work, a new system is proposed for detecting driver drowsiness. The aim of this study is to propose a superior feature selection methodology for classifying the stages of drowsiness (drowsy or non-drowsy). Here, the infinite algorithm is applied to select the relevant feature vectors, and the selected feature vectors are classified using an SVM classifier. Compared with the hybrid CNN-LSTM model, the proposed system achieved better performance in driver drowsiness detection in terms of f-score, specificity, accuracy, and sensitivity. In the simulation, the proposed system attained an accuracy of 90.37% at a speed of 12 frames per second on the public drowsy driver database, whereas the hybrid CNN-LSTM model obtained an accuracy of 84.85%. In future work, an optimization algorithm can be incorporated to further improve the accuracy of driver drowsiness detection.

Acknowledgments

This work was carried out under a scholarship from the Visvesvaraya Ph.D. Scheme for Electronics and IT, Government of India.

References

1. and Zheng, Driver drowsiness detection with eyelid related parameters by Support Vector Machine, Expert Systems with Applications 36(4) (2009), 7651–7658.

2. Forsman P.M., Vila B.J., Short R.A., Mott C.G. and Van Dongen H.P.A., Efficient driver drowsiness detection at moderate levels of drowsiness, Accident Analysis & Prevention 50 (2013), 341–350.

3. Flores M.J., Armingol J.M. and de la Escalera, Real-time warning system for driver drowsiness detection using visual information, Journal of Intelligent & Robotic Systems 59(2) (2010), 103–125.

4. Lee S.J., Park K.R., Kim I.J. and Kim, Detecting driver drowsiness using feature-level fusion and user-specific classification, Expert Systems with Applications 41(4) (2014), 1139–1152.

5. Subbaiah D.V., Reddy P.V.G.D.P. and Rao K.V., Driver Drowsiness Detection methods: A comprehensive Survey, International Journal of Research in Advent Technology 7(3) (2019), 992–997.

6. Wang and Xu, Driver drowsiness detection based on non-intrusive metrics considering individual specifics, Accident Analysis & Prevention 95(B) (2016), 350–357.

7. Cheng, Zhang, Lin and Feng, Driver drowsiness detection based on multisource information, Human Factors in Ergonomics & Manufacturing 22(5) (2012), 450–467.

8. Assari M.A. and Rahmati, Driver drowsiness detection using face expression recognition, in: IEEE International Conference on Signal and Image Processing Applications (ICSIPA), IEEE, 2011, pp. 337–341.

9. Yoo C.D., Pan, Park and Kang, Driver drowsiness detection system based on feature representation learning using various deep networks, in: Asian Conference on Computer Vision – ACCV 2016 International Workshops, C.S. Chen, J. Lu, K.K. Ma, eds., Springer, Cham, 2016, pp. 154–164.

10. Hachisuka, Ishida, Enya and Kamijo, Facial expression measurement for detecting driver drowsiness, in: International Conference on Engineering Psychology and Cognitive Ergonomics, D. Harris, ed., Springer, Berlin, Heidelberg, 2011, pp. 135–144.

11. de Naurois C.J., Bourdin, Stratulat, Diaz and Vercher J.L., Detection and prediction of driver drowsiness using artificial neural network models, Accident Analysis & Prevention 126 (2017), 95–104.

12. Weng C.H., Lai Y.H. and Lai S.H., Driver drowsiness detection via a hierarchical temporal deep belief network, in: Asian Conference on Computer Vision – ACCV 2016 International Workshops, C.S. Chen, J. Lu, K.K. Ma, eds., Springer, Cham, 2016, pp. 117–133.

13. de Naurois C.J., Bourdin, Bougard and Vercher J.L., Adapting artificial neural networks to a specific driver enhances detection and prediction of drowsiness, Accident Analysis & Prevention 121 (2018), 118–128.

14. Panicker A.D. and Nair M.S., Open-eye detection using iris-sclera pattern analysis for driver drowsiness detection, Sādhanā 42(11) (2017), 1835–1849.

15. McDonald A.D., Lee J.D., Schwarz and Brown T.L., A contextual and temporal algorithm for driver drowsiness detection, Accident Analysis & Prevention 113 (2018), 25–37.

16. Guo J.M. and Markoni, Driver drowsiness detection using hybrid convolutional neural network and long short-term memory, Multimedia Tools and Applications 78 (2018), 29059–29087.

17. Zhao, Wang and Liu, Driver drowsiness detection using facial dynamic fusion information and a DBN, IET Intelligent Transport Systems 12(2) (2018), 127–133.

18. Akpınar and Alpaslan F.N., Optical flow-based representation for video action detection, in: Emerging Trends in Image Processing, Computer Vision and Pattern Recognition, L. Deligiannidis, H.R. Arabnia, eds., 2015, pp. 331–351.

19. Dataset link: http://cv.cs.nthu.edu.tw/php/callforpaper/datasets/DDD/.