Abstract
Vehicle detection algorithms that rely on a single shape feature are prone to false detections in expressway video monitoring, and detection with a support vector machine (SVM) over sliding windows is time-consuming. To address both problems, a vehicle detection method based on fast extraction of object-oriented candidate windows and a fused HOG-LBP feature is proposed. First, vehicle candidate windows are quickly extracted using the binary normalized gradient feature and background difference. Then the histogram of oriented gradients (HOG) feature and the local binary pattern (LBP) feature of each candidate window image are calculated and fused. Finally, vehicles are detected by combining the fused feature with the SVM classifier. The experimental results show that fusing shape and texture features effectively improves vehicle detection performance, and that fast extraction of candidate windows speeds up SVM detection by about 8 times, which can meet the real-time requirements of engineering applications.
Keywords
Introduction
Video monitoring is one of the main components of an expressway intelligent traffic detection system. It collects real-time traffic flow data to judge the traffic state of the road and to identify traffic accidents automatically, provides valuable auxiliary decision-making information for road operation and management, and greatly improves management efficiency. Automatic vehicle recognition from video sequences is the basis of traffic data acquisition and traffic incident detection in video surveillance. It is therefore of great significance to develop a high-precision, real-time vehicle recognition technology that improves the reliability of detection.
Expressway vehicle detection algorithms were originally realized with simple object extraction methods such as frame differencing and edge detection, but these methods perform poorly under complicated outdoor conditions such as frequent changes in light intensity and heavy noise. With the development of computer vision technology, more algorithms extract vehicle shape and texture features and combine them with machine learning methods for vehicle detection. The common approach is to calculate sample features such as Haar-like [1], HOG and LBP, and to train a classifier with an SVM or a cascaded AdaBoost method. Image frames are then scanned with a sliding window and each window is classified, which effectively improves detection accuracy. Viola proposed using the integral image to calculate simple sample features such as Haar-like features, which are used to train AdaBoost classifiers; several simple classifiers are then cascaded into a complex classifier to detect the target. Hakki Can Karaimer et al trained the classifier based on the KNN algorithm and the HOG
However, vehicle detection methods that combine a single feature with machine learning may produce false detections and missed detections in road environments with complex conditions and heavy interference, and still struggle to meet the requirements of engineering practice [3]. To solve this problem, scholars have proposed vehicle detection algorithms based on higher-level features, such as the convolutional neural network (CNN) [4] and the Faster RCNN [5] vehicle recognition and tracking algorithm, which extracts windows with deep learning. Although such algorithms greatly improve detection accuracy in complex environments, their real-time performance is still low and their cost is quite high, which makes them unsuitable for engineering practice. Therefore, considering both algorithm performance and engineering practice, vehicle detection that combines shape features and texture features with an SVM classifier can effectively improve accuracy.
In this paper, a vehicle recognition algorithm is proposed that quickly extracts target candidate windows based on binary normalized gradient features and background difference. Its detection accuracy and real-time performance with the SVM classifier are significantly higher than those of the classic HOG+SVM algorithm [6]. The algorithm first collects a large number of positive and negative samples (vehicle samples and background samples) and converts them into images with 64
The algorithm flowchart is shown in Fig. 1.
Flow chart of the vehicle detection algorithm.
Features of HOG
Histograms of oriented gradients (HOG) are feature descriptors composed of the gradient histograms of the local regions into which an image is divided. They are well suited to recognizing vehicle targets because they are insensitive to light intensity and describe object shape well. The process of extracting HOG features from vehicle samples is as follows [7]: normalizing the gamma space and color space of the images; computing the gradient magnitude and direction of each pixel by convolution with gradient operators [
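As an illustration of the gradient and histogram steps, a minimal Python sketch for a single cell is given below. It is not the implementation used in the experiments: the centered difference operator, the 9 unsigned orientation bins over 0 to 180 degrees, the hard bin assignment without interpolation, and the name `cell_histogram` are all assumptions made for the example.

```python
import math

def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a 2-D list of grayscale values. The bin count and the
    [-1, 0, 1] difference operator are assumptions for this sketch.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(height):
        for x in range(width):
            # Centered differences; border pixels reuse the edge value.
            gx = cell[y][min(x + 1, width - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, height - 1)][x] - cell[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A horizontal intensity ramp has purely horizontal gradients,
# so all of the magnitude falls into the first orientation bin.
ramp = [[x * 10 for x in range(4)] for _ in range(4)]
print(cell_histogram(ramp))
```

A full descriptor would additionally group cells into overlapping blocks and normalize each block before concatenation.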
Features of LBP
Local binary pattern (LBP) is an operator that represents local texture features in binary form. It is simple to compute, stable and highly discriminative, and is suitable for vehicle target detection. First, the LBP feature of the sample images is calculated, that is, the LBP values of the 8 sampling points in the 3
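The basic operator with 8 sampling points around a center pixel can be sketched in Python as follows. The neighbor ordering (clockwise from the top-left pixel), the `>=` comparison and the 256-bin histogram are assumptions for this example, and the function names are hypothetical.

```python
def lbp_code(img, y, x):
    """8-bit LBP code of the pixel at (y, x) from its 8 neighbors."""
    center = img[y][x]
    # Neighbors visited clockwise, starting at the top-left pixel (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        # A neighbor at least as bright as the center sets its bit.
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """Histogram of LBP codes over all interior pixels of a 2-D image."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist

patch = [[9, 1, 1],
         [1, 5, 1],
         [1, 1, 1]]
print(lbp_code(patch, 1, 1))  # only the top-left neighbor exceeds the center
```

The histogram of codes, rather than the code image itself, is what would serve as the texture feature vector in this sketch.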
Fusion of feature vector
Feature vector fusion means calculating the HOG and LBP feature values of the sample images separately, normalizing the two feature vectors and fusing them into a single row vector, which is expressed as follows:
In Eq. (1),
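A minimal sketch of this fusion step is shown below. It assumes L2 normalization of each feature vector followed by simple concatenation; the text does not state the normalization actually used, and the function names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors stay zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return [0.0 for _ in vec]
    return [v / norm for v in vec]

def fuse_features(hog_vec, lbp_vec):
    """Normalize each feature separately, then concatenate into one row vector."""
    return l2_normalize(hog_vec) + l2_normalize(lbp_vec)

fused = fuse_features([3.0, 4.0], [0.0, 2.0, 0.0])
print(len(fused))  # length is the sum of the two feature dimensions
```

Normalizing each part separately keeps the higher-dimensional feature from dominating the fused vector purely because of its scale.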
The principle of image classification with a support vector machine (SVM) is to apply a nonlinear transformation that maps the input image to a high-dimensional space, and to find a hyperplane in that space that classifies the image. The mapping of the nonlinear transformation is realized by a kernel function, and the hyperplane is the classifier that separates positive and negative samples [8]. Vehicle recognition can be seen as a binary classification problem, that is, judging whether the target image block is a vehicle or not, so the SVM classifier can be written as:
In Eq. (2),
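For illustration, the standard kernel SVM decision rule can be sketched in Python as below. The sketch assumes an RBF kernel and that positive scores correspond to vehicles; the parameter names (`dual_coefs`, `bias`, `gamma`) are hypothetical and may not match the symbols of Eq. (2).

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Signed score: sum_i (alpha_i * y_i) * K(sv_i, x) + b.

    dual_coefs[i] holds the product alpha_i * y_i for support vector i.
    """
    score = sum(c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, dual_coefs))
    return score + bias

def is_vehicle(x, support_vectors, dual_coefs, bias, gamma):
    """Classify as vehicle when the score is positive (assumed label convention)."""
    return svm_decision(x, support_vectors, dual_coefs, bias, gamma) > 0

# Toy model with one positive and one negative support vector.
svs = [[0.0, 0.0], [2.0, 2.0]]
coefs = [1.0, -1.0]
print(is_vehicle([0.1, 0.0], svs, coefs, bias=0.0, gamma=0.5))
```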
One of the important factors limiting the real-time performance of SVM detection is the need to scan the whole image with sliding windows. In this paper, we propose a method for fast extraction of object candidate windows based on BING and background difference, which addresses the slow detection speed of the SVM [9].
The normalized gradient (NG) feature is a gradient normalized within a local area, and it is stable with respect to the position, width and height, and scale of the target image. Moreover, because NG features are compact, they can be computed and verified efficiently. To speed up the computation of NG features further, the binary normalized gradient (BING) can be used as an approximation:
In Eq. (3),
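One common way to binarize an NG value, following the published BING formulation, is to keep only its most significant bits. The sketch below assumes 8-bit NG values and 4 retained bits; both the bit count and the function names are assumptions for illustration and may differ from the notation of Eq. (3).

```python
def top_bits(value, n_bits):
    """The n_bits most significant bits of an 8-bit value, MSB first."""
    return [(value >> (8 - k)) & 1 for k in range(1, n_bits + 1)]

def approximate_ng(value, n_bits=4):
    """Rebuild the value from its top bits: sum over k of 2^(8-k) * b_k."""
    return sum(bit << (8 - k)
               for k, bit in enumerate(top_bits(value, n_bits), start=1))

print(top_bits(200, 4))     # 200 is 0b11001000
print(approximate_ng(200))  # the low four bits are discarded
```

Because each retained bit plane is a binary map, window scores can then be computed with bitwise operations instead of floating-point arithmetic.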
As shown in Algorithm 1, the BING feature exploits the cumulative relationship between adjacent rows, replaces explicit loops with a few simple atomic operations, and thus greatly improves the real-time performance of the algorithm. When searching for objects in an image, a two-stage cascaded SVM is used. First, a linear SVM is trained on the BING features of positive and negative samples to obtain the linear template
In Eq. (5),
In Eq. (6),
Expressway video surveillance usually uses fixed cameras, so most of the background can be eliminated by the background difference method. After BING has extracted the suspected target windows, they are compared with the foreground object image obtained by background difference, and a window selected by BING is kept as a final candidate window only if it has a large intersection with the foreground.
In Eq. (7), proposal
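The screening of BING windows against the foreground can be sketched as follows. This is a simplified illustration, not the criterion of Eq. (7): the overlap measure (fraction of foreground pixels inside the window), the difference threshold of 30 and the minimum ratio of 0.5 are all hypothetical choices.

```python
def foreground_mask(frame, background, diff_threshold=30):
    """Binary mask of pixels that differ from the background model."""
    return [[1 if abs(f - b) > diff_threshold else 0
             for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

def foreground_ratio(mask, box):
    """Fraction of foreground pixels inside box = (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    inside = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return inside / area

def filter_proposals(proposals, mask, min_ratio=0.5):
    """Keep only the windows that overlap the foreground sufficiently."""
    return [box for box in proposals if foreground_ratio(mask, box) >= min_ratio]

# Toy frame: a bright 2x2 object on an empty 6x6 background.
background = [[0] * 6 for _ in range(6)]
frame = [[100 if 1 <= y <= 2 and 1 <= x <= 2 else 0 for x in range(6)]
         for y in range(6)]
mask = foreground_mask(frame, background)
print(filter_proposals([(1, 1, 3, 3), (0, 0, 4, 4), (3, 3, 6, 6)], mask))
```

Only the window that tightly covers the moving object survives, which is how the background difference reduces the number of windows passed to the SVM.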
To verify the effectiveness of the algorithm, samples are collected from the KITTI vehicle database, comprising 1360 positive samples and 3850 negative samples. The validation set contains 250 positive samples and 250 negative samples. We verify the accuracy and real-time performance of the algorithm by comparing the classical HOG
Analysis of SVM classifier training
From the KITTI vehicle database, the required data samples are obtained, including the training set and the validation set. All the samples are converted to 64
Step 1: Collection and preprocessing of positive and negative samples.
Step 2: The HOG and LBP feature vectors of all samples are calculated, and normalization and fusion processing are performed to get row vector
Step 3: Classifier training. The first task is the choice of kernel function; the most commonly used kernel functions are the linear kernel and the RBF kernel. Comparing the test results, we found that the detection accuracy with the RBF kernel is about 0.45% higher than with the linear kernel. Therefore, the RBF kernel is selected as the kernel function for training the SVM classifier. The next job is the selection of the penalty factor
Step 4: To test the detection performance of the trained SVM classifier, the precision and recall of the different algorithms are compared on the same test set.
In Eq. (8), TP, FP, TN and FN denote the numbers of true positive, false positive, true negative and false negative samples, respectively. Because precision and recall can conflict with each other, the comprehensive evaluation index (F-measure) and the accuracy are used to indicate the detection performance of the different algorithms [11].
In Eq. (9),
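These evaluation indexes can be computed from the four counts as in the sketch below. It uses the standard definitions and assumes the balanced F-measure (equal weight on precision and recall); the counts in the usage line are made-up illustrative numbers, not results from the experiments.

```python
def detection_metrics(tp, fp, tn, fn):
    """Precision, recall, balanced F-measure and accuracy from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}

# Illustrative counts only (not taken from the paper's tables).
print(detection_metrics(tp=8, fp=2, tn=9, fn=1))
```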
Comparison of classifiers with different features
Comparison of classifiers in different conditions
Table 1 provides the training and validation results of the different algorithms on the KITTI vehicle database. The data show that Fast RCNN, which uses deep learning to extract high-level features, obtains the best performance at the cost of a sharp increase in computation. In addition, the detection algorithms based on the fused feature perform better than the single-feature algorithms, with considerable improvements in both the comprehensive evaluation index and the accuracy. This indicates that integrating HOG and LBP features enhances the separability of vehicle and non-vehicle samples and improves the detection performance of the algorithm.
To verify the real-time detection performance of the proposed algorithm, we collected expressway videos under different conditions as a verification database, including light changes, rainy days, pan-tilt camera movement, foggy days, water droplets on the camera lens, and the normal state. Each video time has 1
We compared the different algorithms on the same database, and the results are shown in Table 2.
The results show that the detection rate with the fused feature is clearly higher than that with a single feature. However, under two conditions (rainy day and water droplets on the camera lens) the features of the target objects are seriously blurred, so none of the algorithms above performs well, and the detection rate is about 40% lower than in the normal state.
The proposed algorithm detects on candidate windows instead of sliding windows, which improves the detection results under the three conditions of foggy day, pan-tilt camera movement and light change. On foggy days in particular, the detection rate of the algorithm improves by about 28%. It can also be seen that the deep learning algorithm using high-level features achieves the best results in all environments.
Detection rate and real-time performance are two basic requirements for expressway video surveillance. However, the deep learning network with the best detection performance is not only expensive but also computationally heavy, so it cannot guarantee real-time operation [12]. In addition, target detection algorithms based on an SVM classifier usually traverse the image with a sliding window to extract features and detect, and the high-dimensional features lead to excessive computation, which seriously affects the real-time performance of the detection system.
Detection time of different algorithms.
Because the surveillance cameras on the freeway are fixed, object windows can be quickly extracted with the improved BING method, which largely solves the time-consuming problem of SVM sliding window detection. First, the BING feature is used to quickly extract the candidate windows. The number of windows is less than 1% of that of the sliding window method under the premise that the target detection rate is above 99% [13]. On this basis, background difference is used to narrow the region of interest, further reducing the number of candidate windows to 102 or even less. Finally, all the candidate windows are preprocessed and their features are extracted, and the trained SVM classifier is used to determine whether the objects are vehicle targets or not. The algorithm based on improved BING extracts object windows quickly, and the average processing time of each image is 0.003
The detection times of the different algorithms on the videos are compared in Fig. 2. The figure shows that, for the same detection method, increasing the feature dimension increases the computation dramatically, mainly in the feature extraction stage. In this paper, we propose an algorithm that uses fast extraction of object windows instead of sliding window detection. The average detection time is reduced to 0.043 s/frame. Compared with the fused feature combined with the traditional sliding window SVM algorithm, whose detection time is 0.341 s/frame, the speed-up is about 8 times, which greatly reduces the time consumed and improves the real-time performance of the algorithm.
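The speed-up quoted above follows directly from the two per-frame times; the short calculation below also converts the proposed method's time into an approximate frame rate (the variable names are illustrative).

```python
sliding_window_time = 0.341    # s/frame, fused feature with sliding window SVM
candidate_window_time = 0.043  # s/frame, proposed candidate window method

speedup = sliding_window_time / candidate_window_time
frames_per_second = 1.0 / candidate_window_time

print(f"speed-up: {speedup:.2f}x")               # about 7.93x, i.e. roughly 8 times
print(f"frame rate: {frames_per_second:.1f} fps")  # about 23.3 fps
```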
In this paper, a vehicle recognition algorithm in expressway based on HOG
