Toward a pedestrian detection method by various feature combinations

Abstract

Pedestrian detection has been a crucial issue over the last decades. Existing pedestrian detection methods still face challenges such as abrupt illumination changes, partial occlusion, varied human poses, and cluttered backgrounds. The importance of pedestrian detection systems motivated us to propose a new method that addresses some of these challenges and offers a higher accuracy rate. Different kinds of features have different descriptive power, and a single type of feature cannot capture comprehensive information about human shape. Taking this into consideration, we combine practical and useful features in order to detect pedestrians more accurately. Specifically, we combine histogram of oriented gradients (HOG) features, a proposed modified local binary pattern (M-LBP), and proposed modified Haar-like features (M-Haar). With this combination, it is possible to extract various kinds of information about human shapes, including edge information, texture information, and local shape information. After feature extraction, a cascade AdaBoost classifier is used to distinguish pedestrian images from non-pedestrian images. Experiments are conducted on the INRIA, Daimler, and ETH datasets. The extensive experimental results demonstrate that our approach outperforms traditional methods in terms of accuracy and robustness.

Keywords

Pedestrian detection, modified local binary pattern (M-LBP), histogram of oriented gradients (HOG), modified Haar-like features, Adaboost classifier

1. Introduction

Pedestrian detection has been a crucial issue over the last decades [1, 2] and is one of the most widely studied instances of object detection [3]. It is used in a wide range of applications and services, such as robotics [4, 5, 6], intelligent video surveillance systems (IVS) [12, 13], intelligent image retrieval [1, 9], intelligent transportation, and advanced driver assistance systems (ADAS) [2, 6, 10].

The main challenges of pedestrian detection are as follows:

•
When pedestrians overlap in crowded scenes, detecting them becomes more difficult. In particular, as crowd density increases, occlusion and missing body parts become more frequent in urban traffic scenes [3, 5, 11].
•
Environmental changes, such as illumination variation between day and night, shadows, and weather conditions, can affect the performance of pedestrian detection [1, 10, 11].
•
Compared with rigid objects, the human body is articulated and can take a wide variety of poses, which makes pedestrian detection more difficult [5, 6, 41].
•
Pedestrians may wear clothing of different colors and carry accessories such as backpacks and caps, which can decrease the detection rate [1, 7, 12].
•
Cluttered backgrounds can decrease detection accuracy. For instance, background structures may resemble a human shape or may obscure the true boundary of a person [1, 5, 27].

Taking these challenges into consideration, we designed a new method based on combinations of essential features in order to address these challenges and detect pedestrians effectively and efficiently. Since different features have different descriptive power and a single feature cannot capture comprehensive information about human shape [7, 27], we propose a method that combines practical features. Our method combines histogram of oriented gradients (HOG) features, newly proposed modified local binary pattern (M-LBP) features, and newly proposed modified Haar-like (M-Haar) features. HOG is one of the most widely used and effective descriptors in previous pedestrian detection methods [3, 25]. M-LBP and M-Haar are our new descriptors, inspired by the original LBP descriptor and the original Haar-like features, respectively. The following sections describe this method in more depth.

2. Previous works

The literature offering possible solutions to the pedestrian detection problem is extensive [2, 7, 38]. Because of the breadth of this field, survey articles have been published that compare notable progress and evaluate the performance of recent pedestrian detection methods [3, 4, 5].

Some previous pedestrian detection methods assume a fixed camera. Background subtraction is the most common strategy for a static camera [14, 15]. The main idea behind this approach is to extract the background from the image with methods such as Codebook [16, 17, 40], the mixture of Gaussians model [8, 18], or self-organizing background subtraction (SOBS) [14], and then to classify the moving objects as pedestrian or non-pedestrian. However, these methods cannot detect humans from a moving camera, which is a more complex task [10, 13].
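To make the background-subtraction idea concrete, the following minimal Python sketch keeps a per-pixel running-average background and flags pixels that deviate from it. It is a deliberately simplified stand-in, not the Codebook, mixture-of-Gaussians, or SOBS models cited above; the learning rate `alpha` and the `threshold` are hypothetical values, and the later step of classifying foreground blobs as pedestrian or non-pedestrian is not shown.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background estimate (running average)."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def foreground_mask(background, frame, threshold=25):
    """Mark pixels that differ from the background by more than `threshold`."""
    return [[1 if abs(f - b) > threshold else 0 for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


# Usage: a pixel that jumps from 10 to 100 is flagged as foreground.
bg = [[10, 10], [10, 10]]
frame = [[10, 100], [12, 10]]
mask = foreground_mask(bg, frame)  # [[0, 1], [0, 0]]
bg = update_background(bg, frame, alpha=0.5)
```

Because the background is a single fixed reference per pixel, any camera motion shifts every pixel at once, which is why such methods break down for moving cameras.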

Feature extraction is used extensively to overcome this problem [3, 41]. For instance, the local binary pattern (LBP) [21], an effective texture descriptor that is invariant to monotonic gray-level changes, has been used for pedestrian detection [22, 23]. However, LBP features could not achieve a high accuracy rate for pedestrians in cluttered backgrounds. Haar-like features are another popular set of features with a simple and fast implementation [19, 24], but they are easily affected by complex backgrounds. To mitigate this problem of the original Haar-like features, an informed version of Haar-like features was presented in [2]; however, the informed Haar-like method [2] suffers from a high false-alarm rate. The most widely used feature for human detection is the histogram of oriented gradients (HOG), which is robust to changes in clothing color, body shape, and height [25]. The main idea behind HOG is to characterize object appearance and shape using local intensity gradients and edge directions [26]. HOG has been used in various pedestrian detection methods. For instance, a linear SVM, a simply implemented classifier, is applied after HOG feature extraction in [25]. Moreover, decision trees [12], the cascade AdaBoost algorithm [27, 35], latent SVM [36], and the intersection kernel SVM (IKSVM) [37] have been exploited as advanced and fast classifiers on HOG features. All of the above-mentioned HOG-based methods achieve high precision but require considerable computation time.

Because a single kind of descriptor cannot capture comprehensive information about pedestrians [4, 14, 38], many researchers have combined features to improve detection speed or accuracy. For example, Wang et al. [28] combined HOG and LBP features and showed that the body shape of a pedestrian is better captured by combining edge features with texture features [29].

In addition, a method that integrates local self-similarity (LSS) and HOG features achieved better performance in cluttered backgrounds with noisy edges [20]. A method combining HOG and Haar-like features was also designed [30]. The HOG descriptor achieves high detection accuracy but is slow, whereas Haar-like features are fast but less accurate. That work showed that augmenting HOG with Haar-like features can improve pedestrian detection performance. However, even together, HOG and Haar-like features could not capture comprehensive information about pedestrians because of the complexity of human shape and appearance.

Figure 1.

The diagram of the proposed method.

The methods described in the two preceding paragraphs are based on combinations of different features. Likewise, we aim to improve pedestrian detection by extracting several effective features, since complementary descriptors may yield a more accurate detection rate [3, 13, 27]. Based on this analysis, we propose a method that combines HOG, M-LBP, and M-Haar-like features in order to detect pedestrians with higher precision, as described below.

3. Proposed method

The general methodology of the proposed method is shown in Fig. 1. As shown, the HOG, M-LBP, and M-Haar-like features are extracted from the input image, and these features are then sent to an AdaBoost classifier to distinguish pedestrian images from non-pedestrian images. The following subsections explain this process in more depth.

3.1 HOG features

The basic idea behind the histogram of oriented gradients (HOG) is to capture object shape by characterizing it with local intensity gradients and edge directions [25]. Indeed, HOG can characterize local object shape by the distribution of edge directions, even without precise knowledge of the corresponding gradient or edge positions.

In the first step, normalization is applied to reduce variations in illumination, local shadows, and surrounding colors [3, 25]. Then, the first-order image gradients in the y and x directions and the gradient magnitude are calculated [38], and each pixel casts a magnitude-weighted vote for its gradient direction. The image is densely divided into small spatial regions called cells.

For each cell, a 1-D histogram of gradient directions is computed. Neighboring cells are grouped into larger regions called blocks, and the cell histograms within each block are combined; to improve performance, we apply local contrast normalization over overlapping blocks and collect the HOG descriptors of all blocks in the detection window. As displayed in Fig. 2, a local 9-bin histogram of gradient directions is collected over the pixels of each block, and the combined histogram entries form the representation. Finally, the HOG descriptors are collected into our feature vector $E=[E1,E2,\ldots,En]$.
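The HOG steps above can be sketched in Python as follows. This is a minimal, simplified sketch, not the authors' MATLAB implementation: it assumes 8×8-pixel cells, 2×2-cell blocks moved with a one-cell stride, unsigned orientations in 9 bins without interpolation, and L2 block normalization, loosely following the defaults of [25]. The initial illumination/color normalization step is omitted.

```python
import math


def hog_descriptor(img, cell=8, block=2, bins=9, eps=1e-6):
    """Simplified HOG for a grayscale image given as a list of pixel rows.

    Assumed settings: centered [-1, 0, 1] gradients, unsigned orientations
    in [0, 180) degrees, hard binning (no interpolation), blocks of
    block x block cells moved with a one-cell stride, L2 block normalization.
    """
    h, w = len(img), len(img[0])
    cy, cx = h // cell, w // cell
    # One orientation histogram per cell.
    hist = [[[0.0] * bins for _ in range(cx)] for _ in range(cy)]
    for y in range(cy * cell):
        for x in range(cx * cell):
            # Centered differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = int(angle / (180.0 / bins)) % bins
            # Each pixel casts a magnitude-weighted vote for its orientation.
            hist[y // cell][x // cell][b] += magnitude
    features = []
    for by in range(cy - block + 1):
        for bx in range(cx - block + 1):
            # Concatenate the cell histograms of this block, then L2-normalize.
            v = [val for dy in range(block) for dx in range(block)
                 for val in hist[by + dy][bx + dx]]
            norm = math.sqrt(sum(t * t for t in v) + eps * eps)
            features.extend(t / norm for t in v)
    return features


# Usage: a 64 x 128 detection window (width x height) of zeros.
window = [[0] * 64 for _ in range(128)]
print(len(hog_descriptor(window)))  # 3780
```

With these settings a 64×128 window has 7 × 15 overlapping blocks, each contributing 4 cells × 9 bins = 36 values, for 3780 features in total.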

Figure 2.

Process of the HOG descriptor. G1, G2, and G3 are the first-order image gradients in the $y$ and $x$ directions and the gradient magnitude, respectively. Each block, at different scales, contains a local 9-bin histogram of gradients. $E=[E1,E2,\ldots,En]$ denotes the HOG feature vector.

Figure 3.

Two simple 3×3 original LBP descriptors.

3.2 Our proposed modified LBP features (M-LBP)

Local binary patterns (LBP) [21] were first proposed by Ojala et al. to measure the differences between a given pixel and its neighboring pixels for texture classification. LBP is known to be an effective texture descriptor that is invariant to monotonic gray-level changes [22], and it is a highly discriminative operator [23]. For instance, in a 3×3 neighborhood of 9 pixels, the gray value of the center pixel is used as a threshold: a neighbor whose gray value is lower than that of the center pixel is assigned 0, and otherwise it is assigned 1. The LBP code of the center pixel is then obtained by concatenating these bits into a binary number. The original LBP operator is defined as [21]:

$\displaystyle\textit{LBP}_{P}=\sum_{p=0}^{P-1}K(g_{p}-g_{c})\,2^{p},\qquad K(g_{p}-g_{c})=\begin{cases}1, & g_{p}-g_{c}\geq 0\\ 0, & g_{p}-g_{c}<0\end{cases}$ (1)

where $g_{c}$ is the gray value of the center pixel and $g_{p}$ ($p=0,1,\ldots,P-1$) are the gray values of the $P$ neighbor pixels.

Although the original LBP is a strong texture descriptor, it performs poorly in some cases. For example, Fig. 3 illustrates two simple 3×3 LBP patches that have different local structures but the same LBP code, which is undesirable.
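As a minimal sketch of Eq. (1) with P = 8, the function below computes the original LBP code of a 3×3 patch in Python. The clockwise neighbor ordering starting from the top-left pixel is an assumption (any fixed ordering works), and the two patches are hypothetical values chosen to reproduce the failure case described above; they are not the patches of Fig. 3.

```python
# Neighbor positions (row, col) in a 3x3 patch, visited clockwise from the
# top-left pixel; bit p of the code corresponds to the p-th position.
NEIGHBORS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(patch):
    """Original LBP code of the center pixel of a 3x3 patch (Eq. 1, P = 8)."""
    gc = patch[1][1]
    code = 0
    for p, (r, c) in enumerate(NEIGHBORS):
        # K(g_p - g_c) = 1 when the neighbor is not darker than the center.
        if patch[r][c] - gc >= 0:
            code += 2 ** p
    return code


# Two hypothetical patches with different local structure.
patch_a = [[90, 60, 60], [40, 50, 40], [40, 40, 40]]
patch_b = [[60, 60, 90], [10, 50, 40], [40, 40, 40]]
print(lbp_code(patch_a), lbp_code(patch_b))  # 7 7
```

Both patches have the same brighter-or-darker pattern around the center, so the sign-only comparison of Eq. (1) cannot tell them apart even though their contrast is distributed differently.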

Figure 4.

Two simple 3×3 modified LBP descriptors.

Figure 5.

The process of modified LBP feature extraction from an input image. $F=[F1,F2,\ldots,Fn]$ denotes the M-LBP feature vector.

To eliminate this drawback, we propose a modified LBP (M-LBP). As mentioned earlier, the original LBP uses the gray value of the center pixel as the threshold. In our modified LBP, each neighbor is instead compared through its absolute difference from the center pixel, and the threshold is the mean value of the neighborhood magnitude space. Our modified LBP operator is defined as:

$\displaystyle M\text{-}\textit{LBP}_{P}=\sum_{p=0}^{P-1}K(|g_{p}-g_{c}|)\,2^{p},\qquad K(|g_{p}-g_{c}|)=\begin{cases}1, & |g_{p}-g_{c}|>\frac{1}{P}\sum_{q=0}^{P-1}|g_{q}-g_{c}|\\ 0, & |g_{p}-g_{c}|<\frac{1}{P}\sum_{q=0}^{P-1}|g_{q}-g_{c}|\end{cases}$ (2)

where $\frac{1}{P}\sum_{q=0}^{P-1}|g_{q}-g_{c}|$ is the mean value of the neighborhood magnitude space, i.e., the mean absolute difference between the center pixel and its $P$ neighbors. With this threshold, the modified LBP addresses the problem of patches that have different local structures but the same LBP code, as Fig. 4 illustrates.
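The following Python sketch implements Eq. (2) for a 3×3 patch, assuming a clockwise neighbor ordering that starts at the top-left pixel. Because Eq. (2) does not say what happens when $|g_{p}-g_{c}|$ equals the mean exactly, the sketch assumes such ties produce a 0 bit. The two hypothetical patches both receive the original LBP code 7 under Eq. (1) with the same ordering, yet their M-LBP codes differ.

```python
# Neighbor positions (row, col) in a 3x3 patch, visited clockwise from the
# top-left pixel; this ordering is an assumption, not taken from the paper.
NEIGHBORS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def mlbp_code(patch):
    """M-LBP code of the center pixel of a 3x3 patch (Eq. 2, P = 8).

    Each neighbor is compared through |g_p - g_c| against the mean of these
    magnitudes. Ties with the mean are undefined in Eq. 2; here they give 0.
    """
    gc = patch[1][1]
    mags = [abs(patch[r][c] - gc) for r, c in NEIGHBORS]
    mean = sum(mags) / len(mags)
    code = 0
    for p, m in enumerate(mags):
        if m > mean:
            code += 2 ** p
    return code


# Hypothetical patches that both have the original LBP code 7.
patch_a = [[90, 60, 60], [40, 50, 40], [40, 40, 40]]
patch_b = [[60, 60, 90], [10, 50, 40], [40, 40, 40]]
print(mlbp_code(patch_a), mlbp_code(patch_b))  # 1 132
```

Because Eq. (2) uses absolute differences, the code records which neighbors differ strongly from the center, regardless of whether they are brighter or darker.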

After the M-LBP codes of an image region are extracted, a histogram is produced by counting the frequency of each M-LBP code value. The M-LBP histograms obtained from the various blocks are then concatenated into a feature vector, which is used for more accurate classification in the next step [38]. This process is shown in Fig. 5.
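The histogram step can be sketched as follows. The paper does not state its block size or border handling, so this sketch makes hypothetical choices: non-overlapping 8×8-pixel blocks, one 256-bin histogram per block (one bin per possible 8-bit code), and border pixels without a full 3×3 neighborhood are skipped. Ties with the mean again map to a 0 bit.

```python
# Offsets (dy, dx) of the 8 neighbors, clockwise from the top-left pixel
# (an assumed ordering).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def mlbp_at(img, y, x):
    """M-LBP code (Eq. 2) of interior pixel (y, x); ties map to bit 0."""
    gc = img[y][x]
    mags = [abs(img[y + dy][x + dx] - gc) for dy, dx in NEIGHBORS]
    mean = sum(mags) / len(mags)
    return sum(2 ** p for p, m in enumerate(mags) if m > mean)


def mlbp_features(img, block=8, bins=256):
    """Concatenate per-block histograms of M-LBP codes into one vector F."""
    h, w = len(img), len(img[0])
    features = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            hist = [0] * bins
            for y in range(by, by + block):
                for x in range(bx, bx + block):
                    # Skip border pixels, which lack a full 3x3 neighborhood.
                    if 0 < y < h - 1 and 0 < x < w - 1:
                        hist[mlbp_at(img, y, x)] += 1
            features.extend(hist)
    return features


# Usage: a flat 16 x 16 image gives 4 blocks x 256 bins, all counts in bin 0.
F = mlbp_features([[5] * 16 for _ in range(16)])
print(len(F))  # 1024
```

Counting codes per block, rather than over the whole window, keeps coarse information about where each texture pattern occurs.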

Figure 6.

The modified Haar-like features.

Figure 7.

An example of applying the modified Haar-like features to an input image. $G=[G1,G2,\ldots,Gn]$ denotes the M-Haar feature vector.

3.3 Our proposed modified Haar-like features (M-Haar)

Haar-like features were proposed by Viola and Jones [24] for rapid object detection. They form an overcomplete set of two-dimensional Haar functions and can be used to capture the local appearance of objects [24]. Each feature consists of two or more rectangular regions. The principal reason for using Haar-like features is that they effectively capture different image details and offer an attractive trade-off between accuracy and speed of evaluation [2]. Their high computation speed comes from the integral image [24], which allows any Haar-like feature to be evaluated quickly at any scale. A feature is described by the bounding boxes of its white and black regions, which are weighted with opposite signs. The feature value, denoted $f_{h}$, is computed as:

$\displaystyle f_{h}=\sum_{(x,y)\in\text{white area}}w_{1}I(x,y)+\sum_{(x,y)\in\text{black area}}w_{2}I(x,y)$ (3)

where $w_{1}$ is the weight of the white area (with a positive sign), $w_{2}$ is the weight of the black area (with a negative sign), and $I(x,y)$ is the intensity of pixel $(x,y)$.
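The sketch below illustrates Eq. (3) together with the integral image that makes Haar-like features fast. It assumes a single two-rectangle feature whose left half is the white area and right half the black area, with $w_{1}=+1$ and $w_{2}=-1$ as hypothetical default weights. Any upright rectangular template can be evaluated with the same constant-time `rect_sum` lookups.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect(ii, x, y, w, h, w1=1.0, w2=-1.0):
    """Eq. 3 for a left (white) / right (black) two-rectangle feature."""
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return w1 * white + w2 * black


# Usage: dark left half, bright right half.
img = [[0, 0, 10, 10] for _ in range(4)]
ii = integral_image(img)
print(haar_two_rect(ii, 0, 0, 4, 4))  # -80.0
```

Once the integral image is built, the cost of a feature no longer depends on its size, which is what allows evaluation at any scale.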

Because of the complexity of clothing and appearance, the human shape cannot be fully described by the original Haar-like features. We therefore extend the set of rectangular features in our modified Haar-like features, as shown in Fig. 6. These additional rectangular features can describe local human shape more precisely. To keep the computation simple, tilted rectangular features for local human parts are not used. An example of extracting M-Haar-like features from a pedestrian is illustrated in Fig. 7. The modified Haar-like features are applied at multiple scales so that they can describe local human regions of different sizes.

Table 1

Analysis of feature descriptors

Operator	M-Haar	HOG	M-LBP
Difference	✓	–	✓
Gradient	–	✓	–
Convolution	–	✓	–
Histogram	–	✓	✓

Table 2

A comparison of different numbers of weak classifiers on the INRIA, Daimler, and ETH datasets

Table 3

Miss rates of previous pedestrian detectors and of the proposed feature combinations on three datasets

Detector	Features	Classifier	Miss rate INRIA	Miss rate Daimler	Miss rate ETH
Shapelet [9] (2007)	Gradient	Adaboost	81%	94%	91%
VJ [19] (2004)	HAAR	Adaboost	72%	95%	90%
FTRMine [35] (2007)	HOG-Color	Adaboost	58%	–	–
LBP [22] (2008)	LBP	Linear SVM	49%	62%	68%
HOG [25] (2005)	HOG	Linear SVM	46%	60%	64%
Lat-SVM-V1 [36] (2008)	HOG	Latent SVM	44%	58%	77%
IK-SVM [37] (2008)	HOG	IK SVM	43%	55%	72%
HogLbp [28] (2009)	HOG-LBP	Linear SVM	39%	49%	55%
MultiFtr [30] (2008)	HOG-HAAR	Adaboost	36%	57%	60%
InformedHaar [2] (2014)	Informed Haar	Adaboost	14%	–	–
HOG-LSS [20] (2015)	HOG-LSS	Linear SVM	17%	26%	–
HOG-UDP [31] (2018)	HOG-UDP	SVM	8%	–	–
HOG-LBP-PCA [38] (2017)	HOG, LBP, PCA	K-SVD	8%	9%	–
1st attempt	M-LBP, M-Haar	Adaboost	11%	25%	37%
2nd attempt	HOG, M-LBP	Adaboost	9%	20%	30%
3rd attempt	HOG, M-Haar	Adaboost	13%	24%	32%
4th attempt	HOG, M-Haar, M-LBP	SVM	13%	25%	34%
Proposed method (best attempt)	HOG, M-Haar, M-LBP	Adaboost	7%	17%	27%

After the M-Haar-like features are extracted, a cascade classifier is used, since it can achieve a high accuracy rate while dramatically reducing computation time. Each classifier in the cascade is trained to achieve a high detection rate and a modest false positive rate [24]. Simpler classifiers with a small number of features are placed early in the cascade, and complex classifiers with many features are placed later [24]. The cascade structure is inspired by a decision tree [7, 24]. As shown in Fig. 7, in the feature-cascading phase, a positive outcome from the first sub-classifier triggers the evaluation of a second sub-classifier, which has also been optimized to select the best M-Haar features. A positive outcome from the second classifier triggers the third classifier, and so on, while a negative result at any stage leads to immediate rejection. Stages in the cascade are created by training classifiers with AdaBoost and then adjusting their thresholds to minimize false negatives. After feature cascading, the selected M-Haar features are sent to the final AdaBoost classifier, which is discussed in more depth in the following sections.
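A minimal Python sketch of cascade evaluation, following the attentional cascade of [24]: each stage sums the votes of its weak classifiers (decision stumps weighted by `alpha`) and compares the total with a stage threshold, and a window is rejected as soon as one stage fails. The stump representation and the two-stage example cascade are hypothetical; training the stages with AdaBoost and tuning their thresholds is not shown.

```python
def stage_score(x, stumps):
    """Weighted vote of one stage's weak classifiers.

    Each stump is (feature_index, threshold, polarity, alpha) and votes 1 when
    polarity * x[feature_index] < polarity * threshold, following the
    Viola-Jones convention.
    """
    return sum(alpha for idx, thr, pol, alpha in stumps
               if pol * x[idx] < pol * thr)


def cascade_predict(x, stages):
    """Run the stages in order; reject at the first stage that says no.

    stages: list of (stumps, stage_threshold). Returns (label, stages_used).
    """
    for i, (stumps, stage_threshold) in enumerate(stages):
        if stage_score(x, stumps) < stage_threshold:
            return 0, i + 1  # early rejection: later stages never run
    return 1, len(stages)


# Hypothetical two-stage cascade over a 2-D feature vector: stage 1 needs
# x[0] > 0.5, stage 2 needs x[1] > 0.5.
stages = [([(0, 0.5, -1, 1.0)], 0.5), ([(1, 0.5, -1, 1.0)], 0.5)]
for x in ([0.9, 0.9], [0.1, 0.9], [0.9, 0.1]):
    print(x, cascade_predict(x, stages))
```

The second element of the result shows the saving: most negative windows stop after the cheap early stages.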

3.4 Feature aggregation

In this section, the extracted HOG, M-LBP, and M-Haar-like features are aggregated, with the aim of obtaining a combined feature with powerful descriptive ability. As shown in Table 1, unlike HOG, the other descriptors (our proposed M-Haar and M-LBP) rely on a difference operator, computed by subtracting pixel values within image blocks. HOG descriptors are built on gradient values and require convolution with filters to obtain the pyramid scale space. The HOG descriptor computes histograms over the orientation intervals of pixel gradients, whereas the proposed M-LBP features map their outputs onto their own code templates. Except for the proposed M-Haar-like features, both M-LBP and HOG require histogram computation. Table 1 shows that HOG involves three operators, all except the difference operator. When HOG, M-LBP, and M-Haar features are aggregated, the resulting combined feature involves all four operators. We therefore expect human shape to be recognized more completely when texture information, edge information, and local shape information are combined.
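The paper does not spell out how the three feature vectors are aggregated. A common and minimal choice, assumed here, is to concatenate $E$ (HOG), $F$ (M-LBP), and $G$ (M-Haar) into one vector, optionally after L2-normalizing each part so that no descriptor dominates merely because of its numeric scale. The sketch below is a hypothetical illustration of that choice.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against all-zero vectors)."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / (norm + eps) for t in v]


def aggregate_features(hog, mlbp, mhaar, normalize=True):
    """Concatenate E (HOG), F (M-LBP), and G (M-Haar) into one vector.

    Normalizing each part first is an assumption, not taken from the paper.
    """
    parts = [hog, mlbp, mhaar]
    if normalize:
        parts = [l2_normalize(p) for p in parts]
    return [t for part in parts for t in part]


E = [3.0, 4.0]        # hypothetical HOG values
F = [2.0]             # hypothetical M-LBP histogram values
G = [0.0, 0.0, 5.0]   # hypothetical M-Haar responses
combined = aggregate_features(E, F, G)  # approximately [0.6, 0.8, 1.0, 0.0, 0.0, 1.0]
```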

Figure 8.

Sample detection on images. The blue boxes, black boxes, yellow boxes, and red boxes depict detected pedestrians through using only M-LBP feature, only HOG feature, only M-Haar-like feature, and our suggested method (HOG, M-LBP, and M-Haar), respectively.


4. Experiments

To assess the performance of the proposed method, we use the INRIA dataset [25], the Daimler dataset [32], and the ETH dataset [33]. We selected these three datasets because they are more diverse and complex than more limited datasets such as MIT, NICTA, and CVC.

Our detection approach is implemented in MATLAB 2016 on 64-bit Windows 8, running on a 2.50 GHz Intel Core i7-4870HQ with 16 GB of RAM.

According to the PASCAL criterion [34], a detection is considered correct if the overlap between the ground-truth annotation and the detected bounding box exceeds 50%:

$\displaystyle a_{0}=\frac{\text{area}(\textit{BB}_{\textit{dt}}\cap\textit{BB}_{\textit{gt}})}{\text{area}(\textit{BB}_{\textit{dt}}\cup\textit{BB}_{\textit{gt}})}>0.5$ (4)

where $\textit{BB}_{\textit{gt}}$ is the bounding box of the ground-truth annotation and $\textit{BB}_{\textit{dt}}$ is the bounding box produced by the detection method.
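Eq. (4) can be computed as follows. The sketch assumes boxes given as (x1, y1, x2, y2) in continuous coordinates; pixel-inclusive conventions, used by some evaluation toolkits, add 1 to each width and height. A detection whose overlap is exactly 0.5 is not counted, matching the strict inequality in Eq. (4).

```python
def iou(box_a, box_b):
    """Overlap ratio a_0 of Eq. 4 for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Intersection is empty when the boxes do not overlap on either axis.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(bb_dt, bb_gt, threshold=0.5):
    """PASCAL criterion: the detection counts if a_0 exceeds the threshold."""
    return iou(bb_dt, bb_gt) > threshold


print(is_correct_detection((0, 0, 10, 10), (5, 0, 15, 10)))  # False (overlap 1/3)
```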

After the HOG, proposed M-LBP, and proposed M-Haar-like features are combined, the cascade AdaBoost algorithm [2] is applied for classification. Intuitively, one would expect more weak classifiers to yield a higher accuracy rate because the decision boundary becomes more precise. However, too many weak classifiers may overfit the training data. We set the number of weak classifiers to 32, 64, 128, and 256 in turn. The best result is achieved with 128 weak classifiers, which yields miss rates of 7%, 17%, and 27% on INRIA, Daimler, and ETH, respectively. Table 2 reports the miss rates for the different numbers of weak classifiers. Detection accuracy starts to decline gently when more than 128 weak classifiers are used, which we attribute to overfitting.
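To show what the number of weak classifiers controls, here is a minimal discrete AdaBoost with decision stumps in Python. It is a generic textbook version, not the authors' cascade AdaBoost: `n_weak` plays the role of the 32, 64, 128, or 256 weak classifiers tried above, labels are assumed to be -1 (non-pedestrian) and +1 (pedestrian), and the exhaustive stump search is only practical for small toy data.

```python
import math


def stump_predict(x, j, thr, pol):
    """Decision stump: +pol if feature j is at least thr, else -pol."""
    return pol if x[j] >= thr else -pol


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X)):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, thr, pol) != yi)
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost(X, y, n_weak):
    """Discrete AdaBoost with n_weak stumps; labels must be -1 or +1."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(n_weak):
        err, j, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Misclassified samples gain weight, correct ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(x, j, thr, pol)
                for alpha, j, thr, pol in model)
    return 1 if score >= 0 else -1


X = [[0.0], [1.0], [2.0], [3.0]]  # toy 1-D feature values
y = [-1, -1, 1, 1]
model = adaboost(X, y, n_weak=3)
print([predict(model, x) for x in X])  # [-1, -1, 1, 1]
```

Each round re-weights the training samples so that the next stump focuses on the examples the ensemble currently gets wrong; this steadily drives down the training loss, which is why a very large `n_weak` can eventually fit noise in the training data.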

5. Discussion

One may ask why we chose HOG, M-LBP, and M-Haar-like features over other existing features. To justify this choice, we evaluated the pairwise combinations HOG + M-LBP, HOG + M-Haar, and M-LBP + M-Haar (the 1st to 3rd attempts in Table 3) and compared their accuracy. The results in Table 3 show that our three-feature combination outperforms these pairwise combinations. Another common question is whether there is a rationale for choosing cascade AdaBoost as the classifier. We replaced it with an SVM classifier (the 4th attempt in Table 3) and compared the results; Table 3 shows that cascade AdaBoost yields higher detection accuracy than the SVM classifier.

It is important to note that deep learning and convolutional neural networks (CNNs) have revolutionized object detection [6], and recent pedestrian detection methods based on this approach are no exception, achieving high accuracy with real-time performance [6, 42, 43]. However, among the feature-based methods listed in Table 3, our proposed method achieves the lowest miss rates on INRIA and ETH, and on Daimler it is outperformed only by HOG-LBP-PCA [38].

By using three different datasets (INRIA, Daimler, and ETH), we can both enlarge our evaluation data and evaluate the robustness and precision of the proposed method comprehensively. The images in these datasets contain abrupt illumination changes, varied human shapes, and cluttered backgrounds, and our method is able to detect pedestrians in these challenging situations. Figure 8 displays some sample pedestrian detection results. In Fig. 8, the blue, black, yellow, and red boxes depict pedestrians detected using only the proposed M-LBP feature, only the HOG feature, only the proposed M-Haar-like feature, and our suggested strategy (the combination of HOG, M-LBP, and M-Haar-like features), respectively. In each of the four runs, the corresponding features are extracted from the input images and fed to the AdaBoost classifier: M-LBP alone (blue boxes), HOG alone (black boxes), M-Haar alone (yellow boxes), and the combined HOG, M-LBP, and M-Haar features of our method (red boxes). As illustrated in Fig. 8, each feature alone performs poorly. However, the combination achieves higher accuracy and a lower miss rate because it involves the gradient, difference, convolution, and histogram operators, which complement one another.

We do not provide an exhaustive runtime comparison with state-of-the-art detectors in this work, because the detectors are implemented on different machines and some rely heavily on GPU computation [38]. It therefore does not make much sense to list runtimes from different computing architectures. Nevertheless, our method can be implemented readily and efficiently, with high speed and low computational cost.

6. Conclusion and future work

In this work, an accurate pedestrian detection method was proposed. The first novelty of this investigation was the modification of two original features, yielding the modified local binary pattern (M-LBP) and modified Haar-like (M-Haar) features. These features were then combined with HOG, the most widely used descriptor in feature-based pedestrian detection methods. Another strength of this work is the use of three different datasets (INRIA, Daimler, and ETH), each recorded under different environments, resolutions, occlusion conditions, and lighting conditions. By employing these three datasets, we could evaluate the robustness and precision of the proposed method comprehensively. The extensive experimental results showed that our approach achieved a better detection rate and accuracy than traditional feature-based pedestrian detection methods. Considering the widespread success of deep learning and CNNs in object detection, in future work we will focus on employing deep learning techniques, or combining them with our method, to achieve better results in pedestrian detection systems.

References

1. Xia et al., A unified framework for concurrent pedestrian and cyclist detection, IEEE Transactions on Intelligent Transportation Systems 18(2) (2016), 269–281.

2. Zhang et al., Informed haar-like features improve pedestrian detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.

3. Zhang et al., Towards reaching human performance in pedestrian detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4) (2017), 973–986.

4. Dollar et al., Pedestrian detection: An evaluation of the state of the art, IEEE Transactions on Pattern Analysis and Machine Intelligence 34(4) (2011), 743–761.

5. Ouyang et al., Jointly learning deep features, deformable parts, occlusion and classification for pedestrian detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 40(8) (2017), 1874–1887.

6. et al., Scale-aware fast R-CNN for pedestrian detection, IEEE Transactions on Multimedia 20(4) (2017), 985–996.

7. Lahiani and Mahmoud, Hand gesture recognition method based on HOG-LBP features for mobile devices, Procedia Computer Science 126 (2018), 254–263.

8. Sen-Ching and Chandrika, Robust techniques for background subtraction in urban traffic video, Visual Communications and Image Processing 5308 (2004), International Society for Optics and Photonics, 2004.

9. Sabzmeydani and Greg, Detecting pedestrians by learning shapelet features, in: 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2007.

10. Kim and Mesmakhosroshahi, Stereo-based region of interest generation for real-time pedestrian detection, Peer-to-Peer Networking and Applications 8(2) (2015), 181–188.

11. Shen et al., Adaptive pedestrian tracking via patch-based features and spatial-temporal similarity measurement, Pattern Recognition 53 (2016), 163–173.

12. Paisitkriangkrai and Chunhua, Pedestrian detection with spatially pooled features and structured ensemble learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 38(6) (2015), 1243–1257.

13. Zhao, Jingjing and Deqiang, Real-time moving pedestrian detection using contour features, Multimedia Tools and Applications 77(23) (2018), 30891–30910.

14. Maddalena, A self-organizing approach to background subtraction for visual surveillance applications, IEEE Transactions on Image Processing 17(7) (2008), 1168–1177.

15. Andrews et al., Highway traffic congestion classification using holistic properties, in: 10th IASTED International Conference on Signal Processing, Pattern Recognition and Applications, 2013.

16. Akula et al., Adaptive contour-based statistical background subtraction method for moving target detection in infrared video sequences, Infrared Physics and Technology 63 (2014), 103–109.

17. Kim et al., Real-time foreground–background segmentation using codebook model, Real-Time Imaging 11(3) (2005), 172–185.

18. Stauffer and Eric, Learning patterns of activity using real-time tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8) (2000), 747–757.

19. Viola and Michael, Robust real-time face detection, International Journal of Computer Vision 57(2) (2004), 137–154.

20. Yao et al., A new pedestrian detection method based on combined HOG and LSS features, Neurocomputing 151 (2015), 1006–1014.

21. Ojala, Matti and Topi, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 7 (2002), 971–987.

22. et al., Discriminative local binary patterns for human detection in personal album, in: 2008 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2008.

23. Zhenhai and Yiteng, Pedestrian count estimation using texture feature with spatial distribution, Advances in Mechanical Engineering 9(1) (2016), 1687814016683599.

24. Viola, Michael and Daniel, Detecting pedestrians using patterns of motion and appearance, International Journal of Computer Vision 63(2) (2005), 153–161.

25. Dalal and Triggs, Histograms of oriented gradients for human detection, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), Vol. 1, 2005, pp. 886–893.

26. Jiang and Jinwen, Combination features and models for human detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

27. Baek et al., Fast and efficient pedestrian detection via the cascade implementation of an additive kernel support vector machine, IEEE Transactions on Intelligent Transportation Systems 18(4) (2016), 902–916.

28. Wang et al., An HOG-LBP human detector with partial occlusion handling, in: 2009 IEEE 12th International Conference on Computer Vision, IEEE, 2009.

29. Campmany et al., GPU-based pedestrian detection for autonomous driving, Procedia Computer Science 80 (2016), 2377–2381.

30. C. Wojek and Bernt, A performance evaluation of single and multi-feature people detection, in: Joint Pattern Recognition Symposium, Springer, Berlin, Heidelberg, 2008.

31. Shim, Design and implementation of a pedestrian recognition algorithm using trilinear interpolation based on HOG-UDP, The Journal of Supercomputing 74(2) (2018), 787–800.

32. Enzweiler and Dariu, Monocular pedestrian detection: Survey and experiments, IEEE Transactions on Pattern Analysis and Machine Intelligence 31(12) (2008), 2179–2195.

33. Ess et al., Depth and appearance for mobile scene analysis, in: 2007 IEEE 11th International Conference on Computer Vision, IEEE, 2007.

34. Everingham et al., The PASCAL visual object classes (VOC) challenge, International Journal of Computer Vision 88(2) (2010), 303–338.

35. Dollar et al., Feature mining for image classification, in: 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2007.

36. Felzenszwalb and Deva, A discriminatively trained, multiscale, deformable part model, in: 2008 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2008.

37. Maji et al., Classification using intersection kernel support vector machines is efficient, in: 2008 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2008.

38. Zheng et al., Pedestrian detection based on gradient and texture feature integration, Neurocomputing 228 (2017), 71–78.

39. Biswas and Peyman, Linear support tensor machine with LSK channels: Pedestrian detection in thermal infrared images, IEEE Transactions on Image Processing 26(9) (2017), 4229–4242.

40. Zhao et al., Robust pedestrian detection in thermal infrared imagery using a shape distribution histogram feature and modified sparse representation classification, Pattern Recognition 48(6) (2015), 1947–1960.

41. Zhang et al., Efficient pedestrian detection via rectangular features based on a statistical shape model, IEEE Transactions on Intelligent Transportation Systems 16(2) (2014), 763–775.

42. et al., Pedestrian detection aided by deep learning semantic tasks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.

43. Zhang et al., Towards reaching human performance in pedestrian detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4) (2017), 973–986.