Automatic pedestrian detection in partially occluded single image

Abstract

In this study, we propose a novel method for handling crowds and partially occluded pedestrians in single-image pedestrian detection. First, two procedures are developed to extract features from different body parts of pedestrians in images. One procedure uses the multiscale block-based histogram of oriented gradients (MBHOG), preprocessed via Gabor filtering, to enhance the description of pedestrians’ heads and bodies. The other modifies Haar-like features into parallelogram-based Haar-like features (PHF) to suitably depict pedestrians’ legs and arms, which are typically not straight while walking. In addition, the computation of both features is expedited through integral image acceleration. After the pedestrian features are acquired, a two-tier support vector machine (SVM) classifier is proposed for processing partially occluded pedestrians. The first-tier SVM classifier judges whether any body parts are occluded. Next, the classification probabilities of the nonoccluded body parts from the first tier are input into the second-tier classifier to determine whether any pedestrians are present in the detection window. Compared with six state-of-the-art approaches, the experimental results indicate that the proposed method is more accurate in terms of the receiver operating characteristic curve and four other criteria. Additionally, our method effectively processes images in which a crowd is present or pedestrians are partially occluded and enables pedestrian detection in images of different scenes.

Keywords

Pedestrian detection, feature extraction, multiscale block-based histogram of oriented gradient (MBHOG), support vector machine (SVM), partial obscurity

1. Introduction

Pedestrian detection [1, 2, 3, 4, 5, 6, 7, 8, 9], an intelligent camera technology, is critical in machine vision technology [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. As its name implies, this technology determines whether there are any pedestrians in an image and where they are located. It is applied in various fields [20, 21, 22, 23, 24, 25, 26, 27, 28]: it is used in intelligent surveillance systems for public safety, serves as an indispensable component of ever-evolving automatic driving systems, improves human-AI interaction, and aids unmanned aerial vehicles used in mountain or shipwreck rescue operations [29, 30, 31]. Pedestrian detection is used so extensively that it has become a major focus of research.

The target of pedestrian detection has both rigid and flexible object properties [32], and the image features are easily affected by clothes, hair, body shape, skin color, posture, and visual angle. If a detection target exists in a complex image background, the target is likely to be occluded by trees, buildings, and vehicles in the background [33, 34, 35, 36, 37, 38, 39, 40, 41, 42]. Herein, obscuration mostly means occlusion. For pedestrian detection to be suitably applied in different images and environments, both recognition rate and speed should be taken into consideration. Because of these considerations, pedestrian detection is a relatively challenging area of image recognition research. Moreover, because the detection system functions similarly to human eyes, its effectiveness is easily affected by weather, climate, and visibility. For example, detection effectiveness is lower at night without streetlights than during the day. Taken together, complex backgrounds and obscurity are two major limitations of current pedestrian detection technology [43]. Overcoming these limitations will enhance the practicality of this technology. Therefore, the objective of this study is to modify the recognition algorithm for pedestrian detection to improve its recognition rate.

The image features of pedestrians can be divided into three categories: primary features, features acquired through machine learning, and mixed features. Primary features refer to colors, texture, and image gradient vectors. Features acquired through machine learning methods such as deep learning are highly recognizable pedestrian image features that computers learn on the basis of a massive amount of pedestrian samples. Mixed features are a combination of multiple low-level features or the higher-order statistical features of low-level features.

The histogram of oriented gradient (HOG) [8], a primary feature, is the most widely used pedestrian image feature with favorable performance. An HOG depicts the local gradient magnitude and gradient-oriented features of an image. The census transform histogram is proposed for illustrating the global information of a scene, and was first used in scene classification [44]. Features acquired through machine learning are the distinctive features of an image that are obtained through feature selection. The AdaBoost Haar-based face detection method [45] is proposed to use the AdaBoost algorithm to select discernible features (weak classifiers) among numerous Haar image features. This method has been successfully used in pedestrian detection. Adaptive contour features [46] are proposed that use AdaBoost to select features in an oriented gradient space in which three operations are defined – growth, consolidation, and cutting – to effectively illustrate the feature co-occurrences of a shape. Regarding mixed features, [47, 48] have combined both oriented gradient and local binary pattern (LBP) histograms, used this combination as the image feature, and proposed a pedestrian detection method for partial obscurity. Moreover, they have trained two linear support vector machine (SVM) classifiers [49, 50, 51] through training data and used occluded pedestrian images (composed of INRIA [52] and PASCAL [53] datasets) in combination with enhanced HOG–LBP features and a procedure for processing global and partial obscurity to verify the procedure.

The present study focuses on increasing the accuracy of pedestrian detection by modifying the HOG as the multiscale block-based histogram of oriented gradient (MBHOG) for pedestrians’ heads and bodies. In addition, the parallelogram-based Haar-like feature (PHF), an improved version of the Haar-like feature, is applied for depicting pedestrians’ arms and legs. The MBHOG and PHF constitute a joint component model that serves as an improved feature vector. This image feature vector is incorporated with the proposed two-tier SVM classifier to improve the accuracy rate of pedestrian detection by processing partial obscurity, complex environments, and crowds. More specifically, a novel method is proposed to improve the accuracy of single-image partially occluded pedestrian detection. Two procedures are proposed to extract features from different body parts of pedestrian images. The first proposes MBHOG features to effectively enhance the descriptions of pedestrians’ heads and bodies. The second involves PHF features to suitably depict pedestrians’ legs and arms. These two features extracted are expedited through integral image acceleration and then concatenated as novel feature vectors. Finally, a two-tier SVM classifier is further proposed to resolve the issues of partially occluded pedestrians. The first-tier classifier is used to judge if there are any occluded body parts, whereas the second-tier one is used to determine if there are any pedestrians in the detection window. The proposed method can effectively process images in which a crowd is present or pedestrians are partially occluded and enables pedestrian detection in images of different scenes.

This paper is structured as follows. Section 2 describes the materials and the proposed methods. In Section 3, the experimental results and discussion are presented. The conclusion is given in Section 4.

2. Materials and methods

Figure 1 depicts the study procedure. Using training images collected from the INRIA database, this study first adopts two extraction methods to capture the features of pedestrians’ body parts. One method is the MBHOG feature preprocessed by the Gabor filter; this feature describes the trunk. The other is the PHF, which describes the limbs. Next, this study uses principal component analysis (PCA) to reduce the dimensions of these two features. Lastly, this study employs the two-tier SVM classifier for recognizing partially occluded pedestrians. The first-tier SVM classifier is used to determine if any body part is occluded. Next, the classification probabilities estimated by the first-tier classifier for the nonoccluded parts are input into the second-tier classifier to determine if there are any pedestrians in the detection window.

Figure 1.

Flowchart of the proposed method.

2.1 Materials

In the experiments, all images in the INRIA database [52] are used. The database contains training and testing sets. In the training set, there are 2,416 pedestrian photos and 1,218 background photos without pedestrians, and in the testing set, there are 1,131 pedestrian photos and 453 background photos without pedestrians (Fig. 2). The borders of the original positive and negative samples (which refer respectively to pedestrian photos and background photos without pedestrians) are cropped to obtain a resolution of 64 $\times$ 128. A total of 3,548 positive samples are acquired from pedestrian photos and 3,548 negative ones from background photos without pedestrians.

2.2 Pedestrian head and body feature extraction through Gabor filtering and MBHOG

Figure 3 shows the MBHOG proposed by this study. After an image is input, gamma calibration and Gabor filter preprocessing are performed. Next, the gradient is calculated, multiscale cells and blocks are configured, and L2-norm normalization is performed to collect feature vectors and thereby enhance the description of pedestrians’ upper body features.

Figure 2.

Positive and negative samples for pedestrian detection: (a) positive samples; (b) negative samples.

Figure 3.

Flow chart of MBHOG feature extraction.

Figure 4.

Preprocessing for pedestrian head and body feature extraction: (a) input images; (b) preprocessed images.

2.2.1 Pre-processing using gamma correction and Gabor filtering

First, gamma correction is performed on images to compensate for human visual characteristics. The purpose of this preprocessing is to reduce the impact of light or shadows on the images and to reduce noise interferences.

$\displaystyle V_{\text{out}}=AV_{\text{in}}^{\gamma}$ (1)
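As an illustration of Eq. (1), the following minimal sketch applies the power-law mapping to 8-bit intensities. The gamma value of 0.5 and the gain of 1.0 are assumptions chosen for the example; the study does not state which values it uses, and the function name `gamma_correct` is hypothetical.

```python
def gamma_correct(pixels, gamma=0.5, gain=1.0):
    """Apply V_out = A * V_in**gamma (Eq. 1) to rows of 8-bit intensities.

    gamma and gain (A) are illustrative defaults, not values from the study.
    """
    out = []
    for row in pixels:
        new_row = []
        for v in row:
            v_in = v / 255.0                       # scale to [0, 1]
            v_out = gain * (v_in ** gamma)         # power-law mapping
            v_out = min(max(v_out, 0.0), 1.0)      # clip before rescaling
            new_row.append(int(round(v_out * 255)))
        out.append(new_row)
    return out


image = [[0, 64], [128, 255]]
corrected = gamma_correct(image, gamma=0.5)
```

With a gamma below 1, dark pixels are brightened while black and white stay fixed, which is one way shadows can be lifted before feature extraction.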

Both scenes and backgrounds can be complex during pedestrian detection and Gabor filters are effective in image edge detection [54, 55]. Therefore, this study uses Gabor filters for preprocessing in pedestrian detection. The generating function for Gabor filters is expressed by:

$\displaystyle\varphi(\vec{z})=\frac{1}{2\pi}\frac{{||\vec{k}||}^{2}{||\vec{z}||}^{2}}{2\sigma^{2}}\exp(j\vec{k}\cdot\vec{z})$ (2)

$\displaystyle\vec{k}=2\pi f\exp(j\theta),\quad\lambda=\frac{1}{f}$ (3)

where $\vec{k}$ is the wave vector, $\lambda$ is the wavelength, and $\theta$ is the direction of the wave vector. This study sets four pairs of ( $\theta$ , $\lambda$ ) to be: (157.5, 5.6), (67.5, 8), (90, 11.31), and (112.5, 16) [54]. Four preprocessed images are then averaged to obtain the final image. The results of preprocessing for pedestrian head and body feature extraction are shown in Fig. 4.

Figure 5.

Oriented gradient histogram: (a) Pixel gradient angle and intensity, (b) Oriented gradient histogram of a cell.

2.2.2 Image gradient calculation

To calculate the gradient direction, simple one-dimension gradient masks [ $-$ 1, 0, 1] and [ $-$ 1, 0, 1] ${}^{T}$ have the optimal detection performance [3]; using a complicated mask to compute the image gradient normally lowers the detection performance. The preceding masks are used to calculate $G_{x}$ and $G_{y}$ , wherein $G_{x}$ and $G_{y}$ are the horizontal gradient and vertical gradient of image $G$ . The gradient intensity and angle (gradient direction) of image $G$ are subsequently computed to be $\nabla G(x,y)$ and $\theta(x,y)$ , respectively.
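The gradient step can be sketched as follows. This is a minimal pure-Python illustration, assuming replicated borders and unsigned angles folded into the range from 0 to 180 degrees; the border handling and the function name `gradients` are assumptions, not details given in the study.

```python
import math


def gradients(img):
    """Return (magnitude, angle_deg) maps using the [-1, 0, 1] masks.

    Border pixels are replicated (an assumption); angles are unsigned and
    folded into [0, 180) degrees.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]  # [-1, 0, 1]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]  # [-1, 0, 1]^T
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang


vertical_edge = [[0, 0, 10, 10] for _ in range(3)]
mag_v, ang_v = gradients(vertical_edge)

horizontal_edge = [[0, 0], [0, 0], [10, 10], [10, 10]]
mag_h, ang_h = gradients(horizontal_edge)
```

A vertical edge produces a purely horizontal gradient (angle 0), whereas a horizontal edge produces a purely vertical one (angle 90).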

2.2.3 Configuring multiscale cells and blocks

The HOG feature segments a training image into multiple cells of equal size. The pixels of each cell are then classified into nine orientation bins, and the votes in each bin are accumulated. More specifically, a range of 180 ${}^{\circ}$ (from 0 ${}^{\circ}$ to 180 ${}^{\circ}$ ) is used in the experiments and is equally divided into nine bins. The pixel gradient angle $\theta(x,y)$ is used as a yardstick to determine which bin a pixel belongs to, and the pixel gradient intensity $\nabla G(x,y)$ indicates the weight of the pixel. Together, the pixel gradient angles and intensities constitute the oriented gradient histogram of a cell (Fig. 5).
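The voting scheme for one cell can be illustrated with a short sketch. It assumes hard assignment of each pixel to a single bin, with the gradient intensity as the vote weight; whether the study interpolates votes between neighboring bins is not stated, and the function name `cell_histogram` is hypothetical.

```python
def cell_histogram(mag, ang, n_bins=9):
    """Magnitude-weighted orientation histogram of one cell.

    mag and ang are same-shaped 2-D lists; angles are in degrees in [0, 180).
    Each pixel votes for exactly one bin (hard assignment, an assumption).
    """
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for mag_row, ang_row in zip(mag, ang):
        for m, a in zip(mag_row, ang_row):
            k = min(int(a // bin_width), n_bins - 1)  # guard the upper edge
            hist[k] += m                              # intensity is the weight
    return hist


cell_mag = [[1, 2], [3, 4]]
cell_ang = [[0, 10], [95, 179]]
hist = cell_histogram(cell_mag, cell_ang)
```

Here the angles 0 and 10 fall into the first bin, 95 into the fifth, and 179 into the ninth, so the weights accumulate as 3, 3, and 4 in those bins.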

In addition, to effectively describe the partial contour and shape of a target feature, the HOG feature organizes multiple cells into one block. The blocks are overlapped to prevent any critical feature descriptors from being overlooked, which would increase the miss rate. However, if the overlap is excessively large, feature dimensions and computation time will increase (Fig. 6).

2.2.4 Cumulative sum of histogram gradients method

The proposed MBHOG feature extraction method links HOG feature vectors obtained from different cells and blocks to improve contour description and feature recognition, thereby increasing the accuracy rate of detection. Because the MBHOG repeatedly computes the gradient histogram of the same block, the cumulative sum of histogram gradients method is adopted to facilitate the computation.

The first step of the method is to save the cumulative sum of the gradient votes of the $k$ -th direction into ${\textit{CS}}_{k}$ , as expressed by:

$\displaystyle{\textit{CS}}_{k}(i,j)=\sum\limits_{x=1}^{i}\sum\limits_{y=1}^{j}{M_{k}(x,y)}$ (4)

where $M_{k}(x,y)$ is the gradient intensity of the $k$ -th direction at pixel $(x,y)$ . To expedite the computation, each value is computed from the previously computed values of ${\textit{CS}}_{k}$ , as expressed by:

$\displaystyle{\textit{CS}}_{k}(i,j)={\textit{CS}}_{k}(i,j-1)+{\textit{CS}}_{k}(i-1,j)-{\textit{CS}}_{k}(i-1,j-1)+M_{k}(i,j)$ (5)

where ${\textit{CS}}_{k}(i,0)=0$ and ${\textit{CS}}_{k}(0,j)=0$ . The cell histogram can be obtained from the derived CS matrix and used to aid in the computation of feature of the $k$ -th direction of various scales, as expressed by:

$\displaystyle H_{k}(x,y,w,h)={\textit{CS}}_{k}(x+w-1,y+h-1)-{\textit{CS}}_{k}(x-1,y+h-1)-{\textit{CS}}_{k}(x+w-1,y-1)+{\textit{CS}}_{k}(x-1,y-1)$ (6)

where $w$ and $h$ are the width and height of the cell and $H_{k}(x,y,w,h)$ is the cumulative gradient of the $k$ -th bin. Multiscale computation can be expedited by surveying these CS matrices, and this approach can also enhance the recognition rate.
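The cumulative-sum look-up of Eqs. (4) to (6) can be sketched for a single direction $k$ as follows. The code is a minimal illustration with 1-based indices and a zero border; the function names `cumulative_sum` and `cell_sum` are hypothetical.

```python
def cumulative_sum(M):
    """Build CS(i, j) with the recurrence of Eq. (5).

    M[i-1][j-1] holds M_k(i, j); row and column 0 of the result are the
    zero border CS(i, 0) = CS(0, j) = 0.
    """
    n, m = len(M), len(M[0])
    cs = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cs[i][j] = cs[i][j - 1] + cs[i - 1][j] - cs[i - 1][j - 1] + M[i - 1][j - 1]
    return cs


def cell_sum(cs, x, y, w, h):
    """H_k(x, y, w, h) of Eq. (6): four look-ups, independent of the cell size."""
    return (cs[x + w - 1][y + h - 1] - cs[x - 1][y + h - 1]
            - cs[x + w - 1][y - 1] + cs[x - 1][y - 1])


votes = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
cs = cumulative_sum(votes)
inner = cell_sum(cs, 2, 2, 2, 2)   # sums 5 + 6 + 8 + 9
whole = cell_sum(cs, 1, 1, 3, 3)   # sums every entry
```

Because each cell sum costs four look-ups regardless of $w$ and $h$ , cells of several scales can share one CS matrix per direction.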

Figure 6.

Blocks overlapping.

2.2.5 Partial L2-norm normalization

To improve the adaptation of cells to partial light and shadows, each block is normalized before all features are combined, which renders each cell self-adaptive and reduces the impact of partial light. The equation for partial L2-norm normalization is expressed by:

$\displaystyle v_{i}^{\prime}=\frac{v_{i}}{\sqrt{\sum\limits_{j=1}^{n}v_{j}^{2}}}$ (7)

where $i=1,\ldots,n$ , $n$ is the number of feature vectors in a block, and $v_{i}$ and $v_{i}^{\prime}$ are the original and normalized feature vectors, respectively. Lastly, all feature vectors are linked to create a sample MBHOG image feature.
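A minimal sketch of the block normalization in Eq. (7) is given below. The small constant `eps` is an assumption added to avoid division by zero for empty blocks; it does not appear in Eq. (7), and the function name is hypothetical.

```python
import math


def l2_normalize(block, eps=1e-12):
    """Divide every entry of a block vector by the block's L2 norm (Eq. 7).

    eps is an illustrative safeguard for all-zero blocks, not part of Eq. (7).
    """
    norm = math.sqrt(sum(v * v for v in block))
    return [v / (norm + eps) for v in block]


normalized = l2_normalize([3.0, 4.0])
flat_block = l2_normalize([0.0, 0.0, 0.0])
```

After normalization, a block that is uniformly brighter or darker yields the same vector, which is why the step reduces sensitivity to local lighting.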

2.3 Pedestrian limb feature extraction using PHF

The PHF is an improvement on Haar-like features for the detection of certain pedestrian body parts. Haar-like features are often used for detecting human faces and are constrained to rectangular areas, which makes them unsuitable for detecting pedestrians because legs and arms are typically inclined during walking. In this case, the parallelogram-shaped image features of the PHF are more appropriate for pedestrian detection. Therefore, this study uses the PHF to extract the features of pedestrians’ arms and legs.

2.3.1 Preprocessing

Preprocessing comprises three steps. First, the image is rendered in grayscale. Second, gamma correction is performed to reduce both the impact of light and shadows and noise interferences. Lastly, the histogram equalization is used to limit the differences between image and background intensity to an acceptable range for classifier recognition. The results of preprocessing for pedestrian limb feature extraction are shown in Fig. 7.
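The third preprocessing step can be illustrated with a textbook histogram equalization on an 8-bit grayscale image. This is a sketch of the standard cumulative-distribution mapping, assumed here for illustration; the study does not specify which variant it uses, and the function name is hypothetical.

```python
def equalize_histogram(img, levels=256):
    """Standard histogram equalization of a 2-D list of integer intensities."""
    flat = [v for row in img for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:                # cumulative distribution of intensities
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:                  # constant image: nothing to stretch
        return [row[:] for row in img]
    scale = (levels - 1) / (n - cdf_min)
    return [[int(round((cdf[v] - cdf_min) * scale)) for v in row] for row in img]


low_contrast = [[50, 50], [100, 150]]
equalized = equalize_histogram(low_contrast)
```

The narrow input range from 50 to 150 is stretched to the full range from 0 to 255, which limits intensity differences between images before classification.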

2.3.2 PHF image features and integral image acceleration method

Figure 7.

Preprocessing for pedestrian limb feature extraction: (a) input images; (b) preprocessed images.

A PHF image feature is obtained by subtracting the sum of intensities of the white blocks in a region from the sum of intensities of the black blocks in the region. The reliability of a PHF is assessed by examining the features of a group of PHFs; these values are acquired from images via templates of different sizes. PHF features of various scales and areas are accordingly obtained. The integral image acceleration method is also applied to facilitate the acquisition of PHF feature vectors. Because a PHF is parallelogram-shaped, the integral images are defined over slanted regions. The construction of the first integral image ${\textit{TP}}^{(1)}$ is expressed by Eq. (8), and all TP values are obtained recursively from previously computed TP values, as expressed by Eq. (9):

$\displaystyle{\textit{TP}}^{(1)}(x,y)=\sum\limits_{j=1}^{y}\sum\limits_{i=1}^{x+y-j}{I(i,j)}$ (8)

$\displaystyle{\textit{TP}}^{(1)}(x,y)={\textit{TP}}^{(1)}(x-1,y)+{\textit{TP}}^{(1)}(x+1,y-1)-{\textit{TP}}^{(1)}(x,y-1)+I(x,y)$ (9)

where $I(i,j)$ is the intensity of the image at $(i,j)$ . ${\textit{TP}}^{(2)}$ is the second integral image, and its construction is expressed by the following equations:

$\displaystyle{\textit{TP}}^{(2)}(x,y)=\sum\limits_{j=1}^{y}\sum\limits_{i=x+j-y}^{W}{I(i,j)}$ (10)

$\displaystyle{\textit{TP}}^{(2)}(x,y)={\textit{TP}}^{(2)}(x-1,y-1)+{\textit{TP}}^{(2)}(x+1,y)-{\textit{TP}}^{(2)}(x,y-1)+I(x,y)$ (11)

where $W$ is the width of the image. ${\textit{TP}}^{(3)}$ and ${\textit{TP}}^{(4)}$ are the third and fourth integral images, and their construction is expressed by the following equations:

$\displaystyle{\textit{TP}}^{(3)}(x,y)=\sum\limits_{i=1}^{x}\sum\limits_{j=1}^{y+x-i}{I(i,j)}$ (12)

$\displaystyle{\textit{TP}}^{(3)}(x,y)={\textit{TP}}^{(3)}(x-1,y+1)+{\textit{TP}}^{(3)}(x,y-1)-{\textit{TP}}^{(3)}(x-1,y)+I(x,y)$ (13)

$\displaystyle{\textit{TP}}^{(4)}(x,y)=\sum\limits_{i=x}^{W}\sum\limits_{j=1}^{y+i-x}{I(i,j)}$ (14)

$\displaystyle{\textit{TP}}^{(4)}(x,y)={\textit{TP}}^{(4)}(x,y-1)+{\textit{TP}}^{(4)}(x+1,y+1)-{\textit{TP}}^{(4)}(x+1,y)+I(x,y)$ (15)
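To illustrate how a slanted integral image can be built recursively, the following sketch constructs ${\textit{TP}}^{(1)}$ with the recurrence of Eq. (9) and checks it against the double sum of Eq. (8). It assumes 1-based coordinates with $i$ as the column and $j$ as the row, treats pixels outside the image as zero, and extends each row sideways so that the look-up at $(x+1,y-1)$ is always available; these boundary conventions and all function names are assumptions for the example.

```python
def intensity(img, i, j):
    """I(i, j) with 1-based column i and row j; zero outside the image."""
    h, w = len(img), len(img[0])
    return img[j - 1][i - 1] if 1 <= i <= w and 1 <= j <= h else 0


def tp1_direct(img, x, y):
    """TP1(x, y) evaluated straight from the double sum in Eq. (8)."""
    return sum(intensity(img, i, j)
               for j in range(1, y + 1)
               for i in range(1, x + y - j + 1))


def tp1_table(img):
    """All TP1 values via the recurrence in Eq. (9).

    Entries that were never stored are exactly zero and are read as 0.
    """
    h, w = len(img), len(img[0])
    tp = {}
    for y in range(1, h + 1):
        # Each row is extended left and right so the next row can read TP1(x + 1, y).
        for x in range(2 - y, w + h - y + 1):
            tp[(x, y)] = (tp.get((x - 1, y), 0) + tp.get((x + 1, y - 1), 0)
                          - tp.get((x, y - 1), 0) + intensity(img, x, y))
    return tp


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
table = tp1_table(image)
```

Each entry costs three look-ups and one addition, so the whole table is built in a single pass over the (extended) image.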

2.3.3 Template configuration and features estimation using integral images

This study uses templates of different sizes to extract the PHF features of the left arm, left leg, right arm, and right leg. To improve pedestrian detection, this study uses the absolute value of PHF as the feature vector for detection. Next, TP integral images are used to compute the sum of intensities within a parallelogram region, quickly yielding the regional cumulative sum:

$\displaystyle{\textit{SP}}^{(1)}={\textit{TP}}^{(1)}(x+w-h,y+h-1)-{\textit{TP}}^{(1)}(x-h,y+h-1)-{\textit{TP}}^{(1)}(x+w,y-1)+{\textit{TP}}^{(1)}(x,y-1)$ (16)

$\displaystyle{\textit{SP}}^{(2)}={\textit{TP}}^{(2)}(x+h-1,y+h-1)+{\textit{TP}}^{(2)}(x+w-1,y-1)-{\textit{TP}}^{(2)}(x-1,y-1)-{\textit{TP}}^{(2)}(x+w+h-1,y+h-1)$ (17)

$\displaystyle{\textit{SP}}^{(3)}={\textit{TP}}^{(3)}(x+w-1,y+h-w)+{\textit{TP}}^{(3)}(x-1,y)-{\textit{TP}}^{(3)}(x-1,y+h)-{\textit{TP}}^{(3)}(x+w-1,y-w)$ (18)

$\displaystyle{\textit{SP}}^{(4)}={\textit{TP}}^{(4)}(x,y+h-1)+{\textit{TP}}^{(4)}(x+w,y+w-1)-{\textit{TP}}^{(4)}(x,y-1)-{\textit{TP}}^{(4)}(x+w,y+h+w-1)$ (19)
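As a check on the four-look-up idea, the sketch below evaluates ${\textit{SP}}^{(2)}$ from Eq. (17) and compares it with a pixel-by-pixel sum over the same parallelogram, whose rows each shift one column to the right. For brevity, ${\textit{TP}}^{(2)}$ is evaluated directly from Eq. (10) on every call; in practice it would be precomputed once so that each look-up is constant time. Coordinates are 1-based with zero intensity outside the image, and all function names are hypothetical.

```python
def intensity(img, i, j):
    """I(i, j) with 1-based column i and row j; zero outside the image."""
    h, w = len(img), len(img[0])
    return img[j - 1][i - 1] if 1 <= i <= w and 1 <= j <= h else 0


def tp2(img, x, y):
    """TP2(x, y) from Eq. (10): row j contributes columns x + j - y up to W."""
    w = len(img[0])
    return sum(intensity(img, i, j)
               for j in range(1, y + 1)
               for i in range(max(1, x + j - y), w + 1))


def sp2(img, x, y, w, h):
    """Parallelogram sum from four TP2 look-ups, as in Eq. (17)."""
    return (tp2(img, x + h - 1, y + h - 1) + tp2(img, x + w - 1, y - 1)
            - tp2(img, x - 1, y - 1) - tp2(img, x + w + h - 1, y + h - 1))


def sp2_brute(img, x, y, w, h):
    """Same region summed pixel by pixel: row y + r starts at column x + r."""
    return sum(intensity(img, x + r + c, y + r)
               for r in range(h) for c in range(w))


image = [[row * 8 + col for col in range(8)] for row in range(5)]
fast = sp2(image, 2, 2, 3, 3)
slow = sp2_brute(image, 2, 2, 3, 3)
```

The two values agree for any template position and size, which is what allows templates of different sizes to reuse the same integral image.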

Table 1

Feature dimensions of different body parts

Feature dimensions	Trunk and head	Left arm	Right arm	Left leg	Right leg
Before reduction	11322	36736	36736	36736	36736
After reduction	1057	388	634	401	583

2.4 Feature dimension reduction using PCA

Because of the large dimensions of the MBHOG and PHF features, this study uses PCA to reduce their dimensions and obtain the main features. The extent of reduction varies depending on the training set (Table 1). Nevertheless, 99% of image information is retained. Following feature dimension reduction, the accuracy rate of pedestrian detection increases, as do the speeds of classifier training and recognition.
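One common way to realize such a retention criterion is to keep the smallest number of principal components whose eigenvalues account for the target share of the total variance. The sketch below assumes that "99% of image information" corresponds to 99% of explained variance, which is an interpretation rather than something the study states; the eigenvalues are made-up illustrative numbers and the function name is hypothetical.

```python
def components_for_variance(eigenvalues, retained=0.99):
    """Smallest k whose top-k eigenvalues explain at least `retained` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retained:
            return k
    return len(ordered)


# Hypothetical covariance eigenvalues, for illustration only.
example_eigenvalues = [50.0, 30.0, 15.0, 4.5, 0.5]
k = components_for_variance(example_eigenvalues, retained=0.99)
```

Because the eigenvalue spectrum depends on the training data, the number of retained components differs between body parts, in line with the varying reductions in Table 1.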

Table 2
Proposed two-tier classifiers

No. in first-tier classifier	Part
1	Right leg
2	Left leg
3	Right arm
4	Left arm
5	Main trunk
No. in second-tier classifier	Part combination
C1	1, 2, 3, 4, 5
C2	1, 3, 4, 5
C3	2, 3, 4, 5
C4	1, 2, 4, 5
C5	1, 2, 3, 5
C6	3, 4, 5
C7	2, 4, 5
C8	1, 4, 5
C9	2, 3, 5
C10	1, 3, 5

2.5 Partially occluded body parts recognition through proposed two-tier SVM classifiers and joint component model

This study proposes the two-tier SVM classifier and the joint component model to address the limitations of pedestrian detection technology concerning partially occluded body parts and overlapping pedestrians. The PHF and MBHOG are used on different body parts: the MBHOG is applied to the trunk and the head, whereas different PHF templates are applied to the left arm, right arm, left leg, and right leg. After the features of each body part are extracted and their dimensions are reduced through PCA, the SVM is employed for the first-tier training on each body part. Classifier parameters are thereby obtained for all five body parts. For new testing data, the trained first-tier classifier is used to compute the classification probability of each of the five body parts, and the possible combinations of these five body parts are specified and used as the second-tier SVM classifier training set (Table 2). If the combination of nonoccluded body parts found by the first-tier classifier does not exist among the second-tier classifiers, this suggests that no pedestrian is present in the image.

Recognizing new data entails the following steps. First, the size of the data is adjusted to that of the training set. Second, the features of the five body parts are obtained and their dimensions are reduced through PCA. The trained first-tier SVM classifier is used to generate the classification probabilities of the five body parts and determine whether any of the body parts are occluded. If the probability score exceeds the threshold (Fig. 8), this body part is deemed to be occluded. Lastly, nonoccluded body parts are paired with the trained second-tier classifier to determine the presence of any pedestrians. Any combinations of nonoccluded body parts that are not paired with a second-tier SVM classifier indicate the absence of pedestrians.
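The routing from first-tier outputs to a second-tier classifier can be sketched with the part numbers and combinations of Table 2. The per-part scores, the threshold value of 0.5, and the function name are hypothetical; only the mapping from part combinations to C1 through C10 is taken from Table 2, and the rule that a score above the threshold marks a part as occluded follows the description above.

```python
# Part numbers as in Table 2.
PARTS = {1: "right leg", 2: "left leg", 3: "right arm", 4: "left arm", 5: "main trunk"}

# Second-tier classifiers keyed by the set of nonoccluded parts (Table 2).
SECOND_TIER = {
    frozenset({1, 2, 3, 4, 5}): "C1",
    frozenset({1, 3, 4, 5}): "C2",
    frozenset({2, 3, 4, 5}): "C3",
    frozenset({1, 2, 4, 5}): "C4",
    frozenset({1, 2, 3, 5}): "C5",
    frozenset({3, 4, 5}): "C6",
    frozenset({2, 4, 5}): "C7",
    frozenset({1, 4, 5}): "C8",
    frozenset({2, 3, 5}): "C9",
    frozenset({1, 3, 5}): "C10",
}


def select_second_tier(scores, threshold=0.5):
    """Return the second-tier classifier name, or None if no combination matches.

    scores maps part number to a first-tier score; a score above the
    threshold marks the part as occluded. The threshold is illustrative.
    """
    visible = frozenset(part for part, s in scores.items() if s <= threshold)
    return SECOND_TIER.get(visible)


all_visible = select_second_tier({1: 0.1, 2: 0.2, 3: 0.1, 4: 0.3, 5: 0.05})
left_leg_hidden = select_second_tier({1: 0.1, 2: 0.9, 3: 0.1, 4: 0.3, 5: 0.05})
trunk_hidden = select_second_tier({1: 0.1, 2: 0.2, 3: 0.1, 4: 0.3, 5: 0.95})
```

Every combination in Table 2 contains the main trunk, so an occluded trunk never matches a second-tier classifier and is treated as the absence of a pedestrian.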

Table 3
MBHOG with and without Gabor filtering preprocessing

	Accuracy rate	Miss rate	Detection rate	False positive rate
Without Gabor filtering	91.06%	5.86%	91.1%	8.9%
With Gabor filtering	94.19%	7%	94.5%	5.9%

Figure 8.

Body part obscurity as determined by the first-tier classifier.

2.6 Performance evaluation

The objective of this study is to accurately identify whether there are any pedestrians in an image and where these pedestrians are located. The accuracy rate, false negative rate (i.e. miss rate), true positive rate (i.e. detection rate), false positive rate, and receiver operating characteristic (ROC) curve value are used to evaluate the efficacy of the proposed method. The accuracy, miss, detection, and false positive rates are defined as follows:

$\displaystyle\text{accuracy rate}=\frac{\textit{TP}+\textit{TN}}{\textit{TP}+\textit{FP}+\textit{FN}+\textit{TN}}\times 100\%$ (20)

$\displaystyle\text{false negative rate}=\text{FNR}=\frac{\textit{FN}}{\textit{FN}+\textit{TP}}\times 100\%$ (21)

$\displaystyle\text{true positive rate}=\text{TPR}=\frac{\textit{TP}}{\textit{FN}+\textit{TP}}\times 100\%$ (22)

$\displaystyle\text{false positive rate}=\text{FPR}=\frac{\textit{FP}}{\textit{FP}+\textit{TN}}\times 100\%$ (23)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The accuracy rate is the proportion of correct detection results to all detection results. The miss rate is the proportion of undetected pedestrians to pedestrians actually present. The detection rate is the proportion of correctly detected pedestrians to the pedestrians actually present. The false positive rate is the proportion of cases in which pedestrians are incorrectly detected to cases in which no pedestrian is present. The ROC curve is also used [56], with the $x$ -axis representing the false positive rate and the $y$ -axis the detection rate.
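The four rates can be computed from the confusion counts as in the following sketch. The counts in the example are hypothetical and are not results of this study; the function name is likewise an assumption.

```python
def detection_metrics(tp, tn, fp, fn):
    """Accuracy, miss, detection, and false positive rates in percent (Eqs. 20-23)."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + fp + fn + tn),
        "miss_rate": 100.0 * fn / (fn + tp),
        "detection_rate": 100.0 * tp / (fn + tp),
        "false_positive_rate": 100.0 * fp / (fp + tn),
    }


# Hypothetical counts: 100 pedestrian windows and 100 background windows.
metrics = detection_metrics(tp=90, tn=95, fp=5, fn=10)
```

By construction, the miss rate and the detection rate share the same denominator and sum to 100%.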

3. Results and discussion

3.1 Result evaluation in first-tier classifier

3.1.1 Preprocessing using Gabor filtering on trunk

First, this study uses Gabor filtering to preprocess the trunk and enhance the efficacy of MBHOG features. MBHOG features of the trunk are extracted for all training data. After the dimensions of the features are reduced through PCA, 10-fold cross validation is performed using the SVM classifier. The results show that the mean accuracy and detection rates of MBHOG features preprocessed through Gabor filtering increase by 3.13% and 3.4%, respectively, the false positive rate decreases by 3%, and the miss rate is improved by 1.14% (Table 3).

Table 4
Classification results of different pedestrian limbs

	Accuracy rate	Miss rate	Detection rate	False positive rate
Left arm	86.52%	5.86%	91.1%	8.9%
Right arm	86.06%	11.51%	86.1%	13.9%
Left leg	87.29%	14.08%	87.3%	12.7%
Right leg	88.54%	10.01%	88.5%	11.5%

Figure 9.

Results of proposed pedestrian detection method: (a) Nonoccluded full-body pedestrian images, (b) Partially occluded pedestrian images.

Figure 10.

ROC curves of the proposed method and six state-of-the-art approaches for nonoccluded pedestrian images.

Figure 11.

ROC curves of the proposed method and six state-of-the-art approaches for partially occluded pedestrian images.

3.1.2 Limb classification results

Thereafter, PHF feature extraction of the four limbs is performed on all training data. After the dimensions of the features are reduced through PCA, 10-fold cross validation is conducted using the SVM classifier with the RBF kernel and the parameters of Gamma and Cost being 0.03375 and 312.5, respectively.

As shown in Table 4, the classification results of the four limbs are highly similar, with accuracy rates approximating 87%. The images of the four limbs change considerably more than those of the trunk; thus, the accuracy rate is lower for the limb images. However, these images can still be recognized by the second-tier SVM classifier.

3.2 Results of proposed pedestrian detection method

The proposed pedestrian detection method is tested on different complex images. Figure 9 presents nonoccluded full-body pedestrian images (Fig. 9a) and partially occluded pedestrian images (Fig. 9b), with detected pedestrians enclosed in red rectangles.

3.3 Comparisons with six state-of-the-art approaches

The performance of the proposed method is tested on full-body and partially occluded images of pedestrians in the INRIA database according to the four criteria (the accuracy, miss, detection, and false positive rates). The method is also compared with six state-of-the-art approaches, which are evaluated on the basis of the ROC curve. These six approaches – i.e., Dalal and Triggs approach [8], Wang et al. approach [47], Hoang and Jo approach [48], Zhang et al. approach [57], Ouyang and Wang approach [58], and Luo et al. approach [59] – are selected for comparison because they resolve pedestrian obscurity, ensure high pedestrian detection accuracy, and incorporate the INRIA database.

The proposed method is first evaluated on nonoccluded pedestrian images with respect to the four criteria. Its accuracy rate, miss rate, detection rate, and false positive rate for nonoccluded pedestrian image detection are 97.61%, 2.7%, 97.6%, and 2.4%, respectively. As Fig. 10 shows, at a false positive rate of 0.01, the proposed method achieves a detection rate at least 5.5% higher than that of any of the six compared approaches. The figure also indicates that the detection rate of the proposed method is the highest across all false positive rates.

The proposed method is also evaluated on partially occluded pedestrian images with respect to the four criteria. Its accuracy rate, miss rate, detection rate, and false positive rate for partially occluded pedestrian image detection are 93.81%, 3.3%, 93.8%, and 6.5%, respectively. As Fig. 11 shows, with the false positive rate ranging between 0.025 and 0.05, the proposed method exhibits a detection rate at least 4.5% higher than that of any of the six compared approaches. The figure also suggests that the detection rate of the proposed method is the highest across all false positive rates.

4. Conclusion

This study has proposed a method to process crowds and partially occluded pedestrians during pedestrian detection. This method is developed in two ways. First, two image features are proposed to process partial obscurity according to the body parts of pedestrians involved in images: the MBHOG features preprocessed by Gabor filtering for depicting the trunk, and PHF for depicting the four limbs. Computation of both features is accelerated through the integral image acceleration method. Second, a two-tier SVM classifier is designed to process partial pedestrian obscurity. The first-tier SVM classifier determines if any body part is occluded. The classification probabilities of unoccluded body parts are used as the inputs of the second-tier SVM classifier to determine if any pedestrians are present in the detection window. Compared with six state-of-the-art detection approaches, the proposed method is more accurate, performing better in the ROC curve and four other criteria. In addition, the method processes images in which a crowd is present or pedestrians are partially occluded and enables pedestrian detection in images of different scenes.

Acknowledgments

The author would like to express his sincere appreciation for partial grant support from MOST105-2410-H-194-059-MY3, Ministry of Science and Technology, Taiwan. In addition, he thanks the student, Ji-Di Su, who helped to handle parts of the materials.

References

1. Crociani, Lämmel. Finding flows equilibrium in pedestrian environments with a cellular automaton. Computer-Aided Civil and Infrastructure Engineering 2016; 31: 432-448.

2. Chen, Henrickson, Wang. Kinect-based pedestrian detection for crowded scenes. Computer-Aided Civil and Infrastructure Engineering 2016; 31: 229-240.

3. Olmeda, Premebida, Nunes, Armingol, de la Escalera. Pedestrian detection in far infrared images. Integrated Computer-Aided Engineering 2013; 20(4): 347-360.

4. Goodman. Integrating a statistical background-foreground extraction algorithm and SVM classifier for pedestrian detection and tracking. Integrated Computer-Aided Engineering 2013; 20(3): 201-216.

5. Lacabex, Cuesta-Infante, Montemayor, Pantrigo. Lightweight tracking-by-detection system for multiple pedestrian targets. Integrated Computer-Aided Engineering 2016; 23(3): 299-311.

6. París, Brazalez. A new autonomous agent approach for the simulation of pedestrians in urban environments. Integrated Computer-Aided Engineering 2009; 16(4): 283-297.

7. Ciarelli, Salles, Oliveira. Human automatic detection and tracking for outdoor video. Integrated Computer-Aided Engineering 2011; 18(4): 379-390.

8. Dalal, Triggs. Histograms of oriented gradients for human detection. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2005; 1: 886-893.

9. Geronimo, Lopez, Sappa, Graf. Survey of pedestrian detection for advanced driver assistance systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 2010; 32(7): 1239-1258.

10. Hsu. EEG-based motor imagery classification using neuro-fuzzy prediction and wavelet fractal features. Journal of Neuroscience Methods 2010; 189(2): 295-302.

11. Hsu. Continuous EEG signal analysis for asynchronous BCI application. International Journal of Neural Systems 2011; 21(4): 335-350.

12. Baumgartner, Flesia, Gimenez, Pucheta. A new image segmentation framework based on two-dimensional hidden Markov models. Integrated Computer-Aided Engineering 2016; 23(1): 1-13.

13. Delibasis, Georgakopoulos, Kottari, Plagianakos, Maglogiannis. Geodesically-corrected Zernike descriptors for pose recognition in omni-directional images. Integrated Computer-Aided Engineering 2016; 23(2): 185-199.

14. Hsu. Application of competitive Hopfield neural network to brain-computer interface systems. International Journal of Neural Systems 2012; 22(1): 51-62.

15. Hsu. Single-trial motor imagery classification using asymmetry ratio, phase relation, wavelet-based fractal, and their selected combination. International Journal of Neural Systems 2013; 23(2): 1350007.

16. Sánchez, Moreno, Vélez. Analyzing the influence of contrast in large-scale recognition of natural images. Integrated Computer-Aided Engineering 2016; 23(3): 221-235.

17. Koziarski, Cyganek. Image recognition with deep neural networks in presence of noise – dealing with and taking advantage of distortions. Integrated Computer-Aided Engineering 2017; 24(4): 337-350.

18. Hsu. Application of quantum-behaved particle swarm optimization to motor imagery EEG classification. International Journal of Neural Systems 2013; 23(6): 1350026.

19. Hsu. A novel image registration algorithm for indoor and built environment applications. Computer-Aided Civil and Infrastructure Engineering 2015; 30(10): 802-814.

20. Zhu, Yang, Zhang. Panoramic image stitching for arbitrary shaped tunnel lining inspection. Computer-Aided Civil and Infrastructure Engineering 2016; 31(12): 936-953.

21. Almeida, Biscaia, Melicio, Chastre, Fonseca. In-plane displacement and strain image analysis. Computer-Aided Civil and Infrastructure Engineering 2016; 31(4): 292-304.

22. Hsu. Assembling a multi-feature EEG classifier for left-right motor data using wavelet-based fuzzy approximate entropy for improved accuracy. International Journal of Neural Systems 2015; 25(8): 1550037.

23. Hsu. Automatic atrium contour tracking in ultrasound imaging. Integrated Computer-Aided Engineering 2016; 23(4): 401-411.

24. Hsu. A hybrid approach for brain image registration with local constraints. Integrated Computer-Aided Engineering 2017; 24(1): 73-85.

25. Sanchez, Ferreiroa, Arias, Martinez. Image sharpness and contrast tuning in the early visual pathway. International Journal of Neural Systems 2017; 27(8): 1750045.

26. Adeli, Hung. Machine learning – neural networks, genetic algorithms, and fuzzy systems. John Wiley and Sons, New York, 1995.

27. Hsu. Segmentation-based compression: New frontiers of telemedicine in telecommunication. Telematics and Informatics 2015; 32(3): 475-485.

28. Hsu, Liu, Chiu. Application of multiscale amplitude modulation features and FCM clustering to brain-computer interface. Clinical EEG and Neuroscience 2012; 43(1): 32-38.

29. Andriluka, Schnitzspan, Meyer, Kohlbrecher, Petersen, Stryk, Schiele. Vision based victim detection from unmanned aerial vehicles. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2010; 1740-1747.

30. Zhang, Yan, Zeng. Fatigue detection with 3D facial features based on binocular stereo vision. Integrated Computer-Aided Engineering 2014; 21(4): 387-397.

31. Delibasis, Georgakopoulos, Kottari, Plagianakos, Maglogiannis. Geodesically-corrected Zernike descriptors for pose recognition in omni-directional images. Integrated Computer-Aided Engineering 2016; 23(2): 185-199.

32. Curio, Edelbrunner, Kalinke, Tzomakas, Seelen. Walking pedestrian recognition. IEEE Transactions on Intelligent Transportation Systems 2000; 1(3): 155-163.

33. Hsu. Brain-computer interface: The next frontier of telemedicine in human-computer interaction. Telematics and Informatics 2015; 32(1): 180-192.

34. Hsu. Registration accuracy and quality of real-life images. PLoS ONE 2012; 7(7): e40558.

35. Hsu. Clustering-based compression connected to cloud databases in telemedicine and long-term care applications. Telematics and Informatics 2017; 34(1): 299-310.

36. Eisenloffel, Adeli. Imaging techniques for cable network structures. International Journal of Imaging Systems and Technology 1990; 2(3): 157-168.

37. Adeli, Hung. A fuzzy neural network learning model for image recognition. Integrated Computer-Aided Engineering 1993; 1(1): 43-55.

38. Hsu, Chou. Medical image enhancement using modified color histogram equalization. Journal of Medical and Biological Engineering 2015; 35(5): 580-584.

39. Hsu. Analytic differential approach for robust registration of rat brain histological images. Microscopy Research and Technique 2011; 74(6): 523-530.

40. Adeli, Hung. A concurrent adaptive conjugate gradient learning algorithm on MIMD machines. Journal of Supercomputer Applications, MIT Press 1993; 7(2): 155-166.

41. Hung, Adeli. Parallel backpropagation learning algorithms on Cray Y-MP8/864 supercomputer. Neurocomputing 1993; 5(6): 287-302.

42. Adeli, Hung. An adaptive conjugate gradient learning algorithm for effective training of multilayer neural networks. Applied Mathematics and Computation 1994; 62(1): 81-102.

43. Dollár, Wojek, Schiele, Perona. Pedestrian detection: An evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence 2012; 34(4): 743-761.

44. Rehg. CENTRIST: A visual descriptor for scene categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2011; 33(8): 1489-1501.

45. Viola, Jones. Robust real-time face detection. International Journal of Computer Vision 2004; 57(2): 137-154.

46. Gao, Lao. Adaptive contour features in oriented granular space for human detection and segmentation. IEEE Conference on Computer Vision and Pattern Recognition 2009; 1786-1793.

47. Wang, Han, Yan. An HOG-LBP human detector with partial occlusion handling. IEEE 12th International Conference on Computer Vision 2009; 32-39.

48. Hoang. Joint components based pedestrian detection in crowded scenes using extended feature descriptors. Neurocomputing 2016; 188: 139-150.

49. Dai. A wavelet support vector machine-based neural network meta model for structural reliability assessment. Computer-Aided Civil and Infrastructure Engineering 2017; 32: 344-357.

50. Direito, Teixeira, Sales, Castelo-Branco, Dourado. A realistic seizure prediction study based on multiclass SVM. International Journal of Neural Systems 2017; 27: 1750006.

51. Khedher, Illan, Gorriz, Ramirez, Brahim, Meyer-Baese. Independent component analysis support vector machine-based computer-aided diagnosis system for Alzheimer’s with visual support. International Journal of Neural Systems 2017; 27: 1650050.

52. Dalal. Finding People in Images and Videos. PhD thesis: INRIA Rhône-Alpes, Grenoble, France, 2006.

53. Everingham, Van Gool, Williams CKI, Winn, Zisserman. The PASCAL Visual Object Classes Challenge 2008 (VOC2008) Results.

54. Conde, Moctezuma, De Diego, Cabello. HoGG: Gabor and HoG-based human detection for surveillance in non-controlled environments. Neurocomputing 2013; 100: 19-30.

55. Schroff, Kalenichenko, Philbin. FaceNet: A unified embedding for face recognition and clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015; 815-823.

56. Dollár, Wojek, Schiele, Perona. Pedestrian detection: A benchmark. IEEE Conference on Computer Vision and Pattern Recognition 2009; 304-311.

57. Zhang, Bauckhage, Cremers. Informed Haar-like features improve pedestrian detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2014; 947-954.

58. Ouyang, Wang. Joint deep learning for pedestrian detection. IEEE International Conference on Computer Vision (ICCV) 2013; 2056-2063.

59. Luo, Tian, Wang, Tang. Switchable deep network for pedestrian detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2014; 899-906.

