Abstract
Binarized normed gradients (BING) can serve as a preprocessing step for generic object proposal generation and has attracted considerable attention because of its speed and good generalization performance. Although several modified schemes have recently been presented to improve proposal localization quality, how to further enhance its performance remains an open problem. In this paper, an adaptive weighted binary normed gradients plus (AWBING Plus) algorithm is proposed. Built on BING, it replaces the support vector machine (SVM) with an adaptive weighted extreme learning machine (Adaptive WELM), which reduces the number of proposals while maintaining comparable performance, and it uses multi-thresholding straddling expansion (MTSE) as a post-processing stage to enhance localization quality. We explain how WELM is applied to BING and analyze the effect of the improved WELM algorithm, named Adaptive WELM. Experimental results on PASCAL VOC2007, Microsoft COCO2014 and ILSVRC2013 show that the proposed approach achieves superior performance compared with other advanced methods for generic object proposal generation, and that it also runs faster.
Introduction
Object detection, a fundamental research topic in computer vision, has recently received great attention and made significant progress in many applications, such as image processing [11, 18], feature extraction [12, 20] and object recognition [19, 24]. State-of-the-art object detectors rely on category-specific classifiers that score sliding windows over feature image pyramids at test time. Humans, however, typically perceive objects before identifying them, owing to how the human visual and cognitive systems work [6, 52]. Likewise, it is reasonable for a detector to locate objects before identifying their category. Generic object proposal generation, an important preprocessing step for detection, has therefore motivated considerable research [1, 39]. These methods estimate how likely an image window is to contain an object, regardless of its class, and output bounding boxes that overlap the ground-truth boxes as closely as possible. In recent years, it has been demonstrated that generic object proposal generation, as a preprocessing step, can effectively reduce the memory and computation costs of object detection [9].
Among current algorithms for generic object detection [1, 51], binarized normed gradients (BING) [10] stands out for its performance and efficiency. Its underlying observation is that objects, which are stand-alone things with well-defined closed boundaries, share strong similarity in the norm of the gradient (NG) feature, even when the object windows are resized to 8×8 pixels. BING exploits this observation and feeds the 64D NG features into two-stage cascaded support vector machines (SVMs) to estimate all possible objects in an image. To speed up the algorithm, both the NG features and the model W of Stage I are binarized, so that their inner product requires only a few bitwise operations. Consequently, BING is widely considered less complex and more computationally efficient than competing methods, while remaining effective at capturing generic objects.
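The NG feature and its bit-level approximation can be sketched as follows. This is a minimal illustration rather than BING's implementation: it assumes the window has already been resized to 8×8 (BING itself computes gradients on resized copies of the whole image), and the function names `normed_gradient` and `top_bit_planes` are hypothetical. The magnitude min(|gx| + |gy|, 255) follows the NG definition used by BING; keeping only the few most significant bit planes is what makes the bitwise inner products possible.

```python
import numpy as np


def normed_gradient(window: np.ndarray) -> np.ndarray:
    """64-D normed-gradient (NG) feature of an 8x8 grayscale window.

    Uses a centered [-1, 0, 1] difference in x and y and the magnitude
    min(|gx| + |gy|, 255); border pixels get a zero gradient in this sketch.
    """
    w = window.astype(np.int32)
    gx = np.zeros_like(w)
    gy = np.zeros_like(w)
    gx[:, 1:-1] = w[:, 2:] - w[:, :-2]   # horizontal difference
    gy[1:-1, :] = w[2:, :] - w[:-2, :]   # vertical difference
    ng = np.minimum(np.abs(gx) + np.abs(gy), 255)
    return ng.astype(np.uint8).ravel()


def top_bit_planes(ng: np.ndarray, n_bits: int = 4) -> list:
    """Split 8-bit NG values into their n_bits most significant bit planes.

    Plane k (k = 0 is the MSB) holds bit (7 - k), so
    sum(2**(7 - k) * plane_k) reconstructs ng exactly when n_bits == 8.
    """
    return [((ng >> (7 - k)) & 1).astype(np.uint8) for k in range(n_bits)]


window = np.zeros((8, 8), dtype=np.uint8)
window[:, 4:] = 200                      # a vertical step edge
ng = normed_gradient(window)
print(ng.shape, ng.max())                # (64,) 200
```

The default of four bit planes is only an illustrative choice; fewer planes mean fewer bitwise operations at the cost of a coarser approximation of the NG values.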
Several researchers have since studied and extended the BING approach. For instance, Zhao et al. [60] demonstrated the validity of BING from the perspective of combinatorial geometry. They also predicted potential target areas by estimating the "hot" positions and "hot" scales of objects in datasets. Liang et al. [34] used BING as a base method and then trained an adaptive object proposal generator for tracking. To the best of our knowledge, however, existing research on BING does not analyze why it performs as it does. Additionally, outliers may reduce the accuracy of model fitting and even introduce false information, an issue that previous work has not considered.
In this paper, a novel approach, adaptive weighted binary normed gradients plus (AWBING Plus), is proposed to overcome the current deficiencies of BING. The main contributions of this paper are as follows. First, we analyze the performance of Stage I and explain in detail why the model W fails. Second, we replace the SVM in BING with the weighted extreme learning machine (WELM) [62] and use binary approximations of the WELM input weights. Third, we propose an adaptive WELM algorithm to alleviate the impact of outliers on generic object estimation. Finally, we demonstrate the performance of the proposed method on generic object proposal generation.
The remainder of this paper is organized as follows. Section 2 reviews previous studies on object proposal generation. Section 3 analyzes BING in detail. The proposed algorithm is explained in Section 4. The experiments and results are presented in Section 5. Finally, conclusions and future work are given in Section 6.
Background
In object detection, extracting discriminative features of a specific category with different cues (e.g., color, texture, and structure) is of great importance. Uijlings et al. proposed selective search [50], which supports object recognition by measuring the similarity of local color regions. Gai et al. proposed improved quaternion wavelet transform methods [18, 19] to extract texture features. High detection rates have been obtained with complex classifiers [11, 57]. For a more accurate description of category information, some algorithms also include feature information such as parts and locations [16, 17]. Although these approaches significantly improve the detection rate for specific categories, they reduce efficiency because of their complex classifiers or the large amount of information carried by each sample. In addition, most object detection algorithms still use the sliding-window strategy. Because windows of different aspect ratios and scales are applied over the image pyramid, the number of windows can reach $10^6$ to $10^7$. Therefore, effectively reducing the number of candidate windows is an important problem that deserves careful study.
Alexe et al. [1, 2] first proposed a measure of generic objectness, which represents a window by a combination of multiple visual cues and then determines whether the window contains a foreground object within a Bayesian framework. Their work reduces the number of windows passed to subsequent processing, and the superpixel-based image segmentation proposed in their article is still widely used with different grouping criteria. For instance, Uijlings et al. [50] proposed a data-driven selective search method that groups superpixels by various cues through hierarchical grouping, combined with the strengths of exhaustive search. To improve segmentation, Yanulevskaya et al. [56] merged superpixels at each stage using a trained random forest. Sang et al. [40] proposed a modified selective search method that generates object proposals for three-dimensional (3D) indoor images through cost-function-based segment grouping and depth segmentation. Like selective search, the randomized Prim's algorithm [37] computes the similarity of adjacent sub-regions; the difference is that its seeds are selected randomly. The introduction of grab-cut enabled foreground-background segmentation and avoided iterating multi-feature similarity computations over adjacent sub-regions, and many seed-based grab-cut extensions have since emerged. Carreira et al. [7] attempted to predict the seeds with a Gaussian mixture model and grab-cut. Endres et al. [13] proposed a hierarchical merging algorithm based on the graph-cut technique. Krähenbühl et al. [31] placed seeds using a ranking SVM, applied rapid geodesic distance transforms, and generated proposals through a dynamic programming algorithm. Other segment-based methods also generate proposals: Humayun et al. [28] generated a segment pool from a superpixel graph, and Arbelaez et al. [3] proposed a multiscale hierarchical segmentation algorithm without parameter learning. Although these segmentation-based methods achieve good localization quality, how to develop them further remains an open problem.
Building on previous studies [1, 2], Rahtu et al. [42] proposed ranking candidate windows with scores learned by structured learning within a fast cascade framework. This method achieved a better detection rate at a lower computational cost. Since then, research on generic object proposal generation has mostly been based on multi-stage cascade models. Building on Rahtu et al.'s algorithm, Blaschko et al. [5] improved the performance of the generic object proposal generator by optimizing non-maximum suppression. The regionlet-based method [51] handles deformation through the spatial relationships among groups of sub-regions, learned by a cascaded boosting classifier. Cheng et al. [10] proposed the BING algorithm to efficiently produce object boxes in sliding windows; its computational efficiency surpassed several state-of-the-art methods. Zhang et al. [58, 59] extended BING with cascaded ranking SVMs that generate a set of proposals from gradient features in sliding windows. To improve localization quality, they used regression to readjust candidate windows and obtained more favorable performance with GPU acceleration and the multi-thresholding straddling expansion (MTSE) [54] algorithm. Although numerous approaches have been presented to enhance the performance of BING, none of them has analyzed BING theoretically.
The extreme learning machine (ELM) [25, 26] is widely used in classification, regression, and feature learning because of its good generalization performance, its efficiency, and its minimal need for manual intervention. Its weighted variant, WELM [62], is especially appropriate for learning from imbalanced data, and generic object proposal generation is an imbalanced learning problem. From an algorithmic point of view, the WELM output, the inner product between the hidden-layer output and the output weights, is similar to the linear decision function of an SVM. Moreover, the input weight vectors of the WELM hidden nodes can be binarized as well. Together, these properties make it possible to replace the SVM in BING with WELM.
With the popularization of deep learning [21, 48], more and more new models have been applied to object recognition [8, 33]. However, the study of traditional generic object proposal generation remains of great significance, including for convolutional networks [35, 53].
BING analysis
As mentioned earlier, the efficiency of BING surpasses that of existing advanced generic object proposal algorithms. BING trains two-stage cascaded linear filters using the linear SVM implementation in LIBLINEAR [15]. In the first stage, a single filter $W \in \mathbb{R}^{64}$ is learned and used to score the binarized NG features of the training samples. In the second stage, a group of size-specific filters is learned from the Stage I filter scores and the corresponding sample labels. Binarizing the filter W reduces the time needed to compute the inner product of W and the NG features in Stage I.
To gain deeper insight into BING, we analyzed the detection rate versus the number of windows (DR-#WIN) [2] of Stage I on the VOC2007 test set. Before doing so, we examined the distribution of the samples. In an imbalanced binary classification problem such as this one, negative samples normally form the majority and positive samples are few. However, of the $1\times10^5$ training samples used in BING, around $7\times10^4$ are positive. In other words, positive samples make up the majority of the training set, which does not match the imbalanced distribution expected for this task. We therefore evaluated BING with multiple metrics. The filter $W \in \mathbb{R}^{64}$ of BING was learned with the L2-loss linear SVR provided by LIBLINEAR. As shown in Fig. 1, the DR-#WIN of BING in Stage I is 82.54%. Using the same training samples, we trained an L2-loss linear SVM with LIBLINEAR to classify the test samples, so that the performance of BING could be evaluated with other metrics. The test samples, extracted from the VOC2007 test set, contained $7.3673\times10^4$ positive and $2.6327\times10^4$ negative samples. As illustrated in Fig. 2, the accuracy and sensitivity of BING in Stage I were 73.67% and 100%, respectively, whereas its specificity was only 0.0001%. These results show that essentially all test samples were classified as positive: since 73.67% of the test samples are positive, labeling every sample positive yields exactly this accuracy. The experiments therefore suggest that a single model $W \in \mathbb{R}^{64}$ may lack the capacity to filter the test samples.
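The link between the reported metrics and the test-set composition can be checked directly. The short sketch below (the helper name `binary_metrics` is hypothetical) shows that a classifier labeling every window positive reproduces the 73.67% accuracy and 100% sensitivity reported above, with zero specificity, which is why these three numbers together indicate a degenerate Stage I filter.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Test-set composition reported above: 73,673 positives and 26,327 negatives.
# A classifier that labels every window as positive:
m = binary_metrics(tp=73_673, fn=0, tn=0, fp=26_327)
print(m)  # {'accuracy': 0.73673, 'sensitivity': 1.0, 'specificity': 0.0}
```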

Stage I DR-#WIN on PASCAL VOC2007: comparison of BING, WELM and AWELM. (a) AWBING Plus, (b) BING, (c) Selective Search, (d) EdgeBoxes, (f) Endres.

The performance in Stage I on Pascal VOC2007.
This surprising finding suggests a way to address the weakness of BING. Among machine learning methods, WELM is of particular interest to us. From an optimization point of view, the essence of WELM is that, once the hidden-layer output matrix has been computed from the training data and the input weights, the output weights are determined by the KKT conditions. This optimization process is similar to that of SVM. Furthermore, WELM [62] has been shown to outperform SVM on imbalanced classification [23]. ELM has attracted great attention because of its regression and classification performance at extremely fast learning speeds, and WELM extends ELM with superior generalization performance on imbalanced data while maintaining its high efficiency.
In this section we describe our method for finding generic object proposals. During training, we first replace the SVM in Stage I of BING with WELM (Subsection 4.1), which increases the number of filters and improves generalization performance (Subsection 4.2.1). Next, we compute the weight matrix V of WELM with the adaptive WELM algorithm so that the separating boundary is less affected by outliers (Subsection 4.2.2). Finally, we use multi-thresholding straddling expansion (MTSE) as a post-processing stage at test time to refine and reduce the proposals, further improving localization quality (Subsection 4.2.3).
WELM algorithm
The ELM algorithm is based on single-hidden-layer feed-forward neural networks (SLFNs), in which the input weight vectors and biases of the hidden nodes do not need to be tuned by gradient descent.
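Given $N$ training samples $(\mathbf{x}_i, \mathbf{t}_i)$, where $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ and $\mathbf{t}_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$, the relationship between the inputs and outputs of an SLFN is
$$H\eta = T, \tag{1}$$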
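where $\eta$, $T$, and $H$ are the output weight matrix, the binary label matrix, and the hidden-layer output matrix, respectively. For an SLFN with $L$ hidden nodes and activation function $g(\cdot)$, $H$ can be represented as
$$H = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + d_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + d_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + d_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + d_L) \end{bmatrix}_{N \times L},$$
where $\mathbf{w}_i$ and $d_i$ are the input weight vector and the bias of the $i$-th hidden node.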
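After $\mathbf{w}_i$ and $d_i$ have been randomly assigned, Equation (1) can be solved by the least-squares method [4], giving $\eta = H^{\dagger}T$ with $H^{\dagger}$ the Moore–Penrose generalized inverse of $H$, or by optimization theory [47, 62] as
$$\min_{\eta}\ \frac{1}{2}\lVert\eta\rVert^2 + \frac{C}{2}\sum_{i=1}^{N}\lVert\boldsymbol{\xi}_i\rVert^2 \quad \text{s.t.}\quad h(\mathbf{x}_i)\,\eta = \mathbf{t}_i^T - \boldsymbol{\xi}_i^T,\ i = 1, \ldots, N,$$
where $h(\mathbf{x}_i)$ is the $i$-th row of $H$, $\boldsymbol{\xi}_i$ is the training error of sample $\mathbf{x}_i$, and $C$ is a trade-off constant.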
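WELM assigns a weight $v_i$ to each training sample $\mathbf{x}_i$. With $V = \mathrm{diag}(v_1, \ldots, v_N)$, in WELM Equation (2) is represented as
$$\min_{\eta}\ \frac{1}{2}\lVert\eta\rVert^2 + \frac{C}{2}\sum_{i=1}^{N} v_i\lVert\boldsymbol{\xi}_i\rVert^2 \quad \text{s.t.}\quad h(\mathbf{x}_i)\,\eta = \mathbf{t}_i^T - \boldsymbol{\xi}_i^T,\ i = 1, \ldots, N.$$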
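According to the optimization method, Equation (4) yields
$$\eta = H^T\left(\frac{I}{C} + VHH^T\right)^{-1} VT,$$
or, equivalently and more cheaply when $N$ is large, $\eta = \left(\frac{I}{C} + H^T V H\right)^{-1} H^T V T$.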
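The WELM training step can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses sigmoid hidden nodes whose input weights and biases are drawn uniformly from [-1, 1], the class-balancing weights $v_i = 1/(\text{size of the class of } \mathbf{x}_i)$ proposed by Zong et al. [62], and the closed form $\eta = (I/C + H^T V H)^{-1} H^T V T$. The names `welm_train` and `welm_predict`, the number of hidden nodes, and the default $C = 2^{10}$ (chosen to mirror the C1 value reported in the parameter settings, assuming C1 plays this role) are illustrative.

```python
import numpy as np


def _hidden(X, W, d):
    """Sigmoid hidden-layer output matrix H (N x L)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + d)))


def welm_train(X, t, n_hidden=30, C=2.0 ** 10, seed=0):
    """Weighted ELM for labels t in {-1, +1} (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights w_i
    d = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases d_i
    H = _hidden(X, W, d)
    # Class-balancing weights: v_i = 1 / (number of samples in x_i's class).
    classes, counts = np.unique(t, return_counts=True)
    size_of = dict(zip(classes, counts))
    v = np.array([1.0 / size_of[label] for label in t])
    HtV = H.T * v                                # H^T V (scales column i by v_i)
    A = np.eye(n_hidden) / C + HtV @ H           # I/C + H^T V H
    eta = np.linalg.solve(A, HtV @ t)            # output weights
    return W, d, eta


def welm_predict(X, W, d, eta):
    return np.where(_hidden(X, W, d) @ eta >= 0.0, 1, -1)


# Imbalanced toy data: 20 positives, 200 negatives.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, size=(20, 2)),
               rng.normal(-2.0, 0.5, size=(200, 2))])
t = np.concatenate([np.ones(20), -np.ones(200)])
W, d, eta = welm_train(X, t)
pred = welm_predict(X, W, d, eta)
print((pred[:20] == 1).mean(), (pred[20:] == -1).mean())
```

Solving the $L \times L$ system with `np.linalg.solve` avoids forming an explicit inverse; each hidden node's input weight vector $\mathbf{w}_i$ is one of the "filters" that the next subsection binarizes.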
Binarized filters for WELM
As mentioned earlier, WELM is solved in a manner similar to SVM, and, from an algorithmic point of view, WELM computes filter scores in much the same way as the SVM in BING. Notably, when WELM replaces the SVM in Stage I of BING, it outperforms the SVM. As illustrated in Figs. 1 and 2, the DR-#WIN and accuracy of WELM in Stage I are on par with those of SVM, while the specificity of WELM rises to 64.36% with a sensitivity of 85.42%. This can be explained by the fact that the model W in Stage I of BING is a single filter with limited ability to evaluate all objects. In WELM, by contrast, the multiple input weight vectors of the hidden layer can serve as filters, and selecting a proper number of hidden neurons can effectively improve the generalization ability of the network and reduce its error rate [46]. Therefore, WELM provides improved generalization performance for imbalanced learning.
The crux of the method is how to binarize the operands of the inner product. Computing the inner product of an input weight w and a sample x is the main operation when WELM is applied to BING. For good performance, the sample x needs to be normalized [27]. Then, w and x can be replaced by binary approximations, as proposed in [22]:
After this, the binary inner product of w and x is written as
As a result, the solution of Equation (11) is
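The binary approximation and the resulting bitwise inner product can be sketched as follows. This assumes that the greedy scheme used by BING (take the sign of the residual as a basis vector, fit its coefficient by least squares, subtract, repeat) is the one meant by [22], and it approximates an 8-bit NG-style feature by its bit planes rather than the normalized sample x described above; all function names are hypothetical. For $a \in \{-1,+1\}^D$ and $b \in \{0,1\}^D$, the identity $\langle a, b\rangle = 2\,\mathrm{popcount}(a^+ \wedge b) - \mathrm{popcount}(b)$, where $a^+$ marks the $+1$ entries, reduces each term to one AND and two popcounts.

```python
import numpy as np


def binary_approx(w, n_bases):
    """Greedy approximation w ~ sum_j beta_j * a_j with a_j in {-1, +1}^D."""
    residual = np.asarray(w, dtype=float).copy()
    betas, bases = [], []
    for _ in range(n_bases):
        a = np.where(residual >= 0.0, 1.0, -1.0)
        beta = float(a @ residual) / a.size     # <a, r> / ||a||^2
        residual -= beta * a
        betas.append(beta)
        bases.append(a)
    return betas, bases


def to_mask(bits):
    """Pack a 0/1 (or boolean) vector into a Python int bit mask."""
    mask = 0
    for i, bit in enumerate(bits):
        if bit:
            mask |= 1 << i
    return mask


def popcount(x):
    return bin(x).count("1")


def binary_dot(a_plus, b):
    """<a, b> for a in {-1,+1}^D (a_plus = mask of +1 entries) and b in {0,1}^D."""
    return 2 * popcount(a_plus & b) - popcount(b)


def approx_score(betas, bases, ng, n_bits=8):
    """Approximate <w, ng> using only AND and popcount on bit masks."""
    ng = np.asarray(ng, dtype=np.uint8)
    planes = [to_mask((ng >> (7 - k)) & 1) for k in range(n_bits)]
    score = 0.0
    for beta, a in zip(betas, bases):
        a_plus = to_mask(a > 0)
        for k, b in enumerate(planes):
            score += beta * (2 ** (7 - k)) * binary_dot(a_plus, b)
    return score


rng = np.random.default_rng(42)
w = rng.normal(size=64)                # stand-in for a learned 64-D filter
ng = rng.integers(0, 256, size=64)     # stand-in for an 8-bit NG feature
betas, bases = binary_approx(w, n_bases=2)
print(approx_score(betas, bases, ng, n_bits=4), float(w @ ng))
```

With all eight bit planes the bitwise score equals $\langle \hat{w}, g\rangle$ exactly, where $\hat{w} = \sum_j \beta_j a_j$; dropping low-order planes, as BING does, trades accuracy for speed.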
Zong et al. [62] defined a weight $v_i$ for each training sample $\mathbf{x}_i$. If $\mathbf{x}_i$ belongs to the minority class, its weight $v_i$ is larger than the weights of samples from the majority class, so that the boundary is pushed toward the majority class rather than the minority class. In practice, however, it is unreasonable to assign the same weight to all samples of a class. For example, the outliers in the positive sample set shown in Fig. 3(a) are not representative of their class, yet because they receive large weights, the separating boundary can be pulled so far that many negative samples fall on the minority side. Mathematically, this classification result can be explained by referring to Equation (5): if more emphasis is placed on minimizing the training error, WELM must relax its requirement on the distance between classes, and as a consequence the overall distribution of each class is overlooked. To give outliers less weight, this paper proposes an adaptive WELM algorithm that adjusts the weight of every training sample by combining minimum reconstruction error with information theory [38, 55], so that imbalanced data can be separated more reasonably. From a mathematical perspective, our method still searches for the separating boundary that maximizes the distance between classes while minimizing the mean squared error. Information entropy is introduced in the adaptive WELM to measure how concentrated the samples of each class are; outliers are identified through this entropy, and their weights are reduced. Fig. 3(b) shows the resulting boundary, which moves toward the middle between the categories instead of toward the minority-class outliers. Therefore, our adaptive WELM algorithm reduces the ability of outliers to distort the overall distribution of the data.

Scatter diagram with the separating boundary. (a) WELM, (b) AWELM.
Assume that the hidden output vectors $H_i$ ($i = 1, \ldots, N$) are zero-mean. Define the matrix $B = H^T V^2 H$, and let $\lambda_1 > \cdots > \lambda_k$ ($k \le N$) be the $k$ largest eigenvalues of $B$, with corresponding eigenvectors $U_1, \ldots, U_k$. Our aim is to solve for the diagonal matrix $V$ by minimizing both the reconstruction error between $VH$ and its reconstruction $VHU^T U$ and the entropy of $V$, which leads to the following formulation
Minimize:
To calculate
Consequently, $U$ changes whenever $v_i$ is updated, and $v_i$ is in turn optimized for the updated $U$. Iterating these two steps yields the optimized $v_i$. The proposed scheme is outlined in Algorithm 1.
It is known that the higher the value of $k$ ($k \le N$), the lower the reconstruction error $\lVert VH - VHU^T U \rVert$. When $k$ equals $N$, however, the reconstruction error is zero regardless of how $V$ is adjusted. Therefore, we set $k = N - 1$.
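The reconstruction term of the adaptive WELM objective can be evaluated for a given V as in the sketch below. It covers only the quantities defined above ($B = H^T V^2 H$, its top-$k$ eigenvectors $U$, and $\lVert VH - VHU^T U\rVert_F^2$); the entropy term, the update rule for $v_i$ and Algorithm 1 itself are not reproduced, because their exact form is not given in this text. The function name `reconstruction_error` is hypothetical, and $H$ is taken to be $N \times L$ with zero-mean columns. The example checks numerically that the error does not increase as $k$ grows.

```python
import numpy as np


def reconstruction_error(H, v, k):
    """||VH - VH U^T U||_F^2 with V = diag(v) and U the top-k eigenvectors of B.

    H: (N, L) hidden-layer outputs, assumed zero-mean per column.
    v: (N,) nonnegative sample weights (the diagonal of V).
    k: number of leading eigenvectors of B = H^T V^2 H to keep.
    """
    VH = v[:, None] * H                # row i of H scaled by v_i
    B = VH.T @ VH                      # equals H^T V^2 H, shape (L, L)
    _, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :k].T      # top-k eigenvectors as rows, (k, L)
    residual = VH - VH @ U.T @ U
    return float(np.sum(residual ** 2))


rng = np.random.default_rng(0)
H = rng.standard_normal((50, 8))
H -= H.mean(axis=0)                    # zero-mean hidden outputs
v = rng.uniform(0.5, 1.5, size=50)
errors = [reconstruction_error(H, v, k) for k in range(1, 9)]
print([round(e, 3) for e in errors])
```

Because the rows of $U$ are orthonormal eigenvectors of $B$, the error equals the sum of the discarded eigenvalues, which is why it shrinks monotonically as $k$ increases.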
The initial weights in $V$ follow the principle that outliers receive less weight: the value of $v_i$ is inversely proportional to its variance. This initialization also satisfies the earlier rule that most samples from the minority class receive larger weights than samples from the majority class. Therefore, the initial weight of sample $\mathbf{x}_i$ is
In summary, the proposed AWBING algorithm replaces the SVM with Adaptive WELM and binarizes the inner product of the input weights and the samples. As shown in Figs. 1 and 2, AWBING retains the DR-#WIN and accuracy while significantly improving the specificity.
Inspired by [58], multi-thresholding straddling expansion (MTSE) [54] is used as the post-processing stage to further improve the detection rate (DR). The experimental results in Section 5 also show that the combination of AWBING and MTSE, named AWBING Plus, improves the DR significantly.
Experimental process and analysis results
In this section, we first introduce the experimental datasets and discuss the parameter settings. We then compare the proposed AWBING Plus method with BING and several other advanced algorithms. All reported results of the proposed algorithm are averages over 10 independent runs.
Description of datasets
Experiments are conducted to evaluate the performance of the proposed algorithm on three datasets.
PASCAL VOC2007 [14] is a standard benchmark for object detection and recognition and is currently among the most frequently used datasets in this research area. It contains 20 object categories. The training set includes 2501 images with 6301 objects, and the testing set contains 4952 images with 12032 objects. During training, we use only examples without the “difficult” annotation.
Microsoft COCO2014 [36] is another widely accepted object dataset used for comparing and analyzing performance. It contains 91 proposed categories and 2.5 million bounding boxes in $3.28\times10^5$ images. Owing to its wider range of categories and larger number of instances, Microsoft COCO2014 has become a main dataset for object detection.
ILSVRC2013, a subset of ImageNet2013 [45], is used in the ImageNet Large Scale Visual Recognition Challenge. ImageNet is used in the study of object recognition and is currently the largest visual database in the world. It was built by Fei-Fei Li et al. at Stanford University with the aim of mimicking human recognition. ILSVRC has been held eight times since 2010 and is now an acknowledged benchmark for measuring the performance of classifiers. ILSVRC2013 contains 200 basic-level categories. The training set includes 395,909 images with 345,854 objects, and the validation set contains 19,680 images.
The proposed model is compared with eight advanced algorithms, namely BING [10], EdgeBoxes [61], Endres [13], multiscale combinatorial grouping (MCG) [3], Objectness [2], Randomized Prim's [37], Rantalankila [43], and Selective Search [50].
Parameter settings
The parameters are set on PASCAL VOC2007 with an Intersection over Union (IoU) threshold of 0.5. The AWBING algorithm has the following parameters: the threshold ɛ, the maximum number of iterations T, and the constants C1 and C2. We fixed ɛ = 0.05 and T = 6, which worked well in our experiments. Following the method developed in [62], the constant C1 was set to $2^{10}$. The parameter C2 is tuned to achieve the best result.
In addition, C2 is varied over $\{2^{-50}, 2^{-49}, \ldots, 2^{49}, 2^{50}\}$, and the corresponding classification performance is shown in Fig. 4. The performance improves significantly once C2 reaches $2^{2}$ and becomes optimal at $2^{14}$; further increasing C2 produces no remarkable change. Therefore, C2 in AWBING is set to $2^{14}$.
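The sweep over C2 described above amounts to a one-dimensional grid search over powers of two. The sketch below assumes a hypothetical callback `evaluate(c2)` that trains AWBING with the given C2 and returns a validation score; because ties keep the smaller value, a plateau like the one in Fig. 4 resolves to the first C2 that reaches the best score.

```python
import math


def select_c2(evaluate, exponents=range(-50, 51)):
    """Return (C2, score) for the C2 = 2**e with the highest evaluate(C2).

    `evaluate` is a hypothetical callback; ties keep the smaller C2.
    """
    best_c2, best_score = None, float("-inf")
    for e in exponents:
        c2 = 2.0 ** e
        score = evaluate(c2)
        if score > best_score:          # strict '>' keeps the first maximum
            best_c2, best_score = c2, score
    return best_c2, best_score


# Toy score that rises with C2 and then plateaus from 2**14 onward.
c2, score = select_c2(lambda c: min(math.log2(c), 14.0))
print(c2, score)  # 16384.0 14.0
```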

The performance in Stage I with different choices of C2 for AWBING.
We first compare the proposed method with BING. Fig. 5 shows the number of proposals per test sample on PASCAL VOC2007. As illustrated in Fig. 5, AWBING markedly reduces the number of proposals. This is because the multiple filters of AWBING jointly evaluate the samples, which suggests that the proposed method is better suited to evaluating generic object proposals. When MTSE is used as the post-processing stage, the number of proposals decreases further. Importantly, the detection rates of AWBING and AWBING Plus remain high when the number of proposals is limited (about 1000), as confirmed by the subsequent experiments.

The number of proposals of each test sample on PASCAL VOC2007.
DR versus IoU overlap threshold on PASCAL VOC2007, Microsoft COCO2014 and ILSVRC2013 is compared in Figs. 6, 9 and 11, respectively. The DRs at specific IoU thresholds and numbers of proposals are listed in Tables 1, 3 and 4, respectively.
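For reference, the detection rate used in these comparisons can be computed as below. This sketch assumes the common definition for proposal evaluation (a ground-truth box counts as detected if at least one of the first `top_n` proposals overlaps it with IoU at or above the threshold, pooled over all images) and continuous (x1, y1, x2, y2) box coordinates; the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def detection_rate(images, threshold=0.5, top_n=None):
    """Fraction of ground-truth boxes matched by one of the top_n ranked proposals.

    images: iterable of (gt_boxes, ranked_proposals) pairs, one per image.
    """
    covered = total = 0
    for gt_boxes, proposals in images:
        ranked = proposals if top_n is None else proposals[:top_n]
        for g in gt_boxes:
            total += 1
            if any(iou(g, p) >= threshold for p in ranked):
                covered += 1
    return covered / total if total else 0.0


images = [([(0, 0, 2, 2), (10, 10, 12, 12)], [(1, 0, 3, 2), (0, 0, 2, 2)])]
print(detection_rate(images), detection_rate(images, top_n=1))  # 0.5 0.0
```

Sweeping `threshold` for a fixed `top_n` gives a DR-versus-IoU curve, and sweeping `top_n` at a fixed threshold gives a DR-versus-number-of-proposals curve.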

Recall versus IoU curve in different methods on PASCAL VOC2007.
Comparison of DR values between different proposal generation methods on PASCAL VOC2007
As illustrated in Fig. 6, on PASCAL VOC2007, AWBING and AWBING Plus both achieve satisfactory DR whether the number of proposals is small or large. The DR of AWBING exceeds that of BING, and AWBING Plus is close to the best detector. In particular, the advantage of AWBING Plus is most pronounced when the number of proposals reaches 100.
Table 1 shows that AWBING Plus achieves satisfactory results for generic object detection compared with the other methods on PASCAL VOC2007. Notably, AWBING Plus improves on BING by about 43.75% (number of proposals = 1000, IoU threshold = 0.8).
On Microsoft COCO2014, all methods perform worse than on PASCAL VOC2007 because of the greater complexity of Microsoft COCO2014. Despite these challenges, the proposed method successfully generates proposals. As depicted in Fig. 9, AWBING is competitive with BING in DR over a wide range. As the number of proposals increases, the DRs of both AWBING and AWBING Plus improve markedly, and above 100 proposals the DR of AWBING Plus is significantly enhanced. However, a narrow gap remains between AWBING Plus and the best competing algorithm when the number of proposals is 100 and 1000.
As shown in Table 3, MCG and the proposed method are the two best approaches among the competing methods. Furthermore, at IoU = 0.5, the proposed method outperforms the other schemes, including Endres, Randomized Prim's, Rigor, and Selective Search, by 5% to 12%.
Figure 11 shows the DR on ILSVRC2013 at varying IoU thresholds for different numbers of object proposals. Most methods perform better on ILSVRC2013 than on Microsoft COCO2014 but worse than on PASCAL VOC2007. Both AWBING and AWBING Plus achieve good DRs across a range of IoU thresholds and numbers of proposals. In particular, with 100 and 1000 proposals, AWBING Plus improves significantly on the original BING for IoU thresholds from 0.6 to 0.8.
Table 4 summarizes the DR comparison on ILSVRC2013 for different numbers of proposals at IoU thresholds of 0.5, 0.7 and 0.8. Each competing method reaches its highest DR at the IoU threshold = 0.8 with 1000 proposals. However, only MCG, AWBING and AWBING Plus exceed 84%: AWBING achieves a DR of 86.49%, and AWBING Plus reaches 87.22%, second only to MCG. In general, AWBING and AWBING Plus outperform BING. In particular, AWBING Plus achieves much better DR at IoU = 0.7 and 0.8, improving on BING by about 42.5% and 40.9%, respectively (number of proposals = 1000).
For a more intuitive comparison, we quantitatively analyzed the proposed method and the others on all three datasets in terms of detection rate (recall) versus the number of proposals at an IoU threshold of 0.5; the results are shown in Figs. 7, 10 and 12.

DR versus number of proposal windows on PASCAL VOC2007.
Figure 7 shows that, on PASCAL VOC2007, the proposed algorithm is superior to the other competing methods and achieves the best performance when the number of proposals is sufficiently large. On Microsoft COCO2014, as shown in Fig. 10, the proposed method ranks second, close to MCG, which is consistent with Table 3.
The quantitative results on ILSVRC2013 are shown in Fig. 12. The proposed method outperforms most competing methods, and the DR of AWBING Plus increases sharply within the first 400 proposals. These results indicate that the proposed method is effective for generic object proposal generation.
The runtime efficiency is reported in the final column of Table 1. The computational time per image of AWBING Plus is only 0.3 on a personal computer (Intel(R) Core i7-5930K). Together with its high DR, this runtime efficiency surpasses that of other well-known methods [24].
Feeding proposals to the fast region-based convolutional neural network (Fast R-CNN)
To further validate the proposed algorithm, the proposals generated by the different methods were fed into Fast R-CNN [21]. As shown in Table 2, the mean average precision (MAP) of the proposed method is close to those of the other methods, and the average precisions (APs) of AWBING Plus in some classes are comparable with those of the best algorithms. This demonstrates that combining AWBING Plus with Fast R-CNN is feasible and effective.
Comparison of AP (%) and MAP (%) for different detection proposal methods within fast R-CNN on PASCAL VOC2007
Comparison of DR for different detection proposal methods on Microsoft COCO2014
Based on the aforementioned quantitative results, the proposed method generates generic object proposals competitively with other advanced methods.
Figure 8 illustrates the proposals of AWBING Plus and other methods on PASCAL VOC2007. Compared with the other methods, AWBING Plus captures the salient object when only one proposal is used. When 5 or 10 proposals are used, the proposed method also achieves satisfactory results compared with the others.

Pedestrian detection results of the evaluated algorithms on PASCAL VOC2007. Each row shows the proposals for different object detectors. The first row shows 1 proposal, the second row shows 5 proposals, and the last row shows 10 proposals.

DR versus IoU curve between different proposal generation methods on Microsoft COCO2014.

Recall versus number of proposal windows on Microsoft COCO2014.

DR versus IoU curve between different proposal generation methods on ILSVRC2013.

Recall versus number of proposal windows on ILSVRC2013.
This paper has proposed the AWBING Plus algorithm for generic object proposal generation. To exploit the similarities between SVM and WELM, the SVM in Stage I of BING was replaced with WELM, thereby enhancing its filtering ability. To reduce the influence of outliers, an Adaptive WELM algorithm was developed that combines minimum reconstruction error with information theory. In addition, MTSE was used as a post-processing stage to enhance proposal localization quality. Through an analysis of the impact of each parameter on performance, suitable parameters were selected for AWBING Plus, which markedly improves performance compared with BING and other advanced object detection methods. The proposed generic object detector has been validated through experiments on PASCAL VOC2007, Microsoft COCO2014 and ILSVRC2013.
Our next goal is to combine the proposed method with the simple linear iterative clustering (SLIC) algorithm to further accelerate detection. We will also investigate how to adjust the weight matrix V to further reduce the impact of outliers on the performance of the algorithm.
Competing interests
The authors declare that they have no competing interests.
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China (No. 61563043, 61751215).
