AWBING plus algorithm for generic object proposal generation

Abstract

Binarized normed gradients (BING) can be used as a preprocessing step for generic object proposal generation and has attracted great attention because of its speed and good generalization performance. Although several modified schemes have recently been presented to improve proposal localization quality, how to enhance the performance remains an open problem. In this paper, the adaptive weighted binary normed gradients plus (AWBING Plus) algorithm is proposed on the basis of BING. It replaces the support vector machine (SVM) with an adaptive weighted extreme learning machine (Adaptive WELM) to reduce the number of proposals while maintaining comparable performance, and it uses multi-thresholding straddling expansion (MTSE) as a post-processing stage to enhance localization quality. We explain how WELM is applied to BING and analyze the effect of the improved WELM algorithm, named Adaptive WELM. Experimental results on PASCAL VOC2007, Microsoft COCO2014 and ILSVRC2013 show that the proposed approach achieves superior performance compared with other advanced methods for generic object proposal generation, and that it runs faster as well.

Keywords

generic object proposal generation, imbalanced data, BING, WELM

1 Introduction

Object detection, a fundamental research topic in computer vision, has recently received great attention and made significant progress in many applications such as image processing [11, 18], feature extraction [12, 20] and object recognition [19, 24]. The most advanced object detectors rely on category-specific classifiers that score sliding windows in feature image pyramids during the testing phase. However, humans typically perceive objects before identifying them, owing to the human visual and cognitive systems [6, 52]. Likewise, it is reasonable for a detector to detect objects before identifying their category. Generic object proposal generation, which has significant application as a preprocessing step for detection, has motivated considerable research [1, 39]. These methods aim to evaluate how likely an image window is to contain an object regardless of object class, and to output a bounding box that frames the object with a high overlap between the ground-truth and predicted bounding boxes. In recent years, it has been demonstrated that generic object proposal generation, as a data preprocessing step, can effectively reduce the memory and computation cost of object detection [9].

Among current algorithms for generic object detection [1, 51], binarized normed gradients (BING) [10] is a robust method because of its outstanding performance and efficiency. The underlying intuition is that objects, which are stand-alone things with well-defined closed boundaries, share strong similarity in the norm of the gradient (NG) feature even when the object windows are resized to 8×8 pixels. BING takes advantage of this finding and uses the 64D NG features in two-stage cascaded support vector machines (SVMs) to estimate all possible objects in an image. To speed up the algorithm, both the NG features and the model W at Stage I are binarized, so that their inner product requires only a few bitwise operations. Consequently, BING is widely considered less complex and more computationally efficient, while still being effective at capturing generic objects.

Following this study, several scholars have examined and extended the BING approach. For instance, Zhao et al. [60] demonstrated the validity of BING from the perspective of combinatorial geometry. They also attempted to predict the potential target area by evaluating the hot position and hot scale of objects in databases. Liang et al. [34] used BING as a base method and then trained an adaptive object proposal generator for tracking. To the best of our knowledge, however, existing research on BING does not analyze it from a performance point of view. In addition, outliers may degrade the accuracy of model fitting and even provide misleading information, an issue that existing work has not considered.

In this paper, a novel approach, adaptive weighted binary normed gradients plus (AWBING Plus), is proposed to overcome the current deficiencies of BING. The main contributions of this paper are as follows. First, we analyze the performance of Stage I and explain in detail why the model W is ineffective. Second, the SVM in BING is replaced with the weighted extreme learning machine (WELM) [62], and the input weights of WELM are approximated by binary vectors. Third, an adaptive WELM algorithm is proposed to alleviate the impact of outliers on generic object estimation performance. Finally, the performance of the proposed method is demonstrated for generic object proposal generation.

The remainder of this paper is organized as follows. Section 2 reviews previous studies on object proposal generation. Section 3 analyzes BING. The proposed algorithm is explained in Section 4. The experiments and results are presented in Section 5. Finally, conclusions and future work are described in Section 6.

2 Background

In object detection, extracting the main features of a specific category with different strategies (e.g., color, texture, and structure) has been of great importance. Selective search [50], proposed by Uijlings et al., was used in object recognition through similarity measurement on local color regions. Gai et al. proposed improved quaternion wavelet transform methods [18, 19] to extract texture features. A high detection rate was obtained by using a complex classifier [11, 57]. For a more accurate description of category information, some algorithms also include feature information such as parts and locations [16, 17]. Although the detection rate for specific categories was significantly improved, efficiency was reduced because of the complex classifier or the large amount of information carried by the samples. In addition, most object detection algorithms still use the sliding window strategy. The number of windows can reach 10⁶ to 10⁷ because windows of different aspect ratios and scales are used in the image pyramid. Therefore, how to effectively reduce the number of candidate windows is an important problem that deserves serious study.

Alexe et al. [1, 2] first proposed a measure of generic objectness, which represented the window with a combination of multiple visual cues and then determined whether the window contains foreground objects according to a Bayesian framework. Their research could reduce the number of windows in the subsequent process, and the superpixel-based image segmentation proposed in their article is still widely used with different grouping criteria. Uijlings et al. [50], for instance, proposed a data-driven selective search method that groups various characteristics, including superpixels, through hierarchical grouping and exhaustive search. To improve the performance of image segmentation, Yanulevskaya et al. [56] merged the superpixels at each stage of training a random forest. Sang et al. [40] proposed a modified selective search method that generates object proposals on three-dimensional (3D) indoor images by cost function-based segment grouping and depth segmentation. Similar to the selective search method, the randomized Prim's algorithm [37] also calculates the similarity of adjacent sub-regions. The difference lies in the seeds of graph-cut, which are randomly selected. The introduction of grab-cut achieved foreground-background segmentation and avoided iterating over multi-feature similarity in adjacent sub-regions. After that, a large number of seed-based grab-cut extension algorithms emerged. Carreira et al. [7] attempted to predict the seeds by a Gaussian mixture model and grab-cut. Endres et al. [13] proposed a hierarchical merging algorithm based on the graph-cut technique. Krähenbühl et al. [31] selected the seeds by using a ranking SVM and a fast geodesic distance transform, and generated proposals through a dynamic programming algorithm. There are also other segment-based methods for generating proposals. Humayun et al. [28] proposed the generation of a segment pool with a superpixel graph. Arbelaez et al. [3] proposed a multiscale hierarchical segmentation algorithm without parameter learning. Although these segmentation-based generic object proposal methods are capable of achieving satisfactory localization quality, how to develop them further effectively continues to perplex researchers.

Building on previous studies [1, 2], Rahtu et al. [42] proposed a novel approach that ranks the scores of candidate windows through a structured learning method based on a rapid cascade framework. This method achieved a better detection rate as well as lower computational cost. Subsequent research on generic object proposal generation was mostly based on the multi-cascade model. Based on the algorithm of Rahtu et al., Blaschko et al. [5] improved the performance of the generic object generator by optimizing the non-maximal suppression algorithm. The regionlet-based method [51] was proposed to handle deformation through spatial relationships among all groups of sub-regions, which were learned by a cascaded boosting classifier. Cheng et al. [10] proposed the BING algorithm to efficiently produce object boxes from sliding windows. The computational efficiency of BING surpassed several state-of-the-art methods. Zhang et al. [58, 59] extended BING with cascaded ranking SVMs to generate a set of proposals based on the gradient features in the sliding windows. To improve the localization quality, they used a regression algorithm to readjust the candidate window, and obtained more favorable performance with GPU acceleration of the multi-thresholding straddling expansion (MTSE) [54] algorithm. Numerous approaches have been presented to enhance the performance of BING, yet none of them has theoretically analyzed BING.

The extreme learning machine (ELM) [25, 26] is widely used in classification, regression, and feature learning because of its efficient generalization performance and minimal need for manual intervention. An improved variant of ELM, the weighted ELM (WELM) [62], is especially appropriate for imbalanced data learning, and generic object proposal generation belongs to the field of imbalanced data learning. From an algorithmic point of view, the inner product between the hidden-layer output function and the output weights is similar to that of SVM. In particular, the weight vector of each hidden node of WELM can be binarized as well. Together, these properties make it possible to replace SVM with WELM in BING.

With the popularization of deep learning [21, 48], more and more new models have been applied to object recognition [8, 33]. However, the study of traditional generic object proposal generation is still of great significance and remains meaningful for convolutional networks [35, 53].

3 BING analysis

As mentioned earlier, the efficiency of BING surpasses that of existing advanced generic object proposal algorithms. The methodology of BING is to train two-stage cascaded linear filters using the linear SVM implementation in LIBLINEAR [15]. In the first stage, a single filter W ∈ R⁶⁴ is learned and used to evaluate the binarized NG features of the training samples. In the second stage, a group of filters for different window sizes is obtained from the filter scores achieved in the first stage and the corresponding sample labels. The binary filter W reduces the computational time of the inner product of W and the NG features in Stage I.

To gain deeper insight into the BING algorithm, DR-#WIN [2] was analyzed in Stage I on the VOC2007 test set. Before that, the sample distribution had to be analyzed. In an imbalanced binary classification problem, negative samples normally make up the majority of the set and positive samples are few. Of the 1×10⁵ training samples in BING, however, around 7×10⁴ are positive. In other words, the positive samples account for the vast majority of the training set, which does not match the usual sample distribution of imbalanced data. For an in-depth analysis, multiple metrics were used to evaluate the performance of BING. The filter W ∈ R⁶⁴ of BING was learned with the L2-loss linear SVR supported by LIBLINEAR. As shown in Fig. 1, the DR-#WIN of BING in Stage I is 82.54%. With the same training samples, we used the L2-loss linear SVM supported by LIBLINEAR to classify the test samples, so that the performance of BING could be evaluated with other metrics. The test samples, extracted from the VOC2007 test set, contained 7.3673×10⁴ positive and 2.6327×10⁴ negative samples. As illustrated in Fig. 2, the accuracy and sensitivity of BING in Stage I were 73.67% and 100%, respectively. However, the specificity of the algorithm is 0.0001%. These results show that essentially all of the test samples were classified as positive. The inference from the experiments is that a single model W ∈ R⁶⁴ may not have the ability to filter the test samples.
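The relation between the three metrics and the class counts can be checked with a short calculation. The sketch below is illustrative only: the function name `stage_one_metrics` is hypothetical, and the confusion counts describe an idealized classifier that labels every window as positive, not the measured output of BING. Under that assumption the accuracy equals the share of positive samples in the test set, which matches the 73.67% quoted above.

```python
def stage_one_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as fractions."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, sensitivity, specificity


# Test-set composition reported in the text.
N_POS, N_NEG = 73673, 26327

# Idealized assumption: every test window is labelled "object".
acc, sens, spec = stage_one_metrics(tp=N_POS, fn=0, tn=0, fp=N_NEG)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

A classifier with these numbers rejects no windows, so it cannot act as a filter regardless of how high its accuracy looks.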

Fig.1

DR-#WIN results in Stage I on Pascal VOC2007, comparing BING, WELM and AWELM. (a) AWBING Plus, (b) BING, (c) Selective Search, (d) EdgeBoxes, (f) Endres.

Fig.2

The performance in Stage I on Pascal VOC2007.

This surprising insight prompts the question of how to address this weakness of BING. Among machine learning methods, WELM is of the greatest interest to us. From an optimization point of view, the essence of WELM is that the solution is determined by the KKT theorem, given the hidden-layer output matrix computed from the training data and the input weights. This optimization process is similar to that of SVM. Furthermore, WELM [62] outperforms SVM in the classification of imbalanced data [23]. ELM has attracted great attention because of its good regression and classification performance at extremely fast learning speeds. WELM builds on ELM and offers superior generalization performance for imbalanced data while maintaining high efficiency.

4 Research method

In this section we describe our method for finding generic object proposals. During training, we begin by replacing SVM with WELM (Subsection 4.1) in Stage I of BING so as to increase the number of filters, which improves generalization performance (Subsection 4.2.1). Next, we compute the weight matrix V of WELM with the adaptive WELM so that the separating boundary is not affected by outliers (Subsection 4.2.2). Finally, we use multi-thresholding straddling expansion (MTSE) as a post-processing stage in testing to lower the number of proposals and further improve the localization quality (Subsection 4.2.3).

4.1 WELM algorithm

The ELM algorithm is based on single-hidden layer feed-forward neural networks (SLFNs), in which the input weight vectors and biases of the hidden nodes do not need to be tuned by gradient descent.

Consider N training samples (x_i, t_i), where x_i = [x_i1, x_i2, …, x_in]^T ∈ Rⁿ and t_i = [t_i1, t_i2, …, t_im]^T ∈ R^m. The relationship between the inputs and outputs of SLFNs is

$H η = T$ (1)

where η, T, and H are the output weight, binary labels, and hidden output matrix, respectively. H can be represented as

$H (w_{1}, \dots, w_{L}, d_{1}, \dots, d_{L}, x_{1}, \dots, x_{N}) = \begin{bmatrix} H_{1} \\ \vdots \\ H_{N} \end{bmatrix} = \begin{bmatrix} G (w_{1} \cdot x_{1} + d_{1}) & \dots & G (w_{L} \cdot x_{1} + d_{L}) \\ \vdots & \ddots & \vdots \\ G (w_{1} \cdot x_{N} + d_{1}) & \dots & G (w_{L} \cdot x_{N} + d_{L}) \end{bmatrix}_{N \times L}$ (2)

where w_i = [w_i1, w_i2, …, w_in]^T and d_i denote the input weight and bias of the ith hidden node, respectively, and G (·) is the activation function.

After w_i and d_i have been randomly assigned, Equation (1) can be solved by the least squares method [4] or optimization theory [47, 62] as

$\hat{η} = H^{†} T = \begin{cases} H^{T} (\frac{I}{C_{1}} + HH^{T})^{- 1} T, & N < L \\ (\frac{I}{C_{1}} + H^{T} H)^{- 1} H^{T} T, & N > L \end{cases}$ (3)

where I is the identity matrix, C₁ is a regularization constant, and H^† is the Moore-Penrose generalized inverse of H.

The WELM assigns a weight to each training sample. In WELM, the hidden output matrix of Equation (2) is weighted as

$VH = \begin{bmatrix} v_{1} & & 0 \\ & \ddots & \\ 0 & & v_{N} \end{bmatrix} \begin{bmatrix} H_{1} \\ \vdots \\ H_{N} \end{bmatrix} = \begin{bmatrix} v_{1} G (w_{1} \cdot x_{1} + d_{1}) & \dots & v_{1} G (w_{L} \cdot x_{1} + d_{L}) \\ \vdots & \ddots & \vdots \\ v_{N} G (w_{1} \cdot x_{N} + d_{1}) & \dots & v_{N} G (w_{L} \cdot x_{N} + d_{L}) \end{bmatrix}$ (4)

According to the optimization method, the output weight of WELM is

$\hat{η} = \begin{cases} H^{T} (\frac{I}{C_{1}} + VHH^{T})^{- 1} VT, & N < L \\ (\frac{I}{C_{1}} + H^{T} VH)^{- 1} H^{T} VT, & N > L \end{cases}$ (5)

where I is the identity matrix.
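The N > L branch of Equation (5) can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the toy data, the per-sample weights of one over the class size, and all function names (`hidden_matrix`, `normal_equations`, `solve`, `welm_output_weights`) are choices made here for illustration.

```python
import math
import random


def hidden_matrix(X, W, d):
    """H[i][j] = G(w_j . x_i + d_j), here with a sigmoid activation G."""
    def g(z):
        return 1.0 / (1.0 + math.exp(-z))
    return [[g(sum(wk * xk for wk, xk in zip(w, x)) + b) for w, b in zip(W, d)]
            for x in X]


def normal_equations(H, t, v, C1):
    """Build A = I/C1 + H^T V H and b = H^T V t for a diagonal V."""
    n, L = len(H), len(H[0])
    A = [[sum(v[i] * H[i][p] * H[i][q] for i in range(n))
          + (1.0 / C1 if p == q else 0.0) for q in range(L)] for p in range(L)]
    b = [sum(v[i] * H[i][p] * t[i] for i in range(n)) for p in range(L)]
    return A, b


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def welm_output_weights(H, t, v, C1):
    """Output weights for the N > L branch of Equation (5)."""
    A, b = normal_equations(H, t, v, C1)
    return solve(A, b)


# Toy problem (illustrative values): three majority samples, one minority sample.
rng = random.Random(0)
X = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.95], [0.1, 0.2]]
t = [1.0, 1.0, 1.0, -1.0]
v = [1 / 3, 1 / 3, 1 / 3, 1.0]          # assumed weighting: 1 / class size
W = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]   # L = 3 hidden nodes
d = [rng.uniform(-1, 1) for _ in range(3)]
H = hidden_matrix(X, W, d)
eta = welm_output_weights(H, t, v, C1=2.0 ** 10)
scores = [sum(h * e for h, e in zip(row, eta)) for row in H]
print(scores)
```

Each row of `W` plays the role of one filter, so the score of a sample is a weighted combination of L filter responses rather than the response of a single filter.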

4.2 The proposed algorithm

4.2.1 Binarized filters for WELM

As mentioned earlier, the optimal solution method for WELM is consistent with that of SVM. From an algorithmic point of view, the process of obtaining the filter score in WELM is similar to that of SVM in BING. Surprisingly, WELM, when it replaces SVM in Stage I of BING, outperforms the SVM. As illustrated in Figs. 1 and 2, the DR-#WIN and accuracy of WELM in Stage I are on par with those of SVM. However, the specificity of WELM rises to 64.36%, while its sensitivity reaches 85.42%. This can be explained by the fact that the model W in Stage I of BING is a single filter, which has limited ability to evaluate all objects. In WELM, by contrast, the multiple input weights in the hidden layer serve as filters. Selecting a proper number of hidden neurons in a neural network can effectively improve the generalization ability of the system and reduce the error rate [46]. Therefore, WELM provides improved generalization performance for imbalanced learning.

The crux of the method is how to binarize the operands of the inner product. The inner product of the input weight w and the sample x is the main operation when WELM is applied to BING. For superior performance, the sample x needs to be normalized [27]. Then, w and x can be approximated by binary vectors, as proposed in [22]:

$w = \sum_{j = 1}^{N_{basis}} β_{j} a_{j}$ (6)

$x = \sum_{j = 1}^{N_{basis}} α_{j} b_{j}$ (7)

where N_basis is the number of basis vectors, α_j, β_j ∈ R are coefficients, $a_{j}, b_{j} \in \{- 1, 1\}^{64}$ are basis vectors, and $a_{j}^{+}, b_{j}^{+} \in \{0, 1\}^{64}$ satisfy

$a_{j} = a_{j}^{+} - \bar{a_{j}^{+}}$ (8)

$b_{j} = b_{j}^{+} - \bar{b_{j}^{+}}$ (9)

where the overbar denotes the bitwise complement.

After this, the binary inner product of w and x is written as

$\langle w, x \rangle = \sum_{i = 1}^{N_{basis}} β_{i} \langle a_{i}, \sum_{j = 1}^{N_{basis}} α_{j} b_{j} \rangle = \sum_{i = 1}^{N_{basis}} β_{i} \sum_{j = 1}^{N_{basis}} α_{j} \langle a_{i}, b_{j} \rangle$ (10)

where

$\langle a_{i}, b_{j} \rangle = \langle a_{i}^{+}, b_{j} \rangle - \langle \bar{a_{i}^{+}}, b_{j} \rangle = 2^{2} \langle b_{j}^{+}, a_{i}^{+} \rangle - 2^{1} [| a_{i}^{+} | + | b_{j}^{+} |] + 2^{6}$ (11)

Substituting Equation (11) into Equation (10) gives

$\langle w, x \rangle = \sum_{i = 1}^{N_{basis}} β_{i} \sum_{j = 1}^{N_{basis}} α_{j} S_{ij}$ (12)

where $S_{ij} = 2^{2} \langle b_{j}^{+}, a_{i}^{+} \rangle - 2^{1} [| a_{i}^{+} | + | b_{j}^{+} |] + 2^{6}$ can be computed with bitwise AND and POPCNT64 operations.
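The identity in Equation (11) can be checked numerically. The sketch below assumes 64-dimensional vectors packed into Python integers, with bit k set when entry k equals +1; the helper names (`popcount`, `binary_dot`, `to_signs`, `approx_dot`) are hypothetical, and `bin(...).count("1")` stands in for the hardware POPCNT instruction.

```python
import random

DIM = 64   # length of the NG feature vector (8 x 8 window)


def popcount(bits):
    """Number of set bits, the quantity a POPCNT instruction returns."""
    return bin(bits).count("1")


def binary_dot(a_plus, b_plus):
    """<a, b> for a, b in {-1, +1}^64, computed from their {0, 1} encodings."""
    return (4 * popcount(a_plus & b_plus)
            - 2 * (popcount(a_plus) + popcount(b_plus)) + DIM)


def to_signs(bits):
    """Decode a 64-bit integer into a list of -1/+1 entries."""
    return [1 if (bits >> k) & 1 else -1 for k in range(DIM)]


def approx_dot(betas, a_plus_list, alphas, b_plus_list):
    """Equation (12): sum_i beta_i sum_j alpha_j S_ij."""
    return sum(beta * alpha * binary_dot(a, b)
               for beta, a in zip(betas, a_plus_list)
               for alpha, b in zip(alphas, b_plus_list))


# Compare the bitwise formula with the plain inner product on random vectors.
rng = random.Random(0)
a_plus, b_plus = rng.getrandbits(DIM), rng.getrandbits(DIM)
reference = sum(p * q for p, q in zip(to_signs(a_plus), to_signs(b_plus)))
assert binary_dot(a_plus, b_plus) == reference
```

The formula follows from writing each entry as 2a⁺ − 1 and expanding the product, so one AND and three bit counts replace 64 multiplications per basis pair.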

4.2.2 Adaptive WELM algorithm

Zong et al. [62] defined a weight v_i for each training sample x_i. If sample x_i comes from a minority class, its weight v_i is greater than the weights of samples from the majority class. The main goal is to push the boundary toward the majority class rather than the minority class. However, it is unreasonable in practice to allocate the same weight to all samples of the same class. For example, outliers in the positive sample set, as shown in Fig. 3(a), are not representative of their class. Because these outliers nevertheless carry large weights, the separating boundary is pulled toward them, and many negative samples may end up on the minority side. Mathematically, this classification result can be explained by referring to Equation (5): if more emphasis is placed on minimizing the training error, WELM must relax the requirement on the distance between classes. As a consequence, the overall distribution of the classes is overlooked. To give outliers less weight, this paper proposes an adaptive WELM algorithm that separates the imbalanced data more reasonably by combining the minimum reconstruction error with information theory [38, 55] to adjust the weight of every training sample. From a mathematical perspective, our method still searches for the separating boundary that maximizes the distance between classes while minimizing the mean square error. Information entropy is introduced in the adaptive WELM to capture how concentrated the data of each class are. Outliers are identified through the information entropy, and their weights are reduced. Fig. 3(b) illustrates the resulting boundary, which separates the samples in the middle of the categories. It can be observed that the separating boundary moves toward the middle instead of toward the minority outliers. Therefore, our adaptive WELM algorithm reduces the ability of outliers to disturb the overall distribution of the data.

Fig.3

Scatter diagram with the separating boundary. (a) WELM, (b) AWELM.

Assume that the hidden output vectors H_i (i = 1, …, N) are zero-meaned. Define the matrix B = H^TV²H, and let λ₁ > ⋯ > λ_k (k ≤ N) be the eigenvalues of B, with corresponding eigenvectors U₁, …, U_k. Our aim is to solve for the diagonal matrix V by minimizing both the reconstruction error between VH and its reconstruction VHU^TU and the entropy term of V, which leads to the following formulation

Minimize: $L_{P} = \sum_{i = 1}^{N} v_{i} {∥ H_{i} - H_{i} U^{T} U ∥}^{2} + C_{2} \sum_{i = 1}^{N} v_{i} \log (v_{i})$

Subject to: $U^{T} U = I_{L}, \sum_{i = 1}^{N} v_{i} = 1$ (13)

where $V = \begin{bmatrix} v_{1} & & 0 \\ & \ddots & \\ 0 & & v_{N} \end{bmatrix}$, U = [U₁, …, U_k], and C₂ is the regularization parameter. Under the Karush-Kuhn-Tucker (KKT) conditions, the optimization problem is turned into

$L_{D} = \sum_{i = 1}^{N} v_{i} {∥ H_{i} - H_{i} U^{T} U ∥}^{2} + C_{2} \sum_{i = 1}^{N} v_{i} \log (v_{i}) + α_{1} (U^{T} U - I_{L}) + α_{2} (\sum_{i = 1}^{N} v_{i} - 1)$ (14)

To calculate $\frac{\partial}{\partial v_{i}} L_{D}$, U is held fixed and the derivative is taken with respect to v_i. Let Ω_i = ∥ H_i - H_iU^TU ∥². Setting the derivative to zero and applying the constraint $\sum_{i = 1}^{N} v_{i} = 1$ gives

$v_{i} = \frac{e^{- \frac{Ω_{i}}{C_{2}}}}{\sum_{j = 1}^{N} e^{- \frac{Ω_{j}}{C_{2}}}}$ (15)
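The intermediate steps between Equations (14) and (15) can be written out as follows. This is a sketch of the standard stationarity argument, using the shorthand $Ω_{i}$ for the reconstruction error of sample i and treating U as fixed, so that the $α_{1}$ term does not depend on $v_{i}$.

```latex
\begin{aligned}
\frac{\partial L_{D}}{\partial v_{i}}
  &= \Omega_{i} + C_{2}\,(\log v_{i} + 1) + \alpha_{2} = 0,
  \qquad \Omega_{i} = \lVert H_{i} - H_{i} U^{T} U \rVert^{2} \\
\Rightarrow\quad v_{i}
  &= \exp\!\Big(-\frac{\Omega_{i}}{C_{2}}\Big)\,
     \exp\!\Big(-1 - \frac{\alpha_{2}}{C_{2}}\Big) \\
\sum_{j=1}^{N} v_{j} = 1
  \;\Rightarrow\;
  \exp\!\Big(-1 - \frac{\alpha_{2}}{C_{2}}\Big)
  &= \frac{1}{\sum_{j=1}^{N} \exp(-\Omega_{j}/C_{2})} \\
\Rightarrow\quad v_{i}
  &= \frac{\exp(-\Omega_{i}/C_{2})}{\sum_{j=1}^{N} \exp(-\Omega_{j}/C_{2})}
\end{aligned}
```

The multiplier $α_{2}$ only fixes the normalization, so the weights form a softmax over the negative reconstruction errors with temperature C₂.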

Consequently, U changes whenever v_i is updated, and v_i is re-optimized whenever U changes. Alternating these two steps yields the solution for v_i. The proposed scheme is outlined in Algorithm 1.
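The weight update of Equation (15) alone is easy to illustrate. The sketch below takes the per-sample reconstruction errors as given, so the eigen-decomposition step is omitted; the function name `update_weights` and the error values are made up for illustration.

```python
import math


def update_weights(errors, C2):
    """Equation (15): v_i proportional to exp(-error_i / C2), normalized to sum to 1."""
    shift = min(errors)   # subtracting a constant leaves the ratios unchanged
    unnorm = [math.exp(-(e - shift) / C2) for e in errors]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Illustrative reconstruction errors; the last sample plays the role of an outlier.
errors = [0.10, 0.12, 0.09, 2.50]
for C2 in (0.5, 2.0 ** 14):
    print(C2, [round(w, 4) for w in update_weights(errors, C2)])
```

A small C₂ suppresses the sample with the large error, whereas a very large C₂ drives the weights toward a uniform distribution, so C₂ controls how strongly outliers are down-weighted.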

It is known that the larger the value of k (k ≤ N), the lower the reconstruction error of VH - VHU^TU. When k is equal to N, however, the reconstruction error is zero regardless of the adjustment of V. Therefore, we set k to N - 1.

The initial weights in V follow the principle that outliers receive less weight. This means that v_i decreases as the squared deviation ∥ H_i - H̄ ∥² grows. The scheme also preserves the earlier rule that the weights of most samples from the minority class are larger than those of samples from the majority class. Therefore, the initial weight of sample x_i is

$v_{i} = δ \cdot \frac{e^{- {∥ H_{i} - \bar{H} ∥}^{2}}}{N_{C}}$ (16)

where N_C is the number of samples within the class to which x_i belongs, and δ is a user-defined parameter as mentioned in [62].
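Equation (16) can be sketched as follows. The text does not define H̄ explicitly, so this example assumes it is the mean hidden-output vector of the class to which the sample belongs; that reading, the function name `initial_weights`, and the toy data are all assumptions made for illustration.

```python
import math


def initial_weights(H, labels, delta=1.0):
    """Equation (16), with H-bar taken as the per-class mean (an assumption)."""
    dims = len(H[0])
    by_class = {}
    for row, lab in zip(H, labels):
        by_class.setdefault(lab, []).append(row)
    means = {lab: [sum(r[k] for r in rows) / len(rows) for k in range(dims)]
             for lab, rows in by_class.items()}
    weights = []
    for row, lab in zip(H, labels):
        dist2 = sum((x - m) ** 2 for x, m in zip(row, means[lab]))
        weights.append(delta * math.exp(-dist2) / len(by_class[lab]))
    return weights


# Toy hidden outputs: six majority samples, three minority samples (last is an outlier).
H = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05], [0.05, 0.0],
     [1.0, 1.0], [1.05, 1.0], [1.8, 1.8]]
labels = [-1] * 6 + [1] * 3
v0 = initial_weights(H, labels)
print([round(w, 3) for w in v0])
```

In this toy case the typical minority samples receive larger weights than every majority sample, while the minority outlier receives a smaller weight than its classmates.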

Algorithm 1 Adaptive WELM

Input: zero-meaned vectors {H_i}, threshold ɛ ≥ 0, maximum iterations T ≥ 0

Output: weight matrix V

1: generate the initial weight matrix V by Equation (16); set t = 0

2: calculate the eigenvectors U_j corresponding to the first k eigenvalues of the matrix B = H^TV²H

3: evaluate the reconstruction error $Σ = \frac{1}{N} \sum_{i = 1}^{N} v_{i} {∥ H_{i} - H_{i} U^{T} U ∥}^{2}$

4: while (Σ > ɛ) and (t < T)

5: replace v_i by the new solution of Equation (15)

6: calculate the matrix B

7: calculate the eigenvectors U_j of matrix B

8: re-evaluate the reconstruction error Σ; set t = t + 1

9: end

10: return V

4.2.3 AWBING Plus algorithm

In this study, we propose the AWBING algorithm, which replaces SVM with Adaptive WELM and binarizes the inner product of the input weights and the samples. As shown in Figs. 1 and 2, AWBING retains the DR-#WIN and accuracy while significantly improving the specificity.

Inspired by [58], the multi-thresholding straddling expansion (MTSE) [54] is used as a post-processing stage to further improve the detection rate (DR). The experimental results in Section 5 also show that the combination of AWBING and MTSE, named AWBING Plus, improves the DR significantly.

5 Experimental process and analysis results

In this section, we first introduce the experimental datasets and then discuss the parameter settings. Finally, we compare the proposed AWBING Plus method with BING and several other advanced algorithms. All reported results of the proposed algorithm are averages over 10 independent runs.

5.1 Description of datasets

Experiments are conducted to evaluate the performance of the proposed algorithm on three datasets:

PASCAL VOC2007 [14] is a standard benchmark in object detection and recognition and is currently the most frequently used dataset in target recognition research. It contains 20 object categories. The training set includes 2501 images with 6301 objects. The testing set contains 4952 images with 12032 objects. For training, we use only the examples without the “difficult” annotation.

Microsoft COCO2014 [36] is also a widely accepted object dataset used for performance comparison and analysis. It contains 91 categories and 2.5 million bounding boxes in 3.28 × 10⁵ images. Owing to its wider range of categories and greater number of instances, Microsoft COCO2014 has become a main dataset in object detection.

ILSVRC2013, a subset of ImageNet2013 [45], is used in the visual recognition challenge. ImageNet2013 is used in the study of object recognition and is currently the largest visual database in the world. The project, which mimics human recognition, was built by Stanford University computer scientists Fei-Fei Li et al. ILSVRC has been hosted eight times since 2010 and is now acknowledged as a benchmark for measuring the performance of classifiers. ILSVRC2013 contains 200 basic-level categories. The training set includes 395,909 images with 345,854 objects. The validation set contains 19,680 images.

The proposed model is compared with eight advanced algorithms, namely BING [10], EdgeBoxes [61], Endres [13], multiscale combinatorial grouping (MCG) [3], Objectness [2], RandomizedPrims [37], Rantalankila [43], and Selective Search [50].

5.2 Parameter settings

The parameter settings are evaluated on PASCAL VOC2007, with the Intersection over Union (IoU) threshold set to 0.5. The AWBING algorithm has the following parameters: the threshold ɛ, the maximum number of iterations T, and the constants C₁ and C₂. We fixed ɛ = 0.05 and T = 6, which worked well in the experiments. Following the method developed in [62], the constant C₁ was set to 2¹⁰. The parameter C₂ is tuned to achieve an optimal result.

In addition, different values of C₂ (2⁻⁵⁰, 2⁻⁴⁹, ..., 2⁴⁹, 2⁵⁰) are tested, and the corresponding classification performance is shown in Fig. 4. The performance improves significantly when C₂ reaches 2² and becomes optimal when C₂ is 2¹⁴. With further increases of C₂, the performance shows no remarkable change. Therefore, the parameter C₂ in AWBING is set to 2¹⁴.

Fig.4

The performance in Stage I with different choices of C₂ for AWBING.

5.3 The number of proposals comparison

In this study, the proposed method is compared with BING. Fig. 5 shows the number of proposals for each test sample on PASCAL VOC2007. As illustrated in Fig. 5, AWBING remarkably reduces the number of proposals. The reason is that the multiple filters of AWBING jointly evaluate the samples, which indicates that the proposed method is better suited to evaluating generic object proposals. When MTSE is used as the post-processing stage in the proposed method, the number of proposals decreases significantly. Importantly, the generic object proposal detection rates of AWBING and AWBING Plus are excellent when the number of proposals is limited (about 1000), as verified in subsequent experiments.

Fig.5

The number of proposals of each test sample on PASCAL VOC2007.

5.4 Comparing DR vs. IoU overlap threshold

DR versus the IoU overlap threshold is compared in Figs. 6, 9 and 11 for PASCAL VOC 2007, Microsoft COCO2014 and ILSVRC2013, respectively. In addition, the DR at specific IoU thresholds and numbers of proposals is shown in Tables 1, 3 and 4, respectively.
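For reference, DR at a given IoU threshold and proposal budget can be computed as sketched below. This is an illustrative definition rather than the evaluation code used for the experiments: boxes are assumed to be non-degenerate (x1, y1, x2, y2) tuples in continuous coordinates, proposals are assumed to be sorted by score, and the function names `iou` and `detection_rate` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def detection_rate(ground_truth, proposals, top_k, threshold=0.5):
    """Fraction of ground-truth boxes matched by any of the top_k proposals."""
    kept = proposals[:top_k]   # proposals are assumed sorted by score
    hit = sum(any(iou(gt, p) >= threshold for p in kept) for gt in ground_truth)
    return hit / len(ground_truth)


# Toy example with two annotated objects and three ranked proposals.
gt = [(0, 0, 10, 10), (20, 20, 40, 40)]
props = [(1, 1, 10, 10), (0, 0, 5, 5), (22, 18, 41, 39)]
for k, thr in [(1, 0.5), (3, 0.5), (3, 0.8)]:
    print(k, thr, detection_rate(gt, props, k, thr))
```

Raising the threshold can only remove matches, so for a fixed number of proposals DR never increases as the IoU threshold grows.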

Fig.6

Recall versus IoU curve in different methods on PASCAL VOC2007.

Table 1

Comparison of DR values between different proposal generation methods on PASCAL VOC2007

Method	DR (%) at IoU=0.5, by number of proposals				DR (%) at IoU=0.7, by number of proposals				DR (%) at IoU=0.8, by number of proposals				Time
	1	10	100	1000	1	10	100	1000	1	10	100	1000	(sec)
BING[10]	11.01	26.65	59.36	85.97	4.5	11.93	15.51	24.74	2.3	5.24	6.81	7.85	0.07
EdgeBoxes[61]	10.78	30.65	58.03	85.6	5.69	19.56	30.24	71.71	3.53	12.83	28.48	44.53	0.3
Endres[13]	13.74	40.54	70.25	81.79	7.64	24.19	33.52	59.86	4.8	14.98	27.56	45.45	100
MCG[3]	11.71	38.86	70.18	89.82	6.28	21.64	34.86	70.99	4	13.6	34.11	55.98	30
Objectness[2]	10.44	34.07	60.42	81.02	4.54	15.63	22.32	34.36	2.21	6.32	9.07	9.86	3
RandomizedPrims [37]	7.89	23.77	51.7	80.68	3.49	11	19.08	57.32	1.85	6.05	18.74	29.11	1
Rantalankila [43]	0.25	2.65	33.58	81.15	0.07	0.82	2.94	60.24	0.02	0.3	8.19	45.97	10
Selective Search [50]	6.24	25.72	56.04	83.53	2.52	13.01	22.62	64.14	1.45	7.94	23.7	35.32	10
AWBING	12.23	32.85	63.67	87.84	5.9	17.4	30.11	40.41	3.42	9.58	16.08	24.38	0.12
AWBING Plus	11.65	37.75	69.1	89.74	6.12	20.8	44.99	66.78	3.85	12.84	31.63	51.6	0.3

As illustrated in Fig. 6, on VOC2007 both AWBING and AWBING Plus achieve satisfactory DR whether the number of proposals is small or large. The DR of AWBING exceeds that of BING, and AWBING Plus is close to the best detector. In particular, the advantage of AWBING Plus is most pronounced when the number of proposals reaches 100.

Table 1 demonstrates that AWBING Plus achieves a satisfactory result for generic object detection compared with the other methods on PASCAL VOC2007. It is noteworthy that the DR of AWBING Plus is about 43.75 percentage points higher than that of BING (51.6% versus 7.85%; number of proposals = 1000, IoU threshold = 0.8).

On Microsoft COCO2014, all methods perform worse than on PASCAL VOC2007 because of the complexity of Microsoft COCO2014. Despite these challenges, the proposed method is able to generate proposals successfully. As depicted in Fig. 9, AWBING is competitive with BING in DR over a large range. As the number of proposals increases, the DR of both AWBING and AWBING Plus improves markedly. When the number of proposals rises above 100, the DR of AWBING Plus is significantly enhanced. However, there is still a narrow gap between AWBING Plus and the best competing algorithm when the number of proposals is 100 and 1000.

As shown in Table 3, MCG and the proposed method are the two best approaches among the compared methods. Furthermore, at IoU = 0.5 with 1000 proposals, the proposed method outperforms Endres, RandomizedPrims, Rigor and Selective Search by between 5 and 12 percentage points.

Figure 11 illustrates DR under varying IoU thresholds for different numbers of object proposals on ILSVRC2013. The performance of most methods on ILSVRC2013 lies between that on PASCAL VOC2007 and that on Microsoft COCO2014. Both AWBING and AWBING Plus achieve good DR across a range of IoU thresholds at different numbers of proposals. In particular, when the number of proposals is 100 and 1000, AWBING Plus improves significantly on the original BING from IoU = 0.6 to 0.8.

Table 4 summarizes the DR comparison on ILSVRC2013 for different numbers of proposals at IoU thresholds of 0.5, 0.7 and 0.8. Every method attains its highest DR at an IoU threshold of 0.5 with 1000 proposals. In that setting, no method exceeds 85% except MCG, AWBING and AWBING Plus: AWBING achieves a DR of 86.49% and AWBING Plus reaches 87.22%, second only to MCG (88.04%). In general, AWBING and AWBING Plus both outperform BING. In particular, AWBING Plus yields a much higher DR at IoU = 0.7 and 0.8, improving on BING by about 42.5 and 40.9 percentage points, respectively (number of proposals = 1000).
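The improvements over BING quoted above can be checked directly against Table 4, where they correspond to absolute differences in DR. The following minimal Python sketch reproduces that arithmetic from the BING and AWBING Plus entries at 1000 proposals; the variable and function names are illustrative only and not part of the original work.

```python
# DR (%) at 1000 proposals, copied from Table 4; keys are IoU thresholds.
bing = {0.5: 84.89, 0.7: 23.21, 0.8: 6.76}
awbing_plus = {0.5: 87.22, 0.7: 65.76, 0.8: 47.69}


def absolute_gain(new: float, old: float) -> float:
    """Difference in DR, expressed in percentage points."""
    return round(new - old, 2)


def relative_gain(new: float, old: float) -> float:
    """Improvement relative to the baseline DR, expressed in percent."""
    return round(100.0 * (new - old) / old, 1)


# Absolute gain of AWBING Plus over BING at each IoU threshold.
gains = {iou: absolute_gain(awbing_plus[iou], bing[iou]) for iou in bing}

for iou, gain in gains.items():
    rel = relative_gain(awbing_plus[iou], bing[iou])
    print(f"IoU={iou}: +{gain} points ({rel}% relative to BING)")
```

The absolute gains at IoU = 0.7 and 0.8 come out at 42.55 and 40.93 points, matching the rounded figures in the text; the relative gains are far larger because the BING baseline is low at strict IoU thresholds.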

5.5 Comparing DR vs. the number of proposals

To present the results more intuitively, we quantitatively compared the proposed method with the others on the three datasets in terms of detection recall (DR) versus the number of proposals at an IoU threshold of 0.5; the results are shown in Figs. 7, 10 and 12.
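As a concrete reading of this metric, the sketch below computes the detection recall of the top-k proposals for a single image: a ground-truth box counts as detected if at least one of the first k proposals overlaps it with an IoU at or above the threshold. This is a simplified, assumed formulation (boxes given as (x1, y1, x2, y2) corners with continuous coordinates; the helper names `iou` and `detection_recall` are hypothetical), not the evaluation code used to produce the reported figures.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_recall(gt: List[Box], proposals: List[Box],
                     k: int, thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by any of the top-k proposals."""
    if not gt:
        return 0.0
    top_k = proposals[:k]  # proposals are assumed sorted by descending score
    hits = sum(1 for g in gt if any(iou(g, p) >= thr for p in top_k))
    return hits / len(gt)


# Toy example: two ground-truth objects and three ranked proposals.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
ranked = [(0, 0, 10, 8), (50, 50, 60, 60), (21, 21, 30, 30)]

for k in (1, 2, 3):
    print(k, detection_recall(gt_boxes, ranked, k))
```

Sweeping k with a fixed threshold gives a DR versus number-of-proposals curve, while sweeping the threshold with a fixed k gives a DR versus IoU curve; a dataset-level DR would pool the hits and ground-truth counts over all images.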

Fig.7

DR versus number of proposal windows on PASCAL VOC2007.

Figure 7 shows that, on PASCAL VOC2007, the proposed algorithm compares favorably with the competing methods and achieves the best performance when the number of proposals is sufficiently large. On Microsoft COCO2014, as shown in Fig. 10, the proposed method ranks second, slightly behind MCG, which is consistent with Table 3.

The quantitative results on ILSVRC2013 are shown in Fig. 12. The proposed method outperforms most competing methods, and the DR of AWBING Plus increases significantly over the first 400 proposals. This result shows that the proposed method is effective for generic object proposal generation.

5.6 Comparing running time

The runtime efficiency is reported in the final column of Table 1. On a personal computer with an Intel(R) Core i7-5930K, the computational time of AWBING Plus per image is just 0.3, which matches EdgeBoxes and is far below that of MCG (30) and Endres (100). Combined with its high DR, this runtime compares favorably with other well-known methods [24].

5.7 Comparing feedback to fast region-based convolutional neural network (R-CNN)

To further validate the performance of the proposed algorithm, the proposals generated by the different methods were fed to fast R-CNN [21]. As displayed in Table 2, the mean average precision (MAP) of the proposed method (68.4%) is second only to that of MCG (68.5%), and the average precisions (APs) of AWBING Plus in some classes are comparable with those of the best algorithms. This demonstrates that combining AWBING Plus with fast R-CNN is feasible and effective.

Table 2
Comparison of AP (%) and MAP (%) for different detection proposal methods within fast R-CNN on PASCAL VOC2007

Method	MAP	Aero	Bike	Bird	Boat	Bus	Bottle	Bus	Car	Cat	Chair	Cow	Table	Dog	Horse	Person	Sheep	Sofa	Train	TV
BING [10]	55.3	56.7	63.5	55.2	36.9	72.0	24.8	71.2	70.6	69.3	30.5	50.2	35.6	62.5	73.4	58.6	52.0	46.5	65.1	56.9
EdgeBoxes [61]	64.6	63.9	72.6	67.8	53.6	83.8	35.7	83.3	78.5	81.7	32.6	60.7	36.2	78.2	76.4	66.9	62.3	51.7	73.1	67.7
Endres [13]	66.4	75.8	75.2	67.3	46.2	80.3	38.9	81.3	76.6	82.4	36.5	72.7	36.9	78.6	76.5	62.1	63.8	69.9	76.3	65.2
MCG [3]	68.5	76.0	80.3	68.3	49.9	82.5	38.2	82.5	83.2	83.1	37.5	71.9	40.1	79.6	83.3	67.4	66.4	69.5	75.2	67.5
Objectness [2]	56.1	61.1	64.4	54.8	38.2	72.4	29.3	71.8	69.3	71.2	29.6	55.9	35.9	60.5	74.2	51.2	51.4	53.9	65.0	55.9
RandomizedPrims [37]	66.6	77.1	78.3	63.0	53.8	80.4	32.1	81.9	78.4	82.1	36.4	66.2	38.4	75.3	82.9	67.0	64.9	67.6	73.0	66.3
Rantalankila [43]	64.7	68.3	73.2	64.3	52.9	78.6	27.6	78.6	69.8	81.9	33.1	65.1	37.8	80.5	81.6	68.0	63.5	66.4	73.2	65.6
Selective Search [50]	65.8	69.4	74.5	63.2	51.9	78.5	31.5	77.9	79.0	79.9	38.9	72.0	37.5	77.3	80.2	66.3	64.5	66.2	77.6	63.2
AWBING Plus	68.4	76.9	77.9	67.4	54.6	83.0	38.4	82.3	79.3	82.9	36.9	72.4	38.7	80.1	83.7	67.3	65.9	67.9	76.5	67.1

Table 3

Comparison of DR for different detection proposal methods on Microsoft COCO2014

Method	Number of proposals at IoU=0.5				Number of proposals at IoU=0.7				Number of proposals at IoU=0.8
	1	10	100	1000	1	10	100	1000	1	10	100	1000
BING [10]	3.38	10.91	29.61	59.24	1.23	3.8	7.44	12.81	0.41	1.13	1.84	2.79
Endres [13]	4.96	18.37	42.31	62.24	2.8	10.66	24.69	42.3	1.9	6.84	15.23	31.6
MCG [3]	4.45	19.06	44.47	71.91	2.48	10.46	27.83	49.95	1.65	6.76	19.20	36.48
RandomizedPrims [37]	2.63	9.93	27.27	55.6	1.24	4.54	14.39	34.45	0.8	2.72	8.84	22.87
Rigor[28]	3.13	8.53	23.51	58.08	1.59	5.22	13.76	39.33	1.03	3.8	9.4	28.2
Selective Search [50]	2.16	10.58	30.38	58.73	0.94	5.07	17.11	39.77	0.59	3.17	11.18	28.43
AWBING	4.09	14.24	35.28	60.57	1.93	6.86	15.14	25.97	1.07	3.68	7.82	14.78
AWBING Plus	4.13	16.75	40.73	67.62	2.12	8.43	22.14	43.68	1.37	5.44	14.95	31.16

Table 4

Comparison of DR for different detection proposal methods on ILSVRC2013.

Method	Number of proposals at IoU=0.5				Number of proposals at IoU=0.7				Number of proposals at IoU=0.8
	1	10	100	1000	1	10	100	1000	1	10	100	1000
BING	7.43	28.84	60.33	84.89	2.58	10.31	17.94	23.21	1.40	4.09	5.76	6.76
EdgeBoxes	9.77	32.59	60.44	84.87	5.16	20.60	46.63	71.30	3.31	13.42	31.02	45.52
Endres	13.45	40.10	68.41	80.86	7.77	24.27	43.74	60.87	5.29	15.77	28.32	45.40
MCG	13.96	40.28	69.32	88.04	8.13	23.77	48.15	69.47	5.48	15.88	35.07	54.46
RandomizedPrims	5.42	23.93	55.13	81.18	2.37	9.96	31.77	59.30	1.43	5.68	19.35	42.96
Rigor	7.12	24.51	49.83	80.04	3.12	15.10	31.74	60.72	2.00	10.93	22.33	46.59
SelectiveSearch	4.90	25.88	58.95	84.15	1.97	12.06	38.07	66.53	1.19	7.36	26.13	51.83
AWBING	11.16	32.94	62.16	86.49	6.24	15.31	26.10	41.77	4.19	8.66	14.35	26.18
AWBING Plus	12.22	37.14	65.28	87.22	6.81	21.68	45.41	65.76	4.52	14.14	30.93	47.69

Based on the aforementioned quantitative results, the proposed method generates generic object proposals competitively with the other advanced methods.

5.8 Qualitative comparison

Figure 8 illustrates the proposals of AWBING Plus and the other methods on PASCAL VOC2007. Compared with the other methods, AWBING Plus can capture the salient object when the number of proposals is 1. When the number of proposals is 5 or 10, the proposed method also achieves a satisfactory result for generic object proposal generation compared with the others.

Fig.8

Pedestrian detection results of the evaluated algorithms on PASCAL VOC2007. Each row shows the proposals of the different methods: the first row shows 1 proposal, the second row 5 proposals, and the last row 10 proposals.

Fig.9

DR versus IoU curve between different proposal generation methods on Microsoft COCO2014.

Fig.10

Recall versus number of proposal windows on Microsoft COCO2014.

Fig.11

DR versus IoU curve between different proposal generation methods on ILSVRC2013.

Fig.12

Recall versus number of proposal windows on ILSVRC2013.

6 Conclusions

This paper has proposed the AWBING Plus algorithm for generic object proposal generation. Exploiting the similarities between SVM and WELM, the SVM in Stage I of BING was replaced with WELM, thereby enhancing its filtering ability. To reduce the influence of outliers, an Adaptive WELM algorithm was developed that combines minimum reconfiguration error with information theory. In addition, MTSE was used as the post-processing stage to enhance the proposal localization quality. Through in-depth analysis of the impact of each parameter on performance, optimal parameters were obtained for AWBING Plus, which remarkably improved on BING and was competitive with other advanced object proposal methods. The proposed generic object proposal generator has been validated through experimental evaluations on PASCAL VOC2007, Microsoft COCO2014 and ILSVRC2013.

Our next goal is to combine the proposed method with the simple linear iterative clustering (SLIC) algorithm to further accelerate detection. It will also be important to adjust the weight matrix V so as to reduce the impact of outliers on the performance of the algorithm.

Competing interests

The authors declare that they have no competing interests.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (No. 61563043, 61751215).

References

1. Alexe, Deselaers, Ferrari, What is an object?, in: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 73–80.

2. Alexe, Deselaers, Ferrari, Measuring the Objectness of Image Windows, IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (2012), 2189–2202.

3. Arbeláez, Pont-Tuset, Barron, Marques, and Malik, Multiscale Combinatorial Grouping, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition (2014), 328–335.

4. Axelsson, A generalized conjugate gradient, least square method, Numerische Mathematik 51 (1987), 209–227.

5. Blaschko M.B., Vedaldi, Zisserman, Simultaneous object detection and ranking with weak supervision, in: Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 1, Curran Associates Inc., Vancouver, British Columbia, Canada, 2010, pp. 235–243.

6. Borji, Itti, State-of-the-Art in Visual Attention Modeling, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), 185–207.

7. Carreira, Sminchisescu, CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts, IEEE Trans. Pattern Anal. Mach. Intell. 34 (2012), 1312–1328.

8. Chen, Lu, Fan, S-CNN: Subcategory-aware convolutional networks for object detection, IEEE Transactions on Pattern Analysis and Machine Intelligence PP (2018), 1–1.

9. Chen, Ma, Zhu, Wang, Zhao, Boundary-aware box refinement for object proposal generation, Neurocomputing 219 (2017), 323–332.

10. Cheng M.-M., Zhang, Lin W.-Y., Torr, BING: Binarized normed gradients for objectness estimation at 300fps, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014), 3286–3293.

11. Dalal, Triggs, Histograms of oriented gradients for human detection, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers Computer Society, San Diego, CA, United States, 2005, pp. 886–893.

12. Dollár, Tu, Perona, Belongie, Integral Channel Features, in: BMVC, London, 2009.

13. Endres, Hoiem, Category-Independent Object Proposals with Diverse Ranking, IEEE Trans. Pattern Anal. Mach. Intell. 36 (2014), 222–234.

14. Everingham, Van Gool, Williams C.K.I., Winn, and Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision 88 (2010), 303–338.

15. Fan R.-E., Chang K.-W., Hsieh C.-J., Wang X.-R., Lin C.-J., LIBLINEAR: A Library for Large Linear Classification, J. Mach. Learn. Res. 9 (2010), 1871–1874.

16. Felzenszwalb, McAllester, Ramanan, A discriminatively trained, multiscale, deformable part model, in: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, IEEE, 2008, pp. 1–8.

17. Felzenszwalb P.F., Girshick R.B., McAllester, Cascade object detection with deformable part models, in: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, IEEE, 2010, pp. 2241–2248.

18. Gai, Wang, Yang, Sparse representation based on vector extension of reduced quaternion matrix for multiscale image denoising, IET Image Processing 10 (2016), 598–607.

19. Gai, Yang, Wan, Employing quaternion wavelet transform for banknote classification, Neurocomputing 118 (2013), 171–178.

20. Gai, Yang, Zhang, Multiscale texture classification using reduced quaternion wavelet transform, AEU – International Journal of Electronics and Communications 67 (2013), 233–241.

21. Girshick, Fast R-CNN, in: The IEEE International Conference on Computer Vision (ICCV), 2015.

22. Hare, Saffari, Torr P.H.S., Efficient online structured output learning for keypoint-based object tracking, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition (2012), pp. 1894–1901.

23. , Garcia E.A., Learning from Imbalanced Data, IEEE Transactions on Knowledge and Data Engineering 21 (2009), 1263–1284.

24. Hosang, Benenson, Dollár, and Schiele, What Makes for Effective Detection Proposals?, IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2016), 814–830.

25. Huang G.-B., What are extreme learning machines? Filling the gap between Frank Rosenblatt’s dream and John von Neumann’s puzzle, Cognitive Computation 7 (2015), 263–278.

26. Huang G.-B., Wang D.H., Lan, Extreme learning machines: a survey, International Journal of Machine Learning and Cybernetics 2 (2011), 107–122.

27. Huang G.-B., Zhu Q.-Y., Siew C.-K., Extreme learning machine: theory and applications, Neurocomputing 70 (2006), 489–501.

28. Humayun, Li, Rehg J.M., RIGOR: Reusing Inference in Graph Cuts for Generating Object Regions, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 336–343.

29. Jain S.D., Bo Xiong, and Grauman, Pixel Objectness, CoRR abs/1701.05349 (2017).

30. Kong, Sun, Yao, Liu, Chen, Lu, RON: Reverse Connection with Objectness Prior Networks for Object Detection, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5244–5252.

31. Krähenbühl, Koltun, in: Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V, Fleet, Pajdla, Schiele, and Tuytelaars, eds., Springer International Publishing, Cham, 2014, pp. 725–739.

32. Krizhevsky, Sutskever, Hinton G.E., ImageNet classification with deep convolutional neural networks, in: Proceedings of the 25th International Conference on Neural Information Processing Systems, Curran Associates Inc., Lake Tahoe, Nevada, 2012, pp. 1097–1105.

33. Kuo, Hariharan, Malik, Deepbox: Learning objectness with convolutional networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2479–2487.

34. Liang, Pang, Liao, Mei, Ling, Adaptive Objectness for Object Tracking, IEEE Signal Processing Letters 23 (2016), 949–953.

35. Lin T.-Y., Dollár, Girshick, He, Hariharan, and Belongie, Feature pyramid networks for object detection, arXiv preprint arXiv:1612.03144 (2016).

36. Lin T.-Y., Maire, Belongie, Hays, Perona, Ramanan, Dollár, and Zitnick C.L., Microsoft COCO: Common Objects in Context, in: ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V, Fleet, Pajdla, Schiele, and Tuytelaars, eds., Springer International Publishing, Cham, 2014, pp. 740–755.

37. Manen, Guillaumin, Gool L.V., Prime Object Proposals with Randomized Prim’s Algorithm, in: 2013 IEEE International Conference on Computer Vision, 2013, pp. 2536–2543.

38. Minfang Q.I., Zhongguang F.U., Yuan, Ya M.A., A Comprehensive Evaluation Method of Power Plant Units Based on Information Entropy and Principal Component Analysis, Proceedings of the CSEE 33 (2013), 58–64.

39. Nguyen T.V., Salient object detection via objectness proposals, in: Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

40. S.I., Kang H.B., A new object proposal generation method for object detection in RGB-D data, in: 2017 IEEE 15th International Symposium on Applied Machine Intelligence and Informatics (SAMI), 2017, pp. 000393–000398.

41. Ošep, Hermans, Engelmann, Klostermann, Mathias, and Leibe, Multi-scale object candidates for generic object tracking in street scenes, in: 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 3180–3187.

42. Rahtu, Kannala, Blaschko, Learning a category independent object detection cascade, in: 2011 International Conference on Computer Vision, 2011, pp. 1052–1059.

43. Rantalankila, Kannala, Rahtu, Generating Object Segmentation Proposals Using Global and Local Search, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2417–2424.

44. Ren, He, Girshick, Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (2017), 1137–1149.

45. Russakovsky, Deng, Su, Krause, Satheesh, Ma, Huang, Karpathy, Khosla, Bernstein, Berg A.C., Fei-Fei, ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision 115 (2015), 211–252.

46. Sheela K.G., Deepa S.N., Review on Methods to Fix Number of Hidden Neurons in Neural Networks, Mathematical Problems in Engineering 2013 (2013), 11.

47. Smith J.M., Optimization Theory in Evolution, Annual Review of Ecology and Systematics 9 (1978), 31–56.

48. Springenberg J.T., Dosovitskiy, Brox, Riedmiller, Striving for simplicity: The all convolutional net, arXiv preprint arXiv:1412.6806 (2014).

49. W.C., He, Yang, Chien S.Y., Real-Time Salient Object Detection with a Minimum Spanning Tree, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2334–2342.

50. Uijlings J.R., Sande K.E., Gevers, Smeulders A.W., Selective Search for Object Recognition, Int. J. Comput. Vision 104 (2013), 154–171.

51. Wang, Yang, Zhu, Lin, Regionlets for Generic Object Detection, in: 2013 IEEE International Conference on Computer Vision, 2013, pp. 17–24.

52. Wolfe J.M., Horowitz T.S., What attributes guide the deployment of visual attention and how do they do it?, Nat Rev Neurosci 5 (2004), 495–501.

53. , Li, Meng, Ngan K.N., Generic Proposal Evaluator: A Lazy Learning Strategy Toward Blind Proposal Quality Assessment, IEEE Transactions on Intelligent Transportation Systems 19 (2018), 306–319.

54. Xiaozhi, Huimin, Wang, Zhichen, Improving object proposals with multi-thresholding straddling expansion, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 2587–2595.

55. Yang, Meng, Zhen, Adaptively weighted PCA algorithm, Computer Engineering and Applications (2012), 189–191.

56. Yanulevskaya, Uijlings, Sebe, Learning to Group Objects, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3134–3141.

57. Zhang, Benenson, Omran, Hosang, Schiele, How Far are We from Solving Pedestrian Detection?, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1259–1267.

58. Zhang, Liu, Chen, Zhu, Cheng M.-M., Saligrama, Torr P.H., Sequential Optimization for Efficient High-Quality Object Proposal Generation, IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).

59. Zhang, Torr P.H.S., Object Proposal Generation Using Two-Stage Cascade SVMs, Machine Intelligence 38 (2016).

60. Zhao, Liu, Yin, Cracking BING and Beyond, in: Proceedings British Machine Vision Conference, Valstar, French, and Pridmore, eds., BMVA Press, 2014.

61. Zitnick C.L., Dollár, Edge Boxes: Locating Object Proposals from Edges, in: Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V, Fleet, Pajdla, Schiele, and Tuytelaars, eds., Springer International Publishing, Cham, 2014, pp. 391–405.

62. Zong, Huang G.-B., Chen, Weighted extreme learning machine for imbalance learning, Neurocomputing 101 (2013), 229–242.
