An adaptive active learning algorithm with informativeness and representativeness

Abstract

Active learning focuses on selecting a small subset of the most valuable instances for labeling to learn a highly accurate model. Because both the informativeness and the representativeness of unlabeled instances matter for a query, several works have combined the two criteria. However, most of them balance the criteria in a fixed manner, which makes it difficult to find suitable sampling strategies and weights of informativeness and representativeness for various datasets. In this paper, an adaptive active learning method, ALIR, is proposed to address these limitations. Firstly, an adaptive active learning framework is presented, in which the weights of the informativeness and representativeness criteria are dynamically updated by the feedback of previous learning processes. Secondly, by formulating active learning as a Markov decision process, ALIR can adaptively select suitable sampling strategies according to the reward of the learning process. Finally, extensive experimental results on several benchmark datasets and two real classification datasets demonstrate that ALIR outperforms several state-of-the-art methods. Unlike traditional active learning algorithms, ALIR can adaptively select sampling strategies and adjust their weights simultaneously, which makes it more practical in applications.

Keywords

Active learning, adaptive, informativeness, representativeness

1. Introduction

Many classification problems require abundant training instances to achieve superior performance [1]. In the real world, however, unlabeled instances are plentiful while labeled instances are scarce, and requesting a label for every instance is impractical because labeling demands professional domain knowledge and incurs a large cost [2, 32]. Training a precise model with a small number of labeled instances is therefore of great importance. Active learning addresses this challenge by iteratively selecting the most valuable instance for labeling to maximize the accuracy of the classifier [3, 33].

The most popular setting is pool-based active learning, which assumes that all instances lie in a pool from which the most useful unlabeled instances are selected, by an appropriate query strategy, to have their labels queried. Two types of query criteria, i.e., informativeness and representativeness, are widely applied by active learning methods [4]. The informativeness criterion measures the uncertainty of the model in classifying an unlabeled instance, whereas the representativeness criterion measures how well an instance represents the entire data space. Many active learning methods deploy a single criterion, which often limits performance. Methods that select informative instances are prone to sampling bias, since they focus on instances of high uncertainty and neglect further exploration of the dataset [5]. In contrast, approaches favoring representative instances attempt to take the overall data space into account, but they may reach suboptimal performance because the informativeness of a single instance is ignored and the density of the input data is estimated only from the current clustering result [6, 7].

Figure 1.

Framework diagram of previous algorithms. The two criteria are combined with a certain weight to measure instances for selection, where the query strategies based on the two criteria, or their weights, are usually fixed.

Hence some methods [8, 9, 10, 31] were developed to combine the two criteria and pick instances that are both informative and representative. However, the performance of these combinations is not optimal, as the two criteria seldom reinforce each other and instead tend to disagree on instance selection [11]. As shown in Fig. 1, some methods combine the two criteria in a fixed manner rather than by dynamic selection or reweighting [12]. Additionally, other approaches apply the same two fixed query strategies to all datasets, which cannot work well across datasets with different properties such as the number of classes, the training set size, and the clustering structure of the data [1, 13, 30]. All in all, no existing algorithm adaptively selects the most profitable query strategies for a given dataset and, at the same time, dynamically adjusts the weight between the selected strategies.

In this paper, an adaptive Active Learning algorithm with Informativeness and Representativeness (ALIR) is proposed, which improves the accuracy of the classifier by robustly incorporating the informativeness and representativeness criteria. The first focus of our method is a novel Active Learning Framework (ALF), in which the weight of the query strategy combination is dynamically adjusted by the feedback of previously selected instances. Further, our algorithm learns from experience over multiple active learning rounds to select effective query strategies. This selection is formalized as a Markov decision process, which guides the next query according to the past sequence of decisions. Hence, our method is more efficient in instance selection, as the appropriate query strategies and their weights are dynamically adapted to the specific data distribution and data informativeness. Our empirical study on UCI and KEEL benchmark datasets and on datasets from practical applications shows that, by identifying informative and representative queries, ALIR achieves clear improvements over several state-of-the-art methods.

The remainder of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 presents some efficient sampling strategies for the two criteria. Section 4 presents the proposed algorithm in detail. Section 5 describes the results of our empirical studies together with observations and analysis. Section 6 concludes the paper and suggests potential future work.

2. Related work

Approaches drawing on the informativeness criterion are the most common in active learning. Typical approaches include 1) query-by-committee [15], which chooses the instances with the highest disagreement among the labels predicted by several distinct classifiers; 2) margin sampling [16, 17], in which the active learner is inclined to pick the instances that are closest to the classification boundaries; and 3) entropy sampling [18, 19, 20], which selects the instances with high information entropy, i.e., those about which the classifier is uncertain, because they carry a high information content for improving the classifier. The main drawback of these methods is that they cannot further explore the data distribution, as they only focus on the instances whose predicted label is the most ambiguous, which makes them prone to sampling bias [21, 22].

In contrast, several methods have been proposed to find the most representative instance. Some algorithms [23, 24, 25] take advantage of the cluster structure of the unlabeled instances to select instances located in high-density areas for labeling. Despite their success, these methods may perform poorly because they only consider a uniform sampling of the data space. Further, optimal experimental design methods [26] were developed to find the representative instances that can reconstruct the entire data space, but the classification uncertainty of the unlabeled instances is almost ignored.

Several active learning algorithms have tried to combine the criteria of informativeness and representativeness to address the deficiency of a single criterion. Du et al. [27] combined informativeness and representativeness simultaneously according to the loss risk minimization principle, but their weights need to be adjusted manually. Ebert et al. [1] proposed a feedback-driven framework in which the query strategies are picked by the entropy gain before and after instance selection, but their weights are not dynamically adjusted according to the specific situation. More recently, the trade-off between the two criteria has been updated dynamically by estimating the gain for the next iteration or by using a feedback-driven method. The method in [5] adopted a fixed boundary to switch between uncertainty sampling and density-weighted uncertainty sampling. Cebron and Berthold [7] developed a weighted formula between a density measure and an uncertainty measure. In [14], the weights of the two criteria were tuned using a multi-armed bandit (MAB) formulation. The method in [13] balanced the trade-off between the informativeness and representativeness criteria in a time-varying manner. Despite this progress toward more accurate models, these algorithms may be ineffective in instance selection, for they involve only two sampling strategies rather than dynamically selecting sampling strategies for a given dataset. Further, previous algorithms cannot select effective sampling strategies and the weights between them simultaneously, so they may suffer from sampling bias, cluster dependency, and classification deviations.

In this work, a new framework is designed to update the weights of the two criteria according to the feedback from the previously selected instance, without any need to inform the method about the specifics of the dataset or the available criteria. Meanwhile, the active learning process is formalized as a reinforcement learning problem to select two desirable query strategies based on the reward obtained during the learning process, which enables an adaptive selection of the most valuable instances for a given dataset. Hence, the proposed algorithm is an adaptive process that can learn a robust classifier and achieve the same or better performance compared to previous works [1, 2, 28, 29].

Figure 2.

An illustrative example of selecting informative and representative instances.

To illustrate why selecting informative and representative instances is important for a query, we first perform an empirical study on a synthetic dataset. Figure 2a shows a synthetic dataset for binary classification, where the triangles and circles represent different classes. We test four active learning algorithms by sequentially selecting 30 instances for labeling, where the red triangles and red circles are the instances selected by each algorithm, and the dashed line denotes the classification boundary. Figure 2b demonstrates that the right cluster of the triangle class and the left cluster of the circle class are left unexplored by an active learning method [16] based on the informativeness criterion, because the data distribution is ignored. As indicated by Fig. 2c, the method [7] adopting the representativeness criterion neglects some informative instances in the right cluster of the triangle class, as it only considers the data distribution to ensure the diversity of the selected instances. Further, Fig. 2d shows that an active learning method [1] combining the two criteria has difficulty finding a balance between them, so some instances located at the classification boundary are hard to classify, resulting in classification deviation. As shown in Fig. 2e, our proposed algorithm is more efficient in finding the decision boundary than the other three approaches, as it can adaptively select the most informative and representative instances for a given dataset.

3. Sampling criteria

The key to active learning lies in the query strategy, which determines the accuracy of the classifier because it decides which instances are selected to improve performance. Two informativeness-based query strategies and two representativeness-based query strategies are described in this section.

Suppose there is a dataset with $n$ instances ${D}=\{({x_{1}},{y_{1}}),\ldots,({x_{l}},{y_{l}}),{x_{l+1}},\ldots,{x_{n}}\}$, which is composed of a labeled set $L$ with $l$ instances and an unlabeled set $U$ with $u$ instances. We denote by ${y_{i}}\in\{1,\ldots,c\}$ the class label of the instance $x_{i}$ and assume that the dataset has $c$ classes.

3.1 Informativeness criteria

Entropy (Ent) favors the most informative instances where the classification model is most uncertain about their class label:

$\displaystyle\textit{Ent}(x_{i})=-\sum_{j=1}^{c}p(y_{ij}|x_{i})\log p(y_{ij}|x_{i})$ (1)

where $y_{ij}$ means that the label of instance $x_{i}$ is class $j$, $p(y_{ij}|x_{i})$ is the posterior probability that $x_{i}$ belongs to class $j$, and $\sum_{j}p(y_{ij}|x_{i})=1$. The entropy of the posterior distribution measures the overall uncertainty of an instance, so it can be used to select the most informative instance $x=\arg{\max_{{x_{i}}\in U}}{\textit{Ent}({x_{i}})}$: an instance with greater entropy carries more new information, which can further improve the model.
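As a minimal sketch of Eq. (1), the snippet below scores a few unlabeled instances by the entropy of their posteriors and picks the arg-max. The posterior values and the helper names `entropy` and `select_by_entropy` are illustrative assumptions, not part of the original method description; in practice the posteriors would come from the current classifier.

```python
import math

def entropy(probs):
    """Shannon entropy of one posterior vector (Eq. 1); terms with p = 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_by_entropy(posteriors):
    """Return the index of the unlabeled instance with maximal entropy."""
    scores = [entropy(p) for p in posteriors]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical posteriors for three unlabeled instances and c = 3 classes.
posteriors = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, i.e. most uncertain
    [0.60, 0.30, 0.10],
]
chosen = select_by_entropy(posteriors)
```

Here the second instance is chosen, because a nearly uniform posterior has the largest entropy.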

Query-By-Committee (QBC) chooses the unlabeled instance whose label is most ambiguous based on several different classifiers:

$\displaystyle\textit{QBC}(x_{i})=-\sum_{j=1}^{c}\frac{v(y_{ij}|x_{i})}{n}\ln\frac{v(y_{ij}|x_{i})}{n}$ (2)

where $v(y_{ij}|x_{i})$ is the number of classification models that vote for class $j$ as the label of instance $x_{i}$, $n$ here denotes the total number of models, and $\sum_{j}{v(y_{ij}|x_{i})=n}$. QBC focuses on the instance $x=\arg{\max_{{x_{i}}\in U}}{\textit{QBC}({x_{i}})}$ whose label is the most difficult to determine from the votes of the multiple models.
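The vote entropy of Eq. (2) can be sketched in the same way. The committee votes below are made-up labels from a hypothetical committee of four models, and `vote_entropy` is an illustrative helper name.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Vote entropy of one instance (Eq. 2); `votes` holds one predicted label per model."""
    n = len(votes)
    return -sum((v / n) * math.log(v / n) for v in Counter(votes).values())

# Hypothetical predictions of a four-model committee for three unlabeled instances.
committee_votes = [
    [0, 0, 0, 0],  # full agreement
    [0, 1, 0, 1],  # even split, i.e. maximal disagreement
    [0, 0, 0, 1],
]
scores = [vote_entropy(v) for v in committee_votes]
chosen = max(range(len(scores)), key=scores.__getitem__)
```

Classes that receive no vote are simply absent from the counter, which matches the usual convention that a zero-probability term contributes nothing to the sum.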

3.2 Representativeness criteria

Node potential (Nod) adopts a Gaussian weighting function to find the representative instances that are near or located in dense regions:

$\displaystyle\textit{Nod}({x_{i}})=\sum\limits_{j=1}^{n}{{e^{-\alpha{d^{2}}({x_{i}},{x_{j}})}}}$ (3)

where the distance ${d}({x_{i}},{x_{j}})$ between two instances is generally the Euclidean distance, and the parameter $\alpha=\frac{4}{{r_{a}^{2}}}$ controls the influence of the neighborhood radius $r_{a}$ of a node. The instance $x=\arg{\max_{{x_{i}}\in U}}{\textit{Nod}({x_{i}})}$ located in the high-density area is chosen for labeling, and then the potential of each neighboring node is reduced by $\textit{Nod}({x_{j}})=\textit{Nod}({x_{j}})-\textit{Nod}({x_{i}}){e^{-\beta{d^{2}}({x_{i}},{x_{j}})}}$, where $\beta=\frac{4}{{r_{b}^{2}}}$.
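The following sketch illustrates Eq. (3) and the subsequent potential reduction on a toy point set. The points and the radii `r_a = 1.0` and `r_b = 1.5` are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def node_potentials(points, r_a):
    """Eq. (3): Gaussian-weighted density of every point."""
    alpha = 4.0 / r_a ** 2
    return [sum(math.exp(-alpha * sq_dist(x, y)) for y in points) for x in points]

def select_and_reduce(points, potentials, r_b):
    """Pick the highest-potential point, then lower the potential of its neighbours."""
    beta = 4.0 / r_b ** 2
    i = max(range(len(points)), key=potentials.__getitem__)
    peak = potentials[i]
    updated = [pot - peak * math.exp(-beta * sq_dist(points[i], x))
               for x, pot in zip(points, potentials)]
    return i, updated

# Three points in a tight cluster plus one isolated point (illustrative data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
pots = node_potentials(points, r_a=1.0)
first, pots = select_and_reduce(points, pots, r_b=1.5)
second, _ = select_and_reduce(points, pots, r_b=1.5)
```

On this data the first query falls in the dense cluster; after the reduction step the remaining cluster members are suppressed, so the second query moves to the isolated point.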

Kernel (Ker) explores the undiscovered regions to find the most representative instance according to the data distribution:

$\displaystyle\textit{Ker}({x_{i}})=\min\limits_{{x_{j}}\in L}\textit{dis}({x_{i}},{x_{j}})$ (4)

where ${\textit{dis}}({x_{i}},{x_{j}})$ is the Euclidean distance between $x_{i}$ and $x_{j}$. Ker measures the minimum distance from an unlabeled instance to the labeled instances, and the active learner requests the label of the furthest instance $x=\arg{\max_{{x_{i}}\in U}}{\textit{Ker}({x_{i}})}$.
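Eq. (4) amounts to a farthest-first rule, which might be sketched as follows. The labeled and unlabeled points are illustrative assumptions, and `ker` is a hypothetical helper name.

```python
import math

def ker(x, labeled):
    """Eq. (4): distance from x to its nearest labeled instance."""
    return min(math.dist(x, z) for z in labeled)

# Illustrative 2-D data: two labeled points and three unlabeled candidates.
labeled = [(0.0, 0.0), (1.0, 0.0)]
unlabeled = [(0.5, 0.1), (1.2, 0.1), (4.0, 3.0)]

scores = [ker(x, labeled) for x in unlabeled]
chosen = max(range(len(unlabeled)), key=scores.__getitem__)
```

The third candidate is selected because it lies furthest from everything that has already been labeled.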

4. Methodology

No existing algorithm can simultaneously select appropriate query strategies and tune their weights from previous experience. Moreover, in previous work the two criteria cannot reinforce each other, as the sampling strategies and the weights between them are usually fixed during the whole active learning process. To address these issues, we first develop a novel adaptive formula in which the weights of the selected sampling strategies are updated by the classification gain of the previous iteration. The second part formalizes the active learning process as a decision process to learn a robust and dynamic policy, which favors the selection of the most efficient strategies according to the utility of the past sequence of annotation decisions. In this work, we focus on adaptively selecting the optimal sampling strategies for the classifier and adjusting their weights concurrently for a given dataset. Meanwhile, the informativeness and representativeness of the selected instances are adopted implicitly as one kind of observation of the reinforcement learning agent. Hence, this algorithm is flexible enough to reinforce any combination of query strategies, and it shows a consistent improvement over several state-of-the-art active learning approaches.

In this section, we first introduce how the weight of sampling strategies can be dynamically adjusted using a new framework, and then show how to formalize active learning as a reinforcement learning problem for choosing an efficient combination of sampling strategies.

4.1 ALF

Inspired by [1], the framework that combines the informativeness and representativeness criteria is given as follows:

$\displaystyle{H}({x_{i}})=p{I}({x_{i}})+(1-p){R}({x_{i}})$ (5)

where ${I}\in\{{\textit{Ent},\textit{QBC}}\}$, ${R}\in\{{\textit{Nod},\textit{Ker}}\}$ and $p\in[0,1]$. The smaller $p$ is, the more the data distribution is explored in the next iteration.

The first improvement deploys a rank function to unify the measurement of informativeness and representativeness:

$\displaystyle r[F({x_{i}})]={m_{i}},\quad r[F({x_{i}})]\leqslant r[F({x_{j}})]\Leftrightarrow{m_{i}}\leqslant{m_{j}}$ (6)

where $m_{i}\in\{1,\ldots,u\}$ and ${F}\in\{{\textit{Ent},\textit{QBC},\textit{Nod},\textit{Ker}}\}$. For example, $r[I({x_{i}})]\leqslant r[I({x_{j}})]$ means that $x_{i}$ is no more informative than $x_{j}$. By measuring both criteria on the same rank scale, our method can effectively identify the instance that is both informative and representative.
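A possible reading of the rank function in Eq. (6) is sketched below: raw scores of any criterion are mapped to ranks $1,\ldots,u$, so that criteria with very different scales become comparable. The tie-breaking rule (by index) and the example scores are assumptions made for illustration.

```python
def rank(scores):
    """Map raw scores to ranks 1..u; a larger score gets a larger rank (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

# Hypothetical raw scores on very different scales for the same three instances.
ent_scores = [0.39, 1.10, 0.90]   # entropies
ker_scores = [12.0, 3.5, 40.2]    # distances

ent_ranks = rank(ent_scores)
ker_ranks = rank(ker_scores)
```

After ranking, both criteria take values in the same set $\{1,2,3\}$, so a weighted sum of them is no longer dominated by the criterion with the larger numeric range.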

The crucial contribution of this work is that a new feedback function replaces the fixed probability $p$ . The feedback obtained during each labeling round is applied to adjust the weight of informativeness and representativeness criteria:

$\displaystyle p=\left\{\begin{array}{ll}{p^{f}}&p\leqslant 0.5,\\ {(1-p)^{(1-f)}}&\text{otherwise},\end{array}\right.$ (7)

where $p\in({0,1})$ and the feedback $f$ is applied to find a trade-off. A larger feedback implies that the previous learning process was efficient, so the probability should be maintained to encourage the current query. In this way, the instances that contribute significantly to classification performance can be selected for a given dataset without any prior knowledge. The feedback is given by the change from the previous hypothesis to the current hypothesis:

$\displaystyle f=\frac{1}{{1+\exp[-(h({x_{i}})-h^{\prime}({x_{j}}))]}}$ (8)

where $h({x_{i}})$ is the current hypothesis, $h^{\prime}({x_{j}})$ is the previous hypothesis, and $x_{j}$ is the previously selected instance. The feedback measures the success of the previous active learning process. A larger change between the hypotheses yields a higher feedback, which leads to a smaller change in the probability $p$. Hence, if the change $(h({x_{i}})-h^{\prime}({x_{j}}))$ is positive (implying a significant change between $h({x_{i}})$ and $h^{\prime}({x_{j}})$) and informativeness was successful (meaning that $p$ is greater than 0.5), the probability is kept stable to encourage the selection of the most informative instances.
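The interplay between the feedback in Eq. (8) and the weight update in Eq. (7) can be illustrated with a small sketch. It transcribes the two equations literally; the hypothesis values `3.0` and `1.0` are made up, and the function names are hypothetical. The sketch assumes moderate hypothesis differences, since `math.exp` overflows for very large arguments.

```python
import math

def feedback(h_cur, h_prev):
    """Eq. (8): logistic function of the change between hypotheses."""
    return 1.0 / (1.0 + math.exp(-(h_cur - h_prev)))

def update_weight(p, f):
    """Eq. (7), transcribed literally; p is the weight of the informativeness term."""
    if p <= 0.5:
        return p ** f
    return (1.0 - p) ** (1.0 - f)

p = 0.5
f = feedback(h_cur=3.0, h_prev=1.0)  # positive change, so f is above 0.5
p_next = update_weight(p, f)
```

With `f` close to 1 and `p <= 0.5`, the update leaves `p` almost unchanged, which is the "maintain the probability" behaviour described above; with `f` close to 0, the first branch pushes `p` towards 1.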

Thirdly, we propose a hypothesis using neighborhood entropy, which can measure the success of the previous learning process:

$\displaystyle\textit{Nei}(x_{i})=\frac{\sum_{x_{j}\in\delta(x_{i})}\textit{Ent}(x_{j})}{\|\delta(x_{i})\|}$ (9)

where $\textit{Ent}({x_{j}})$ is the entropy of $x_{j}$, $\|\delta(x_{i})\|$ is the number of unlabeled instances in the neighborhood $\delta(x_{i})=\{x_{j}|x_{j}\in U,\textit{dis}(x_{i},x_{j})\leqslant\delta_{i}\}$ of $x_{i}$, and ${\delta_{i}}$ is the neighborhood radius that controls the neighborhood size, with ${\delta_{i}}=\lambda\min[\textit{dis}({x_{i}},{x_{j}})]+(1-\lambda)\max[\textit{dis}({x_{i}},{x_{j}})]$. The value $\lambda=0.8$ was chosen empirically. Neighborhood entropy considers not only the information entropy of each instance but also the relationship between all instances in the neighborhood. To get positive or negative feedback, we use Eq. (6) to rescale this hypothesis:

$\displaystyle h({x_{i}})={r[\textit{Nei}({x_{i}})]}$ (10)

This hypothesis favors instances that are both informative and representative. Therefore, it can be a good measure of whether the past active learning process is effective.
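A minimal sketch of the neighborhood entropy in Eq. (9) is given below. Two details are assumptions, since the text does not spell them out: the minimum and maximum distances in $\delta_{i}$ are taken over the other unlabeled points (excluding $x_{i}$ itself, whose distance would be 0), and $x_{i}$ is counted as a member of its own neighborhood. The 1-D points and entropy values are illustrative.

```python
import math

def neighbourhood_entropy(i, points, entropies, lam=0.8):
    """Eq. (9): mean entropy over the unlabeled points within radius delta_i of points[i]."""
    dists = [math.dist(points[i], x) for x in points]
    others = [d for j, d in enumerate(dists) if j != i]          # assumption: exclude x_i
    delta = lam * min(others) + (1.0 - lam) * max(others)        # neighbourhood radius
    members = [j for j, d in enumerate(dists) if d <= delta]     # includes i itself
    return sum(entropies[j] for j in members) / len(members)

# Illustrative unlabeled pool: three nearby points and one outlier, with made-up entropies.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
entropies = [0.2, 0.6, 0.4, 0.9]

nei = [neighbourhood_entropy(i, points, entropies) for i in range(len(points))]
```

The outlier has the highest individual entropy, but its neighbourhood average is pulled down by its nearest neighbour, which shows how the measure blends per-instance uncertainty with local structure.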

The feedback function has a large gradient in the interval $[-4, 4]$, whereas the gradient is close to 0 elsewhere, which may cause the gradient to vanish. Hence, the change of hypotheses should be normalized to the interval $[-4, 4]$, and the final feedback is calculated as follows:

$\displaystyle f=\frac{1}{1+\exp[-(h(x_{i})-h^{\prime}(x_{j}))/(\|D_{u}\|/8)]}$ (11)

where $\|{{D}_{u}}\|$ denotes the number of unlabeled instances. As a result, the final active learning framework can be obtained as:

$\displaystyle{H}({x_{i}})=p{r[I({x_{i}})]}+(1-p){r[R({x_{i}})]}$ (12)

In each iteration, the label of the instance $x=\arg{\max_{{x_{i}}\in U}}H({x_{i}})$ is requested by the active learner. The framework aims to drive learning from past feedback to select the most informative and representative instances for labeling. If the feedback is positive, the current learning behavior is maintained; if it is negative (meaning that the current learning process is ineffective), the mixture of the density and informativeness components is dynamically adjusted to make learning more effective. Hence, the framework using this feedback can outperform other frameworks in most cases.
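The normalized feedback of Eq. (11) and the ranked combination of Eq. (12) can be put together in a short sketch. The raw scores, the pool size of 80, and the helper names are illustrative assumptions; ties in the rank function are broken by index, which the text does not specify.

```python
import math

def rank(scores):
    """Eq. (6): ranks 1..u, larger score gets a larger rank (ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

def normalized_feedback(h_cur, h_prev, n_unlabeled):
    """Eq. (11): sigmoid of the rank change scaled by |D_u| / 8."""
    return 1.0 / (1.0 + math.exp(-(h_cur - h_prev) / (n_unlabeled / 8.0)))

def select_instance(info_scores, rep_scores, p):
    """Eq. (12): weighted sum of ranked criteria; returns the arg-max index."""
    r_info, r_rep = rank(info_scores), rank(rep_scores)
    combined = [p * a + (1.0 - p) * b for a, b in zip(r_info, r_rep)]
    return max(range(len(combined)), key=combined.__getitem__)

# Hypothetical raw scores for four unlabeled instances.
info = [0.9, 0.2, 0.6, 0.1]   # e.g. entropies
rep = [0.1, 0.8, 0.7, 0.3]    # e.g. densities

exploit = select_instance(info, rep, p=0.9)  # informativeness dominates
explore = select_instance(info, rep, p=0.1)  # representativeness dominates
f = normalized_feedback(h_cur=50, h_prev=30, n_unlabeled=80)
```

A large `p` selects the most uncertain instance, while a small `p` selects the one that best represents the data, which is the trade-off that the feedback is meant to steer.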

4.2 Sampling strategies selection mechanism

Different combinations of query strategies have different effects on different datasets, so a mechanism is required to select the optimal query strategies for a given dataset. To this end, the active learning process is formalized as a simplified Markov decision process (MDP), a general-purpose framework for decision making, described by a tuple $(S,A,Q,R)$. The state $S$ corresponds to the instances selected for labeling and their labels. The combination of sampling strategies selected in each round corresponds to the action ${A}={\{I+R}\}$ with ${I}={\{\textit{Ent},\textit{QBC}}\}$ and ${R}={\{\textit{Ker},\textit{Nod}}\}$. $Q$ is the transition function of selecting an action in the current state. The feedback after selecting an action corresponds to the reward $R$, which rates the quality of the action taken by the policy. The reward function is central for the policy to choose the action, and it measures the performance improvement over the whole learning process.

The crucial improvement is to use a reward that measures informativeness and representativeness to select a desirable action. This allows $Q$-learning to select the sampling strategies from the feedback of the previous decision process. Inspired by [1], the policy iteratively updates the $Q$ table using the rewards obtained from choosing each action in the current state:

$\displaystyle Q({{s}^{{(t-1)}}},a)\leftarrow Q({{s}^{{(t-1)}}},a)+\lambda\left({r^{(t)}}+\gamma\mathop{\max}\limits_{{a_{i}}}Q({{s}^{{(t)}}},{{a}_{i}})-Q({{s}^{{(t-1)}}},a)\right)$ (13)

where $\lambda\in[0,1]$ is a parameter that weights the current update (the learning rate), $\gamma\in[0,1]$ is a factor that controls the influence of the future reward, and ${r^{(t)}}=r[\textit{Nei}({x_{i}})]$ is the utility of the unlabeled instance selected by each action. During the active learning process, we choose the action $a=\arg\max{{Q}_{{{a}_{i}}}}$ and use it in the current state to select the most valuable instance. The success of previous learning can be measured by this reward, since it considers both the informativeness of the selected instance and its distribution. Hence, apposite and effective query strategies can be selected from multiple strategies, which allows the learned policy to dynamically select the most informative and representative instances.
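The update rule in Eq. (13), read as the standard tabular $Q$-learning update, might be sketched as follows. The string state keys, the default value 0 for unseen table entries, the reward values, and the settings `lam=0.5` and `gamma=0.9` are all assumptions made for illustration.

```python
from itertools import product

# The four actions: one informativeness strategy paired with one representativeness strategy.
ACTIONS = list(product(("Ent", "QBC"), ("Nod", "Ker")))

def q_update(q, s_prev, action, reward, s_next, lam=0.5, gamma=0.9):
    """Eq. (13): move Q(s_prev, action) towards reward + gamma * max_a Q(s_next, a)."""
    best_next = max(q.get((s_next, a), 0.0) for a in ACTIONS)
    old = q.get((s_prev, action), 0.0)
    q[(s_prev, action)] = old + lam * (reward + gamma * best_next - old)
    return q[(s_prev, action)]

def greedy_action(q, state):
    """Pick the strategy pair with the largest Q value in the given state."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

q = {}  # Q table, keyed by (state, action)
q_update(q, "s0", ("Ent", "Ker"), reward=10.0, s_next="s1")
q_update(q, "s0", ("QBC", "Nod"), reward=4.0, s_next="s1")
best = greedy_action(q, "s0")
```

After the two updates, the pair that received the larger reward has the larger table entry, so the greedy policy selects it for the next query in that state.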

4.3 ALIR

The selection of query strategies and the trade-off between informativeness and representativeness are key ingredients for improving the performance of active learning. In previous work, the trade-off between the informativeness and representativeness criteria is usually fixed in all scenarios, which may cause sampling bias and cluster dependency. Therefore, the experience of previous learning processes is used to select an appropriate and effective combination of query strategies and to adapt their weights. Based on the proposed framework and the sampling strategy selection mechanism, we propose an adaptive active learning method, whose flow chart is shown in Fig. 3. The method mainly consists of the following two parts:

Figure 3.

Framework diagram of the proposed algorithm. First, the weight between the informativeness and representativeness criteria is adapted according to the feedback from the previous learning processes; then the query strategies based on the two criteria are combined; finally, the sampling strategies with the maximum reward are chosen to select the most valuable instances.

Weight decision: the fixed parameter is replaced by a weight that adapts during the entire active learning process, as shown in the previous section, which tackles the difficulty of finding a trade-off between the selected query strategies. The main idea is that some datasets might need more exploration at the beginning and more exploitation at the end, while other datasets might need a constant trade-off. To address this, we use Eq. (7) to adjust the weights of the selected query strategies, which makes our method more flexible and convenient.

Strategies selection: the active learning process can be viewed as a sequence of decisions, so it can be formulated as a Markov decision process (MDP) to select two effective query strategies over the entire labeling process. When the $Q$ table is empty, it needs to be initialized. We compute the reward ${r^{(t)}}=r[\textit{Nei}({{x}_{i}})]$ and ${\max_{{x_{i}}\in U}}H({x_{i}})$ for each action $a_{i}$, and the $Q$ value for each action is then calculated as $Q({{s}^{{(t-1)}}},a)\leftarrow Q({{s}^{{(t-1)}}},a)+\lambda({r^{(t)}}+\gamma{\max_{i}}H({x_{i}}))$. In other cases, we adopt Eq. (13) to calculate the $Q$ value for each action. Finally, we select the next action $a=\arg\max{{Q}_{{{a}_{i}}}}$, which leads to a more accurate classifier.

When the action $a=\{\textit{Ent},\textit{Nod}\}$ is chosen, where $a=\arg{\max_{a}}{{Q}_{a}}$, Ent and Nod are selected and the weight between them is updated by the feedback function in Eq. (7). The newly selected instance $x=\arg\mathop{\max}\limits_{a}H({x_{i}})$ is then annotated and added to the training dataset, and a more accurate classifier can be trained accordingly. In summary, a rich and dynamic policy can be learned for selecting desirable sampling strategies and adjusting the weights between them based on the past sequence of annotation decisions. This work does not adopt a fixed heuristic but instead learns a robust policy to select the most valuable instances for labeling. This also allows the algorithm to be applied to different datasets without prior knowledge, leading to good performance across datasets. The details are shown in algorithm 3.

Algorithm: The ALIR algorithm
Input: Unlabeled dataset $U$, labeled dataset $L$, budget $B$
Output: Model $\phi$
for each $\textit{episode}\in[1,N]$ do
  ${{L}}\Leftarrow\emptyset$, state ${s_{i}}=\emptyset$;
  Calculate probability $p$ using Eq. (7);
  for each ${{I}_{i}},{{R}_{i}}\in A$ do
    for each ${{x}_{i}}\in U$ do
      Calculate $H({x_{i}})$ using Eq. (12);
    end for
    Store $x=\arg{\max_{{x_{i}}\in U}}H({x_{i}})$;
    if $\textit{episode}=1$ then
      Receive a reward ${{r}^{{(t)}}}$;
      Calculate $Q({{s}^{{(t-1)}}},a)\leftarrow Q({{s}^{{(t-1)}}},a)+\lambda({r^{(t)}}+\gamma{\max_{i}}H({x_{i}}))$;
    else
      Receive a reward ${{r}^{{(t)}}}$;
      Calculate $Q({{s}^{{(t-1)}}},a)\leftarrow Q({{s}^{{(t-1)}}},a)+\lambda({r^{(t)}}+\gamma{\max_{i}}H({x_{i}})-Q({{s}^{{(t-1)}}},a))$;
    end if
  end for
  Receive $x=\arg\mathop{\max}\limits_{a}H({x_{i}})$ with $a=\arg\mathop{\max}\limits_{{a_{i}}}Q({{s}_{i}},{a_{i}})$;
  Obtain the annotation $y$ for $x$;
  ${{L}}\Leftarrow{{L}}+(x,y)$; $U\Leftarrow U-x$;
  Update model $\phi$ based on $L$;
  Construct the new state ${s_{i+1}}$;
  Update $B$;
  if $B$ is exhausted then break;
  Update episode;
end for

5. Experiments

In this section, we first verify the effectiveness of ALF and ALIR on the benchmark datasets, followed by applying our method to real problems.

5.1 Datasets

Twelve datasets from the KEEL [35] and UCI [34] repositories, which are often used in the field of active learning, are adopted to verify the effectiveness of our method. They include three two-class and nine multi-class datasets, the number of instances ranges from 214 to 5473, and their specific characteristics are described in Table 1.

Table 1
The datasets information, including the number of corresponding features, labels, and instances

Dataset	Feature	Instance	Label	Dataset	Feature	Instance	Label
Austra	14	690	2	Led7digit	7	500	10
Biodeg	41	1055	2	Pageblocks	10	5473	5
Cancer	30	569	7	Segment	19	2310	7
Credit	20	1000	2	Sonar	18	846	4
Ecoli	7	336	8	Steel	27	1941	7
Ionosphere	9	214	7	Waveform	21	5000	3

5.2 Study on ALF

5.2.1 Setting

In this part, we compare ALF with the following two algorithms:

PBAC [7]: queries informative and representative examples by a weighted formula between uncertainty and diversity.

Exploration (Explora) [14]: adjusts the parameter of informativeness and representativeness using a MAB formulation to select a batch of instances.

Each dataset is divided into a training set, a test set, and an unlabeled set, which account for 10%, 30%, and 60% of the instances, respectively. Exploration and PBAC use the settings recommended in [14] and [7], respectively. We initialize $p=0.5$ for ALF. We evaluate the performance of the classifier by its classification accuracy (ACC) on the test set. To obtain reliable results, we repeat the experiment ten times, each with a random split of the dataset, and take the average value as the final result. For all experiments, Logistic Regression (LR), Decision Tree (DT), and Support Vector Classification (SVC) are used as classifiers. Given space limitations, only the experimental results based on LR are shown in this paper; the detailed experimental results based on DT and SVC can be found in Appendix A.
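The splitting protocol described above could be reproduced along the following lines. This is only a sketch: the use of seeds 0 to 9, the rounding of the split sizes, and the helper name `split_indices` are assumptions, and the example uses the 690 instances of the Austra dataset from Table 1.

```python
import random

def split_indices(n, seed, train=0.10, test=0.30):
    """Randomly split indices 0..n-1 into training, test, and unlabeled-pool parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # independent, reproducible shuffle per repetition
    n_train, n_test = round(n * train), round(n * test)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

# Ten repetitions, each with its own random split (seeds are an assumption).
splits = [split_indices(690, seed) for seed in range(10)]
train_idx, test_idx, pool_idx = splits[0]
```

The accuracy reported for a dataset would then be the mean of the ten test-set accuracies obtained from these splits.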

5.2.2 Comparison with state-of-the-art methods

Table 2 shows the classification accuracy of the three active learning algorithms under the four combinations of query strategies after 50 queries. Each block of five rows and three columns in the data section reports the results on one dataset, with the last row giving the average classification accuracy. Comparing horizontally within each dataset, the best result for each strategy combination is shown in bold. The best performance for each dataset is shaded in gray.

Table 2
Comparison of average classification accuracy. The best performance for each dataset is shaded in gray, and the best result for each combination on each dataset is highlighted in boldface

Dataset	Austra			Biodeg			Cancer
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.801	0.803	0.813	0.818	0.826	0.821	0.900	0.923	0.921
Ent $+$ Ker	0.810	0.803	0.803	0.811	0.818	0.832	0.920	0.923	0.927
QBC $+$ Nod	0.801	0.800	0.810	0.812	0.807	0.841	0.901	0.923	0.921
QBC $+$ Ker	0.805	0.803	0.800	0.817	0.824	0.845	0.914	0.924	0.926
Mean	0.804	0.802	0.807	0.815	0.819	0.835	0.909	0.923	0.924
Dataset	Credit			Ecoli			Ionosphere
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.669	0.704	0.709	0.688	0.698	0.700	0.777	0.806	0.825
Ent $+$ Ker	0.682	0.705	0.709	0.678	0.699	0.707	0.809	0.807	0.812
QBC $+$ Nod	0.662	0.698	0.696	0.694	0.706	0.715	0.795	0.814	0.829
QBC $+$ Ker	0.681	0.706	0.710	0.675	0.705	0.708	0.814	0.813	0.812
Mean	0.674	0.703	0.706	0.684	0.702	0.708	0.799	0.810	0.820
Dataset	Led7digit			Page-blocks			Segment
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.562	0.585	0.594	0.954	0.955	0.956	0.887	0.894	0.895
Ent $+$ Ker	0.622	0.592	0.603	0.957	0.954	0.957	0.904	0.892	0.902
QBC $+$ Nod	0.562	0.583	0.595	0.954	0.953	0.956	0.892	0.887	0.896
QBC $+$ Ker	0.617	0.581	0.620	0.954	0.955	0.955	0.901	0.898	0.897
Mean	0.591	0.585	0.603	0.955	0.954	0.956	0.896	0.893	0.898
Dataset	Sonar			Steel			Waveform
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.502	0.514	0.525	0.542	0.529	0.546	0.832	0.836	0.837
Ent $+$ Ker	0.504	0.514	0.522	0.513	0.536	0.538	0.836	0.837	0.838
QBC $+$ Nod	0.503	0.500	0.501	0.541	0.543	0.545	0.830	0.829	0.833
QBC $+$ Ker	0.479	0.515	0.514	0.516	0.543	0.539	0.835	0.833	0.835
Mean	0.497	0.511	0.516	0.528	0.538	0.542	0.833	0.834	0.836

From the above results, we observe the following. 1) The average ACC values of ALF are superior to those of Exploration and PBAC in most cases.

2) ALF clearly outperforms the other comparison algorithms on the biodeg, led7digit, and ionosphere datasets, especially on biodeg, by 2.4%.

3) PBAC works best on some datasets but performs poorly on others because its trade-off parameter between informativeness and representativeness is fixed. Exploration behaves similarly: it achieves good performance on some datasets and falls short on others, as its adaptation of the weight between uncertainty and diversity is limited.

4) Across the 12 datasets, $\textit{Ent}+\textit{Nod}$, $\textit{Ent}+\textit{Ker}$, $\textit{QBC}+\textit{Nod}$, and $\textit{QBC}+\textit{Ker}$ perform best 3, 6, 2, and 2 times, respectively.

5) No single combination of sampling strategies works best on every dataset, since different query strategies perform differently on different datasets.

6) ALF outperforms the baseline methods in most cases, as its adaptive framework based on informativeness and representativeness focuses on the most valuable instances.
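The "Mean" rows of Table 2 can be recomputed directly from the four combination rows. The sketch below does this for the Biodeg block only, with the values copied from Table 2; the helper name `column_means` is an assumption made for illustration.

```python
# LR-based accuracies for the Biodeg dataset, copied from Table 2.
# Row order: Ent+Nod, Ent+Ker, QBC+Nod, QBC+Ker.
combinations = ["Ent+Nod", "Ent+Ker", "QBC+Nod", "QBC+Ker"]
biodeg = {
    "PBAC": [0.818, 0.811, 0.812, 0.817],
    "Explora": [0.826, 0.818, 0.807, 0.824],
    "ALF": [0.821, 0.832, 0.841, 0.845],
}


def column_means(table):
    """Mean accuracy over the four strategy combinations, per algorithm."""
    return {name: sum(values) / len(values) for name, values in table.items()}


means = column_means(biodeg)

# Combination under which ALF reaches its highest accuracy on Biodeg.
alf = biodeg["ALF"]
best_combo = combinations[alf.index(max(alf))]

for name, value in means.items():
    print(f"{name}: {value:.4f}")
print("best ALF combination:", best_combo)
```

The recomputed means agree with the Mean row of the Biodeg block to three decimal places, and the same check can be repeated for any other block of the table.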

5.3 Study on ALIR

Figure 4.

Comparison on AUC.

Table 3

Comparison on F1-score

		Number of queries
Dataset		10	20	30	40	50	60	70	80	Mean
Austra	QUIRE	0.7846	0.7866	0.7877	0.8048	0.8147	0.8150	0.8146	0.8199	0.8035
	BMDR	0.7997	0.7792	0.7806	0.7943	0.8102	0.8217	0.8198	0.8250	0.8038
	SPAL	0.7845	0.7942	0.8072	0.8141	0.8076	0.8152	0.8213	0.8202	0.8080
	RALF	0.7898	0.8013	0.8140	0.8225	0.8284	0.8382	0.8394	0.8406	0.8218
	ALF	0.7862	0.7994	0.8120	0.8172	0.8313	0.8330	0.8441	0.8430	0.8208
	ALIR	0.8040	0.7988	0.8157	0.8208	0.8258	0.8407	0.8560	0.8511	0.8266
Biodeg	QUIRE	0.7728	0.7524	0.7892	0.7932	0.8016	0.7970	0.8089	0.8084	0.7904
	BMDR	0.7742	0.7781	0.7930	0.7942	0.8036	0.8107	0.8199	0.8212	0.7994
	SPAL	0.7716	0.7852	0.8006	0.8079	0.8109	0.8226	0.8210	0.8231	0.8054
	RALF	0.7839	0.8001	0.8022	0.8079	0.8084	0.8091	0.8157	0.8251	0.8066
	ALF	0.7880	0.7988	0.8095	0.8171	0.8100	0.8270	0.8242	0.8274	0.8128
	ALIR	0.7967	0.8215	0.8215	0.8290	0.8339	0.8206	0.8272	0.8446	0.8244
Cancer	QUIRE	0.9018	0.8936	0.8987	0.8987	0.9015	0.8989	0.9023	0.9035	0.8999
	BMDR	0.8960	0.8961	0.9095	0.9095	0.9103	0.9146	0.9144	0.9113	0.9077
	SPAL	0.8989	0.9064	0.9063	0.9050	0.9144	0.9113	0.9155	0.9137	0.9089
	RALF	0.9015	0.9144	0.9160	0.9207	0.9246	0.9286	0.9326	0.9349	0.9217
	ALF	0.9257	0.9153	0.9266	0.9303	0.9304	0.9361	0.9387	0.9397	0.9304
	ALIR	0.9226	0.9289	0.9281	0.9281	0.9389	0.9355	0.9389	0.9341	0.9319
Credit	QUIRE	0.6181	0.6093	0.6222	0.6241	0.6285	0.6129	0.5982	0.5973	0.6138
	BMDR	0.5730	0.6259	0.6148	0.6108	0.6078	0.6098	0.5801	0.5483	0.5963
	SPAL	0.6113	0.6264	0.6163	0.6098	0.6163	0.6098	0.6083	0.5992	0.6122
	RALF	0.5901	0.5156	0.4959	0.5070	0.5357	0.5292	0.5574	0.5559	0.5359
	ALF	0.6340	0.6189	0.6209	0.6355	0.6143	0.5952	0.6083	0.6118	0.6174
	ALIR	0.6390	0.6592	0.6476	0.6561	0.6612	0.6637	0.6647	0.6556	0.6559
Ecoli	QUIRE	0.5318	0.5447	0.5469	0.5451	0.5310	0.4940	0.5136	0.5015	0.5261
	BMDR	0.4890	0.4963	0.5005	0.5043	0.4766	0.4733	0.4967	0.5134	0.4938
	SPAL	0.4757	0.4963	0.5612	0.5454	0.5421	0.5110	0.5058	0.4943	0.5165
	RALF	0.4890	0.5330	0.4570	0.5321	0.5531	0.5727	0.5669	0.5841	0.5360
	ALF	0.4718	0.4914	0.5129	0.5454	0.5516	0.5574	0.5727	0.5660	0.5337
	ALIR	0.5254	0.5287	0.5497	0.5846	0.5813	0.5865	0.6099	0.6147	0.5726
Ionosphere	QUIRE	0.7191	0.7044	0.7076	0.7435	0.7333	0.7521	0.7406	0.7320	0.7291
	BMDR	0.7235	0.7153	0.6923	0.7004	0.7069	0.7527	0.7567	0.7673	0.7269
	SPAL	0.7069	0.7371	0.7595	0.7707	0.7690	0.7913	0.7934	0.7720	0.7625
	RALF	0.7455	0.7625	0.7785	0.7815	0.7798	0.7988	0.7900	0.7930	0.7787
	ALF	0.7330	0.7727	0.7723	0.7863	0.7852	0.7856	0.7876	0.7991	0.7777
	ALIR	0.7656	0.7859	0.7903	0.7971	0.7896	0.7849	0.7920	0.7866	0.7865
Led7digit	QUIRE	0.4083	0.4593	0.4149	0.4851	0.5295	0.5686	0.5871	0.5639	0.5021
	BMDR	0.4205	0.6522	0.7382	0.7426	0.5793	0.5767	0.5819	0.7329	0.6280
	SPAL	0.5425	0.5188	0.3546	0.2862	0.2836	0.4082	0.4915	0.5670	0.4316
	RALF	0.6653	0.7575	0.7303	0.7428	0.7680	0.7627	0.7662	0.7373	0.7413
	ALF	0.6653	0.7575	0.7619	0.7338	0.7461	0.7487	0.7557	0.7399	0.7386
	ALIR	0.7294	0.7434	0.7496	0.7627	0.7750	0.7733	0.7671	0.7654	0.7582
Page-blocks	QUIRE	0.7359	0.7359	0.7359	0.7408	0.7529	0.7720	0.7720	0.7677	0.7516
	BMDR	0.7827	0.7827	0.7227	0.7085	0.7610	0.7761	0.7719	0.7736	0.7599
	SPAL	0.7248	0.7673	0.7594	0.7569	0.7678	0.7736	0.7748	0.7744	0.7624
	RALF	0.7882	0.7952	0.7573	0.7861	0.7861	0.8015	0.7932	0.7819	0.7862
	ALF	0.7890	0.7894	0.7882	0.7139	0.7123	0.7189	0.7323	0.6685	0.7391
	ALIR	0.8065	0.7952	0.7923	0.8090	0.8082	0.7936	0.8082	0.8040	0.8021
Segment	QUIRE	0.8403	0.8498	0.8752	0.8620	0.8714	0.8896	0.8972	0.8972	0.8728
	BMDR	0.8960	0.8960	0.9014	0.9051	0.9051	0.9041	0.9104	0.9133	0.9039
	SPAL	0.8989	0.9021	0.8994	0.9071	0.9055	0.9115	0.9131	0.9030	0.9051
	RALF	0.9020	0.9098	0.9082	0.9114	0.9070	0.9095	0.9216	0.9232	0.9116
	ALF	0.9015	0.9015	0.9049	0.9140	0.9141	0.9160	0.9126	0.9122	0.9096
	ALIR	0.9052	0.9135	0.9196	0.9250	0.9233	0.9279	0.9285	0.9294	0.9216
Sonar	QUIRE	0.4295	0.5488	0.6093	0.6127	0.6303	0.6908	0.6824	0.6707	0.6093
	BMDR	0.5880	0.5791	0.6654	0.7409	0.7111	0.7101	0.7260	0.7121	0.6791
	SPAL	0.4242	0.5652	0.5771	0.5771	0.6069	0.6396	0.6277	0.6466	0.5831
	RALF	0.6257	0.6158	0.6198	0.5930	0.6118	0.6942	0.7091	0.7260	0.6494
	ALF	0.6059	0.5424	0.4808	0.6168	0.6257	0.6903	0.6793	0.7121	0.6192
	ALIR	0.5711	0.6744	0.7319	0.8004	0.7717	0.7736	0.8074	0.7607	0.7364
Steel	QUIRE	0.4207	0.5123	0.5295	0.5230	0.5555	0.5703	0.5389	0.4798	0.5163
	BMDR	0.3850	0.5095	0.5599	0.5310	0.6097	0.6694	0.6782	0.6441	0.5734
	SPAL	0.4010	0.3725	0.3974	0.3698	0.4443	0.4908	0.5260	0.5549	0.4446
	RALF	0.5591	0.5512	0.6152	0.6495	0.6923	0.6952	0.6851	0.6751	0.6403
	ALF	0.5413	0.5772	0.5910	0.6111	0.6673	0.6943	0.7030	0.7569	0.6428
	ALIR	0.6045	0.5678	0.6849	0.6932	0.7320	0.6448	0.6765	0.6751	0.6599
Waveform	QUIRE	0.9429	0.9288	0.9321	0.9266	0.9272	0.9396	0.9451	0.9451	0.9359
	BMDR	0.9137	0.9074	0.9323	0.9203	0.9198	0.9198	0.9198	0.9446	0.9222
	SPAL	0.9137	0.9074	0.9194	0.9264	0.9264	0.9198	0.9133	0.9328	0.9199
	RALF	0.9251	0.9192	0.9192	0.9313	0.9438	0.9440	0.9440	0.9502	0.9346
	ALF	0.9251	0.9250	0.9188	0.9380	0.9438	0.9440	0.9440	0.9502	0.9361
	ALIR	0.9195	0.9250	0.9380	0.9380	0.9438	0.9502	0.9502	0.9502	0.9394

5.3.1 Setting

In this part, we compare ALIR with five algorithms:

QUIRE [2]: selects informative and representative instances using the min-max approach.

BMDR [28]: favors the selection of discriminative and representative examples by minimizing the ERM risk bound of active learning.

SPAL [29]: queries informative, representative, and easy examples by minimizing a well-designed objective function.

RALF [1]: chooses informative and representative examples by a feedback-driven framework.

ALF: selects informative and representative examples by a dynamically adaptive framework based on the experience of previous learning processes.

The dataset division is the same as in the previous section. We set $\lambda=0.5$ and $\gamma=1$, so that the best decision can be made by learning from previous feedback as far as possible. We initialize $p=0.5$ for ALF and ALIR.
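As a rough illustration of what a trade-off between the two criteria looks like, the sketch below scores each unlabeled instance by a weighted sum of an informativeness value and a representativeness value. The entropy measure corresponds to the Ent strategy named in the experiments. Everything else is an assumption made for illustration only: that $p$ acts as a convex trade-off weight, the function names, and the toy numbers. It is not the update rule of ALF or ALIR.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (Ent-style uncertainty)."""
    return -sum(q * math.log(q) for q in probs if q > 0)


def combined_score(info, rep, p=0.5):
    """Hypothetical trade-off: p weights informativeness, 1 - p representativeness."""
    return p * info + (1.0 - p) * rep


def select_query(pred_probs, rep_scores, p=0.5):
    """Return the index of the pool instance with the highest combined score."""
    scores = [
        combined_score(entropy(q), r, p)
        for q, r in zip(pred_probs, rep_scores)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy pool of three instances: predicted class probabilities and
# made-up representativeness scores in [0, 1].
pred_probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
rep_scores = [0.1, 0.9, 0.5]

print(select_query(pred_probs, rep_scores, p=1.0))  # informativeness only
print(select_query(pred_probs, rep_scores, p=0.0))  # representativeness only
```

With the weight at 1 the most uncertain instance is chosen, and with the weight at 0 the most representative one is chosen, which is why a fixed weight can suit one dataset and not another.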

ALF uses its best combination, $\textit{Ent}+\textit{Ker}$; the other comparison algorithms use the settings suggested in their respective articles. The Area Under the ROC Curve (AUC) and the F1-score are used to evaluate the performance of the classifier, where the F1-score combines recall and precision with equal weights. To reduce the effect of randomness, the random partition of the instances is repeated ten times and the reported result is the average over the ten runs. In all experiments, LR, DT, and SVC are used as classifiers. Given space limitations, only the results based on LR are shown in this section; the detailed results based on DT and SVC can be found in Appendix B.
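For reference, the two metrics can be written down in a few lines. The sketch below covers the binary case only: F1 as the harmonic mean of precision and recall, and AUC as the fraction of positive/negative pairs that the classifier ranks correctly. How the multi-class datasets are averaged is not stated in the text, so that step is left out; the function names are illustrative.

```python
def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall (equal weights)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive instance has the higher score and 0.5
    if the two scores are tied. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of instances, which is acceptable for a sketch; a rank-based formulation gives the same value more efficiently on large test sets.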

5.3.2 Comparison with state-of-the-art methods

Figure 4 displays the AUC values of the six active learning algorithms as the number of queried labeled instances increases. The F1-score performance with an increasing number of queries is shown in Table 3. The best result for each dataset is highlighted in boldface.

We observe the following. 1) The AUC values of ALIR are superior to those of the other state-of-the-art algorithms in most cases. The average AUC values of ALIR on the austra, sonar, credit, and waveform datasets are clearly better than those of the other algorithms, especially on credit, where the improvement is 4.93% on average.

2) ALIR has an obvious advantage in F1-score over the other approaches. The average F1-scores of ALIR on the credit, page-blocks, segment, and sonar datasets are distinctly better than the others, especially on sonar, where the mean F1-score of ALIR (0.7364) exceeds the average of the other five algorithms by 0.1084, i.e., 10.84 percentage points.

3) As more unlabeled instances are queried, the classification performance of ALF and ALIR rises, owing to the adaptive and effective feedback during the query process.

4) QUIRE and BMDR fall short in most cases because they lack an adaptive mechanism to tune the trade-off automatically. SPAL performs poorly on some datasets since it focuses on instances with high potential value but neglects the feedback from previous learning rounds. RALF is often suboptimal due to insufficient feedback. ALIR outperforms ALF in most cases, as it can choose the optimal combination of query strategies in each round of active learning.

5) In general, compared with the baselines, ALIR achieves the best performance in most cases, as it can choose the optimal combination of query strategies for each dataset and adapt the weight between the informativeness and representativeness criteria according to the feedback from previous active learning rounds, leading to the selection of the most informative and representative instances.
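The sonar figure quoted in observation 2) can be reproduced from the "Mean" column of Table 3. The sketch below assumes that the improvement is measured as the difference between the mean F1-score of ALIR and the average of the mean F1-scores of the other five algorithms; this reading is an assumption, and the helper name `gap_over_baselines` is illustrative.

```python
# Mean F1-scores on the Sonar dataset (LR classifier), copied from the
# "Mean" column of Table 3.
sonar_mean_f1 = {
    "QUIRE": 0.6093,
    "BMDR": 0.6791,
    "SPAL": 0.5831,
    "RALF": 0.6494,
    "ALF": 0.6192,
    "ALIR": 0.7364,
}


def gap_over_baselines(means, method="ALIR"):
    """Difference between one method's mean and the average of all others."""
    baselines = [value for name, value in means.items() if name != method]
    return means[method] - sum(baselines) / len(baselines)


gap = gap_over_baselines(sonar_mean_f1)
print(f"{gap:.4f}")  # 0.1084
```

Under this reading the gap is an absolute difference in F1-score, that is, 10.84 percentage points rather than a relative gain.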

5.4 Application on ALIR

5.4.1 Datasets of practical application

Software defect prediction theory assumes that high-complexity modules have a high defect rate. Under this premise, measurements of software products are used to characterize their complexity, and the defect status of software modules can then be predicted. According to the prediction results, software development organizations can focus their limited development and test resources on high-risk modules that are prone to defects, so as to find and eliminate defects more effectively and improve the quality and reliability of software products. To apply the proposed method to this practical problem, we selected the KC1 dataset from the public NASA repository, which contains 2109 software defect records, each described by 21 features that attempt to objectively characterize code properties associated with software quality. Each record is classified as either non-defective or defective.

When diagnosing cardiovascular disease from an ECG, the main concern is the change in each waveform. For example, when the QRS wave becomes larger and wider, premature ventricular contractions may occur; when the ST segment is elevated, myocardial infarction may occur. Therefore, morphological characteristics or other measurements of the ECG signal are usually used to represent the changes of the various waveforms. Through this intuitive feature representation of waveform changes, the diagnosis of cardiovascular and cerebrovascular diseases can be predicted. The ECG5000 dataset consists of 5000 heartbeats extracted from a 20-hour electrocardiogram of a patient with severe congestive heart failure. Each record contains 140 features and is classified into one of five categories: normal (58.4% of all instances), R-on-T Premature Ventricular Contraction (PVC) (35.3%), PVC (1.9%), supraventricular (3.9%), and unclassifiable (0.5%).
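The class proportions above imply a strongly imbalanced label distribution. The short sketch below converts the quoted percentages into approximate per-class counts for 5000 heartbeats. Because the percentages are rounded to one decimal place, the counts are estimates rather than the exact class sizes of ECG5000, and the helper name `approximate_counts` is illustrative.

```python
TOTAL_HEARTBEATS = 5000

# Class proportions (in percent) as quoted in the dataset description.
class_percent = {
    "normal": 58.4,
    "R-on-T PVC": 35.3,
    "PVC": 1.9,
    "supraventricular": 3.9,
    "unclassifiable": 0.5,
}


def approximate_counts(percentages, total):
    """Estimate per-class instance counts from rounded percentages."""
    return {name: round(total * pct / 100) for name, pct in percentages.items()}


counts = approximate_counts(class_percent, TOTAL_HEARTBEATS)

# Ratio between the largest and the smallest estimated class.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, count in counts.items():
    print(f"{name}: about {count}")
print(f"largest / smallest class: about {imbalance_ratio:.0f} to 1")
```

With the rarest class two orders of magnitude smaller than the most frequent one, a random initial training set of 10% may contain very few examples of the rare classes.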

5.4.2 Setting

The training/testing split is consistent with the previous section. The compared algorithms are QUIRE, BMDR, SPAL, RALF, ALF, and ALIR. Classification accuracy (ACC) is used to evaluate the results. To reduce the influence of randomness, we use the average over 10 runs as the final result.

5.4.3 Comparison with state-of-the-art methods

Figure 5.

Comparison on ACC.

Figure 5 shows the classification accuracy of our method compared with the baseline algorithms on the real problems. ALIR achieves consistently better performance across the different numbers of labeled instances. The average ACC of ALIR on the KC1 dataset is 1.5% higher than that of the benchmark algorithms, and on the ECG5000 dataset it is 2.2% higher than that of the other methods. The performance gap becomes clearer as the number of queries increases and the selected instances contribute to model training. QUIRE shows a large performance gap relative to ALIR, especially on KC1. In general, the experiments demonstrate that the proposed ALIR achieves promising results and clearly exceeds the established baselines.

6. Conclusion

This paper proposes a novel adaptive active learning algorithm that dynamically selects sampling strategies and adjusts the weights between them for different datasets. In this method, the weight between informativeness and representativeness is dynamically updated by measuring the success of the previous learning process. Further, by formulating the entire active learning process as a Markov decision process, the sampling strategies with maximum reward are chosen by a robust policy learned from past decisions. Hence, our method frequently achieves the best performance during the whole active learning process and is efficient, since the adaptive process requires no dataset-specific tuning. Results on benchmark datasets and real datasets show improved classification performance with the proposed approach compared with other state-of-the-art methods. We observe from our experiments that it is beneficial to dynamically update the trade-off parameter that balances informativeness and representativeness during the query process. In future work, we plan to extend the method to high-dimensional instances, such as those arising in image processing and natural language processing.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 61563012 and 61802085), Innovation Project of Guangxi Graduate Education (YCSW2019162), and the Guangxi Key Laboratory of Embedded Technology and Intelligent System Foundation (No. 2018A-04).

Appendix

Experiments for ALF based on DT and SVC

Table A1 displays a comparison of average classification accuracy based on DT for ALF, and the one based on SVC is shown in Table A2.

Table A1

Comparison of average classification accuracy based on DT

Dataset	Austra			Biodeg			Segment
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.793	0.794	0.795	0.732	0.722	0.751	0.862	0.856	0.873
Ent $+$ Ker	0.793	0.805	0.799	0.712	0.740	0.725	0.868	0.866	0.868
QBC $+$ Nod	0.793	0.793	0.803	0.743	0.722	0.739	0.877	0.867	0.889
QBC $+$ Ker	0.787	0.801	0.812	0.713	0.737	0.739	0.878	0.880	0.867
Mean	0.792	0.798	0.802	0.725	0.730	0.739	0.871	0.867	0.874
Dataset	Cancer			Credit			Sonar
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.891	0.896	0.881	0.577	0.632	0.644	0.529	0.530	0.516
Ent $+$ Ker	0.905	0.896	0.921	0.641	0.609	0.605	0.514	0.522	0.533
QBC $+$ Nod	0.902	0.880	0.908	0.608	0.607	0.644	0.548	0.543	0.565
QBC $+$ Ker	0.900	0.888	0.901	0.613	0.625	0.627	0.497	0.508	0.570
Mean	0.900	0.890	0.903	0.610	0.618	0.630	0.522	0.526	0.546
Dataset	Ecoli			Ionosphere			Steel
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.573	0.634	0.638	0.821	0.834	0.853	0.535	0.555	0.565
Ent $+$ Ker	0.625	0.651	0.698	0.831	0.835	0.843	0.546	0.540	0.547
QBC $+$ Nod	0.574	0.644	0.626	0.858	0.816	0.863	0.550	0.545	0.557
QBC $+$ Ker	0.623	0.665	0.651	0.834	0.853	0.848	0.520	0.543	0.530
Mean	0.599	0.649	0.653	0.836	0.835	0.852	0.538	0.546	0.550
Dataset	Led7digit			Page-blocks			Waveform
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.589	0.582	0.607	0.950	0.948	0.952	0.717	0.724	0.720
Ent $+$ Ker	0.592	0.589	0.604	0.947	0.949	0.949	0.710	0.722	0.720
QBC $+$ Nod	0.588	0.607	0.632	0.950	0.946	0.948	0.723	0.717	0.725
QBC $+$ Ker	0.614	0.603	0.592	0.948	0.947	0.952	0.703	0.716	0.721
Mean	0.596	0.595	0.609	0.949	0.948	0.950	0.713	0.720	0.722

Table A2

Comparison of average classification accuracy based on SVC

Dataset	Austra			Biodeg			Segment
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.547	0.553	0.563	0.811	0.823	0.838	0.899	0.902	0.910
Ent $+$ Ker	0.553	0.545	0.574	0.791	0.815	0.815	0.908	0.902	0.899
QBC $+$ Nod	0.549	0.533	0.564	0.829	0.797	0.826	0.913	0.904	0.919
QBC $+$ Ker	0.547	0.543	0.552	0.790	0.827	0.832	0.905	0.901	0.902
Mean	0.549	0.544	0.563	0.805	0.816	0.828	0.906	0.902	0.908
Dataset	Cancer			Credit			Sonar
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.915	0.933	0.931	0.514	0.673	0.675	0.472	0.467	0.485
Ent $+$ Ker	0.924	0.932	0.935	0.674	0.672	0.673	0.471	0.467	0.495
QBC $+$ Nod	0.924	0.931	0.937	0.528	0.655	0.677	0.482	0.455	0.468
QBC $+$ Ker	0.920	0.930	0.937	0.673	0.673	0.674	0.489	0.459	0.504
Mean	0.921	0.932	0.935	0.597	0.668	0.675	0.479	0.462	0.488
Dataset	Ecoli			Ionosphere			Steel
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.700	0.708	0.709	0.905	0.901	0.905	0.498	0.483	0.500
Ent $+$ Ker	0.714	0.716	0.725	0.883	0.909	0.909	0.454	0.467	0.491
QBC $+$ Nod	0.680	0.725	0.699	0.905	0.897	0.910	0.474	0.484	0.489
QBC $+$ Ker	0.708	0.726	0.746	0.895	0.903	0.918	0.465	0.482	0.506
Mean	0.701	0.719	0.720	0.897	0.903	0.911	0.473	0.479	0.497
Dataset	Led7digit			Page-blocks			Waveform
Combination	PBAC	Explora	ALF	PBAC	Explora	ALF	PBAC	Explora	ALF
Ent $+$ Nod	0.635	0.641	0.639	0.927	0.931	0.932	0.806	0.808	0.810
Ent $+$ Ker	0.643	0.635	0.644	0.929	0.930	0.931	0.806	0.815	0.814
QBC $+$ Nod	0.624	0.621	0.631	0.923	0.921	0.925	0.798	0.797	0.813
QBC $+$ Ker	0.634	0.637	0.645	0.930	0.924	0.930	0.805	0.817	0.817
Mean	0.634	0.634	0.640	0.927	0.927	0.930	0.804	0.809	0.814

Experiments for ALIR based on DT and SVC

The F1-score performance with an increasing number of queries based on DT and SVC is shown in Tables A3 and A4. The best result for each dataset is highlighted in boldface.

Figures A1 and A2 display the AUC values of the six active learning algorithms with an increasing number of queried labeled instances based on DT and SVC.

Table A3

Comparison of F1-score based on DT

		Number of queries
Dataset		10	20	30	40	50	60	70	80	Mean
Austra	QUIRE	0.7209	0.7119	0.7311	0.7599	0.7485	0.7503	0.7970	0.7692	0.7486
	BMDR	0.7245	0.7142	0.7838	0.7491	0.7692	0.7750	0.7588	0.7450	0.7525
	SPAL	0.7481	0.7332	0.7752	0.7395	0.7404	0.7660	0.7662	0.7911	0.7575
	RALF	0.7598	0.6575	0.7271	0.6938	0.7430	0.7389	0.6703	0.6846	0.7094
	ALF	0.7573	0.7283	0.7665	0.7013	0.6989	0.7128	0.6835	0.7476	0.7245
	ALIR	0.7568	0.7832	0.8098	0.7939	0.7843	0.8183	0.8005	0.8305	0.7972
Biodeg	QUIRE	0.7762	0.7616	0.7574	0.7617	0.7645	0.7706	0.7758	0.7805	0.7685
	BMDR	0.7744	0.7606	0.7555	0.7452	0.7655	0.7773	0.7655	0.7805	0.7656
	SPAL	0.7571	0.7638	0.7635	0.7712	0.7712	0.7561	0.7568	0.8168	0.7696
	RALF	0.7610	0.7475	0.7269	0.7465	0.7491	0.8082	0.8072	0.8255	0.7715
	ALF	0.7709	0.7475	0.7211	0.7500	0.7558	0.7532	0.7860	0.7770	0.7577
	ALIR	0.7770	0.7854	0.7767	0.7883	0.7889	0.8123	0.8213	0.8059	0.7945
Ecoli	QUIRE	0.4668	0.5340	0.4905	0.5037	0.5538	0.5415	0.5964	0.5746	0.5327
	BMDR	0.5255	0.4534	0.4987	0.5830	0.5217	0.5263	0.4994	0.5316	0.5175
	SPAL	0.5201	0.4626	0.5339	0.5462	0.5370	0.4941	0.4818	0.4703	0.5058
	RALF	0.5255	0.6022	0.5869	0.6191	0.6528	0.6375	0.6290	0.6459	0.6124
	ALF	0.5117	0.5600	0.6590	0.5991	0.6260	0.5991	0.6544	0.6636	0.6091
	ALIR	0.5715	0.5907	0.6728	0.6720	0.6743	0.6436	0.6321	0.6766	0.6417
Credit	QUIRE	0.5322	0.5280	0.5802	0.5665	0.5651	0.5675	0.5708	0.5764	0.5608
	BMDR	0.5594	0.5081	0.5779	0.6150	0.6051	0.5557	0.5248	0.5563	0.5628
	SPAL	0.5958	0.4716	0.5136	0.5260	0.5792	0.5507	0.5507	0.5977	0.5482
	RALF	0.5291	0.5211	0.5229	0.5143	0.4858	0.5143	0.4481	0.4895	0.5031
	ALF	0.5594	0.5748	0.5965	0.5285	0.5279	0.5211	0.4759	0.4605	0.5306
	ALIR	0.5742	0.5884	0.6107	0.6070	0.6268	0.6552	0.6249	0.6404	0.6160
Led7digit	QUIRE	0.6125	0.6953	0.6938	0.6878	0.6866	0.6651	0.6663	0.6597	0.6709
	BMDR	0.6671	0.6608	0.6608	0.6680	0.6690	0.6780	0.6886	0.6952	0.6734
	SPAL	0.6684	0.6671	0.6492	0.6492	0.6462	0.6723	0.6786	0.6786	0.6637
	RALF	0.6906	0.6902	0.6925	0.6882	0.6882	0.7104	0.7018	0.7167	0.6973
	ALF	0.6922	0.6886	0.6886	0.6945	0.6932	0.7210	0.7273	0.7273	0.7041
	ALIR	0.6876	0.7150	0.6952	0.7054	0.7134	0.7203	0.7349	0.7514	0.7154
Steel	QUIRE	0.7598	0.8428	0.8420	0.8538	0.8619	0.8817	0.9059	0.8924	0.8550
	BMDR	0.7696	0.8045	0.8377	0.8534	0.8651	0.8770	0.8368	0.8682	0.8390
	SPAL	0.7748	0.8132	0.8377	0.8604	0.8691	0.8551	0.9093	0.8922	0.8515
	RALF	0.7748	0.7975	0.8010	0.8420	0.8569	0.8656	0.8656	0.8813	0.8356
	ALF	0.7626	0.8298	0.7966	0.8420	0.8569	0.8656	0.8805	0.8875	0.8402
	ALIR	0.7591	0.8263	0.8691	0.8787	0.8751	0.8875	0.8883	0.8927	0.8596
Segment	QUIRE	0.8756	0.8579	0.8623	0.8683	0.8840	0.8803	0.8849	0.8778	0.8739
	BMDR	0.8761	0.8773	0.8711	0.8693	0.8826	0.8837	0.9016	0.8994	0.8826
	SPAL	0.8879	0.8788	0.8753	0.8750	0.8810	0.8859	0.8732	0.8863	0.8804
	RALF	0.8593	0.8642	0.8675	0.8675	0.8577	0.8797	0.8717	0.9096	0.8722
	ALF	0.8540	0.8686	0.8830	0.8890	0.8888	0.8881	0.8628	0.8825	0.8771
	ALIR	0.8761	0.8815	0.8835	0.8852	0.9105	0.9104	0.9094	0.9115	0.8960
Sonar	QUIRE	0.6190	0.6898	0.7203	0.7144	0.7525	0.7553	0.8194	0.8224	0.7366
	BMDR	0.7758	0.7602	0.8385	0.7732	0.7428	0.7428	0.7758	0.7933	0.7753
	SPAL	0.7401	0.7863	0.7706	0.8403	0.8403	0.8551	0.8394	0.8568	0.8161
	RALF	0.7088	0.7715	0.7863	0.7959	0.8150	0.8551	0.9047	0.9047	0.8178
	ALF	0.7401	0.7441	0.7332	0.7297	0.7715	0.8253	0.9221	0.9047	0.7963
	ALIR	0.7497	0.8403	0.8551	0.8725	0.9073	0.9221	0.9221	0.9221	0.8739
Page-blocks	QUIRE	0.7126	0.6850	0.5431	0.5443	0.5443	0.7012	0.7186	0.7084	0.6447
	BMDR	0.7201	0.6767	0.6574	0.7822	0.7747	0.7598	0.7822	0.7791	0.7415
	SPAL	0.7201	0.6289	0.7189	0.6357	0.7747	0.7598	0.7871	0.7903	0.7269
	RALF	0.7629	0.7977	0.7375	0.7617	0.7822	0.7958	0.7797	0.7729	0.7738
	ALF	0.7313	0.7673	0.7493	0.7282	0.7878	0.8039	0.8374	0.8256	0.7789
	ALIR	0.7282	0.7673	0.7859	0.7952	0.8138	0.8300	0.8343	0.8256	0.7975
Waveform	QUIRE	0.7571	0.7150	0.7763	0.6958	0.6633	0.6513	0.7342	0.7459	0.7174
	BMDR	0.7502	0.7890	0.7799	0.7805	0.7765	0.7918	0.7816	0.7816	0.7789
	SPAL	0.7836	0.7932	0.7706	0.7757	0.7715	0.7873	0.8006	0.7958	0.7848
	RALF	0.8436	0.8048	0.8048	0.8144	0.7785	0.8105	0.8292	0.8243	0.8138
	ALF	0.8214	0.7534	0.7624	0.7579	0.8130	0.8147	0.7850	0.8028	0.7888
	ALIR	0.8214	0.8108	0.8275	0.8456	0.8249	0.8342	0.8303	0.8204	0.8269
Ionosphere	QUIRE	0.7048	0.7118	0.7486	0.7549	0.7733	0.7936	0.8397	0.7910	0.7647
	BMDR	0.5609	0.5550	0.5912	0.6704	0.7544	0.7877	0.8014	0.8268	0.6935
	SPAL	0.6000	0.7789	0.8053	0.8053	0.8180	0.8199	0.8199	0.8375	0.7856
	RALF	0.6234	0.7348	0.7955	0.8112	0.8112	0.8493	0.8336	0.8121	0.7839
	ALF	0.6831	0.7193	0.7867	0.7916	0.8415	0.8415	0.8444	0.8600	0.7960
	ALIR	0.6743	0.7625	0.8219	0.8405	0.8581	0.8581	0.8659	0.8669	0.8185
Cancer	QUIRE	0.9121	0.9188	0.8749	0.9085	0.8854	0.8910	0.8968	0.9007	0.8985
	BMDR	0.9119	0.9198	0.9204	0.9200	0.9197	0.9198	0.9197	0.9207	0.9190
	SPAL	0.9054	0.9054	0.9198	0.9265	0.9197	0.9198	0.9204	0.9207	0.9172
	RALF	0.9264	0.8975	0.9141	0.9213	0.9213	0.9268	0.9268	0.9331	0.9209
	ALF	0.9125	0.9149	0.9031	0.9200	0.9279	0.9213	0.9204	0.9269	0.9184
	ALIR	0.9200	0.9332	0.9272	0.9327	0.9327	0.9395	0.9333	0.9396	0.9323

Table A4

Comparison of F1-score based on SVC

		Number of queries
Dataset		10	20	30	40	50	60	70	80	Mean
Austra	QUIRE	0.3618	0.3634	0.3799	0.3805	0.3744	0.3643	0.3789	0.3970	0.3750
	BMDR	0.4981	0.5835	0.4702	0.6147	0.5891	0.5755	0.5939	0.5684	0.5617
	SPAL	0.4981	0.4015	0.4702	0.5332	0.5835	0.5819	0.5763	0.4845	0.5162
	RALF	0.3910	0.4127	0.3991	0.4346	0.4638	0.4901	0.4869	0.4805	0.4448
	ALF	0.3910	0.4885	0.4997	0.4997	0.4502	0.4135	0.4478	0.4638	0.4568
	ALIR	0.5492	0.5763	0.6075	0.6202	0.6306	0.6258	0.6242	0.6250	0.6074
Biodeg	QUIRE	0.6185	0.6144	0.6157	0.6345	0.6032	0.4605	0.4368	0.4187	0.5503
	BMDR	0.7760	0.7867	0.7922	0.7859	0.7692	0.7692	0.7777	0.7868	0.7805
	SPAL	0.7652	0.7813	0.7706	0.7759	0.7833	0.7819	0.7944	0.7964	0.7811
	RALF	0.7803	0.7819	0.7797	0.7956	0.7819	0.7708	0.7879	0.7941	0.7840
	ALF	0.7878	0.7918	0.7961	0.7786	0.7780	0.7697	0.7737	0.7992	0.7844
	ALIR	0.8091	0.8108	0.8055	0.7984	0.8060	0.8151	0.8174	0.8029	0.8082
Ecoli	QUIRE	0.5237	0.5813	0.5653	0.5589	0.5589	0.4934	0.4934	0.5083	0.5354
	BMDR	0.4087	0.5641	0.5396	0.5474	0.5110	0.5118	0.4173	0.4182	0.4898
	SPAL	0.4359	0.4317	0.5396	0.5468	0.5295	0.5295	0.4106	0.4182	0.4802
	RALF	0.3640	0.4156	0.5304	0.6628	0.6527	0.7075	0.7691	0.7953	0.6122
	ALF	0.3574	0.5430	0.5455	0.5557	0.5877	0.7295	0.7927	0.7801	0.6115
	ALIR	0.4245	0.6392	0.7387	0.6839	0.7649	0.7649	0.7649	0.8282	0.7012
Page-blocks	QUIRE	0.3874	0.3903	0.3903	0.3918	0.4411	0.4426	0.4422	0.4500	0.4170
	BMDR	0.4382	0.4339	0.4603	0.4716	0.4808	0.4702	0.5033	0.5041	0.4703
	SPAL	0.4382	0.4328	0.4339	0.4504	0.4565	0.4565	0.4550	0.4583	0.4477
	RALF	0.4661	0.4722	0.4761	0.4716	0.4290	0.4315	0.4029	0.4029	0.4440
	ALF	0.4809	0.4681	0.4686	0.4725	0.4722	0.4718	0.4916	0.4916	0.4772
	ALIR	0.4547	0.4403	0.4608	0.4789	0.4683	0.5007	0.5018	0.5523	0.4822
Led7digit	QUIRE	0.5838	0.6363	0.6299	0.6312	0.6109	0.6170	0.6182	0.6182	0.6182
	BMDR	0.6224	0.6224	0.6576	0.6691	0.6640	0.6640	0.6599	0.6653	0.6531
	SPAL	0.6309	0.6201	0.6249	0.6255	0.6338	0.6338	0.6421	0.6449	0.6320
	RALF	0.6411	0.6811	0.6700	0.6691	0.6818	0.6802	0.6662	0.6567	0.6683
	ALF	0.6411	0.6633	0.6335	0.6681	0.6834	0.6865	0.6869	0.6732	0.6670
	ALIR	0.6351	0.6811	0.6367	0.6821	0.6935	0.6948	0.6948	0.6992	0.6772
Steel	QUIRE	0.5746	0.5717	0.5404	0.5364	0.4535	0.4661	0.4845	0.5303	0.5197
	BMDR	0.6405	0.6405	0.6169	0.6287	0.6618	0.6098	0.5979	0.6137	0.6262
	SPAL	0.6405	0.6350	0.5388	0.5388	0.5514	0.5135	0.5017	0.5490	0.5586
	RALF	0.7052	0.7123	0.6208	0.6807	0.6870	0.6334	0.7328	0.6981	0.6838
	ALF	0.6965	0.7052	0.6571	0.6492	0.6184	0.6697	0.6697	0.6697	0.6669
	ALIR	0.6531	0.7533	0.7675	0.7012	0.7399	0.7454	0.7296	0.7422	0.7290
Segment	QUIRE	0.9156	0.9142	0.9193	0.9182	0.9209	0.9176	0.9191	0.9205	0.9182
	BMDR	0.9059	0.9133	0.9125	0.9173	0.9192	0.9262	0.9314	0.9297	0.9194
	SPAL	0.9008	0.9022	0.9105	0.9183	0.9202	0.9223	0.9167	0.9182	0.9137
	RALF	0.9101	0.9211	0.9220	0.9232	0.9208	0.9179	0.9270	0.9277	0.9212
	ALF	0.9101	0.9097	0.9192	0.9247	0.9223	0.9162	0.9179	0.9257	0.9182
	ALIR	0.9238	0.9285	0.9261	0.9276	0.9267	0.9315	0.9327	0.9380	0.9294
Sonar	QUIRE	0.2624	0.5637	0.6781	0.5689	0.5970	0.4852	0.6372	0.6141	0.5508
	BMDR	0.5305	0.4415	0.6782	0.6940	0.7044	0.6933	0.7743	0.6798	0.6495
	SPAL	0.5503	0.5090	0.5924	0.6345	0.7068	0.6933	0.6551	0.6472	0.6236
	RALF	0.4177	0.3486	0.5837	0.6345	0.6750	0.6750	0.7052	0.7091	0.5936
	ALF	0.4177	0.3486	0.5448	0.6758	0.6480	0.6043	0.6123	0.6266	0.5598
	ALIR	0.6250	0.7496	0.7790	0.6258	0.7552	0.7393	0.7600	0.7933	0.7284
Credit	QUIRE	0.6068	0.6230	0.6429	0.6050	0.5909	0.6274	0.6346	0.6346	0.6207
	BMDR	0.5951	0.6120	0.6372	0.6334	0.6279	0.6717	0.6744	0.6821	0.6417
	SPAL	0.5261	0.5174	0.5272	0.5349	0.5349	0.5174	0.5803	0.6449	0.5479
	RALF	0.5032	0.5524	0.5715	0.6563	0.6481	0.6574	0.6574	0.6755	0.6152
	ALF	0.6077	0.6424	0.6536	0.6503	0.6536	0.6574	0.6574	0.6563	0.6473
	ALIR	0.5710	0.6476	0.6684	0.6635	0.6613	0.6778	0.6705	0.6705	0.6538
Waveform	QUIRE	0.9021	0.9021	0.8736	0.8795	0.8795	0.8737	0.8798	0.8737	0.8830
	BMDR	0.8207	0.8350	0.8485	0.8651	0.8767	0.8700	0.9038	0.9038	0.8655
	SPAL	0.8914	0.8203	0.8406	0.8651	0.8440	0.8974	0.8974	0.9038	0.8700
	RALF	0.8150	0.8794	0.8794	0.9098	0.9339	0.9339	0.9339	0.9207	0.9008
	ALF	0.8854	0.8982	0.9147	0.9204	0.9290	0.9290	0.9207	0.9335	0.9164
	ALIR	0.9098	0.9301	0.9411	0.9226	0.9347	0.9290	0.9222	0.9298	0.9274
Ionosphere	QUIRE	0.8679	0.8548	0.8659	0.8615	0.8802	0.8869	0.9192	0.9236	0.8825
	BMDR	0.8338	0.8169	0.8278	0.8642	0.8731	0.8790	0.9289	0.9567	0.8726
	SPAL	0.8147	0.8688	0.8688	0.8642	0.8680	0.8802	0.8938	0.8929	0.8689
	RALF	0.9014	0.9094	0.9166	0.8676	0.8922	0.8874	0.8925	0.8925	0.8950
	ALF	0.9014	0.9094	0.9010	0.8663	0.8764	0.8714	0.8769	0.8798	0.8853
	ALIR	0.9022	0.9428	0.9267	0.9174	0.9060	0.8993	0.8701	0.8756	0.9050
Cancer	QUIRE	0.8922	0.8922	0.8755	0.8701	0.8755	0.8866	0.8926	0.8980	0.8853
	BMDR	0.9215	0.9296	0.9355	0.9353	0.9353	0.9290	0.9290	0.9353	0.9313
	SPAL	0.9215	0.9105	0.9247	0.9277	0.9277	0.9277	0.9338	0.9307	0.9255
	RALF	0.9072	0.9199	0.9297	0.9419	0.9391	0.9422	0.9480	0.9393	0.9334
	ALF	0.9238	0.9266	0.9301	0.9268	0.9391	0.9260	0.9445	0.9456	0.9328
	ALIR	0.9251	0.9325	0.9361	0.9421	0.9422	0.9521	0.9579	0.9579	0.9432

Figure A1.

Comparison of AUC based on DT.

Figure A2.

Comparison of AUC based on SVC.

References

1. Ebert, Fritz and Schiele, RALF: A reinforced active learning formulation for object class recognition, in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2012, pp. 3626–3633.

2. Huang, Jin and Zhou, Active learning by querying informative and representative examples, IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (2014), 1936–1949.

3. Bappy J.H., Paul, Tuncel and Amit K R.C., The impact of typicality for informative representative selection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2017, pp. 5878–5887.

4. Wang, Zhang, Liu, Shen and Tao, Exploring representativeness and informativeness for active learning, IEEE Transactions on Cybernetics 47 (2017), 14–24.

5. Donmez, Carbonell J.G. and Bennett P.N., Dual strategy active learning, in: European Conference on Machine Learning, Springer, 2007, pp. 116–127.

6. Zhou, A brief introduction to weakly supervised learning, National Science Review 5 (2018), 44–53.

7. Cebron and Berthold M.R., Active learning for object classification: From exploration to exploitation, Data Mining and Knowledge Discovery 2 (2009), 283–299.

8. Pool-based sequential active learning for regression, IEEE Transactions on Neural Networks and Learning Systems 5 (2019), 1348–1359.

9. Shao, Huang, Liu and Zhang, Querying representative and informative super-pixels for filament segmentation in bioimages, IEEE/ACM Transactions on Computational Biology and Bioinformatics 4 (2020), 1394–1405.

10. Jin and Chiu, Active learning combining uncertainty and diversity for multi-class image classification, IET Computer Vision 3 (2015), 400–407.

11. Settles, Active learning literature survey, in: University of Wisconsin-Madison Department of Computer Sciences, 2009.

12. Fang and Cohn, Learning how to active learn: A deep reinforcement learning approach, 2017.

13. Cheng, Chen, Liu, Wang, Agrawal and Choudhary, Feedback-driven multiclass active learning for data streams, in: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, 2013, pp. 1311–1320.

14. Osugi, Kim and Scott, Balancing exploration and exploitation: A new algorithm for active machine learning, in: Fifth IEEE International Conference on Data Mining (ICDM’05), IEEE, 2005.

15. Freund, Seung H.S., Shamir and Tishby, Selective sampling using the query by committee algorithm, Machine Learning 2 (1997), 133–168.

16. Balcan M.F., Broder and Zhang, Margin based active learning, in: International Conference on Computational Learning Theory, Springer, 2007, pp. 35–50.

17. Chattopadhyay, Wang, Fan, Davidson, Panchanathan and Ye, Batch mode active sampling based on marginal probability distribution matching, ACM Transactions on Knowledge Discovery from Data (TKDD) 3 (2013), 1–25.

18. Holub, Perona and Burl M.C., Entropy-based active learning for object recognition, in: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, IEEE, 2008, pp. 1–8.

19. Varadarajan, Deng and Acero, Active learning and semi-supervised learning for speech recognition: A unified framework using the global entropy reduction maximization criterion, Computer Speech & Language 3 (2010), 433–444.

20. Wang, Hwang J.N., Rose and Wallace, Uncertainty-based active learning via sparse modeling for image classification, IEEE Transactions on Image Processing 1 (2018), 316–329.

21. Zhu and Li, A survey on instance selection for active learning, Knowledge and Information Systems 2 (2013), 249–283.

22. Konyushkova, Raphael and Fua, Learning active learning from data, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 4228–4238.

23. Zhang and Zhou, Adaptive online learning in dynamic environments, in: Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2018, pp. 1330–1340.

24. Wang, Huang and Zhou, Towards identifying causal relation between instances and labels, in: Proceedings of the 2019 SIAM International Conference on Data Mining, SIAM, 2019, pp. 289–297.

25. Nguyen H.T. and Smeulders, Active learning using pre-clustering, in: Proceedings of the Twenty-First International Conference on Machine Learning, 2004, p. 79.

26. Flaherty, Arkin and Jordan M.I., Robust design of biological experiments, in: Advances in Neural Information Processing Systems, 2006, pp. 363–370.

27. Wang and Zhang, Multi-class active learning: A hybrid informative and representative criterion inspired approach, in: 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, 2017, pp. 1510–1517.

28. Wang and Ye, Querying discriminative and representative samples for batch mode active learning, in: ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 3, 2015, pp. 1–23.

29. Tang and Huang, Self-paced active learning: Query the right thing at the right time, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 1, 2019, pp. 5117–5124.

30. Karlos, Aridas, Kanas V.G. and Kotsiantis, Classification of acoustical signals by combining active learning strategies with semi-supervised learning schemes, Neural Computing and Applications, Springer, 2021, 1–18.

31. Liu and Wu, Integrating informativeness, representativeness and diversity in pool-based sequential active learning for regression, in: 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, 2020, pp. 1–7.

32. Wang, Zhang and Lin, Cost-effective active learning for deep image classification, IEEE Transactions on Circuits and Systems for Video Technology 12 (2016), 2591–2600.

33. Xiong, Jiao, Mao and Zhang, Active learning based on coupled KNN pseudo pruning, Neural Computing and Applications 7 (2012), 1669–1686.

34. http://archive.ics.uci.edu/ml/datasets.php, UCI machine learning repository.

35. https://sci2s.ugr.es/keel/category.php?cat=clas, KEEL repository.
