Active ordinal classification by querying relative information

Abstract

Collecting and learning with auxiliary information is a way to further reduce the labeling cost of active learning. This paper studies the problem of active learning for ordinal classification by querying low-cost relative information (instance-pair relation information) through pairwise queries. Two challenges arise in this study: how to train an ordinal classifier with absolute information (labeled data) and relative information simultaneously, and how to select appropriate query pairs. To solve the first problem, we convert the absolute and relative information into class interval-labeled training instances by introducing a class interval concept and two reasoning rules. Then, we design a new ordinal classification model for learning with the class interval-labeled training instances. For query pair selection, we specify that each query pair consists of an unlabeled instance and a labeled instance. The unlabeled instance is selected by a margin-based critical instance selection method, and the corresponding labeled instance is selected based on an expected cost minimization strategy. Extensive experiments on twelve public datasets validate that the proposed method is superior to the state-of-the-art methods.

Keywords

Active learning; ordinal classification; pairwise query; relative information

1. Introduction

Ordinal classification (OC), also known as ordinal regression [1], is a multi-class supervised learning problem where the target variables exhibit a natural semantic total ordering. OC uses machine learning techniques to recognize human ordinal semantics and is a valuable study in many real applications, such as medical diagnosis [2, 3], credit risk prediction [4], and so on. As a supervised learning problem, ordinal classification requires a sufficient amount of labeled instances to train a model or extract rules. However, the acquisition of labels is expensive and time-consuming in real applications due to the dependence on domain knowledge and human experts. In this situation, active learning (AL) [5, 6, 7] that reduces the labeling cost by labeling the most informative instances is an economical way to induce an accurate ordinal classifier. However, if the available query budget is meager, the traditional AL methods are far from satisfactory for training a promising ordinal classifier.

Learning with auxiliary information has drawn increasing attention in active learning research since it can further reduce the labeling cost [8, 9, 1]. In this paper, we study the problem of active learning for ordinal classification (active ordinal classification) with pairwise queries by considering that human experts can provide low-cost relative information. In the considered scenario, the expert is asked “what is the relation between instances $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ ?”. The possible responses are as follows: “ $\mathbf{x}_{i}\prec_{c}\mathbf{x}_{j}$ ”, which means the class of instance $\mathbf{x}_{i}$ is of a higher level than that of instance $\mathbf{x}_{j}$ ; and “ $\mathbf{x}_{i}=_{c}\mathbf{x}_{j}$ ”, which means the classes of instances $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ are at the same level. In an ordinal classification scenario, if the expert knows two instances do not belong to the same class, it is natural for the expert to know the ordinal relationship between them. Recently, many works have incorporated instance-pair relation information into machine learning and active learning [8, 10, 11, 1, 12, 13]. Fu et al. [8] first introduced the pairwise query paradigm into active learning for binary classification to reduce labeling costs. Tang et al. [1] noted that collecting relative information is significantly easier than gathering explicit label information since novices and non-expert annotators can do this work. They showed that collecting additional relative information improves the ordinal classification performance of their model. The above works motivate this paper’s novel study of active ordinal classification by querying relative information (instance-pair relation information).

Although the cost of collecting relative information is much lower than that of obtaining explicit label information, it is not negligible in most cases. Therefore, the core issue in this paper is how to effectively use a limited pairwise query budget to obtain a promising ordinal classifier. Two challenging issues arise: one is to select suitable query pairs, and the other is to train an ordinal classification model using relative information.

In the considered active learning scenario, a small set of labeled instances exists initially and one can obtain relative information by providing query pairs to experts. Therefore, the base learner in our active learner needs to be able to learn with labeled instances and relative information simultaneously. To this end, we introduce the concept of a class interval and two reasoning rules. Based on these, the labeled instances and the relative information can be converted into a unified training instance form $\langle\mathbf{x},[\mathcal{C}_{i},\mathcal{C}_{j}]\rangle$ , where $[\mathcal{C}_{i},\mathcal{C}_{j}]$ is a class interval, $\mathcal{C}_{i}$ and $\mathcal{C}_{j}$ are the $i$ -th and $j$ -th classes, and $i\leqslant j$ . Then, we construct an OC model based on a reduction-based ordinal classification framework [14] to learn with the instances with class-interval labels.

Intuitively, the smaller the class interval of a training instance, the greater its contribution to model training. Therefore, query pair building is a nontrivial task. We find that when a query pair is built with an unlabeled instance and a labeled instance, we can have a chance to reduce this unlabeled instance’s class interval based on the relative information. Moreover, an appropriate combination can lead to a substantial class interval reduction for the unlabeled instance in the query pair. When the class interval of an unlabeled instance reduces to the minimum, we thus obtain its explicit label, i.e., the absolute information. To maximize the utility of each pairwise query, we let one of the instances in each query pair be an informative unlabeled instance (critical instance) and the other a carefully selected labeled instance.

For query pair building, we introduce a margin-based critical instance selection method to provide an unlabeled instance for each query pair. This method prefers the unlabeled instance closest to its nearest decision boundary. To determine the labeled instance in a query pair, we must decide which class of labeled instance is most appropriate to pair with the current critical instance. If the critical instance and its paired labeled instance have an identical class label, we obtain the critical instance’s label through a single pairwise query. However, this case does not always occur. We aim to reduce the critical instance’s class interval as much as possible through each pairwise query, thus obtaining more labeled instances with limited pairwise queries. To this end, we design a labeled instance selection method from the perspective of expected cost minimization.

For the sake of brevity, the main contributions of this paper are summarized as follows.

To our knowledge, this paper is the first active learning study for ordinal classification by querying relative information.

We design a new method to learn an ordinal classifier based on absolute and relative information simultaneously.

We design an effective query pair selection method based on a margin-based critical instance selection method and an expected cost minimization-based labeled instance selection method. This query pair selection method helps to quickly reduce the class interval of each critical instance through pairwise queries, resulting in more explicit labels under a given pairwise query budget.

Extensive experiments show that the proposed method can utilize pairwise query resources more efficiently than the state-of-the-art methods, resulting in a better ordinal classifier.

The remainder of this paper is structured as follows. Section 2 reviews the related works. The technical details of the proposed pairwise query active learning method for ordinal classification are described in Section 3. In Section 4, we comparatively study the performance of our method and report the experimental results. Following this, a brief conclusion is drawn in Section 5.

2. Related work

This section briefly reviews the related works from the aspects of active learning and the ordinal classification by using absolute information and relative information. Moreover, a reduction-based ordinal classification framework [14] as preparatory knowledge is recalled in this section.

2.1 Active learning

Active learning aims to induce a prediction model while minimizing the labeling cost through interactive instance selection and label inquiry. It is suitable for machine learning scenarios where a large amount of unlabeled data is available or easy to collect, but label acquisition is expensive. The fundamental issue of active learning is to determine which instances would be most valuable if they were labeled and used as training instances [15]. Informativeness and representativeness are two primary considerations for assessing the value of an unlabeled instance. Active learning strategies focusing on the informativeness of unlabeled instances include uncertainty sampling [5, 16, 17], query-by-committee [18, 19], expected change [20, 21], and so on. Active learning strategies that concentrate on the representativeness of unlabeled instances prefer to select the instances that can represent the data distribution. The commonly used methods of this type include clustering-based AL methods [22, 23], experimental designs [24, 25], and so on. Combining multiple assessment criteria for active instance selection is usually encouraged in the active learning community and often results in robust active learning performance.

Although great progress has been made in active learning research on classification, little effort has been devoted to the active learning problem for ordinal classification. Soons and Feelders [26] first proposed an active learning method for ordinal classification by exploiting the monotonicity constraints in the data. However, this work is only suitable for monotonic ordinal classification data [27] and cannot scale up to the general ordinal classification data. Recently, Li et al. [6] designed an active ordinal classification method based on the adjacent category logistic model and an A-optimal experimental design criterion. The major shortcoming of this method is that it has to calculate the inverse of a large matrix in each iteration for each unlabeled instance, which may limit its usability in practice. In the study of the imbalanced ordinal classification problem, Ge et al. [10] proposed an uncertainty sampling-based active learning method to achieve the minority class oversampling in the ordinal data. This method labels the instance with the smallest distance from its nearest decision boundary. Although this method may suffer from sampling redundancy, it generally performs satisfactorily. To our knowledge, the above two methods are the only two active learning methods for the general ordinal classification. In practice, obtaining labels directly from human experts is often expensive or time-consuming. When the query budget is insufficient, collecting sufficient training instances using the aforementioned pointwise query AL methods is unaffordable.

In recent years, active learning by soliciting low-cost instance-pair relation information has attracted more attention. Nevertheless, most of those studies concentrate on active clustering [28] and active learning for ranking [29]. Since there are essential differences between learning for ranking and ordinal classification [30], the active ranking methods cannot be used for active ordinal classification. Fu et al. [8] first developed a pairwise query active learning method by asking the annotator “whether instances $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ belong to the same class?”. However, this method is only suitable for binary classification problems and cannot be directly extended to multiclass problems. Chien et al. [31] presented a hypergraph-based pairwise query active learning method. Although this method is compatible with multiclass classification, it needs to pair the critical instance with labeled instances of different classes, which requires $\mathcal{O}(K-1)$ pairwise queries to obtain one explicit label. Here, $K$ is the number of classes. In addition, this method is computationally expensive due to the breadth-first search for the shortest path in a hypergraph. So far, no research has been devoted to pairwise query active learning for ordinal classification, and the existing pairwise query AL methods are unsuitable for solving this problem. This situation motivates the new study in this paper.

2.2 Ordinal classification with absolute and relative information

It is believed that non-expert annotators or novices can provide low-cost relative information, and the cost is much lower than that of asking experts for absolute information. Therefore, to reduce the labeling cost of ordinal classification problems, researchers have recently constructed various new ordinal classification models to learn with absolute and low-cost relative information simultaneously. Tang et al. [1] proposed a nearest neighbors-based ordinal classification model by fusing absolute and relative information. This method was further extended based on a distance metric learning technique in [32]. However, the performance of these methods is heavily dependent on the number of labeled instances. If there are not enough labeled instances, these methods usually perform unsatisfactorily. Therefore, these methods are unsuitable for active learning scenarios where only a small number of labeled samples exist. Sader et al. [12] proposed a novel proportional odds model, where the relative information is formulated as the constraint component in the loss function. Subsequently, Tang et al. [13] extended this scheme to several existing ordinal classification models and conducted a comparative study. The comparative study shows that the methods SVLORR and LDLORR generally perform better. However, one apparent shortcoming of these methods is that they are only suitable for linearly separable ordinal data and cannot be directly extended to nonlinear methods. In this study, we overcome this limitation: the proposed method is suitable for nonlinear data by leveraging the kernel trick.

Although the above works put much effort into designing ordinal classification models for learning with absolute and relative information simultaneously, they did not consider how to select valuable query pairs. When requesting relative information, these models randomly select query pairs from the unlabeled pool, which leads to inefficient use of pairwise query resources. On the one hand, these methods do not guarantee that the instances in the query pair are informative. On the other hand, when the two instances in a query pair belong to an identical class, one cannot obtain any valuable information. This paper addresses both issues by building each query pair with an informative unlabeled instance and a carefully selected labeled instance. In this setup, we can obtain useful information from the annotator regardless of whether the two instances in a query pair belong to an identical class. In particular, the proposed method can produce labeled instances during pairwise query active learning.

2.3 Reduction-based ordinal classification framework

To solve the ordinal classification problem more effectively, Li and Lin [14] have proposed a reduction framework to reduce an ordinal classification problem to a binary classification problem and gave a theoretical explanation. The model based on this framework is essentially a threshold-based OC model. We instantiate an OC model in this paper based on this reduction-based framework to learn with labeled instances and instance-pair relation information simultaneously. Therefore, in what follows, we will recall this framework’s formulations.

To predict the class label of an instance $\mathbf{x}$ , the threshold-based ordinal classification model typically learns $K-1$ ordered thresholds: $\theta_{1}<\theta_{2}<\cdots<\theta_{K-1}$ , where $\theta_{0}=-\infty$ and $\theta_{K}=+\infty$ are typically assumed. Thus, the instance $\mathbf{x}$ is classified as $\mathcal{C}_{k}$ in the case that the predictive output $h(\mathbf{x})=\mathbf{w}^{T}\mathbf{x}$ falls in the range $\theta_{k-1}<h(\mathbf{x})\leqslant\theta_{k}$ , where $\mathbf{w}\in\mathbb{R}^{d}$ and $\mathbf{w}^{T}\mathbf{x}$ is the inner product of $\mathbf{x}$ and $\mathbf{w}$ .

For each original training instance $\left\langle\mathbf{x}_{i},y_{i}\right\rangle$ , the reduction framework extends it into $K-1$ binary training instances by the following rule:

$\displaystyle\begin{aligned}&\langle\mathbf{x}_{i}^{k},y_{i}^{k}\rangle,\quad k=1,\cdots,K-1,\\ &\mathbf{x}_{i}^{k}=(\mathbf{x}_{i},\mathbf{e}_{k})\in\mathbb{R}^{d+K-1},\\ &y_{i}^{k}=1-2I[y_{i}\prec\mathcal{C}_{k+1}],\end{aligned}$ (1)

where $\mathbf{e}_{k}\in\mathbb{R}^{K-1}$ denotes a vector whose $k$ -th element is 1 and whose remaining elements are all 0, and $I[\cdot]$ is an indicator function that returns 1 if the inside condition holds and 0 otherwise. Thus, each extended instance $\mathbf{x}_{i}^{k}$ has an associated binary class label $y_{i}^{k}\in\{-1,1\}$ .
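The extension rule in Eq. (1) can be illustrated with a minimal Python sketch. The function name `extend_instance` and the encoding of the class $\mathcal{C}_{k}$ as the 1-based integer `k` are assumptions made for illustration only; they are not part of the original framework.

```python
def extend_instance(x, y, num_classes):
    """Expand one ordinal example <x, y> into K-1 binary examples (Eq. 1).

    x is a list of d feature values; y is a 1-based class index, so
    y == k stands for the class C_k (an encoding assumed for this sketch).
    """
    extended = []
    for k in range(1, num_classes):          # k = 1, ..., K-1
        e_k = [0.0] * (num_classes - 1)
        e_k[k - 1] = 1.0                     # one-hot indicator of threshold k
        # y precedes C_{k+1} exactly when y <= k, which gives the label -1.
        label = -1 if y <= k else 1
        extended.append((list(x) + e_k, label))
    return extended


# An instance of class C_2 in a 4-class problem yields the labels [1, -1, -1].
examples = extend_instance([0.2, 1.5], y=2, num_classes=4)
```

Each extended instance asks the binary question "is the class above threshold $k$?", which is why a single ordinal label produces $K-1$ binary labels.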

The weight vector in the extended binary classification problem

$\displaystyle\bar{\mathbf{w}}=(\mathbf{w},-\theta)\in\mathbb{R}^{d+K-1}$ (2)

can be learned to predict the output of $\mathbf{x}_{i}^{k}$ , such that $g(\mathbf{x}_{i}^{k})=(\mathbf{w},-\theta)^{T}\mathbf{x}_{i}^{k}=\mathbf{w}^{T}\mathbf{x}_{i}-\theta_{k}=h(\mathbf{x}_{i})-\theta_{k}$ , where $\theta=[\theta_{1},\cdots,\theta_{K-1}]$ . Therefore, $g(\mathbf{x}_{i}^{k})$ can be interpreted as the distance from $\mathbf{x}_{i}$ to the $k$ -th threshold. The predicted ordinal label $\hat{y}_{i}=\mathcal{C}_{l}$ of instance $\mathbf{x}_{i}$ is then obtained with $l$ computed as

$\displaystyle l=1+\sum_{k=1}^{K-1}I[g(\mathbf{x}_{i}^{k})>0]$ (3)
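As a small illustration of Eq. (3), the following Python sketch counts the thresholds that lie strictly below the model output. The function name and the example threshold values are hypothetical.

```python
def predict_class_index(h_x, thresholds):
    """Return l such that the predicted label is C_l (Eq. 3).

    h_x is the scalar output h(x); thresholds holds
    theta_1 < ... < theta_{K-1} in increasing order.
    """
    # g(x^k) = h(x) - theta_k, so count the thresholds strictly below h(x).
    return 1 + sum(1 for theta in thresholds if h_x - theta > 0)


thresholds = [-1.0, 0.5, 2.0]                    # hypothetical values, K = 4
label_index = predict_class_index(0.7, thresholds)   # 3, i.e. class C_3
```

An output exactly equal to a threshold is assigned to the lower class, in agreement with the range $\theta_{k-1}<h(\mathbf{x})\leqslant\theta_{k}$ used above.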

For an unlabeled instance $\mathbf{x}_{i}$ , its cumulative probabilities can be computed as [33]

$\displaystyle\begin{aligned}P(y_{i}\prec\mathcal{C}_{k+1}|\mathbf{x}_{i})&=\frac{1}{1+\exp(-(\theta_{k}-\mathbf{w}^{T}\mathbf{x}_{i}))}\\ &=\frac{1}{1+\exp(\bar{\mathbf{w}}^{T}\phi(\mathbf{x}_{i}^{k}))},\quad k=1,\cdots,K-1,\end{aligned}$ (4)

where $K$ is the number of classes. Consequently, we can obtain the posterior probability estimate of $\mathbf{x}_{i}$ by decomposing the cumulative probabilities as follows

$\displaystyle P(y_{i}=\mathcal{C}_{k}|\mathbf{x}_{i})=\left\{\begin{array}{ll}P(y_{i}\prec\mathcal{C}_{k+1}|\mathbf{x}_{i}),&k=1\\ P(y_{i}\prec\mathcal{C}_{k+1}|\mathbf{x}_{i})-P(y_{i}\prec\mathcal{C}_{k}|\mathbf{x}_{i}),&1<k<K\\ 1-P(y_{i}\prec\mathcal{C}_{k}|\mathbf{x}_{i}),&k=K\end{array}\right.$ (5)
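The decomposition in Eqs. (4) and (5) can be sketched in a few lines of Python. This is an illustrative sketch that takes the scalar output $h(\mathbf{x})$ and a list of thresholds as given; the function name and the numeric values are assumptions.

```python
import math


def class_posteriors(h_x, thresholds):
    """Turn h(x) and theta_1 < ... < theta_{K-1} into P(y = C_k | x), k = 1..K."""
    # Eq. (4): P(y precedes C_{k+1} | x) is a logistic function of theta_k - h(x).
    cumulative = [1.0 / (1.0 + math.exp(-(theta - h_x))) for theta in thresholds]
    # Eq. (5): differences of consecutive cumulative probabilities, padded
    # with 0 below the first threshold and 1 above the last one.
    bounds = [0.0] + cumulative + [1.0]
    return [bounds[k] - bounds[k - 1] for k in range(1, len(bounds))]


probs = class_posteriors(0.7, [-1.0, 0.5, 2.0])   # hypothetical values, K = 4
```

Because the thresholds are increasing, the cumulative probabilities are increasing as well, so every class probability is positive and the probabilities sum to one.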

Note that the function $g(\mathbf{x})$ in Eq. (3) can be any of the existing well-established binary prediction models [14], such as support vector machine, logistic regression, and so on. In Section 3, we instantiate this reduction framework with the kernel extreme learning machine [34] and introduce how to train the new OC model using labeled instances and relative information.

3. Active ordinal classification with pairwise queries

This section provides the technical details of the proposed pairwise query active ordinal classification method. The problem setting and method overview are first introduced in Subsection 3.1 to facilitate the descriptions. Then, the base learner and the query pair selection method are successively presented in the subsequent subsections.

3.1 Problem setting and method overview

In the considered scenario, let $\mathcal{T}=\{\langle\mathbf{x}_{i},y_{i}\rangle\}^{n}_{i=1}$ be the initial training set (also referred to as the seed set), where each instance $\mathbf{x}_{i}\in\mathcal{L}\subseteq\mathbb{R}^{d}$ is a feature vector associated with a label $y_{i}\in\mathcal{Y}=\{\mathcal{C}_{1},\mathcal{C}_{2},\cdots,\mathcal{C}_{K}\}$ . There is a total ordering among the classes, i.e., $\mathcal{C}_{1}\prec\mathcal{C}_{2}\prec\cdots\prec\mathcal{C}_{K}$ , where the notation “ $\prec$ ” indicates an ordering or grading relation. Let $\mathcal{U}=\{\mathbf{x}_{j}\}_{j=n+1}^{N}$ be the unlabeled pool, where $n\ll N$ . Hereinafter, unless otherwise specified, $K$ denotes the number of classes.

Instead of directly querying the instance label as in traditional active learning, we consider a pairwise query setting in this paper. In each pairwise query, once a query pair $(\mathbf{x}_{i},\mathbf{x}_{j})$ is determined, the annotator is asked “what is the relation between instances $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ ?”. The possible responses of the annotator are as follows:

“ $\mathbf{x}_{i}\prec_{c}\mathbf{x}_{j}$ ”, which means the class of $\mathbf{x}_{i}$ is of a higher level than that of $\mathbf{x}_{j}$ . Conversely, if the class of $\mathbf{x}_{j}$ is of a higher level than that of $\mathbf{x}_{i}$ , the feedback is “ $\mathbf{x}_{j}\prec_{c}\mathbf{x}_{i}$ ”.

“ $\mathbf{x}_{i}=_{c}\mathbf{x}_{j}$ ” or “ $\mathbf{x}_{j}=_{c}\mathbf{x}_{i}$ ”, which means the classes of $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ are at the same level.

In order to use the labeled instance and the instance-pair relation information simultaneously in the pairwise query active learning process, we introduce a class interval concept and two reasoning rules.

Definition (Class Interval).

Given an instance $\mathbf{x}_{i}$ , we associate with it a class interval $I(\mathbf{x}_{i})=[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]$ , where $L_{i},R_{i}\in\mathbb{N}^{+}$ and $1\leqslant L_{i}\leqslant R_{i}\leqslant K$ . It is only known that the true label $y_{i}=\mathcal{C}_{l}$ of $\mathbf{x}_{i}$ lies in the class interval $[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]$ , i.e., $L_{i}\leqslant l\leqslant R_{i}$ . The size of the class interval of $\mathbf{x}_{i}$ is defined as $IS(\mathbf{x}_{i})=R_{i}-L_{i}+1$ .

Without loss of generality, the class interval of an unlabeled instance $\mathbf{x}_{i}$ is $[\mathcal{C}_{1},\mathcal{C}_{K}]$ at the initial time. If the class interval size of an instance is larger than 1, this instance is essentially still an unlabeled instance. Intuitively, an unlabeled instance belongs to each class within its class interval with an identical probability if no additional information exists. Given a labeled instance $\langle\mathbf{x}_{i},\mathcal{C}_{k}\rangle$ , we say $\mathbf{x}_{i}$ corresponds to a narrow class interval $[\mathcal{C}_{k},\mathcal{C}_{k}]$ . Therefore, we can obtain an explicit label once the class interval size of an unlabeled instance is reduced to 1. Based on the concept of class interval, two reasoning rules are designed as follows.

Rule 1.

In the case that the relative information is “ $\mathbf{x}_{i}\prec_{c}\mathbf{x}_{j}$ ”, according to the ordinal constraint, the class interval of $\mathbf{x}_{i}$ can be inferred as:

$\displaystyle I(\mathbf{x}_{i})=[\mathcal{C}_{L_{i}},\min\{\mathcal{C}_{R_{i}},\mathcal{C}_{R_{j}-1}\}];$ (6)

and the class interval of $\mathbf{x}_{j}$ can be inferred as:

$\displaystyle I(\mathbf{x}_{j})=[\max\{\mathcal{C}_{L_{i}+1},\mathcal{C}_{L_{j}}\},\mathcal{C}_{R_{j}}].$ (7)

Rule 2.

In the case that the relative information is “ $\mathbf{x}_{i}=_{c}\mathbf{x}_{j}$ ”, the class intervals of $\mathbf{x}_{i}$ and $\mathbf{x}_{j}$ can be inferred as:

$\displaystyle I(\mathbf{x}_{i})=I(\mathbf{x}_{j})=[\max\{\mathcal{C}_{L_{i}},\mathcal{C}_{L_{j}}\},\min\{\mathcal{C}_{R_{i}},\mathcal{C}_{R_{j}}\}].$ (8)
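The two reasoning rules in Eqs. (6) to (8) can be written as a short Python sketch. Representing a class interval $[\mathcal{C}_{L},\mathcal{C}_{R}]$ as a pair `(L, R)` of 1-based indices, and the function names below, are assumptions made for illustration.

```python
def apply_precedes(interval_i, interval_j):
    """Update both intervals after the answer 'x_i precedes x_j' (Eqs. 6, 7)."""
    (l_i, r_i), (l_j, r_j) = interval_i, interval_j
    new_i = (l_i, min(r_i, r_j - 1))      # x_i must lie strictly below x_j
    new_j = (max(l_i + 1, l_j), r_j)      # x_j must lie strictly above x_i
    return new_i, new_j


def apply_same_class(interval_i, interval_j):
    """Update both intervals after the answer 'same class' (Eq. 8)."""
    (l_i, r_i), (l_j, r_j) = interval_i, interval_j
    shared = (max(l_i, l_j), min(r_i, r_j))
    return shared, shared


def interval_size(interval):
    """IS(x) = R - L + 1."""
    return interval[1] - interval[0] + 1


# Worked example with K = 5: an unlabeled instance starts at [C_1, C_5].
unlabeled = (1, 5)
# Query 1: paired with a labeled C_3 instance, the answer is "unlabeled precedes".
unlabeled, _ = apply_precedes(unlabeled, (3, 3))   # now (1, 2)
# Query 2: paired with a labeled C_1 instance, the answer is "labeled precedes".
_, unlabeled = apply_precedes((1, 1), unlabeled)   # now (2, 2)
```

In this example two pairwise queries shrink the interval to size 1, so the explicit label $\mathcal{C}_{2}$ is recovered without ever asking for it directly.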

Based on the above two reasoning rules, we can update the class interval of the critical instance in the query pair after acquiring information from the annotator. Accordingly, we can convert the labeled instances and the relative information into the class interval-labeled form, i.e., $\langle\mathbf{x}_{i},[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]\rangle$ . We call a training instance of this form a class interval-labeled training instance. Any instance whose class interval size is smaller than $K$ can be used to train the base learner. Thus, we can construct an interval-labeled training set $\mathcal{L}_{I}=\{\langle\mathbf{x}_{i},[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]\rangle\}_{i=1}^{m}$ , $n\leqslant m$ . For any unlabeled instance, we can always obtain its explicit label through a few pairwise queries and inference steps. Therefore, the goal of our method is to obtain more labeled instances with the fewest pairwise queries, thus producing a promising ordinal classifier.

In our method, each query pair is built with an unlabeled instance and a labeled instance, so that we can obtain useful information for model training from each pairwise query and inference step, and have a chance to infer the critical instance’s explicit label with a few pairwise queries. If instead both instances in the query pair are unlabeled, i.e., $(\mathbf{x}_{i},\mathbf{x}_{j})$ with $\mathbf{x}_{i}\in\mathcal{U}$ and $\mathbf{x}_{j}\in\mathcal{U}$ , there is a chance to obtain an explicit label only when the intersection of the two class intervals has size at most 2. Reaching this state usually requires a large number of pairwise queries. Compared with the latter, the former query pair form can obtain more explicit labels under a given pairwise query budget, facilitating the learning of a good ordinal classifier.

To visually show the difference between pointwise query active learning and pairwise query active learning, we depict the traditional pointwise query active learning framework and the proposed pairwise query active learning framework in Fig. 1a and b, respectively. In the traditional pointwise query active learning, the annotator must provide a reliable label for each query instance. In such a situation, the label acquisition usually depends on experts with domain knowledge or even rigorous physical or chemical testing. In the proposed pairwise query active learning, we can induce an ordinal classifier by interactively asking for low-cost relative information. Since it is not difficult to compare two instances with each other in the context of ordinal classification, non-experts or novices can provide relative information [1]. Therefore, a non-expert or novice can be employed as an annotator.

Figure 1.

Active ordinal classification frameworks with pointwise queries and pairwise queries.

We summarize the algorithmic procedure of the proposed method in Algorithm 1. In the algorithm, lines 1 and 2 correspond to the initialization stage, where the initial training instances in $\mathcal{T}$ are converted into class interval-labeled training instances. Based on the class interval-labeled training set $\mathcal{L}_{I}$ , the base learner $\mathcal{M}$ is induced. The base learner will be presented in Section 3.2. In each iteration, the algorithm selects one critical unlabeled instance and adds it to $\mathcal{L}_{I}$ ; this corresponds to line 4. The critical instance selection method will be introduced in Section 3.3.1. In each iteration, for each class interval-labeled instance whose class interval size is larger than 1, we pair a labeled instance with it to build a query pair. The labeled instance selection method will be introduced in Section 3.3.2. After the annotator returns the relative information, we update the class interval of the unlabeled instance and retrain the base learner. The above operations correspond to lines 5 to 15. The algorithm terminates when the pairwise query budget is exhausted.

Algorithm 1: Active Ordinal Classification Based on Pairwise Queries (AOCpair).
Input: Seed set $\mathcal{T}$ , unlabeled set $\mathcal{U}$ , pairwise query budget $b$ , base learner $\mathcal{M}$ .
Output: The trained base learner $\mathcal{M}$ , the extended labeled instance set $\mathcal{T}$ .
$B\leftarrow b$ ; $\mathcal{L}_{I}\leftarrow\emptyset$ ;
Convert the training instances in $\mathcal{T}$ into class interval-labeled instances and add them to $\mathcal{L}_{I}$ ;
Train $\mathcal{M}$ with $\mathcal{L}_{I}$ ;
while $B>0$ do
    Select a critical instance $\mathbf{x}^{\ast}\in\mathcal{U}$ ;
    $\mathcal{L}_{I}\leftarrow\mathcal{L}_{I}\cup\{\langle\mathbf{x}^{\ast},[\mathcal{C}_{1},\mathcal{C}_{K}]\rangle\}$ ;
    $i\leftarrow 1$ ;
    while $B>0$ and $i\leqslant|\mathcal{L}_{I}|$ do
        if $IS(\mathbf{x}_{i})>1$ then
            Build a query pair $(\mathbf{x}_{i},\mathbf{x}_{j})$ ; $//$ $\langle\mathbf{x}_{i},[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]\rangle\in\mathcal{L}_{I}$ , $\langle\mathbf{x}_{j},y_{j}\rangle\in\mathcal{T}$
            Inquire the instance-pair relation information about $(\mathbf{x}_{i},\mathbf{x}_{j})$ ;
            Update $\mathbf{x}_{i}$ ’s class interval according to the reasoning rules;
            $B\leftarrow B-1$ ;
            if $IS(\mathbf{x}_{i})=1$ then
                $\mathcal{T}\leftarrow\mathcal{T}\cup\{\langle\mathbf{x}_{i},y_{i}\rangle\}$ ; $\mathcal{U}\leftarrow\mathcal{U}\setminus\{\mathbf{x}_{i}\}$ ;
            end if
            Retrain $\mathcal{M}$ with $\mathcal{L}_{I}$ ;
        end if
        $i\leftarrow i+1$ ;
    end while
end while
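To make the control flow of the query loop concrete, the following Python sketch simulates it with a noiseless oracle that knows the hidden labels. It is not the proposed method: the critical instance is simply the next pool item (a placeholder for the margin-based selection), the partner class is the middle class of the current interval (a placeholder for the expected cost minimization strategy), the base learner is omitted, and the seed set is assumed to contain every class. All names are hypothetical.

```python
def oracle(class_a, class_b):
    """Simulated annotator comparing two class indices (assumed noiseless)."""
    if class_a == class_b:
        return "same"
    return "a_precedes_b" if class_a < class_b else "b_precedes_a"


def update_interval(interval, ref_class, answer):
    """Apply the reasoning rules when the partner has the known class ref_class."""
    low, high = interval
    if answer == "same":
        return (ref_class, ref_class)
    if answer == "a_precedes_b":               # instance precedes the partner
        return (low, min(high, ref_class - 1))
    return (max(ref_class + 1, low), high)     # partner precedes the instance


def run_pairwise_queries(seed_labels, hidden_labels, num_classes, budget):
    """Return (known labels, queries spent) after the simulated query loop."""
    known = dict(seed_labels)                  # labeled set T
    pool = list(hidden_labels)                 # unlabeled pool U
    intervals = {}                             # unresolved entries of L_I
    spent = 0
    while spent < budget and (pool or intervals):
        if pool:
            # Placeholder for the margin-based critical instance selection.
            intervals[pool.pop(0)] = (1, num_classes)
        for inst in list(intervals):
            if spent >= budget:
                break
            low, high = intervals[inst]
            # Placeholder partner choice: a labeled instance of the middle
            # class (assumes the seed set covers every class).
            ref_class = (low + high) // 2
            answer = oracle(hidden_labels[inst], ref_class)
            intervals[inst] = update_interval((low, high), ref_class, answer)
            spent += 1
            low, high = intervals[inst]
            if low == high:                    # interval size 1: label known
                known[inst] = low
                del intervals[inst]
            # A full implementation would retrain the base learner here.
    return known, spent


seed = {"s1": 1, "s2": 2, "s3": 3, "s4": 4, "s5": 5}
hidden = {"u1": 2, "u2": 5, "u3": 3}
known, spent = run_pairwise_queries(seed, hidden, num_classes=5, budget=10)
```

With this toy input all three pool instances are resolved after five pairwise queries, so the remaining budget is left unused.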

3.2 Ordinal classification based on class interval-labeled instances

Before introducing how to use the class interval-labeled instances to train an OC model, we first instantiate an OC model based on the reduction-based framework [14] introduced in Section 2.3. As mentioned in Section 2.3, the $g(\mathbf{x})$ can be any well-developed binary prediction model. Therefore, we employ a kernel extreme learning machine (KELM) model to instantiate the reduction-based ordinal classification framework. The KELM model is simple to implement and has comparable performance to the SVM model [34].

An extreme learning machine is a generalized single-hidden-layer feedforward neural network where the hidden layer need not be tuned, and we can formulate it as

$\displaystyle g(\mathbf{x})=\mathbf{h}(\mathbf{x})\beta=\mathbf{h}(\mathbf{x})\mathbf{H}^{T}\left(\frac{1}{C}\mathbf{I}+\mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T},$ (9)

where the $\beta=[\beta_{1},\cdots,\beta_{L}]^{T}$ is the vector of the output weights between the hidden layer of $L$ nodes and the output node, $\mathbf{h}(\mathbf{x})=[h_{1}(\mathbf{x}),\cdots,h_{L}(\mathbf{x})]$ is the output vector of the hidden layer about input $\mathbf{x}$ , $\mathbf{H}=[\mathbf{h}(\mathbf{x}_{1}),\cdots,\mathbf{h}(\mathbf{x}_{n})]^{T}$ , $\mathbf{I}$ is an identity matrix, and $\mathbf{T}=[y_{1},\cdots,y_{n}]^{T}$ . For binary classification, the decision function is

$\displaystyle\hat{y}=\textit{sign}(\mathbf{h}(\mathbf{x})\beta)$ (10)

Following the work in [34], the kernel-based extreme learning machine model takes the hidden layer as an unknown feature mapping $\mathbf{h}(\mathbf{x})$ from the input space $\mathbb{R}^{d}$ to a feature space. Thus, Mercer’s condition can be applied and a kernel matrix can be defined as $\mathbf{K}=\mathbf{H}\mathbf{H}^{T}$ , with $\mathbf{h}(\mathbf{x}_{i})\mathbf{h}(\mathbf{x}_{j})^{T}=\mathcal{K}(\mathbf{x}_{i},\mathbf{x}_{j})$ , where $\mathcal{K}(\mathbf{x}_{i},\mathbf{x}_{j})$ is a kernel function. Then, the output weight vector $\beta$ can be computed as

$\displaystyle\beta=\left(\frac{1}{C}\mathbf{I}+\mathbf{K}\right)^{-1}\mathbf{T},$ (11)

and the predictive output can be formalized as

$\displaystyle g(\mathbf{x})=\sum\limits_{i=1}^{n}\beta_{i}\mathcal{K}(\mathbf{x}_{i},\mathbf{x})=\mathbf{k}(\mathbf{x})\beta=\mathbf{k}(\mathbf{x})\left(\frac{1}{C}\mathbf{I}+\mathbf{K}\right)^{-1}\mathbf{T},$ (12)

where $\mathbf{k}(\mathbf{x})=[\mathcal{K}(\mathbf{x}_{1},\mathbf{x}),\cdots,\mathcal{K}(\mathbf{x}_{n},\mathbf{x})]\in\mathbb{R}^{1\times n}$ .
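As a minimal sketch of Eqs. (11) and (12), the following code fits and evaluates a KELM on a toy binary problem. The function names, the Gaussian stand-in kernel, and the small pure-Python linear solver are illustrative assumptions, not the authors' implementation; the solver is included only to keep the example dependency-free.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix, copied so that the inputs are not modified.
    M = [list(map(float, A[r])) + [float(b[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel, used here only as a stand-in Mercer kernel."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def kelm_fit(X, T, C=10.0, kernel=rbf_kernel):
    """Eq. (11): beta = (I / C + K)^{-1} T."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += 1.0 / C
    return solve_linear(K, T)


def kelm_predict(x, X, beta, kernel=rbf_kernel):
    """Eq. (12): g(x) = sum_i beta_i * K(x_i, x)."""
    return sum(b * kernel(xi, x) for b, xi in zip(beta, X))


X = [[0.0], [1.0], [3.0], [4.0]]
T = [-1.0, -1.0, 1.0, 1.0]
beta = kelm_fit(X, T, C=10.0)
print(kelm_predict([0.0], X, beta), kelm_predict([4.0], X, beta))
```

The sign of the returned value gives the binary decision of Eq. (10). For realistic training set sizes, a numerical library routine for solving linear systems would replace `solve_linear`.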

We can instantiate the reduction-based OC framework by plugging the $g(\mathbf{x})$ in Eq. (12) into Eq. (3). Thus, for an unobserved instance $\mathbf{x}_{i}\in\mathcal{U}$ , we can obtain

$\displaystyle\begin{aligned}g(\mathbf{x}_{i}^{k_{1}})&=\bar{\mathcal{W}}^{T}(\phi(\mathbf{x}_{i}),\mathbf{e}_{k_{1}})\\ &=(\mathcal{W},-\theta)^{T}(\phi(\mathbf{x}_{i}),\mathbf{e}_{k_{1}})\\ &=\sum\limits_{j=1}^{n}\sum\limits_{k_{2}=1}^{K-1}\beta_{j}^{k_{2}}\mathcal{K}(\mathbf{x}_{i}^{k_{1}},\mathbf{x}_{j}^{k_{2}})\\ &=\sum\limits_{j=1}^{n}\sum\limits_{k_{2}=1}^{K-1}\beta_{j}^{k_{2}}(\phi(\mathbf{x}_{i})^{T}\phi(\mathbf{x}_{j})+\mathbf{e}_{k_{1}}^{T}\mathbf{e}_{k_{2}}),\quad k_{1}=1,\cdots,K-1,\end{aligned}$ (13)

where $(\phi(\mathbf{x}_{i})^{T}\phi(\mathbf{x}_{j})+\mathbf{e}_{k_{1}}^{T}\mathbf{e}_{k_{2}})$ is a resultant kernel [35, 10]. In this paper, we consider the perceptron kernel [35], i.e., $\phi(\mathbf{x}_{i})^{T}\phi(\mathbf{x}_{j})=-\|\mathbf{x}_{i}-\mathbf{x}_{j}\|_{2}$ . Here, $g(\mathbf{x}_{i}^{k_{1}})$ can be interpreted as the signed distance from $\mathbf{x}_{i}$ to the $k_{1}$ -th decision hyperplane, so it may take a negative value. Then, the ordinal label of instance $\mathbf{x}_{i}$ can be calculated based on Eq. (3). We refer to the above ordinal classification model as the RKELM model.
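The resultant kernel can be sketched in a few lines. The function names below are hypothetical; the only facts taken from the text are the perceptron kernel $-\|\mathbf{x}_{i}-\mathbf{x}_{j}\|_{2}$ and the additive term $\mathbf{e}_{k_{1}}^{T}\mathbf{e}_{k_{2}}$ , which equals 1 when the two extended instances address the same threshold and 0 otherwise because each $\mathbf{e}_{k}$ is a one-hot vector.

```python
import math


def perceptron_kernel(a, b):
    """Perceptron kernel as written in the text: -||a - b||_2."""
    return -math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def resultant_kernel(x_i, k1, x_j, k2):
    """Kernel between extended instances (x_i, e_{k1}) and (x_j, e_{k2})."""
    same_threshold = 1.0 if k1 == k2 else 0.0  # e_{k1}^T e_{k2}
    return perceptron_kernel(x_i, x_j) + same_threshold


print(resultant_kernel([0.0, 0.0], 1, [3.0, 4.0], 1))
print(resultant_kernel([0.0, 0.0], 1, [3.0, 4.0], 2))
```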

In order to train the RKELM model using class interval-labeled training instances, we replace the original extended binary training instance generation rule in Eq. (2.3) with a new rule. Suppose $\langle\mathbf{x}_{i},[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]\rangle$ is a class interval-labeled training instance. Then the extended binary training instances are generated as follows

$\displaystyle\langle\mathbf{x}_{i}^{k},y_{i}^{k}\rangle,\quad k\in\left\{\begin{array}{ll}\{R_{i},\cdots,K-1\},&\text{if }L_{i}=1\wedge R_{i}<K\\ \{1,\cdots,L_{i}-1\},&\text{if }1<L_{i}\wedge R_{i}=K\\ \{1,\cdots,L_{i}-1,R_{i},\cdots,K-1\},&\text{if }1<L_{i}\wedge R_{i}<K\\ \end{array}\right.,$ (14)

$\displaystyle\mathbf{x}_{i}^{k}=(\mathbf{x}_{i},\mathbf{e}_{k})\in\mathbb{R}^{d+K-1},\qquad y_{i}^{k}=\left\{\begin{array}{ll}-1,&\mathcal{C}_{R_{i}}\prec\mathcal{C}_{k+1}\\ +1,&\mathcal{C}_{k}\prec\mathcal{C}_{L_{i}}\\ \end{array}\right..$

Instead of generating $K-1$ binary instances for each training instance $\langle\mathbf{x}_{i},y_{i}\rangle$ , the RKELM model with the new rule generates $K-IS(\mathbf{x}_{i})$ binary instances for each class interval-labeled training instance $\langle\mathbf{x}_{i},[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]\rangle$ . Table 1 shows examples of extended binary instance generation for an instance associated with different class intervals. In the table, “ $y_{i}^{k}=?$ ” denotes that the binary label of $\mathbf{x}_{i}^{k}$ is undetermined, and the instance $\mathbf{x}_{i}^{k}$ is therefore not included in the extended binary training set. The complexity of the RKELM model is $\mathcal{O}(m^{3}(K-1)^{3})$ , where $m=|\mathcal{L}_{I}|$ is the number of class interval-labeled training instances.

Table 1

Examples of extended binary instances generation for an instance under different class intervals ( $K=5$ )

$\langle\mathbf{x}_{i},I(\mathbf{x}_{i})\rangle$	$[\mathcal{C}_{1},\mathcal{C}_{1}]$	$[\mathcal{C}_{2},\mathcal{C}_{2}]$	$[\mathcal{C}_{3},\mathcal{C}_{3}]$	$[\mathcal{C}_{4},\mathcal{C}_{4}]$	$[\mathcal{C}_{5},\mathcal{C}_{5}]$	$[\mathcal{C}_{1},\mathcal{C}_{3}]$	$[\mathcal{C}_{2},\mathcal{C}_{3}]$	$[\mathcal{C}_{3},\mathcal{C}_{5}]$
( $\mathbf{x}_{i},1,0,0,0$ )	$y_{i}^{1}=-1$	$y_{i}^{1}=+1$	$y_{i}^{1}=+1$	$y_{i}^{1}=+1$	$y_{i}^{1}=+1$	$y_{i}^{1}=?$	$y_{i}^{1}=+1$	$y_{i}^{1}=+1$
( $\mathbf{x}_{i},0,1,0,0$ )	$y_{i}^{2}=-1$	$y_{i}^{2}=-1$	$y_{i}^{2}=+1$	$y_{i}^{2}=+1$	$y_{i}^{2}=+1$	$y_{i}^{2}=?$	$y_{i}^{2}=?$	$y_{i}^{2}=+1$
( $\mathbf{x}_{i},0,0,1,0$ )	$y_{i}^{3}=-1$	$y_{i}^{3}=-1$	$y_{i}^{3}=-1$	$y_{i}^{3}=+1$	$y_{i}^{3}=+1$	$y_{i}^{3}=-1$	$y_{i}^{3}=-1$	$y_{i}^{3}=?$
( $\mathbf{x}_{i},0,0,0,1$ )	$y_{i}^{4}=-1$	$y_{i}^{4}=-1$	$y_{i}^{4}=-1$	$y_{i}^{4}=-1$	$y_{i}^{4}=+1$	$y_{i}^{4}=-1$	$y_{i}^{4}=-1$	$y_{i}^{4}=?$
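The generation rule in Eq. (14) can be sketched as follows. The function name and the representation of classes by their 1-based ranks are assumptions made for illustration; the example calls reproduce three columns of Table 1.

```python
def extended_binary_instances(x, left, right, num_classes):
    """Return [(x_k, y_k)] for the class interval [C_left, C_right].

    x_k is x followed by the one-hot vector e_k of length K - 1.
    Thresholds k with left <= k < right are skipped because their
    binary label is undetermined.
    """
    if not 1 <= left <= right <= num_classes:
        raise ValueError("need 1 <= left <= right <= num_classes")
    instances = []
    for k in range(1, num_classes):
        if k < left:
            y = +1      # C_k precedes C_left
        elif k >= right:
            y = -1      # C_right precedes C_{k+1}
        else:
            continue    # undetermined, left out of the training set
        one_hot = [1 if j == k else 0 for j in range(1, num_classes)]
        instances.append((list(x) + one_hot, y))
    return instances


for interval in [(2, 3), (1, 3), (3, 5)]:
    labels = [y for _, y in extended_binary_instances([0.5], *interval, 5)]
    print(interval, labels)
```

Each call returns $K-IS(\mathbf{x}_{i})$ binary instances, so an instance whose interval covers all classes contributes nothing to the extended training set.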

For an unlabeled instance $\mathbf{x}_{i}\in\mathcal{U}$ associated with a class interval $[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]$ , the cumulative probabilities are computed as

$\displaystyle P(y_{i}\prec\mathcal{C}_{k+1}|\mathbf{x}_{i})=\frac{1}{1+\exp(g(\mathbf{x}_{i}^{k}))},\quad L_{i}\leqslant k\leqslant R_{i}-1.$ (15)

Accordingly, we can obtain the posterior probabilities of $\mathbf{x}_{i}\in\mathcal{U}$ by decomposing the cumulative probabilities as follows

$\displaystyle P(y_{i}=\mathcal{C}_{k}|\mathbf{x}_{i})=\left\{\begin{array}{ll}P(y_{i}\prec\mathcal{C}_{k+1}|\mathbf{x}_{i}),&k=L_{i}\\ P(y_{i}\prec\mathcal{C}_{k+1}|\mathbf{x}_{i})-P(y_{i}\prec\mathcal{C}_{k}|\mathbf{x}_{i}),&L_{i}<k<R_{i}\\ 1-P(y_{i}\prec\mathcal{C}_{k}|\mathbf{x}_{i}),&k=R_{i}\\ \end{array}\right..$ (16)
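A minimal sketch of this decomposition is given below. The function name and the dictionary-based interface are assumptions, and returning probability 1 for a single-class interval is a convenience choice that the equations above do not cover. The differences are non-negative only if the model outputs $g(\mathbf{x}_{i}^{k})$ do not increase with $k$ , which the sketch does not enforce.

```python
import math


def interval_posteriors(g_values, left, right):
    """Posterior P(y = C_k | x) for k = left..right.

    g_values maps each threshold k in left..right-1 to the model
    output g(x^k). Returns a dict {k: probability}.
    """
    if left == right:
        return {left: 1.0}  # the label is already known
    # Cumulative probability P(y < C_{k+1} | x) for each threshold k.
    cum = {k: 1.0 / (1.0 + math.exp(g_values[k])) for k in range(left, right)}
    post = {}
    for k in range(left, right + 1):
        if k == left:
            post[k] = cum[k]
        elif k < right:
            post[k] = cum[k] - cum[k - 1]
        else:
            post[k] = 1.0 - cum[k - 1]
    return post


post = interval_posteriors({2: 2.0, 3: 0.0, 4: -2.0}, left=2, right=5)
print(post)
```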

3.3 Query pair selection

In our method, each query pair is built with a critical unlabeled instance and a labeled instance. Therefore, in what follows, we will describe how to choose the two instances separately.

3.3.1 Critical unlabeled instance selection

Uncertainty sampling is one of the most commonly used supervised AL strategies and is among the top performers in earlier extensive benchmark experiments [36]. In ordinal classification, the hard-to-predict instances are usually located between adjacent classes. Therefore, we prefer to select the instance closest to the decision boundary. For an unlabeled instance $\mathbf{x}_{i}\in\mathcal{U}$ , the distance from it to the $k$ -th decision boundary in the RKELM model can be computed as

$\displaystyle\Delta(\mathbf{x}_{i},\theta_{k})=|(\mathcal{W},-\theta)^{T}(\phi(\mathbf{x}_{i}),\mathbf{e}_{k})|=|g(\mathbf{x}_{i}^{k})|,\quad k=1,\cdots,K-1,$ (17)

where $|x|$ denotes the absolute value of $x$ . Since there are $K-1$ decision boundaries (thresholds), we define the critical instance as the instance with the smallest distance to its nearest decision boundary. We can formulate the critical instance as

$\displaystyle\mathbf{x}^{\ast}={\arg\min}_{\mathbf{x}_{i}\in\mathcal{U}}\;{\min}_{k\in\{1,\cdots,K-1\}}\Delta(\mathbf{x}_{i},\theta_{k}).$ (18)
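The selection in Eq. (18) amounts to a nested minimum over instances and thresholds. In the sketch below, the function name and the list-of-lists layout of the model outputs are assumptions.

```python
def critical_instance(g_outputs):
    """Index of the unlabeled instance closest to any threshold.

    g_outputs[i] holds g(x_i^k) for k = 1..K-1 of unlabeled instance i.
    """
    def margin(i):
        # Distance from instance i to its nearest decision boundary.
        return min(abs(g) for g in g_outputs[i])

    return min(range(len(g_outputs)), key=margin)


g_outputs = [
    [2.0, -1.5, -3.0],   # nearest boundary at distance 1.5
    [0.9, -0.2, -2.1],   # nearest boundary at distance 0.2
    [1.2, 0.6, -0.4],    # nearest boundary at distance 0.4
]
print(critical_instance(g_outputs))
```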

It is worth noting that this paper focuses on proposing a pairwise query active learning paradigm for ordinal classification rather than a pointwise query active instance selection method. Therefore, we do not design a complex active instance selection method here. If the ordinal data contain incomplete or noisy values, one can use the AL method in our previous work [37] to select critical instances.

3.3.2 Labeled instance selection

This subsection solves the problem of how to effectively select a labeled instance to pair with a critical unlabeled instance. We aim to obtain more labeled instances by fewer pairwise queries. Therefore, for each query pair building, we attempt to reduce the class interval of the critical instance as much as possible through each information acquisition and reasoning. Since the reasoning results are affected directly by the class of the labeled instance, the specific task of labeled instance selection is to determine which class of labeled instance is more suitable for pairing with the critical instance.

To reduce the class interval of the critical instance, the class of the labeled instance in the query pair must lie within the class interval of the critical instance. Suppose $[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]$ is the class interval of the critical instance $\mathbf{x}_{i}\in\mathcal{U}$ . When there is no additional information, it is reasonable to pair $\mathbf{x}_{i}$ with a labeled instance $\mathbf{x}_{j}$ that belongs to the median class of $[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]$ , i.e., $\mathcal{C}_{\lfloor(L_{i}+R_{i})/2\rfloor}$ . We refer to this method as the median-class assignment, which requires $\mathcal{O}(\lfloor\log_{2}IS(\mathbf{x}_{i})\rfloor)$ queries to obtain the explicit label of $\mathbf{x}_{i}$ . Fortunately, the posterior estimates derived from the base learner can be used to guide the construction of query pairs. Therefore, in the following, we analyze this issue from a decision-theoretic view and propose an expected cost minimization strategy.

Let $P(y_{i}=\mathcal{C}_{l}|\mathbf{x}_{i})$ (abbreviated as $P_{l}^{i}$ ) denote the posterior estimate of a critical instance $\mathbf{x}_{i}$ , which is computed by Eqs. (15) and (16), where $l=L_{i},\cdots,R_{i}$ . Let $IS^{\prime}(\mathbf{x}_{i})$ be the class interval size of $\mathbf{x}_{i}$ after one pairwise query and reasoning. As mentioned above, the smaller $IS^{\prime}(\mathbf{x}_{i})$ is, the greater the contribution of the class interval-labeled training instance $\langle\mathbf{x}_{i},[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]\rangle$ . Therefore, in order to use each pairwise query effectively, we should minimize the value of $IS^{\prime}(\mathbf{x}_{i})$ . We term this idea the expected class interval minimization (EIM). We can compute the expectation of $IS^{\prime}(\mathbf{x}_{i})$ when pairing $\mathbf{x}_{i}$ with labeled instances of different classes as follows

$\displaystyle\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{k})=\left\{\begin{array}{ll}P_{L_{i}}^{i}+\sum\limits_{l=L_{i}+1}^{R_{i}}(R_{i}-L_{i})P_{l}^{i},&k=L_{i}\\ \sum\limits_{l=L_{i}}^{k-1}(k-L_{i})P_{l}^{i}+P_{k}^{i}+\sum\limits_{l=k+1}^{R_{i}}(R_{i}-k)P_{l}^{i},&L_{i}<k<R_{i}\\ \sum\limits_{l=L_{i}}^{R_{i}-1}(R_{i}-L_{i})P_{l}^{i}+P_{R_{i}}^{i},&k=R_{i}\\ \end{array}\right.$ (19)

where $y_{j}$ is the label of labeled instance $\mathbf{x}_{j}$ . Based on Eq. (19), we can derive the following theorem.

Theorem.

Consider a query pair $(\mathbf{x}_{i},\mathbf{x}_{j})$ with $\mathbf{x}_{i}\in\mathcal{U}$ and $\mathbf{x}_{j}\in\mathcal{L}$ . Suppose $I(\mathbf{x}_{i})=[\mathcal{C}_{L_{i}},\mathcal{C}_{R_{i}}]$ with $IS(\mathbf{x}_{i})\geqslant 3$ . Then, the two inequalities $\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{L_{i}})\geqslant\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{L_{i}+1})$ and $\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{R_{i}})\geqslant\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{R_{i}-1})$ always hold.

Proof.

According to Eq. (19), we have $\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{L_{i}})-\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{L_{i}+1})=(R_{i}-L_{i}-1)P_{L_{i}+1}^{i}+\sum\limits_{l=L_{i}+2}^{R_{i}}P_{l}^{i}\geqslant 0$ , with equality when $P_{L_{i}}^{i}=1$ . Similarly, we have $\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{R_{i}})-\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{R_{i}-1})=\sum\limits_{l=L_{i}}^{R_{i}-2}P_{l}^{i}+(R_{i}-L_{i}-1)P_{R_{i}-1}^{i}\geqslant 0$ , with equality when $P_{R_{i}}^{i}=1$ . ∎

According to the above analysis, we can restrict the class of the labeled instance to $[\mathcal{C}_{L_{i}+1},\mathcal{C}_{R_{i}-1}]$ , where $L_{i}+1\leqslant R_{i}-1$ , which gives a better chance of significantly reducing the class interval of $\mathbf{x}_{i}\in\mathcal{U}$ . Therefore, Eq. (19) can be reduced to the following formula

$\displaystyle\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{k})=\sum\limits_{l=L_{i}}^{k-1}(k-L_{i})P_{l}^{i}+P_{k}^{i}+\sum\limits_{l=k+1}^{R_{i}}(R_{i}-k)P_{l}^{i},\quad L_{i}+1\leqslant k\leqslant R_{i}-1.$ (20)

Based on the EIM policy, the class that produces the minimal value of $\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{k})$ is the target class. To illustrate Eq. (20), Table 2 presents an example of calculating $\textit{EIS}(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{k})$ . In this example, the class interval of $\mathbf{x}_{i}\in\mathcal{U}$ is $[\mathcal{C}_{1},\mathcal{C}_{6}]$ .

Table 2

Decision table for EIM ( $I(\mathbf{x}_{i})=[\mathcal{C}_{1},\mathcal{C}_{6}]$ )

$(\mathbf{x}_{i},\mathbf{x}_{j})$	$y_{j}=\mathcal{C}_{2}$	$y_{j}=\mathcal{C}_{3}$	$y_{j}=\mathcal{C}_{4}$	$y_{j}=\mathcal{C}_{5}$
$y_{i}=\mathcal{C}_{1}$	$P_{1}^{i}$	$2P_{1}^{i}$	$3P_{1}^{i}$	$4P_{1}^{i}$
$y_{i}=\mathcal{C}_{2}$	$P_{2}^{i}$	$2P_{2}^{i}$	$3P_{2}^{i}$	$4P_{2}^{i}$
$y_{i}=\mathcal{C}_{3}$	$4P_{3}^{i}$	$P_{3}^{i}$	$3P_{3}^{i}$	$4P_{3}^{i}$
$y_{i}=\mathcal{C}_{4}$	$4P_{4}^{i}$	$3P_{4}^{i}$	$P_{4}^{i}$	$4P_{4}^{i}$
$y_{i}=\mathcal{C}_{5}$	$4P_{5}^{i}$	$3P_{5}^{i}$	$2P_{5}^{i}$	$P_{5}^{i}$
$y_{i}=\mathcal{C}_{6}$	$4P_{6}^{i}$	$3P_{6}^{i}$	$2P_{6}^{i}$	$P_{6}^{i}$
$\textit{EIS}(\mathbf{x}_{i}\mid\mathbf{x}_{j};y_{j}=\mathcal{C}_{k})$	$P_{1}^{i}+P_{2}^{i}+4\sum\limits_{l=3}^{6}P_{l}^{i}$	$2\sum\limits_{l=1}^{2}P_{l}^{i}+P_{3}^{i}+3\sum\limits_{l=4}^{6}P_{l}^{i}$	$3\sum\limits_{l=1}^{3}P_{l}^{i}+P_{4}^{i}+2\sum\limits_{l=5}^{6}P_{l}^{i}$	$4\sum\limits_{l=1}^{4}P_{l}^{i}+P_{5}^{i}+P_{6}^{i}$
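The bottom row of Table 2 can be checked numerically with a short sketch of the EIM computation for interior classes. The function names and the posterior values are made up for illustration.

```python
def expected_interval_size(post, left, right, k):
    """Expected interval size after pairing x_i with a class-C_k instance.

    post maps each class rank l in left..right to P(y_i = C_l | x_i).
    Only interior classes left < k < right are handled.
    """
    if not left < k < right:
        raise ValueError("k must lie strictly inside the interval")
    below = sum(post[l] for l in range(left, k))
    above = sum(post[l] for l in range(k + 1, right + 1))
    return (k - left) * below + post[k] + (right - k) * above


def eim_class(post, left, right):
    """Class rank that minimizes the expected interval size."""
    return min(range(left + 1, right),
               key=lambda k: expected_interval_size(post, left, right, k))


# Hypothetical posterior over the interval [C_1, C_6].
post = {1: 0.05, 2: 0.05, 3: 0.1, 4: 0.2, 5: 0.4, 6: 0.2}
for k in range(2, 6):
    print(k, expected_interval_size(post, 1, 6, k))
print("EIM choice:", eim_class(post, 1, 6))
```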

It is intuitive and reasonable for the EIM strategy to reduce the class interval of the current critical instance as much as possible with the current pairwise query. However, it fails to consider the subsequent query costs when the explicit label is not obtained in the current pairwise query. To overcome this shortcoming, we replace the class interval size in Eq. (20) with the query cost and endow the equation with a new interpretation.

The query cost of obtaining an explicit label comprises a current cost and future costs. Here, we define the cost as the consumption of the pairwise query budget. Let $\mathbf{x}_{i}\in\mathcal{U}$ be a critical instance, and $IS^{\prime}(\mathbf{x}_{i})$ be the class interval size of $\mathbf{x}_{i}$ after one pairwise query and reasoning. If $IS^{\prime}(\mathbf{x}_{i})=1$ , i.e., the explicit label of $\mathbf{x}_{i}$ is obtained, only one pairwise query is consumed and there are no future costs. If $IS^{\prime}(\mathbf{x}_{i})>1$ , i.e., we fail to obtain the label of $\mathbf{x}_{i}$ in the current pairwise query, one query has already been consumed (the current cost), and one or more further pairwise queries (the future costs) are still needed in the following iterations. However, the future costs are unknown; the only available information in this situation is the value of $IS^{\prime}(\mathbf{x}_{i})$ . Therefore, we approximately estimate the future cost as $\lfloor\log_{2}IS^{\prime}(\mathbf{x}_{i})\rfloor$ , i.e., the cost of the aforementioned median-class assignment method. When $1<IS^{\prime}(\mathbf{x}_{i})\leqslant 3$ , the future cost is 1, which is consistent with the value of $\lfloor\log_{2}IS^{\prime}(\mathbf{x}_{i})\rfloor$ . Based on the above analysis, Eq. (20) can be rewritten as follows

$\displaystyle EC(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{k})=\left\{\begin{array}{ll}P^{i}_{L_{i}}+P^{i}_{L_{i}+1}+\sum\limits_{l=L_{i}+2}^{R_{i}}P^{i}_{l}(1+\lfloor\log_{2}(IS(\mathbf{x}_{i})-2)\rfloor),&k=L_{i}+1\\ \sum\limits_{l=L_{i}}^{k-1}P^{i}_{l}(1+\lfloor\log_{2}(k-L_{i})\rfloor)+P^{i}_{k}+\sum\limits_{l=k+1}^{R_{i}}P^{i}_{l}(1+\lfloor\log_{2}(R_{i}-k)\rfloor),&L_{i}+1<k<R_{i}-1\\ \sum\limits_{l=L_{i}}^{R_{i}-2}P^{i}_{l}(1+\lfloor\log_{2}(IS(\mathbf{x}_{i})-2)\rfloor)+P^{i}_{R_{i}-1}+P^{i}_{R_{i}},&k=R_{i}-1\\ \end{array}\right..$ (21)

Table 3

Decision table for ECM ( $I(\mathbf{x}_{i})=[\mathcal{C}_{1},\mathcal{C}_{6}]$ )

$(\mathbf{x}_{i},\mathbf{x}_{j})$	$y_{j}=\mathcal{C}_{2}$	$y_{j}=\mathcal{C}_{3}$	$y_{j}=\mathcal{C}_{4}$	$y_{j}=\mathcal{C}_{5}$
$y_{i}=\mathcal{C}_{1}$	$P_{1}^{i}$	$P_{1}^{i}(1+\lfloor\log_{2}2\rfloor)$	$P_{1}^{i}(1+\lfloor\log_{2}3\rfloor)$	$P_{1}^{i}(1+\lfloor\log_{2}4\rfloor)$
$y_{i}=\mathcal{C}_{2}$	$P_{2}^{i}$	$P_{2}^{i}(1+\lfloor\log_{2}2\rfloor)$	$P_{2}^{i}(1+\lfloor\log_{2}3\rfloor)$	$P_{2}^{i}(1+\lfloor\log_{2}4\rfloor)$
$y_{i}=\mathcal{C}_{3}$	$P_{3}^{i}(1+\lfloor\log_{2}4\rfloor)$	$P_{3}^{i}$	$P_{3}^{i}(1+\lfloor\log_{2}3\rfloor)$	$P_{3}^{i}(1+\lfloor\log_{2}4\rfloor)$
$y_{i}=\mathcal{C}_{4}$	$P_{4}^{i}(1+\lfloor\log_{2}4\rfloor)$	$P_{4}^{i}(1+\lfloor\log_{2}3\rfloor)$	$P_{4}^{i}$	$P_{4}^{i}(1+\lfloor\log_{2}4\rfloor)$
$y_{i}=\mathcal{C}_{5}$	$P_{5}^{i}(1+\lfloor\log_{2}4\rfloor)$	$P_{5}^{i}(1+\lfloor\log_{2}3\rfloor)$	$P_{5}^{i}(1+\lfloor\log_{2}2\rfloor)$	$P_{5}^{i}$
$y_{i}=\mathcal{C}_{6}$	$P_{6}^{i}(1+\lfloor\log_{2}4\rfloor)$	$P_{6}^{i}(1+\lfloor\log_{2}3\rfloor)$	$P_{6}^{i}(1+\lfloor\log_{2}2\rfloor)$	$P_{6}^{i}$
$EC(\mathbf{x}_{i}\mid\mathbf{x}_{j};y_{j}=\mathcal{C}_{k})$	$P_{1}^{i}+P_{2}^{i}+3\sum\limits_{l=3}^{6}P_{l}^{i}$	$2\sum\limits_{l=1}^{2}P_{l}^{i}+P_{3}^{i}+2\sum\limits_{l=4}^{6}P_{l}^{i}$	$2\sum\limits_{l=1}^{3}P_{l}^{i}+P_{4}^{i}+2\sum\limits_{l=5}^{6}P_{l}^{i}$	$3\sum\limits_{l=1}^{4}P_{l}^{i}+P_{5}^{i}+P_{6}^{i}$

The new equation can be interpreted as the expected cost of obtaining $\mathbf{x}_{i}$ ’s explicit label. Therefore, we can determine the class of the labeled instance by minimizing $EC(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{k})$ . We refer to this method as expected cost minimization (ECM). For ease of understanding, Table 3 illustrates an example of calculating the expected cost. In this example, the class interval of $\mathbf{x}_{i}\in\mathcal{U}$ is $[\mathcal{C}_{1},\mathcal{C}_{6}]$ .

Based on the above analysis, the class of the labeled instance in a query pair can be determined as follows

$\displaystyle\mathcal{C}^{\ast}={\arg\min}_{\mathcal{C}_{k}\in\{\mathcal{C}_{L_{i}+1},\cdots,\mathcal{C}_{R_{i}-1}\}}EC(\mathbf{x}_{i}|\mathbf{x}_{j};y_{j}=\mathcal{C}_{k}).$ (22)

Then, one of the labeled instances that belongs to class $\mathcal{C}^{\ast}$ is selected to pair with the critical instance $\mathbf{x}_{i}$ .
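The ECM rule can be sketched in the same way as the worked example in Table 3: each outcome costs one query now plus $\lfloor\log_{2}(\cdot)\rfloor$ estimated future queries on the remaining interval. The function names and the posterior values below are illustrative assumptions; ties are broken in favor of the lowest class rank, which the text does not specify.

```python
def expected_cost(post, left, right, k):
    """Expected number of pairwise queries when pairing with class C_k.

    post maps each class rank l in left..right to P(y_i = C_l | x_i).
    """
    if not left < k < right:
        raise ValueError("k must lie strictly inside the interval")

    def cost(size):
        # One query now plus floor(log2(size)) estimated future queries;
        # for a positive integer, bit_length() - 1 equals floor(log2).
        return 1 + (size.bit_length() - 1)

    below = sum(post[l] for l in range(left, k))
    above = sum(post[l] for l in range(k + 1, right + 1))
    return cost(k - left) * below + post[k] + cost(right - k) * above


def ecm_class(post, left, right):
    """Class rank of the labeled instance that minimizes the expected cost."""
    return min(range(left + 1, right),
               key=lambda k: expected_cost(post, left, right, k))


# Hypothetical posterior over the interval [C_1, C_6].
post = {1: 0.05, 2: 0.05, 3: 0.1, 4: 0.5, 5: 0.2, 6: 0.1}
for k in range(2, 6):
    print(k, expected_cost(post, 1, 6, k))
print("ECM choice:", ecm_class(post, 1, 6))
```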

3.4 Computational complexity

Let $N=|\mathcal{T}\cup\mathcal{U}|$ be the number of all instances and $K$ be the number of classes. Suppose that, in the current iteration, the size of the training set is $n=|\mathcal{T}|$ and the size of the class interval-labeled training set is $m=|\mathcal{L}_{I}|$ . We analyze the computational complexity of one pairwise query in terms of base learner training, critical instance selection, and labeled instance selection.

The time complexity of training the base learner, i.e., the RKELM model, is $\mathcal{O}(m^{3}(K-1)^{3})$ . Selecting the critical unlabeled instance for the query pair requires the following two operations: (1) calculating $g(\mathbf{x}_{i})$ in Eq. (13) for all the instances in $\mathcal{U}$ , which requires $\mathcal{O}((N-m)^{2}(K-1)^{2})$ time; (2) obtaining the critical unlabeled instance based on Eq. (18), which requires $\mathcal{O}((N-m)(K-1))$ time. Therefore, the time complexity of selecting a critical unlabeled instance is $\mathcal{O}((N-m)^{2}(K-1)^{2})$ . For the labeled instance selection, the main computational cost is the calculation of the expected cost with Eq. (21), which requires $\mathcal{O}(K^{2})$ time in the worst case.

In summary, assuming $N\gg m$ , the time complexity of performing one pairwise query is $\mathcal{O}(N^{2}K^{2}+m^{3}K^{3})$ .

4. Experiment

In this section, we empirically study the performance of the proposed pairwise query active ordinal classification method. Subsection 4.1 describes the datasets and the experimental configuration. The experimental results are reported in Subsection 4.2. Finally, we discuss the feasibility conditions of the proposed method in Subsection 4.3. All the experiments are run on a Windows 10 64-bit operating system with 32 GB RAM and an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz processor. The programming language is Python. The source code is available at https://github.com/DeniuHe/AOCpair.

4.1 Datasets and experimental configuration

Table 4
Information of the used datasets

No.	Datasets	#Instances	#Features	#Classes	Distribution
1	SWD [30]	1000	10	4	$[32,352,399,217]$
2	Car [30]	1728	21	4	$[1210,384,69,65]$
3	Automobile [30]	205	71	5	$[25,67,54,32,27]$
4	Cleveland ${}^{1}$	297	13	5	$[160,54,35,35,13]$
5	Housing-5bin [30]	506	13	5	$[102,101,101,101,101]$
6	Stock-5bin [30]	950	9	5	$[190,190,190,190,190]$
7	Computer-5bin [30]	8192	12	5	$[1639,1639,1638,1638,1638]$
8	Winequality-red [30]	1599	11	6	$[10,53,681,638,199,18]$
9	Obesity ${}^{2}$	2111	29	7	$[272,287,290,290,351,297,324]$
10	Housing-10bin [30]	506	13	10	$[51,51,51,51,51,51,50,50,50,50]$
11	Stock-10bin [30]	950	9	10	$[95,95,95,95,95,95,95,95,95,95]$
12	Computer-10bin [30]	8192	12	10	$[820,820,819,819,819,819,819,819,819,819]$

${}^{1}$ https://sci2s.ugr.es/keel/datasets.php. ${}^{2}$ https://archive.ics.uci.edu/ml/index.php.

Table 4 summarizes the information of the twelve used datasets. The datasets Cleveland and Obesity are from the KEEL dataset repository and the UCI dataset repository, respectively. The other ten datasets are from reference [30]. Since a part of the attributes in Obesity are nominal attributes, we have preprocessed those nominal attributes using one-hot encoding. Before experiments, all the datasets are standardized by the following Z-score:

$\displaystyle x_{ij}=\frac{x_{ij}-\textit{mean}(x_{j})}{\textit{std}(x_{j})},$ (23)

where $x_{ij}$ denotes the $j$ -th attribute value of instance $\mathbf{x}_{i}$ , and $\textit{mean}(x_{j})$ and $\textit{std}(x_{j})$ are the mean value and the standard deviation of the $j$ -th attribute, respectively.
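Eq. (23) can be sketched with the standard library alone. The function name is hypothetical, and two choices below are assumptions that the text does not specify: the population standard deviation is used, and constant attributes are mapped to zero to avoid division by zero.

```python
import statistics


def standardize(rows):
    """Column-wise z-score of a list of equal-length feature rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation (assumed variant).
    stds = [statistics.pstdev(col) for col in columns]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)]
            for row in rows]


print(standardize([[1.0, 10.0], [3.0, 10.0]]))
```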

In the experiments, each dataset is split using four repetitions of five-fold cross-validation, i.e., in each run the dataset is split into an unlabeled pool (80% of the data) and a testing set (20% of the data), for a total of 20 runs. The metrics of Mean Zero-one Error (MZE), Mean Absolute Error (MAE), and Macro F1 score (F1) are employed in the experiments. MZE and MAE are the longstanding benchmark metrics for measuring the performance of ordinal classification [38]. The Macro F1 score is a commonly used metric for measuring the performance of multi-class classification.
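For reference, the three metrics can be sketched as follows, with classes encoded by their integer ranks $1,\cdots,K$ . The function names are illustrative, and assigning an F1 of zero to a class that never appears in either the labels or the predictions is an assumed convention.

```python
def mze(y_true, y_pred):
    """Mean zero-one error: fraction of misclassified instances."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted class ranks."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of the per-class F1 scores."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in range(1, num_classes + 1):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


y_true = [1, 2, 3, 3]
y_pred = [1, 3, 3, 2]
print(mze(y_true, y_pred), mae(y_true, y_pred), macro_f1(y_true, y_pred, 3))
```

Unlike MZE, MAE penalizes a prediction in proportion to its distance from the true class, which is why it is the more informative of the two for ordinal targets.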

To verify the effectiveness of our method (denoted as AOCpair), we compare it with the following three state-of-the-art methods:

LDLORR [13] is the method of linear discriminant learning for ordinal classification with labeled instances and instance-pair relation information.

SVLORR [13] is the method of support vector learning for ordinal classification with labeled instances and instance-pair relation information.

KNNORR [1] is the recently proposed kNN-based ordinal classification method by fusing absolute and relative information. In this method, the number of pointwise neighbors and the number of pairwise neighbors are both set as $K$ .

The above methods are the three existing methods most related to our method. These three methods select the query pairs from the pool set and request the relative information interactively from the annotators. In the proposed method, the coefficient $C$ in the RKELM model is fixed as 10. We assume that the annotators provide the ground-truth instance-pair relation information in each pairwise query. In each run, the performances of the compared methods are evaluated on the testing set. Finally, the average results of 20 runs for the different methods are reported.

We are interested in whether our approach can achieve better ordinal classification performance under a given pairwise query budget. To compare the proposed method with the three competitors, we set the pairwise query budget to $10K$ and $20K$ and compare the methods under three different sizes ( $K$ , $3K$ , and $5K$ ) of the initial training set (seed set). The seed set consists of 1, 3, or 5 instances per class, selected at random from the unlabeled pool.

In addition to the above comparison, we also conducted an ablation study. Our method selects the query pairs based on a margin-based unlabeled instance selection and an ECM-based labeled instance selection method. To verify the effectiveness of the ECM-based labeled selection method, we replace it with four different alternatives. Thus, the following five methods are compared in the ablation study.

AOCpair-ECM is the proposed method.

AOCpair-EIM is a method in which the labeled instance in each query pair is selected based on the expected class interval minimization (EIM) strategy, which has been introduced in Section 3.3.2.

AOCpair-Med is a method in which the labeled instance in each query pair is selected based on the median-class assignment strategy, which has been mentioned in Section 3.3.2.

AOCpair-Near selects the labeled instance that is closest to the critical unlabeled instance. Thus, the two instances in each query pair are more likely to have the same class label.

AOCpair-Post selects the labeled instance that belongs to the class on which the critical instance has the maximum posterior probability estimate.

To compare the above five methods, we set the size of the seed set as $K$ and set the pairwise query budget as $20K$ . We will visualize the evaluation results of the five methods by plotting the learning curves of MZE, MAE, and F1.

4.2 Experimental results and analysis

Tables 5–7 summarize the classification results of AOCpair and the three competitors under three different initial training set sizes. When the size of the initial training set is $K$ , the results in Table 5 show

Table 5
Classification results of the four compared methods when the number of initial training instances is set as $K$ (the best results are marked in boldface)

Metric	Datasets	10K				20K
		SVLORR	LDLORR	KNNORR	AOCpair	SVLORR	LDLORR	KNNORR	AOCpair
MZE	SWD	0.64 $\pm$ 0.08 ${}^{\ast}$	0.67 $\pm$ 0.07 ${}^{\ast}$	0.65 $\pm$ 0.07 ${}^{\ast}$	0.50 $\pm$ 0.03	0.66 $\pm$ 0.09 ${}^{\ast}$	0.66 $\pm$ 0.10 ${}^{\ast}$	0.64 $\pm$ 0.07 ${}^{\ast}$	0.48 $\pm$ 0.03
	Car	0.50 $\pm$ 0.15 ${}^{\ast}$	0.44 $\pm$ 0.09 ${}^{\ast}$	0.54 $\pm$ 0.09 ${}^{\ast}$	0.20 $\pm$ 0.04	0.41 $\pm$ 0.15 ${}^{\ast}$	0.37 $\pm$ 0.12 ${}^{\ast}$	0.52 $\pm$ 0.07 ${}^{\ast}$	0.14 $\pm$ 0.02
	Automobile	0.56 $\pm$ 0.10 ${}^{\ast}$	0.62 $\pm$ 0.10 ${}^{\ast}$	0.64 $\pm$ 0.09 ${}^{\ast}$	0.48 $\pm$ 0.11	0.55 $\pm$ 0.09 ${}^{\ast}$	0.60 $\pm$ 0.10 ${}^{\ast}$	0.64 $\pm$ 0.08 ${}^{\ast}$	0.38 $\pm$ 0.10
	Cleveland	0.54 $\pm$ 0.11 ${}^{\ast}$	0.62 $\pm$ 0.12 ${}^{\ast}$	0.59 $\pm$ 0.12 ${}^{\ast}$	0.46 $\pm$ 0.07	0.55 $\pm$ 0.12 ${}^{\ast}$	0.57 $\pm$ 0.14 ${}^{\ast}$	0.61 $\pm$ 0.13 ${}^{\ast}$	0.46 $\pm$ 0.05
	Housing-5bin	0.55 $\pm$ 0.07 ${}^{\ast}$	0.56 $\pm$ 0.09 ${}^{\ast}$	0.58 $\pm$ 0.07 ${}^{\ast}$	0.43 $\pm$ 0.06	0.49 $\pm$ 0.08 ${}^{\ast}$	0.53 $\pm$ 0.08 ${}^{\ast}$	0.57 $\pm$ 0.07 ${}^{\ast}$	0.38 $\pm$ 0.04
	Stock-5bin	0.49 $\pm$ 0.07 ${}^{\ast}$	0.48 $\pm$ 0.05 ${}^{\ast}$	0.45 $\pm$ 0.07 ${}^{\ast}$	0.24 $\pm$ 0.05	0.48 $\pm$ 0.06 ${}^{\ast}$	0.49 $\pm$ 0.07 ${}^{\ast}$	0.40 $\pm$ 0.06 ${}^{\ast}$	0.17 $\pm$ 0.03
	Computer-5bin	0.50 $\pm$ 0.06 ${}^{\ast}$	0.51 $\pm$ 0.05 ${}^{\ast}$	0.56 $\pm$ 0.04 ${}^{\ast}$	0.43 $\pm$ 0.03	0.50 $\pm$ 0.05 ${}^{\ast}$	0.51 $\pm$ 0.05 ${}^{\ast}$	0.52 $\pm$ 0.05 ${}^{\ast}$	0.40 $\pm$ 0.03
	Winequality-red	0.78 $\pm$ 0.06 ${}^{\ast}$	0.79 $\pm$ 0.09 ${}^{\ast}$	0.71 $\pm$ 0.07 ${}^{\ast}$	0.45 $\pm$ 0.02	0.76 $\pm$ 0.08 ${}^{\ast}$	0.76 $\pm$ 0.11 ${}^{\ast}$	0.70 $\pm$ 0.07 ${}^{\ast}$	0.43 $\pm$ 0.02
	Obesity	0.49 $\pm$ 0.09 ${}^{\ast}$	0.47 $\pm$ 0.08 ${}^{\ast}$	0.62 $\pm$ 0.05 ${}^{\ast}$	0.39 $\pm$ 0.04	0.37 $\pm$ 0.09 ${}^{\ast}$	0.36 $\pm$ 0.06 ${}^{\ast}$	0.58 $\pm$ 0.04 ${}^{\ast}$	0.29 $\pm$ 0.05
	Housing-10bin	0.67 $\pm$ 0.05 ${}^{\ast}$	0.65 $\pm$ 0.05	0.71 $\pm$ 0.06 ${}^{\ast}$	0.64 $\pm$ 0.04	0.66 $\pm$ 0.05 ${}^{\ast}$	0.66 $\pm$ 0.04 ${}^{\ast}$	0.69 $\pm$ 0.04 ${}^{\ast}$	0.59 $\pm$ 0.05
	Stock-10bin	0.64 $\pm$ 0.05 ${}^{\ast}$	0.65 $\pm$ 0.04 ${}^{\ast}$	0.57 $\pm$ 0.06 ${}^{\ast}$	0.42 $\pm$ 0.04	0.64 $\pm$ 0.05 ${}^{\ast}$	0.64 $\pm$ 0.05 ${}^{\ast}$	0.55 $\pm$ 0.06 ${}^{\ast}$	0.34 $\pm$ 0.05
	Computer-10bin	0.65 $\pm$ 0.03 ${}^{\ast}$	0.66 $\pm$ 0.04 ${}^{\ast}$	0.71 $\pm$ 0.03 ${}^{\ast}$	0.63 $\pm$ 0.01	0.64 $\pm$ 0.04 ${}^{\ast}$	0.64 $\pm$ 0.04 ${}^{\ast}$	0.69 $\pm$ 0.03 ${}^{\ast}$	0.61 $\pm$ 0.02
	p-value	2.89E-05				2.47E-05
MAE	SWD	0.81 $\pm$ 0.16 ${}^{\ast}$	0.87 $\pm$ 0.17 ${}^{\ast}$	0.85 $\pm$ 0.15 ${}^{\ast}$	0.54 $\pm$ 0.04	0.84 $\pm$ 0.18 ${}^{\ast}$	0.85 $\pm$ 0.21 ${}^{\ast}$	0.83 $\pm$ 0.13 ${}^{\ast}$	0.53 $\pm$ 0.04
	Car	0.62 $\pm$ 0.24 ${}^{\ast}$	0.55 $\pm$ 0.14 ${}^{\ast}$	0.83 $\pm$ 0.18 ${}^{\ast}$	0.22 $\pm$ 0.04	0.48 $\pm$ 0.20 ${}^{\ast}$	0.43 $\pm$ 0.18 ${}^{\ast}$	0.76 $\pm$ 0.13 ${}^{\ast}$	0.15 $\pm$ 0.02
	Automobile	0.73 $\pm$ 0.17 ${}^{\ast}$	0.86 $\pm$ 0.23 ${}^{\ast}$	0.89 $\pm$ 0.13 ${}^{\ast}$	0.58 $\pm$ 0.15	0.68 $\pm$ 0.13 ${}^{\ast}$	0.78 $\pm$ 0.18 ${}^{\ast}$	0.91 $\pm$ 0.17 ${}^{\ast}$	0.47 $\pm$ 0.14
	Cleveland	0.90 $\pm$ 0.30 ${}^{\ast}$	0.99 $\pm$ 0.26 ${}^{\ast}$	0.92 $\pm$ 0.22 ${}^{\ast}$	0.63 $\pm$ 0.08	0.89 $\pm$ 0.20 ${}^{\ast}$	0.91 $\pm$ 0.27 ${}^{\ast}$	0.96 $\pm$ 0.23 ${}^{\ast}$	0.62 $\pm$ 0.07
	Housing-5bin	0.75 $\pm$ 0.14 ${}^{\ast}$	0.75 $\pm$ 0.16 ${}^{\ast}$	0.78 $\pm$ 0.17 ${}^{\ast}$	0.50 $\pm$ 0.07	0.63 $\pm$ 0.15 ${}^{\ast}$	0.71 $\pm$ 0.19 ${}^{\ast}$	0.74 $\pm$ 0.13 ${}^{\ast}$	0.43 $\pm$ 0.05
	Stock-5bin	0.61 $\pm$ 0.10 ${}^{\ast}$	0.62 $\pm$ 0.09 ${}^{\ast}$	0.54 $\pm$ 0.12 ${}^{\ast}$	0.25 $\pm$ 0.06	0.57 $\pm$ 0.10 ${}^{\ast}$	0.60 $\pm$ 0.12 ${}^{\ast}$	0.46 $\pm$ 0.08 ${}^{\ast}$	0.18 $\pm$ 0.04
	Computer-5bin	0.64 $\pm$ 0.12 ${}^{\ast}$	0.67 $\pm$ 0.10 ${}^{\ast}$	0.73 $\pm$ 0.10 ${}^{\ast}$	0.49 $\pm$ 0.04	0.61 $\pm$ 0.07 ${}^{\ast}$	0.64 $\pm$ 0.10 ${}^{\ast}$	0.66 $\pm$ 0.08 ${}^{\ast}$	0.45 $\pm$ 0.03
	Winequality-red	1.26 $\pm$ 0.20 ${}^{\ast}$	1.31 $\pm$ 0.28 ${}^{\ast}$	0.98 $\pm$ 0.15 ${}^{\ast}$	0.49 $\pm$ 0.03	1.16 $\pm$ 0.20 ${}^{\ast}$	1.21 $\pm$ 0.32 ${}^{\ast}$	0.99 $\pm$ 0.16 ${}^{\ast}$	0.47 $\pm$ 0.03
	Obesity	0.60 $\pm$ 0.16 ${}^{\ast}$	0.59 $\pm$ 0.13 ${}^{\ast}$	0.97 $\pm$ 0.09 ${}^{\ast}$	0.46 $\pm$ 0.06	0.40 $\pm$ 0.11 ${}^{\ast}$	0.41 $\pm$ 0.08 ${}^{\ast}$	0.90 $\pm$ 0.09 ${}^{\ast}$	0.32 $\pm$ 0.06
	Housing-10bin	1.12 $\pm$ 0.17 ${}^{\ast}$	1.16 $\pm$ 0.20 ${}^{\ast}$	1.27 $\pm$ 0.20 ${}^{\ast}$	0.98 $\pm$ 0.11	1.14 $\pm$ 0.16 ${}^{\ast}$	1.13 $\pm$ 0.13 ${}^{\ast}$	1.20 $\pm$ 0.09 ${}^{\ast}$	0.85 $\pm$ 0.10
	Stock-10bin	1.06 $\pm$ 0.17 ${}^{\ast}$	1.05 $\pm$ 0.11 ${}^{\ast}$	0.80 $\pm$ 0.15 ${}^{\ast}$	0.48 $\pm$ 0.06	1.13 $\pm$ 0.24 ${}^{\ast}$	1.08 $\pm$ 0.12 ${}^{\ast}$	0.75 $\pm$ 0.11 ${}^{\ast}$	0.36 $\pm$ 0.05
	Computer-10bin	1.09 $\pm$ 0.11 ${}^{\ast}$	1.16 $\pm$ 0.13 ${}^{\ast}$	1.21 $\pm$ 0.11 ${}^{\ast}$	0.95 $\pm$ 0.03	1.06 $\pm$ 0.14 ${}^{\ast}$	1.08 $\pm$ 0.16 ${}^{\ast}$	1.14 $\pm$ 0.09 ${}^{\ast}$	0.90 $\pm$ 0.04
	p-value	2.34E-05				2.88E-05
F1	SWD	0.32 $\pm$ 0.07 ${}^{\ast}$	0.29 $\pm$ 0.07 ${}^{\ast}$	0.31 $\pm$ 0.07 ${}^{\ast}$	0.39 $\pm$ 0.06	0.30 $\pm$ 0.08 ${}^{\ast}$	0.30 $\pm$ 0.09 ${}^{\ast}$	0.32 $\pm$ 0.07 ${}^{\ast}$	0.42 $\pm$ 0.04
	Car	0.39 $\pm$ 0.11 ${}^{\ast}$	0.40 $\pm$ 0.05 ${}^{\ast}$	0.32 $\pm$ 0.04 ${}^{\ast}$	0.50 $\pm$ 0.08	0.44 $\pm$ 0.09 ${}^{\ast}$	0.48 $\pm$ 0.09 ${}^{\ast}$	0.33 $\pm$ 0.05 ${}^{\ast}$	0.63 $\pm$ 0.07
	Automobile	0.42 $\pm$ 0.11	0.37 $\pm$ 0.11 ${}^{\ast}$	0.32 $\pm$ 0.09 ${}^{\ast}$	0.46 $\pm$ 0.14	0.44 $\pm$ 0.08 ${}^{\ast}$	0.39 $\pm$ 0.10 ${}^{\ast}$	0.33 $\pm$ 0.09 ${}^{\ast}$	0.59 $\pm$ 0.12
	Cleveland	0.29 $\pm$ 0.08	0.25 $\pm$ 0.06 ${}^{\ast}$	0.28 $\pm$ 0.07	0.31 $\pm$ 0.06	0.29 $\pm$ 0.06	0.25 $\pm$ 0.07 ${}^{\ast}$	0.27 $\pm$ 0.08 ${}^{\ast}$	0.33 $\pm$ 0.07
	Housing-5bin	0.43 $\pm$ 0.08 ${}^{\ast}$	0.42 $\pm$ 0.10 ${}^{\ast}$	0.41 $\pm$ 0.07 ${}^{\ast}$	0.57 $\pm$ 0.06	0.48 $\pm$ 0.09 ${}^{\ast}$	0.44 $\pm$ 0.10 ${}^{\ast}$	0.42 $\pm$ 0.08 ${}^{\ast}$	0.62 $\pm$ 0.04
	Stock-5bin	0.49 $\pm$ 0.07 ${}^{\ast}$	0.48 $\pm$ 0.05 ${}^{\ast}$	0.53 $\pm$ 0.08 ${}^{\ast}$	0.76 $\pm$ 0.06	0.49 $\pm$ 0.07 ${}^{\ast}$	0.48 $\pm$ 0.08 ${}^{\ast}$	0.58 $\pm$ 0.07 ${}^{\ast}$	0.82 $\pm$ 0.04
	Computer-5bin	0.49 $\pm$ 0.06 ${}^{\ast}$	0.47 $\pm$ 0.05 ${}^{\ast}$	0.43 $\pm$ 0.05 ${}^{\ast}$	0.57 $\pm$ 0.03	0.48 $\pm$ 0.06 ${}^{\ast}$	0.47 $\pm$ 0.06 ${}^{\ast}$	0.46 $\pm$ 0.06 ${}^{\ast}$	0.59 $\pm$ 0.02
	Winequality-red	0.16 $\pm$ 0.03 ${}^{\ast}$	0.15 $\pm$ 0.04 ${}^{\ast}$	0.20 $\pm$ 0.04 ${}^{\ast}$	0.22 $\pm$ 0.04	0.17 $\pm$ 0.04 ${}^{\ast}$	0.17 $\pm$ 0.06 ${}^{\ast}$	0.21 $\pm$ 0.03	0.22 $\pm$ 0.02
	Obesity	0.50 $\pm$ 0.09 ${}^{\ast}$	0.52 $\pm$ 0.08 ${}^{\ast}$	0.37 $\pm$ 0.05 ${}^{\ast}$	0.60 $\pm$ 0.04	0.61 $\pm$ 0.09 ${}^{\ast}$	0.62 $\pm$ 0.06 ${}^{\ast}$	0.40 $\pm$ 0.04 ${}^{\ast}$	0.71 $\pm$ 0.05
	Housing-10bin	0.31 $\pm$ 0.05 ${}^{\ast}$	0.32 $\pm$ 0.06 ${}^{\ast}$	0.27 $\pm$ 0.06 ${}^{\ast}$	0.35 $\pm$ 0.05	0.32 $\pm$ 0.05 ${}^{\ast}$	0.31 $\pm$ 0.04 ${}^{\ast}$	0.30 $\pm$ 0.05 ${}^{\ast}$	0.41 $\pm$ 0.05
	Stock-10bin	0.33 $\pm$ 0.06 ${}^{\ast}$	0.32 $\pm$ 0.04 ${}^{\ast}$	0.42 $\pm$ 0.06 ${}^{\ast}$	0.56 $\pm$ 0.04	0.33 $\pm$ 0.06 ${}^{\ast}$	0.32 $\pm$ 0.04 ${}^{\ast}$	0.43 $\pm$ 0.06 ${}^{\ast}$	0.65 $\pm$ 0.05
	Computer-10bin	0.32 $\pm$ 0.03 ${}^{\ast}$	0.31 $\pm$ 0.04 ${}^{\ast}$	0.29 $\pm$ 0.04 ${}^{\ast}$	0.37 $\pm$ 0.02	0.33 $\pm$ 0.05 ${}^{\ast}$	0.33 $\pm$ 0.04 ${}^{\ast}$	0.31 $\pm$ 0.03 ${}^{\ast}$	0.39 $\pm$ 0.02
	p-value	2.06E-05				2.79E-05

Table 6

Classification results of the four compared methods when the number of initial training instances is set to $3K$, under pairwise query budgets of $10K$ and $20K$ (the best results are marked in boldface)

Metric	Datasets	10K				20K
		SVLORR	LDLORR	KNNORR	AOCpair	SVLORR	LDLORR	KNNORR	AOCpair
MZE	SWD	0.63 $\pm$ 0.07 ${}^{\ast}$	0.65 $\pm$ 0.05 ${}^{\ast}$	0.64 $\pm$ 0.03 ${}^{\ast}$	0.49 $\pm$ 0.04	0.59 $\pm$ 0.06 ${}^{\ast}$	0.65 $\pm$ 0.07 ${}^{\ast}$	0.64 $\pm$ 0.04 ${}^{\ast}$	0.49 $\pm$ 0.04
	Car	0.46 $\pm$ 0.11 ${}^{\ast}$	0.41 $\pm$ 0.11 ${}^{\ast}$	0.52 $\pm$ 0.04 ${}^{\ast}$	0.19 $\pm$ 0.02	0.43 $\pm$ 0.11 ${}^{\ast}$	0.35 $\pm$ 0.08 ${}^{\ast}$	0.48 $\pm$ 0.04 ${}^{\ast}$	0.13 $\pm$ 0.02
	Automobile	0.54 $\pm$ 0.08 ${}^{\ast}$	0.59 $\pm$ 0.10 ${}^{\ast}$	0.57 $\pm$ 0.10 ${}^{\ast}$	0.44 $\pm$ 0.09	0.51 $\pm$ 0.11 ${}^{\ast}$	0.57 $\pm$ 0.07 ${}^{\ast}$	0.57 $\pm$ 0.08 ${}^{\ast}$	0.34 $\pm$ 0.08
	Cleveland	0.59 $\pm$ 0.07 ${}^{\ast}$	0.56 $\pm$ 0.10 ${}^{\ast}$	0.58 $\pm$ 0.10 ${}^{\ast}$	0.46 $\pm$ 0.04	0.57 $\pm$ 0.08 ${}^{\ast}$	0.52 $\pm$ 0.09 ${}^{\ast}$	0.57 $\pm$ 0.07 ${}^{\ast}$	0.43 $\pm$ 0.04
	Housing-5bin	0.49 $\pm$ 0.06 ${}^{\ast}$	0.49 $\pm$ 0.05 ${}^{\ast}$	0.51 $\pm$ 0.05 ${}^{\ast}$	0.41 $\pm$ 0.06	0.48 $\pm$ 0.04 ${}^{\ast}$	0.49 $\pm$ 0.07 ${}^{\ast}$	0.51 $\pm$ 0.05 ${}^{\ast}$	0.36 $\pm$ 0.05
	Stock-5bin	0.43 $\pm$ 0.05 ${}^{\ast}$	0.44 $\pm$ 0.04 ${}^{\ast}$	0.36 $\pm$ 0.04 ${}^{\ast}$	0.22 $\pm$ 0.03	0.42 $\pm$ 0.04 ${}^{\ast}$	0.41 $\pm$ 0.05 ${}^{\ast}$	0.35 $\pm$ 0.04 ${}^{\ast}$	0.17 $\pm$ 0.03
	Computer-5bin	0.46 $\pm$ 0.04 ${}^{\ast}$	0.49 $\pm$ 0.05 ${}^{\ast}$	0.51 $\pm$ 0.03 ${}^{\ast}$	0.42 $\pm$ 0.02	0.45 $\pm$ 0.03 ${}^{\ast}$	0.46 $\pm$ 0.04 ${}^{\ast}$	0.50 $\pm$ 0.04 ${}^{\ast}$	0.40 $\pm$ 0.02
	Winequality-red	0.70 $\pm$ 0.07 ${}^{\ast}$	0.76 $\pm$ 0.07 ${}^{\ast}$	0.64 $\pm$ 0.05 ${}^{\ast}$	0.45 $\pm$ 0.04	0.74 $\pm$ 0.07 ${}^{\ast}$	0.74 $\pm$ 0.06 ${}^{\ast}$	0.65 $\pm$ 0.06 ${}^{\ast}$	0.43 $\pm$ 0.03
	Obesity	0.39 $\pm$ 0.07 ${}^{\ast}$	0.33 $\pm$ 0.06	0.53 $\pm$ 0.04 ${}^{\ast}$	0.35 $\pm$ 0.05	0.35 $\pm$ 0.06 ${}^{\ast}$	0.30 $\pm$ 0.05 ${}^{\ast}$	0.52 $\pm$ 0.04 ${}^{\ast}$	0.25 $\pm$ 0.04
	Housing-10bin	0.68 $\pm$ 0.06 ${}^{\ast}$	0.68 $\pm$ 0.04 ${}^{\ast}$	0.72 $\pm$ 0.05 ${}^{\ast}$	0.60 $\pm$ 0.05	0.68 $\pm$ 0.04 ${}^{\ast}$	0.66 $\pm$ 0.05 ${}^{\ast}$	0.70 $\pm$ 0.05 ${}^{\ast}$	0.57 $\pm$ 0.05
	Stock-10bin	0.66 $\pm$ 0.05 ${}^{\ast}$	0.63 $\pm$ 0.04 ${}^{\ast}$	0.50 $\pm$ 0.04 ${}^{\ast}$	0.38 $\pm$ 0.04	0.66 $\pm$ 0.05 ${}^{\ast}$	0.62 $\pm$ 0.04 ${}^{\ast}$	0.50 $\pm$ 0.04 ${}^{\ast}$	0.31 $\pm$ 0.03
	Computer-10bin	0.65 $\pm$ 0.04 ${}^{\ast}$	0.65 $\pm$ 0.07 ${}^{\ast}$	0.67 $\pm$ 0.03 ${}^{\ast}$	0.62 $\pm$ 0.02	0.63 $\pm$ 0.04 ${}^{\ast}$	0.62 $\pm$ 0.08	0.66 $\pm$ 0.02 ${}^{\ast}$	0.60 $\pm$ 0.02
	p-value	1.25E-04				3.20E-05
MAE	SWD	0.78 $\pm$ 0.14 ${}^{\ast}$	0.82 $\pm$ 0.10 ${}^{\ast}$	0.80 $\pm$ 0.07 ${}^{\ast}$	0.55 $\pm$ 0.05	0.70 $\pm$ 0.09 ${}^{\ast}$	0.82 $\pm$ 0.15 ${}^{\ast}$	0.81 $\pm$ 0.08 ${}^{\ast}$	0.55 $\pm$ 0.05
	Car	0.53 $\pm$ 0.13 ${}^{\ast}$	0.54 $\pm$ 0.18 ${}^{\ast}$	0.80 $\pm$ 0.08 ${}^{\ast}$	0.20 $\pm$ 0.03	0.48 $\pm$ 0.14 ${}^{\ast}$	0.41 $\pm$ 0.12 ${}^{\ast}$	0.73 $\pm$ 0.07 ${}^{\ast}$	0.13 $\pm$ 0.03
	Automobile	0.65 $\pm$ 0.12 ${}^{\ast}$	0.83 $\pm$ 0.17 ${}^{\ast}$	0.80 $\pm$ 0.16 ${}^{\ast}$	0.52 $\pm$ 0.12	0.60 $\pm$ 0.14 ${}^{\ast}$	0.79 $\pm$ 0.14 ${}^{\ast}$	0.79 $\pm$ 0.13 ${}^{\ast}$	0.40 $\pm$ 0.10
	Cleveland	0.86 $\pm$ 0.14 ${}^{\ast}$	0.89 $\pm$ 0.23 ${}^{\ast}$	0.90 $\pm$ 0.19 ${}^{\ast}$	0.64 $\pm$ 0.08	0.84 $\pm$ 0.12 ${}^{\ast}$	0.79 $\pm$ 0.16 ${}^{\ast}$	0.87 $\pm$ 0.14 ${}^{\ast}$	0.60 $\pm$ 0.07
	Housing-5bin	0.59 $\pm$ 0.09 ${}^{\ast}$	0.62 $\pm$ 0.11 ${}^{\ast}$	0.65 $\pm$ 0.07 ${}^{\ast}$	0.48 $\pm$ 0.07	0.57 $\pm$ 0.07 ${}^{\ast}$	0.61 $\pm$ 0.10 ${}^{\ast}$	0.66 $\pm$ 0.10 ${}^{\ast}$	0.41 $\pm$ 0.06
	Stock-5bin	0.50 $\pm$ 0.05 ${}^{\ast}$	0.52 $\pm$ 0.07 ${}^{\ast}$	0.39 $\pm$ 0.07 ${}^{\ast}$	0.23 $\pm$ 0.03	0.48 $\pm$ 0.04 ${}^{\ast}$	0.47 $\pm$ 0.06 ${}^{\ast}$	0.38 $\pm$ 0.06 ${}^{\ast}$	0.17 $\pm$ 0.03
	Computer-5bin	0.53 $\pm$ 0.05 ${}^{\ast}$	0.60 $\pm$ 0.08 ${}^{\ast}$	0.64 $\pm$ 0.04 ${}^{\ast}$	0.47 $\pm$ 0.03	0.52 $\pm$ 0.04 ${}^{\ast}$	0.55 $\pm$ 0.05 ${}^{\ast}$	0.62 $\pm$ 0.05 ${}^{\ast}$	0.45 $\pm$ 0.03
	Winequality-red	1.02 $\pm$ 0.16 ${}^{\ast}$	1.22 $\pm$ 0.23 ${}^{\ast}$	0.86 $\pm$ 0.09 ${}^{\ast}$	0.50 $\pm$ 0.04	1.12 $\pm$ 0.18 ${}^{\ast}$	1.13 $\pm$ 0.19 ${}^{\ast}$	0.91 $\pm$ 0.11 ${}^{\ast}$	0.47 $\pm$ 0.03
	Obesity	0.44 $\pm$ 0.09	0.39 $\pm$ 0.08	0.86 $\pm$ 0.10 ${}^{\ast}$	0.39 $\pm$ 0.05	0.37 $\pm$ 0.07 ${}^{\ast}$	0.32 $\pm$ 0.06 ${}^{\ast}$	0.83 $\pm$ 0.08 ${}^{\ast}$	0.28 $\pm$ 0.04
	Housing-10bin	1.23 $\pm$ 0.22 ${}^{\ast}$	1.19 $\pm$ 0.14 ${}^{\ast}$	1.21 $\pm$ 0.12 ${}^{\ast}$	0.90 $\pm$ 0.07	1.19 $\pm$ 0.13 ${}^{\ast}$	1.11 $\pm$ 0.12 ${}^{\ast}$	1.19 $\pm$ 0.12 ${}^{\ast}$	0.84 $\pm$ 0.08
	Stock-10bin	1.21 $\pm$ 0.19 ${}^{\ast}$	1.04 $\pm$ 0.08 ${}^{\ast}$	0.66 $\pm$ 0.07 ${}^{\ast}$	0.43 $\pm$ 0.05	1.24 $\pm$ 0.20 ${}^{\ast}$	1.02 $\pm$ 0.10 ${}^{\ast}$	0.64 $\pm$ 0.08 ${}^{\ast}$	0.33 $\pm$ 0.03
	Computer-10bin	1.06 $\pm$ 0.10 ${}^{\ast}$	1.21 $\pm$ 0.73 ${}^{\ast}$	1.15 $\pm$ 0.06 ${}^{\ast}$	0.94 $\pm$ 0.06	0.99 $\pm$ 0.09 ${}^{\ast}$	1.13 $\pm$ 0.75 ${}^{\ast}$	1.11 $\pm$ 0.05 ${}^{\ast}$	0.89 $\pm$ 0.04
	p-value	5.49E-05				4.54E-05
F1	SWD	0.33 $\pm$ 0.07 ${}^{\ast}$	0.32 $\pm$ 0.04 ${}^{\ast}$	0.33 $\pm$ 0.03 ${}^{\ast}$	0.44 $\pm$ 0.06	0.37 $\pm$ 0.06 ${}^{\ast}$	0.32 $\pm$ 0.06 ${}^{\ast}$	0.33 $\pm$ 0.03 ${}^{\ast}$	0.43 $\pm$ 0.04
	Car	0.45 $\pm$ 0.06 ${}^{\ast}$	0.42 $\pm$ 0.08 ${}^{\ast}$	0.35 $\pm$ 0.03 ${}^{\ast}$	0.56 $\pm$ 0.08	0.48 $\pm$ 0.08 ${}^{\ast}$	0.47 $\pm$ 0.06 ${}^{\ast}$	0.36 $\pm$ 0.03 ${}^{\ast}$	0.67 $\pm$ 0.09
	Automobile	0.44 $\pm$ 0.10 ${}^{\ast}$	0.41 $\pm$ 0.10 ${}^{\ast}$	0.42 $\pm$ 0.11 ${}^{\ast}$	0.50 $\pm$ 0.12	0.47 $\pm$ 0.12 ${}^{\ast}$	0.43 $\pm$ 0.07 ${}^{\ast}$	0.41 $\pm$ 0.09 ${}^{\ast}$	0.63 $\pm$ 0.10
	Cleveland	0.29 $\pm$ 0.05 ${}^{\ast}$	0.29 $\pm$ 0.07	0.30 $\pm$ 0.07	0.32 $\pm$ 0.05	0.29 $\pm$ 0.05 ${}^{\ast}$	0.31 $\pm$ 0.06	0.30 $\pm$ 0.06 ${}^{\ast}$	0.34 $\pm$ 0.05
	Housing-5bin	0.50 $\pm$ 0.06 ${}^{\ast}$	0.51 $\pm$ 0.06 ${}^{\ast}$	0.48 $\pm$ 0.05 ${}^{\ast}$	0.58 $\pm$ 0.06	0.51 $\pm$ 0.04 ${}^{\ast}$	0.50 $\pm$ 0.07 ${}^{\ast}$	0.47 $\pm$ 0.06 ${}^{\ast}$	0.64 $\pm$ 0.05
	Stock-5bin	0.56 $\pm$ 0.05 ${}^{\ast}$	0.55 $\pm$ 0.05 ${}^{\ast}$	0.63 $\pm$ 0.05 ${}^{\ast}$	0.77 $\pm$ 0.04	0.56 $\pm$ 0.05 ${}^{\ast}$	0.58 $\pm$ 0.05 ${}^{\ast}$	0.64 $\pm$ 0.05 ${}^{\ast}$	0.83 $\pm$ 0.04
	Computer-5bin	0.53 $\pm$ 0.05 ${}^{\ast}$	0.51 $\pm$ 0.05 ${}^{\ast}$	0.49 $\pm$ 0.02 ${}^{\ast}$	0.58 $\pm$ 0.02	0.54 $\pm$ 0.04 ${}^{\ast}$	0.54 $\pm$ 0.03 ${}^{\ast}$	0.49 $\pm$ 0.04 ${}^{\ast}$	0.60 $\pm$ 0.02
	Winequality-red	0.19 $\pm$ 0.04 ${}^{\ast}$	0.17 $\pm$ 0.04 ${}^{\ast}$	0.23 $\pm$ 0.03 ${}^{\ast}$	0.25 $\pm$ 0.04	0.17 $\pm$ 0.04 ${}^{\ast}$	0.18 $\pm$ 0.04 ${}^{\ast}$	0.23 $\pm$ 0.03	0.23 $\pm$ 0.03
	Obesity	0.59 $\pm$ 0.07 ${}^{\ast}$	0.66 $\pm$ 0.06	0.46 $\pm$ 0.04 ${}^{\ast}$	0.65 $\pm$ 0.05	0.64 $\pm$ 0.06 ${}^{\ast}$	0.70 $\pm$ 0.05 ${}^{\ast}$	0.47 $\pm$ 0.04 ${}^{\ast}$	0.75 $\pm$ 0.04
	Housing-10bin	0.29 $\pm$ 0.06 ${}^{\ast}$	0.29 $\pm$ 0.04 ${}^{\ast}$	0.28 $\pm$ 0.05 ${}^{\ast}$	0.40 $\pm$ 0.05	0.29 $\pm$ 0.05 ${}^{\ast}$	0.30 $\pm$ 0.05 ${}^{\ast}$	0.30 $\pm$ 0.05 ${}^{\ast}$	0.42 $\pm$ 0.05
	Stock-10bin	0.29 $\pm$ 0.05 ${}^{\ast}$	0.33 $\pm$ 0.03 ${}^{\ast}$	0.49 $\pm$ 0.04 ${}^{\ast}$	0.62 $\pm$ 0.04	0.30 $\pm$ 0.06 ${}^{\ast}$	0.34 $\pm$ 0.03 ${}^{\ast}$	0.49 $\pm$ 0.04 ${}^{\ast}$	0.69 $\pm$ 0.04
	Computer-10bin	0.34 $\pm$ 0.04 ${}^{\ast}$	0.34 $\pm$ 0.07 ${}^{\ast}$	0.33 $\pm$ 0.03 ${}^{\ast}$	0.38 $\pm$ 0.02	0.35 $\pm$ 0.05 ${}^{\ast}$	0.36 $\pm$ 0.07 ${}^{\ast}$	0.34 $\pm$ 0.02 ${}^{\ast}$	0.40 $\pm$ 0.02
	p-value	1.33E-04				7.88E-05

Table 7

Classification results of the four compared methods when the number of initial training instances is set to $5K$, under pairwise query budgets of $10K$ and $20K$ (the best results are marked in boldface)

Metric	Datasets	10K				20K
		SVLORR	LDLORR	KNNORR	AOCpair	SVLORR	LDLORR	KNNORR	AOCpair
MZE	SWD	0.62 $\pm$ 0.05 ${}^{\ast}$	0.65 $\pm$ 0.06 ${}^{\ast}$	0.63 $\pm$ 0.05 ${}^{\ast}$	0.50 $\pm$ 0.03	0.62 $\pm$ 0.03 ${}^{\ast}$	0.61 $\pm$ 0.06 ${}^{\ast}$	0.63 $\pm$ 0.05 ${}^{\ast}$	0.49 $\pm$ 0.03
	Car	0.46 $\pm$ 0.12 ${}^{\ast}$	0.37 $\pm$ 0.08 ${}^{\ast}$	0.48 $\pm$ 0.04 ${}^{\ast}$	0.18 $\pm$ 0.03	0.44 $\pm$ 0.10 ${}^{\ast}$	0.30 $\pm$ 0.07 ${}^{\ast}$	0.46 $\pm$ 0.04 ${}^{\ast}$	0.13 $\pm$ 0.02
	Automobile	0.51 $\pm$ 0.09 ${}^{\ast}$	0.61 $\pm$ 0.11 ${}^{\ast}$	0.53 $\pm$ 0.09 ${}^{\ast}$	0.37 $\pm$ 0.07	0.57 $\pm$ 0.08 ${}^{\ast}$	0.58 $\pm$ 0.08 ${}^{\ast}$	0.53 $\pm$ 0.09 ${}^{\ast}$	0.31 $\pm$ 0.07
	Cleveland	0.58 $\pm$ 0.09 ${}^{\ast}$	0.51 $\pm$ 0.07 ${}^{\ast}$	0.57 $\pm$ 0.10 ${}^{\ast}$	0.45 $\pm$ 0.07	0.52 $\pm$ 0.08 ${}^{\ast}$	0.48 $\pm$ 0.06 ${}^{\ast}$	0.56 $\pm$ 0.07 ${}^{\ast}$	0.44 $\pm$ 0.05
	Housing-5bin	0.48 $\pm$ 0.05 ${}^{\ast}$	0.50 $\pm$ 0.06 ${}^{\ast}$	0.48 $\pm$ 0.04 ${}^{\ast}$	0.38 $\pm$ 0.04	0.47 $\pm$ 0.05 ${}^{\ast}$	0.46 $\pm$ 0.05 ${}^{\ast}$	0.48 $\pm$ 0.03 ${}^{\ast}$	0.34 $\pm$ 0.04
	Stock-5bin	0.43 $\pm$ 0.05 ${}^{\ast}$	0.43 $\pm$ 0.06 ${}^{\ast}$	0.32 $\pm$ 0.05 ${}^{\ast}$	0.20 $\pm$ 0.03	0.44 $\pm$ 0.07 ${}^{\ast}$	0.40 $\pm$ 0.04 ${}^{\ast}$	0.31 $\pm$ 0.05 ${}^{\ast}$	0.16 $\pm$ 0.02
	Computer-5bin	0.47 $\pm$ 0.05 ${}^{\ast}$	0.45 $\pm$ 0.03 ${}^{\ast}$	0.50 $\pm$ 0.03 ${}^{\ast}$	0.42 $\pm$ 0.02	0.45 $\pm$ 0.04 ${}^{\ast}$	0.42 $\pm$ 0.03 ${}^{\ast}$	0.49 $\pm$ 0.03 ${}^{\ast}$	0.40 $\pm$ 0.02
	Winequality-red	0.73 $\pm$ 0.08 ${}^{\ast}$	0.73 $\pm$ 0.07 ${}^{\ast}$	0.64 $\pm$ 0.05 ${}^{\ast}$	0.45 $\pm$ 0.03	0.70 $\pm$ 0.07 ${}^{\ast}$	0.74 $\pm$ 0.07 ${}^{\ast}$	0.67 $\pm$ 0.05 ${}^{\ast}$	0.44 $\pm$ 0.02
	Obesity	0.36 $\pm$ 0.06	0.30 $\pm$ 0.07	0.49 $\pm$ 0.03 ${}^{\ast}$	0.33 $\pm$ 0.05	0.36 $\pm$ 0.07 ${}^{\ast}$	0.27 $\pm$ 0.06	0.48 $\pm$ 0.04 ${}^{\ast}$	0.26 $\pm$ 0.05
	Housing-10bin	0.68 $\pm$ 0.05 ${}^{\ast}$	0.68 $\pm$ 0.05 ${}^{\ast}$	0.68 $\pm$ 0.04 ${}^{\ast}$	0.60 $\pm$ 0.04	0.69 $\pm$ 0.05 ${}^{\ast}$	0.68 $\pm$ 0.03 ${}^{\ast}$	0.67 $\pm$ 0.05 ${}^{\ast}$	0.57 $\pm$ 0.03
	Stock-10bin	0.67 $\pm$ 0.04 ${}^{\ast}$	0.63 $\pm$ 0.04 ${}^{\ast}$	0.45 $\pm$ 0.04 ${}^{\ast}$	0.35 $\pm$ 0.03	0.68 $\pm$ 0.05 ${}^{\ast}$	0.65 $\pm$ 0.04 ${}^{\ast}$	0.44 $\pm$ 0.04 ${}^{\ast}$	0.30 $\pm$ 0.03
	Computer-10bin	0.64 $\pm$ 0.04 ${}^{\ast}$	0.62 $\pm$ 0.03	0.65 $\pm$ 0.02 ${}^{\ast}$	0.61 $\pm$ 0.02	0.64 $\pm$ 0.04 ${}^{\ast}$	0.61 $\pm$ 0.03	0.65 $\pm$ 0.02 ${}^{\ast}$	0.59 $\pm$ 0.02
	p-value	1.23E-04				3.18E-05
MAE	SWD	0.76 $\pm$ 0.10 ${}^{\ast}$	0.81 $\pm$ 0.11 ${}^{\ast}$	0.78 $\pm$ 0.06 ${}^{\ast}$	0.55 $\pm$ 0.03	0.75 $\pm$ 0.06 ${}^{\ast}$	0.73 $\pm$ 0.09 ${}^{\ast}$	0.79 $\pm$ 0.07 ${}^{\ast}$	0.55 $\pm$ 0.04
	Car	0.51 $\pm$ 0.13 ${}^{\ast}$	0.45 $\pm$ 0.12 ${}^{\ast}$	0.72 $\pm$ 0.09 ${}^{\ast}$	0.19 $\pm$ 0.03	0.47 $\pm$ 0.11 ${}^{\ast}$	0.34 $\pm$ 0.09 ${}^{\ast}$	0.67 $\pm$ 0.07 ${}^{\ast}$	0.13 $\pm$ 0.02
	Automobile	0.62 $\pm$ 0.14 ${}^{\ast}$	0.86 $\pm$ 0.24 ${}^{\ast}$	0.76 $\pm$ 0.17 ${}^{\ast}$	0.44 $\pm$ 0.10	0.70 $\pm$ 0.13 ${}^{\ast}$	0.82 $\pm$ 0.17 ${}^{\ast}$	0.75 $\pm$ 0.18 ${}^{\ast}$	0.37 $\pm$ 0.09
	Cleveland	0.93 $\pm$ 0.17 ${}^{\ast}$	0.81 $\pm$ 0.12 ${}^{\ast}$	0.88 $\pm$ 0.16 ${}^{\ast}$	0.64 $\pm$ 0.09	0.82 $\pm$ 0.15 ${}^{\ast}$	0.72 $\pm$ 0.09 ${}^{\ast}$	0.84 $\pm$ 0.14 ${}^{\ast}$	0.59 $\pm$ 0.07
	Housing-5bin	0.56 $\pm$ 0.07 ${}^{\ast}$	0.65 $\pm$ 0.10 ${}^{\ast}$	0.61 $\pm$ 0.05 ${}^{\ast}$	0.44 $\pm$ 0.05	0.56 $\pm$ 0.07 ${}^{\ast}$	0.56 $\pm$ 0.08 ${}^{\ast}$	0.61 $\pm$ 0.05 ${}^{\ast}$	0.39 $\pm$ 0.05
	Stock-5bin	0.49 $\pm$ 0.09 ${}^{\ast}$	0.49 $\pm$ 0.07 ${}^{\ast}$	0.36 $\pm$ 0.06 ${}^{\ast}$	0.20 $\pm$ 0.03	0.54 $\pm$ 0.16 ${}^{\ast}$	0.45 $\pm$ 0.04 ${}^{\ast}$	0.33 $\pm$ 0.05 ${}^{\ast}$	0.16 $\pm$ 0.02
	Computer-5bin	0.55 $\pm$ 0.07 ${}^{\ast}$	0.57 $\pm$ 0.06 ${}^{\ast}$	0.62 $\pm$ 0.06 ${}^{\ast}$	0.47 $\pm$ 0.03	0.52 $\pm$ 0.06 ${}^{\ast}$	0.50 $\pm$ 0.04 ${}^{\ast}$	0.62 $\pm$ 0.06 ${}^{\ast}$	0.44 $\pm$ 0.02
	Winequality-red	1.05 $\pm$ 0.14 ${}^{\ast}$	1.15 $\pm$ 0.21 ${}^{\ast}$	0.88 $\pm$ 0.09 ${}^{\ast}$	0.50 $\pm$ 0.03	0.99 $\pm$ 0.15 ${}^{\ast}$	1.17 $\pm$ 0.22 ${}^{\ast}$	0.92 $\pm$ 0.10 ${}^{\ast}$	0.48 $\pm$ 0.02
	Obesity	0.38 $\pm$ 0.07	0.34 $\pm$ 0.09	0.78 $\pm$ 0.09 ${}^{\ast}$	0.37 $\pm$ 0.07	0.39 $\pm$ 0.09 ${}^{\ast}$	0.29 $\pm$ 0.06	0.74 $\pm$ 0.08 ${}^{\ast}$	0.28 $\pm$ 0.06
	Housing-10bin	1.28 $\pm$ 0.21 ${}^{\ast}$	1.19 $\pm$ 0.17 ${}^{\ast}$	1.15 $\pm$ 0.15 ${}^{\ast}$	0.87 $\pm$ 0.08	1.24 $\pm$ 0.16 ${}^{\ast}$	1.16 $\pm$ 0.13 ${}^{\ast}$	1.11 $\pm$ 0.14 ${}^{\ast}$	0.80 $\pm$ 0.08
	Stock-10bin	1.28 $\pm$ 0.29 ${}^{\ast}$	1.01 $\pm$ 0.09 ${}^{\ast}$	0.56 $\pm$ 0.07 ${}^{\ast}$	0.39 $\pm$ 0.04	1.31 $\pm$ 0.21 ${}^{\ast}$	1.09 $\pm$ 0.13 ${}^{\ast}$	0.53 $\pm$ 0.05 ${}^{\ast}$	0.32 $\pm$ 0.03
	Computer-10bin	1.06 $\pm$ 0.12 ${}^{\ast}$	1.02 $\pm$ 0.11 ${}^{\ast}$	1.12 $\pm$ 0.06 ${}^{\ast}$	0.91 $\pm$ 0.06	1.06 $\pm$ 0.12 ${}^{\ast}$	0.98 $\pm$ 0.11 ${}^{\ast}$	1.10 $\pm$ 0.06 ${}^{\ast}$	0.85 $\pm$ 0.04
	p-value	2.22E-04				3.38E-05
F1	SWD	0.34 $\pm$ 0.04 ${}^{\ast}$	0.32 $\pm$ 0.05 ${}^{\ast}$	0.34 $\pm$ 0.04 ${}^{\ast}$	0.42 $\pm$ 0.03	0.34 $\pm$ 0.03 ${}^{\ast}$	0.36 $\pm$ 0.05 ${}^{\ast}$	0.34 $\pm$ 0.04 ${}^{\ast}$	0.44 $\pm$ 0.04
	Car	0.47 $\pm$ 0.10 ${}^{\ast}$	0.47 $\pm$ 0.07 ${}^{\ast}$	0.37 $\pm$ 0.04 ${}^{\ast}$	0.64 $\pm$ 0.08	0.49 $\pm$ 0.08 ${}^{\ast}$	0.54 $\pm$ 0.07 ${}^{\ast}$	0.38 $\pm$ 0.03 ${}^{\ast}$	0.71 $\pm$ 0.05
	Automobile	0.47 $\pm$ 0.09 ${}^{\ast}$	0.40 $\pm$ 0.12 ${}^{\ast}$	0.46 $\pm$ 0.09 ${}^{\ast}$	0.60 $\pm$ 0.08	0.41 $\pm$ 0.08 ${}^{\ast}$	0.42 $\pm$ 0.07 ${}^{\ast}$	0.46 $\pm$ 0.09 ${}^{\ast}$	0.67 $\pm$ 0.07
	Cleveland	0.28 $\pm$ 0.07 ${}^{\ast}$	0.32 $\pm$ 0.06	0.29 $\pm$ 0.07	0.33 $\pm$ 0.08	0.32 $\pm$ 0.06	0.33 $\pm$ 0.06	0.30 $\pm$ 0.06	0.33 $\pm$ 0.07
	Housing-5bin	0.52 $\pm$ 0.06 ${}^{\ast}$	0.49 $\pm$ 0.07 ${}^{\ast}$	0.51 $\pm$ 0.04 ${}^{\ast}$	0.62 $\pm$ 0.04	0.52 $\pm$ 0.06 ${}^{\ast}$	0.53 $\pm$ 0.06 ${}^{\ast}$	0.51 $\pm$ 0.04 ${}^{\ast}$	0.66 $\pm$ 0.04
	Stock-5bin	0.55 $\pm$ 0.07 ${}^{\ast}$	0.56 $\pm$ 0.06 ${}^{\ast}$	0.67 $\pm$ 0.05 ${}^{\ast}$	0.80 $\pm$ 0.03	0.53 $\pm$ 0.10 ${}^{\ast}$	0.59 $\pm$ 0.04 ${}^{\ast}$	0.69 $\pm$ 0.05 ${}^{\ast}$	0.84 $\pm$ 0.02
	Computer-5bin	0.52 $\pm$ 0.05 ${}^{\ast}$	0.54 $\pm$ 0.03 ${}^{\ast}$	0.50 $\pm$ 0.03 ${}^{\ast}$	0.58 $\pm$ 0.02	0.54 $\pm$ 0.05 ${}^{\ast}$	0.57 $\pm$ 0.03 ${}^{\ast}$	0.51 $\pm$ 0.03 ${}^{\ast}$	0.60 $\pm$ 0.02
	Winequality-red	0.18 $\pm$ 0.04 ${}^{\ast}$	0.18 $\pm$ 0.04 ${}^{\ast}$	0.24 $\pm$ 0.03 ${}^{\ast}$	0.29 $\pm$ 0.05	0.19 $\pm$ 0.04 ${}^{\ast}$	0.18 $\pm$ 0.04 ${}^{\ast}$	0.22 $\pm$ 0.03 ${}^{\ast}$	0.27 $\pm$ 0.06
	Obesity	0.63 $\pm$ 0.06	0.69 $\pm$ 0.07	0.50 $\pm$ 0.04 ${}^{\ast}$	0.67 $\pm$ 0.05	0.63 $\pm$ 0.07 ${}^{\ast}$	0.72 $\pm$ 0.06	0.51 $\pm$ 0.04 ${}^{\ast}$	0.74 $\pm$ 0.05
	Housing-10bin	0.30 $\pm$ 0.06 ${}^{\ast}$	0.30 $\pm$ 0.06 ${}^{\ast}$	0.32 $\pm$ 0.04 ${}^{\ast}$	0.40 $\pm$ 0.04	0.28 $\pm$ 0.06 ${}^{\ast}$	0.29 $\pm$ 0.03 ${}^{\ast}$	0.33 $\pm$ 0.05 ${}^{\ast}$	0.43 $\pm$ 0.03
	Stock-10bin	0.29 $\pm$ 0.05 ${}^{\ast}$	0.34 $\pm$ 0.04 ${}^{\ast}$	0.54 $\pm$ 0.04 ${}^{\ast}$	0.64 $\pm$ 0.03	0.28 $\pm$ 0.05 ${}^{\ast}$	0.33 $\pm$ 0.04 ${}^{\ast}$	0.55 $\pm$ 0.04 ${}^{\ast}$	0.70 $\pm$ 0.03
	Computer-10bin	0.34 $\pm$ 0.05 ${}^{\ast}$	0.36 $\pm$ 0.04 ${}^{\ast}$	0.35 $\pm$ 0.02 ${}^{\ast}$	0.39 $\pm$ 0.02	0.34 $\pm$ 0.05 ${}^{\ast}$	0.38 $\pm$ 0.04 ${}^{\ast}$	0.35 $\pm$ 0.02 ${}^{\ast}$	0.41 $\pm$ 0.02
	p-value	1.43E-04				2.00E-05

that the proposed method AOCpair consistently outperforms the competitors on all the datasets in terms of MZE, MAE, and F1 under pairwise query budgets of $10K$ and $20K$. The results in Tables 6 and 7 show that when the size of the initial training set is $3K$ or $5K$, our method remains superior to the competitors on most datasets.

To detect whether there are significant differences between the compared methods, we have conducted the Friedman test [39] on the three metrics at a significance level of $\alpha=0.05$. The $p$-values of the Friedman tests are shown in the three tables. Since the $p$-values are all less than $\alpha$, statistically significant differences exist between at least two methods on each performance measure under the different experimental conditions. Furthermore, to detect whether a compared method performs significantly differently from AOCpair on each dataset under the different experimental conditions, we conduct the Wilcoxon signed-rank test [40] between AOCpair and each competitor at a significance level of $\alpha=0.05$. The marker “$\ast$” next to a competitor’s result denotes a statistically significant difference from AOCpair. The results of the Wilcoxon tests in the three tables demonstrate that AOCpair significantly outperforms the compared methods on most datasets under the different experimental conditions.
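As a rough illustration of the Friedman test described above, the following sketch ranks the four methods on each dataset and computes the Friedman chi-square statistic from the mean MZE values of Table 6 at the $10K$ budget. It assumes the test is run over the per-dataset means and omits the tie correction, so it is not expected to reproduce the reported $p$-values exactly; the names `MZE_10K`, `average_ranks`, and `friedman_statistic` are hypothetical helpers introduced only for this example.

```python
# Mean MZE per dataset from Table 6 (budget 10K), in the table's dataset order.
MZE_10K = {
    "SVLORR":  [0.63, 0.46, 0.54, 0.59, 0.49, 0.43, 0.46, 0.70, 0.39, 0.68, 0.66, 0.65],
    "LDLORR":  [0.65, 0.41, 0.59, 0.56, 0.49, 0.44, 0.49, 0.76, 0.33, 0.68, 0.63, 0.65],
    "KNNORR":  [0.64, 0.52, 0.57, 0.58, 0.51, 0.36, 0.51, 0.64, 0.53, 0.72, 0.50, 0.67],
    "AOCpair": [0.49, 0.19, 0.44, 0.46, 0.41, 0.22, 0.42, 0.45, 0.35, 0.60, 0.38, 0.62],
}

# Upper 5% critical value of the chi-square distribution with 3 degrees of freedom.
CHI2_CRITICAL_DF3 = 7.815


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(scores):
    """Return the (tie-uncorrected) Friedman statistic and per-method rank sums."""
    methods = list(scores)
    n = len(scores[methods[0]])  # number of datasets (blocks)
    k = len(methods)             # number of compared methods
    rank_sums = [0.0] * k
    for d in range(n):
        row = [scores[m][d] for m in methods]
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, dict(zip(methods, rank_sums))


stat, rank_sums = friedman_statistic(MZE_10K)
print(rank_sums)                 # lower rank sum = lower (better) MZE
print(stat > CHI2_CRITICAL_DF3)  # True means the null hypothesis is rejected at 0.05
```

In practice a statistics library would normally be used instead, for example `scipy.stats.friedmanchisquare` for this test and `scipy.stats.wilcoxon` for the per-dataset comparison over repeated runs, which needs the per-run results that are not listed in the tables.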

The above experimental results illustrate that the proposed method uses the query resources more effectively. Since the cost of a pairwise query is often non-negligible in practical applications, the proposed method can further reduce the cost of inducing a good ordinal classifier. Several factors contribute to this performance. First, the proposed method builds each query pair from an informative unlabeled instance and a labeled instance. This setup ensures that the annotator provides useful information in each pairwise query, because the reasoning rules can reduce the class interval of the critical unlabeled instance after each query. Second, the proposed method employs a margin-based critical instance selection method to select the most informative instance for each query pair, which ensures the informativeness of the class interval-labeled training instances. Third, the ECM-based labeled instance selection method quickly reduces the critical instances’ class intervals, enabling the algorithm to produce more labeled instances, which benefits the learning of a good ordinal classifier.

Figures 2–4 show the learning curves of the AOCpair method with different query pair selection strategies on the metrics MZE, MAE, and F1. The results show that AOCpair-ECM performs best on most datasets, which indicates that the ECM strategy is more effective than the other four strategies for building query pairs. The ECM strategy follows a cost minimization idea that considers both the current and future costs of pairwise queries, so that more explicit labels can be obtained under a given pairwise query budget, leading to better learning performance.

Figure 2.

Learning curves of MZE for the AOCpair methods with different query pair selection strategies.

Figure 3.

Learning curves of MAE for the AOCpair methods with different query pair selection strategies.

Figure 4.

Learning curves of F1 for the AOCpair methods with different query pair selection strategies.

4.3 Feasibility analysis

In this subsection, we will discuss the conditions under which our pairwise query active ordinal classification method is preferable to the pointwise query one.

Suppose the price per pointwise query is $P\in\mathbb{R}^{+}$. In the worst case, the proposed method requires $\mathcal{O}(\lfloor\frac{K}{2}\rfloor)$ pairwise queries to obtain one explicit label. Therefore, if the price per pairwise query is lower than $\frac{P}{\lfloor\frac{K}{2}\rfloor}$, the proposed method is theoretically preferable to the pointwise query method, i.e., the margin-based sampling in Section 3.3.1. As mentioned in Section 3.3.2, the method AOCpair-Med requires $\lfloor\log_{2}K\rfloor$ pairwise queries to obtain one explicit label in the worst case. The learning curves in Figs 2–4 show that the proposed method is generally superior to AOCpair-Med. Therefore, we can empirically conclude that if the pairwise query price is lower than $\frac{P}{\lfloor\log_{2}K\rfloor}$, the proposed method will be superior to the traditional pointwise query paradigm.
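The two price thresholds above can be made concrete with a small sketch. It is a simplified stand-in rather than the reasoning rules or the AOCpair-Med implementation themselves: it assumes classes are numbered $1,\dots,K$, that the annotator gives a three-way answer (lower, same, or higher) when comparing the unlabeled instance with a labeled one, and that the median strategy always queries the middle class of the current class interval. All function names are hypothetical.

```python
def narrow_interval(interval, labeled_class, answer):
    """Shrink the class interval (lo, hi) of an unlabeled instance.

    `answer` is the assumed three-way reply comparing the unlabeled instance
    with a labeled instance of class `labeled_class`.
    """
    lo, hi = interval
    if answer == "lower":
        hi = min(hi, labeled_class - 1)
    elif answer == "higher":
        lo = max(lo, labeled_class + 1)
    elif answer == "same":
        if not lo <= labeled_class <= hi:
            raise ValueError("answer contradicts the current class interval")
        lo = hi = labeled_class
    else:
        raise ValueError(f"unknown answer: {answer!r}")
    if lo > hi:
        raise ValueError("answer contradicts the current class interval")
    return lo, hi


def compare(true_class, labeled_class):
    """Noise-free annotator used only for this simulation."""
    if true_class < labeled_class:
        return "lower"
    if true_class > labeled_class:
        return "higher"
    return "same"


def median_queries(true_class, num_classes):
    """Count pairwise queries until the interval collapses to a single class."""
    interval = (1, num_classes)
    queries = 0
    while interval[0] < interval[1]:
        pivot = (interval[0] + interval[1]) // 2  # middle class of the interval
        interval = narrow_interval(interval, pivot, compare(true_class, pivot))
        queries += 1
    return queries


def worst_case_median_queries(num_classes):
    """Largest query count over all possible true classes."""
    return max(median_queries(c, num_classes) for c in range(1, num_classes + 1))


def breakeven_prices(pointwise_price, num_classes):
    """Return (P / floor(K/2), P / worst-case median queries) for K >= 2."""
    if num_classes < 2:
        raise ValueError("at least two classes are required")
    theoretical = pointwise_price / (num_classes // 2)
    empirical = pointwise_price / worst_case_median_queries(num_classes)
    return theoretical, empirical


for k in (5, 10):
    print(k, worst_case_median_queries(k), breakeven_prices(1.0, k))
```

Under these assumptions the simulated worst case equals $\lfloor\log_{2}K\rfloor$, so the second value returned by `breakeven_prices` matches the empirical threshold $\frac{P}{\lfloor\log_{2}K\rfloor}$, while the first treats the $\mathcal{O}(\lfloor\frac{K}{2}\rfloor)$ bound as an exact count.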

To verify the above analysis, we plot the learning curves of MAE for the AOCpair-based pairwise query active learning and the margin-based pointwise query active learning in Fig. 5. The maximum pointwise query budget is set as $10K$ . When both methods achieve the same performance, we can see that the pairwise query method consumes no more than $\lfloor\log_{2}K\rfloor$ times the number of queries consumed by the pointwise method. This is consistent with the above analysis.

Figure 5.

Learning curves of MAE for pairwise query and pointwise query on the twelve datasets.

5. Conclusions

This paper proposes a pairwise query active learning method for ordinal classification. The proposed method can learn a promising ordinal classifier by soliciting low-cost relative information, reducing the labeling cost of active ordinal classification. By introducing the concept of class interval and two reasoning rules, we convert the labeled instances and the instance-pair relation information into class interval-labeled training instances. Then a new ordinal classification model is designed to learn with the class interval-labeled training instances. For query pair selection, we specify that each query pair is built with an informative unlabeled instance and a labeled instance. Consequently, the class interval of the critical unlabeled instance can be reduced after each pairwise query and reasoning. The informative unlabeled instance in each query pair is selected by a margin-based sampling method. This ensures the informativeness of the class interval-labeled training instances. The corresponding labeled instance is selected based on an expected cost minimization strategy. This strategy is beneficial for quickly reducing the critical instances’ class interval, enabling the algorithm to produce more labeled instances. Finally, the experiments on several public datasets demonstrate the effectiveness of the proposed method.

Two directions merit further investigation. (1) In our method, the critical instance in each query pair is selected by a margin-based sampling method. This method can be replaced with more robust active instance selection methods to achieve better pairwise query active learning performance. Ordinal classification methods commonly exploit the ordering information that misclassifying an instance into an adjacent class costs less than misclassifying it into a more distant class. We can use this ordering information to design a better active instance selection method. (2) There is no guarantee that human annotators always provide ground-truth information in real applications. Therefore, developing a pairwise query active ordinal classification method that can learn with noisy relative information is worthwhile.

Acknowledgments

This work was supported by Chongqing Key Laboratory of Computational Intelligence.

References

Tang

Pérez-Fernández

and De Baets

, Fusing absolute and relative information for augmenting the method of nearest neighbors for ordinal classification, Information Fusion 56 (2020), 128–140.

Georgoulas

Karvelis

Gavrilis

Stylios

C.D.

and Nikolakopoulos

, An ordinal classification approach for CTG categorization, in: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju Island, South Korea, July 11–15, 2017, IEEE, 2017, pp. 2642–2645.

Feldmann

and Konig

, Ordinal classification in medical prognosis, Methods of Information in Medicine 41(02) (2002), 154–159.

Kim

and Ahn

, A corporate credit rating model using multi-class support vector machines with an ordinal pairwise partitioning approach, Computers and Operations Research 39(8) (2012), 1800–1811.

Tong

and Koller

, Support vector machine active learning with applications to text classification, Journal of Machine Learning Research 2 (2001), 45–66. doi: 10.1162/153244302760185243.

Chen

Wang

and Chang

Y.I.

, Active learning in multiple-class classification problems via individualized binary models, Computational Statistic and Data Analysis 145 (2020), 106911.

Santos

D.P.D.

Prudêncio

R.B.C.

and Carvalho

A.C.P.D.L.F.D.

, Empirical investigation of active learning strategies, Neurocomputing 326–327 (2019), 15–27.

Zhu

and Zhang

, Active learning without knowing individual instance labels: A pairwise label homogeneity query approach, IEEE Transactions on Knowledge and Data Engineering 26(4) (2014), 808–822. doi: 10.1109/TKDE.2013.165.

Xue

and Hauskrecht

, Active learning of multi-class classification models from ordered class sets, in: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, January 27–February 1, 2019, AAAI Press, 2019, pp. 5589–5596.

10.

Chen

Zhang

Hou

and Yuan

, Active learning for imbalanced ordinal regression, IEEE Access 8 (2020), 180608–180617.

11.

Sheth

D.Y.

and Rajkumar

, Active ranking from pairwise comparisons with dynamically arriving items and voters, in: CoDS-COMAD 2020: 7th ACM IKDD CoDS and 25th COMAD, Hyderabad India, January 5–7, 2020, ACM, 2020, pp. 229–233.

12.

Sader

Verwaeren

Pérez-Fernández

and Baets

B.D.

, Integrating expert and novice evaluations for augmenting ordinal regression models, Information Fusion 51 (2019), 1–9. doi: 10.1016/j.inffus.2018.10.012.

13.

Tang

Pérez-Fernández

and Baets

B.D.

, A comparative study of machine learning methods for ordinal classification with absolute and relative information, Knowledge-Based Systems 230 (2021), 107358. doi: 10.1016/j.knosys.2021.107358.

14.

and Lin

H.T.

, Ordinal regression by extended binary classification, in: Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4–7, 2006 Schölkopf

Platt

J.C.

and Hofmann

, eds, MIT Press, 2006, pp. 865–872.

15.

Min

Cai

and Zhou

, Laplacian optimal design for image retrieval, in: SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands, July 23–27, 2007 Kraaij

de Vries

A.P.

Clarke

C.L.A.

Fuhr

and Kando

, eds, ACM, 2007, pp. 119–126. doi: 10.1145/1277741.1277764.

16.

Jing

Zhang

and Zhang

, Entropy-based active learning with support vector machines for content-based image retrieval, in: Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, ICME 2004, 27–30 June 2004, Taipei, Taiwan, IEEE Computer Society, 2004, pp. 85–88.

17.

Culotta

and McCallum

, Reducing labeling effort for structured prediction tasks, in: Proceedings, The Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference, July 9–13, 2005, Pittsburgh, Pennsylvania, USA Veloso

M.M.

and Kambhampati

, eds, AAAI Press/The MIT Press, 2005, pp. 746–751.

18.

Seung

H.S.

Opper

and Sompolinsky

, Query by committee, in: Proceedings of the Fifth Annual ACM Conference on Computational Learning Theory, COLT 1992, Pittsburgh, PA, USA, July 27–29, 1992, ACM, 1992, pp. 287–294.

19.

Kee

Castillo

E.D.

and Runger

, Query-by-committee improvement with diversity and density in batch active learning, Information Sciences 454-455 (2018), 401–418. doi: 10.1016/j.ins.2018.05.014.

20.

Roy

and McCallum

, Toward Optimal Active Learning through Sampling Estimation of Error Reduction, in: Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williams College, Williamstown, MA, USA, June 28– July 1, 2001, Morgan Kaufmann, 2001, pp. 441–448.

21.

Park

S.H.

and Kim

S.B.

, Robust expected model change for active learning in regression, Applied Intelligence 50(2) (2020), 296–313. doi: 10.1007/s10489-019-01519-z.

22.

Dasgupta

and Hsu

, Hierarchical sampling for active learning, in: Machine Learning, Proceedings of the Twenty-Fifth International Conference (ICML 2008), Helsinki, Finland, June 5–9, 2008 Cohen

W.W.

McCallum

and Roweis

S.T.

, eds, ACM International Conference Proceeding Series, Vol. 307, ACM, 2008, pp. 208–215.

23.

Wang

Min

Zhang

and Wu

, Active learning through density clustering, Expert Systems with Applications 85 (2017), 305–317.

24.

and Tresp

, Active learning via transductive experimental design, in: Machine Learning, Proceedings of the Twenty-Third International Conference (ICML 2006), Pittsburgh, Pennsylvania, USA, June 25–29, 2006, ACM International Conference Proceeding Series, Vol. 148, ACM, 2006, pp. 1081–1088. doi: 10.1145/1143844.1143980.

25.

Park

S.H.

and Kim

S.B.

, Active semi-supervised learning with multiple complementary information, Expert Systems with Applications 126 (2019), 30–40. doi: 10.1016/j.eswa.2019.02.017.

26.

Soons

and Feelders

, Exploiting monotonicity constraints in active learning for ordinal classification, in: Proceedings of the 2014 SIAM International Conference on Data Mining, Philadelphia, Pennsylvania, USA, April 24–26, 2014, SIAM, 2014, pp. 659–667.

27.

Wang

and Qian

, Fusing complete monotonic decision trees, IEEE Transactions on Knowledge and Data Engineering 29(10) (2017), 2223–2235.

28.

Mazumdar

and Saha

, Query complexity of clustering with side information, in: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4–9, 2017, Long Beach, CA, USA, 2017, pp. 4682–4693.

29.

Davidson

Qian

Wang

and Wang

, Active learning to fank using pairwise supervision, in: Proceedings of the 13th SIAM International Conference on Data Mining, May 2–4, 2013. Austin, Texas, USA, SIAM, 2013, pp. 297–305.

30.

Gutiérrez

P.A.

Pérez-Ortiz

Sánchez-Monedero

Fernández-Navarro

and Hervás-Martínez

, Ordinal regression methods: survey and experimental study, IEEE Transactions on Knowledge and Data Engineering 28(1) (2016), 127–146.

31. Chien, Zhou and Li, HS$^{2}$: active learning over hypergraphs with pointwise and pairwise queries, in: The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 16–18 April 2019, Naha, Okinawa, Japan, Vol. 89, PMLR, 2019, pp. 2466–2475.

32. Tang, Pérez-Fernández and Baets B.D., Distance metric learning for augmenting the method of nearest neighbors for ordinal classification with absolute and relative information, Information Fusion 65 (2021), 72–83. doi: 10.1016/j.inffus.2020.08.004.

33. McCullagh, Regression models for ordinal data, Journal of the Royal Statistical Society: Series B (Methodological) 42(2) (1980), 109–127. doi: 10.1111/j.2517-6161.1980.tb01109.x.

34. Huang G.B., Zhou, Ding and Zhang, Extreme learning machine for regression and multiclass classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42(2) (2012), 513–529. doi: 10.1109/TSMCB.2011.2168604.

35. Seah C.W., Tsang T.W. and Ong Y.S., Transductive ordinal regression, IEEE Transactions on Neural Networks and Learning Systems 23(7) (2012), 1074–1086.

36. Yang and Loog, A benchmark and comparison of active learning for logistic regression, Pattern Recognition 83 (2018), 401–415.

37. Active learning for ordinal classification on incomplete data, Intelligent Data Analysis (2023). doi: 10.3233/IDA-226664.

38. Gutiérrez P.A. and García, Current prospects on ordinal and monotonic classification, Progress in Artificial Intelligence 5(3) (2016), 171–179.

39. Friedman, A comparison of alternative tests of significance for the problem of m rankings, The Annals of Mathematical Statistics 11(1) (1940), 86–92.

40. Wilcoxon, Individual comparisons by ranking methods, Biometrics Bulletin 1(6) (1945), 80–83.

Table 4. Information of the used datasets

Table 5. Classification results of the four compared methods when the number of initial training instances is set as $K$ (the best results are marked in boldface)