Abstract
In multi-label classification settings, one of the most common problems is the massive label output space. To alleviate this, some methods opt to exploit label correlations to reduce the output space during prediction. However, these methods sacrifice efficiency or ignore global label correlations. In addition, label imbalances are another problem that is prevalent in multi-label classification. Current methods of correcting for imbalance oftentimes use single-label methods, which fail to consider label correlations. In this paper, we introduce general frameworks that incorporate topic modeling to seamlessly address both problems. We show that these frameworks can allow even the most naïve methods, such as Binary Relevance, to perform similarly to state-of-the-art methods. Furthermore, we show that our frameworks can also adapt state-of-the-art methods to perform better than the methods by themselves.
Introduction
In many real-world applications, objects are oftentimes members of several different groups, which makes the single label assumption of multiclass classification unrealistic. For example, an image may be annotated with classes “sea” and “sunset”, or a document maybe annotated with “culture” and “history”. Multi-label classification is the problem of categorizing these objects, or instances, into any number of classes. Formally speaking, let
However, there are two primary challenges that are commonly associated with multi-label classification:
Size of the output space. Due to the nature of multi-label classification, the output space of the learning problem is massive. For example, the number of possible label sets for 20 class labels is 2
Label imbalance. This is the problem in which one class may label many more instances than other classes. Although label imbalance is a complication that exists in many machine learning problems, it is more complex in multi-label classification settings [25]. Some methods attempt to generalize to the domain of the multi-label classification problem by using traditional single-label methods for correcting imbalance, such as under- and oversampling. However, these methods are incapable of exploiting label correlations. Others reconstruct the specific multi-label algorithm to adapt to the imbalance problem. However, these methods fail to generalize to other methods.
In this paper, we first introduce our work, LDAML, which specifically tackles massive output spaces with label correlation exploitation. LDAML is a general framework that can adapt and improve most multi-label classification methods. By exploiting global correlations with topic modeling, we showed that LDAML can improve naïve methods such as Binary Relevance to the point that they are on par with state-of-the-art methods. Our method accomplishes this without significantly increasing the time cost. However, LDAML does not take into account the label imbalance problem. Then we introduce an extended framework that builds upon LDAML while also addressing label imbalance.
In the rest of this paper, we briefly introduce related work, present the new framework, report on our experiments, and finally conclude our work.
In the past few decades, many algorithms have been proposed to solve multi-label classification problems. These learning algorithms can be categorized into two major types [25].
Problem transformation methods. The key idea of this category is to change the data to fit the algorithm. A classic method of this type is Binary Relevance (BR) [2]. BR converts the multi-label classification problem to a single-label classification problem by independently training one binary classifier for each label.
Algorithm adaptation methods. The key idea of this category is to change the algorithm to fit the data. Standard algorithms in this category include MLkNN [24], which fits single-label
Corresponding to problem 1 in the introduction, exploiting correlations among labels can drastically reduce the size of the output space, thus improve learning performance. A large number of methods that explicitly consider label correlations has been proposed. Some of these methods, such as CLR [8] and COCOA [23], utilize pairwise label correlations. CLR ranks labels by generating all possible label pairs. COCOA predicts labels by building several multiclass imbalance learners to pair with other labels. However, these label correlations consider only pairwise relationships, which are expensive to compute, especially with massive label spaces.
Another category of methods exploits global label correlations with different structures. Classifier Chains (CC) [14] transform multi-label classification problems into chains of binary classification problems. To find the optimal order of label chains, several follow-up algorithms have been proposed based on CC: Ensembles of Classifier Chains (ECC) [14], Entropy Chain Classifiers (ETCC) [13], and Group sensitive Classifier Chains for multi-label classification(GCC) [11]. Random
In addition, corresponding to problem 2 in the introduction, label imbalance in multi-label classification can be viewed from two perspectives [21]: imbalance among different labels and imbalance within individual labels. The former is related to the greatly differing rate of positive instances of different labels. The latter only depends on the degree of imbalance of each individual label.
One technique is to transform the label imbalance problem in multi-label learning into the class imbalance problem in single-label learning, and then solving it with techniques such as under- and over-sampling, ensemble learning, and cost-sensitive learning [9, 22, 16, 12]. Some methods, such as ML-ROS, ML-RUS [3], and Multi-Label Synthetic Minority Over-sampling Technique (MLSMOTE) [4], directly apply techniques that solve the class imbalance problem in single-label learning to multi-label learning. Some methods also try to build training sets based on the inherent properties of majority class to solve the imbalance problem.
Moreover, imbalance-aware algorithms have recently been proposed. COCOA [23] builds a binary-class imbalance classifier for the current class, aggregating it with several multiclass imbalance classifiers for other classes to make final predictions. [21] solved the label imbalance problem in multi-label learning by formulating the problem as a constrained minimization consisting of a submodular objective function. [6] introduced an extension of structured forests and proposed a imbalance-aware formulation by altering the manner in which splitting functions are learned.
LDAML framework
Our approach to the problem of multi-label classification is to exploit label correlations by introducing topic modeling, specifically in the form of latent Dirichlet allocation (LDA) [1]. Unlike traditional methods that exploit correlations with label subsets or label chains, LDAML achieves this goal by discovering abstract label topics. As in the introduction section, let
Exploiting label topics from the training set. We introduce LDA into training set
Choose the label topic number Choose Choose For each label
Choose a topic Choose a label
Then, we compute the instance-topic probability distribution matrix
Discrete distribution of topics. After calculating
Discretization of the instance-topic probability distribution matrix[1] The instance-topic probability distribution matrix
Predict topics of test examples. We assume that the topic probability distribution of the test dataset is identical to the training dataset. Therefore, the topic of the test dataset can be predicted with the training dataset. Let
The choice of topic number
Predict labels of test examples. Since we assume that the label topics introduce label correlation information, we augment the original feature space with the topic space as features in the label prediction stage. Let
LDAML[1]
As stated in the introduction, there are two key challenges in the multi-label learning, one of which is the huge number of possible label sets for prediction in multi-label data, which is exponential to the size of the label space. To address this, the LDAML framework focuses on exploiting correlations among class labels with topic modeling. Another key challenge is the class imbalance problem, which typically leads to performance degradation in multi-label learning. To tackle this problem, we propose the external framework, LDAML-IMB, which is based on LDAML.
Coupling labels with topics. In the LDAML framework, we obtained the topic distribution of each label set. We couple each label
In total, there are four possible classes, which are the four possible combinations of
Building multiclass imbalance learners.
The single label imbalance learning techniques such as random or synthetic under- or over-sampling can be applied to the multi-label class dataset
Computing whole confidence of learners.
For each class label
Here, the
Computing the best threshold of confidence.
In traditional methods, the threshold
where
Algorithm 3.2 summarizes the complete procedure of the proposed LDAML-IMB framework.
LDAML-IMB[1]
In this section, we propose LDAML-IMB framework, which aims to exploit the label correlations as well as deal with the label imbalance problem. The first part, LDAML, exploits the label correlations with topic models, and the second part adjusts the first part with the imbalance problem with several multi-class imbalance learners coupling with the topics from LDAML. The LDAML-IMB can degenerate to the LDAML if we ignore the second part.
Dataset description
We use statistical metrics commonly used to describe other multi-label datasets to describe our own. Given a multi-label dataset
Statistical descriptions of the dataset
Statistical descriptions of the dataset
In this paper, we chose four benchmark multi-label datasets to use in our experiments. These datasets cover two types of data: music (CAL500, Emotions) and images (Corel5k, flags). Meanwhile, these data sets also cover different scales of instance and label class. So the experiment on these experimental data is typical and convincing.
We compared LDAML to several state-of-the-art learning algorithms. We selected algorithms to cover nearly all categories of multi-label classification. From the transformation method categories, we compared with the lazy method MLkNN [24]; transform to binary classification CC and ECC; transform to label ranking method CLR [8] and transform to multi-class classification method RAkEL [18] and COCOA [23]. Notably, COCOA also considers the imbalance problem in addition to label correlations. From the labels correlations categories, BR and MLkNN are the first-order method, CLR is the second-order method, while CC, ECC, COCOA, and RAkEL are high-order methods.
Experimental settings
We designed two groups of experiments to compare the algorithms. The first group aimed to show the performance with increasing topic number
We instantiated all of the algorithms as follows. We adopted the SMO method, which is provided by the widely-used Weka platform, as the base binary classifier in all multi-label methods [10]. The representative comparison multi-label algorithms, such as MLKNN, CLR, ECC, and RAKEL, are provided by the MULAN multi-label learning library [19].
Experiment results
In multi-label scenarios, there are many metrics for evaluation, just like Subset Accuracy, Hamming Loss, One error, Coverage, Ranking Loss etc. However, the F-measure, which integrates both precision and recall, is the most-used evaluation metric. This is because it can provide better insight into classification performance than conventional metrics such as accuracy [7]. We choose both micro- and macro-averaged F-measure as evaluation metrics, since they are computed by instances and labels, respectively. On the other hand, in class imbalance scenarios, macro-averaged F-measure and macro-AUC are the most-used evaluation metrics. Thus, we evaluated each method’s performance with three metrics: the micro-averaged F-measure, macro-averaged F-measure, and macro-AUC. The micro- and macro-averaged F-measures are label-based measures that separately evaluate the generalization performance of each class label’s predictor, while macro-AUC is a label-based ranking metric. For all three metrics, a higher value indicates better performance.
LDAML(H)-K refers to the LDAML framework, where
Experiment results with different topic numbers on CAL500 (
)
Experiment results with different topic numbers on CAL500 (
From the result in Table 2, we observe the following:
Even the simplest multi-label algorithm, BR, can achieve better performance across all evaluation metrics by utilizing the LDAML framework. By using the LDAML-IMB framework, we obtain significant increases in performance, especially when evaluating with the imbalance metrics macro-averaged F-measure and macro-AUC. With increasing topic number
The LDAML results indicate that even though BR performs the worst, the LDAML framework can achieve similar or better performance than state-of-the-art algorithms by exploiting label correlations to enhance BR. More importantly, the only additional time cost is to compute the topic probability distribution matrix in a limited word space (label space). Thus, the time cost of LDAML is closed to the base multi-label classifier in LDAML, which implies that by using LDAML, we can achieve similar or even better performance than the state-of-the-art algorithms, while the time cost is similar to that of faster, naïve methods.
Experiment results on Corel5K (
Experiment results on the dataset Emotions (
Experiment results on the dataset flags (
The LDAML-IMB results show that the imbalance metrics macro-averaged F-measure and macro-AUC can be significant improved, the other metrics are also improved except that sometimes. Although LDAML-IMB sacrifices efficiency to build
It is worth noting that the performance may initially increase with the number of topics, but will eventually converge or even decrease. This is because when topics are introduced into the feature space, the new label correlation information may overlap with a previous topic. Furthermore, the topic prediction is completely accurate, and errors may be introduced with the new information. Therefore, the LDAML and LDAML-IMB frameworks perform best on dataset with many label classes and correlations.
Tables 3–5 report the results of the second group of experiments, with the best result marked in bold. Again, the win/loss row indicates the number of times our frameworks are superior/inferior to the compared algorithm on each dataset. And We also show the runtime of each method (ms). In this group of experiments, we fixed the number of topics in the LDAML framework at 2. We focus comparing each comparing multi-label algorithm
We conclude our experiments as follows: 1) The LDAML and LDAML-IMB are general framework which can adapt almost all the multi-label methods. We can adopt state-of-the-art method as the basic method in LDAML to make a breakthrough from the state-of-the-art performance. We can also adopt simple method as the basic method to improve the performance close to the state-of-the-art performance in low time cost; 2) The LDAML can be regarded as a special and simple version of LDAML-IMB which ignore the class-imbalance problem. Compared with LDAML-IMB, the performance of LDAML is improved less but the time cost is also less than LDAML-IMB. So the framework can be flexible selected according to the actual situation; 3) The LDAML-IMB can achieve significant improvement base on LDAML by correcting the class-imbalance. The performance of LDAML-IMB is more effective but with more time cost than LDAML; 4) With the topics increasing, the performance of our framework can be consistent growth except that sometimes. These results indicate that LDAML can provide robust and preferable solutions in multi-label problem.
In this paper, we proposed an efficient and effective framework, LDAML, for multi-label classification by first exploiting label correlations to improve basic multi-label classifications. Next, we proposed an external framework, LDAML-IMB, based on LDAML to correct the class imbalance problem. Extensive experiments across benchmark datasets show that these frameworks perform better than state-of-the-art algorithms, especially in terms of imbalance-specific metrics such as the macro-averaged F-measure and macro-AUC. Furthermore, we showed that different base multi-label methods in different frameworks (LDAML/LDAML-IMB) with different topic numbers improve performance to varying degrees at varying time costs. Overall, our frameworks can achieve performances comparable to that of most state-of-the-art methods, but at a much lower time cost.
Footnotes
Acknowledgments
This paper is supported by the National Key Research and Development Program of China (grant no. 2016YFB1001102), the National Natural Science Foundation of China (grant nos. 61375069, 61403156, 61502227), the Collaborative Innovation Center of Novel Software Technology and Industrialization at Nanjing University and the Fundamental Research Funds for the Central Universities (grant nos. 020214380036, 020214380038, 020214380040).
