Abstract
Under the framework of multi-label classification, the excessive training time restricts the availability of non-linear kernel SVM (Support Vector Machine) classification algorithm on large-scale data sets. To solve this problem, this paper provides a fast multi-label SVM classification algorithm based on approximate extreme points (AEMLSVM). Firstly, it utilizes the approximate extreme point technique to obtain representative sets from the training data set. These representative sets not only retain almost all information of the training data set, but also its size is much smaller than that of training data set. After that, SVM is trained on the representative sets. Furthermore, the improved AEMLSVM algorithm (AEMLSVM-DEC) adopts DEC (Different Error Costs) technique to solve the label data imbalanced problem. We have conducted extensive experiments on four large-scale benchmark data sets. The results show that the proposed algorithms can effectively reduce training time, and their classification performance is similar to that of the traditional multi-label SVM algorithm. They outperform other scalable multi-label SVM algorithms in training time and classification performance. By adopting DEC method to solve the label data imbalanced problem, the AEMLSVM-DEC algorithm has a better classification performance than AEMLSVM algorithm.
Keywords
Introduction
Multi-label classification is an important research hotspot in the field of machine learning and data mining, in which each instance can have one or more labels simultaneously associated [7]. Specifically speaking, in the multi-label classification framework, assume
Traditional SVM [4] is a popular classification algorithm which can only deal with binary or multi-class classification problem. But improved algorithms like Rank-SVM algorithm [2] can deal with multi-label classification problem. Because most multi-label data sets are not linearly separable, multi-label classification algorithms based on SVM need non-linear kernel functions to achieve better classification performance, which further limits their availability on large-scale data sets. Moreover, it is accepted that most multi-label data sets are label imbalanced data sets [17], which has a negative impact on the classification performance.
In this paper, we utilize the advantages of BR (binary relevance) problem transformation strategy and binary SVM based on approximate extreme points (AESVM) [20] to construct a new fast multi-label SVM classification algorithm called AEMLSVM. AEMLSVM classification algorithm combined with non-linear kernel can be applied on large-scale data sets. The improved AEMLSVM (AEMLSVM-DEC) classification algorithm adopts DEC method [15] to solve the label data imbalanced problem. AEMLSVM and AEMLSVM-DEC classification algorithms utilize the approximate extreme point method to extract representative sets from the training data set. The size of representative sets is much smaller than that of the training data set, while it preserves almost all information of the training data set. After that, SVM classification algorithm is trained on the extracted representative sets. Experiment results on four large-scale benchmark data sets demonstrate that AEMLSVM and AEMLSVM-DEC classification algorithms have the fastest training speed compared with other multi-label classification methods, i.e., ML-BVM [9], ML-CVM [10] and ML-LIBSVM [3]. At the same time, their classification performance is similar to that of ML-LIBSVM method on the five evaluation metrics and better than that of ML-BVM and ML-CVM methods. By adopting DEC method to solve the label data imbalanced issue, the AEMLSVM-DEC classification algorithm has a better classification performance than AEMLSVM classification algorithm. We note that a short version of this work has been appeared in the BIGCOM- 2016 Workshop on the Best Paper Candidates topic [31].
The other parts of this paper are shown as follows. Some related works are introduced in Section 2; the AEMLSVM and AEMLSVM-DEC classification algorithms proposed in this paper are given in Section 3; the experiment results are analyzed in Section 4; Section 5 concludes this paper.
Related work
In recent years, many algorithms [1, 2, 12, 13, 19] have been put forward and used to solve the multi-label classification problem. Meanwhile, lots of algorithms [32, 33, 34] have also been proposed to overcome the label data imbalanced problem since label imbalanced data sets are very universal. SVM has been a popular algorithm in solving binary or multi-class classification problem. In this chapter, several existing multi-label classification algorithms are introduced firstly. Secondly, popular methods are introduced for dealing with label data imbalanced problem. Finally, the binary SVM is presented.
Multi-label classification algorithms
Currently, problem transformation strategy and algorithm adaptation strategy are two basic strategies for dealing with multi-label classification problem. The problem transformation strategy firstly transforms the multi-label problem into one or more single-label problems. After that, each single-label problem is solved with a single-label classification algorithm. Problem transformation strategy mainly includes BR (Binary Relevance) or OVR (One-Versus-Rest) [35], LP (Label Powerset) [21] and so on. Existing single-label classification algorithms include SVM [3], CVM [10], BVM [9], neural network, decision tree and K-nearest neighbor (KNN) [5], etc. The algorithm adaptation strategy extends a single-label classification algorithm in order to directly deal with multi-label data set. By using the algorithm adaptation strategy, many classical multi-label classification algorithms such as C4.5-type algorithm [1], BP-MLL algorithm [19], ML-KNN algorithm [13], Rank-SVM algorithm [2] and Rank-CVM algorithm [12] have been put forward and widely used in many filed.
Methods for solving label data imbalanced problem
It is a widely accepted that most multi-label data sets are label imbalanced data sets [17]. And label data imbalanced issue affects the performance of classification algorithms [8]. To overcome label data imbalanced problem, lots of algorithms have been put forward and they are classified into three types [36]. The first type of method acts on the data level and they are based on resampling the training data set to make the classifier receive the same proportion of instances each label [34]. Re-sampling method includes over-sampling and under-sampling methods. The second type of method acts on the algorithm level and they deal with the label data imbalanced problem by modifying the learning algorithm [32]. They can adjust the decision threshold and make it bias towards the minority label or assign larger costs to compensate for the minority. This is the basic principle of cost-sensitive learning. The third type of method is based on ensemble in which the basic ensemble method is modified to account for the underrepresentation of minority label in the training data set [33].
Binary SVM
Assume a binary data set includes
where
It is worth noting that
It can be concluded from above that although many algorithms have been put forward and used for multi-label classification, they are still restricted from being used on large-scale data sets and this problem is especially serious for those based on SVM. The AEMLSVM and AEMLSVM-DEC classification algorithms can resolve this problem effectively. They can greatly shorten the training time while achieve a comparable classification performance to that of the ML-LIBSVM method. In addition, the AEMLSVM-DEC classification algorithm also overcomes the label data imbalanced problem effectively by using the DEC method.
Firstly, we will introduce the binary relevance strategy. Secondly, we will introduce binary support vector machine based on approximate extreme points (AESVM). Thirdly, we will introduce the improved AESVM. Then, we will implement fast multi-label SVM classification algorithm based on approximate extreme points (AEMLSVM). In addition, the improved AEMLSVM (AEMLSVM-DEC) classification algorithm uses the DEC method to solve the label data imbalanced problem and this further improves the performance of the AEMLSVM classification algorithm. At last, the complexity of the AEMLSVM and AEMLSVM-DEC algorithms are analyzed.
Binary relevance strategy
We adopt the famous BR strategy to realize multi-label classification in this paper. Firstly, the BR strategy transforms the multi-label data set
where
To avoid getting a null relevant label set, the following rule is used [21].
where, if all
The main reason for adopting the BR strategy is that it has many obvious advantages. Firstly, it can use any existing binary classifier as base classifier to realize multi-label classification. Secondly, its complexity is linearly dependent on the amount of labels. Thirdly, it can be parallelized easily. Lastly, it can optimize multiple loss functions [23].
Before introduce the binary SVM based on approximate extreme points, the principle of extreme points will be introduced firstly. The convex hull of set
We can get from the formula that by adopting only
Therefore, Nandan et al. [20] introduce the concept of approximate extreme points. Meanwhile, he proposes a binary SVM based on approximate extreme points (AESVM). It can reduce the training time effectively while guaranteeing the accuracy of classification. The formula of unconstrained optimization problem of binary AESVM is as follows [31]:
where
When the BR problem transformation strategy is used on large-scale data sets, it can result in severe label data imbalanced problem and this problem deteriorates with the increase of labels [30]. Figure 1 shows the ratio of the 20 commonest labels in each multi-label data set. The attributes of these multi-label data sets are presented in Table 1. Many labels are represented by less than 20% of the total instances and some of those not shown in Fig. 1. It is a fact that most of these multi-label data sets have more than 20 labels. In a word, label data imbalanced problem occurs more often in large-scale data sets. Meanwhile, Although binary AESVM classification algorithm has good classification performance, it cannot perform well when the data set is imbalanced. Hence, we will combine the DEC method with binary AESVM to solve this problem. The DEC method is introduced firstly.
Ratio of the 20 commonest labels in each data set.
Binary SVM classification algorithm is sensitive to label data imbalanced problem because same misclassification cost are assigned to the positive and negative label misclassification problems in the penalty term of the soft margin objective function given in Eq. (1). This makes the separating hyperplane slope towards the minority label and leads to a suboptimal model. DEC method is proposed in [15] and it is a cost-sensitive learning technique to deal with the above problems in SVM. In the training of SVM, the soft margin objective function of SVM is modified by assigning different misclassification costs and shown in the following equation.
In Eq. (7),
To summarize, the advantages of DEC method and binary AESVM are fully used to reduce the training time effectively while guaranteeing the accuracy of classification. The following equation gives the optimization problem of the improved binary SVM based on approximate extreme points.
In order to simplify the Eq. (8) and facilitate the implementation of the following algorithms, we convert the Eq. (8) as follows.
Parameter
We will use the following strategy to obtain different penalty factors for the misclassification of instances with the majority and minority label.
Parameter
Therefore, we also realize a fast multi-label SVM classification algorithm by combining the BR problem transformation strategy with the improved binary SVM based on approximate extreme points [20]. These also will be introduced in 3.4. It is worth noting that improper value of
The proposed AEMLSVM and AEMLSVM-DEC classification algorithms firstly utilize the BR strategy to decompose a multi-label data set into
Algorithms 1 and 2 give the pseudocode of AEMLSVM algorithm and AEMLSVM-DEC algorithm respectively. In Algorithms 1 and 2,
[!t]
AEMLSVM[!t]
Complexity analysis of AEMLSVM and AEMLSVM-DEC classification algorithms
The time and space complexity of standard SVM training are
Experiments
In this section, the AEMLSVM and AEMLSVM-DEC algorithms are compared with three popular multi-label methods, ML-BVM, ML-CVM and ML-LIBSVM experimentally. Before the experimental results are presented, we firstly introduce the three popular multi-label methods, four benchmark data sets and five evaluation metrics.
Three popular multi-label methods
In the experiments, we choose three popular multi-label classification methods, ML-BVM, ML-CVM and ML-LIBSVM to compare with AEMLSVM and AEMLSVM-DEC. The reason to choose these three methods is that they also use the BR strategy to decompose the multi-label data set into many binary data subsets. The main difference among these methods is that they use different binary classification algorithms to deal with each binary data subset. ML-BVM method employs BVM [1] algorithm to process binary data subsets, ML-CVM method employs CVM [21] algorithm to process binary data subsets, and ML-LIBSVM method employs LIBSVM [10] algorithm to process binary data subsets. Then the classification results of the binary data subsets are integrated as the final multi-label classification result.
The characteristics of data sets
The characteristics of data sets
The five evaluation metrics on data set mediamill with the RBF kernel.
The four large-scale benchmark data sets downloaded from LIBSVM’s website [16] are used in the experiment. The characteristics of these data sets are shown in Table 1[31].
Evaluation metrics
Because each instance can be linked to multiple labels simultaneously, the evaluation metrics of multi-label classification algorithm are more complex than binary or multi-class classification algorithm. Hence, lots of specific evaluation metrics are developed to deal with this problem [13, 5, 24]. Five popular evaluation metrics are chosen in this paper and they are Hamming loss, One-error, Coverage, Ranking loss and Average-precision [5, 24] respectively.
The parameter setting with the RBF kernel
The parameter setting with the RBF kernel
Training and classifying time on data set mediamill with the RBF kernel.
The five evaluation metrics on data set siam with the RBF kernel.
Training and classifying time on data set siam with the RBF kernel.
Hamming loss: used to evaluate how many average times an instance-label pair is misclassified.
One-error: used to evaluate how many times the top-ranked label is not in the relevant label set.
Coverage: used to evaluate how many average steps are required to move along the ranked label list so as to cover all the relevant labels of an instance.
The five evaluation metrics on data set rcv1v2 with the RBF kernel.
Training and classifying time on data set rcv1v2 with the RBF kernel.
Ranking loss: used to evaluate the average label pairs that are misordered for an instance.
Average-precision: used to evaluate the average fraction that relevant labels are ranked higher than a particular label.
Briefly, the greater the value of average-precision, the better the multi-label classifier’s performance. The smaller the value of the other four metrics, the better the multi-label classifier’s performance.
The five evaluation metrics on data set eukaryoteGO with the RBF kernel.
Training and classifying time on data set eukaryoteGO with the RBF kernel.
In our experiment, we need to set parameters
Analysis of experimental results with the RBF kernel
In the experiment, we adopt the RBF (Radial Basis Function) kernel
It can be seen from Figs 2, 4, 6 and 8 that the classification results of AEMLSVM and AEMLSVM-DEC classification algorithms are comparable to that of the ML-LIBSVM method on the five evaluation metrics and superior to those of ML-BVM and ML-CVM methods. It can be concluded from Figs 3, 5, 7 and 9 that the training time of AEMLSVM algorithm is the shortest and it is only 1/24, 1/4, 1/6 and 1/4 of that of ML-LIBSVM method respectively. It can be seen that our algorithms can shorten the training time greatly while ensuring the classification performance is almost not reduced. This is mainly because our algorithm uses approximate extreme point technique to obtain representative sets, the size of which is far less than that of the training data sets. So the training time on the representative set is far less than that on the training data set. And, the representative set also keeps almost all information of the training data set, so the classification performance of the AEMLSVM and AEMLSVM-DEC algorithms are also comparable to that of ML-LIBSVM method. In addition, the classification results of AEMLSVM-DEC classification algorithm are all superior to that of AEMLSVM classification algorithm on the five evaluation metrics. This proves that by using the DEC method to solve the label data imbalanced problem, we can enhance the classification performance. And AEMLSVM-DEC classification algorithm spends 2%, 10%, 4.8% and 4.2% more training time than the AEMLSVM algorithm respectively. This means the improved classification performance is at the expense of training speed, but it is within acceptable limits. In addition, AEMLSVM and AEMLSVM-DEC classification algorithm have less testing time.
The parameter setting with the polynomial kernel
The parameter setting with the polynomial kernel
The five evaluation metrics on data set siam with the Polynomial kernel.
Training and classifying time on data set siam with the Polynomial kernel.
The five evaluation metrics on data set rcv1v2 with the Polynomial kernel.
Training and classifying time on data set rcv1v2 with the Polynomial kernel.
To validate our proposal of AEMLSVM and AEMLSVM-DEC as a fast candidate to multi-label SVM classification for each non-linear kernel, we also adopt the polynomial kernel
The five evaluation metrics on data set eukaryoteGO with the Polynomial kernel.
Training and classifying time on data set eukaryoteGO with the Polynomial kernel.
It can be seen from Figs 10, 12 and 14 that the classification results of AEMLSVM and AEMLSVM-DEC algorithms are comparable to that of the ML-LIBSVM method on the five evaluation metrics and superior to that of ML-CVM method. It can be concluded from Figs 11, 13 and 15 that the training time of AEMLSVM algorithm is the shortest and it is only 1/8, 1/10 and 1/4 of that of ML-LIBSVM method respectively. It shows that our algorithms can shorten the training time substantially and ensure the classification performance is almost not lowered. This is mainly because our algorithm adopts approximate extreme point technique to obtain representative sets. The size of the representative sets is far less than that of the training data sets. So the training time on the representative set is far less than that on the training data set. And, the representative set also keeps almost all information of the training data set, so the classification performance of the AEMLSVM and AEMLSVM-DEC algorithms is also comparable to that of ML-LIBSVM method. In addition, the classification results of AEMLSVM-DEC classification algorithm are all superior to that of AEMLSVM algorithm on the five evaluation metrics. This proves that by using the DEC method to solve the label data imbalanced problem, we can enhance the classification performance. And, AEMLSVM-DEC classification algorithm spends 4.3%, 9.8% and 20.5% more training time than AEMLSVM algorithm respectively. This means the improved classification performance is at the expense of training speed, but it is within acceptable limits. In addition, AEMLSVM and AEMLSVM-DEC classification algorithm have less testing time.
In a word, the experiment results demonstrate that the results of AEMLSVM and AEMLSVM-DEC algorithms are comparable to that of ML-LIBSVM method, and better than to that of other classification method with respect to the five evaluation metrics. Meanwhile, AEMLSVM classification algorithm not only has advantages in classification performance, but also shortens the training and classifying time. The AEMLSVM-DEC classification algorithm has obtained better classification results by using the DEC method to solve the label data imbalanced issue. In addition, the proposed AEMLSVM and AEMLSVM-DEC classification algorithms can be applied to all non-linear kernels. These will greatly improve the applicability of AEMLSVM and AEMLSVM-DEC classification algorithm on large-scale multi-label data sets.
To solve the overlong training time problem which restricts the multi-label SVM algorithm using non-linear kernel from being used on large-scale data sets, novel algorithms named AEMLSVM and AEMLSVM-DEC are proposed. In addition, the AEMLSVM-DEC classification algorithm adopts the DEC method to overcome label data imbalanced problem. These have greatly extended the application of these algorithms on large-scale data sets. Experiment results demonstrate that AEMLSVM and AEMLSVM-DEC classification algorithms can speed up the training and classification speed obviously while ensuring a comparable classification performance to that of ML-LIBSVM method and better performance than that of ML-BVM and ML-CVM method. In the future, we will further research on the label data imbalanced problem, which could get better classification performance.
Footnotes
Acknowledgments
This work is supported by the National Natural Science Foundation of China (NSFC) under the grant number 61170258, 61103196, 61379127 and 61379128, by the National Key R&D Program of China under the grant number 2016YFC1401900.
