Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data

Abstract

The imbalanced data problem is widespread in the real world. In the process of training machine learning models, ignoring imbalanced data problems will cause the performance of the model to deteriorate. At present, researchers have proposed many methods to deal with the imbalanced data problems, but these methods mainly focus on the imbalanced data problems in two-class classification tasks. Learning from multi-class imbalanced data sets is still an open problem. In this paper, an ensemble method for classifying multi-class imbalanced data sets is put forward, called multi-class WHMBoost. It is an extension of WHMBoost that we proposed earlier. We do not use the algorithm used in WHMBoost to process the data, but use random balance based on average size so as to balance the data distribution. The weak classifiers we use in the boosting algorithm are support vector machine and decision tree classifier. In the process of training the model, they participate in training with given weights in order to complement each other’s advantages. On 18 multi-class imbalanced data sets, we compared the performance of multi-class WHMBoost with state of the art ensemble algorithms using MAUC, MG-mean and MMCC as evaluation criteria. The results demonstrate that it has obvious advantages compared with state of the art ensemble algorithms and can effectively deal with multi-class imbalanced data sets.

Keywords

multi-class imbalanced data; ensemble method; random balance based on average size

1. Introduction

If one class has many more samples than the other classes in a data set, the data set is imbalanced [1]. In the real world, the data sets used for machine learning or data mining are often imbalanced, for either external or internal reasons. External reasons include the way data are collected and stored. In other scenarios the imbalance is intrinsic to the task, as in data sets used for text classification [2], fraud detection [3], medical diagnosis [4] and fault prediction [5]. Most machine learning models assume that data sets are balanced. If such models are applied to skewed data sets, they will be biased towards the majority classes and the minority classes will be classified with very poor accuracy [6]. However, accurately classifying the minority classes is usually what matters most.

Effectively learning from imbalanced data is becoming more and more essential because most real data sets are imbalanced. In recent years, many researchers have studied imbalanced data problems and proposed many algorithms to solve them. In the literature, these methods are roughly divided into data-level methods, algorithm-level methods and hybrid methods [7]. Data-level methods deal with imbalanced data through sampling, which makes the class distribution more balanced so that the classifier can treat all classes equally. Over-sampling increases the number of samples of the minority class, and under-sampling reduces the number of samples of the majority class. Previous researchers have presented many data sampling algorithms, such as synthetic minority over-sampling technique (SMOTE) [8], random under-sampling, edited nearest neighbor (ENN) [9], AdaSyn [10], random over-sampling and condensed nearest neighbor (CNN) [11]. Algorithm-level methods change the learning process or the discrimination method so that the algorithm is insensitive to the class distribution and the resulting model can accurately classify minority classes [12]. Frequently used algorithm-level methods include ensemble methods, cost-sensitive learning, new loss functions and threshold moving. In [13], a new loss function called focal loss was proposed to solve the extreme class imbalance problem in target detection. In [14], a novel adaptive k-NN classifier was proposed to solve imbalanced data problems in two-class and multi-class classification tasks. Ensemble models are also often used to deal with imbalanced data. An ensemble model is composed of multiple weak classifiers, which jointly determine the final output. In the training process, the ensemble model can focus more on the minority class, which helps to improve the classification performance on the minority class. Hybrid methods use both algorithm-level and data-level methods and combine the advantages of the two. They use sampling methods to rebalance the data sets and modify the algorithm mechanism to make the algorithm insensitive to the data distribution, so that the models focus more on the minority class and can better distinguish all classes. In [15], an ensemble method was combined with cost-sensitive learning and sampling approaches to achieve good classification performance.

Solving multi-class imbalanced data problems is usually more complicated than solving two-class imbalanced data problems. In multi-class imbalanced data sets, there may be several majority classes and several minority classes, which together skew the data sets. In this case, it becomes extremely difficult to find the relationships between the classes. If only the classification performance on an individual class is considered, the classification performance on the other classes will be reduced, which poses a huge challenge to researchers. In addition, the evaluation criteria used in multi-class classification tasks have always been controversial, and better evaluation criteria are desired.

In this paper, an ensemble algorithm is put forward in order to deal with the imbalanced data problems in multi-class classification tasks, called multi-class WHMBoost. It is an extension of the WHMBoost [12] that we proposed before and it has been modified in some steps. It combines the algorithm-level method and the data-level method in order to accomplish preferable classification performance. We use random balance based on average size to process the data sets so that the class distribution becomes more balanced. We use support vector machine and decision tree classifier as the weak classifiers of the ensemble model. In the process of training models, we assign weights to them so that they can participate in model training with the given weights, thereby obtaining complementary advantages. In the experimental part, we used MAUC, MG-mean and MMCC as the classification performance evaluation criteria, and compared our designed method with the previously presented algorithms (SAMME [16], SMOTEBoost [17], GradientBoost [18, 19], RUSBoost [20]) on 18 multi-class imbalanced data sets. Experimental results demonstrate that our designed algorithm is more advanced than other ensemble methods in classifying multi-class imbalanced data.

The main contributions of this paper are summarized as follows:

1. We extended WHMBoost, which solves binary imbalanced data problems, to multi-class WHMBoost, which solves multi-class imbalanced data problems, and made improvements in some algorithm steps.
2. We used three performance evaluation criteria and compared our designed method with other algorithms like SAMME, RUSBoost, GradientBoost on 18 multi-class imbalanced data sets.
3. By observing the experimental results and analyzing the experimental data, we can conclude that the presented algorithm is more advanced than the previously presented algorithms and can produce exceptional results.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes the presented method in detail. Section 4 introduces the results and analysis on 18 multi-class imbalanced data sets. Finally, Section 5 concludes the paper.

2. Related works

2.1 WHMBoost

In this paper, the multi-class WHMBoost we introduce is an extension and improvement of WHMBoost, which was previously presented to deal with two-class imbalanced data problems. In this section, we first introduce WHMBoost. WHMBoost is a hybrid method that combines an algorithm-level method and a data-level method. The data-level method is a sampling algorithm that consists of adjustable random balance and random under-sampling. The sampling algorithm is used to balance the distribution of the data sets so that the numbers of samples in all classes tend to be the same. In order to make adjustable random balance and random under-sampling better complement each other and adapt to different data sets, we assign weights to them. The algorithm-level method is an ensemble algorithm whose weak classifiers are support vector machine and decision tree classifier. The decision tree algorithm is unstable, so many different trees can be generated and more of the data space is covered during the operation of the algorithm. Support vector machine has strong generalization capability and can also give satisfactory results on small data sets. Weights are assigned to support vector machine and decision tree classifier so that they complement each other's strengths in training and achieve a better classification effect. Each iteration of the algorithm consists of two stages. In the first stage, the sampling algorithm is applied to the original data set so that the sizes of all classes tend to be equal; adjustable random balance or random under-sampling is selected according to the given weights. In the second stage, the ensemble algorithm is executed on the generated data set, and a base classifier is selected for training according to the given weights.

Algorithm 1: Adjustable Random Balance

Input: Set $S$ of examples $(x_{1},y_{1}),\ldots,(x_{m},y_{m})$ where $x_{i}\in X\subseteq R^{n}$ and $y_{i}\in Y=\{-1,+1\}$ ($+1$: the minority class, $-1$: the majority class); neighbours used in SMOTE, $k$; scale factor, ratio
Output: New set $S^{\prime}$ of examples with Adjustable Random Balance

1: $\textit{totalSize}\leftarrow\left|S\right|$
2: $S_{N}\leftarrow\{(x_{i},y_{i})\in S|y_{i}=-1\}$
3: $S_{P}\leftarrow\{(x_{i},y_{i})\in S|y_{i}=+1\}$
4: $\textit{majoritySize}\leftarrow\left|S_{N}\right|$
5: $\textit{minoritySize}\leftarrow\left|S_{P}\right|$
6: $\textit{rangeMinimum}\leftarrow(1.0-\textit{ratio})\times\textit{majoritySize}$
7: if $\textit{rangeMinimum}<2$ then
8:     $\textit{rangeMinimum}\leftarrow 2$
9: end if
10: $\textit{rangeMaximum}\leftarrow\textit{majoritySize}+\textit{ratio}\times\textit{minoritySize}$
11: if $\textit{rangeMaximum}>(\textit{totalSize}-2)$ then
12:     $\textit{rangeMaximum}\leftarrow\textit{totalSize}-2$
13: end if
14: $\textit{newMajoritySize}\leftarrow$ Random integer between rangeMinimum and rangeMaximum
15: $\textit{newMinoritySize}\leftarrow\textit{totalSize}-\textit{newMajoritySize}$
16: if $\textit{newMajoritySize}<\textit{majoritySize}$ then
17:     $S^{\prime}\leftarrow S_{P}$
18:     Take a random sample of size newMajoritySize from $S_{N}$, add the sample to $S^{\prime}$.
19:     Create $\textit{newMinoritySize}-\textit{minoritySize}$ artificial examples from $S_{P}$ using SMOTE, add these examples to $S^{\prime}$.
20: else
21:     $S^{\prime}\leftarrow S_{N}$
22:     Take a random sample of size newMinoritySize from $S_{P}$, add the sample to $S^{\prime}$.
23:     Create $\textit{newMajoritySize}-\textit{majoritySize}$ artificial examples from $S_{N}$ using SMOTE, add these examples to $S^{\prime}$.
24: end if
25: return $S^{\prime}$

The adjustable random balance in WHMBoost is used to process the data sets.
The adjustable random balance, which is an enhancement of random balance [21], ensures that both the minority class and the majority class have enough samples to participate in training, and its superiority has been verified by experiments in [12]. Although the original random balance can increase the diversity of the model, it can also degrade model performance when a class is left with too few samples. In the adjustable random balance, we introduce a scale factor to control the range within which the sample size of the minority or majority class can vary. Its pseudo code is shown in Algorithm 1. The sizes of the minority class and the majority class in the set $S^{\prime}$ are calculated in steps 6 to 15 of Algorithm 1. With the scale factor, the size of the new majority class is limited to between $(1.0-\textit{ratio})\times\textit{majoritySize}$ and $\textit{majoritySize}+\textit{ratio}\times\textit{minoritySize}$, but it cannot be less than $2$ or greater than $\textit{totalSize}-2$. If the new size of a class is larger than its original size, additional samples are generated using SMOTE. If the new size of a class is smaller than its original size, some examples are removed by random under-sampling. The detailed description is given in steps 16 to 23 of Algorithm 1.
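The size selection in steps 6 to 15 of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `adjustable_sizes` is hypothetical, and rounding the fractional bounds inward (ceiling for the lower bound, floor for the upper bound) is an assumption, since the pseudo code does not state how non-integer bounds are handled.

```python
import math
import random


def adjustable_sizes(majority_size, minority_size, ratio, rng=random):
    """Draw new class sizes following steps 6 to 15 of Algorithm 1."""
    total_size = majority_size + minority_size
    # Assumption: fractional bounds are rounded inward (ceil / floor).
    # Lower bound: shrink the majority class by at most `ratio`, never below 2.
    range_min = max(2, math.ceil((1.0 - ratio) * majority_size))
    # Upper bound: grow the majority class by at most ratio * minority_size,
    # while leaving at least 2 samples for the minority class.
    range_max = min(total_size - 2,
                    math.floor(majority_size + ratio * minority_size))
    new_majority = rng.randint(range_min, range_max)  # inclusive on both ends
    new_minority = total_size - new_majority
    return new_majority, new_minority
```

With 100 majority samples, 20 minority samples and a scale factor of 0.5, the new majority size is drawn from the range 50 to 110, whereas the original random balance would allow any value from 2 to 118.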

Algorithm 2: WHMBoost

Input: Set $S$ of examples $(x_{1},y_{1}),\ldots,(x_{m},y_{m})$ where $x_{i}\in X\subseteq R^{n}$ and $y_{i}\in Y=\{+1,-1\}$ ($-1$: the majority class, $+1$: the minority class); sampling method weight set, samplingMethodWeightSet; base classifier weight set, baseClassifierWeightSet; number of iterations, $T$; scale factor, ratio
Output: WHMBoost is built

1: Initialize weights, $W_{1}(j)\leftarrow\frac{1}{m}$ for $j=1,2,\ldots,m$
2: for $i=1$ to $T$ do
3:     According to samplingMethodWeightSet, select a sampling algorithm from the samplingMethodSet, and use this sampling algorithm to generate a temporary training set $S_{i}$ from the original training set $S$.
4:     $W_{i}^{\prime}(j)\leftarrow W_{i}(k)$ if $S_{i}(j)=S(k)$ else $\frac{1}{m}$, for $j=1,2,\ldots,m$
5:     According to baseClassifierWeightSet, a classification algorithm is selected from the baseClassifierSet.
6:     A base classifier $h_{i}$ is trained using $S_{i}$ and weights $W_{i}^{\prime}$.
7:     Calculate pseudo-loss of $h_{i}$, $e_{i}=\sum_{j=1}^{m}W_{i}(j)\times I(h_{i}(x_{j})\neq y_{j})$
8:     ${\alpha}_{i}=\frac{1}{2}\times\ln\frac{1-e_{i}}{e_{i}}$
9:     Update $W_{i}$: $W_{i+1}(j)\leftarrow W_{i}(j)\times\exp(-\alpha_{i}\times y_{j}\times h_{i}(x_{j}))$
10:     Normalize $W_{i+1}$: Let $Z_{i}=\sum_{j}W_{i+1}(j)$, $W_{i+1}(j)=\frac{W_{i+1}(j)}{Z_{i}}$
11:     An ensemble model: $H_{i}(x)=\arg\max_{y\in Y}\sum_{t=1}^{i}\alpha_{t}\times I(h_{t}(x)=y)$
12:     score $=$ roc_auc_score($H_{i}$, $X_{\textit{test}}$, $Y_{\textit{test}}$)
13:     if $\textit{score}_{\textit{best}}<$ score then
14:         $\textit{score}_{\textit{best}}=$ score
15:         $H_{\textit{best}}=H_{i}$
16:     end if
17: end for
18: return $H_{\textit{best}}$

The pseudo code of WHMBoost is shown in Algorithm 2. The sampling method set is $\textit{samplingMethodSet}=\{\textit{random under-sampling, adjustable random balance}\}$, and its corresponding sampling method weight set is $\textit{samplingMethodWeightSet}=\{p_{1},p_{2}\}$. In one iteration of the algorithm, the probability of random under-sampling being selected to process data is $p_{1}$ and the probability of adjustable random balance being selected to process data is $p_{2}$. The sum of $p_{1}$ and $p_{2}$ is less than or equal to 1, which means that in some iterations the data are not resampled at all. The base classifier set is $\textit{baseClassifierSet}=\{\textit{decision tree classifier, support vector machine}\}$, and its corresponding weight set is $\textit{baseClassifierWeightSet}=\{p_{1},p_{2}\}$, where $p_{1}+p_{2}=1$ (the symbols $p_{1}$ and $p_{2}$ are reused here for the classifier weights). This means that the probability that the decision tree classifier is selected as the weak classifier is $p_{1}$, and the probability that the support vector machine is selected as the weak classifier of the ensemble model is $p_{2}$. The two sampling methods and two weak classifiers are merged in this weighted manner to achieve complementary advantages. In each iteration of the WHMBoost algorithm, the current best ensemble model is chosen using the AUC evaluation criterion.
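The weighted selection described above can be illustrated with a short Python sketch. The helper name `pick` and the string labels are hypothetical; the sketch only assumes that a weight set summing to less than 1 leaves the remaining probability for the "no resampling" case, as stated in the text.

```python
import random


def pick(options, weights, rng=random):
    """Return one option with the given probabilities.

    If the weights sum to less than 1, the leftover probability mass
    returns None, which stands for "leave the data unchanged".
    """
    leftover = max(0.0, 1.0 - sum(weights))
    population = list(options) + [None]
    return rng.choices(population, weights=list(weights) + [leftover], k=1)[0]


# Sampling methods: weights may sum to less than 1.
sampler = pick(["random under-sampling", "adjustable random balance"], [0.3, 0.5])
# Base classifiers: weights sum to exactly 1, so None is never returned.
learner = pick(["decision tree", "support vector machine"], [0.6, 0.4])
```

Under these example weights, roughly 20% of the iterations would train on the unmodified data set, while every iteration trains exactly one of the two base classifiers.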

The steps of WHMBoost are as follows. First, the weight $W_{1}(j)$ of each example in the data set is initialized to $\frac{1}{m}$, where $m$ is the total number of samples in the data set. Then the algorithm iterates $T$ times. During each iteration, the algorithm uses random under-sampling or adjustable random balance to generate a temporary data set $S_{i}$ for training from the initial data set $S$. In the temporary data set, a sample that belongs to the original data set keeps its current weight $W_{i}(k)$, and a sample generated by SMOTE is given the weight $\frac{1}{m}$. According to baseClassifierWeightSet, WHMBoost selects a base classifier from baseClassifierSet, either support vector machine or decision tree classifier. This base classifier is trained using the temporary data set $S_{i}$ and the temporary weight vector $W_{i}^{\prime}$. The pseudo-loss of the base classifier $h_{i}$ is calculated by Eq. (1), where the indicator $I(h_{i}(x_{j})\neq y_{j})$ is 1 if $h_{i}(x_{j})\neq y_{j}$ is true and 0 otherwise. The indicator $I(h_{t}(x)=y)$ is defined in the same way. After the pseudo-loss $e_{i}$ is calculated, we use Eq. (2) to calculate the weight assigned to the base classifier in the ensemble model. By using Eq. (3), the weights of the samples in the initial data set are updated so that the weights of the correctly classified samples become smaller, while the weights of the misclassified examples become larger. The weight vector is then normalized. To choose the best-performing ensemble model, the ensemble model with the highest AUC score is saved as $H_{\textit{best}}$. When the training is over, $H_{\textit{best}}$ is the final ensemble model.

$\displaystyle e_{i}=\sum_{j=1}^{m}W_{i}(j)\times I(h_{i}(x_{j})\neq y_{j})$ (1)

$\displaystyle{\alpha}_{i}=\frac{1}{2}\times\ln\frac{1-e_{i}}{e_{i}}$ (2)

$\displaystyle W_{i+1}(j)\leftarrow W_{i}(j)\times\exp(-\alpha_{i}\times y_{j}\times h_{i}(x_{j}))$ (3)
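As a worked illustration of Eqs (1) to (3), the following Python sketch performs one weight update followed by the normalization step. The function name `boost_step` is hypothetical, labels and predictions are assumed to be in $\{-1,+1\}$, and the pseudo-loss is assumed to lie strictly between 0 and 1 so that the logarithm is defined.

```python
import math


def boost_step(weights, y_true, y_pred):
    """One two-class boosting update: Eqs (1)-(3) plus normalization."""
    # Eq. (1): weighted error of the base classifier.
    e = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    # Eq. (2): weight of the base classifier (assumes 0 < e < 1).
    alpha = 0.5 * math.log((1.0 - e) / e)
    # Eq. (3): y * p is +1 for a correct prediction and -1 for a wrong one,
    # so correct samples are scaled down and wrong samples are scaled up.
    updated = [w * math.exp(-alpha * y * p)
               for w, y, p in zip(weights, y_true, y_pred)]
    z = sum(updated)
    return e, alpha, [w / z for w in updated]


e, alpha, new_weights = boost_step([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
```

In this example one of four equally weighted samples is misclassified, so $e=0.25$; after normalization the misclassified sample carries weight 0.5 and each correct sample carries weight 1/6.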

2.2 The previously presented algorithms

For multi-class imbalanced data sets, current methods can be divided into two groups: one decomposes a multi-class imbalanced data problem into several two-class imbalanced data problems [22], and the other handles the multi-class imbalanced data problem directly. The class decomposition methods are not discussed in this paper. Most real data sets are imbalanced, which seriously affects the performance of machine learning models. Previous researchers have therefore put forward many ensemble algorithms to deal with imbalanced data problems.

SAMME [16] was put forward to deal with multi-class classification problems. It directly extends AdaBoost to the multi-class case rather than reducing it to several binary problems. The algorithm process of SAMME is basically the same as that of AdaBoost. The only difference is the formula for calculating the weight of the weak classifier in the boosting algorithm, as shown in Eq. (4), where $e$ is the weighted error of the weak classifier and $C$ is the number of classes. The extra term $\log{(C-1)}$ is very important for multi-class problems. It makes SAMME equivalent to fitting a forward stagewise additive model [23].

$\displaystyle\alpha=\log\frac{1-e}{e}+\log{(C-1)}$ (4)
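The effect of the extra term in Eq. (4) can be checked numerically. The sketch below uses a hypothetical helper `samme_alpha` and shows that the classifier weight stays positive as long as the weighted error is below $1-\frac{1}{C}$, the error of random guessing among $C$ classes, instead of the stricter threshold of 0.5 used in the two-class case.

```python
import math


def samme_alpha(error, n_classes):
    """Classifier weight of Eq. (4); assumes 0 < error < 1."""
    return math.log((1.0 - error) / error) + math.log(n_classes - 1)


# With three classes, an error of 0.6 is still better than random guessing
# (1 - 1/3), so the weak classifier receives a positive weight.
print(samme_alpha(0.6, 3) > 0)   # True
# An error of 0.7 is worse than random guessing, so the weight is negative.
print(samme_alpha(0.7, 3) > 0)   # False
```

For $C=2$ the extra term vanishes and Eq. (4) reduces to $\log\frac{1-e}{e}$, which is twice the value given by Eq. (2).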

GradientBoost [18, 19] is a boosting-based method that can handle both two-class and multi-class classification tasks. It is an additive model and is suitable for various loss functions. In each iteration, a new learner is obtained by fitting the negative gradient of the loss function of the previously established model. The model in GradientBoost is initialized with a constant value that minimizes the loss function over all samples [24]. When a learner is added to the ensemble model, it is multiplied by a coefficient, which can be regarded as the learning rate.

SMOTEBoost [17] was proposed by Chawla et al. in order to deal with both the two-class imbalanced data problems and the multi-class imbalanced data problems. The process of this method is divided into two steps: SMOTE and boosting method. In each iteration, SMOTE will synthesize new data based on the initial data, and the weak classifier in boosting is trained based on the initial data and the synthesized data. At the end of each iteration, the weights of the initial data will be updated, and the weights of correctly classified samples become smaller and the weights of incorrectly classified samples become larger. By updating the weights of samples, the model will be more focused on misclassified samples after one iteration, so as to advance the classification effect of the model. The disadvantage of SMOTEBoost is that it generates a large amount of data, which leads to longer training time.

RUSBoost [20], which can handle both multi-class and two-class imbalanced data problems, was proposed by Seiffert et al. to overcome the longer training time of SMOTEBoost. RUSBoost consists of random under-sampling and AdaBoost, and each iteration is divided into two stages. In the first stage, random under-sampling, which randomly removes samples from the majority class, is used to adjust the data distribution so that all classes have the same size. In the second stage, the weak classifier is trained using the data generated in the first stage. After the classifier is trained, the weights of misclassified samples are updated. Although it overcomes this shortcoming of SMOTEBoost, random under-sampling may discard important data, resulting in reduced classification performance.

MEBoost [25] integrates two weak classifiers under the boosting framework, but it can only deal with imbalanced data problems in binary classification. In each iteration, either the extra tree classifier or the decision tree classifier is chosen for training. This method can combine the advantages of the two classifiers so that the classification effect is better. AUC is used to choose the best ensemble model in the training process. RB-Boost [21] was proposed by Díez-Pastor et al. and combines random balance with AdaBoost. It can only handle two-class imbalanced data problems. In random balance, the temporary data sets are obtained by randomly under-sampling one class and using SMOTE to over-sample the other. Random balance increases the diversity of the data sets. In addition, RHSBoost [26], AdaCost [27] and other methods have also been presented to deal with two-class imbalanced data problems.

2.3 Evaluation criteria for multi-class imbalanced data

In this paper, we used MAUC, MG-mean and MMCC as the evaluation criteria, and compared our proposed algorithm with the previously presented algorithms. AUC is considered a robust evaluation standard and is often used for imbalanced data problems. The AUC for evaluating multi-class imbalanced data is shown in Eq. (5), which is called MAUC [28]. In Eq. (5), $C$ represents the number of classes in the data sets and $\textit{AUC}(a,b)$ represents the AUC between class $a$ and class $b$.

$\displaystyle\textit{MAUC}=\frac{2}{C(C-1)}\sum_{a<b}{\textit{AUC}(a,b)}$ (5)
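Eq. (5) can be sketched in plain Python as an average of pairwise AUC values. This is an illustration under stated assumptions, not the evaluation code used in the experiments: the pairwise term $\textit{AUC}(a,b)$ is assumed to be the symmetric average of the two one-sided AUCs (scores for class $a$ and scores for class $b$, each restricted to samples of the two classes), the class labels are assumed to be the integers $0,\ldots,C-1$, and the names `mauc` and `_auc` are hypothetical.

```python
from itertools import combinations


def _auc(pos_scores, neg_scores):
    """Probability that a positive sample outranks a negative one (ties = 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def mauc(y_true, proba, n_classes):
    """Eq. (5); proba[i][c] is the estimated probability of class c for sample i."""
    total = 0.0
    for a, b in combinations(range(n_classes), 2):
        in_a = [p for y, p in zip(y_true, proba) if y == a]
        in_b = [p for y, p in zip(y_true, proba) if y == b]
        # Assumed pairwise AUC: average of "a against b" and "b against a".
        auc_ab = _auc([p[a] for p in in_a], [p[a] for p in in_b])
        auc_ba = _auc([p[b] for p in in_b], [p[b] for p in in_a])
        total += 0.5 * (auc_ab + auc_ba)
    return 2.0 * total / (n_classes * (n_classes - 1))
```

A model that ranks every pair of classes perfectly obtains MAUC equal to 1, and a model that outputs the same probabilities for every sample obtains 0.5.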

Matthews correlation coefficient (MCC) is also often used as a performance evaluation standard for two-class and multi-class problems. Its value is between $-$ 1 and 1, where 1 means perfect prediction, $-$ 1 means inverse prediction and 0 means random prediction. In this paper, we used the MCC and MMCC defined in [24] to evaluate our method, as shown in Eqs (6) and (7).

$\displaystyle\textit{MCC}=\frac{\textit{TP}\times\textit{TN}-\textit{FP}\times\textit{FN}}{\sqrt{(\textit{TP}+\textit{FP})(\textit{TP}+\textit{FN})(\textit{TN}+\textit{FP})(\textit{TN}+\textit{FN})}}$ (6)

$\displaystyle\textit{MMCC}=\frac{2}{C(C-1)}\sum_{a<b}{\textit{MCC}(a,b)}$ (7)

Geometric mean (G-mean) [29] is the third criterion used to evaluate imbalanced data in our paper, and it can be extended to multi-class classification problems. In this paper, we used the geometric mean of the recall values of all classes to represent the G-mean in a multi-class classification task, as shown in Eq. (8).

$\displaystyle\textit{MG-mean}=\left(\prod_{i=1}^{C}{\textit{Recall}_{i}}\right)^{\frac{1}{C}}$ (8)
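A minimal Python sketch of Eq. (8) is given below. The function name `mg_mean` is hypothetical, and the set of classes is assumed to be the set of labels that occur in the true labels.

```python
import math
from collections import Counter


def mg_mean(y_true, y_pred):
    """Eq. (8): geometric mean of the per-class recall values."""
    support = Counter(y_true)                       # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return math.prod(recalls) ** (1.0 / len(recalls))


# Recalls are 1.0, 0.5 and 1.0, so MG-mean is 0.5 ** (1/3).
score = mg_mean([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 2])
```

Because the recalls are multiplied, a single class with zero recall drives MG-mean to zero, which is why the criterion is sensitive to minority classes that a model ignores.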

Figure 1. The process of multi-class WHMBoost.

3. Multi-class WHMBoost for multi-class imbalanced data

In [12], we presented a weighted hybrid ensemble method, called WHMBoost, to deal with the two-class imbalanced data problems encountered in machine learning and data mining. In this paper, we put forward multi-class WHMBoost for classifying multi-class imbalanced data. It directly extends WHMBoost, which can only handle two-class classification tasks, to multi-class imbalanced data problems. We have made some improvements to the algorithm so that it deals better with multi-class classification tasks. Multi-class WHMBoost is a hybrid method that combines an algorithm-level method and a data-level method. The data-level method is random balance based on average size, which is inspired by random balance [21]. In our algorithm, it is used to balance the data sets while increasing their diversity. It makes the size of each class randomly approach the average size of all classes: samples are added to classes smaller than the average size, and samples are removed from classes larger than the average size. The algorithm-level method is a boosting algorithm with two base classifiers, support vector machine and decision tree classifier. To make them complement each other, we assign weights to them so that they participate in training with the given weights. Each iteration of the algorithm consists of two stages. In the first stage, random balance based on average size is applied to the original data set so that the sizes of all classes tend to be equal. In the second stage, boosting is executed on the generated data set, and a base classifier is selected for training according to the given weights. The algorithm process of multi-class WHMBoost is shown in Fig. 1.

In random balance based on average size, we use random under-sampling and SMOTE (synthetic minority over-sampling technique) to adjust the class distribution. We first obtain the average size of all classes in the data set, then use random under-sampling to remove samples from the classes whose size is larger than the average size, and use SMOTE to generate new samples for the classes whose size is smaller than the average size. We do not change the size of each class directly to the average size. Instead, a random coefficient moves the size of each class part of the way towards the average size, which increases the diversity of the training sets. After this processing, the size of each class is given by Eq. (9). When the size of a class is equal to the average size of all classes, its size is not changed. If the size of a class is greater than the average size, its size becomes $\textit{size}_{\textit{original}}-\left\lceil(\textit{size}_{\textit{original}}-\textit{size}_{\textit{average}})\times\alpha\right\rceil$. If the size of a class is smaller than the average size, its size becomes $\textit{size}_{\textit{original}}+\left\lceil(\textit{size}_{\textit{average}}-\textit{size}_{\textit{original}})\times\alpha\right\rceil$. In Eq. (9), $\alpha$ is a random number from 0 to 1, and the same $\alpha$ is used when calculating the temporary sizes of all classes. Here $\alpha$ denotes this random coefficient, not the classifier weight $\alpha_{i}$ of Eq. (2). In each iteration, the temporary data set generated in this way is used to train the model.

$\displaystyle\textit{size}_{\textit{temp}}=\begin{cases}\textit{size}_{\textit{original}},&\text{if}\ \textit{size}_{\textit{original}}=\textit{size}_{\textit{average}}\\ \textit{size}_{\textit{original}}-\left\lceil(\textit{size}_{\textit{original}}-\textit{size}_{\textit{average}})\times\alpha\right\rceil,&\text{if}\ \textit{size}_{\textit{original}}>\textit{size}_{\textit{average}}\\ \textit{size}_{\textit{original}}+\left\lceil(\textit{size}_{\textit{average}}-\textit{size}_{\textit{original}})\times\alpha\right\rceil,&\text{if}\ \textit{size}_{\textit{original}}<\textit{size}_{\textit{average}}\end{cases}$ (9)
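Eq. (9) can be made concrete with a short Python sketch. The function name `temp_sizes` is hypothetical, and using the unrounded mean as the average size is an assumption, since the text does not say whether the average is rounded. The example applies the formula to the class sizes of the Thyroid data set (17, 37, 666), whose average size is 240.

```python
import math
import random


def temp_sizes(class_sizes, alpha=None, rng=random):
    """Temporary class sizes of Eq. (9); one shared alpha for all classes."""
    if alpha is None:
        alpha = rng.random()          # random coefficient in [0, 1)
    average = sum(class_sizes.values()) / len(class_sizes)
    result = {}
    for label, size in class_sizes.items():
        if size > average:            # shrink by random under-sampling
            result[label] = size - math.ceil((size - average) * alpha)
        elif size < average:          # grow with SMOTE
            result[label] = size + math.ceil((average - size) * alpha)
        else:
            result[label] = size
    return result


thyroid = {1: 17, 2: 37, 3: 666}
print(temp_sizes(thyroid, alpha=0.5))   # {1: 129, 2: 139, 3: 453}
```

With $\alpha=0$ the class sizes are unchanged, and with $\alpha=1$ every class is moved exactly to the average size, so $\alpha$ controls how far each iteration moves towards a fully balanced data set.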

In the presented multi-class WHMBoost, the base classifier set is $\textit{baseClassifierSet}=\{\textit{support vector machine, decision tree classifier}\}$, and its corresponding base classifier weight set is $\textit{baseClassifierWeightSet}=\{p_{1},p_{2}\}$, where $p_{1}+p_{2}=1$. As in WHMBoost, one base classifier is selected at random in each iteration: support vector machine with probability $p_{1}$ and decision tree classifier with probability $p_{2}$. For different data sets, we can assign different weights to support vector machine and decision tree classifier, mainly because the performance of these two algorithms may differ from one data set to another. By assigning different weights to the support vector machine and the decision tree classifier, they participate in model training with the given weights and complement each other's advantages. This is also the difference between our presented method and other ensemble algorithms. In the designed algorithm, we not only merge the two classifiers, but also assign weights to them so as to maximize their performance.

The pseudo code of our presented multi-class WHMBoost is shown in Algorithm 3. Its steps are basically the same as those of WHMBoost. The different steps are described below. During each iteration, the presented algorithm uses random balance based on average size to generate a temporary data set $S_{i}$ for training from the initial data set $S$. We use Eq. (4) from [16] to calculate the weight assigned to the base classifier in the ensemble model. By using Eq. (10), the weights of the samples in the initial data set are updated so that, before normalization, the weights of the correctly classified samples remain unchanged, while the weights of the misclassified examples become larger.

$\displaystyle W_{i+1}(j)\leftarrow W_{i}(j)\times\exp(\alpha_{i}\times I(h_{i}(x_{j})\neq y_{j}))$ (10)
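The multi-class update can be sketched in the same way as the two-class one. The function name `multiclass_boost_step` is hypothetical; the sketch combines the pseudo-loss, the classifier weight of Eq. (4), the update of Eq. (10) and a normalization step, and it assumes that the pseudo-loss lies strictly between 0 and 1.

```python
import math


def multiclass_boost_step(weights, y_true, y_pred, n_classes):
    """One multi-class update: Eq. (4), Eq. (10) and normalization."""
    # Weighted error of the base classifier.
    e = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    # Eq. (4): classifier weight with the extra log(C - 1) term.
    alpha = math.log((1.0 - e) / e) + math.log(n_classes - 1)
    # Eq. (10): only misclassified samples are rescaled.
    updated = [w * math.exp(alpha) if y != p else w
               for w, y, p in zip(weights, y_true, y_pred)]
    z = sum(updated)
    return alpha, [w / z for w in updated]


alpha, new_weights = multiclass_boost_step(
    [0.25] * 4, y_true=[1, 2, 3, 1], y_pred=[1, 2, 3, 2], n_classes=3)
```

In this example the single misclassified sample ends up with weight 2/3 and each correct sample with weight 1/9: the correct weights are untouched by Eq. (10) itself and only shrink through the normalization.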

Algorithm 3: Multi-class WHMBoost for multi-class imbalanced data

Input: Set $S$ of examples $(x_{1},y_{1}),\ldots,(x_{m},y_{m})$ where $x_{i}\in X\subseteq R^{n}$ and $y_{i}\in Y=\{1,2,\ldots,C\}$ ($C$ is the number of classes); base classifier weight set, baseClassifierWeightSet; number of iterations, $T$
Output: multi-class WHMBoost is built

1: Initialize the weights of samples, $W_{1}(j)\leftarrow\frac{1}{m}$ for $j=1,2,3,\ldots,m$
2: for $i=1$ to $T$ do
3:     Use random balance based on average size to generate a temporary training set $S_{i}$ from the original training set $S$.
4:     $W_{i}^{\prime}(j)\leftarrow W_{i}(k)$ if $S_{i}(j)=S(k)$ else $\frac{1}{m}$, for $j=1,2,3,\ldots,m$, where $k$ is an integer from 1 to $m$.
5:     According to the baseClassifierWeightSet, a classification algorithm is selected from the baseClassifierSet.
6:     A base classifier $h_{i}$ is trained using $S_{i}$ and weights $W_{i}^{\prime}$.
7:     Calculate pseudo-loss of $h_{i}$, $e_{i}=\sum_{j=1}^{m}W_{i}(j)\times I(h_{i}(x_{j})\neq y_{j})$.
8:     ${\alpha}_{i}=\log\frac{1-e_{i}}{e_{i}}+\log(C-1)$.
9:     Update $W_{i}$: $W_{i+1}(j)\leftarrow W_{i}(j)\times\exp(\alpha_{i}\times I(h_{i}(x_{j})\neq y_{j}))$
10:     Normalize $W_{i+1}$: Let $Z_{i}=\sum_{j}W_{i+1}(j)$, $W_{i+1}(j)=\frac{W_{i+1}(j)}{Z_{i}}$
11:     An ensemble model: $H_{i}(x)=\arg\max_{y}\sum_{t=1}^{i}\alpha_{t}\times I(h_{t}(x)=y)$
12:     score $=$ evaluation_method($H_{i}$, $X_{\textit{test}}$, $Y_{\textit{test}}$)
13:     if $\textit{score}_{\textit{best}}<$ score then
14:         $\textit{score}_{\textit{best}}=$ score
15:         $H_{\textit{best}}=H_{i}$
16:     end if
17: end for
18: return $H_{\textit{best}}$

4. Experimental study

Table 1
Imbalanced data sets used in the experiments

Name	Classes	Features	Instances (Instances of each class)	IR	Source
Dermatology	6	34	366 (112,61,72,49,52,20)	5.60	KEEL
Glass	6	9	214 (70,76,17,13,9,29)	8.44	KEEL
Hayes-roth	3	4	132 (51,51,30)	1.70	KEEL
New-thyroid	3	5	215 (35,30,150)	5.00	KEEL
Penbased	10	16	1100 (115,114,114,106,114,106,105,115,105,106)	1.10	KEEL
Balance	3	4	625 (49,288,288)	5.88	KEEL
Contraceptive	3	9	1473 (629,333,511)	1.89	KEEL
Thyroid	3	21	720 (17,37,666)	39.18	KEEL
Wine	3	13	178 (59,71,48)	1.48	KEEL
Analcatdata_dmft	6	4	797 (127,132,124,155,136,123)	1.26	OpenML
First-order-theorem-proving	6	51	6118 (1089,486,748,617,624,2554)	5.26	OpenML
Optdigits	10	64	5620 (554,571,557,572,568,558,558,566,554,562)	1.03	OpenML
Satimage	6	36	6430 (1531,703,1356,625,707,1508)	2.45	OpenML
Splice	3	60	3190 (767,768,1655)	2.16	OpenML
Vehicle	4	18	846 (218,212,217,199)	1.10	OpenML
Analcatdata_authorship	4	70	841 (317,296,55,173)	5.76	OpenML
Cardiotocography	10	35	2126 (384,579,53,81,72,332,252,107,69,197)	10.92	OpenML
Cmc	3	9	1473 (629,333,511)	1.89	OpenML

4.1 Data sets

A total of 18 imbalanced data sets were collected to evaluate the presented multi-class WHMBoost and the previously proposed ensemble algorithms. Among them, 9 data sets are from KEEL [30] and the remaining 9 are from OpenML. The imbalance rate of these data sets ranges from 1.03 to 39.18. For multi-class imbalanced data sets, the imbalance rate is defined as the number of samples in the largest class divided by the number of samples in the smallest class [31].
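The imbalance rate defined above can be checked directly against the per-class instance counts listed in Table 1. The helper name `imbalance_rate` is an illustrative assumption; the class counts are copied from the Thyroid and Glass rows.

```python
def imbalance_rate(class_counts):
    """Size of the largest class divided by the size of the smallest class."""
    return max(class_counts) / min(class_counts)

thyroid = [17, 37, 666]           # Thyroid row of Table 1
glass = [70, 76, 17, 13, 9, 29]   # Glass row of Table 1

print(round(imbalance_rate(thyroid), 2))  # 39.18
print(round(imbalance_rate(glass), 2))    # 8.44
```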

The characteristics of the data sets are shown in Table 1. The first column gives the name of each data set, the second column the number of classes and the third column the number of features. The feature types include real, integer and nominal values. The fourth column gives the total number of instances together with the number of instances in each class, and the fifth column gives the imbalance rate (IR). The last column gives the origin of each data set, either OpenML or KEEL.

4.2 Experimental results and analysis

In each experiment, the data set is randomly divided into a training set and a testing set, with the training set accounting for 80% of the entire data set. The split is stratified: each class makes up the same proportion of the training set and of the testing set as it does of the whole data set, so the original imbalance rate is preserved. The algorithm is trained on the training set and evaluated on the testing set. As in [12], this process is repeated 50 times and the reported result is the average over the 50 runs. In all experiments, the number of iterations of the ensemble model is the same. If the pseudo-loss $e$ in the ensemble model is less than or equal to 1e-100, it is set to 1e-100, which keeps the classifier weight $\log\frac{1-e}{e}+\log(C-1)$ finite.
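The paper does not state how the split was implemented. As one possible sketch, a stratified 80/20 split can be written with the standard library alone: indices are grouped by class, shuffled, and cut at 80% within each class. The function name `stratified_split`, the seed handling and the toy labels are assumptions made for this illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices so each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))  # per-class cut point
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Toy data: 50 majority samples and 10 minority samples (imbalance rate 5).
labels = [0] * 50 + [1] * 10
train_idx, test_idx = stratified_split(labels)
```

Repeating the call with 50 different seeds and averaging the resulting scores would mirror the repeated-holdout protocol described above.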

On the 18 multi-class imbalanced data sets, we compared the performance of multi-class WHMBoost (WHMBoostM, WHMBoostA) with the state-of-the-art ensemble algorithms SAMME, SMOTEBoost, GradientBoost and RUSBoost, using MAUC, MG-mean and MMCC as evaluation criteria. In WHMBoostM the best ensemble model is selected based on MMCC, and in WHMBoostA it is selected based on MAUC. The value of $p_{1}$ in baseClassifierWeightSet ranges from 0 to 1 in steps of 0.1, which gives eleven values. The value of $p_{2}$ also ranges from 0 to 1, subject to $p_{1}+p_{2}=1$. To analyze the experimental results and compare the presented algorithm with the other ensemble algorithms, we used average ranks [32] and the Hochberg test [33].
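The eleven weight pairs described above can be enumerated as follows. The second half of the sketch shows one way such a pair could drive the choice of base classifier, namely drawing the algorithm at random with probabilities $(p_{1},p_{2})$. That selection rule, the mapping of $p_{1}$ to the support vector machine, and the names `weight_grid` and `pick_base_classifier` are assumptions for illustration; this section does not specify them.

```python
import random

# Eleven (p1, p2) pairs with p1 in {0.0, 0.1, ..., 1.0} and p1 + p2 = 1.
weight_grid = [(k / 10, (10 - k) / 10) for k in range(11)]

def pick_base_classifier(p1, p2, rng):
    """Draw one algorithm name with probabilities (p1, p2).

    Assumed selection rule; p1 is taken to weight the SVM here.
    """
    return rng.choices(["svm", "decision_tree"], weights=[p1, p2], k=1)[0]

rng = random.Random(0)
choice = pick_base_classifier(0.7, 0.3, rng)
```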

Table 2
Score obtained using MAUC as the evaluation criterion

Name	SAMME	RUSBoost	SMOTEBoost	GradientBoost	WHMBoostM	WHMBoostA
Dermatology	0.8422	0.8301	0.8209	0.9953	0.9979	0.9984
Glass	0.7374	0.7704	0.6753	0.8467	0.8965	0.8998
Hayes-roth	0.6606	0.6381	0.6563	0.8797	0.8901	0.8935
New-thyroid	0.9092	0.8829	0.8998	0.9766	0.9947	0.9961
Penbased	0.7827	0.7812	0.7671	0.9673	0.9989	0.9991
Balance	0.6543	0.6759	0.6073	0.7994	0.9644	0.9694
Contraceptive	0.6336	0.6449	0.6385	0.7186	0.7175	0.7208
Thyroid	0.8824	0.8932	0.8892	0.9468	0.9584	0.9708
Wine	0.9164	0.9169	0.931	0.9942	0.9995	0.9996
Analcatdata_dmft	0.5426	0.537	0.5335	0.5901	0.5664	0.5681
First-order-theorem-proving	0.614	0.6053	0.5787	0.6839	0.7164	0.6948
Optdigits	0.6833	0.6795	0.6614	0.9857	0.9998	0.9998
Satimage	0.8017	0.7454	0.7953	0.9677	0.984	0.9843
Splice	0.8738	0.8793	0.8771	0.9695	0.9595	0.9631
Vehicle	0.8121	0.8031	0.8088	0.8918	0.943	0.944
Analcatdata_authorship	0.9032	0.9336	0.8799	0.9954	1.0	1.0
Cardiotocography	0.7933	0.7933	0.784	0.9978	1.0	1.0
Cmc	0.6434	0.6378	0.6328	0.7297	0.7236	0.7218

Table 3

Hochberg test results using MAUC as the evaluation criterion

Control method (WHMBoostA)
Method	Z-values	Adjusted $p$ -values
WHMBoostM	1.0690	1.0
GradientBoost	2.0045	0.1801
SAMME	5.2561	4.4134e-07
RUSBoost	5.6125	3.9888e-08
SMOTEBoost	6.6370	3.2016e-11

Figure 2.

Average ranks from Friedman test using MAUC as the evaluation criterion.

Table 2 shows the MAUC scores obtained by multi-class WHMBoost and the other ensemble algorithms. Counting ties, WHMBoostA gets the best score on 14 data sets (77.78% of the data sets) and WHMBoostM gets the best score on 4 data sets (22.22%). On the data sets “optdigits”, “analcatdata_authorship” and “cardiotocography”, WHMBoostM and WHMBoostA tie for the highest score. Figure 2 shows the average ranks of the methods. WHMBoostA has the smallest average rank, followed by WHMBoostM and GradientBoost in second and third place, respectively, while SMOTEBoost has the largest average rank. The results of the Hochberg test based on MAUC are shown in Table 3. According to the adjusted $p$-values, WHMBoostA is significantly different from SAMME, RUSBoost and SMOTEBoost at the significance level $\alpha=0.05$. The adjusted $p$-value between WHMBoostA and WHMBoostM is 1.0 and that between WHMBoostA and GradientBoost is 0.1801, both greater than 0.05, so there is no significant difference between WHMBoostA and these two methods.
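To make the notion of average rank concrete, the sketch below ranks the six methods on each data set (rank 1 for the highest score) and averages the ranks over data sets. Tied scores share the mean of the ranks they occupy, which is the usual convention for the Friedman test but is assumed here. The function name `average_ranks` is hypothetical, and only three rows of Table 2 are used, so the output is not the ranking plotted in Figure 2.

```python
def average_ranks(score_rows):
    """Rank methods per data set (1 = highest score), then average the ranks."""
    n_methods = len(score_rows[0])
    totals = [0.0] * n_methods
    for row in score_rows:
        order = sorted(range(n_methods), key=lambda m: -row[m])
        pos = 0
        while pos < n_methods:
            end = pos
            # Extend the block while the next method has the same score.
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
            for m in order[pos:end + 1]:
                totals[m] += shared
            pos = end + 1
    return [t / len(score_rows) for t in totals]

methods = ["SAMME", "RUSBoost", "SMOTEBoost", "GradientBoost", "WHMBoostM", "WHMBoostA"]
rows = [
    [0.7374, 0.7704, 0.6753, 0.8467, 0.8965, 0.8998],  # Glass
    [0.8738, 0.8793, 0.8771, 0.9695, 0.9595, 0.9631],  # Splice
    [0.6833, 0.6795, 0.6614, 0.9857, 0.9998, 0.9998],  # Optdigits (tie)
]
ranks = average_ranks(rows)
```

On these three rows WHMBoostA obtains ranks 1, 2 and 1.5, for an average rank of 1.5.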

Table 4

Score obtained using MMCC as the evaluation criterion

Name	SAMME	RUSBoost	SMOTEBoost	GradientBoost	WHMBoostM	WHMBoostA
Dermatology	0.6131	0.6112	0.5684	0.9307	0.9648	0.9678
Glass	0.3819	0.3825	0.3181	0.4775	0.5254	0.5565
Hayes-roth	0.3657	0.3	0.2972	0.6224	0.6444	0.6741
New-thyroid	0.7801	0.7132	0.7654	0.8316	0.9188	0.9203
Penbased	0.3058	0.311	0.306	0.7732	0.9778	0.9781
Balance	0.4105	0.3368	0.2295	0.6699	0.8101	0.7973
Contraceptive	0.2273	0.2194	0.2184	0.2757	0.2961	0.2989
Thyroid	0.6508	0.6481	0.6118	0.6527	0.649	0.66
Wine	0.7804	0.7999	0.8318	0.914	0.9775	0.9808
Analcatdata_dmft	0.0405	0.054	0.0576	0.0506	0.0662	0.0611
First-order-theorem-proving	0.137	0.1546	0.1291	0.1431	0.2238	0.2278
Optdigits	0.3151	0.2979	0.2778	0.8496	0.9859	0.9868
Satimage	0.5303	0.4664	0.5796	0.8021	0.8803	0.8828
Splice	0.5753	0.6422	0.6446	0.8258	0.7754	0.7713
Vehicle	0.4502	0.4278	0.4474	0.5742	0.7172	0.7187
Analcatdata_authorship	0.7491	0.8385	0.6793	0.9354	0.9952	0.9945
Cardiotocography	0.5869	0.5869	0.583	0.9854	0.9974	0.9977
Cmc	0.2197	0.222	0.2315	0.2833	0.3063	0.307

Figure 3.

Average ranks from Friedman test using MMCC as the evaluation criterion.

With MMCC as the evaluation criterion, the scores obtained by the presented method and the other algorithms are shown in Table 4. WHMBoostA gets the best score on 14 data sets (77.78% of the data sets) and WHMBoostM gets the best score on 3 data sets (16.67%). The table also shows that the scores of WHMBoostA and WHMBoostM differ only slightly on most data sets. In Fig. 3, the average rank of WHMBoostA is 1.2778, the smallest of all methods; WHMBoostM and GradientBoost follow with average ranks of 1.9444 and 3.0, respectively. In Table 5, the adjusted $p$-values between WHMBoostA and the other methods are all less than 0.05 except for WHMBoostM, so WHMBoostA differs significantly from every compared method other than WHMBoostM.

Table 5

Hochberg test results using MMCC as the evaluation criterion

Control method (WHMBoostA)
Method	Z-values	Adjusted $p$ -values
WHMBoostM	1.0690	1.0
GradientBoost	2.7617	0.0230
SAMME	5.5679	7.7329e-08
RUSBoost	5.6570	3.0803e-08
SMOTEBoost	6.3252	2.5293e-10
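The Z-values in Table 5 are consistent with the statistic commonly used to compare a method against a control after a Friedman test (see [32]): the difference in average ranks divided by $\sqrt{k(k+1)/(6N)}$, with $k=6$ methods and $N=18$ data sets. The sketch below assumes this is the statistic used. The rank sums 23, 35 and 54 were obtained by ranking the rows of Table 4 and correspond to the average ranks 1.2778, 1.9444 and 3.0 quoted in the text; the function name `rank_z_value` is hypothetical.

```python
import math

def rank_z_value(rank_sum_a, rank_sum_b, n_methods, n_datasets):
    """z statistic for the difference between two average ranks."""
    avg_a = rank_sum_a / n_datasets
    avg_b = rank_sum_b / n_datasets
    std_err = math.sqrt(n_methods * (n_methods + 1) / (6.0 * n_datasets))
    return abs(avg_a - avg_b) / std_err

# Rank sums over the 18 data sets of Table 4 (MMCC).
z_whmboostm = rank_z_value(35, 23, n_methods=6, n_datasets=18)
z_gradientboost = rank_z_value(54, 23, n_methods=6, n_datasets=18)

print(round(z_whmboostm, 4), round(z_gradientboost, 4))  # 1.069 2.7617
```

Exact rank sums are used instead of the rounded average ranks because rounding to four decimals would shift the last digit of the first value.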

Table 6

Score obtained using MG-mean as the evaluation criterion

Name	SAMME	RUSBoost	SMOTEBoost	GradientBoost	WHMBoostM	WHMBoostA
New-thyroid	0.8491	0.805	0.8401	0.8542	0.9444	0.9514
Dermatology	0.0	0.0	0.0	0.9343	0.967	0.9682
Analcatdata_authorship	0.2253	0.7946	0.7138	0.9406	0.9955	0.9959
Glass	0.0	0.0	0.0	0.0096	0.3764	0.391
Hayes-roth	0.021	0.1103	0.0	0.771	0.7697	0.7974
Penbased	0.0	0.0	0.0	0.6093	0.9798	0.9802
Balance	0.0	0.3909	0.3113	0.0	0.7696	0.8134
Thyroid	0.8334	0.8739	0.7369	0.7294	0.7956	0.7722
First-order-theorem-proving	0.0	0.0	0.0	0.0012	0.2372	0.1572
Contraceptive	0.2176	0.3446	0.2623	0.4311	0.5339	0.527
Optdigits	0.0	0.0	0.0	0.8618	0.9874	0.9878
Wine	0.8654	0.884	0.8827	0.9343	0.9834	0.985
Satimage	0.0	0.0	0.0	0.7598	0.8825	0.8856
Splice	0.7225	0.7738	0.7486	0.878	0.8609	0.8619
Analcatdata_dmft	0.0	0.0	0.0	0.0986	0.1984	0.1969
Vehicle	0.0188	0.1353	0.007	0.6251	0.7776	0.7731
Cardiotocography	0.0	0.0	0.0	0.9913	0.9983	0.9985
Cmc	0.235	0.381	0.3212	0.4257	0.5348	0.5356

We also used MG-mean to evaluate the presented algorithm and the other algorithms; the scores are shown in Table 6. WHMBoostA obtains the highest score on 12 data sets (66.67% of the data sets), while WHMBoostM obtains the highest score on 4 data sets (22.22%). SAMME, RUSBoost and SMOTEBoost have scores of 0 on some data sets, as does GradientBoost on Balance. This may be related to keeping only four decimal places. In Fig. 4, the average rank of WHMBoostA is still the smallest, followed by WHMBoostM, while SAMME obtains the largest average rank. The adjusted $p$-values between WHMBoostA and the other methods based on MG-mean are shown in Table 7. The adjusted $p$-value between WHMBoostA and WHMBoostM is 1.0, indicating that there is no significant difference between them. WHMBoostA differs significantly from all the other algorithms, because the corresponding adjusted $p$-values are all less than 0.05.
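Besides rounding, a score of exactly 0 can arise from the structure of the measure itself. Assuming MG-mean is the geometric mean of the per-class recalls (the usual multi-class definition; the formal definition lies outside this section), a single class with no correctly classified sample drives the whole score to 0. The function name `mg_mean` and the toy labels below are illustrative.

```python
import math

def mg_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition of MG-mean)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(classes))

y_true = [0, 0, 1, 1, 2, 2]
# Recalls 1.0, 0.5 and 0.5: every class is recognized at least once.
score_ok = mg_mean(y_true, [0, 0, 1, 0, 2, 1])
# Class 2 is never predicted, so its recall is 0 and the score collapses.
score_zero = mg_mean(y_true, [0, 0, 1, 1, 0, 0])
```

This behavior makes the measure strict on minority classes: high accuracy on the majority classes cannot compensate for a class that is missed entirely.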

Based on the above experimental results and analysis, we can conclude that the presented algorithm differs significantly from the other ensemble algorithms in most comparisons, the exception being GradientBoost under MAUC. On the multi-class imbalanced data sets, the proposed method performs very well and achieves better results than the other ensemble methods. Although there are no significant differences between WHMBoostA and WHMBoostM, WHMBoostA tends to obtain better results. In future experiments, we can therefore use MAUC as the evaluation criterion for selecting the best-performing ensemble model in our algorithm. In the experiments, we also found that on most data sets, assigning a larger weight to the support vector machine tends to give better performance. On only a few data sets were the best scores obtained when a larger weight was assigned to the decision tree classifier.

Table 7

Hochberg test results using MG-mean as the evaluation criterion

Control method (WHMBoostA)
Method	Z-values	Adjusted $p$ -values
WHMBoostM	0.8018	1.0
GradientBoost	2.7172	0.0263
RUSBoost	4.6325	1.0837e-05
SMOTEBoost	5.7907	1.4022e-08
SAMME	5.8352	5.3725e-09

Figure 4.

Average ranks from Friedman test using MG-mean as the evaluation criterion.

5. Conclusions

Due to internal and external reasons, most data sets in real scenarios are imbalanced. A model trained on imbalanced data sets focuses on the majority class and ignores the minority class that we are interested in. This is determined by the inherent characteristics of imbalanced data sets. Multi-class classification tasks are much more complicated than two-class classification tasks: there may be numerous minority classes and numerous majority classes, which makes it difficult to judge the relationships between the classes in a data set. Previous researchers have put forward many algorithms to deal with imbalanced data, but most of these methods focus on binary classification tasks, so the imbalanced data problem in multi-class classification tasks still deserves extensive research. In this paper, multi-class WHMBoost is put forward to address the imbalanced data problem in multi-class classification tasks. It directly extends the weighted hybrid ensemble model that we previously proposed for two-class imbalanced data to the multi-class setting. The proposed algorithm uses two base classifiers and assigns weights to them so that their advantages complement each other. As in SAMME, the weight of the base classifier in the ensemble model is calculated with Eq. (4), which contains the extra term $\log(C-1)$. On 18 multi-class imbalanced data sets, the performance of multi-class WHMBoost has been evaluated using MAUC, MG-mean and MMCC as evaluation criteria. From the experimental results and analysis, we can conclude that the presented method outperforms state-of-the-art ensemble algorithms such as SAMME, GradientBoost and SMOTEBoost. The results also suggest that better results are easier to obtain when MAUC is used as the evaluation criterion for selecting the best ensemble model in the proposed algorithm.

Acknowledgments

We would like to thank the Natural Science Foundation of China (NSFC) under Grant 11972275 for funding our work. We used datasets from OpenML and KEEL in our work and we also express our gratitude to them.

References

1. Japkowicz, Learning from Imbalanced Data Sets: A Comparison of Various Strategies, 2000.
2. Talpur B.A. and O’Sullivan, Multi-class imbalance in text classification: A feature engineering approach to detect cyberbullying in twitter, Informatics 7 (2020), 52.
3. Priscilla and Prabha, Influence of Optimizing XGBoost to handle Class Imbalance in Credit Card Fraud Detection, 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), 2020, pp. 1309–1315.
4. Jain, Ratnoo and Kumar, Addressing class imbalance problem in medical diagnosis: A genetic algorithm approach, 2017 International Conference on Information, Communication, Instrumentation and Control (ICICIC), 2017, pp. 1–8.
5. Arun and Lakshmi, Class Imbalance in Software Fault Prediction Data Set, 2020.
6. Batista G.E.A.P.A., Prati and Monard M.C., A study of the behavior of several methods for balancing machine learning training data, SIGKDD Explor 6 (2004), 20–29.
7. Johnson and Khoshgoftaar, Survey on deep learning with class imbalance, Journal of Big Data 6 (2019), 1–54.
8. Chawla N.V., Bowyer, Hall and Kegelmeyer W.P., SMOTE: Synthetic minority over-sampling technique, J Artif Intell Res 16 (2002), 321–357.
9. Wilson, Asymptotic properties of nearest neighbor rules using edited data, IEEE Trans Syst Man Cybern 2 (1972), 408–421.
10. He, Bai, Garcia E.A. and Li, ADASYN: Adaptive synthetic sampling approach for imbalanced learning, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2008, pp. 1322–1328.
11. Hart, The condensed nearest neighbor rule (Corresp.), IEEE Trans. Inf. Theory 14 (1968), 515–516.
12. Zhao, Jin, Chen, Zhang and Liu, A weighted hybrid ensemble method for classifying imbalanced data, Knowl Based Syst 203 (2020), 106087.
13. Lin T.-Y., Goyal, Girshick R.B., He and Dollár, Focal Loss for Dense Object Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2020), 318–327.
14. Kirtania, Mitra and Shankar B.U., A novel adaptive k-NN classifier for handling imbalance: Application to brain MRI, Intell Data Anal 24 (2020), 909–924.
15. He and Garcia E.A., Learning from Imbalanced Data, IEEE Transactions on Knowledge and Data Engineering 21 (2009), 1263–1284.
16. Hastie, Rosset, Zhu and Zou, Multi-class AdaBoost, Statistics and Its Interface 2 (2009), 349–360.
17. Chawla N.V., Lazarevic, Hall L.O. and Bowyer K.W., SMOTEBoost: Improving Prediction of the Minority Class in Boosting, in: PKDD, 2003.
18. Friedman, Greedy function approximation: A gradient boosting machine, Annals of Statistics 29 (2001), 1189–1232.
19. Friedman, Stochastic gradient boosting, Computational Statistics & Data Analysis 38 (2002), 367–378.
20. Seiffert, Khoshgoftaar T.M., Hulse J.V. and Napolitano, RUSBoost: A Hybrid Approach to Alleviating Class Imbalance, IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans 40 (2010), 185–197.
21. Díez-Pastor, Diez J.J.R., García-Osorio and Kuncheva, Random Balance: Ensembles of variable priors classifiers for imbalanced data, Knowl Based Syst 85 (2015), 96–111.
22. Wang and Yao, Multiclass Imbalance Problems: Analysis and Potential Solutions, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42 (2012), 1119–1130.
23. Guo-qiang, Yu-ying and De-you, A noise classification algorithm based on SAMME and BP neural network, 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), 2018, pp. 274–278.
24. Tanha, Abdi, Samadi, Razzaghi and Asadpour, Boosting methods for multi-class imbalanced data classification: an experimental review, Journal of Big Data 7(1) (2020), 70.
25. Rayhan, Ahmed, Mahbub, Jani M.R., Shatabda, Farid D.M. and Rahman C.M., MEBoost: Mixing estimators with boosting for imbalanced data classification, 2017 11th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), 2017, pp. 1–6.
26. Gong and Kim, RHSBoost: Improving classification performance in imbalance data, Computational Statistics & Data Analysis 111(C) (2017), 1–13. doi: 10.1016/j.csda.2017.01.00. https://ideas.repec.org/a/eee/csdana/v111y2017icp1-13.html.
27. Fan, Stolfo, Zhang and Chan, AdaCost: Misclassification Cost-Sensitive Boosting, in: ICML, 1999.
28. Zhen and Qiong, A New Feature Selection Method for Internet Traffic Classification Using ML, Physics Procedia 33 (2012).
29. Sun, Kamel and Wang, Boosting for Learning Multiple Classes with Imbalanced Class Distribution, Sixth International Conference on Data Mining (ICDM’06), 2006, pp. 592–602.
30. Alcalá-Fdez, Fernández, Luengo, Derrac and García, KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework, J Multiple Valued Log Soft Comput 17 (2011), 255–287.
31. Fiori, Martino J.M.D. and Fernández, An optimal multiclass classifier design, 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 480–485.
32. Demsar, Statistical Comparisons of Classifiers over Multiple Data Sets, J Mach Learn Res 7 (2006), 1–30.
33. Hochberg, A sharper Bonferroni procedure for multiple tests of significance, Biometrika 75 (1988), 800–802.