A survey of multi-class imbalanced data classification methods

Abstract

In reality, the data generated in many fields are often imbalanced, such as fraud detection, network intrusion detection and disease diagnosis. The class with fewer instances in the data is called the minority class, and the minority class in some applications contains the significant information. So far, many classification methods and strategies for binary imbalanced data have been proposed, but there are still many problems and challenges in multi-class imbalanced data that need to be solved urgently. The classification methods for multi-class imbalanced data are analyzed and summarized in terms of data preprocessing methods and algorithm-level classification methods, and the performance of the algorithms using the same dataset is compared separately. In the data preprocessing methods, the methods of oversampling, under-sampling, hybrid sampling and feature selection are mainly introduced. Algorithm-level classification methods are comprehensively introduced in four aspects: ensemble learning, neural network, support vector machine and multi-class decomposition technique. At the same time, all data preprocessing methods and algorithm-level classification methods are analyzed in detail in terms of the techniques used, comparison algorithms, pros and cons, respectively. Moreover, the evaluation metrics commonly used for multi-class imbalanced data classification methods are described comprehensively. Finally, the future directions of multi-class imbalanced data classification are given.

Keywords

Classification; multi-class imbalance data; data preprocessing method; algorithm-level classification method

1 Introduction

In the field of machine learning and data mining, classification of imbalanced data is an important research direction. The classes with fewer instances in the data are called minority classes, and minority classes are often the ones of primary interest to researchers. The distribution of classes in data collected from many applications is often skewed, as in network intrusion detection [1], credit card fraud detection [2] and disease diagnosis [3]. This imbalanced distribution can lead to difficulties in classification, as classifiers tend to favor the majority classes and misclassify the minority classes. Many algorithms have been proposed for imbalanced data, but most of them address the two-class problem. However, the two-class setting cannot cover all real-world scenarios, and imbalanced multi-class data is often more likely to occur in real-world applications. Learning from multi-class imbalanced data is more difficult than the two-class case. Not only is it necessary to distinguish multiple classes, but the boundaries between classes may also overlap. In addition, there may exist multiple minority classes or multiple majority classes in the data [4], as shown in Fig. 1.

Fig. 1

A multi-class dataset with imbalance, class overlap and noise.

Earlier methods for dealing with multi-class imbalanced data fall mainly into two types. The first applies decomposition strategies to the multi-class problem. Tan et al. [5] proposed an ensemble learning method based on one-versus-one (OVO) decomposition, which decomposes multi-class datasets and trains different base classifiers to construct an ensemble that accommodates the imbalanced class distribution. The second deals with the multi-class imbalance problem directly and was first proposed by Sun et al. [6], who combined cost-sensitive learning with a Boosting ensemble and used genetic algorithms to find the optimal cost of each class, constructing a cost matrix for learning on multi-class imbalanced data.

Most of the existing surveys of imbalanced data classification methods are based on two-class imbalance problems. There are only a few surveys summarizing multi-class imbalance classification methods. Sahare et al. [7] mainly introduced data preprocessing techniques combined with neural networks, but summarized only a small number of multi-class imbalance methods, and their perspective and review were not comprehensive. Tanha et al. [8] analyzed the application and performance of various Boosting ensemble methods on multi-class imbalance datasets. However, other types of multi-class imbalance classification methods were not described and analyzed. Sridhar et al. [9] introduced sampling methods, decomposition techniques, neural networks and ensemble strategies for multi-class imbalance data, but no specific models and algorithms were studied and analyzed. Li et al. [10] summarized the methods of recent years from the perspective of decomposition methods and extemporaneous methods, collectively referring to cost-sensitive learning, ensemble learning and deep networks as extemporaneous methods. In the existing surveys, most of the research perspectives are too narrow, and no survey has provided a comprehensive description and analysis of multi-class imbalanced data classification methods and evaluation metrics.

This article summarizes and introduces the multi-class imbalanced data classification methods published in recent years. Instead of the perspective of existing surveys, this article provides a comprehensive analysis and summary of both data preprocessing methods and algorithm-level classification methods, as well as a detailed description and explanation of the techniques and performance used by algorithms. The general framework of this article is shown in Fig. 2. The main contributions are as follows:

Fig. 2

Multi-class imbalanced data classification methods.

This article presents a comprehensive overview of data preprocessing methods in dealing with multi-class imbalance problems, including oversampling, under-sampling, hybrid sampling and feature selection. In addition, this article provides a detailed description and analysis of algorithm-level classification methods for multi-class imbalance data from the perspective of ensemble learning, neural network, support vector machine and multi-class decomposition technique for the first time.

In this article, the experimental results of data preprocessing methods and algorithm-level classification methods using the same dataset are compared and analyzed respectively. Meanwhile, the techniques used, comparison algorithms, pros and cons of the various algorithms are described in detail.

This article comprehensively introduces the evaluation metrics commonly used in multi-class imbalanced data classification methods and provides statistics on the evaluation metrics of all the algorithms.

Finally, this article summarizes the current problems in the field of multi-class imbalanced data classification and proposes corresponding solution ideas, such as handling multi-class imbalanced data streams by dynamic selection methods, tackling the concept drift problem in multi-class imbalanced data streams and coping with complex multi-class imbalanced datasets.

2 Multi-class imbalance data preprocessing method

The existing multi-class imbalance data preprocessing methods include resampling and feature selection. Resampling under-samples the majority class samples or oversamples the minority class samples in the imbalanced data before training the classifier, thus balancing the class distribution. Feature selection is mainly used to filter out redundant data features and retain relevant ones to improve the performance of the classifier. In this chapter, the multi-class imbalance data preprocessing methods are analyzed from four perspectives: oversampling, under-sampling, hybrid sampling and feature selection.

2.1 Data preprocessing method based on oversampling

Oversampling is the most commonly used method for preprocessing multi-class imbalance data. It solves the multi-class imbalance problem by introducing new instances of minority classes to rebalance the original biased data distribution [11].

The Synthetic Minority Oversampling Technique (SMOTE) is the most representative method among the oversampling methods, which artificially synthesizes new samples based on the minority class samples to add to the dataset. However, the learnability of minority class samples would be compromised by generating wrong samples, which leads to overgeneralization of minority classes to majority class regions.
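The interpolation step at the core of SMOTE can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name `smote_sketch`, the parameters `n_new`, `k` and `seed`, and the use of Euclidean distance are assumptions made for the example, and the minority class is assumed to contain at least two instances.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    `minority` is a list of equal-length numeric tuples (hypothetical input format).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding the point itself
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority_points, n_new=10, k=2)
```

Because every synthetic point lies on a segment between two minority instances, a segment that crosses a majority region places new minority samples inside it, which is the overgeneralization risk described above.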

To address the weaknesses of SMOTE, a number of researchers have proposed improvements. Considering the overgeneralization problem that may arise when training on multi-class imbalanced data, Zhu et al. [12] proposed the Synthetic Minority Oversampling technique for Multi-class imbalance (SMOM) based on k-nearest neighbors (k-NN), which assigns selection weights to the k-NN directions of each instance and gives lower selection weights to neighbor directions that may produce severe overgeneralization. In addition, Neighborhood-Based clustering for Discovering the clusters of Outstanding instances (NBDOS) is applied to avoid the calculation of selection weights for minority class instances, and two round-robin filters reduce the number of distance computations between instances to improve the time performance of the algorithm. Sampling Safety Coefficient for Multi-class Imbalance Oversampling (SSCMIO) [13] also provides a mechanism to avoid overgeneralization: it computes sampling safety coefficients from the instance neighborhoods and assigns smaller weights to regions that may cause overgeneralization. Unlike SMOM, SSCMIO employs a reverse-nearest-neighbor sampling safety coefficient to prevent newly generated instances from intruding into the regions of other classes, which effectively reduces class overlap. HDSMOTE, an algorithm based on the Hellinger distance and SMOTE [14], guides the direction of sample synthesis by comparing the Hellinger distance [15] within the neighborhood of minority class instances, and proposes a sampling quality assessment strategy based on the Hellinger distance to evaluate synthetic instances, so as to address overgeneralization and class overlap effectively.

Several researchers have argued that the majority class also contains vital information. Synthetic Over-sampling with Minority and Majority classes (SOMM) [16] synthesizes instances by considering information from both minority class and majority class neighbors. Experiments showed that SOMM could improve the performance of the classifier and outperformed the SMOM algorithm. Sridhar et al. proposed a method that combines SMOTE and Z-Score (SMOTE&Z-Score) [17]. It uses the Z-Score to separate majority class samples from minority class samples and find the correlation patterns. After random oversampling, SMOTE is used to constrain the results. The experiments showed that the correlation patterns of the balanced data remain highly consistent with those of the original data.

Combining SMOTE oversampling with spectral clustering can effectively deal with outliers in the dataset. One-versus-one and Spectral Clustering (OSC) [18] is an oversampling method based on spectral clustering, which uses the spectral clustering method to divide minority classes in the dataset into subspaces and conducts SMOTE according to the distribution characteristics of the data, thus avoiding oversampling outliers.

The Adaptive Synthetic oversampling algorithm (ADASYN) [19] is also used to deal with multi-class imbalanced data. The method adaptively generates minority class instances by assigning different weights to different minority class instances based on the density distribution of the classes, thus balancing the skewed class distribution. Kurniawati et al. [20] improved the algorithm to handle nominal datasets and proposed the ADASYN-N and ADASYN-KNN methods. ADASYN-N calculates the k-NN of the class using the value difference metric, and ADASYN-KNN then generates synthetic data based on the evaluated nearest neighbor instances. Rahayu et al. [21] further analyzed the ADASYN-N method by experimenting with the value of the parameter k in the nearest neighbor search, and the results showed that the method performs best when k is 5, 7 or 9.
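The density-based weighting in ADASYN can be illustrated with a small sketch. It assumes the usual formulation in which each minority instance is weighted by the share of its k nearest neighbors that belong to other classes; treating all other classes together (one-versus-rest) is an assumption made here for the multi-class case, and the names `adasyn_allocation`, `n_new` and `k` are hypothetical.

```python
import math


def adasyn_allocation(X, y, minority_label, n_new, k=5):
    """Return {instance index: number of synthetic samples to generate}."""
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    ratios = []
    for i in minority_idx:
        # k nearest neighbors of instance i over the whole dataset
        order = sorted((j for j in range(len(X)) if j != i),
                       key=lambda j: math.dist(X[i], X[j]))
        foreign = sum(1 for j in order[:k] if y[j] != minority_label)
        ratios.append(foreign / k)
    total = sum(ratios)
    if total == 0:
        # No minority instance has foreign neighbors: spread samples evenly (assumption)
        weights = [1 / len(minority_idx)] * len(minority_idx)
    else:
        weights = [r / total for r in ratios]
    return {i: round(w * n_new) for i, w in zip(minority_idx, weights)}


X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0)]
y = ["a", "a", "b", "b", "b", "a"]
allocation = adasyn_allocation(X, y, minority_label="a", n_new=10, k=2)
```

In this toy data the minority instance at index 5 sits among instances of class `b`, so it receives the whole budget, while the two minority instances whose neighbors are all from class `a` receive none.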

Oversampling after sorting the class instances in a suitable way can produce better results. Sampling Technique based on Composite weights and Sorting (STCPS) [22] first ranks the samples within each class based on the distance of the sample data to the hyperplane. Then the data density around the sampling point is calculated and used as a weight to sample the original samples. The new data is assigned according to the information of the neighborhood of the sampling point, which maintains the characteristics of the original data and can effectively solve the small sample problem existing in multi-class imbalance. Dentamaro et al. [23] proposed the Less Important Components for Imbalanced multi-class Classification (LICIC) method, which treats all classes in the data as equally important. The method works in the transformation space and applies permutations to the proportion of components and similar instances belonging to the same class, thus creating synthetic instances for each minority class. LICIC does not add new information and randomness during data preprocessing, so the classification model has a good generalization capability. Meanwhile, the Complexity-based Over-Sampling Technique (COSTE) [24] has also been used to deal with the multi-class imbalance problem. Unlike the proximity-based SMOTE method, this method first applies min-max normalization to the data, calculates the complexity of each instance and ranks the instances, and then selects instances of similar complexity to synthesize sample instances that balance the dataset. Lestari et al. [25] applied the COSTE method to multi-class imbalance problems and showed that it achieves a better G-mean than SMOTE.

In oversampling methods, using the distribution features of classes to synthesize instances can effectively improve the classifier's ability to recognize minority classes. In this respect, Multi-Class Radial-Based Oversampling (MC-RBO) [26] has shown good performance. The main advantage of this algorithm is that it uses the local data features of each class with intelligent oversampling and does not change the features of the original class. In addition, information from all classes is utilized in the artificial instance generation process. Experiments show that MC-RBO is more robust when minority classes form multiple disjoint clusters.

2.2 Data preprocessing method based on under-sampling

Under-sampling balances the class distribution by removing majority class instances [27]. Since under-sampling methods tend to lose important sample information and their classification results are unstable, oversampling methods have mostly been used to deal with the multi-class imbalance problem. However, some researchers have obtained better classification results by improving and adapting the under-sampling method.

In multi-class imbalanced data, class overlap can make class boundaries hard to identify, which reduces the performance of the classifier. Wu et al. [28] proposed Based on LOF and Overlap (BLO), which uses the local outlier factor (LOF) and box plots to clean the noisy samples in the training dataset, and extracts the important samples according to the class overlap before under-sampling them. As a result, the original data distribution is maintained as far as possible and the accuracy of the classifier is improved.

The clustering method combined with under-sampling can effectively deal with majority classes in imbalanced datasets. The Clustering-based Under-Sampling (CUS) [29] clusters the majority class instances and then under-samples the instances with the largest information to form multiple balanced datasets. Experiments show that the method achieves high accuracy in classifying both majority class and minority class instances.

The general sampling method for multi-class imbalanced data first balances the dataset and then trains the classifier. Unlike existing methods, One-Class SVM-Under-Sampling (OCSV-US) [30] is a two-stage algorithm combining under-sampling and a genetic algorithm, which processes multi-class imbalanced data using a training-then-balancing approach. In the first stage, M one-class classifiers are trained, one for each of the M classes, and each classifier returns a set of class instances with the highest information values for the next sampling stage. In the second stage, multiple randomly under-sampled subsets of data are created based on the class instances from the previous step, and the best dataset for classification is obtained by applying the genetic algorithm to evolve the subsets until the fitness function of the subsets can no longer be improved. The results show that the two-stage strategy realized by this method can improve the computational time efficiency and classification accuracy.

2.3 Data preprocessing method based on hybrid sampling

Of the multi-class imbalanced data preprocessing methods, the hybrid sampling method is the combination of oversampling and under-sampling or other approaches, which can effectively alleviate the overfitting problem caused by oversampling and the information loss caused by under-sampling.

Combining SMOTE with under-sampling techniques is a common approach in hybrid sampling schemes. SMOTE and Clustered Under-sampling Technique (SCUT) [31] generates synthetic examples for minority classes using SMOTE and under-samples majority classes using Expectation Maximization clustering, which is suitable for scenarios with a high imbalance ratio. To deal with the class overlap problem, Fuzzy C-Mean and SMOTE (FCMSMT) [32] combines SMOTE and fuzzy c-means clustering so that all classes have a similar number of instances, and randomly selects instances from each cluster, which can effectively solve the problems of class imbalance and overlap. Class Imbalance Aware Review (CIAR) [33] uses a combination of SMOTE and Random Under-sampling [34] to oversample the minority classes and under-sample the majority classes, respectively, in the data preprocessing stage. The method further divides the balanced sample instances into N subsets that are provided to the base classifiers for training, which improves the time efficiency of the classifier, and experiments show that the CIAR model has the best prediction performance.
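The idea of bringing every class to a common size, as in hybrid schemes such as SCUT, can be sketched as follows. The sketch uses the mean class size as the target, following the threshold that Table 3 attributes to SCUT. It is deliberately simplified: random duplication stands in for SMOTE and random removal stands in for cluster-based under-sampling, so it is not the SCUT algorithm itself. The function name `hybrid_resample_to_mean` is hypothetical.

```python
import random
from collections import defaultdict


def hybrid_resample_to_mean(X, y, seed=0):
    """Resample every class to the mean class size (simplified stand-in)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = round(len(X) / len(by_class))  # mean class size
    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) > target:
            # Majority side: random removal replaces cluster-based under-sampling
            chosen = rng.sample(rows, target)
        else:
            # Minority side: random duplication replaces SMOTE synthesis
            chosen = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


X = [(float(i),) for i in range(12)]
y = ["a"] * 8 + ["b"] * 3 + ["c"] * 1
X_bal, y_bal = hybrid_resample_to_mean(X, y)
```

With class sizes 8, 3 and 1 the target is 4, so class `a` loses four instances while classes `b` and `c` are topped up to four each.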

A number of researchers have argued that the overfitting of SMOTE is unavoidable and can be especially severe in the case of extremely imbalanced datasets. Therefore, they proposed new oversampling schemes for hybrid sampling. Minimizing Overlapping Selection under Hybrid Sampling (MOSHS) [35] balances multiple classes based on the overlap of classes and uses minority-based oversampling and Edited Nearest Neighbors (ENN) [36] to sample minority and majority classes, respectively. Experiments demonstrate that the method has better results in terms of recall and other metrics. In Similarity Oversampling and Under-sampling Preprocessing (SOUP) [37], the most influential majority class samples are first under-sampled, and then the most important minority class samples are oversampled by analyzing the safe levels derived from their neighborhoods. The results show that SOUP performs better than the Static-SMOTE and Global-CS methods.

The Random Balance [38] strategy is a two-class imbalanced data preprocessing strategy that uses random class proportions to randomly under-sample and oversample the data. Based on this, Rodríguez et al. [39] proposed the Multi-class Random Balance (MultiRandBal) to extend random balance to multi-class imbalanced datasets. Contrary to the previous approach, the method uses randomly generated priors for sampling instead of class proportions. Hartono et al. [40] combined dynamic ensemble selection with MultiRandBal in their Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) method, maintaining the diversity of data and classifiers and achieving higher performance using few classifiers.
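The use of randomly generated priors can be illustrated with a short sketch that draws a random target size for each class while keeping the overall dataset size roughly unchanged. It covers only the drawing of target sizes and is an assumption-laden illustration, not the published MultiRandBal procedure: the function name `random_class_sizes`, the uniform draws and the minimum of one instance per class are all choices made for the example.

```python
import random


def random_class_sizes(class_counts, seed=None):
    """Draw random class priors and convert them into target class sizes."""
    rng = random.Random(seed)
    total = sum(class_counts.values())
    draws = {label: rng.random() for label in class_counts}
    norm = sum(draws.values())
    # Each class keeps at least one instance so that no class disappears
    return {label: max(1, round(total * d / norm)) for label, d in draws.items()}


counts = {"a": 100, "b": 20, "c": 5}
sizes = random_class_sizes(counts, seed=1)
```

A class whose target exceeds its current size would then be oversampled and the others under-sampled. Calling the function with different seeds gives each ensemble member a differently balanced training set.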

2.4 Data preprocessing method based on feature selection

Datasets with high dimensionality add difficulty to classification, and classifiers may not be able to respond to and process the features efficiently. Feature selection [41] is an effective method in the field of data mining which aims to select more relevant data features to provide a concise and clear description of the data to improve the time and memory efficiency of learning models. In recent years, the use of feature selection for processing multi-class imbalanced data has gradually received attention from researchers.

Fernández et al. [42] proposed Ensemble classifier from a Feature and Instance Selection by means of Multi-Objective Evolutionary Algorithm (EFIS-MOEA), which was mainly used to solve the problem of class overlap. The method uses a multi-objective evolutionary approach to simplify class boundaries by limiting features that may pose difficulties for class boundary identification, making it easier to distinguish different classes. Then it finds the appropriate class distribution based on instance selection to solve the imbalance problem, while eliminating noisy instances. EFIS-MOEA can be embedded in any classifier and is highly universal.

To handle both labeled and unlabeled instances, Sreeja et al. proposed the weighted Pattern Matching approach for Classification (PMC+) [43]. PMC+ classifies unlabeled instances by computing the absolute difference between the feature values of instances and unlabeled instances. To further improve the performance of PMC+, a fireworks algorithm based on feature weight selection was also proposed for feature selection, and a storage pool and a selection pool were set up. The storage pool initially stores all the features of the dataset, and the selection pool stores the selected features, weights and Kappa. The algorithm dynamically updates the pools in each iteration until the optimal features and weights for classification are retained in the selection pool. Experiments show that the algorithm performs well on AUC.

Rough set theory [44] is an effective method to deal with the ambiguity and uncertainty of datasets and can be applied to feature selection. Roughly Balanced Bagging (RBBag) [45] draws on the ideas of random subspaces and random forests to randomly select a subset of attributes from the set containing all attributes and use them as samples to train the base classifier. The algorithm adds decision bounds for rare and unusual instances to ensure that instances of minority classes are correctly classified and can effectively address the case of class overlap. Experiments were conducted on UCI and real-world datasets to demonstrate the effectiveness of the method. Rough-Set-based Feature Selection Algorithm for Imbalanced Data Multi-class (RSFSAID-M) [46] accounts for the imbalanced distribution of classes through feature significance: it calculates the significance of each attribute based on the granular structure of each instance in the boundary region, and then selects the feature subset provided to the classifier based on the feature significance. The calculation of feature significance is shown in Equation (1).

$$s_{l} = \sum_{j = 1}^{| U / \{d\} |} \frac{R_{l, j} \times P_{l, j}}{R_{l, j} + P_{l, j}} \tag{1}$$

where $R_{l, j} = \frac{\varepsilon_{l, j}^{+}}{| D_{j} |}$ and $P_{l, j} = \frac{\varepsilon_{l, j}^{+}}{\varepsilon_{l, j}^{+} + \varepsilon_{l, j}^{-}}$, and $\varepsilon_{l, j}^{+}$ and $\varepsilon_{l, j}^{-}$ denote the estimated numbers of correctly classified and misclassified instances, respectively.
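A small numeric sketch may help in reading Equation (1). It evaluates the sum for a single feature, given per-class counts of estimated correctly classified instances, estimated misclassified instances and class sizes. The function name `feature_significance`, the argument layout and the convention of skipping a class whose correct count is zero (to avoid dividing zero by zero) are assumptions made for the example.

```python
def feature_significance(correct, wrong, class_sizes):
    """Evaluate Equation (1) for one feature from per-class counts.

    correct[j], wrong[j]: estimated correctly classified / misclassified counts for class j
    class_sizes[j]: number of instances in class j, written |D_j| in the text
    """
    significance = 0.0
    for eps_pos, eps_neg, size in zip(correct, wrong, class_sizes):
        if eps_pos == 0:
            continue  # R and P are both zero; the term is taken as 0 (assumption)
        r = eps_pos / size                 # R_{l,j}
        p = eps_pos / (eps_pos + eps_neg)  # P_{l,j}
        significance += (r * p) / (r + p)
    return significance


# Two classes: R = P = 0.8 for the first, R = 0.5 and P = 1.0 for the second
s = feature_significance(correct=[8, 2], wrong=[2, 0], class_sizes=[10, 4])
```

The first class contributes 0.64 / 1.6 = 0.4 and the second 0.5 / 1.5 ≈ 0.333, so the significance is about 0.733. Each class adds one term regardless of its size, so a small class can influence the score as much as a large one.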

Sun et al. [47] proposed the Feature Reduction algorithm for imbalanced data using Similarity-based feature clustering and AWKNN (FRSA). First, the method uses the differences of samples in each dimension to build a similarity measure matrix that measures the similarity between clusters, and constructs a new hierarchical clustering model to generate new samples. Second, the normalized information gain is introduced to design the symmetric uncertainty between each feature and the other features. Then the initial feature clustering centers are determined automatically by symmetric uncertainty-based adaptive weighted k-NN. Finally, the optimal feature subset is selected from the feature clusters using a symmetric uncertainty-based feature parsimony method to improve the accuracy of the classifier.

2.5 Chapter summary

This chapter summarizes the multi-class imbalance data preprocessing methods in terms of oversampling, under-sampling, hybrid sampling and feature selection. In order to further explore the performance of data preprocessing methods on handling multi-class imbalanced datasets, algorithms using the same datasets are analyzed and compared in this chapter. Table 1 lists the datasets commonly used by the data preprocessing methods and describes the parameters of the datasets. Table 2 lists 10 algorithms using these four datasets.

Table 1
Dataset parameters

Dataset	Attributes	Classes	Instances	Class distribution
Ecoli	7	8	336	143, 77, 52, 35, 20, 5, 2, 2
Yeast	8	10	1484	244, 429, 463, 44, 51, 163, 35, 30, 20, 5
Vehicle	18	4	846	199, 212, 217, 218
Wine-Quality	12	7	6497	30, 216, 2138, 2836, 1079, 193, 5
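The degree of imbalance in these datasets can be made concrete with a short calculation. The sketch below uses the class distributions from Table 1 and takes the imbalance ratio to be the size of the largest class divided by the size of the smallest. That definition is an assumption here, since other definitions exist, and the names `class_distribution` and `imbalance_ratio` are illustrative.

```python
# Class distributions as listed in Table 1
class_distribution = {
    "Ecoli": [143, 77, 52, 35, 20, 5, 2, 2],
    "Yeast": [244, 429, 463, 44, 51, 163, 35, 30, 20, 5],
    "Vehicle": [199, 212, 217, 218],
    "Wine-Quality": [30, 216, 2138, 2836, 1079, 193, 5],
}


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (assumed definition)."""
    return max(counts) / min(counts)


ratios = {name: imbalance_ratio(c) for name, c in class_distribution.items()}
for name, ratio in ratios.items():
    print(f"{name}: {len(class_distribution[name])} classes, ratio = {ratio:.2f}")
```

Under this definition Vehicle is close to balanced (about 1.1), while Ecoli (71.5), Yeast (92.6) and Wine-Quality (567.2) each contain several very small classes.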

Table 2

Algorithms using the same datasets

Algorithm	Method	Ecoli	Yeast	Vehicle	Wine-Quality
SMOM	Over-sampling	√	√	√
SMOTE&Z-Score	Over-sampling	√	√		√
MC-RBO	Over-sampling	√	√	√
CUS	Under-sampling	√	√
SCUT	Hybrid sampling	√	√		√
FCMSMT	Hybrid sampling	√	√		√
SOUP	Hybrid sampling	√	√	√	√
EFIS-MOEA	Feature selection	√	√
PMC+	Feature selection	√	√
RBBag	Feature selection	√	√	√

On the Ecoli, Yeast and Wine-Quality datasets, which contain multiple minority classes, SCUT outperforms all other algorithms. SCUT uses the J48 decision tree as its classifier and achieves the highest G-mean and AUC values on Ecoli, 90.7 and 93.8, respectively. This is because SCUT does not over-apply sampling: it balances all class instances while, to some extent, maintaining the original class distribution. On the Vehicle dataset, whose classes are relatively balanced, SOUP performs best; it also uses the J48 decision tree classifier and reaches a G-mean of 91.5. However, its performance on the other three datasets is worse, so SOUP is not well suited to datasets with a large number of classes and multiple minority classes. SMOM, which is based on improved SMOTE, was evaluated on the Ecoli, Yeast and Vehicle datasets using MAUC as the evaluation metric, and its average MAUC reached 87.8. The feature selection methods were evaluated mainly on the Ecoli and Yeast datasets. EFIS-MOEA, which uses multi-objective evolution and instance selection, obtains an AUC of 84.41; it makes different classes easier to distinguish and achieves good results in general. EFIS-MOEA performs better than PMC+, but PMC+ can handle unlabeled instances in the dataset. RBBag uses only G-mean as its evaluation metric and performs well only on the Yeast dataset, with a G-mean of 81.5.
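For readers unfamiliar with G-mean, the sketch below computes it in its common multi-class form, the geometric mean of the per-class recalls. This definition is assumed here rather than taken from the compared papers, and the function returns a value in [0, 1], whereas the figures quoted above appear to be on a 0 to 100 scale. The function name `g_mean` is illustrative.

```python
import math
from collections import Counter


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall over the classes present in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[label] / support[label] for label in support]
    return math.prod(recalls) ** (1 / len(recalls))


y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "b", "a", "c", "c"]
score = g_mean(y_true, y_pred)  # per-class recalls are 1.0, 0.5 and 1.0
```

Because the recalls are multiplied, a classifier that misses one minority class entirely scores zero, which is why G-mean is sensitive to minority-class performance in a way that overall accuracy is not.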

The analysis shows that oversampling and hybrid sampling methods that take class distribution information into account achieve better results on datasets with multiple minority classes. Because under-sampling performs poorly on multi-class imbalance, there has been more research on oversampling methods. Most oversampling methods are based on improved SMOTE, which increases the proportion of minority class instances in the original data. Other oversampling methods balance the skewed class distribution based on the density or weights of classes without changing the original class characteristics, and the results show that this can improve the generalization ability of classification models. Hybrid sampling combines the advantages of oversampling and under-sampling, which can effectively alleviate the overfitting and information loss problems. Experiments show that hybrid sampling-based approaches work better than oversampling or under-sampling alone. Compared with sampling-based methods, a classifier built on the training set after feature selection better preserves the class distribution and achieves improved performance. The multi-class imbalance data preprocessing methods are summarized and compared in Table 3 in terms of technique, dataset, comparison algorithm, pros and cons.

Table 3

Multi-class imbalance data preprocessing methods

Algorithm	Technique	Method	Datasets	Compared algorithm	Advantages	Disadvantages
SMOM [12]	SMOTE/ k-NN/ NBDOS	Over-sampling	Vehicle, Yeast, Vowel, Abalone, Ecoli, Wine-red	ROS, SL-SMOTE, MWMOTE, MDO, ADASYN	Adopts mechanisms to avoid overgeneralization and selects safer synthesis instances of neighboring directions which can effectively identify minority classes.	The algorithm is difficult to implement and only suitable for dealing with continuous attributes.
SSCMIO [13]	SMOTE/ Near-neighbor sampling/ Safety factor	Over-sampling	Ecoli, Yeast, Led7digit, LEV, ERA	SMOM, MWMOTE, ADASYN, BSMOTE	By using the local and global features of sample points to calculate the safety factor, it can avoid overgeneralization and deal with the class overlap problem.	Only datasets with numerical attributes can be handled, no experiments were conducted on datasets with nominal attributes.
HDSMOTE [14]	SMOTE/Hellinger distance	Over-sampling	Yeast, Balance, Cleveland, Wine, Vehicle-mc	SMOM, MWMOTE, ADASYN, Cluster SMOTE	The overgeneralization and class overlap problems can be effectively addressed by guiding sample synthesis through Hellinger distance.	The time complexity of the algorithm is higher and the definition of the local nearest neighbor domain is too simple during sampling.
SOMM [16]	SMOTE	Over-sampling	Vehicle, Abalone, Wine Quality	SMOM, MDO	Considers the neighborhood information of majority and minority classes and generates instances adaptively with high classification accuracy.	The algorithm performs poorly when the spaces of the minority and majority classes overlap significantly.
SMOTE&Z-Score [17]	SMOTE/Z-Score	Over-sampling	Wine Quality, Yeast, Ecoli	Random Forest	Separates majority class and minority class samples and finds relevant patterns to synthesize instances, which preserves the original class distribution.	Only focuses on minority class instances, without considering useful information from majority classes.
OSC [18]	SMOTE/Spectral clustering	Over-sampling	Ecoli, Market, Car	OSM, IAOS, OUB, DRCW_ASEG	Samples based on data features and responds better to outliers.	The problem of class overlap in multi-class datasets is not handled.
ADASYN-N/ ADASYN-KNN [20]	ADASYN/KNN	Over-sampling	Pap smear	SMOTE-N	The class distribution is considered and synthetic data is generated by the value difference metric which improves the classifier accuracy.	Only a single dataset and comparison algorithm were used.
STCPS [22]	Hyperplane distance/Euclidean spatial distance	Over-sampling	Weather data, Market	SMOTE, SVMOM, SMO+TLK	Weights are set in terms of hyperplane distance and data density, which preserves the original data features.	The class overlap problem is not considered, and the classification computation speed of the algorithm needs to be improved.
LICIC [23]	KPCA/Transformation space	Over-sampling	MicroMass, GCM	ADASYN, SMOTE, Random Forest	Without adding new information and randomness to the data, the generalization ability of the classifier is improved and it can handle high-dimensional datasets.	The algorithm requires many parameters to be configured correctly and does not handle the case of class overlap.
COSTE [25]	Min-max normalization	Over-sampling	Ant, Camel, Log4j, Poi, Jedit	SMOTE, Borderline, MWMOTE	Synthesizing sample instances with the complexity of instances improves the diversity of data.	The time complexity of the algorithm is high and needs further improvement.
MC-RBO [26]	Climbing algorithm	Over-sampling	Vehicle, Ecoli, Wine, Wine-red	MLP, SMOM, ADASYN, SMOTEBag	The features of the original class are preserved by using local data features of each class with intelligent oversampling.	The algorithm requires multiple observations when majority of the classes contain more information, in which case the time performance is poor.
BLO [28]	LOF/Box plot	Under-sampling	Ecoli, Glass, Balance, User	CMSVM, FSVM-CLL, CSMVMsuiji, DEC	The noisy samples are efficiently processed and the original class distribution is maintained by under-sampling based on class overlap.	The number of classes and samples in the dataset for the experiment is small and data with high imbalance ratio is not analyzed.
CUS [29]	Clustering	Under-sampling	Ecoli, Yeast, Abalone, prima, poker	AdaBoost, RUSBoost, SMOTEBoost	It has good performance on datasets with high imbalance rate.	The algorithm does not process high-dimensional datasets.
OCSV-US [30]	Genetic Algorithm	Under-sampling	Balance, Car, Cleveland, Wine, New-Thyroid	S-SMOTE, SMOM, MDO	By obtaining the class instances with the highest information values to sample, the time efficiency and classification accuracy are significantly improved.	The algorithm does not handle the case of class overlap.
SCUT [31]	SMOTE/EM Clustering	Hybrid sampling	Ecoli, Yeast, Autos, Wine, Thyroid	SMOTE, CUT, RU	It performs better on datasets with high number of classes and high imbalance, and is able to effectively deal with noise.	The threshold value set for sampling individual class is the average value of all class instances, which does not take into account the class distribution correlation.
FCMSMT [32]	SMOTE/FCM	Hybrid sampling	Ecoli, Yeast, WineQuality	FCM, SCUT, SMOTE	Makes all classes with similar number of instances and can effectively handle high imbalance and class overlap problems.	The improvement of the algorithm is not significant at low imbalance ratio.
CIAR [33]	SMOTE/RUS	Hybrid sampling	AIV, CPA, Electronics, Video Games	SLR+US, SCUT, SLR+SMOTE, SLR+OS, IDBN	The synthesized samples are further divided into subsets to train the base classifier, which improves the time efficiency and classification performance.	The experiment is conducted only for the comment datasets, and no other forms of datasets are considered.
MOSHS [35]	M-SMOTE/ENN	Hybrid sampling	Wine-red, Flare, Pageblocks, Car	Neighbor Based Under-sampling	Overfitting is limited during sampling and class overlap is handled.	Only under-sampling-based algorithms are compared, which does not fully reflect the advantages of the algorithm.
SOUP [37]	Similarity oversampling/Undersampling pre-processing	Hybrid sampling	Car, Wine-quality, Ecoli, Balance, Yeast, Flare	MRBB, SMOTE, Global-CS, OVOROS	The performance of the classifier can be improved by removing the harmful majority class instances and oversampling according to the neighborhood safety level of the instances.	The time complexity of the algorithm is high and the comparison algorithm is too simple.
HAR-MI [40]	MultiRand-Bal	Hybrid sampling	Pageblocks, Ecoli, Balance	DES-MI	Maintains better data diversity and improves the classification performance of the classifier.	The experiments are not performed on datasets with a large number of attributes.
EFIS-MOEA [42]	Multi-objective Evolutionary algorithm	Feature selection	Ecoli, Yeast, Pageblocks, Glass	Global-CS, C4.5, AdaBoost.NC, RandomForest	Simplifies class boundaries by feature selection for easy identification and eliminates noisy instances.	The algorithm has poor time performance for large datasets.
PMC+ [43]	Fireworks/Kappa	Feature selection	Balance, Glass, Yeast, Wine, Ecoli, Thyroid	CSVM-CS, k-NN, C4.5-SMOTE	The use of weighted pattern matching and fireworks feature weight selection leads to classification with high reliability and better performance in terms of time efficiency.	Only using the AUC value as an evaluation metric does not reflect the performance of the algorithm.
RBBag [45]	Fuzzy balance	Feature selection	Vehicle, Car, Ecoli, Yeast, Balance	J4.8, oMRBBag, uMRBBag	Adds decision bounds for rare and unusual instances to ensure that minority class instances are correctly classified, which can effectively address the class overlap problem.	Improves the identification of unsafe minority class instances while worsening the classification of majority class instances.
RSFSAID-M [46]	Rough-set	Feature selection	Yeast, car_df, Vehicle	SYMON	Proposes to select feature datasets based on feature significance, which preserves the original class distribution.	Selecting features in classes that contain noise reduces the ability of the classifier.
FRSA [47]	AWKNN/Feature Clustering	Feature selection	Yeast, Vehicle, Glass, Car	ODP, RSFSAID, SYMON	The algorithm takes into account the correlation and redundancy between features, and is able to select the optimal subset of features to improve the classification accuracy.	The determination of the similarity threshold takes a long time during the sampling process.

3 Multi-class imbalance algorithm-level classification method

Algorithm-level classification methods improve the accuracy of class prediction by optimizing the base classifier or classification model. Currently, algorithm-level classification methods for multi-class imbalance classification can be classified into four categories: ensemble learning, neural network, support vector machine and multi-class decomposition techniques.

3.1 Algorithm-level classification method based on ensemble learning

Ensemble learning is a common approach to imbalanced multi-class classification problems and is usually superior to methods using single classifiers. It trains multiple single classifiers and then combines them, generally using a majority voting mechanism for classification.

3.1.1 Ensemble learning based on hybrid strategy

A hybrid ensemble combines ensemble learning methods with data-level methods: a balanced training set is created for each base learner before that learner is trained, which can improve the performance of the ensemble classifier. The hybrid ensemble learning model is shown in Fig. 3.

Fig. 3

Hybrid ensemble learning model.

Some algorithms combine resampling techniques with ensemble learning methods. Bhowmick et al. [48] proposed the Hybrid Ensemble technique for Classification of Multi-class Imbalanced data (HECMI) to deal with datasets with multiple majority and minority classes. When the model is constructed, the instances of classes whose recall is below the threshold are oversampled and added to the next data partition used for training. The final prediction is obtained by the majority vote of the classifiers in the ensemble. The results show that the method can effectively handle and classify multi-class imbalanced data. However, it performs poorly when the data contain noise and outliers. Purwar et al. [49] proposed the Sampling And Genetic Algorithm Based Ensemble Classifier (SA-GABEC), which attempts to find the subset of the given samples that yields the most accurate predictions. SA-GABEC first applies the genetic algorithm to the dataset and then under-samples the majority class. In addition, different subsets of the data are used in the learning process that generates the classifiers. Finally, the different classifiers are combined to form an ensemble, which ensures the diversity of classifiers.

To explore the effect of combining different sampling techniques and ensemble classifiers on the predictive performance of classification models, Sainin et al. [50] conducted experiments on existing ensemble methods and evaluated two combinations of sampling and ensemble classifiers, namely the resampling ensemble and the SMOTE ensemble. Different base classifiers were selected, and several combinations were constructed, trained and tested on a large multi-class imbalanced benchmark dataset. The experiments demonstrate that the ensemble using random forest outperforms any single classifier.

Unlike previous ensemble learning methods, Mahadevan et al. [33] proposed the Class Imbalance Aware Review (CIAR) method, in which Boosting and Bagging are nested in order to create a robust ensemble structure. To begin with, the training set is balanced by the SMOTE and RUS techniques and is then used to create the base learners of the Bagging ensemble. The CIAR model uses Bagging as the main ensemble: the base learner in the Bagging ensemble is AdaBoost, while the base learner in AdaBoost is a decision tree. As a result, every base learner constructed by the CIAR model is itself a boosted ensemble. In the experimental comparison, the model has the best prediction performance, achieving the highest values of G-mean, F1-score and ROC.

Compared with the hybrid ensemble method based on resampling, applying the threshold moving technique to the ensemble can obtain better classification results. Collell et al. [51] proposed Probability Threshold Bagging (PT-bagging) based on the threshold moving technique. Threshold moving is an alternative approach to imbalanced data that relies on the weights or posterior probabilities of the classes. PT-bagging draws its bootstrap samples so that the natural distribution of classes is preserved. Then, a Bagging ensemble is created, and the threshold is moved when class labels are assigned. In a comparison with resampling-based approaches, PT-bagging outperforms them in terms of macro accuracy and macro F1-score. Alam et al. [52] proposed Partition using Balanced Distribution (PBD), which uses recursive data partitioning to transform a multi-class imbalance problem into multiple balanced problems. The method first specifies a threshold value and recursively partitions the data until the imbalanced data are split into balanced partitions. Then, a classifier is constructed for each data partition, and all the classifiers are combined to construct the ensemble classifier, which uses a voting mechanism to classify the data. In experiments on several datasets, PBD achieves high average accuracy and F-Measure.
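The threshold moving step described for PT-bagging can be illustrated with a minimal sketch. It assumes one common form of threshold moving, in which the averaged posterior of the ensemble is divided by the class prior before the label is chosen; this rule, the function name and all numbers are illustrative assumptions and are not taken from [51].

```python
def threshold_moving_predict(member_probs, class_priors):
    """Average the members' posteriors, rescale by class priors, take the argmax."""
    n_classes = len(class_priors)
    n_members = len(member_probs)
    # Ensemble posterior: mean of the per-member probability vectors.
    avg = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    # Threshold moving (assumed form): dividing by the prior lowers the bar
    # that rare classes must clear.
    scores = [avg[c] / class_priors[c] for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: scores[c])


# Hypothetical posteriors from three bagged members for one test instance.
member_probs = [[0.70, 0.20, 0.10], [0.60, 0.25, 0.15], [0.65, 0.15, 0.20]]
class_priors = [0.80, 0.15, 0.05]  # assumed training-set class frequencies

# Plain argmax of the summed posteriors, without threshold moving.
plain = max(range(3), key=lambda c: sum(p[c] for p in member_probs))
moved = threshold_moving_predict(member_probs, class_priors)
print(plain, moved)
```

In this toy case the plain argmax selects the majority class (index 0), whereas the prior-scaled decision selects the rarest class (index 2), because its averaged posterior is large relative to its frequency.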

Several researchers have combined evolutionary algorithms with ensembles. Ensemble of classifiers based on Multi-Objective genetic Sampling for Imbalanced Classification (E-MOSAIC) [53] can eliminate the risk of minority classes in the dataset receiving less attention. The method uses a multi-objective evolutionary algorithm to derive a set of classifiers from an imbalanced dataset and evolves balanced samples from the original data guided by the classification accuracy, thus producing classifiers with high prediction accuracy and diversity for each class.

The heterogeneous ensemble is a powerful and complex ensemble model that can effectively handle multi-class imbalanced data. The Ensemble Filter Selection Method (EFSM) [54] is mainly used for outlier detection and handling imbalance. In the preprocessing stage, the global outliers are filtered and the dataset is resampled using SMOTE, and then the multi-class dataset is binarized by a decomposition technique. In the model construction stage, a heterogeneous ensemble model is constructed using AdaBoost, the random subspace algorithm and random forest as the base classifiers. Finally, the constructed classifiers are combined according to the probabilistic average voting rule and evaluated using 10-fold cross-validation. Experiments show that the model ensures the completeness of majority classes and performs better in terms of outlier detection and classification accuracy. Arumugam et al. [55] proposed the Neighborhood based Adaptive Heterogeneous Oversampling Ensemble Classifier (NAHOEC). During data preprocessing, the five nearest neighbors of each minority class instance are found using k-NN, and these instances are oversampled based on a random number and the total number of instances in the current minority class, thus generating a balanced dataset for training. Then, N training datasets are constructed in the ensemble construction process and a list of base classifiers is built on each dataset. Finally, the base classifiers are evaluated on a test dataset, and K adaptive classifiers are selected to build the ensemble.

To address the problems of imbalance and concept drift in data streams, Vafaie et al. [56] proposed the Improved SMOTE Online Ensemble (ISOE) and the Improved Online Ensemble (IOE), which can dynamically balance the training set. ISOE uses a sliding window of fixed size to process the data instances and sets the Poisson distribution rate parameter based on the recall as the threshold. If the class recall is higher than the threshold, the window will be oversampled by SMOTE. Finally, the sampled data is used to train the online ensemble. IOE retains the rate parameter but eliminates SMOTE, and minority classes are oversampled by recall-based class weights. Experimental results demonstrate that IOE performs better than ISOE on G-mean and produces accurate results on both static and evolving data streams. The Poisson distribution rate parameters of ISOE and IOE are shown in Equations (2) and (3), respectively, where r_c is the recall of class c, r_avg-excluding-c is the average recall of the classes excluding class c, W_c is the weight of class c and max(W) is the maximum weight over all classes. $k = Poisson(r_{avg\text{-}excluding\text{-}c} / r_{c})$ (2) $k = Poisson((r_{avg\text{-}excluding\text{-}c} / r_{c}) \cdot (\max(W) / W_{c}))$ (3)
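The rate parameters in Equations (2) and (3) can be computed as in the following minimal sketch. The recall and weight values are hypothetical, the function names are introduced here for illustration, and the Poisson draw uses Knuth's multiplication method rather than any sampler named in [56].

```python
import math
import random


def isoe_rate(recalls, c):
    """Equation (2): mean recall of the other classes divided by the recall of c."""
    others = [r for label, r in recalls.items() if label != c]
    return (sum(others) / len(others)) / recalls[c]  # assumes recalls[c] > 0


def ioe_rate(recalls, weights, c):
    """Equation (3): the Equation (2) rate scaled by max(W) / W_c."""
    return isoe_rate(recalls, c) * (max(weights.values()) / weights[c])


def poisson_sample(lam, rng):
    """Draw k ~ Poisson(lam) using Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Hypothetical running recalls and class weights for a three-class stream.
recalls = {"a": 0.9, "b": 0.8, "c": 0.4}
weights = {"a": 1.0, "b": 1.0, "c": 2.0}
rng = random.Random(0)

for label in recalls:
    lam = isoe_rate(recalls, label)
    k = poisson_sample(lam, rng)  # number of times the instance would be used
    print(label, round(lam, 3), round(ioe_rate(recalls, weights, label), 3), k)
```

With these values, class c has the lowest recall and therefore the largest Equation (2) rate (2.125), so its instances would on average be presented to the ensemble more often than those of classes a and b.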

3.1.2 Ensemble learning based on dynamic selection

At present, most existing research focuses on static ensembles. In recent years, the application of dynamic selection techniques to multi-class imbalance classification problems has received attention. The technique includes a dynamic selection module that selects a set of base classifiers from a pool for a given test instance, thus constructing the best ensemble for that instance. The dynamic selection ensemble can achieve higher performance than the static ensemble on the imbalanced classification problem. The ensemble method for dynamic selection is shown in Fig. 4.

Fig. 4

Dynamic selection of ensemble method.

Roy et al. [57] and Cruz et al. [58] proposed combining dynamic selection techniques with multiple data preprocessing methods to handle multiple types of imbalanced data. An experimental analysis is performed with Bagging, and a comparison is made with static ensembles on datasets with different degrees of imbalance. The results show that applying dynamic selection together with data preprocessing improves the performance of the ensemble classifier on minority classes and outperforms the static ensemble approaches.

The combination of dynamic selection with static ensembles can lead to a better combination of classifiers. Zhao et al. [59] proposed combining a dynamic selection strategy with the currently popular multi-class imbalanced static ensemble methods and experimented with 14 static ensembles. The dynamic selection strategy is applied in the training process, and the weight of each neighbor sample is calculated based on the k-NN of the test sample x_i. Then the classification ability of each base classifier h on the neighboring samples of x_i is calculated, and the classifiers are ranked by this ability value, so that the top N classifiers for predicting x_i are selected and added to the ensemble. Finally, the classifiers in the ensemble classify x_i by voting. Experiments show that the MAvA and F-measure of the static ensembles are significantly improved by dynamic selection and that the desired classification performance can be achieved. The classification ability is calculated by Equation (4), with the indicator function defined in Equation (5), where x_it is the t-th neighbor instance of x_i, w_it is the weight of x_it and y_t is the true label of x_it. $F_{h | x_{i}} = \sum_{t = 1}^{k} I(h(x_{it}) = y_{t}) \times w_{it}$ (4) $I(h(x_{it}) = y_{t}) = \begin{cases} 0, & h(x_{it}) \neq y_{t} \\ 1, & h(x_{it}) = y_{t} \end{cases}$ (5)
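The selection procedure built on Equations (4) and (5) can be sketched as follows. This is a minimal illustration, not the implementation of [59]: the pool of threshold classifiers, the neighbor weights and the function names are all hypothetical, and the neighbors are assumed to have been found by k-NN beforehand.

```python
from collections import Counter


def competence(h, neighbors, weights, labels):
    """Equation (4): sum of w_it over the neighbors x_it that h labels correctly."""
    return sum(w for x, w, y in zip(neighbors, weights, labels) if h(x) == y)


def select_and_vote(pool, neighbors, weights, labels, x, top_n):
    """Rank the pool by competence, keep the top_n classifiers, and vote on x."""
    ranked = sorted(pool, key=lambda h: competence(h, neighbors, weights, labels),
                    reverse=True)
    votes = Counter(h(x) for h in ranked[:top_n])
    return votes.most_common(1)[0][0]


# Hypothetical pool of threshold classifiers on a 1-D feature.
pool = [
    lambda x: 0 if x < 0.50 else 1,
    lambda x: 1,
    lambda x: 0 if x < 0.20 else 1,
    lambda x: 0 if x < 0.45 else 1,
]
neighbors = [0.10, 0.30, 0.40]   # k = 3 nearest neighbors of the test sample
weights = [0.5, 0.3, 0.2]        # assumed neighbor weights w_it
labels = [0, 0, 0]               # true labels y_t of the neighbors

scores = [competence(h, neighbors, weights, labels) for h in pool]
prediction = select_and_vote(pool, neighbors, weights, labels, x=0.35, top_n=3)
print(scores, prediction)
```

Here the classifier that always predicts class 1 receives zero competence in the neighborhood and is excluded, so the three remaining classifiers vote and the test sample is assigned to class 0.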

The Dynamic Ensemble Selection for Multi-class Imbalanced datasets (DESMI) [60] method handles multi-class imbalanced data in two stages. In the first stage, a preprocessor is developed that mixes random under-sampling, random oversampling and SMOTE to balance the training set. In the second stage, the ensemble is constructed with a weighted voting approach, which evaluates the ability of candidate classifiers based on the weighted instances in the neighborhood. Then, a set of classifiers with strong classification ability for minority class instances is selected to construct the ensemble, and the ensemble outputs the final classification results by voting. The results show that DESMI can effectively process and classify multi-class imbalanced data, but the method has a high time complexity.

3.2 Algorithm-level classification method based on neural network

In the field of machine learning, various algorithms have been proposed for classification problems. Nevertheless, existing classifiers may not be able to adapt to the complex data environment of imbalanced multi-class classification tasks. Because neural networks are robust and fault tolerant, some researchers have applied them to multi-class imbalanced classification scenarios. Recently, extreme learning machines and deep learning methods have attracted increasing attention from researchers.

3.2.1 Extreme learning machine classification method

Extreme Learning Machine (ELM) is an efficient algorithm proposed by Huang et al. [61]. Unlike traditional machine learning algorithms such as the BP-based neural network or the Support Vector Machine (SVM), the learning parameters of the hidden layer of ELM are randomly generated and the output weights can be calculated by the least squares method. In addition, ELM is easy to implement and has better generalization performance and faster learning speed. Figure 5 shows the structure of ELM.

Fig. 5

Structure of extreme learning machine.
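As a brief worked form of the description above, the least squares step can be written under the standard single-hidden-layer formulation. The symbols w_j and b_j (randomly generated input weights and biases of the j-th hidden node), g (activation function), L (number of hidden nodes), N (number of training samples) and β (output weights) are assumptions introduced here for illustration; H and T denote the hidden layer output matrix and the training target matrix.

$H = \begin{bmatrix} g(w_{1} \cdot x_{1} + b_{1}) & \cdots & g(w_{L} \cdot x_{1} + b_{L}) \\ \vdots & \ddots & \vdots \\ g(w_{1} \cdot x_{N} + b_{1}) & \cdots & g(w_{L} \cdot x_{N} + b_{L}) \end{bmatrix}, \qquad \hat{\beta} = \arg\min_{\beta} \lVert H \beta - T \rVert^{2} = H^{\dagger} T$

Since w_j and b_j are fixed after random generation, only β is learned, and H^† T (with H^† the Moore-Penrose pseudo-inverse of H) gives the minimum-norm least squares solution in a single step, without iterative back-propagation.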

To improve the learning performance of the classical ELM algorithm for multi-class imbalanced data, several researchers have improved ELM and combined it with other advanced techniques. The G-mean and Probability ELM (GPELM) [62] algorithm uses the probability of training samples in each class to calculate the G-mean. In addition, the probability that the training samples belong to each class is introduced in the design of the cost function in order to maintain the initial data distribution. Then, an ELM parameter optimization problem is constructed to minimize the 2-norm of the weight matrix and the G-mean-based cost function calculated by the probability function. The algorithm has the best performance on G-mean.

The kernelized ELM has better learning results compared to the traditional ELM using random input parameters. Li et al. [63] proposed the Parallel one-class ELM (P-ELM). In P-ELM, the training dataset is first divided into k subsets according to the number of classes. Then, the divided training dataset is fed into the corresponding k kernel-based one-class ELM classifiers, and the one-class classifiers operate in parallel. The output function of the P-ELM classifier constructs an estimate of the probability density. Therefore, the class of a sample can be directly determined by comparing the output function values of the kernel-based one-class ELM classifiers. After analysis and validation, P-ELM has good classification accuracy and time efficiency. Generalized Class-Specific Kernelized ELM (GCSKELM) [64] maps the data to the kernel space by applying Gaussian kernel functions, which avoids the non-optimal hidden node problem and reduces the computational effort that existing ELM methods require. Meanwhile, the algorithm uses class-specific regularization parameters determined by class proportions to improve the generalization performance. Class-specific Cost Regulation ELM (CCR-ELM) [65] introduces a class-specific regulation cost for misclassification as a trade-off between structural and empirical risk. The optimal combination of all output function parameters is obtained by grid search, which reduces the effect of the number of class samples and the degree of data dispersion. In addition, the algorithm introduces a kernel function matrix for dealing with the case of class overlap. Experiments demonstrate that CCR-ELM can significantly improve the classification performance and is applicable to both multi-class and two-class imbalance classification scenarios. The classification formula of CCR-ELM is shown in Equation (6), where H is the hidden layer output matrix, T is the training data target matrix, I is the identity matrix, h(x) is the output vector of the hidden layer for the input x, C^d is the regulation cost of the d-th class and Ω_ELM = H H^T is the kernel matrix of ELM. $\begin{aligned} f_{M}(x) &= h(x) H^{T} \left(\sum_{d = 1}^{D} \frac{I}{C^{d}} + H H^{T}\right)^{\dagger} T \\ &= \begin{bmatrix} K(x, x_{1}) \\ \vdots \\ K(x, x_{N}) \end{bmatrix}^{T} \left(\sum_{d = 1}^{D} \frac{I}{C^{d}} + \Omega_{ELM}\right)^{\dagger} T \end{aligned}$ (6)

Learning on multi-class imbalanced data streams is also worthy of attention, and several researchers have applied online learning to ELM. Mirza et al. [66] first proposed a sequential classifier for multi-class imbalanced data streams, called Voting based Weighted Online Sequential ELM (VWOS-ELM). This method extends the weight matrix of WOS-ELM [67] to the multi-class case and constructs several independent WOS-ELM-based networks to accommodate the constant arrival of new data. VWOS-ELM can tackle the class imbalance problem in one-by-one and block-by-block modes without storing the previously learned samples. Weighted Online Sequential ELM with Kernels (WOS-ELMK) [68] also improves WOS-ELM by using implicit kernel mapping instead of random feature mapping. With the use of kernel mapping, it is possible to adapt to some random initialization of new data and maintain the stability of the classifier even if only a single classifier is used. In addition, WOS-ELMK implements a fixed memory scheme to reduce the computational load on large imbalanced data streams. Yu et al. [69] proposed Generative WOS-ELM (GWOS-ELM), which uses a two-stage game strategy consisting of a data generation stage and a model update stage. In the data generation stage, minority class samples are generated using two dynamic least squares with a game strategy to balance the class distribution. In the model update stage, the classification model is updated according to the current prediction performance and cost sensitivity. The method establishes the relationship between the new weights and individual classifiers based on the changing imbalance ratios. These strategies help to reduce fitting errors. Experiments show that GWOS-ELM can effectively predict changing data streams and improve the generalization performance of classifiers for online prediction. Post-Boosting using extended G-mean (PBG) [70] is a novel learning method that addresses the challenge of sequentially arriving multi-class imbalanced data by post-adjusting the classification boundaries under the extended G-mean. Moreover, by maximizing the extended G-mean, PBG can dynamically focus more attention on the classes that are prone to misclassification.

Multi-class classification on highly imbalanced datasets is more difficult and requires consideration of both classifier accuracy and training efficiency. Vong et al. [71] designed the Sequential Ensemble Learning (SEL) framework on ELM to address these problems simultaneously. This framework improves the accuracy of classifiers on highly imbalanced datasets by dividing the samples of the majority class into multiple small, disjoint subsets for training weak classifiers. The experimental results indicate that SEL is suitable for scenarios that require short training time and high classification accuracy.

3.2.2 Deep learning classification method

Recently, researchers have proposed many deep learning-based methods for classifying multi-class imbalanced data, because deep learning offers powerful performance and the ability to handle complex data.

AdaBoost-CNN (AdaBoost-Convolutional Neural Network) [72] combines CNN with the AdaBoost ensemble in order to maintain high classification accuracy on large datasets. In AdaBoost-CNN, weights are assigned to each training sample based on the learning ability of the weak learner on the sample. Then, AdaBoost-CNN uses a transfer learning strategy during training: the knowledge gained in training one CNN estimator is transferred to the next CNN estimator, and the weights of the training samples are updated, which reduces the computational effort of the CNN. Experiments on multiple types of datasets show that AdaBoost-CNN reduces the time complexity of the deep learning algorithm and can effectively classify large datasets and highly imbalanced datasets.

For medical image analysis, Yuan et al. [73] proposed a Regularized Ensemble Framework of Deep Learning (REFDL) for cancer detection. The algorithm builds the ensemble with AdaBoost.M1 and uses a Deep Neural Network (DNN) as the base classifier. A regularization parameter is introduced during training to correct the classifier error. In addition, the algorithm uses a weighted hierarchical sampling technique to sample each class according to the data distribution, thus balancing the dataset. The experiments demonstrate that REFDL is able to handle the multi-class imbalance problem on high-dimensional image datasets and train a classifier with stable performance.

The multi-class imbalance problem also often arises in the field of Hyperspectral Image (HSI) classification. Lv et al. [74] combined ensemble classification with deep learning and proposed the Enhanced random feature subspace based Ensemble CNN (EECNN). Firstly, the classes are sorted in descending order of their number of instances and the dataset is randomly oversampled at the oversampling rate, thus obtaining a balanced dataset. The oversampling rate is shown in Equation (7), where N_l is the number of instances of the l-th class and N_1 is the number of instances of the largest class. Then, a subset of features is extracted by random feature selection to train T CNN classifiers to build the ensemble model. Finally, the final classification results are obtained by majority voting. The experiments show that the model has better performance and robustness compared with other traditional algorithms. $\alpha\% = \frac{N_{1} - N_{l}}{N_{1}} \cdot 100\%$ (7)
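Equation (7) can be evaluated per class as in the following minimal sketch. The class names and counts are hypothetical, and the sketch only computes the rates; how EECNN turns a rate into a number of duplicated instances is not specified above and is not assumed here.

```python
def oversampling_rates(class_counts):
    """Equation (7): alpha_l % = (N_1 - N_l) / N_1 * 100, with N_1 the largest class."""
    n1 = max(class_counts.values())
    # Sort classes by size in descending order, as described in the text.
    ordered = sorted(class_counts.items(), key=lambda item: item[1], reverse=True)
    return {label: 100 * (n1 - n) / n1 for label, n in ordered}


# Hypothetical per-class pixel counts for a three-class scene.
class_counts = {"grass": 200, "soil": 500, "water": 50}
rates = oversampling_rates(class_counts)
print(rates)
```

The largest class receives a rate of 0%, and the rate grows toward 100% as a class becomes smaller relative to the largest one.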

For imbalanced multi-class text classification, Tong et al. [75] proposed a multi-model based deep learning framework, DistilBERT BI-LSTM Predictor (DBLP), which is mainly used to classify short text datasets. Firstly, DistilBERT is applied at the encoder layer to obtain sensitive dynamic word embeddings as the input of the BI-LSTM. Then, the hidden key features are extracted from the text by the BI-LSTM network and stored in the feature matrix to improve the classification performance. In addition, a max-pooling layer is built to reduce the dimensionality of the feature matrix. Finally, the obtained feature matrix is used as the input of the softmax layer and normalized to obtain the final classification results. Experiments show that the model maintains state-of-the-art performance on the short text multi-class imbalance classification problem and has lower time complexity.

3.3 Algorithm-level classification method based on support vector machine

SVM [76] is a machine learning method based on statistical learning theory. The main challenges of SVM are how to select kernels, improve accuracy, increase speed and correctly set the values of key parameters during training and testing to obtain the best generalization performance [77].

Datasets often contain noisy samples that affect the classification accuracy. To address this problem, Wu et al. [78] proposed Fuzzy SVM (FSVM). The algorithm uses the distance from the training samples to the class centers and the weighted class overlap method to design the fuzzy membership function of the samples, and assigns each sample a membership value according to its importance, which means increasing the weights of the support vectors and decreasing the weights of the noise. Meanwhile, the improved class overlap degree method is used to distinguish the support vectors that play a decisive role in hyperplane classification and assign them a higher membership value. Experimental results show that the algorithm can solve the imbalance and noise problems in multi-class data more effectively.

Abdalazie et al. [79] proposed a hierarchical classification model based on Multi-class SVM in order to identify minority or rare instances more accurately and assign them to a minority class. The model uses a grouping algorithm to generate new balanced synthetic samples from the original imbalanced classes, which are classified by a hierarchical step. In addition, the model is evaluated with and without class weights, and the class weights are calculated as shown in Equation (8). The results show that the model performs best in terms of G-mean when weights are given to the class instances. The Multi-class SVM hierarchical model structure is shown in Fig. 6. $W_{c_{i}} = \frac{\text{total samples}}{\text{number of classes} \times \text{samples of } c_{i}}$ (8)

Fig. 6

Multi-class SVM hierarchical model structure.
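The class weights of Equation (8) can be computed directly from the label counts, as in the following minimal sketch. The labels are hypothetical and the function name is introduced here for illustration.

```python
from collections import Counter


def class_weights(labels):
    """Equation (8): W_ci = total samples / (number of classes * samples of c_i)."""
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}


# Hypothetical label vector with a 6:3:1 class ratio.
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
w = class_weights(labels)
print({c: round(v, 3) for c, v in w.items()})
```

Under this formula a class that holds exactly 1/m of the data (for m classes) gets weight 1, and rarer classes get proportionally larger weights; in the toy case the rarest class is weighted six times as heavily as the largest one.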

The SMOTE-Least Square SVM (SMOTE-LSSVM) [80] applies an intelligent optimization algorithm to the parameter optimization problem and builds a classifier to deal with the multi-class imbalance problem by using SMOTE and least square SVM. The method first decomposes the multi-class problem and then uses SMOTE to balance the data. Finally, the parameters of the LSSVM classifier are optimized by the particle swarm optimization and gravitational search algorithms, combining the global search capability of the former with the local search capability of the latter to improve the performance of the classifier. After an in-depth analysis of the effects of class imbalance and class overlap in traditional learning models, Devi et al. [81] proposed the One-class SVM and Under-Sampling technique (OSVM-US). The model first uses a one-class SVM to detect overlapping instances as outliers. Then, under-sampling of majority class instances is performed by Tomek-link pairs, and the boundary, redundancy and overlap cases are eliminated based on sparse neighborhood. Finally, the refined training set is fed to the final stage of learning to train three classifiers, whose performance is evaluated. The experimental results demonstrate that the model improves the classification accuracy of the minority class. Moreover, only instances of the largest majority class are eliminated, thus ensuring the completeness of the other majority classes. However, the performance of the classifiers in this method decreases as the class overlap rate increases. Mehmood et al. [82] also investigated the class overlap problem by proposing Modified SVM with AdaBoost (MSVM-AdB). In the data processing stage, the overlapping and non-overlapping regions of the multi-class dataset are separated using the Euclidean distance formula. Then, regions with dense overlapping samples are mapped to higher dimensions by a kernel mapping function based on a customized standard support vector machine, which helps the base classifier find the optimal hyperplane, thus predicting the minority class samples and improving the final classification accuracy.

3.4 Algorithm-level classification method based on multi-class decomposition technique

One strategy for handling multi-class imbalance problems is to decompose the multi-class problem into two-class sub-problems using division rules and then apply two-class imbalance learning algorithms to these sub-problems.

The main decomposition methods currently are One Vs One (OVO) decomposition and One Vs All (OVA) decomposition. In the OVO decomposition, an m-class problem is divided into $m(m - 1)/2$ two-class sub-problems, each of which is handled by an independent base classifier responsible for one pair of classes. OVA creates one classifier for each class, which treats all other classes as a single class when classifying. Figure 7 (a) and (b) show the two ways to decompose the multi-class problem.

Fig. 7

Multi-class decomposition methods.
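
The two decomposition schemes can be illustrated with a short sketch. It is only an illustration of how the sub-problems are formed, not code from any of the surveyed methods; the function names and the toy label list are assumptions made for the example.

```python
from itertools import combinations


def ovo_subproblems(labels):
    """Yield (class_a, class_b, indices) for every unordered pair of classes.

    `indices` lists the positions of the instances that belong to either class,
    i.e. the training subset of that pairwise (OVO) sub-problem.
    """
    classes = sorted(set(labels))
    for a, b in combinations(classes, 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        yield a, b, idx


def ova_subproblems(labels):
    """Yield (positive_class, binary_labels) with 1 for the class and 0 for all others."""
    for c in sorted(set(labels)):
        yield c, [1 if y == c else 0 for y in labels]


# Toy imbalanced label vector with m = 3 classes (hypothetical data).
labels = ["A", "A", "A", "A", "B", "B", "C"]
ovo = list(ovo_subproblems(labels))
ova = list(ova_subproblems(labels))
m = len(set(labels))
print(len(ovo) == m * (m - 1) // 2)  # OVO builds m(m-1)/2 sub-problems
print(len(ova) == m)                 # OVA builds one sub-problem per class
```

Note that the OVA sub-problem for class C contains one positive and six negative instances, which is more imbalanced than any of the original pairwise comparisons.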

Zhang et al. [83] explored the application of OVO and OVA decomposition techniques to the multi-class imbalance classification problem and combined them with a two-class ensemble learning approach. First, the dataset is decomposed using the OVO or OVA technique. Then, a SMOTE-based ensemble learning approach is used to synthesize minority class instances to balance the distribution of the training set and create a two-class classifier for each sub-problem. Finally, once the classifiers have been obtained from the ensemble learning, an aggregation strategy is used to provide the final output from the score matrix. The experimental results show that the proposed OVO decomposition strategy combined with two-class ensemble learning obtains very competitive results. The use of composite learners for each pair of classes better captures the local features of the classes, thus improving the classification accuracy. In addition, Rodríguez et al. [39] argued that the Random Balance method could be extended to multi-class problems by using OVO or OVA decomposition techniques, and proposed OVO-Random Balance (OVO-RandBal) and OVA-Random Balance (OVA-RandBal). In OVO-RandBal, all pairs of classes are formed and a classifier is built for each pair. The ensemble consists of $c(c - 1)/2$ classifiers, where c is the number of classes, and each classifier votes for one of the two classes it was trained on. The final classification result is obtained by majority voting. OVA-RandBal creates c two-class classifiers, each of which separates one class from all remaining classes. In the experimental results, OVA is more advantageous than OVO in the context of Random Balance, especially when the evaluation metric is MAUC.
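
The majority-voting aggregation used by OVO ensembles can be sketched as follows. This is a minimal illustration, not the implementation from [39]; the function name, the dictionary layout and the tie-breaking rule (the smallest label wins) are assumptions made for the example.

```python
from collections import Counter


def ovo_majority_vote(pairwise_predictions):
    """Aggregate the outputs of pairwise classifiers by majority voting.

    `pairwise_predictions` maps a class pair (a, b) to the class that the
    classifier trained on that pair predicted for one test instance.
    Ties are broken by taking the smallest label (an assumption of this sketch).
    """
    votes = Counter(pairwise_predictions.values())
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


# One test instance, three classes, hence 3 * 2 / 2 = 3 pairwise classifiers.
predictions = {("A", "B"): "A", ("A", "C"): "C", ("B", "C"): "C"}
winner = ovo_majority_vote(predictions)  # "C" receives two of the three votes
```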

To solve the synergistic problem between imbalance learning and dynamic classifier weighting in OVO, Zhang et al. [84] proposed Distance-based Relative Competence Weighting with Adaptive Synthetic Example Generation (DRCW-ASEG). The method decomposes the original multi-class imbalanced dataset according to the OVO strategy and then generates synthetic instances based on the neighborhood of minority class instances in the dynamic weighting stage to deal with the imbalanced distribution of classes, which improves the classifier's capability. In addition, the method uses the Heterogeneous Value Difference Metric (HVDM) to calculate the distance between two instances and uses it as the weight of the classifier. The results show that DRCW-ASEG considerably improves classification performance and outperforms previous methods on most of the datasets. The HVDM is shown in Equation (9), where x and y are the input instances, f is the number of attributes, and $d_{a}(x, y)$ is the distance between x and y for attribute a. $d_{HVDM}(x, y) = \sqrt{\sum_{a = 1}^{f} d_{a}^{2}(x, y)}$ (9)
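
Equation (9) can be evaluated with a few lines of code once the per-attribute distances $d_a$ are fixed. The sketch below is an assumption-laden illustration: the text does not define $d_a$, so numeric attributes use the absolute difference scaled by four standard deviations (a common choice for HVDM), and nominal attributes use a simple 0/1 overlap distance as a stand-in for the value difference metric. All function names are hypothetical.

```python
import math
import statistics


def hvdm(x, y, attr_distances):
    """Equation (9): square root of the sum of squared per-attribute distances."""
    return math.sqrt(sum(d(xa, ya) ** 2 for d, xa, ya in zip(attr_distances, x, y)))


def numeric_distance(column):
    """Build d_a for a numeric attribute: |x - y| / (4 * sigma_a) (assumed form)."""
    sigma = statistics.pstdev(column)

    def d(xa, ya):
        return abs(xa - ya) / (4 * sigma) if sigma > 0 else 0.0

    return d


def overlap_distance(xa, ya):
    """Simplified d_a for a nominal attribute: 0 if equal, 1 otherwise (stand-in)."""
    return 0.0 if xa == ya else 1.0


# Hypothetical data with one numeric and one nominal attribute.
data = [(1.0, "red"), (3.0, "blue"), (5.0, "red")]
distances = [numeric_distance([row[0] for row in data]), overlap_distance]
d01 = hvdm(data[0], data[1], distances)
```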

OVA decomposes a multi-class imbalanced problem by treating one class as positive and all other classes as negative, which can lead to extremely imbalanced sub-problems [85]. As a result, OVA may reduce the identification rate of minority class instances to some extent. However, some researchers have still been able to obtain good classification results by applying OVA decomposition.

Differential Partition Sampling Ensemble (DPSE) [86] splits the multi-class dataset into multiple binary datasets by OVA, and the numbers of majority and minority samples in each binary subset are used as the upper and lower bounds of sampling, respectively. Within this range, DPSE constructs an arithmetic progression to generate a collection of sets with different, equally spaced numbers of samples. In addition, DPSE handles safe samples with ROS, while edge and rare samples are handled with SMOTE. Then, a binary classification model is trained using the balanced training set. Experimental results show that this method performs better than other typical imbalanced learning methods in the OVA scheme.
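
The equally spaced sampling sizes described above can be sketched as an arithmetic progression between the minority and majority counts of one OVA sub-problem. This is only an illustration of the idea, not the DPSE implementation from [86]; the function name and the parameter k (the number of target sizes) are assumptions.

```python
def sampling_sizes(n_minority, n_majority, k):
    """Return k equally spaced target sizes from n_minority up to n_majority."""
    if k < 2:
        raise ValueError("k must be at least 2 to span both bounds")
    step = (n_majority - n_minority) / (k - 1)  # common difference of the progression
    return [round(n_minority + i * step) for i in range(k)]


# Ecoli OVA sub-problem for the class with 35 instances: 35 positives vs. 301 negatives.
sizes = sampling_sizes(35, 301, 8)
```

Each size in the list would then serve as the target number of samples per class for one member of the ensemble.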

Dong et al. [87] proposed the One-Against-All-based Hellinger Distance (OAHD). To begin with, the OVA scheme is introduced into the Hellinger distance calculation process to decompose and balance the dataset. Next, a modified Gini coefficient is designed to handle the distribution and the number of instances of different classes simultaneously, thus ensuring the purity of decision tree nodes. The experimental results show that OAHD significantly improves accuracy, MAUC and other metrics compared with other decision trees.

The OVA decomposition-based approach can handle multi-class imbalanced data streams as well. To solve the uncertainty problem of learning in imbalanced data streams, Mohammed et al. [88] combined the OVA decomposition strategy with ensemble learning and proposed One-Vs-All Adaptive Window Re-Balancing with retain Knowledge (OVA-AWBReK). The method first quickly processes the received data streams and decomposes them using OVA. Then, an incremental rebalancing method is used to train the classifier, which adaptively passes the previously learned knowledge to the subsequent windows as increments. In addition, an adaptive window is designed to dynamically adjust the window size according to the imbalance ratio, thus reducing the uncertainty in the learning process of imbalanced data streams. Experiments demonstrate that the method performs better on multi-class datasets with a high imbalance ratio.

3.5 Chapter summary

This chapter introduces algorithm-level classification methods for multi-class imbalanced data in terms of four aspects: ensemble learning, neural network, SVM and multi-class decomposition techniques.

To further analyze the performance and efficiency of multi-class imbalance algorithm-level classification methods, algorithms using the same dataset are compared in this chapter. The parameters of the datasets are listed in Table 4. The 13 algorithms using these four datasets are listed in Table 5.

Table 4
Dataset parameters

Datasets	Attribute	Class	Instance	Class distribution
Ecoli	7	8	336	143, 77, 52, 35, 20, 5, 2, 2
Yeast	8	10	1484	244, 429, 463, 44, 51, 163, 35, 30, 20, 5
New-thyroid	5	3	215	150, 35, 30
Wine-Quality Red	11	6	1599	10, 53, 681, 638, 199, 18
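
The degree of imbalance in these datasets can be summarized from the class distributions in Table 4. The sketch below computes the ratio between the largest and the smallest class, a commonly used summary that is not itself reported in the table; the function name and the dictionary layout are assumptions made for the example.

```python
# Instance counts and class distributions copied from Table 4.
table4 = {
    "Ecoli": (336, [143, 77, 52, 35, 20, 5, 2, 2]),
    "Yeast": (1484, [244, 429, 463, 44, 51, 163, 35, 30, 20, 5]),
    "New-thyroid": (215, [150, 35, 30]),
    "Wine-Quality Red": (1599, [10, 53, 681, 638, 199, 18]),
}


def imbalance_ratio(counts):
    """Ratio of the largest class size to the smallest class size."""
    return max(counts) / min(counts)


ratios = {}
for name, (n_instances, counts) in table4.items():
    # Sanity check: the class distribution must add up to the instance count.
    assert sum(counts) == n_instances
    ratios[name] = imbalance_ratio(counts)
    print(f"{name}: classes={len(counts)}, imbalance ratio={ratios[name]:.1f}")
```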

Table 5

Algorithms using the same datasets

Algorithm	Method	Ecoli	Yeast	New-thyroid	Wine-Quality Red
E-MOSAIC	Hybrid Ensemble	√		√
PT-Bagging	Hybrid Ensemble	√	√	√
DESMI	Dynamic Selection	√		√	√
GPELM	ELM	√	√	√
WOS-ELMK	ELM	√	√
GCSKELM	ELM	√	√	√	√
PBG	ELM	√	√	√
FSVM	SVM	√
Multi-class SVM	SVM	√	√	√
MSVM-AdB	SVM	√	√	√
DRCW-ASEG	OVO	√	√	√	√
OSC	OVO	√		√
DPSE	OVA	√	√	√	√

On the Ecoli dataset, E-MOSAIC with a Multi-layer Perceptron as the base classifier has the best performance, achieving 96.1 and 82.22 in MAUC and G-mean, respectively. E-MOSAIC generally outperforms DRCW-ASEG, OSC and the ELM methods, because it identifies minority classes effectively and ensures the diversity of classifiers during training. PT-Bagging was evaluated on Ecoli, Yeast and New-thyroid using AUC, macro-accuracy and macro-F1-score. PT-Bagging improves the overall performance of the classifier by preserving the natural distribution of classes. On the above four datasets, DESMI, DRCW-ASEG and DPSE performed similarly and achieved good overall results on MAvA, as all of them use CART decision trees as base classifiers. However, the time complexity of DRCW-ASEG and DPSE is higher than that of DESMI because they require decomposing the multi-class problem. Among the ELM-based classification methods, WOS-ELMK has the best performance and outperforms the other ELM methods on G-mean, because it uses implicit kernel mapping, which improves the adaptability of the classifier to new instances. It is worth mentioning that none of the current methods can effectively classify the Yeast and Wine-Quality Red datasets, which have a large number of instances and multiple minority classes. The G-mean values of these methods are below 65 on Yeast, and both the G-mean and AveAcc values are below 43 on Wine-Quality Red.

By comparing and analyzing the above types of multi-class imbalance algorithm-level classification methods, the following conclusions can be drawn. In the hybrid ensemble, combining a data-level method with ensemble learning improves the performance of the ensemble classifier on multi-class imbalanced data. In addition, by applying dynamic selection in ensemble learning, the best combination of classifiers can be selected during the training process, thereby improving the classification accuracy. Among the neural network-based methods, the ELM is easy to implement, and the weights of its hidden layer nodes can be assigned randomly or manually, which gives it better generalization performance and faster learning speed. Meanwhile, ELM achieves better results on extremely imbalanced datasets and multi-class imbalanced data streams. Deep learning classification methods are more suitable for complex data types such as images and text. SVM shows excellent performance in solving problems such as class overlap and noise in multi-class imbalance. In addition, SVM can effectively mitigate overfitting in the training process, which improves the stability and performance of the learning algorithm. The multi-class decomposition technique divides the multi-class problem into two-class sub-problems, thus allowing the use of the more advanced classification methods currently available for the two-class imbalance problem. Multi-class decomposition reduces the complexity of the multi-class imbalance problem, which makes it widely applicable. However, it leads to worse time performance. Table 6 summarizes and analyzes the multi-class imbalance algorithm-level classification methods introduced in this chapter.

Table 6

Multi-class imbalance algorithm-level classification methods

Algorithm	Technique	Method	Datasets	Compared algorithm	Advantages	Disadvantages
HECMI [48]	Boosting/Recall sampling	Hybrid Ensemble	Ecoli, Yeast, Glass, Balance, Thyroid	LR, LDA, k-NN, SVM	The constructed ensemble classifier has good generalization performance and can effectively handle and classify multi-class imbalanced data.	The performance is poor in datasets with noise and outlier.
SA-GABEC [49]	Genetic Algorithm	Hybrid Ensemble	Dermatology, Satimage	GAB-EPA, Adaboost, Bagging	The genetic algorithm is used to find the best subset, which makes the method perform well in terms of recall and extended G-mean.	The noise in the dataset is not addressed and comparisons with the latest algorithms are not made.
PT-Bagging [51]	Threshold movement/Bagging	Hybrid Ensemble	Ecoli, Yeast, Glass, Vehicle, New-thyroid	RB-bagging, SMOTE-bagging, RNB-bagging	The natural class distribution of the data is preserved and well-calibrated posterior probabilities are obtained, thus improving the accuracy of the classifier.	The algorithm does not make use of the distribution of classes in the sampling process.
PBD [52]	Data partition	Hybrid Ensemble	Balance, Wine, Dermatology, New-thyroid	SMOTEBagging, SplitBal, RP	The imbalance problem in regression analysis is solved by using data partition to deal with imbalanced data.	The class overlap issue in the dataset is not addressed.
E-MOSAIC [53]	Multiobjective genetic sampling	Hybrid Ensemble	Ecoli, Glass, Car, Thyroid, New-Thyroid	ROS, RUS, RFS	The multi-objective process searches for classifiers that can produce classifiers with high prediction accuracy.	The performance of the algorithm is poor when the number of instances of minority classes is too low.
EFSM [54]	Adaboost/Heterogeneous ensemble	Hybrid Ensemble	Ecoli, Yeast, Glass, Vehicle	Bagging, SVM, Ad-RF, k-NN	Heterogeneous ensemble models are constructed, which show good performance in outlier detection and classification.	There is only one evaluation metric, and the comparison algorithm is outdated.
NAHOEC [55]	k-NN/Heterogeneous ensemble	Hybrid Ensemble	Ecoli, Autos, Penbased, New-thyroid	ADASYN, SMOGN, Gaussian Noise	The k-NN oversampling method can effectively balance the dataset and construct a heterogeneous ensemble classifier with better performance.	There is no statistical test to prove the validity of the algorithm, and few datasets are used.
ISOE/IOE [56]	Online learning/SMOTE	Hybrid Ensemble	SEA, Forest Cover, Gas Sensor	SCUT-DS, MUOB, MOOB	The algorithm is able to produce accurate results on both static, evolved and labeled missing data streams.	Noise and class overlap issues are not treated, and only one evaluation metric is used.
DESMI [60]	SMOTE/RUS/ROS	Dynamic Selection	Ecoli, Yeast, Glass, Balance, New-thyroid	OVO-Easy, OVA-NBSVM, AdaBoost.NC	The dynamic selection stage selects a set of classifiers with strong classification ability to build the ensemble, which improves the classification accuracy.	The time complexity is high and it is only applicable to supervised learning.
GPELM [62]	G-mean/Weight matrix	ELM	Ecoli, Yeast, Wine, Thyroid, New-thyroid	SMOTE-ELM, RWOS-ELM, SVM-OTHR	The parameters of ELM are optimized to maintain the class distribution of the initial data and improve the classifier performance.	The algorithm performs poorly on extremely imbalanced datasets.
P-ELM [63]	One-class classification/Kernel	ELM	Ecoli, Yeast, Glass, Wine	ELM, W-ELM, Kernel-based-ELM, FELM, PD-ELM	The class labels can be determined by comparing the output values of k one-class classifiers, and the algorithm has better time performance and classification accuracy.	The comparison is made only with ELM-based algorithms.
GCSKELM [64]	Gaussian kernel function/Regularization parameters	ELM	Ecoli, Yeast, Glass, Balance, New-thyroid	KELM, KWELM, CCR-KELM, RUSBoost	The application of Gaussian kernel functions and regularization parameters reduces the overhead of the algorithm and has good generalization.	Failure to effectively classify extremely imbalanced datasets.
CCR-ELM [65]	Class-specific cost/Kernel	ELM	Ecoli, Yeast, New-thyroid, Glass	ELM, W-ELM	The algorithm is able to adapt to datasets with few class instances.	The algorithms are only compared with the ELM-based algorithm.
VWOS-ELM [66]	WOS-ELM/Weight matrix/Sequential Learning	ELM	Ecoli, Yeast, Glass, Balance, Page-Blocks	McELM	The algorithm solves the imbalance problem in multi-class data streams for the first time and is more adaptable to new instances.	The comparison algorithm of the algorithm is single and simple, and performs poorly in the case of extreme imbalance.
WOS-ELMK [68]	Implicit kernel mapping	ELM	Ecoli, Yeast, Glass, Balance, Page-Blocks	VWOS-ELM, KOS-ELM	The use of implicit kernel mapping to improve the adaptability of the classifier and the setting of fixed memory to handle large data streams ensures the stability of the classifier.	Only one evaluation metric is used, which does not reflect the generalizability of the algorithm.
GWOS-ELM [69]	Least square/Gaming Strategy	ELM	Balance, Vehicle, Abalone, Statlog, Waveform	VWOS-ELM, WOS-ELM, SMOTE-SOELM	The fitting error is reduced and the generalization of the classifier is improved.	The algorithm fails to effectively classify datasets with a high number of instances.
PBG [70]	PostBoosting/G-mean	ELM	Ecoli, Yeast, New-thyroid, Thyroid	VWOS-ELM, OSW₀ELM	The algorithm constructs multiple independent WOS-ELM networks with stronger robustness and effectively solves the multi-class imbalance problem in data streams.	The performance of the algorithm degrades when the number of attributes in the dataset are excessive.
SEL [71]	Sequential Learning/Subset Segmentation	ELM	Page-Blocks, Winequality-red, Car, Thyroid	RUSBoost, SCUT	The algorithm splits majority class samples for improving the performance of weak learners, which can handle highly imbalanced data streams with good time performance.	There are few comparison algorithms, which cannot reflect the advantages of the algorithm.
AdaBoost-CNN [72]	CNN/Transfer learning	Deep learning	CIFAR-10, Fashion-MNIST, HAR	CNN, AdaBoost-Decision-Tree	The training time is effectively reduced by transfer learning, and high classification accuracy is achieved on many types of datasets.	There is only one evaluation metric, which cannot reflect the performance of the algorithm.
REFDL [73]	Regularization/ Hierarchical sampling	Deep learning	CE, Letter, Statlog, Pen Banana	Ada, SMT SAMME, RUS	The algorithm effectively reduces the computational cost, and the regularization method can effectively improve the capability of the classifier.	The use of random under-sampling leads to loss of information.
EECNN [74]	Random feature selection/ROS	Deep learning	AVRIS, ROSIS, Salinas	CNN-ERFS, RF-ERFS, ECNN	The algorithm selects the optimal subset of features after oversampling which can train a classifier with better classification ability.	The experimental datasets are few and no statistical tests are performed.
DBLP [75]	LSTM/BERT	Deep learning	HUFF, COVID-Q	AdaBoost, RandomForest, BERT+CNN, BERT-LSTM-CNN	The algorithm is able to train a more accurate model in a short period of time.	The performance is poor on complex real datasets.
FSVM [78]	Fuzzy affiliation function	SVM	Ecoli, Glass, Balance, User	SVM, FSVM, FSVM-CLL, DEC	The class overlap is used to assign high affiliation values to the important support vector, thus effectively solving the class imbalance and noise.	Experiments are not carried out on large and high imbalance datasets.
Multi-class SVM [79]	Hierarchical classification	SVM	Ecoli, Yeast, Glass, Balance, New-thyroid	SVM, SVM-Weight	The classification performance of the hierarchical model used by the algorithm is not affected by the number of features.	The proposed hierarchical model consumed more time.
SMOTE-LSSVM [80]	Intelligent optimization algorithm/SMOTE	SVM	Breast cancer	–	The parameters of the classifier are optimized based on particle swarm optimization and gravitational search to improve the performance of the classifier.	No comparison is made with other algorithms, and experiments are conducted on only one dataset.
OSVM-US [81]	One-class classification/Tomek-link	SVM	Thyroid, Heart, Breast cancer, Diabetes	NCL, NCL+BNF, SMOTE+TL, SMOTE+BNF	The classification accuracy is improved by eliminating boundary, redundancy and overlap cases based on sparse neighborhoods.	The performance of the classifier decreases as the class overlap rate in the dataset increases.
MSVM-AdB [82]	AdaBoost/Euclidean distance	SVM	Ecoli, Yeast, Glass, Vehicle, New-thyroid	SVM, MSVM, SVM_AdB	The algorithm divides the class overlap region by the Euclidean distance and maps it to high dimension, which can effectively deal with class overlap.	Only compared with SVM-based algorithms and has not been tested on large datasets.
OVO-RandBal/OVA-RandBal [39]	OVO/OVA/RandBal	Multi-class decomposition	Ecoli, Yeast, Glass, Balance, New-thyroid	OVO-SMOTE Bagging	The algorithm has good generalization performance, which can combine with many existing ensemble methods and base classifiers with high average accuracy.	The stochastic nature of the algorithm may cause instability problems in some cases.
DRCW-ASEG [84]	OVO/Dynamic Weighting/SMOTE	Multi-class decomposition	Ecoli, Yeast, Glass, Balance, New-thyroid	SMOTEBagging, SMOTEBoost, NBSVM	The use of a dynamic weighting approach to redundant classifiers improves the capability of the classifier.	The algorithm is only compared with classical imbalance algorithms.
DPSE [86]	OVA/Differential Partition/SMOTE/RUS	Multi-class decomposition	Ecoli, Yeast, Glass, Balance, New-thyroid	SMOTE, DESMI, k-means-SMOTE, Bagging-RB	The diversity of the classifier is improved based on the distribution characteristics and differential partition.	The algorithm does not deal with noise and class overlap and the performance degrades in case of extreme imbalance.
OAHD [87]	OVA/Gini coefficient	Multi-class decomposition	Wine, Glass, Page-blocks, Liver	CART, C4.5, DCSM, Ihd, iHDw	The dataset is balanced by Hellinger distance in the OVA process and the purity of the decision tree nodes is improved by the modified Gini coefficient.	The algorithm is only compared with decision tree algorithms.
OVA-AWBReK [88]	OVA/Adaptive window	Multi-class decomposition	Yeast, Forest CovType	–	The algorithm adaptively passes the previously learned knowledge to the subsequent window, which can reduce the uncertainty in the learning process more quickly.	The algorithm is only tested on two datasets and is not compared with other algorithms.

4 Multi-class imbalanced data classification method evaluation metrics

On the issue of classification of multi-class imbalanced data, the traditional evaluation metrics based on two-class imbalanced data are still used by many researchers as criteria for algorithm performance evaluation. The AUC value, Accuracy, Recall and F-measure are the commonly used evaluation metrics. F-measure combines the results of Precision and Recall. To further illustrate the metrics, the two-class confusion matrix is mainly used to represent the above metrics. The two-class confusion matrix is shown in Table 7, and the formulae for the above assessment metrics are shown in Equations (10, 11).

Table 7
Confusion matrix of binary class

	Predicted positive class	Predicted negative class
Actual positive class	True Positive (TP)	False Negative (FN)
Actual negative class	False Positive (FP)	True Negative (TN)

F-Measure (also known as F1-score) is the harmonic mean of Precision and Recall. Precision and Recall often trade off against each other, so both cannot always be high at the same time. The F-Measure is therefore applied because it takes the values of both Precision and Recall into account. $Recall = \frac{TP}{TP + FN}, \quad Precision = \frac{TP}{TP + FP}$ (10) $F\text{-}measure = \frac{2 \times Recall \times Precision}{Recall + Precision}$ (11)
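
Equations (10) and (11) translate directly into code. The following is a minimal sketch with hypothetical counts; returning 0.0 when a denominator is zero is a convention assumed here, not something specified in the text.

```python
def precision_recall_f(tp, fp, fn):
    """Return (Precision, Recall, F-measure) from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * recall * precision / (recall + precision)
    return precision, recall, f_measure


# Hypothetical minority (positive) class: 16 actual positives, 8 of them found,
# plus 2 negatives wrongly predicted as positive.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
```

In this example Precision is 0.8 but Recall is only 0.5, and the F-measure (about 0.615) lies closer to the lower of the two, which is why it is preferred over either value alone.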

However, the traditional evaluation metrics for two-class imbalanced data cannot reflect the performance of multi-class classification algorithms well. Therefore, some researchers have improved the evaluation metrics of the two-class problem and extended them to the evaluation metrics of the multi-class problem. Meanwhile, the two-class confusion matrix also needs to be extended into a multi-class confusion matrix, as shown in Table 8.

Table 8

Confusion matrix of multi-class

		Predicted
		A	B	C	D	E
Actual	A	TP_A	F_AB	F_AC	F_AD	F_AE
	B	F_BA	TP_B	F_BC	F_BD	F_BE
	C	F_CA	F_CB	TP_C	F_CD	F_CE
	D	F_DA	F_DB	F_DC	TP_D	F_DE
	E	F_EA	F_EB	F_EC	F_ED	TP_E

G-mean is the geometric mean of the recall of the positive class and the recall of the negative class, and it is a common performance evaluation metric in imbalanced learning. Sun et al. [89] extended G-mean to a multi-class G-mean based on the recall of each class, so that the performance of multi-class imbalance classification methods can be effectively measured, as shown in Equations (12, 13), where n is the number of classes, ${TP}_{c_{i}}$ is the number of correctly classified instances of class $c_i$, and $\sum_{j = 1}^{n - 1} {FN}_{c_{ij}}$ is the number of class $c_i$ instances misclassified as one of the other n - 1 classes. ${Recall}_{c_{i}} = \frac{{TP}_{c_{i}}}{{TP}_{c_{i}} + \sum_{j = 1}^{n - 1} {FN}_{c_{ij}}}$ (12) $G\text{-}mean = \left(\prod_{i = 1}^{n} {Recall}_{c_{i}}\right)^{1/n}$ (13)
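
The multi-class G-mean of Equations (12, 13) can be computed from a confusion matrix laid out as in Table 8 (rows are actual classes, columns are predicted classes). The sketch below uses a hypothetical three-class matrix; the function names are assumptions made for the example.

```python
import math


def per_class_recall(cm):
    """Equation (12): cm[i][i] divided by the row sum, for each actual class i."""
    recalls = []
    for i, row in enumerate(cm):
        total = sum(row)  # TP of class i plus all of its false negatives
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def multiclass_gmean(cm):
    """Equation (13): geometric mean of the per-class recalls."""
    recalls = per_class_recall(cm)
    return math.prod(recalls) ** (1.0 / len(recalls))


# Hypothetical confusion matrix: rows = actual class, columns = predicted class.
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 1, 4]]
recalls = per_class_recall(cm)  # 0.8, 0.6 and 0.8
gmean = multiclass_gmean(cm)    # cube root of 0.8 * 0.6 * 0.8
```

Because the recalls are multiplied, a single class that is never recognized drives the G-mean to zero, which makes the metric sensitive to neglected minority classes.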

The F-measure can be extended to the Multi-class F-measure (MFM) [90] by calculating the F-measure value for each class and averaging the results, as shown in Equation (14), where n is the number of classes and i is the index of the class treated as positive. $MFM = \frac{\sum_{i = 1}^{n} F\text{-}measure_{i}}{n}$ (14)
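
As an illustration of Equation (14), the sketch below derives the per-class counts from a multi-class confusion matrix (rows are actual classes, columns are predicted classes, as in Table 8) and averages the per-class F-measures. The matrix is hypothetical and the function names are assumptions.

```python
def per_class_f_measure(cm):
    """F-measure of each class when that class is treated as the positive class."""
    n = len(cm)
    scores = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # class i predicted as another class
        fp = sum(cm[r][i] for r in range(n)) - tp   # other classes predicted as class i
        # 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN).
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def mfm(cm):
    """Equation (14): unweighted mean of the per-class F-measures."""
    scores = per_class_f_measure(cm)
    return sum(scores) / len(scores)


cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 1, 4]]
scores = per_class_f_measure(cm)
mfm_value = mfm(cm)
```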

Accuracy is a common metric used in algorithm performance evaluation. In a multi-class setting, it needs to be extended. Macro Average Arithmetic (MAvA) [90] (also known as AveAcc) can be derived by computing Accuracy for each class, as shown in Equations (15, 16), where m is the number of instances, n is the number of classes, $f(i, j)$ denotes the actual probability that instance i belongs to class j, and $C(i, j)$ denotes the predicted probability that instance i belongs to class j. $Accuracy = \frac{\sum_{i = 1}^{m} \sum_{j = 1}^{n} f(i, j) C(i, j)}{m}$ (15) $MAvA = \frac{1}{n} \sum_{i = 1}^{n} {Accuracy}_{c_{i}}$ (16)

The Macro F1-score is an evaluation metric that treats every class equally. It considers each class in turn as the positive class, averages the resulting per-class Precision and Recall over the n classes, and combines the two averages, as shown in Equations (17–19). ${Precision}_{macro} = \frac{1}{n} \sum_{i = 1}^{n} \frac{{TP}_{c_{i}}}{{TP}_{c_{i}} + {FP}_{c_{i}}}$ (17) ${Recall}_{macro} = \frac{1}{n} \sum_{i = 1}^{n} \frac{{TP}_{c_{i}}}{{TP}_{c_{i}} + {FN}_{c_{i}}}$ (18) $macro\ F1\text{-}score = \frac{2 \times {Recall}_{macro} \times {Precision}_{macro}}{{Recall}_{macro} + {Precision}_{macro}}$ (19)

MAUC [91] is an effective measure of the performance of multi-class algorithms, as shown in Equation (20), where c is the number of classes. For any pair of classes i and j, $\hat{A}(i|j)$ is the probability that a randomly selected member of class j has a lower estimated probability of belonging to class i than a randomly selected member of class i, and $\hat{A}(i, j) = [\hat{A}(i|j) + \hat{A}(j|i)]/2$ is the resulting measure of separability between the two classes. $MAUC = \frac{2}{c(c - 1)} \sum_{i < j} \hat{A}(i, j)$ (20)
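
Equation (20) can be sketched in plain Python under the usual pairwise reading of MAUC: for classes i and j, $\hat{A}(i|j)$ is estimated as the fraction of (class i member, class j member) pairs in which the class i member receives the higher predicted probability of class i, and $\hat{A}(i, j)$ averages $\hat{A}(i|j)$ and $\hat{A}(j|i)$. The function names, the convention of counting ties as one half and the toy probabilities are assumptions; every class must appear at least once in the labels.

```python
from itertools import combinations


def pairwise_auc(scores_pos, scores_neg):
    """Fraction of (pos, neg) pairs ranked correctly; ties count as one half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def mauc(y_true, proba):
    """Equation (20). proba[k][i] is the predicted probability of class i for instance k."""
    c = len(proba[0])
    total = 0.0
    for i, j in combinations(range(c), 2):
        # Probability of class i for members of class i versus members of class j.
        a_i_given_j = pairwise_auc(
            [p[i] for y, p in zip(y_true, proba) if y == i],
            [p[i] for y, p in zip(y_true, proba) if y == j],
        )
        # Probability of class j for members of class j versus members of class i.
        a_j_given_i = pairwise_auc(
            [p[j] for y, p in zip(y_true, proba) if y == j],
            [p[j] for y, p in zip(y_true, proba) if y == i],
        )
        total += (a_i_given_j + a_j_given_i) / 2.0
    return 2.0 * total / (c * (c - 1))


y_true = [0, 0, 1, 1, 2, 2]
separable = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.2, 0.7, 0.1],
             [0.1, 0.6, 0.3], [0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]
uninformative = [[1 / 3, 1 / 3, 1 / 3]] * 6
```

With these inputs, `mauc(y_true, separable)` is 1.0 because every pair of classes is ranked perfectly, while `mauc(y_true, uninformative)` is 0.5, the value expected from a classifier that cannot rank instances at all.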

Table 9 summarizes the commonly used evaluation metrics for different types of multi-class imbalance data classification methods.

Table 9

Evaluation metrics for multi-class imbalanced data classification methods

Algorithms Type	Method	Algorithms	Evaluation metrics for multi-class imbalanced data classification methods
			AUC	ACC	Re	FM	GM	MFM	MAv	MF1	MA
Data preprocessing methods	Oversampling	SMOM [12]			√		√				√
		SSCMIO [13]			√	√	√				√
		HDSMOTE [14]			√	√	√				√
		SOMM [16]					√
		SMOTE&Z-SCORE [17]		√	√	√				√
		OSC [18]					√	√
		ADASYN-N/ADASYN-KNN [20]			√	√
		STCPS [22]	√			√	√
		LICIC [23]							√
		COSTE [25]	√						√
		MC-RBO [26]					√		√
	Under-sampling	BLO [28]					√		√
		CUS [29]	√
		OCSV-US [30]					√		√
	Hybrid sampling	SCUT [31]	√			√	√
		FCMSMT [32]	√			√
		CIAR [33]	√			√	√
		MOSHS [35]			√	√
		SOUP [37]					√
		HAR-MI [40]						√	√
	Feature selection	EFIS-MOEA [42]									√
		PMC+[43]	√
		RBBag [45]					√
		RSFAID-M [46]						√
		FRSA [47]	√	√
Algorithm-level classification methods	Hybrid ensemble	HECMI [48]			√
		SA-GABC [49]			√		√
		PT-Bagging [51]	√							√
		PBD [52]			√	√
		E-MOSAIC [53]					√				√
		EFSM [54]				√
		NAHOEC [55]	√				√
		ISOE/IOE [56]					√
	Dynamic selection	DESMI [60]						√	√
	ELM	GPELM [62]				√
		P-ELM [63]		√
		GCSKELM [64]					√		√		√
		CCR-ELM [65]		√			√
		VWOS-ELM [66]					√
		WOS-ELMK [68]					√
		GWOS-ELM [69]				√	√
		PBG [70]					√		√
		SEL [71]					√			√
	Deep learning	AdaBoost-CNN [72]		√
		REFDL [73]		√
		EECNN [74]			√			√	√
		DBLP [75]			√	√
	SVM	FSVM [78]					√		√
		Multi-class SVM [79]		√		√	√
		SMOTE-LSSVM [80]		√
		OSVM-US [81]	√		√				√
		MSVM-AdB [82]		√			√
	Decomposition technique	OVO-RandBal/OVA-RandBal [39]		√		√	√		√	√
		DRCW-ASEG [84]							√
		DPSE [86]							√
		OAHD [87]				√				√
		OVA-AWBReK [88]	√			√

Remarks: ACC: Accuracy; Re: Recall; FM: F-measure; GM: G-mean; MFM: Multi-class F-measure; MAv: MAvA; MF1: Macro F1-score; MA: MAUC.

5 Research directions and prospects

The algorithms and models proposed for the multi-class imbalance classification problem have made considerable progress, but many problems remain unsolved, and the existing methods need further research and optimization. The following discusses the current problems and future research directions for multi-class imbalance classification.

(1) Handling multi-class imbalanced data streams with dynamic ensemble selection

Multi-class imbalance learning on data streams has received little attention, and most of the available data stream classification algorithms have been developed for the two-class imbalance problem. Some researchers have proposed algorithms for multi-class imbalanced data streams and achieved good results. However, the problem of uncertainty in multi-class imbalanced data streams has not yet been addressed. For example, a majority class may become a minority class after a period of time, a minority class may become a majority class, and new classes may arrive over time. To address these issues, the authors plan to combine a sliding window with dynamic ensemble selection in future studies, processing the data in batches while evaluating the classifiers in the ensemble, removing the weak ones and retaining the more capable ones.

(2) Tackling concept drift in multi-class imbalanced data streams

Concept drift is common in many current application scenarios, and it also occurs in multi-class imbalanced data streams. In a data stream environment, concept drift detection becomes very demanding, because concept changes in multiple classes and imbalanced class distributions must be handled simultaneously, which traditional drift detection techniques cannot do. In subsequent research, the capability of the classifier can be improved by designing a drift detection mechanism that combines the multi-class imbalance ratio with ensemble learning.

(3) Coping with complex multi-class imbalanced datasets

The data generated in real-world applications are large and complex. Apart from biased class distributions, class overlap, and multiple majority and minority classes, multi-class imbalanced data also exhibit extreme imbalance, noise and concept drift. However, most of the existing methods are devoted to solving only one or two of these problems. More efficient and comprehensive methods for classification in this complicated environment should be investigated.

6 Conclusion

This article reviews existing data preprocessing methods and algorithm-level classification methods for multi-class imbalanced data. Firstly, oversampling, under-sampling, hybrid sampling and feature selection are introduced as data preprocessing methods. Secondly, the algorithm-level classification methods are introduced and summarized in detail in four aspects: ensemble learning, neural network, SVM and multi-class decomposition techniques. Moreover, the performance of some of the algorithms using the same datasets is compared and analyzed, and the pros and cons of all the algorithms are summarized. Finally, future research directions and possible solutions are proposed for the challenges currently faced by multi-class imbalanced data classification.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (62062004), the Ningxia Natural Science Foundation Project (2022AAC03279) and the Graduate Innovation Project of North Minzu University (YCX22191).

Table 1
Dataset parameters

| Dataset | Attributes | Classes | Instances | Class distribution |
|---|---|---|---|---|
| Ecoli | 7 | 8 | 336 | 143, 77, 52, 35, 20, 5, 2, 2 |
| Yeast | 8 | 10 | 1484 | 244, 429, 463, 44, 51, 163, 35, 30, 20, 5 |
| Vehicle | 18 | 4 | 846 | 199, 212, 217, 218 |
| Wine-Quality | 12 | 7 | 6497 | 30, 216, 2138, 2836, 1079, 193, 5 |
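The class distributions in Table 1 can be summarized by a single imbalance ratio, taken here as the size of the largest class divided by the size of the smallest. The sketch below is illustrative only: the function names are chosen for this example, and other definitions of the ratio exist for multi-class data.

```python
from typing import Dict, List

# Class distributions copied from Table 1.
CLASS_COUNTS: Dict[str, List[int]] = {
    "Ecoli": [143, 77, 52, 35, 20, 5, 2, 2],
    "Yeast": [244, 429, 463, 44, 51, 163, 35, 30, 20, 5],
    "Vehicle": [199, 212, 217, 218],
    "Wine-Quality": [30, 216, 2138, 2836, 1079, 193, 5],
}


def imbalance_ratio(counts: List[int]) -> float:
    """Largest class size divided by smallest class size."""
    if not counts or min(counts) <= 0:
        raise ValueError("every class needs at least one instance")
    return max(counts) / min(counts)


def summarize(counts: List[int]) -> Dict[str, float]:
    """Instance total, class count and imbalance ratio for one dataset."""
    return {
        "instances": sum(counts),
        "classes": len(counts),
        "imbalance_ratio": imbalance_ratio(counts),
    }


if __name__ == "__main__":
    for name, counts in CLASS_COUNTS.items():
        print(name, summarize(counts))
```

Under this definition Ecoli has a ratio of 71.5 (143 to 2) and Wine-Quality a ratio of 567.2 (2836 to 5), while Vehicle is close to balanced. The ratio alone does not show how many minority classes a dataset contains, so it is best read alongside the full distribution.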

Table 4
Dataset parameters

| Dataset | Attributes | Classes | Instances | Class distribution |
|---|---|---|---|---|
| Ecoli | 7 | 8 | 336 | 143, 77, 52, 35, 20, 5, 2, 2 |
| Yeast | 8 | 10 | 1484 | 244, 429, 463, 44, 51, 163, 35, 30, 20, 5 |
| New-thyroid | 5 | 3 | 215 | 150, 35, 30 |
| Wine-Quality Red | 11 | 6 | 1599 | 10, 53, 681, 638, 199, 18 |

Table 7
Confusion matrix of binary class

| | Predicted positive class | Predicted negative class |
|---|---|---|
| Actual positive class | True Positive (TP) | False Negative (FN) |
| Actual negative class | False Positive (FP) | True Negative (TN) |
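The four cells of Table 7 are enough to compute the standard binary metrics. The sketch below uses the textbook definitions of accuracy, precision, recall, specificity, F1 and G-mean; the function name and the counts in the usage note are hypothetical and are not results from any surveyed method.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard metrics derived from the binary confusion matrix in Table 7."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of dividing by zero when a row or column is empty.
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)        # true positive rate (sensitivity)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": ratio(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }
```

For example, with hypothetical counts TP = 8, FN = 2, FP = 10 and TN = 80, accuracy is 0.88 while precision is only 8/18, which shows why accuracy alone can hide weak performance on the minority (positive) class. In the multi-class setting these quantities are usually computed per class, treating one class as positive and the rest as negative, and then averaged.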