Abstract
In imbalanced datasets, majority classes are easy to find, whereas minority classes receive little attention because their instances are rare. However, most existing classifiers are better at learning majority classes, which makes their classification results unfair. To address binary classification of imbalanced data, this paper proposes a novel fuzzy support vector machine. The idea is to train two support vector machines to learn the majority class and the minority class, respectively. The proposed fuzzy membership is then used to estimate the assistance that each instance point provides to the training of the support vector machines. Finally, unknown instance points are classified by evaluating the assistance they provide to the training of the support vector machines. Results on ten UCI datasets show that the classification accuracy of the proposed method is 0.747 when the imbalance ratio between the classes reaches 87.8. Compared with its competitors, the proposed method achieves better classification performance. We find that, for the classification of imbalanced data, the complexity of the data distribution has negative effects on classification results, while fuzzy membership can resist these negative effects. Moreover, fuzzy membership helps classifiers learn superior classification boundaries.
Introduction
Real-world data are usually imbalanced. Imbalanced data exhibit an extreme difference in the number of samples between classes [1, 2]. Binary classification of imbalanced data has important applications, such as software defect prediction [3], machinery fault diagnosis [4], and spam filtering [5].
Class imbalance poses two challenges for binary classification. Challenge I: most classifiers focus on majority classes rather than minority classes [6, 7], e.g., the Indefinite Core Vector Machine (ICVM) [8] and the Varying Coefficient SVM (VCSVM) [9], and this bias easily weakens the generalization ability of classifiers. Challenge II: imbalanced data may hide noise that interferes with classifiers. Compared with majority classes, minority classes are easily masked by noise because of their scarcity [10]. Moreover, noise also induces class overlapping [11, 12], so that classifiers obtain incorrect classification results. Consequently, binary classification on imbalanced datasets is a difficult task.
We propose a dense fuzzy classification method, named FSVM, which combines fuzzy membership and density estimation to classify imbalanced data. Because FSVM calculates the density of the majority and minority classes, it does not depend on a specific scenario. The complexity of the data distribution has negative effects on the classification results of imbalanced datasets, and we propose an easy-to-calculate fuzzy membership to resist these negative effects. Moreover, the fuzzy membership helps classifiers obtain superior classification boundaries.
This paper is organized as follows. Section 2 reviews related work. Section 3 describes the method, including the calculation of the dense fuzzy membership and the model's implementation. Experimental settings are described in Section 4. Results and discussion are given in Section 5. Finally, Section 6 summarizes the conclusions and outlines future work.
Related work
Existing approaches to binary classification of imbalanced data can be grouped into data level-based methods, algorithm level-based methods, deep network architecture-based methods, and fuzzy-based methods, as follows.
(I) Data level-based methods
Data level-based methods usually adopt under-sampling or over-sampling techniques. Under-sampling removes majority-class instances and over-sampling replicates minority-class instances; both aim to balance the class distribution. Unfortunately, these techniques must preprocess the data distribution before classifiers are trained [13], as in the methods implemented in [14, 15]. The method in [16] is a weighted kernel-based SMOTE that synthesizes positive-class samples in the feature space, which addresses the drawback of SMOTE's linear interpolation. The method in [17] is a SMOTE-based class-specific extreme learning machine that exploits both minority oversampling and class-specific regularization. Tao et al. [18] proposed an over-sampling approach that synthesizes minority samples by real-value negative selection, and [19] proposed an adaptive weighted over-sampling method. Additionally, for binary classification on high-dimensional datasets, [20] proposed an alternative distance metric for computing the neighbors of each minority sample within the SMOTE oversampling strategy. Similar approaches are implemented in [21–23]. Results indicate that these methods effectively address between-class imbalance; however, they do not take the distribution of the minority classes into account. In this situation, the generalization ability of classifiers may be reduced, so that they have difficulty adapting to complex datasets with high imbalance ratios.
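As a concrete reference for the linear interpolation that [16] improves upon, the sketch below synthesizes minority samples in the SMOTE manner: each new point lies on the segment between a minority sample and one of its k nearest minority neighbours. It is a minimal NumPy illustration, not any of the cited implementations; the function name `smote_sample` and its parameters are hypothetical.

```python
import numpy as np


def smote_sample(X_min, n_new, k=5, rng=None):
    """Synthesize n_new minority samples by SMOTE-style linear interpolation.

    Each synthetic point is x + gap * (x_nn - x), where x is a random minority
    sample, x_nn is one of its k nearest minority neighbours, and gap ~ U[0, 1).
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances between minority samples.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)            # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)   # randomly chosen seed samples
    nn = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])
```

Because every synthetic point is a convex combination of two existing minority points, the samples never leave the region spanned by the minority class, which is the linear-interpolation limitation that kernel-based variants such as [16] target.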
(II) Algorithm level-based methods
Unlike data level-based methods, algorithm level-based methods handle imbalanced data by modifying classifiers or their decision thresholds, e.g., T-SVM (twin SVM) [24] and Pin-TSVM (pinball loss twin SVM) [25]. Modifying decision thresholds can indeed improve precision and recall, but the AUC (area under the ROC curve) may not improve, because AUC summarizes the ranking of scores over all thresholds and is therefore unaffected by the choice of any single threshold.
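The point about AUC can be checked directly. The sketch below uses made-up scores for ten instances (two positives) purely for illustration and computes precision and recall at two thresholds together with a rank-based AUC; the helper names are hypothetical.

```python
import numpy as np


def precision_recall(y, scores, threshold):
    """Precision and recall of the positive class (label 1) when predicting scores >= threshold."""
    pred = scores >= threshold
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


def auc(y, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = scores[y == 1]
    neg = scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up scores: eight negatives (0) followed by two positives (1).
    y = np.array([0] * 8 + [1, 1])
    scores = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6, 0.7, 0.55, 0.8])
    for t in (0.5, 0.75):
        p, r = precision_recall(y, scores, t)
        print(f"threshold={t}: precision={p:.2f} recall={r:.2f} AUC={auc(y, scores):.3f}")
```

Raising the threshold from 0.5 to 0.75 moves precision from 0.50 to 1.00 and recall from 1.00 to 0.50, while AUC stays at 0.875, since the threshold never enters the AUC computation.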
(III) Deep network architectures-based methods
Deep network architecture-based methods are applied to binary classification because of their excellent learning capabilities, e.g., GAN (Generative Adversarial Network) [26], AC-GAN (Auxiliary Classifier GAN) [27], and MFC-GAN (Multiple Fake Class GAN) [28]. However, GAN-based architectures suffer from mode collapse during training. In addition, Zhai et al. [29] proposed a novel classifier based on a modified D2GAN, and further methods are reported in [30–32].
(IV) Fuzzy-based methods
Fuzzy membership is generally considered to help models classify imbalanced data accurately, e.g., Fuzzy-SVM (fuzzy support vector machine) [33]. However, Fuzzy-SVM only considers the distance between training samples and the class center, so minority-class samples may be mistaken for noise. Liu [34] developed the extended FSVM-CIL (fuzzy support vector machines for class imbalance learning); similar approaches include NF-SVM (new fuzzy support vector machine) [35] and F-SVM (fuzzy support vector machine) [36]. Chen et al. [37] proposed a fuzzy support vector machine with a graph to classify imbalanced datasets. In addition, fuzzy membership has been introduced into traditional classification methods, such as the Fuzzy Soft k-Nearest Neighbor Classifier proposed by Memiş et al. [38]. These studies indicate that the classification capabilities of classifiers can be improved by utilizing fuzzy membership [39].
Methods
Preliminary
This section first gives the definitions needed to illustrate the proposed method.
For an unknown point xᵤ in Fig. 1(a), there are two possible results: xᵤ is classified into the majority class, as in Fig. 1(b), or into the minority class, as in Fig. 1(c). The question is how to determine the class of xᵤ accurately.

Classification procedure of an unknown point. Unknown points are marked as purple diamonds. Majority classes and minority classes are marked as red circles and yellow squares, respectively. Blue curves are classification boundaries. (a) An unknown point xᵤ that has not yet been classified; (b) and (c) the two possible classification results of xᵤ.
To improve classification precision, fuzzy membership is introduced during classification. Suppose that C₊₁ and C₋₁ are the class centers of the majority class and the minority class, respectively, as illustrated in Fig. 2(a). For an unknown point xᵢ, the squared distance between xᵢ and C₊₁ and the squared distance between xᵢ and C₋₁ are calculated, as illustrated in Fig. 2(b), as follows

Calculation for an unknown point. Black circles are the class centers. Unknown points are marked as purple diamonds. Majority classes and minority classes are marked as red circles and yellow squares, respectively. Blue curves are classification boundaries.
Where φ(xᵢ) is a mapping function, l₊₁ and l₋₁ are the numbers of majority-class and minority-class instances, respectively, and κ(xᵢ, x) is a kernel function. To calculate C₊₁ and C₋₁, we use the class density estimation method in [40], which gives
Where |C₊₁| and |C₋₁| are the numbers of elements belonging to the majority and minority classes, respectively, τ is the confidence level, and P denotes probability. The advantage of the calculation in [41] is that, by estimating the class label density, C₊₁ and C₋₁ are estimated from a probabilistic point of view, so the results are more credible. For the corresponding class label density estimation algorithm, please refer to [40]. The fuzzy membership of point xᵢ is given in Equation (12).
Where θ > 1 is a constant that prevents the denominator from being zero.
The RBF (radial basis function) kernel is used as κ(·,·) in Equations (1) and (2), i.e., κ(xᵢ, x) = exp{−γ‖x − xᵢ‖²}, where γ is a kernel parameter. Because SVMs with an RBF kernel produce very smooth estimates, they can achieve superior classification performance [41].
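Equations (1), (2), and (12) are not reproduced in this version of the text, so the following is only a minimal sketch under stated assumptions. It assumes the usual kernel-trick form in which a class center is the mean of the mapped points φ(xⱼ) of that class, so that ‖φ(x) − C‖² expands into RBF kernel evaluations only; the density-based estimate of C₊₁ and C₋₁ from [40] is not implemented. For the membership it uses a hypothetical linear-decay form 1 − d²/(max d² + θ) with θ > 1, which is not necessarily Equation (12). All function names are illustrative.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """kappa(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows a in A, b in B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def sq_dist_to_centre(X, X_class, gamma):
    """||phi(x) - C||^2 for each row x of X, assuming C is the mean of phi over X_class.

    Expands to kappa(x, x) - (2/l) * sum_j kappa(x, x_j) + (1/l^2) * sum_j sum_k kappa(x_j, x_k),
    where kappa(x, x) = 1 for the RBF kernel and l = len(X_class).
    """
    X = np.asarray(X, dtype=float)
    X_class = np.asarray(X_class, dtype=float)
    l = len(X_class)
    cross = rbf_kernel(X, X_class, gamma).sum(axis=1)
    within = rbf_kernel(X_class, X_class, gamma).sum()
    d2 = 1.0 - 2.0 * cross / l + within / l**2
    return np.maximum(d2, 0.0)  # clip tiny negative values caused by round-off


def fuzzy_membership(d2_own, theta=2.0):
    """Hypothetical membership: 1 at the class center, decaying linearly with distance.

    theta > 1 keeps the denominator away from zero and every value in (0, 1].
    """
    d2_own = np.asarray(d2_own, dtype=float)
    return 1.0 - d2_own / (d2_own.max() + theta)
```

Comparing d²(xᵢ, C₊₁) with d²(xᵢ, C₋₁), as Algorithm 2 does, then amounts to two calls to `sq_dist_to_centre`, one with the majority-class training points and one with the minority-class training points.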
Assume that {(x₁, y₁, f₁), …, (xᵢ, yᵢ, fᵢ)} is a set of training data consisting of i samples with corresponding fuzzy memberships 0 < fᵢ ≤ 1. The proposed fuzzy support vector machine, namely FSVM, is defined as follows
Where ξᵢ are slack variables and C is a constant; w and b are the weight vector and bias, respectively. By constructing the Lagrangian, Equation (6) can be converted into Equation (7).
Where αᵢ and βᵢ are the Lagrange multipliers. Taking the partial derivatives of L(w, b, ξ, α, β) gives
Substituting Equations (8)–(10) into Equation (7), Equation (6) can be rewritten as
The Karush-Kuhn-Tucker (KKT) conditions [42] are as follows
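For reference, the standard fuzzy SVM formulation, in which each fuzzy membership fᵢ scales the penalty on its slack variable, is written out below with the symbols defined above. It is included on the assumption, not confirmed by the text shown here, that Equations (6)–(11) and the KKT conditions follow this standard form: the primal problem, its Lagrangian, the stationarity conditions, the dual, and the complementary slackness conditions.

```latex
\begin{aligned}
&\min_{w,b,\xi}\ \tfrac12\|w\|^2 + C\sum_i f_i\,\xi_i
  \quad\text{s.t.}\quad y_i\bigl(w^\top\varphi(x_i)+b\bigr)\ge 1-\xi_i,\ \ \xi_i\ge 0,\\
&L(w,b,\xi,\alpha,\beta)=\tfrac12\|w\|^2 + C\sum_i f_i\,\xi_i
  -\sum_i \alpha_i\bigl[y_i\bigl(w^\top\varphi(x_i)+b\bigr)-1+\xi_i\bigr]-\sum_i \beta_i\,\xi_i,\\
&\frac{\partial L}{\partial w}=0\Rightarrow w=\sum_i \alpha_i y_i\,\varphi(x_i),\qquad
  \frac{\partial L}{\partial b}=0\Rightarrow \sum_i \alpha_i y_i=0,\qquad
  \frac{\partial L}{\partial \xi_i}=0\Rightarrow f_i C-\alpha_i-\beta_i=0,\\
&\max_{\alpha}\ \sum_i \alpha_i-\tfrac12\sum_i\sum_j \alpha_i\alpha_j y_i y_j\,\kappa(x_i,x_j)
  \quad\text{s.t.}\quad \sum_i y_i\alpha_i=0,\ \ 0\le \alpha_i\le f_i C,\\
&\alpha_i\bigl[y_i\bigl(w^\top\varphi(x_i)+b\bigr)-1+\xi_i\bigr]=0,\qquad (f_i C-\alpha_i)\,\xi_i=0.
\end{aligned}
```

Because βᵢ = fᵢC − αᵢ ≥ 0, the box constraint becomes 0 ≤ αᵢ ≤ fᵢC: a point with a small membership fᵢ can carry only a small multiplier and therefore has limited influence on the learned boundary.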
The implementation of our FSVM is given in Algorithm 1 and Algorithm 2.
Algorithm 1 describes the training of FSVM. The training samples X = {x₁, …, xᵢ, …} are the input of FSVM, and its final output is the learned class labels {+1, …, −1}. First, the parameters are initialized in Step 1. Steps 2 to 6 compute the assistance that each point in X provides to the training of the two support vector machines, as detailed below.
Equations (1) and (2) are used to calculate the squared distance between point xᵢ and C₊₁ and the squared distance between xᵢ and C₋₁, respectively, as shown in Steps 4 and 5 of Algorithm 1. Step 6 of Algorithm 1 then invokes Algorithm 2 to calculate the fuzzy membership of point xᵢ, as described below.
The output of Algorithm 2 is a class label. For point xᵢ, if d²(C₊₁) < d²(C₋₁) holds, we use
Using the results returned in Step 6, the learned class labels are obtained in Step 7 of Algorithm 1. Training terminates once every point in X has been judged, as shown in Steps 8 to 11. Finally, Step 12 of Algorithm 1 outputs the learned class labels.
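The following is a minimal sketch of how membership-weighted slack penalties act during training; it is not Algorithm 1. It assumes a linear kernel and the standard fuzzy soft-margin objective ½‖w‖² + C Σ fᵢ max(0, 1 − yᵢ(w·xᵢ + b)), obtained by eliminating ξᵢ, and it minimizes that objective by full-batch subgradient descent instead of solving the dual quadratic program. The function names, learning rate, and epoch count are hypothetical.

```python
import numpy as np


def train_fuzzy_linear_svm(X, y, f, C=1.0, lr=0.01, epochs=500):
    """Minimize 0.5*||w||^2 + C * sum_i f_i * max(0, 1 - y_i (w.x_i + b)) by subgradient descent.

    y must contain labels in {-1, +1}; f holds fuzzy memberships in (0, 1].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                 # points with positive hinge loss (xi_i > 0)
        coef = C * f * y * active            # membership-scaled subgradient weights
        grad_w = w - coef @ X                # d/dw of the regularizer plus active hinge terms
        grad_b = -coef.sum()                 # d/db of the active hinge terms
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Label +1 (majority) or -1 (minority) by the sign of the decision function."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)
```

Points with a small fᵢ contribute proportionally less to each subgradient step, which mirrors the box constraint 0 ≤ αᵢ ≤ fᵢC of the dual; with fᵢ = 1 for every point the routine reduces to an ordinary linear soft-margin SVM.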
Datasets
Two synthetic datasets, S1 and S2, were generated to verify the proposed FSVM: S1 uses the Gaussian distribution N(0,1) and S2 uses a random distribution. The imbalance ratio (IR) between the numbers of majority-class and minority-class instances is 9:1, i.e., IR = 9. We also selected ten datasets from the UCI Machine Learning Repository [43]. Table 1 presents details of the two synthetic datasets and the ten UCI datasets, and Fig. 3 illustrates the two synthetic datasets. For each of the twelve datasets (S1, S2, and U1–U10), 80% of the instances are randomly selected as the training set, which is used to train our model and the competitors, and the remaining 20% are used as the testing set to evaluate them.
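The data preparation above can be sketched as follows. The generator is hypothetical, since the exact generation parameters of S1 and S2 are not given: it assumes S1-like data in which the majority class is drawn from N(0, I) and the minority class from a unit-variance Gaussian shifted by `shift`, with 100 minority instances so that IR = 9. The split follows the text: a random 80% of the instances for training and the remaining 20% for testing.

```python
import numpy as np


def make_imbalanced_gaussian(n_minority=100, ir=9, shift=2.0, dim=2, seed=0):
    """Hypothetical S1-like data: majority ~ N(0, I) labelled +1, minority ~ N(shift, I) labelled -1."""
    rng = np.random.default_rng(seed)
    n_majority = ir * n_minority             # IR = n_majority / n_minority
    X_maj = rng.standard_normal((n_majority, dim))
    X_min = rng.standard_normal((n_minority, dim)) + shift
    X = np.vstack([X_maj, X_min])
    y = np.concatenate([np.ones(n_majority, dtype=int), -np.ones(n_minority, dtype=int)])
    return X, y


def random_split(X, y, train_frac=0.8, seed=0):
    """Randomly select train_frac of the instances for training; the rest form the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(round(train_frac * len(y)))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], y[tr], X[te], y[te]
```

A purely random 80/20 split, as described, only approximately preserves IR = 9 in each part; a stratified split would keep the ratio exact, which matters most for highly imbalanced datasets such as U9 and U10.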
Basic information of the synthetic and UCI datasets

Graphic illustration of the synthetic datasets. Red circles and green circles are the minority class and the majority class, respectively. Datasets S1 and S2 use the Gaussian distribution N(0,1) and a random distribution, respectively.
For comparison, we selected similar fuzzy support vector machines: Fuzzy-SVM [33], FSVM-CIL [34], NF-SVM [35], and F-SVM [36]. Because the proposed FSVM is built from two support vector machines, the twin SVMs T-SVM [24] and Pin-TSVM [25] were also selected. In addition, GAN [26] was used for comparison.
Accuracy, F1-score, precision, and recall are used as evaluation metrics.
Where TP and TN are the numbers of majority-class and minority-class instances that are classified correctly, and FP and FN are the numbers of instances incorrectly classified as the majority class and the minority class, respectively.
The eight methods were implemented in Python on the TensorFlow framework and run on the same GPU with the same environment settings. Additionally, to reduce the influence of randomness, each experiment was run independently 100 times, and the average was used for assessment.
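The metric formulas are not reproduced here; the sketch below assumes the standard definitions Accuracy = (TP + TN)/(TP + TN + FP + FN), Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 = 2·Precision·Recall/(Precision + Recall). Following the convention above, the majority class (label +1) is treated as positive by default. The helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) with `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For a degenerate classifier that predicts the majority label for all ten instances of a 9:1 test set, accuracy is 0.9 and majority-class recall is 1.0, yet with `positive=-1` the minority-class precision, recall, and F1 are all 0. This is why precision, recall, and F1 complement accuracy on imbalanced data.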
This section presents the experimental results, including the ability to learn classification boundaries, classification accuracy, and running time. The results show that our method outperforms the comparison methods in classification accuracy and in the boundaries it learns. We then discuss the results to give some insights into classification on imbalanced data.
Result analysis
(1) Classification boundaries
We used the synthetic datasets S1 and S2 to evaluate how well our method and the competing methods learn classification boundaries, as shown in Figs. 4 and 5.

Classification performance on the synthetic datasets. Symbol √ indicates models with fuzzy membership; symbol × indicates models without fuzzy membership.

Visualization of classification results. Dataset S1 uses the Gaussian distribution and dataset S2 uses a random distribution. Symbol √ indicates models with fuzzy membership; symbol × indicates models without fuzzy membership. Red circles and green circles are the minority class and the majority class, respectively. Black curves are the boundaries learned by the models.
Figure 4 indicates that FSVM outperforms the seven competitors in classification performance. The classification capabilities of all eight methods (our method and the seven competitors) drop quickly as data distributions become more complex. Among them, the five SVMs with fuzzy membership (FSVM, Fuzzy-SVM, FSVM-CIL, NF-SVM, and F-SVM) outperform the two SVMs without fuzzy membership (T-SVM and Pin-TSVM).
Figure 5 shows that the boundaries learned by FSVM are better than those learned by the seven competitors. Although all eight models learn good boundaries on synthetic dataset S1, FSVM learns the boundaries surrounding the minority classes better than the seven competitors do, which implies that FSVM focuses more on minority classes. On dataset S2, the boundaries surrounding the minority classes learned by FSVM are likewise better than those learned by the seven competitors. Overall, the boundaries learned by the five SVMs with fuzzy membership (FSVM, Fuzzy-SVM, FSVM-CIL, NF-SVM, and F-SVM) are better than those learned by the SVMs without fuzzy membership (T-SVM and Pin-TSVM).
(2) Classification accuracy
In this subsection, we use the UCI datasets U1–U10 to analyze the classification ability of FSVM. Tables 2–5 show that the proposed FSVM outperforms the seven competitors on most datasets. In particular, on the highly imbalanced datasets U10 (IR = 87.8) and U9 (IR = 58.4), FSVM has a clear advantage over the seven competitors. Figure 7 displays the ROC curves and corresponding AUC values of FSVM and the seven competitors on dataset U10. The results also confirm that the SVMs with fuzzy membership outperform those without it on most datasets.
Accuracy values on the ten UCI datasets. The best results are highlighted in bold. Symbol √ indicates models with fuzzy membership; symbol × indicates models without fuzzy membership.
F1-score values on the ten UCI datasets. The best results are highlighted in bold. Symbol √ indicates models with fuzzy membership; symbol × indicates models without fuzzy membership.
Precision values on the ten UCI datasets. The best results are highlighted in bold. Symbol √ indicates models with fuzzy membership; symbol × indicates models without fuzzy membership.
Recall values on the ten UCI datasets. The best results are highlighted in bold. Symbol √ indicates models with fuzzy membership; symbol × indicates models without fuzzy membership.
Taken together, the results on the two synthetic and ten UCI datasets support two observations: (i) fuzzy membership improves the classification capabilities of these models, especially on the highly imbalanced datasets U9 and U10; (ii) fuzzy membership helps these models learn superior classification boundaries, including on imbalanced datasets with complicated distributions, e.g., dataset S2.
(3) Running time
Figure 6 displays the running time of the eight algorithms on the synthetic datasets. The advantage of our algorithm in running time is less pronounced than its advantage in classification accuracy, because most of the computation is spent estimating the label density, i.e., Equation (11) is time-consuming. Nevertheless, it is not the slowest of the compared algorithms. As data distributions become more complex (synthetic datasets S1 and S2 use the Gaussian distribution N(0,1) and a random distribution, respectively), the running time of all eight algorithms increases.

Running time of different algorithms. Synthetic datasets S1 and S2 use the Gaussian distribution N(0,1) and a random distribution, respectively. Symbol √ indicates algorithms with fuzzy membership; otherwise, they are marked with symbol ×.
(1) Insights
Compared with the seven competing methods, the proposed method has an advantage in classifying imbalanced data. This is because the fuzzy membership in Equation (2) estimates the assistance that each point provides to the training of the SVM, so that each instance point can be classified into its corresponding class. Consequently, the proposed method achieves superior classification accuracy on imbalanced datasets.
Besides the classification methods themselves, the attributes of the datasets are another major factor limiting classification accuracy, e.g., the imbalance ratio, data dimensionality, and data volume. Noise also harms classification accuracy because it can mask minority classes, so that classifiers are easily misled.
(2) Limitations
However, the proposed method also has disadvantages. For instance, its classification ability relies on the proposed fuzzy membership, which means that Equation (2) has a pivotal effect on the classification results. In addition, the training time increases as data volume or data dimensionality grows. When large-scale data are used to train our model, more epochs may be needed; this does not mean that the model fails to converge, only that training may take longer.
Conclusions
This paper proposed a novel fuzzy support vector machine to classify imbalanced datasets. By calculating the fuzzy membership of instance points, the assistance that each point provides to the training of the support vector machine is assessed, which increases the classification accuracy of the model. Results on the synthetic datasets with a high imbalance ratio (IR = 9) show that the accuracy of the proposed method is 0.888 and that it outperforms the comparative methods. Results on the UCI datasets show that our method achieves an accuracy of 0.747 when the imbalance ratio reaches 87.8, clearly outperforming the competitors. We show that the negative effects caused by complex data distributions can be reduced by fuzzy membership, and that fuzzy membership helps classifiers learn superior classification boundaries. In future work, we will explore binary classification of highly imbalanced data in the presence of noise. Real-world datasets usually contain noise that masks minority classes, and minority classes and noise can lie in any subspace of the data space. As data dimensionality increases, the number of subspaces grows exponentially, forming an exponential search space. In such a space, selecting a subspace that highlights class attributes, so as to distinguish minority classes from noise, is very difficult.
Declaration
The authors declare no conflicts of interest. The data used in this work are publicly available.
Data availability statement
The UCI datasets used in this work are available from the UCI Machine Learning Repository: https://www.ics.uci.edu/mlearn/MLRepository.html
Contributions
Qingling Wang and Jian Zheng proposed the method and wrote the manuscript. Qingling Wang and Wenjing Zhang designed and analyzed the experiments. Qingling Wang wrote the source code.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
