Dense fuzzy support vector machine to binary classification for imbalanced data

Abstract

In imbalanced datasets, majority classes are easy to find, whereas minority classes are easily overlooked because their samples are rare. Most existing classifiers are better at learning majority classes, which makes classification results unfair to minority classes. To address binary classification on imbalanced data, this paper proposes a novel fuzzy support vector machine. The idea is to train two support vector machines that learn the majority class and the minority class, respectively. The proposed fuzzy membership then estimates how much each instance point assists the training of the support vector machines. Finally, an unknown instance point is classified by evaluating the assistance it provides to this training. Results on ten UCI datasets show that the classification accuracy of the proposed method is 0.747 when the imbalance ratio between the classes reaches 87.8. Compared with the competitors, the proposed method achieves better classification performance on most datasets. We find that, for imbalanced data, the complexity of the data distribution negatively affects classification results, whereas fuzzy membership can resist these negative effects. Moreover, fuzzy membership helps classifiers learn superior classification boundaries.

Keywords

Binary classification, fuzzy, imbalanced data, support vector machines

1 Introduction

Real-world data are usually imbalanced. Imbalanced data exhibit an extreme difference in the number of samples between classes [1, 2]. Binary classification on imbalanced data has practical value in tasks such as software defect prediction [3], machinery fault diagnosis [4], and spam filtering [5].

Class imbalance brings two challenges for binary classification. Challenge I: most classifiers, e.g., Indefinite Core Vector Machine (ICVM) [8] and Varying Coefficient SVM (VCSVM) [9], focus on majority classes rather than minority classes [6, 7], and this bias easily weakens their generalization ability. Challenge II: imbalanced data may hide noise that interferes with classifiers. Compared with majority classes, minority classes are easily masked by noise because their samples are scarce [10]. Moreover, noise also induces class overlapping [11, 12], so that classifiers produce incorrect classification results. Consequently, binary classification on imbalanced datasets is a difficult task.

Motivation. The motivation of this work, therefore, is to classify imbalanced datasets. We also aim to offer guidance on how classifiers can pay attention to hard-to-observe minority classes. The final goal is to make classifiers suitable for datasets with a high imbalance ratio. Consequently, this paper proposes a novel fuzzy support vector machine. To increase binary classification accuracy, the proposed fuzzy membership is used to estimate the assistance that instance points provide to the training of the support vector machine.

Contributions. We summarize the main contributions of this work as follows.

We propose a dense fuzzy classification method, named FSVM, which combines fuzzy membership and density estimation to classify imbalanced data. Because FSVM calculates the density of the majority and minority classes, it does not depend on a specific scenario.

The complexity of the data distribution negatively affects classification results on imbalanced datasets. We propose an easy-to-calculate fuzzy membership to resist the negative effects caused by complex data distributions. Moreover, the fuzzy membership helps classifiers learn superior classification boundaries.

This paper is organized as follows. Section 2 reviews the related work. Section 3 describes the method, including the calculation of the dense fuzzy membership and the model's implementation. Experimental settings are described in Section 4. The results and discussions are given in Section 5. Finally, Section 6 summarizes the conclusions and outlines future work.

2 Related work

Existing work on binary classification of imbalanced data can be grouped into data level-based methods, algorithm level-based methods, deep network architectures-based methods, and fuzzy-based methods, as follows.

(I) Data level-based methods

Data level-based methods often adopt under-sampling and over-sampling techniques. Under-sampling removes majority class instances and over-sampling replicates minority class instances; both aim to balance the class distribution. Unfortunately, under-sampling and over-sampling techniques have to preprocess the data distribution before training classifiers [13], as in the methods implemented in [14, 15]. The authors of [16] proposed a weighted kernel-based SMOTE that synthesizes positive class samples in feature space, which addresses the drawback of the linear interpolation in SMOTE. The authors of [17] proposed a SMOTE-based class-specific extreme learning machine that exploits both minority oversampling and class-specific regularization. Tao et al. [18] proposed an over-sampling approach that uses real-value negative selection to synthesize minority samples, and the authors of [19] proposed an adaptive weighted over-sampling method. Additionally, for binary classification on high-dimensional datasets, the authors of [20] proposed an alternative distance metric for computing the neighbors of each minority sample within the SMOTE oversampling strategy. Similar methods are implemented in [21–23]. Results indicate that these methods effectively address between-class imbalance, but they do not take the distribution of the minority classes into account. In this situation, the generalization ability of classifiers may be reduced, so that they adapt poorly to complex datasets with a high imbalance ratio.

(II) Algorithm level-based methods

Unlike data level-based methods, algorithm level-based methods handle imbalanced data by modifying classifiers or decision thresholds, e.g., T-SVM (twin SVM) [24] and Pin-TSVM (pinball loss twin SVM) [25]. Modifying decision thresholds can improve the precision and recall metrics, but the AUC (area under the curve) metric may not improve.

(III) Deep network architectures-based methods

Deep network architectures-based methods are applied to binary classification because of their excellent learning capabilities. Examples include GAN (Generative Adversarial Network) [26], AC-GAN (Auxiliary Classifier-GAN) [27], and MFC-GAN (Multiple Fake Class-GAN) [28]. However, GAN-based network architectures suffer from mode collapse during training. In addition, Zhai et al. [29] proposed a novel classifier based on a modified D2GAN. Further methods are given in [30–32].

(IV) Fuzzy-based methods

Generally, fuzzy membership is used to help models classify imbalanced data accurately. One example is Fuzzy-SVM (fuzzy support vector machine) [33]; however, Fuzzy-SVM only considers the distance between the training samples and the class center, so minority classes may be mistaken for noise. Liu [34] developed the extended FSVM-CIL (fuzzy support vector machines for class imbalance learning). Similar methods include NF-SVM (new fuzzy support vector machine) [35] and F-SVM (fuzzy support vector machine) [36]. Chen et al. [37] proposed a fuzzy support vector machine with a graph to classify imbalanced datasets. In addition, fuzzy membership has also been introduced into traditional classification methods, such as the Fuzzy Soft k-Nearest Neighbor Classifier [38] proposed by S. Memiş. These studies indicate that fuzzy membership can improve the classification capabilities of classifiers [39].

3 Methods

3.1 Preliminary

This section first gives the definitions needed to describe the proposed method.

Definition 1. Given an imbalanced dataset $S = \{ x_{u} \mid u = 1, 2, \ldots \} \subset R^{m}$, where x_u is a data point and $R^{m}$ is an m-dimensional Euclidean space, the task of binary classification is to train a model that learns the class label y ∈ {+1, -1}, where +1 and -1 denote the majority and minority class labels, respectively.

For an unknown point x_u in Fig. 1(a), there are two possible results: x_u is classified into the majority class, as in Fig. 1(b), or x_u is classified into the minority class, as in Fig. 1(c). The question is how to determine the class of point x_u accurately.

Fig. 1

Classification procedure of an unknown point. The unknown point is marked as a purple diamond. Majority classes and minority classes are marked as red circles and yellow squares, respectively. Blue curves are classification boundaries. (a) shows an unknown point x_u before classification. (b) and (c) show the two possible classification results of x_u.

3.2 Calculations of dense fuzzy

To improve classification precision, fuzzy membership is introduced into the classification. Suppose that C₊₁ and C₋₁ are the class centers of the majority class and the minority class, respectively, as illustrated in Fig. 2(a). For an unknown point x_i, the squared distance between x_i and C₊₁ and the squared distance between x_i and C₋₁ are calculated, as illustrated in Fig. 2(b), as follows.

Fig. 2

Distance calculation for an unknown point. Black circle points are the class centers. The unknown point is marked as a purple diamond. Majority classes and minority classes are marked as red circles and yellow squares, respectively. Blue curves are classification boundaries.

$\begin{matrix} d^{2} (C_{+ 1}) = \| φ (x_{i}) - C_{+ 1} \|^{2} \\ = φ (x_{i})^{2} - 2 φ (x_{i}) C_{+ 1} + C_{+ 1}^{2} \\ = κ (x_{i}, x) - \frac{2}{l_{+ 1}} \sum κ (x_{i}, x) C_{+ 1} + C_{+ 1}^{2} \end{matrix}$ (1)

$\begin{matrix} d^{2} (C_{- 1}) = \| φ (x_{i}) - C_{- 1} \|^{2} \\ = φ (x_{i})^{2} - 2 φ (x_{i}) C_{- 1} + C_{- 1}^{2} \\ = κ (x_{i}, x) - \frac{2}{l_{- 1}} \sum κ (x_{i}, x) C_{- 1} + C_{- 1}^{2} \end{matrix}$ (2)

where φ (x_i) is a mapping function, l₊₁ and l₋₁ are the numbers of majority-class and minority-class samples, and κ (x_i, x) is a kernel function. The class centers C₊₁ and C₋₁ are estimated with the class density estimation method in [40]:

$C_{+ 1} = \sup \{ C_{+ 1} : \frac{1}{| C_{+ 1} |} \sum P (x_{j} \in \text{majority classes}) \geq 1 - τ \}$ (3)

$C_{- 1} = \sup \{ C_{- 1} : \frac{1}{| C_{- 1} |} \sum P (x_{k} \in \text{minority classes}) \geq 1 - τ \}$ (4)

where |C₊₁| and |C₋₁| are the numbers of elements belonging to the majority and minority classes, respectively, τ is the confidence level, and P is the probability. The advantage of the calculation in [40] is that, by estimating the class label density, C₊₁ and C₋₁ are estimated from a probabilistic point of view, so the results are more credible. For the corresponding algorithm of the class label density estimation, please refer to [40]. The fuzzy membership of point x_i is given in Equation (5).

$f_{i} = \begin{cases} 1 - \frac{1}{\sqrt{d^{2} (C_{+ 1}) + θ}}, & \text{if } \| φ (x_{i}) - C_{+ 1} \|^{2} \leq \| φ (x_{i}) - C_{- 1} \|^{2} \\ 1 - \frac{1}{\sqrt{d^{2} (C_{- 1}) + θ}}, & \text{if } \| φ (x_{i}) - C_{+ 1} \|^{2} > \| φ (x_{i}) - C_{- 1} \|^{2} \end{cases}$ (5)

where θ > 1 is a constant that prevents the denominator from being zero. Because the squared distances are non-negative and θ > 1, the membership satisfies 0 < f_i < 1.
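As a worked illustration of Equation (5), the arithmetic below uses hypothetical values: θ = 2 and the squared distances are assumptions chosen for easy computation, not values from the experiments.

```latex
% Hypothetical inputs: d^2(C_{+1}) = 0.25, d^2(C_{-1}) = 20.25, \theta = 2.
% Since 0.25 \leq 20.25, the first branch of Equation (5) applies:
f_i = 1 - \frac{1}{\sqrt{0.25 + 2}} = 1 - \frac{1}{1.5} \approx 0.333

% A point farther from its nearer center, e.g. d^2(C_{+1}) = 98 with \theta = 2:
f_i = 1 - \frac{1}{\sqrt{98 + 2}} = 1 - \frac{1}{10} = 0.9
```

Under Equation (5) as written, the membership therefore grows toward 1 as the squared distance to the nearer center grows, and it stays strictly between 0 and 1 because θ > 1.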

The RBF (Radial Basis Function) kernel is used as κ (·, ·) in Equations (1) and (2), i.e., κ (x_i, x) = exp {- γ||x - x_i||²}, where γ is a kernel parameter. Because an SVM with the RBF kernel can obtain very smooth estimations, it gains superior classification performance [41].
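The squared feature-space distance can be evaluated through the kernel alone, without computing φ explicitly. The following is a minimal sketch, assuming the class center is the plain mean of the mapped class samples; this is a stand-in for the density-based estimate of Equations (3) and (4), whose algorithm is given in [40]. The function names and the sample values are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * ||a - b||^2)."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq)


def squared_distance_to_center(x_i, class_points, gamma):
    """Squared distance from phi(x_i) to the mean of phi over class_points.

    Assumption: the center is the mean of the mapped samples, not the
    density-based center of Equations (3)-(4).
    """
    l = len(class_points)
    # ||phi(x_i)||^2, which equals 1 for the RBF kernel
    self_term = rbf_kernel(x_i, x_i, gamma)
    # <phi(x_i), center> as an average of kernel values
    cross_term = sum(rbf_kernel(x_i, x, gamma) for x in class_points) / l
    # ||center||^2 as an average over all pairs of class samples
    center_term = sum(
        rbf_kernel(p, q, gamma) for p in class_points for q in class_points
    ) / (l * l)
    return self_term - 2.0 * cross_term + center_term


# Hypothetical majority-class samples and query point
majority = [(0.0, 0.0), (1.0, 0.0)]
d2_pos = squared_distance_to_center((0.0, 0.0), majority, gamma=1.0)
```

The same function can be called with the minority-class samples to obtain the second distance of Equation (2).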

3.3 Model’s implementations

Assume that {(x₁, y₁, f₁) , …, (x_l, y_l, f_l)} is a set of training data consisting of l samples with corresponding fuzzy memberships 0 < f_i ≤ 1. The proposed fuzzy support vector machine, namely FSVM, is defined as follows.

$\begin{matrix} \min \frac{1}{2} w^{T} w + C \sum_{i = 1}^{l} f_{i} ξ_{i} \\ s.t. \; y_{i} (w^{T} z_{i} + b) \geq 1 - ξ_{i}, \; ξ_{i} \geq 0, \; i = 1, \ldots, l \end{matrix}$ (6)

where ξ_i are slack variables, C is a constant, w and b are the weight and bias, respectively, and z_i = φ (x_i) is the mapped sample. By constructing the Lagrangian, Equation (6) can be converted into Equation (7).

$\begin{matrix} L (w, b, ξ, α, β) = \frac{1}{2} w^{T} w + C \sum_{i = 1}^{l} f_{i} ξ_{i} \\ - \sum_{i = 1}^{l} α_{i} (y_{i} (w^{T} z_{i} + b) - 1 + ξ_{i}) - \sum_{i = 1}^{l} β_{i} ξ_{i} \end{matrix}$ (7)

where α_i and β_i are the Lagrange multipliers. Setting the partial derivatives of L (w, b, ξ, α, β) to zero gives

$\frac{\partial L (w, b, ξ, α, β)}{\partial w} = 0 \Rightarrow w - \sum_{i = 1}^{l} α_{i} y_{i} z_{i} = 0$ (8)

$\frac{\partial L (w, b, ξ, α, β)}{\partial b} = 0 \Rightarrow - \sum_{i = 1}^{l} α_{i} y_{i} = 0$ (9)

$\frac{\partial L (w, b, ξ, α, β)}{\partial ξ_{i}} = 0 \Rightarrow f_{i} C - α_{i} - β_{i} = 0$ (10)

Substituting Equations (8)–(10) into Equation (7), problem (6) can be written in its dual form

$\begin{matrix} \max \sum_{i = 1}^{l} α_{i} - \frac{1}{2} \sum_{i = 1}^{l} \sum_{j = 1}^{l} α_{i} α_{j} y_{i} y_{j} κ (x_{i}, x_{j}) \\ s.t. \; \sum_{i = 1}^{l} y_{i} α_{i} = 0, \; 0 \leq α_{i} \leq f_{i} C, \; i = 1, \ldots, l \end{matrix}$ (11)
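The substitution can be spelled out step by step. The sketch below assumes that z_i is the mapped sample φ (x_i), so that the inner product of z_i and z_j equals κ (x_i, x_j).

```latex
% Step 1: collect the \xi_i terms of (7); they vanish by (10).
C \sum_{i=1}^{l} f_i \xi_i - \sum_{i=1}^{l} \alpha_i \xi_i - \sum_{i=1}^{l} \beta_i \xi_i
  = \sum_{i=1}^{l} (f_i C - \alpha_i - \beta_i) \, \xi_i = 0

% Step 2: the b term vanishes by (9).
- b \sum_{i=1}^{l} \alpha_i y_i = 0

% Step 3: insert w = \sum_i \alpha_i y_i z_i from (8).
\frac{1}{2} w^{T} w - \sum_{i=1}^{l} \alpha_i y_i w^{T} z_i
  = - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j z_i^{T} z_j

% Step 4: what remains of (7), with z_i^{T} z_j = \kappa(x_i, x_j).
L = \sum_{i=1}^{l} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j)

% Step 5: \alpha_i \geq 0 and \beta_i = f_i C - \alpha_i \geq 0 give the box constraint.
0 \leq \alpha_i \leq f_i C
```

The box constraint shows how the fuzzy membership enters the dual problem: a small f_i caps the multiplier α_i and thereby limits the influence of that point on the decision boundary.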

The Karush-Kuhn-Tucker (KKT) conditions [42] are described as follows.

$\begin{matrix} α_{i} (y_{i} (w^{T} z_{i} + b) - 1 + ξ_{i}) = 0, \; i = 1, \ldots, l \\ (f_{i} C - α_{i}) ξ_{i} = 0, \; i = 1, \ldots, l \end{matrix}$ (12)

The implementation of our FSVM is given in Algorithm 1 and Algorithm 2.

Algorithm 1 displays the training of FSVM. The training sample X = {x₁, …, x_i, …} is the input of FSVM, and the final output of FSVM is the learned class labels {+1, …, -1}. First, the parameters are initialized in Step 1. Steps 2 to 6 calculate the assistance that each point in X provides to the training of the two support vector machines. The details are as follows.

Equations (1) and (2) are used to calculate the squared distance between point x_i and C₊₁ and the squared distance between point x_i and C₋₁, respectively, as shown in Step 4 and Step 5 of Algorithm 1. Then, Step 6 of Algorithm 1 invokes Algorithm 2 to calculate the fuzzy membership of point x_i, as described below.

The output of Algorithm 2 is the class label. For point x_i, if d² (C₊₁) < d² (C₋₁) holds, we use $f_{i} = 1 - 1 / \sqrt{d^{2} (C_{+ 1}) + θ}$ in Equation (5) to calculate the fuzzy membership. Then, point x_i is assigned to the majority class and gains the majority class label +1. The learned class label +1 is returned, as shown in Step 2 to Step 7 of Algorithm 2. Otherwise, if d² (C₊₁) ≥ d² (C₋₁) holds, point x_i is assigned to the minority class and gains the minority class label -1. The learned class label -1 is returned, as illustrated in Step 8 to Step 13 of Algorithm 2.

Using the results returned in Step 6 of Algorithm 1, the learned class labels are obtained in Step 7. The training terminates once every point in X has been judged, as shown in Step 8 to Step 11. Finally, Step 12 outputs the learned class labels.

Algorithm 1 the training of FSVM

Input: training sample X = {x₁, . . . , x_i, . . . , }.

Output: learned class label {+1, - 1}.

1 Initialization C₊₁ = 0, C_-1 = 0;

2 foreach x_i in X do:

3 using the centers C₊₁, C₋₁ from Equations (3), (4), calculate the squared distances of point x_i to C₊₁ and C₋₁;

4 d² (C₊₁) = < x_i, C₊₁ > ← Eq. (1);

5 d² (C₋₁) = < x_i, C₋₁ > ← Eq. (2);

6 labels = DensityFuzzy (d² (C₊₁) , d² (C_- 1)); /* invoking Algorithm 2*/

7 obtain the learned class labels {+1, . . . , -1 . . . , } ← labels;

8 if x_i is an empty set then: /* judging each point */

9 break;

10 end if

11 end foreach

12 return {+1, . . . , -1 . . . , }

Algorithm 2 Calculating fuzzy membership

Input: d² (C₊₁) , d² (C_- 1).

Output: class label {+1, -1}.

1 Initialization parameter θ;

2 if d² (C₊₁) < d² (C₋₁) then:

3 calculate the fuzzy membership of point x_i using f_i in Equation (5):

4 $f_{i} = 1 - 1 / \sqrt{d^{2} (C_{+ 1}) + θ}$;

5 point x_i is regarded as the majority class and gains a majority class label;

6 x_i ← +1;

7 return {+1};

8 else:

9 calculate the fuzzy membership of point x_i using f_i in Equation (5):

10 $f_{i} = 1 - 1 / \sqrt{d^{2} (C_{- 1}) + θ}$;

11 point x_i is classified into the minority class and gains a minority class label;

12 x_i ← -1;

13 return {-1};
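The two algorithms can be sketched compactly in code. This is a minimal illustration rather than the authors' implementation: the distance callables `dist_pos` and `dist_neg`, the default θ = 2, and the one-dimensional sample points are hypothetical, and ties are resolved as in Algorithm 2 (strict inequality for the majority class).

```python
import math


def density_fuzzy(d2_pos, d2_neg, theta=2.0):
    """Algorithm 2 sketch: return (label, fuzzy membership)."""
    if theta <= 1:
        raise ValueError("theta must be greater than 1")
    if d2_pos < d2_neg:
        # nearer to the majority center: label +1, first branch of Eq. (5)
        return +1, 1.0 - 1.0 / math.sqrt(d2_pos + theta)
    # otherwise: label -1, second branch of Eq. (5)
    return -1, 1.0 - 1.0 / math.sqrt(d2_neg + theta)


def assign_labels(points, dist_pos, dist_neg, theta=2.0):
    """Algorithm 1 sketch: loop over points and invoke Algorithm 2.

    dist_pos and dist_neg are hypothetical callables that return the
    squared distance of a point to C_{+1} and C_{-1}, respectively.
    """
    labels, memberships = [], []
    for x in points:
        label, f = density_fuzzy(dist_pos(x), dist_neg(x), theta)
        labels.append(label)
        memberships.append(f)
    return labels, memberships


# Hypothetical 1-D example with centers at 0 (majority) and 5 (minority)
labels, memberships = assign_labels(
    [0.5, 4.0],
    dist_pos=lambda x: (x - 0.0) ** 2,
    dist_neg=lambda x: (x - 5.0) ** 2,
)
```

Returning the membership together with the label keeps the value f_i available for the weighted slack term in Equation (6).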

4 Experimental settings

4.1 Datasets

Two synthetic datasets, S1 and S2, were generated to verify the proposed FSVM: S1 follows the Gaussian distribution N(0,1), and S2 follows a random distribution. The imbalance ratio (IR) between the numbers of majority-class and minority-class instances is 9 : 1, i.e., IR = 9. We also selected ten datasets from the UCI machine learning repository [43]. Table 1 presents the details of the two synthetic datasets and the ten UCI datasets. Figure 3 illustrates the two synthetic datasets. For each of the twelve datasets, i.e., the two synthetic datasets S1 and S2 and the ten UCI datasets U1-U10, 80% of the instances are randomly selected as a training set, which is used to train our model and the competitors. The remaining 20% is used as a testing set to evaluate our model and the competitors.
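The data preparation described above can be sketched as follows. This is an illustration under stated assumptions: the two-dimensional layout, the shift applied to the minority class, the seeds, and the function names are hypothetical, since the text specifies only the N(0,1) distribution, the 360 : 40 class sizes, and the random 80/20 split.

```python
import random


def imbalance_ratio(labels):
    """IR = number of majority samples (+1) / number of minority samples (-1)."""
    n_major = sum(1 for y in labels if y == +1)
    n_minor = sum(1 for y in labels if y == -1)
    return n_major / n_minor


def make_gaussian_dataset(n_major=360, n_minor=40, shift=3.0, seed=0):
    """Draw 2-D N(0,1) samples; the minority shift is a hypothetical choice."""
    rng = random.Random(seed)
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), +1) for _ in range(n_major)]
    data += [
        ((rng.gauss(0, 1) + shift, rng.gauss(0, 1) + shift), -1)
        for _ in range(n_minor)
    ]
    return data


def random_split(data, train_fraction=0.8, seed=0):
    """Shuffle a copy of the data and cut it into training and testing sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


dataset = make_gaussian_dataset()
ir = imbalance_ratio([y for _, y in dataset])
train_set, test_set = random_split(dataset)
```

With only 40 minority instances, a purely random split can leave very few of them in the testing set, which is one reason why averaging over repeated runs is useful.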

Table 1
Basic information of the synthetic and UCI datasets

ID	Dataset	Note	#Minority instances	#Majority instances	#Instances	IR
S1	Synthetic dataset	Artificial datasets	40	360	400	9
S2	Synthetic dataset	Artificial datasets	40	360	400	9
U1	Yeast1	UCI datasets	53	131	184	2.46
U2	Spect	UCI datasets	56	208	264	3.75
U3	Segment0	UCI datasets	329	1979	2308	6.02
U4	Blocks0	UCI datasets	559	4913	5472	8.79
U5	Vowel0	UCI datasets	90	898	988	9.98
U6	led7	UCI datasets	37	406	443	11
U7	Ecoli	UCI datasets	20	316	336	15.8
U8	yeast6	UCI datasets	35	1449	1484	41.4
U9	Poker	UCI datasets	25	1460	1485	58.4
U10	wine-red	UCI datasets	18	1581	1599	87.8

Fig. 3

Graphic illustration of the synthetic datasets. Red circles and green circles are the minority class and the majority class, respectively. Dataset S1 follows the Gaussian distribution N(0,1), and dataset S2 follows a random distribution.

4.2 Comparison methods and assessment metrics

As comparison methods, we selected four similar fuzzy support vector machines: Fuzzy-SVM [33], FSVM-CIL [34], NF-SVM [35] and F-SVM [36]. Given the structure of the proposed FSVM, T-SVM [24] and Pin-TSVM [25] were also selected. In addition, GAN [26] was used for comparison.

Accuracy, F1-score, Precision and Recall are used as evaluation metrics.

$Accuracy = \frac{TP + TN}{TP + FP + TN + FN}$ (13)

$F1\text{-}score = \frac{2 TP}{2 TP + FP + FN}$ (14)

$Precision = \frac{TP}{TP + FP}$ (15)

$Recall = \frac{TP}{TP + FN}$ (16)

where TP, TN, FP and FN are the numbers of instances correctly classified as majority, correctly classified as minority, incorrectly classified as majority and incorrectly classified as minority, respectively.
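Equations (13)–(16) translate directly into code. The sketch below is illustrative: the function name and the confusion counts in the example are hypothetical, and all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Equations (13)-(16) from the four confusion counts.

    Assumes every denominator is non-zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # Eq. (13)
        "f1": 2 * tp / (2 * tp + fp + fn),            # Eq. (14)
        "precision": tp / (tp + fp),                  # Eq. (15)
        "recall": tp / (tp + fn),                     # Eq. (16)
    }


# Hypothetical counts for an 80-instance testing set
metrics = classification_metrics(tp=70, tn=5, fp=3, fn=2)
```

Because the majority class plays the role of the positive class in these definitions, a high accuracy can coexist with few correctly classified minority instances, so the four metrics are best read together.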

The eight methods were implemented in Python on the TensorFlow framework and were run on the same GPU with the same environment settings. In addition, to reduce the influence of randomness, each experiment was independently run 100 times, and the average was used for assessment.

5 Results and discussions

This section presents the experimental results, including the ability to learn boundaries, classification accuracy and running time. The results show that our method outperforms the comparison methods in learning boundaries and, on most datasets, in classification accuracy. Furthermore, we discuss the experimental results to give some insights into classification on imbalanced data.

5.1 Result analysis

(1) Classification boundaries

We used the synthetic datasets S1 and S2 to evaluate how well our method and the competing methods learn classification boundaries, as shown in Figs. 4 and 5.

Fig. 4

Classification performance on synthetic datasets. Symbol √ indicates models with fuzzy. Symbol × indicates models without fuzzy.

Fig. 5

Visualization of classification results. Dataset S1 follows the Gaussian distribution. Dataset S2 follows a random distribution. Symbol √ indicates models with fuzzy. Symbol × indicates models without fuzzy. Red circles and green circles are the minority class and the majority class, respectively. Black curves are the boundaries learned by the models.

Figure 4 indicates that FSVM outperforms the seven competitors in classification performance. The classification capabilities of all eight methods (our method and the seven competitors) drop quickly when the data distribution becomes complex. Among them, the five SVMs with fuzzy (FSVM, Fuzzy-SVM, FSVM-CIL, NF-SVM, F-SVM) outperform the two SVMs without fuzzy (T-SVM, Pin-TSVM).

Figure 5 shows that the boundaries learned by FSVM are better than those learned by the seven competitors. Although all eight models learn good boundaries on synthetic dataset S1, FSVM learns the boundaries around minority classes better than the seven competitors do. This implies that FSVM can focus more on minority classes than the seven competitors. On dataset S2, the boundaries around minority classes learned by FSVM are likewise better than those learned by the seven competitors. Clearly, the boundaries learned by the five SVMs with fuzzy (i.e., FSVM, Fuzzy-SVM, FSVM-CIL, NF-SVM, F-SVM) are better than those learned by the SVMs without fuzzy (e.g., T-SVM, Pin-TSVM).

(2) Classification accuracy

In this subsection, we used the UCI datasets U1-U10 to analyze the classification ability of FSVM. The results show that the proposed FSVM outperforms the seven competitors on most datasets, as illustrated in Tables 2–5. In particular, on the high-IR datasets U10 (IR = 87.8) and U9 (IR = 58.4), FSVM attains the best or tied-best value on all four metrics. Figure 7 displays the ROC curves with the corresponding AUC values of FSVM and the seven competitors on dataset U10. The same conclusion can also be drawn that the SVMs with fuzzy outperform those without fuzzy on most datasets.

Table 2

Accuracy values on the ten UCI datasets. The best results are highlighted using bold. Symbol √ indicates models with fuzzy. Symbol × indicates models without fuzzy.

#	FSVM	Fuzzy-SVM	FSVM-CIL	NF-SVM	F-SVM	T-SVM	Pin-TSVM	GAN
	(√)	(√)	(√)	(√)	(√)	(×)	(×)	(×)
U10	0.747	0.745	0.742	0.741	0.741	0.743	0.740	0.733
U9	0.755	0.752	0.754	0.751	0.754	0.750	0.707	0.709
U8	0.877	0.936	0.941	0.930	0.907	0.917	0.944	0.927
U7	0.859	0.858	0.859	0.847	0.843	0.837	0.803	0.849
U6	0.909	0.907	0.909	0.905	0.904	0.903	0.867	0.901
U5	0.889	0.813	0.822	0.811	0.823	0.771	0.801	0.622
U4	0.755	0.736	0.737	0.739	0.735	0.701	0.6824	0.733
U3	0.831	0.815	0.888	0.808	0.789	0.9610	0.9613	0.744
U2	0.859	0.844	0.855	0.849	0.854	0.840	0.822	0.805
U1	0.911	0.847	0.899	0.877	0.869	0.835	0.802	0.911

Table 3

F1-score values on the ten UCI datasets. The best results are highlighted using bold. Symbol √ indicates models with fuzzy. Symbol × indicates models without fuzzy.

#	FSVM	Fuzzy-SVM	FSVM-CIL	NF-SVM	F-SVM	T-SVM	Pin-TSVM	GAN
	(√)	(√)	(√)	(√)	(√)	(×)	(×)	(×)
U10	0.808	0.805	0.807	0.802	0.804	0.801	0.800	0.799
U9	0.822	0.817	0.817	0.811	0.812	0.810	0.804	0.708
U8	0.807	0.987	0.987	0.917	0.889	0.987	0.911	0.912
U7	0.850	0.800	0.811	0.822	0.781	0.833	0.833	0.817
U6	0.833	0.857	0.841	0.846	0.851	0.840	0.863	0.852
U5	0.891	0.822	0.821	0.817	0.833	0.666	0.812	0.723
U4	0.750	0.701	0.750	0.707	0.744	0.355	0.6422	0.620
U3	0.854	0.844	0.833	0.804	0.783	0.971	0.917	0.881
U2	0.891	0.830	0.831	0.841	0.833	0.828	0.803	0.800
U1	0.894	0.863	0.874	0.869	0.880	0.858	0.853	0.484

Table 4

Precision values on the ten UCI datasets. The best results are highlighted using bold. Symbol √ indicates models with fuzzy. Symbol × indicates models without fuzzy.

#	FSVM	Fuzzy-SVM	FSVM-CIL	NF-SVM	F-SVM	T-SVM	Pin-TSVM	GAN
	(√)	(√)	(√)	(√)	(√)	(×)	(×)	(×)
U10	0.711	0.708	0.711	0.708	0.708	0.702	0.707	0.707
U9	0.777	0.768	0.771	0.767	0.766	0.763	0.733	0.758
U8	0.803	0.912	0.911	0.933	0.811	0.900	0.933	0.883
U7	0.866	0.864	0.861	0.861	0.859	0.857	0.811	0.778
U6	0.811	0.846	0.838	0.826	0.833	0.822	0.806	0.816
U5	0.888	0.855	0.860	0.857	0.848	0.788	0.833	0.733
U4	0.722	0.720	0.722	0.715	0.718	0.688	0.709	0.712
U3	0.722	0.900	0.933	0.908	0.889	0.955	0.922	0.767
U2	0.901	0.888	0.866	0.879	0.894	0.893	0.901	0.888
U1	0.811	0.811	0.807	0.810	0.811	0.809	0.811	0.771

Table 5

Recall values on the ten UCI datasets. The best results are highlighted using bold. Symbol √ indicates models with fuzzy. Symbol × indicates models without fuzzy.

#	FSVM	Fuzzy-SVM	FSVM-CIL	NF-SVM	F-SVM	T-SVM	Pin-TSVM	GAN
	(√)	(√)	(√)	(√)	(√)	(×)	(×)	(×)
U10	0.755	0.741	0.752	0.751	0.747	0.737	0.722	0.708
U9	0.729	0.724	0.722	0.723	0.722	0.707	0.721	0.709
U8	0.788	0.888	0.818	0.829	0.766	0.868	0.806	0.724
U7	0.823	0.787	0.723	0.777	0.807	0.807	0.811	0.777
U6	0.822	0.813	0.811	0.819	0.822	0.809	0.806	0.803
U5	0.788	0.755	0.769	0.711	0.766	0.665	0.777	0.717
U4	0.688	0.721	0.755	0.717	0.733	0.639	0.710	0.692
U3	0.767	0.888	0.877	0.876	0.875	0.872	0.866	0.777
U2	0.911	0.899	0.907	0.881	0.900	0.911	0.900	0.811
U1	0.833	0.801	0.822	0.806	0.811	0.820	0.777	0.787

Taken together, two observations can be made on the two synthetic and ten UCI datasets: (i) fuzzy improves the classification capabilities of the models, especially on the highly imbalanced datasets U9 and U10; (ii) fuzzy helps the models learn superior classification boundaries, including on imbalanced datasets with complicated distributions, e.g., dataset S2.

(3) Running time

Figure 6 displays the running time of the eight algorithms on the synthetic datasets. The advantage of our algorithm in running time is not as significant as its advantage in classification accuracy, because the computational cost is dominated by the label density estimation, that is, Equations (3) and (4) take too much time when estimating the label density. Nevertheless, compared with the competitors, our algorithm is not the slowest. As the data distribution becomes more complex, i.e., from the Gaussian distribution N(0,1) of S1 to the random distribution of S2, the running time of all eight algorithms increases.

Fig. 6

Running time of different algorithms. Synthetic datasets S1 and S2 follow the Gaussian distribution N(0,1) and a random distribution, respectively. Symbol √ indicates algorithms with fuzzy; symbol × indicates algorithms without fuzzy.

5.2 Discussions

(1) Insights

Compared with the seven competing methods, the proposed method has an advantage in classifying imbalanced data. This is because the assistance that points provide to the training of the SVM can be estimated by the fuzzy membership in Equation (5). In this way, each instance point in the sample can be classified into its corresponding class. Consequently, the proposed method gains superior classification accuracy on imbalanced datasets.

Besides the classification methods themselves, the attributes of the datasets are another major factor that restricts classification accuracy, such as the imbalance ratio, data dimensionality and data volume. Additionally, noise also negatively affects classification accuracy because it can mask minority classes, so that classifiers are easily misled.

(2) Limitations

However, the proposed method also has disadvantages. For instance, its classification ability relies on the proposed fuzzy membership, which means that Equation (5) has a pivotal effect on classification results. In addition, the time consumption of training increases as the data volume or data dimensionality grows. When large-scale data are used to train our model, the number of epochs needed for convergence may increase. This does not mean that our model fails to converge, only that training may take longer.

6 Conclusions

This paper proposed a novel fuzzy support vector machine to classify imbalanced datasets. By calculating the fuzzy membership of instance points, the assistance that points provide to the training of the support vector machine is assessed, which increases the classification accuracy of the model. Results on the synthetic datasets with an imbalance ratio of 9 : 1 (IR = 9) show that the classification accuracy of the proposed method is 0.888, and our method outperforms the comparative methods. Results on the UCI datasets indicate that our method reaches a classification accuracy of 0.747 when the imbalance ratio reaches 87.8, the highest among the compared methods. We demonstrate that fuzzy membership can reduce the negative effects caused by the complexity of the data distribution. Moreover, the classifiers also learned superior classification boundaries with fuzzy membership. In future work, we will explore binary classification of highly imbalanced data in the presence of noise interference. Real-world datasets usually contain noise that masks minority classes. Minority classes and noise can exist in any subspace of the data space. As the data dimensionality increases, the number of subspaces increases exponentially, so that an exponential search space is formed. Selecting a subspace that highlights class attributes, in order to distinguish those classes from noise, is very difficult in an exponential search space.

Declaration

The authors declare no conflicts of interest. The data used in this work are available.

Data availability statement

The data are available at https://www.ics.uci.edu/mlearn/MLRepository.html

Contributions

Qingling Wang and Jian Zheng proposed the method and wrote the manuscript. Qingling Wang and Wenjing Zhang designed and analyzed the experiments. Qingling Wang implemented the source code.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

References

1. Zhu, Wang, Li, et al., Geometric structural ensemble learning for imbalanced problems[J], IEEE Transactions on Cybernetics 50(4) (2020), 1617–1629.

2. Nan Wang, Ruozhou Liang, Xibin Zhao, et al., Cost-Sensitive Hypergraph Learning With F-Measure Optimization[J], IEEE Transactions on Cybernetics 3 (2021), 1–12.

3. Malhotra and Kamal, An empirical study to investigate oversampling methods for improving software defect prediction using imbalanced data[J], Neurocomputing 343 (2019), 120–140.

4. Zhang, Li, Jia X.D., et al., Machinery fault diagnosis with imbalanced data using deep generative adversarial networks[J], Measurement 152 (2020), 1–18.

5. Fernández, García, Galar, et al., Learning from Imbalanced Data Sets[M], Springer, 2018.

6. Haixiang, Yijing, Shang, et al., Learning from class-imbalanced data: Review of methods and applications[J], Expert Systems with Applications 73 (2017), 220–239.

7. Schleif F.M. and Tino, Indefinite core vector machine[J], Pattern Recognition 71 (2017), 187–195.

8. Dong, Liu, et al., Varying coefficient support vector machines[J], Statist Probab Lett 132 (2018), 107–115.

9. Jian Zheng, Hongchun Qu, Zhaoni Li, et al., An irrelevant attributes resistant approach to anomaly detection in high-dimensional space using a deep hyper sphere structure[J], Applied Soft Computing 116 (2022), 1–20.

10.

Jian Zheng , Hongchun Qu , Zhaoni Li , et al., A deep hypersphere approach to high-dimensional anomaly detection[J], Applied Soft Computing 125 (2022), 1–17.

11.

Nekooeimehr

and Lai Yuen

S.K.

, Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets[J], Expert System Application 46 (2016), 405–416.

12. Xiaojie Li, Jiancheng Lv and Zhang Yi, An Efficient Representation-Based Method for Boundary Point and Outlier Detection[J], IEEE Transactions on Neural Networks and Learning Systems 29(1) (2018), 51–62.

13. Lin Feng, Huibing Wang, Bo Jin, et al., Le Wang, Learning a Distance Metric by Balancing KL-Divergence for Imbalanced Datasets[J], IEEE Transactions on Systems, Man, and Cybernetics: Systems 49(12) (2019), 2384–2395.

14. Douzas, Bacao and Last, Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE[J], Information Sciences 465 (2018), 1–20.

15. Douzas and Bacao, Geometric SMOTE: a geometrically enhanced drop-in replacement for SMOTE[J], Information Sciences 501 (2019), 118–135.

16. Mathew, Pang C.K., Luo, et al., Classification of imbalanced data by oversampling in kernel space of support vector machines[J], IEEE Transactions on Neural Networks and Learning Systems 29(9) (2018), 4065–4076.

17. Raghuwanshi B.S. and Shukla, SMOTE based class-specific extreme learning machine for imbalanced learning[J], Knowledge-Based Systems 187 (2020), 1–15.

18. Tao, Li, Ren, et al., Real-value negative selection over-sampling for imbalanced data set learning[J], Expert Systems with Applications 129 (2019), 118–134.

19. Tao, Li, Guo, et al., Adaptive weighted over-sampling for imbalanced datasets based on density peaks clustering with heuristic filtering[J], Information Sciences 519 (2020), 43–73.

20. Maldonado, López and Vairetti, An alternative SMOTE oversampling strategy for high-dimensional datasets[J], Applied Soft Computing 76 (2019), 380–389.

21. Kovács, An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets[J], Applied Soft Computing 83 (2019), 1–15.

22. Elreedy and Atiya A.F., A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class imbalance[J], Information Sciences 505 (2019), 32–64.

23. Fernandez, Garcia, Herrera, et al., SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary[J], J Artif Intell Res 61 (2018), 863–905.

24. Yang Z.M., Wu H.J., Li C.N., et al., Least squares recursive projection twin support vector machine for multi-class classification[J], International Journal of Machine Learning and Cybernetics 7(3) (2016), 411–426.

25. …, Yang and Pan, A novel twin support-vector machine with pinball loss[J], IEEE Transactions on Neural Networks and Learning Systems 28(2) (2017), 359–370.

26. Goodfellow, Pouget-Abadie, Mirza, et al., Generative adversarial nets[C], Proc Adv Neural Inf Process Syst (1) (2014), 2672–2680.

27. Odena, Olah and Shlens, Conditional image synthesis with auxiliary classifier GANs[C], Proc Int Conf Mach Learn 70 (2017), 2642–2651.

28. Ali-Gombe and Elyan, MFC-GAN: Class-imbalanced dataset classification using multiple fake class generative adversarial network[J], Neurocomputing 361 (2019), 212–221.

29. Junhai Zhai, Jiaxing Qi and Sufang Zhang, Binary Imbalanced Data Classification Based on Modified D2GAN Oversampling and Classifier Fusion[J], IEEE Access 8 (2020), 169456–169469.

30. Uğur Erkan, A Precise and Stable Machine Learning Algorithm: Eigenvalue Classification (EigenClass)[J], Neural Computing and Applications 33(10) (2021), 5381–5392.

31. Junhai Zhai, Jiaxing Qi and Sufang Zhang, Imbalanced data classification based on diverse sample generation and classifier fusion[J], International Journal of Machine Learning and Cybernetics 13(3) (2021), 735–750.

32. Bhagat Singh Raghuwanshi and Sanyam Shukla, Generalized class-specific kernelized extreme learning machine for multiclass imbalanced learning[J], Expert Systems with Applications 121 (2019), 244–255.

33. Sevakula R.K. and Verma N.K., Compounding general purpose membership functions for fuzzy support vector machine under noisy environment[J], IEEE Transactions on Fuzzy Systems 25(6) (2017), 1446–1459.

34. Jie Liu, Fuzzy support vector machine for imbalanced data with borderline noise[J], Fuzzy Sets and Systems 413(15) (2021), 64–73.

35. Xiaoqing Gu, Tongguang Ni and Hongyuan Wang, New Fuzzy Support Vector Machine for the Class Imbalance Problem in Medical Datasets Classification[J], Scientific World Journal, Special Issue, 2014:1–12.

36. Xiaohong Fan and Zongyao He, A Fuzzy Support Vector Machine for Imbalanced Data Classification[C], 2010 International Conference on Optoelectronics and Image Processing, 2010, pp. 11–14, Haikou, China.

37. Baihua Chen, Yuling Fan, Weiyao Lan, Jinghua Liu, Chao Cao and Yunlong Gao, Fuzzy support vector machine with graph for classifying imbalanced datasets[J], Neurocomputing 514 (2022), 296–312.

38. Memiş, Enginoğlu and Erkan, Fuzzy Parameterized Fuzzy Soft k-Nearest Neighbor Classifier[J], Neurocomputing 500 (2022), 351–378.

39. Salim Rezvani, Xizhao Wang and Farhad Pourpanah, Intuitionistic Fuzzy Twin Support Vector Machines[J], IEEE Transactions on Fuzzy Systems 27(11) (2019), 2140–2151.

40. Yotam Hechtlinger, Barnabás Póczos and Larry Wasserman, Cautious deep learning[J], arXiv preprint arXiv:1805.09460, 2019.

41. Smola A.J., Learning with kernels[M], Technical University of Berlin, 1998.

42. Duda D.S.R.O. and Hart P.E., Pattern classification[M], Springer, 2001.

43. Blake C.L. and Merz C.J., UCI Repository of Machine Learning Databases, Department of Information and Computer Science, 1998. [Online]. Available: https://www.ics.uci.edu/mlearn/MLRepository.html