A novel twin-support vector machines method for binary classification to imbalanced data

Abstract

Most existing classifiers are better at identifying the majority class and tend to ignore the minority class, which degrades classification performance. Binary classification of imbalanced data is therefore challenging. To address this, this paper proposes a novel twin-support vector machine method. The idea is that the majority class and the minority class are found by two support vector machines, respectively. A new kernel is derived to improve the learning ability of the two support vector machines. Results show that the proposed method outperforms competing methods in classification performance and in the ability to find the minority class. Classifiers based on twin architectures have advantages over classifiers based on a single architecture in classification ability. We demonstrate that the complexity of the imbalanced data distribution has negative effects on classification results, whereas better classification results and the desired boundaries can be obtained by optimizing the kernel.

Keywords

Binary classification, imbalanced data, support vector machine

1 Introduction

Data classification is an important task in the field of data mining. Currently, data classification has been successfully used in image recognition, medical diagnosis, digit recognition, etc. Usually, classification includes binary classification and multi-class classification. Although there are many mature methods and models in classification research, these methods and classifiers perform best on balanced distributions or assume that the data are balanced. Their classification performance may drop once they face an imbalanced data distribution.

In applications, however, data are usually imbalanced. For example, in electricity consumption data, the proportion of normal consumption records is much larger than that of abnormal consumption records. Such datasets are often referred to as imbalanced datasets. Imbalanced datasets are composed of majority classes and minority classes. Since the number of minority class instances is much smaller than that of majority class instances [1], minority class instances can hide in any subspace, which makes them difficult to discover [2, 3]. Most existing classifiers are better at finding the majority class and tend to ignore the minority class, which results in classifier degradation. Furthermore, class imbalance may cause the minority class to be masked by the majority class [4, 5], making it difficult to find suitable classification regions that distinguish minority classes from majority classes. Consequently, binary classification on imbalanced data is a tough task.

Several efforts have been made to address the classification of imbalanced data, such as (i) resampling-based methods, which apply under-sampling and oversampling techniques to balance classes; (ii) cost-sensitive learning-based methods, which focus on the costs associated with misclassifying samples; and (iii) ensemble learning-based methods, which improve classification accuracy by combining several classifiers. Additionally, methods based on deep network architectures are also used to handle imbalanced classification because of their high learning capacity. However, these classification methods tend to favor majority classes in order to gain a high classification score. Clearly, a classification result obtained in this way is unfair to the minority class.

The main contributions of this paper are summarized as follows.

A novel twin-support vector machines method is proposed for binary classification on imbalanced data, and a new kernel is derived.

For imbalanced data, the complexity of the data distribution has negative effects on classification results. However, by optimizing the kernel, better classification results can be obtained and the desired boundaries can be learned.

The rest of this paper is organized as follows. Section 2 summarizes the related work. Section 3 presents the proposed model, including the model implementation, the derivation of the kernel and the model training. Experiments are given in Section 4. Section 5 presents and discusses the results. Section 6 draws conclusions and outlines future work.

2 Related work

For a binary classification task, classifiers pay more attention to the accuracy of the majority class and tend to classify unknown instances into the majority class [6]. Such behavior can improve the overall classification accuracy, but it pays little attention to minority class instances or may ignore them altogether. In fact, this is unreasonable for imbalanced data, since finding the minority class is often more valuable. For example, when identifying cancer cells and normal cells in the medical field, the cancer cells (the minority class) are generally of greater concern than the normal cells (the majority class). Usually, classification methods for imbalanced data can be divided into sampling technique-based methods and machine learning-based methods [7].

Sampling technique-based methods include under-sampling techniques and oversampling techniques. Under-sampling techniques balance the classes by reducing the number of majority class instances. The principle is to randomly discard majority class instances so as to balance the numbers of majority class and minority class instances [8]. However, random discarding may disrupt the distribution of the data. To reduce the classification loss caused by random discarding, the authors of [9] used a clustering method to extract features of the data and then trained the classifier on these extracted features. Unfortunately, data features may still be lost. The authors of [10] selected majority class samples and minority class samples by calculating sample sensitivity. The authors of [11] used the neighborhood density to estimate the number of classified samples. The methods in [10] and [11] compensate for the classification loss caused by random discarding, but they still do not focus on minority classes. Unlike under-sampling techniques, oversampling techniques balance class distributions by replicating minority class instances until their number reaches that of the majority class instances. For example, the oversampling techniques in [12–15] achieve good classification results, but they can only learn on specific decision regions of the replicated data, which limits the learning ability of classifiers.
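As an illustration of the two resampling principles described above, the following is a minimal sketch of random under-sampling and random oversampling by replication. It is not the implementation used in [8] or [12–15]; the function names, the toy data and the label convention (+1 for the majority class, -1 for the minority class) are assumptions made for this example.

```python
import numpy as np

def random_undersample(X, y, majority_label, minority_label, rng):
    """Randomly discard majority instances until both classes have equal size."""
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y == minority_label)
    keep_maj = rng.choice(maj_idx, size=min_idx.size, replace=False)
    idx = np.concatenate([keep_maj, min_idx])
    return X[idx], y[idx]

def random_oversample(X, y, majority_label, minority_label, rng):
    """Replicate minority instances (with replacement) up to the majority size."""
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y == minority_label)
    extra = rng.choice(min_idx, size=maj_idx.size - min_idx.size, replace=True)
    idx = np.concatenate([maj_idx, min_idx, extra])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))             # toy features (assumption)
y = np.array([1] * 50 + [-1] * 10)       # toy 5 : 1 imbalance
X_u, y_u = random_undersample(X, y, 1, -1, rng)
X_o, y_o = random_oversample(X, y, 1, -1, rng)
```

Under-sampling keeps only 10 of the 50 majority points here, which shows how random discarding can remove information about the majority distribution, while oversampling only repeats the 10 existing minority points and adds no new minority information.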

Machine learning-based methods. A typical representative is the SVM (Support Vector Machine). SVM has an outstanding advantage, i.e., its optimal solution, once obtained, is unique. Unfortunately, SVM-based architectures suffer from being able to learn only on specific decision regions [16]. To address this, improved models based on SVM architectures have been proposed, e.g., TPMSVM [17], THSVM [18] and VTHSVM [18]. These improved models have achieved great success in classification ability. For nonlinear cases, however, they have to reconstruct the optimization problem by considering the surface generated by the kernel [19], which increases the computational complexity and makes it difficult to maintain the precision of solutions. In addition, methods based on neural network architectures are also commonly used for the classification of imbalanced data, such as feedback neural network methods. Gradient descent in neural networks easily falls into a local optimum, so neural network structures that are not further optimized do not easily achieve ideal classification results. By constructing a loss function (which can be regarded as an optimization), Lee et al. [20] improved the classification ability of neural networks.

The motivation of this paper is to implement binary classification for imbalanced data. To this end, this paper proposes a novel twin-support vector machines method. The idea of the method is to construct two support vector machines on imbalanced data, of which one finds majority classes and the other finds minority classes. Furthermore, a kernel is derived to enhance the learning ability of the two support vector machines.

3 Methodology

3.1 Preliminary

For a binary classification task, SVM obtains the maximum classification margin by using a separating hyperplane. Given a dataset D = {(x₁, y₁), . . . , (x_m, y_m)} with $x_{i} \in R^{m}$ and $y_{i} \in \{-1, 1\}$, $i = 1, 2, \dots, m$, where $R^{m}$ is the Euclidean space, SVM can be defined as follows $\begin{matrix} \min_{w, b} \frac{1}{2} \| w \|^{2} \\ s.t. \ y_{i} (w^{T} x_{i} + b) \geq 1 \ (i = 1, 2, \dots, m) \end{matrix}$ (1)

where b is the bias and w is the weight vector. Equation (1) can be converted into Equation (2) by using the Lagrangian function. $\begin{matrix} f(x) = w^{T} x + b \\ = \sum_{i = 1}^{m} \alpha_{i} y_{i} x_{i}^{T} x + b \end{matrix}$ (2)

where α_i > 0 is the Lagrange multiplier. By introducing a mapping function, Equation (2) can be converted into the kernel form of the dual quadratic programming problem $\begin{matrix} f(x) = w^{T} \varphi(x) + b \\ = \sum_{i = 1}^{m} \alpha_{i} y_{i} \varphi(x_{i})^{T} \varphi(x) + b \\ = \sum_{i = 1}^{m} \alpha_{i} y_{i} \kappa(x_{i}, x) + b \end{matrix}$ (3)

where φ(·) is a mapping function and κ(x_i, x_j) = φ(x_i)^T φ(x_j) is a kernel function.
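To make the kernel form of the decision function in Equation (3) concrete, the following minimal sketch evaluates f(x) as a weighted sum of kernel values plus the bias. The Gaussian (RBF) kernel is used only as a stand-in for κ, and the support vectors, multipliers and bias are hand-picked hypothetical values rather than the solution of any optimization problem.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel, used here only as a stand-in for kappa."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision_value(x, support_x, support_y, alpha, b, kernel=rbf_kernel):
    """f(x) = sum_i alpha_i * y_i * kappa(x_i, x) + b, the kernel form in Equation (3)."""
    total = sum(a * yi * kernel(xi, x)
                for a, yi, xi in zip(alpha, support_y, support_x))
    return total + b

# Hypothetical support vectors and multipliers, chosen by hand for illustration.
support_x = np.array([[0.0, 0.0], [2.0, 2.0]])
support_y = np.array([1, -1])
alpha = np.array([1.0, 1.0])
b = 0.0

f_near_pos = decision_value(np.array([0.1, 0.0]), support_x, support_y, alpha, b)
f_near_neg = decision_value(np.array([2.0, 1.9]), support_x, support_y, alpha, b)
```

A point near the positive support vector receives a positive value and a point near the negative one receives a negative value, so the sign of f(x) gives the predicted label.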

To describe the proposed method in detail, some definitions and lemmas are given in advance.

Definition 1. A binary classification task. Let χ = {x₁, x₂, . . . , x_i} be an imbalanced dataset with $x_{i} \in R^{m}$. A binary classification task is to train a model Ω on dataset χ and to reinforce the learning of class labels by using Ω. Formally, Ω : χ → Y, where the output domain Y = {P₊₁, N₋₁} consists of the majority class label P₊₁ and the minority class label N₋₁. Binary classification for imbalanced data is achieved by learning these class labels.

Lemma 1 [21]. Mercer's theorem states that any symmetric positive semi-definite function can be used as a kernel function.

Lemma 2 [22, 23]. Let χ be a nonempty set. A kernel $f : (\chi \times \chi) \to R$ is called a positive definite kernel if f is symmetric and $\sum_{i, j = 1}^{n} c_{i} c_{j} f(x_{i}, x_{j}) \geq 0$ for all n ∈ N, {x₁, . . . , x_n} ⊆ χ and $\{c_{1}, \dots, c_{n}\} \subseteq R$.

Lemma 3 [23, 24]. A well-known closure property of p.d. (positive definite) kernels on a nonempty set: if two kernels K₁ and K₂ are positive definite kernels, then so is K₁ ∘ K₂, and therefore so is $K_{1}^{n}$ for all n ∈ N.
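The condition in Lemma 2 and the closure property in Lemma 3 can be checked numerically on a finite sample. The sketch below assumes that ∘ denotes the pointwise product of two kernels, builds Gram matrices for a linear kernel, a Gaussian kernel and their product, and verifies that each matrix is symmetric with no eigenvalue below a small negative tolerance. The kernels, the sample and the tolerance are choices made for this example only.

```python
import numpy as np

def gram(kernel, X):
    """Gram matrix K[i, j] = kernel(x_i, x_j) on the sample X."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def is_psd(K, tol=1e-8):
    """True if K is symmetric and has no eigenvalue below -tol."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

def k_lin(a, b):
    return float(a @ b)

def k_rbf(a, b):
    return float(np.exp(-np.sum((a - b) ** 2)))

def k_prod(a, b):
    # Pointwise product of two p.d. kernels (assumed reading of K1 o K2).
    return k_lin(a, b) * k_rbf(a, b)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
results = {name: is_psd(gram(k, X))
           for name, k in [("linear", k_lin), ("rbf", k_rbf), ("product", k_prod)]}
```

Such a check on one sample cannot prove positive definiteness, but a negative eigenvalue well below the tolerance would be enough to rule a candidate kernel out.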

3.2 The proposed model

3.2.1 Model implementation

Using Equation (1), two support vector machines H(+) and H(-) are created to find the majority class and the minority class, respectively. The proposed model, namely TW-SVM, is given in Equations (4) and (5) $\begin{matrix} H(+) = \min_{w_{+1}, b_{+1}} \frac{1}{2} \sum (w_{+1}^{T} x_{i} + b_{+1})^{2} \\ s.t. \ -(w_{+1}^{T} x_{i} + b_{+1}) \geq 1 \end{matrix}$ (4) $\begin{matrix} H(-) = \min_{w_{-1}, b_{-1}} \frac{1}{2} \sum (w_{-1}^{T} x_{j} + b_{-1})^{2} \\ s.t. \ w_{-1}^{T} x_{j} + b_{-1} \geq 1 \end{matrix}$ (5)

Equations (4) and (5) can be converted by introducing the Lagrange multipliers α_i > 0 and α_j > 0, as follows $\begin{matrix} \max \sum \alpha_{i} - \frac{1}{2} \sum \alpha_{i_{1}} \alpha_{i_{2}} \delta_{i_{1}}^{T} (\sum \delta_{i} \delta_{i}^{T})^{-1} \delta_{i_{2}} \\ s.t. \ 0 \leq \alpha_{i} \end{matrix}$ (6) $\begin{matrix} \max \sum \alpha_{j} - \frac{1}{2} \sum \alpha_{j_{1}} \alpha_{j_{2}} \delta_{j_{1}}^{T} (\sum \delta_{j} \delta_{j}^{T})^{-1} \delta_{j_{2}} \\ s.t. \ 0 \leq \alpha_{j} \end{matrix}$ (7)

where $\delta_{h} = (x_{h}^{T}, 1)^{T}$. Similarly, Equations (6) and (7) can be converted into a dual quadratic programming form. $f(w_{+1}^{T}, b_{+1}) = - \sum \alpha_{i} \delta_{i}^{T} (\sum \delta_{j} \delta_{j}^{T})^{-1}$ (8) $f(w_{-1}^{T}, b_{-1}) = \sum \alpha_{j} \delta_{j}^{T} (\sum \delta_{i} \delta_{i}^{T})^{-1}$ (9)
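To give a feel for how two class-specific hyperplanes can be fitted, the sketch below uses the least-squares twin SVM as a stand-in. This is a swapped-in technique, not the solver of this paper: it replaces the inequality constraints with squared slack penalties so that each plane has a closed-form solution. The augmented rows play the role of δ_h = (x_h^T, 1)^T, while the penalty weights c1 and c2, the ridge term eps and the toy data are assumptions of this example.

```python
import numpy as np

def fit_ls_twin_planes(A, B, c1=1.0, c2=1.0, eps=1e-8):
    """Least-squares twin SVM (a stand-in, not the solver of the paper).

    A holds the points of class +1 and B the points of class -1.
    Returns ((w_pos, b_pos), (w_neg, b_neg)).
    """
    E = np.hstack([A, np.ones((A.shape[0], 1))])   # rows are (x_i^T, 1) for class +1
    F = np.hstack([B, np.ones((B.shape[0], 1))])   # rows are (x_j^T, 1) for class -1
    ridge = eps * np.eye(E.shape[1])               # small term for numerical stability
    ones_A = np.ones(A.shape[0])
    ones_B = np.ones(B.shape[0])
    # H(+): minimise ||E u||^2 / 2 + c1 * ||F u + 1||^2 / 2 (near A, value -1 on B).
    u = -np.linalg.solve(F.T @ F + E.T @ E / c1 + ridge, F.T @ ones_B)
    # H(-): minimise ||F v||^2 / 2 + c2 * ||1 - E v||^2 / 2 (near B, value +1 on A).
    v = np.linalg.solve(E.T @ E + F.T @ F / c2 + ridge, E.T @ ones_A)
    return (u[:-1], u[-1]), (v[:-1], v[-1])

def plane_distance(X, w, b):
    """Perpendicular distance of each row of X to the plane w^T x + b = 0."""
    return np.abs(X @ w + b) / np.linalg.norm(w)

rng = np.random.default_rng(0)
A = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))   # toy majority class
B = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(20, 2))    # toy minority class
(w_pos, b_pos), (w_neg, b_neg) = fit_ls_twin_planes(A, B)
```

Each plane ends up close to the points of its own class and roughly a unit functional margin away from the other class, which mirrors the intent of the objectives and constraints in Equations (4) and (5).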

Using Equation (3), Equations (8) and (9) are converted into kernel function form, giving $\begin{matrix} H(+) = \min_{w_{+1}, b_{+1}} \frac{1}{2} \sum_{i \in P_{+1}} (\sum_{k \in (P_{+1} \cup N_{-1})} w_{+1} \kappa(x_{k}, x_{i}) + b_{+1})^{2} \\ s.t. \ - \sum_{k \in (P_{+1} \cup N_{-1})} w_{+1} \kappa(x_{k}, x_{i}) - b_{+1} \geq 1 \end{matrix}$ (10) $\begin{matrix} H(-) = \min_{w_{-1}, b_{-1}} \frac{1}{2} \sum_{j \in N_{-1}} (\sum_{k \in (P_{+1} \cup N_{-1})} w_{-1} \kappa(x_{k}, x_{j}) + b_{-1})^{2} \\ s.t. \ \sum_{k \in (P_{+1} \cup N_{-1})} w_{-1} \kappa(x_{k}, x_{j}) + b_{-1} \geq 1 \end{matrix}$ (11)

The decision function of TW-SVM is given in Equation (12), where ‖·‖₂ is the 2-norm. $\begin{matrix} F(x) = \arg\min \{ s_{+1}(H(+)), s_{-1}(H(-)) \} \\ s_{+1}(H(+)) = \frac{w_{+1}^{T} x + b_{+1}}{\| w_{+1} \|_{2}} \\ s_{-1}(H(-)) = \frac{w_{-1}^{T} x + b_{-1}}{\| w_{-1} \|_{2}} \end{matrix}$ (12)

An unknown point x is evaluated using the decision function F(x) in Equation (12). According to the value of F(x), it is determined whether x should be classified into the majority class or the minority class. For instance, if point x is closer to support vector machine H(+) in Equation (10), it is classified into the majority class. Similarly, if point x is closer to support vector machine H(-) in Equation (11), it is classified into the minority class.
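The nearest-plane rule described above can be sketched for the linear case as follows. Equation (12) is written without an absolute value; the sketch takes the absolute value so that each score is a distance, which is an assumption of this example. The two planes and the query points are hypothetical values chosen so that the result can be checked by hand.

```python
import numpy as np

def twin_plane_predict(X, plane_pos, plane_neg):
    """Return +1 where a point is nearer to H(+), otherwise -1."""
    (w_p, b_p), (w_n, b_n) = plane_pos, plane_neg
    s_pos = np.abs(X @ w_p + b_p) / np.linalg.norm(w_p)   # distance to H(+)
    s_neg = np.abs(X @ w_n + b_n) / np.linalg.norm(w_n)   # distance to H(-)
    return np.where(s_pos <= s_neg, 1, -1)

# Hypothetical planes: H(+) is the line x1 = 0 and H(-) is the line x1 = 4.
plane_pos = (np.array([1.0, 0.0]), 0.0)
plane_neg = (np.array([1.0, 0.0]), -4.0)

X_new = np.array([[0.5, 1.0], [3.5, -2.0], [1.9, 0.0]])
labels = twin_plane_predict(X_new, plane_pos, plane_neg)
```

The first and third points lie nearer to the line x1 = 0 and receive the majority label +1, while the second lies nearer to x1 = 4 and receives the minority label -1.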

Algorithm 1. Training of the model

3.2.2 Derivation of the kernel

The learning ability of TW-SVM depends on the kernel κ(,) in Equations (10) and (11), which is used to separate the majority class and the minority class. Here, the kernel κ(,) is derived according to Mercer's theorem (see Lemma 1 in Section 3.1), which states that any symmetric positive semi-definite function can be used as a kernel function. Therefore, a positive definite kernel (see Lemma 2 in Section 3.1) is considered, obtained by calculating the cumulative distribution function [25], as follows $\kappa_{cdf} = (1 - (1 - x_{1}^{\gamma_{1}})^{\gamma_{2}}, 1 - (1 - x_{2}^{\gamma_{1}})^{\gamma_{2}} \mid \gamma_{1}, \gamma_{2})$ (13)

where γ₁ and γ₂ are the non-negative kernel parameters.
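The expression 1 - (1 - x^γ₁)^γ₂ applied to each input in Equation (13) has the form of the Kumaraswamy cumulative distribution function on [0, 1]. The sketch below evaluates this warping elementwise, assuming that the inputs have been scaled to [0, 1]; the function name and the parameter values are chosen for illustration only.

```python
import numpy as np

def cdf_warp(x, gamma1, gamma2):
    """Map x in [0, 1] to 1 - (1 - x**gamma1)**gamma2, elementwise."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)   # assumes inputs scaled to [0, 1]
    return 1.0 - (1.0 - x ** gamma1) ** gamma2

grid = np.linspace(0.0, 1.0, 5)
identity = cdf_warp(grid, 1.0, 1.0)     # gamma1 = gamma2 = 1 leaves the inputs unchanged
stretched = cdf_warp(grid, 0.5, 1.0)    # gamma1 < 1 expands the region near 0
```

With γ₁ = γ₂ = 1 the warping is the identity, while other settings stretch some parts of the input range and compress others, which changes how distances between points are seen by a kernel applied afterwards.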

In addition, the Matern52 kernel in [41] is also considered, which is a continuous positive definite kernel, as follows

$\kappa_{Matern52} = \varphi_{k} (1 + \sqrt{k_{1} r^{2}(x_{1}, x_{2})} + k_{2} r^{2}(x_{1}, x_{2})) \exp\{- \sqrt{k_{3} r^{2}(x_{1}, x_{2})}\}$ (14)

where φ_k and r² are the kernel parameter and the squared kernel radius, respectively, and k₁, k₂ and k₃ are constants. The Matern52 kernel allows the radius to be warped in a concave and non-decreasing way so that more areas with small radii can be observed [25, 26]. This implies that local regions containing minority classes can be better explored. Consequently, the kernel κ(,) of TW-SVM consists of κ_cdf in Equation (13) and κ_Matern52 in Equation (14), as follows $\kappa(,) = \kappa_{cdf} \circ \kappa_{Matern52}$ (15)

Both κ_cdf and κ_Matern52 are positive definite kernels. According to the well-known closure property of p.d. kernels (see Lemma 3 in Section 3.1), the derived kernel κ(,) is also a positive definite kernel.
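As a rough illustration of how the two ingredients of Equation (15) could fit together, the sketch below evaluates a Matern 5/2 kernel on CDF-warped inputs. This is only one possible reading of the composition ∘ (a pointwise product of two kernels is another), and it is not claimed to be the exact kernel used in the experiments. The standard Matern 5/2 constants k₁ = k₃ = 5 and k₂ = 5/3, the unit amplitude and length scale, and the warping parameters are all assumptions of this example.

```python
import numpy as np

def matern52(x1, x2, amplitude=1.0, length_scale=1.0):
    """Matern 5/2 kernel with the standard constants (k1 = k3 = 5, k2 = 5/3)."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    r2 = np.sum(diff ** 2) / length_scale ** 2           # squared radius
    root = np.sqrt(5.0 * r2)
    return amplitude * (1.0 + root + 5.0 * r2 / 3.0) * np.exp(-root)

def warp(x, gamma1, gamma2):
    """CDF-style warping of inputs assumed to lie in [0, 1]."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    return 1.0 - (1.0 - x ** gamma1) ** gamma2

def warped_matern52(x1, x2, gamma1=0.5, gamma2=1.0):
    """Matern 5/2 evaluated on warped inputs (one assumed reading of Equation (15))."""
    return matern52(warp(x1, gamma1, gamma2), warp(x2, gamma1, gamma2))

rng = np.random.default_rng(0)
X = rng.uniform(size=(15, 2))                            # toy inputs in [0, 1]
K = np.array([[warped_matern52(a, b) for b in X] for a in X])
min_eig = np.linalg.eigvalsh(K).min()                    # numerical p.d. check on this sample
```

Applying a positive definite kernel to transformed inputs keeps it positive definite, so the Gram matrix K on this sample should be symmetric with no meaningfully negative eigenvalue.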

3.2.3 Model training

Algorithm 1 displays the training of TW-SVM. Given the training set Train_set as input, TW-SVM outputs the training accuracy TrainAcc(q_max) together with the learned majority class labels {P₊₁, …, P₊₁} and minority class labels {N₋₁, …, N₋₁} once it is well trained. Each point x_i in Train_set is evaluated using F(x_i) in Equation (12), and then, according to the result R(x_i, q), point x_i is classified into the corresponding class, as illustrated in Steps 3 to 15. Steps 6 to 9 show that if R(x_i, q) is closer to H(+) in Equation (10), point x_i is classified into the majority class and receives a majority class label P₊₁. Otherwise, point x_i is classified into the minority class and receives a minority class label N₋₁, as illustrated in Steps 10 to 14. Thereafter, the training accuracy TrainAcc(q) is obtained in Step 16. The whole procedure from Step 1 to Step 17 shows that TW-SVM is trained iteratively until it converges. Finally, TW-SVM outputs the maximum training accuracy and the learned class labels, as illustrated in Steps 18 and 19.

4 Experiments

4.1 Datasets

To test the proposed TW-SVM, two imbalanced datasets were synthesized, as shown in Table 1. For both synthesized datasets S1 and S2, the imbalance ratio (IR) of majority to minority class points is 5 : 1. Dataset S1 contains 300 training points and 300 testing points. The points in the majority class and in the minority class were randomly generated from the normal distributions N(0, 0.1) and N(0, 0.5), respectively, and the overlap of the two classes is less than 30%, as illustrated in Fig. 1(a). Dataset S2 contains 350 training points and 350 testing points. The points in the majority and minority classes were generated from a mixture of Gaussian distributions, and the overlap of the two classes reaches about 90%, as shown in Fig. 1(b).
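A dataset with the same size, dimension and imbalance ratio as S1 can be generated along the following lines. The sketch is not the original generation script. It assumes that the second argument of N(0, 0.1) and N(0, 0.5) is a standard deviation applied independently to each of the two dimensions, that the majority class carries label +1, and that the training and testing sets are drawn with different random seeds.

```python
import numpy as np

def make_s1_like(n_points=300, ratio=5, seed=0):
    """Draw a 2-D imbalanced set with a majority : minority ratio of ratio : 1."""
    rng = np.random.default_rng(seed)
    n_min = n_points // (ratio + 1)      # 50 minority points for 300 points at 5 : 1
    n_maj = n_points - n_min             # 250 majority points
    X_maj = rng.normal(loc=0.0, scale=0.1, size=(n_maj, 2))   # scale read as std (assumption)
    X_min = rng.normal(loc=0.0, scale=0.5, size=(n_min, 2))
    X = np.vstack([X_maj, X_min])
    y = np.concatenate([np.ones(n_maj, dtype=int), -np.ones(n_min, dtype=int)])
    return X, y

X_train, y_train = make_s1_like(seed=0)
X_test, y_test = make_s1_like(seed=1)
```

Because both classes are centred at the origin, the minority points that fall inside the tight majority cluster are the ones a classifier is most likely to miss.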

Table 1
Synthetic datasets

Datasets	Training points	Testing points	Data dimension	IR
S1	300	300	2	5 : 1
S2	350	350	2	5 : 1

Fig. 1

Synthetic datasets. Yellow circles are marked as majority class points. Purple circles are marked as minority class points.

To further test the model, eight UCI datasets used in classification tasks were selected to cover different data dimensions and imbalance ratios. Detailed descriptions are given in Table 2.

Table 2

UCI datasets

Datasets	Training points	Testing points	Data dimension	IR
Banana	400	4900	2	5.11 : 1
Energy	615	153	8	15 : 1
Heart	170	100	13	1.25 : 1
Spect	214	53	44	3.94 : 1
Musk	380	96	166	1.30 : 1
Sonar	166	42	60	1.14 : 1
WDBC	400	169	34	3.21 : 1
Ionosphere	200	151	34	1.79 : 1

4.2 Competitors and assessment metrics

The proposed TW-SVM is a model based on SVM architectures. For a fair comparison, three models based on SVM architectures are used as competitors, i.e., TPMSVM [17], THSVM [18] and VTHSVM [18], with the default parameters given in the corresponding literature. The algorithms of the four models were implemented in Python on the TensorFlow framework under Linux, and all experiments were run using the same settings.

In terms of assessment metrics, Accuracy is used as an evaluation metric for the classification ability of the four models, as follows $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ (16)

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. The F1-score is also used as an evaluation metric, as follows $F1\text{-}score = \frac{2 TP}{2 TP + FP + FN}$ (17)

To compare the ability of the four models to find minority classes, the Sensitive metric is used, as follows $Sensitive = \frac{TP}{TP + FN}$ (18)
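The three metrics in Equations (16) to (18) can be computed directly from the confusion counts, as in the sketch below. It assumes that the minority class (label -1) is treated as the positive class when counting TP, TN, FP and FN. The toy labels describe a hypothetical classifier that always predicts the majority class on a 5 : 1 test set.

```python
def confusion_counts(y_true, y_pred, positive=-1):
    """Return (TP, TN, FP, FN) with the given label treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)            # Equation (16)

def f1_score(tp, tn, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0           # Equation (17)

def sensitive(tp, tn, fp, fn):
    denom = tp + fn
    return tp / denom if denom else 0.0               # Equation (18)

# Toy 5 : 1 test set and a classifier that always predicts the majority label.
y_true = [1] * 50 + [-1] * 10
y_pred = [1] * 60
counts = confusion_counts(y_true, y_pred)
acc, f1, sens = accuracy(*counts), f1_score(*counts), sensitive(*counts)
```

In this toy case Accuracy is 50/60 while both the F1-score and the Sensitive metric are 0, which illustrates why Accuracy alone is not sufficient on imbalanced data.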

5 Results and discussion

5.1 Results on the synthetic datasets

Results in Fig. 2 show that TW-SVM outperforms TPMSVM, THSVM and NHSVM in classification performance. On synthetic datasets S1 and S2, the classification accuracies of TW-SVM are 95.12% and 91.33%, respectively. TW-SVM therefore has clear classification advantages on the two synthetic datasets. Moreover, TW-SVM also outperforms the competitors on the F1-score metric. For the Sensitive metric, results in Fig. 3 show that the ability of TW-SVM to find minority classes is better than that of the three competitors. This implies that TW-SVM is better suited to binary classification on complex imbalanced datasets.

Fig. 2

Classification performance. The four models are evaluated using the Accuracy and F1-score metrics.

Fig. 3

Comparisons of finding minority classes. The four models are evaluated using the Sensitive metric.

Figure 4 visualizes the classification results. On synthetic datasets S1 and S2, TW-SVM obtains the desired classification results and also learns better classification boundaries. Together, the results obtained by TW-SVM on datasets S1 and S2 are better than those obtained by the three competing methods.

Fig. 4

Visualization of classification results. Yellow circles are marked as majority class points. Purple circles are marked as minority class points. Black lines are the boundaries learned by the four models.

5.2 Results on the real datasets

To eliminate the effects of randomness, we ran each model 100 times independently on the eight real datasets and took the average. Results in Table 3 show that TW-SVM obtains the best classification results on seven datasets, the exception being dataset Spect. In particular, on the highly imbalanced dataset Energy (IR = 15 : 1), TW-SVM has a clear advantage over the three competitors. On the high-dimensional datasets Musk (dimension = 166) and Sonar (dimension = 60), TW-SVM also outperforms the three competitors. On dataset Spect, the three competitors obtain a higher F1-score than TW-SVM and two of them obtain a slightly higher Accuracy, but the advantages are not significant. Overall, the classification ability of TW-SVM is better than that of the competing TPMSVM, THSVM and NHSVM on imbalanced datasets.

Table 3
Classification results on UCI datasets. The best results are highlighted using bold

Datasets	Metrics	TW-SVM	TPMSVM	THSVM	NTHSVM
Banana	Accuracy	0.9022±0.003	0.8911±0.001	0.8941±0.006	0.8834±0.016
	F1-score	0.8333	0.8022	0.8111	0.7889
Energy	Accuracy	0.8027±0.021	0.7509±0.002	0.7733±0.017	0.7683±0.022
	F1-score	0.8098	0.7643	0.7723	0.7882
Heart	Accuracy	0.8707±0.011	0.8433±0.028	0.8511±0.039	0.8655±0.022
	F1-score	0.8144	0.7339	0.8022	0.8038
Spect	Accuracy	0.7513±0.082	0.7504±0.021	0.7566±0.058	0.7577±0.029
	F1-score	0.7708	0.8107	0.7723	0.8082
Musk	Accuracy	0.9801±0.059	0.9637±0.021	0.9707±0.019	0.9667±0.023
	F1-score	0.9811	0.9633	0.9791	0.9762
Sonar	Accuracy	0.9777±0.009	0.9705±0.021	0.9702±0.012	0.9711±0.009
	F1-score	0.9788	0.9707	0.9709	0.9718
WDBC	Accuracy	0.9799±0.016	0.9005±0.012	0.9677±0.014	0.9552±0.016
	F1-score	0.9609	0.9200	0.9371	0.9364
Ionosphere	Accuracy	0.9433±0.039	0.9066±0.030	0.9233±0.042	0.9262±0.037
	F1-score	0.9500	0.9009	0.9016	0.9411

5.3 Discussion

Advantages. The results on the synthetic and UCI datasets indicate that the classification ability of TW-SVM exceeds that of the competitors TPMSVM, THSVM and NHSVM. This is because the decision function F(x) in Equation (12) determines whether an unknown point is closer to the SVM H(+) in Equation (10) or to the SVM H(-) in Equation (11), so that the SVM H(+) accurately learns majority class points and, similarly, the SVM H(-) accurately learns minority class points. More importantly, by using the derived kernel in Equation (15), minority class points that are hard to find are accurately explored and better classification boundaries are learned. As such, TW-SVM shows excellent ability in classifying imbalanced data.

Limitations. TW-SVM also has disadvantages. In the classification process, we do not consider the noise hidden in the datasets. Noise easily creates interference, and TW-SVM has a weakness in resisting it. This weakness is shared by the three competing models TPMSVM, THSVM and NHSVM, as well as by the models mentioned in Section 2. Indeed, handling noise interference is a tough task not only in data classification but also in other fields such as anomaly detection. The works [27] and [28] indicate that noise interference is the core issue in the curse of dimensionality for data classification.

6 Conclusion

This paper proposes a twin-support vector machines method for binary classification of imbalanced data. The principle is to construct two support vector machines H(+) and H(-), where the SVM H(+) learns majority class points and stays as far away as possible from minority class points. Similarly, the SVM H(-) learns minority class points and stays as far away as possible from majority class points. To improve the ability to find minority classes, a kernel was derived for the two support vector machines. Results show that the proposed method is superior to the competing classification methods in terms of classification ability and of finding minority classes in imbalanced datasets. Results also show that classifiers based on twin architectures have advantages over classifiers based on a single architecture. Although the complexity of the imbalanced data distribution has negative effects on classification results, better classification results can be obtained and the desired boundaries can be learned by optimizing the kernel. In future work, we will explore binary classification methods for imbalanced data in the presence of noise.

Acknowledgments

This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission under Grant KJQN20190240.

Declarations

Competing interests

The authors declare no conflicts of interest regarding this article.

Consent for publication

The authors agree with availability of data and material.

Contributions

Jingyi Li proposed the idea and wrote the original manuscript. Shiwei Chao designed and performed the experiments. Jingyi Li and Shiwei Chao analyzed the experimental results.

References

1. Yitian Xu, Maximum Margin of Twin Spheres Support Vector Machine for Imbalanced Data Classification[J], IEEE Transactions on Cybernetics 47(6) (2017), 1540–1550.

2. Tingting Zhou, Wei Liu, Congyu Zhou, et al., GAN-Based Semi-supervised For Imbalanced Data Classification[C], 2018 4th IEEE International Conference on Information Management, 2018:17–21.

3. Zhang, Gao, Song, et al., An unbalanced data classification algorithm of improved auto encoder neural network[C], Eighth International Conference on Advanced Computational Intelligence, 2016:95–99.

4. Xiaolin Huang, Johan A.K. Suykens, Shuning Wang, et al., Classification With Truncated L1 Distance Kernel[J], IEEE Transactions on Neural Networks and Learning Systems 29(5) (2018), 2025–2030.

5. Qing Ai and Anna Wang, A Novel Feature Weighted Twin-hypersphere Support Vector Machine for Pattern Recognition[C], IEEE 7th Data Driven Control and Learning Systems Conference, 2018:676–681.

6. Zhengwu Yuan and Pu Zhao, An Improved Ensemble Learning for Imbalanced Data Classification[C], IEEE 8th Joint International Information Technology and Artificial Intelligence Conference, 2019:408–411.

7. JianJun Cao, GuoJun Lv, Chen Chang, et al., A Feature Selection Based Serial SVM Ensemble Classifier[J], IEEE Access 7 (2019), 144516–144523.

8. Prusa, Khoshgoftaar T.M., Dittman D.J., et al., Using random under-sampling to alleviate class imbalance on tweet sentiment data[C], Proceedings of the 2015 IEEE International Conference on Information Reuse and Integration, 2015:197–202.

9. Yen and Lee, Cluster based under-sampling approaches for imbalanced data distributions[J], Expert Systems with Applications 36(3) (2009), 5718–5727.

10. W.W., Hu, Yeung D.S., et al., Diversified sensitivity based under-sampling for imbalance classification problems[J], IEEE Transactions on Cybernetics 45(11) (2017), 2402–2412.

11. Liu Yueting, Imbalanced dataset classification algorithm based on NDSVM[C], Journal of Physics: Conference Series 1871(1) (2021), 1–4.

12. Zhu Zhengyi, Satten G.A., Mitchell, et al., Constraining PERMANOVA and LDM to within-set comparisons by projection improves the efficiency of analyses of matched sets of microbiome data[J], Microbiome 9(133) (2021), 1–19.

13. Luo Zhengbo, Hamïd, Harish, et al., Dealing with Imbalanced Dataset Leveraging Boundary Samples Discovered by Support Vector Data Description[J], Computers, Materials & Continua 66(3) (2021), 2691–2708.

14. Amit and Chinmay, Early and accurate prediction of diabetics based on FCBF feature selection and SMOTE[J], International Journal of System Assurance Engineering and Management 136(8) (2021), 1–15.

15. Li Xingqiu, Jiang Hongkai, Liu Shaowei, et al., A unified framework incorporating predictive generative denoising autoencoder and deep Coral network for rolling bearing fault diagnosis with unbalanced data[J], Measurement 178 (2021), 1–1.

16. Erfani S.M., Rajasegarar, Karunasekera, et al., High-dimensional and large-scale anomaly detection using a linear one-class svm with deep learning[J], Pattern Recognition 58 (2016), 121–134.

17. Peng, TPMSVM: A novel twin parametric-margin support vector machine for pattern recognition[J], Pattern Recognition 44(10) (2011), 2678–2692.

18. Yitian Xu, Maximum Margin of Twin Spheres Support Vector Machine for Imbalanced Data Classification[J], IEEE Transactions on Cybernetics 47(6) (2017), 1–11.

19. Liming Liu, Maoxiang Chu, Rongfen Gong, et al., An Improved Nonparallel Support Vector Machine[J], IEEE Transactions on Neural Networks and Learning System 32(11) (2021), 5129–5143.

20. Lee, Kim J.Y., Lee M.H., et al., Imbalanced Loss-Integrated Deep-Learning-Based Ultrasound Image Analysis for Diagnosis of Rotator-Cuff Tear[J], Sensors 21(6) (2021), 1–19.

21. Degang Chen, Hengyou Wang and Eric C.C. Tsang, Generalized Mercer Theorem and its application to feature space related to indefinite kernels[C], International Conference on Machine Learning and Cybernetics, 2008:1–5.

22. Sadeep Jayasumana, Richard Hartley, Mathieu Salzmann, et al., Optimizing Over Radial Kernels on Compact Manifolds[C], Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014:3802–3809.

23. Schoenberg I.J., Positive Definite Functions on Spheres[M]. Springer, 1942.

24. Berg, Christensen J.P.R. and Ressel, Harmonic Analysis on Semigroups[M]. Springer, 1984.

25. Jayasumana, Hartley, Salzmann, et al., Kernel Methods on the Riemannian Manifold of Symmetric Positive Definite Matrices[C], In CVPR, 2013:1–1.

26. Snoek Jasper, Swersky, Ryan, et al., Input warping for bayesian optimization of non-stationary functions[C], International Conference on Machine Learning, 2014:1674–1682.

27. Jian Zheng, Hongchun Qu, Zhaoni Li, et al., A deep hypersphere approach to high-dimensional anomaly detection[J], Applied Soft Computing 125 (2022), 1–17.

28. Jian Zheng, Hongchun Qu, Zhaoni Li, et al., An irrelevant attributes resistant approach to anomaly detection in high-dimensional space using a deep hyper sphere structure[J], Applied Soft Computing 116 (2022), 1–20.