A novel second-order cone programming support vector machine model for binary data classification

Abstract

The support vector machine is a classification approach in machine learning. The second-order cone optimization formulation for the soft-margin support vector machine can ensure that the misclassification rate of data points does not exceed a given value. In this paper, a novel second-order cone programming formulation is proposed for the soft-margin support vector machine. The novel formulation uses the l₂-norm and two margin variables, one associated with each class, to maximize the margin. Two regularization parameters α and β are introduced to control the trade-off between the margin variables. Numerical results illustrate that the proposed second-order cone programming formulation for the soft-margin support vector machine has better prediction performance and robustness than the other second-order cone programming support vector machine models used in this article for comparison.

Keywords

Support vector machine, second-order cone programming, binary data classification

1 Introduction

The support vector machine (SVM), proposed by Cortes and Vapnik [1], is one of the most popular classification approaches in machine learning. In recent years, the support vector machine has developed rapidly in science and technology [2–5].

Second-order cone programming (SOCP) is a family of structured optimization problems that can be applied to the support vector machine, and several authors have studied second-order cone programming support vector machines. The hard-margin support vector machine has been written as a second-order cone programming support vector machine [6], which has received a lot of attention within the support vector machine community. A quantum algorithm for solving the second-order cone programming problem is proposed in [7] and applied to the second-order cone programming support vector machine. In addition, the second-order cone programming formulation has been extended to the twin support vector machine [8], which constructs two nonparallel classifiers by solving two quadratic chance-constrained programming problems. Second-order cone programming formulations can also be used for imbalanced data classification [9] and multi-class classification [10].

Cost-sensitive learning is an efficient classification approach that considers the misclassification cost of a classifier. In the cost-sensitive support vector machine [11], the total misclassification cost is divided into two parts, one for each class, and a different cost factor is used for each class. Moreover, in [12], the second-order cone programming formulation for the support vector machine uses two margin variables, one associated with each class, to maximize the margin and introduces a parameter that controls the trade-off between these margin variables.

The second-order cone programming soft-margin support vector machine formulation presented in [13] and [14] has as many second-order cone constraints as training samples, which makes the classification models computationally very expensive in terms of running time. This paper studies the second-order cone programming formulation for the soft-margin support vector machine. We divide the soft-margin errors into two parts, one for each class, and present a novel second-order cone programming soft-margin support vector machine formulation with only two second-order cone constraints. Compared with other second-order cone optimization methods for the soft-margin support vector machine, the novel model has fewer second-order cone constraints. The experimental results show that the proposed method gives better classification results than the compared classification methods.

The paper is structured as follows. In Section 2, we introduce the second-order cone programming formulation deduced from the hard-margin support vector machine. The new second-order cone programming support vector machine is presented in Section 3. Section 4 provides experimental results. Section 5 concludes the paper.

2 The second-order cone programming support vector machine

In this section, we will introduce the second-order cone programming support vector machine formulation [6, 8].

Consider a set of training points and their respective labels (x_i, y_i), i = 1, 2, …, m, where $x_{i} \in R^{n}$ and y_i ∈ {+1, -1}. The hard-margin support vector machine, developed by Cortes and Vapnik [1], aims to find an optimal hyperplane f (x) = w^Tx + b that separates the two classes by solving the following quadratic programming problem:

$\begin{matrix} min_{w, b} & \frac{1}{2} ∥ w ∥^{2} \\ s . t . & y_{i} (w^{T} x_{i} + b) \geq 1, & i = 1, \dots, m . \end{matrix}$ (1)

The second-order cone programming support vector machine constructs a maximum margin classifier such that the false positive and the false negative error rates do not exceed 1 - η₁ ∈ (0, 1) and 1 - η₂ ∈ (0, 1), respectively. Let the random vectors X₁ and X₂ be associated with the positive class and the negative class, respectively. Suppose the mean and covariance matrix of the random vector X_i are $μ_{i} \in R^{n}$ and $Σ_{i} \in R^{n \times n}$ , i = 1, 2. The hard-margin support vector machine can be written as the following quadratic chance-constrained programming problem:

$\begin{matrix} \min_{w, b} & \frac{1}{2} ∥ w ∥^{2} \\ \text{s.t.} & \Pr \{ w^{T} X_{1} + b \geq 1 \} \geq η_{1}, \\ & \Pr \{ w^{T} X_{2} + b \leq - 1 \} \geq η_{2}, \\ & X_{1} \sim (μ_{1}, Σ_{1}), \; X_{2} \sim (μ_{2}, Σ_{2}) . \end{matrix}$ (2)

The above formulation requires that each class be classified correctly with probability at least η_i, i = 1, 2. The probability constraints can be replaced with their robust counterparts [15], so the above quadratic chance-constrained programming problem can be written as:

$\begin{matrix} \min_{w, b} & \frac{1}{2} ∥ w ∥^{2} \\ \text{s.t.} & \inf_{X_{1} \sim (μ_{1}, Σ_{1})} \Pr \{ w^{T} X_{1} + b \geq 1 \} \geq η_{1}, \\ & \inf_{X_{2} \sim (μ_{2}, Σ_{2})} \Pr \{ w^{T} X_{2} + b \leq - 1 \} \geq η_{2} . \end{matrix}$ (3)

Then, applying the multivariate Chebyshev inequality [15], the following quadratic second-order cone programming problem can be deduced:

$\begin{matrix} \min_{w, b} & \frac{1}{2} ∥ w ∥^{2} \\ \text{s.t.} & w^{T} μ_{1} + b \geq 1 + κ_{1} ∥ S_{1}^{T} w ∥, \\ & - (w^{T} μ_{2} + b) \geq 1 + κ_{2} ∥ S_{2}^{T} w ∥, \end{matrix}$ (4)

where $κ_{i} = \sqrt{\frac{η_{i}}{1 - η_{i}}}$ , $Σ_{i} = S_{i} S_{i}^{T}, i = 1, 2$ .
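As a small numerical illustration of the definition of κ_i, the following Python sketch evaluates κ for a few values of η. The function name `kappa` is our own, and the η values are simply those used later in the grid of Section 4; this is an illustration, not part of the original method.

```python
import math


def kappa(eta):
    """Return sqrt(eta / (1 - eta)) for a required correct rate eta in (0, 1)."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie strictly between 0 and 1")
    return math.sqrt(eta / (1.0 - eta))


# kappa grows without bound as eta approaches 1, so a stricter
# correct-rate requirement tightens the SOC constraints in (4).
for eta in (0.2, 0.4, 0.6, 0.8):
    print(f"eta = {eta:.1f} -> kappa = {kappa(eta):.4f}")
```

Since κ multiplies the term ∥S_i^T w∥, larger values of η force the class means further away from the separating hyperplane relative to the spread of each class.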

A second-order cone (SOC) constraint on the variable $x \in R^{n}$ has the following form:

$∥ Ax + b ∥ \leq c^{T} x + d,$

where $A \in R^{n \times n}$ , $d \in R$ and $b, c \in R^{n}$ . Thus, a linear SOCP formulation with three SOC constraints can be deduced from formulation (4) by introducing a new variable t and the constraint ∥w ∥ ≤ t: the objective becomes linear in t, and the new constraint joins the two SOC constraints of formulation (4). A linear SOCP formulation can be solved by SOCP solvers such as the SeDuMi MATLAB toolbox [16].

3 A novel second-order cone programming support vector machine

A second-order cone programming formulation deduced from the soft-margin support vector machine is presented in papers [13, 14]. It mainly solves the following optimization problem:

$\begin{matrix} \min_{w, b, ξ_{i}} & \frac{1}{2} ∥ w ∥^{2} + C \sum_{i = 1}^{m} ξ_{i} \\ \text{s.t.} & y_{i} (w^{T} μ_{i} + b) \geq 1 - ξ_{i} + \sqrt{\frac{1 - ɛ}{ɛ}} ∥ Σ_{i}^{\frac{1}{2}} w ∥, \\ & ξ_{i} \geq 0, \; i = 1, \dots, m, \end{matrix}$ (5)

where $w \in R^{n}$ , $b \in R$ . The parameter ɛ ∈ (0, 1), given by the user, is the error rate, and m is the total number of data points in the positive and negative classes. There are m second-order cone constraints and m + n + 1 variables in problem (5). In our proposed SVM model, the term $C \sum_{i = 1}^{m} ξ_{i}$ is replaced by two parts, one for each class.

In the soft-margin support vector machine, soft-margin errors of the training points and a regularization parameter are introduced to improve the classification performance. The soft-margin support vector machine formulation is given by [1]:

$\begin{matrix} \min_{w, b, ξ_{i}} & \frac{1}{2} ∥ w ∥^{2} + C \sum_{i = 1}^{m} ξ_{i} \\ \text{s.t.} & y_{i} (w^{T} x_{i} + b) \geq 1 - ξ_{i}, \; i = 1, \dots, m, \\ & ξ_{i} \geq 0, \; i = 1, \dots, m, \end{matrix}$ (6)

where ξ_i, i = 1, ⋯ , m are slack variables that measure the degree of misclassification and C is a penalty parameter that controls the trade-off.

Let $X_{1} = [x_{1}^{1}, x_{2}^{1}, x_{3}^{1}, \dots, x_{m_{1}}^{1}]$ be an n × m₁ data matrix, where $x_{i}^{1} = (x_{i 1}^{1}, \dots, x_{in}^{1})^{T}$ represents a data point. Similarly, let $X_{2} = [x_{1}^{2}, x_{2}^{2}, x_{3}^{2}, \dots, x_{m_{2}}^{2}]$ be an n × m₂ data matrix containing the other class, with $x_{i}^{2} = (x_{i 1}^{2}, \dots, x_{in}^{2})^{T}$ . Suppose the mean and covariance matrix of the data matrix X_i are $μ_{i} \in R^{n}$ and $Σ_{i} \in R^{n \times n}$ , i = 1, 2. Let us consider the following quadratic chance-constrained programming problem:

$\begin{matrix} \min_{w, b, R_{1}, R_{2}} & \frac{1}{2} ∥ w ∥^{2} + α R_{1} + β R_{2} \\ \text{s.t.} & \sup_{X_{1} \sim (μ_{1}, Σ_{1})} H_{1} \leq 1 - η_{1}, \\ & \sup_{X_{2} \sim (μ_{2}, Σ_{2})} H_{2} \leq 1 - η_{2}, \\ & H_{1} = \Pr \{ w^{T} X_{1} + b \leq 1 - R_{1} \}, \\ & H_{2} = \Pr \{ w^{T} X_{2} + b \geq - 1 + R_{2} \}, \\ & R_{1} \geq 0, \; R_{2} \geq 0, \end{matrix}$ (7)

where $R_{1} = \max_{i = 1, \dots, m_{1}} ξ_{i}$ , $R_{2} = \max_{j = 1, \dots, m_{2}} ξ_{j}$ . The parameters η_i ∈ (0, 1), i = 1, 2, are the required correct rates, and the penalty parameters α and β control the trade-off between the margin variables R₁ and R₂. By Theorem 1, stated after the formulation, problem (7) leads to the following novel second-order cone programming support vector machine formulation:

$\begin{matrix} \min_{w, b, R_{1}, R_{2}} & \frac{1}{2} ∥ w ∥^{2} + α R_{1} + β R_{2} \\ \text{s.t.} & w^{T} μ_{1} + b \geq 1 - R_{1} + κ_{1} ∥ S_{1}^{T} w ∥, \\ & - (w^{T} μ_{2} + b) \geq 1 - R_{2} + κ_{2} ∥ S_{2}^{T} w ∥, \\ & R_{i} \geq 0, \; κ_{i} = \sqrt{\frac{η_{i}}{1 - η_{i}}}, \; i = 1, 2 . \end{matrix}$ (8)
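To make the structure of formulation (8) concrete, the following Python sketch evaluates its objective and the slack of its two second-order cone constraints for a candidate point (w, b, R₁, R₂). It does not solve the problem; it only checks feasibility. The function names, the toy means and covariances, and the chosen η, α, β values are all hypothetical, and ∥S_i^T w∥ is computed as the square root of w^T Σ_i w, which follows from Σ_i = S_i S_i^T.

```python
import math


def dot(a, b):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def quad_form(w, sigma):
    """Return w^T Sigma w for a vector w and a square matrix Sigma (nested lists)."""
    n = len(w)
    return sum(w[i] * sigma[i][j] * w[j] for i in range(n) for j in range(n))


def check_formulation_8(w, b, r1, r2, mu1, sigma1, mu2, sigma2,
                        eta1, eta2, alpha, beta, tol=1e-9):
    """Evaluate the objective of (8) and the slack of its two SOC constraints.

    A slack >= 0 (up to tol) means the corresponding constraint holds.
    """
    k1 = math.sqrt(eta1 / (1.0 - eta1))
    k2 = math.sqrt(eta2 / (1.0 - eta2))
    slack1 = dot(w, mu1) + b - 1.0 + r1 - k1 * math.sqrt(quad_form(w, sigma1))
    slack2 = -(dot(w, mu2) + b) - 1.0 + r2 - k2 * math.sqrt(quad_form(w, sigma2))
    objective = 0.5 * dot(w, w) + alpha * r1 + beta * r2
    feasible = slack1 >= -tol and slack2 >= -tol and r1 >= 0 and r2 >= 0
    return objective, slack1, slack2, feasible


# Hypothetical toy data: two classes with identity covariance.
identity = [[1.0, 0.0], [0.0, 1.0]]
obj, s1, s2, ok = check_formulation_8(
    w=[-0.4, 0.0], b=1.8, r1=0.0, r2=0.0,
    mu1=[0.0, 0.0], sigma1=identity,
    mu2=[10.0, 0.0], sigma2=identity,
    eta1=0.8, eta2=0.9, alpha=1.0, beta=1.0)
print("objective:", obj, "slacks:", s1, s2, "feasible:", ok)
```

In this toy setting both constraints are tight up to rounding, which is what one expects at an optimum when the classes are far enough apart that R₁ = R₂ = 0.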

Theorem 1. The chance constraints in formulation (7) can be written in the following form:

$\begin{matrix} w^{T} μ_{1} + b - 1 + R_{1} \geq \sqrt{\frac{η_{1}}{1 - η_{1}}} ∥ S_{1}^{T} w ∥, \\ - (w^{T} μ_{2} + b) - 1 + R_{2} \geq \sqrt{\frac{η_{2}}{1 - η_{2}}} ∥ S_{2}^{T} w ∥, \end{matrix}$ (9)

where $Σ_{i} = S_{i} S_{i}^{T}$ , i = 1, 2.

Proof 1. The multivariate Chebyshev inequality [17, 18] can be expressed as:

$\begin{matrix} \sup_{x^{'} \sim (μ, Σ)} \Pr \{ x^{'} \in S \} = \frac{1}{1 + d^{2}}, \\ d^{2} = \inf_{x \in S} (x - μ)^{T} Σ^{- 1} (x - μ), \end{matrix}$ (10)

where S is an arbitrary closed convex set.

Let S = { x : w^Tx + b ≤ 1 - R₁ }. Since S is a closed half-space, it is a closed convex set. Hence, we can get the following equations from formulation (10):

$\begin{matrix} sup_{X_{1} \in (μ_{1}, Σ_{1})} H_{1} = \frac{1}{1 + d_{1}^{2}}, \\ d_{1}^{2} = inf_{w^{T} x + b \leq 1 - R_{1}} (x - μ_{1})^{T} Σ_{1}^{- 1} (x - μ_{1}) . \end{matrix}$ (11)

Then, we will solve the equation:

$\begin{matrix} d_{1}^{2} = inf_{w^{T} x + b \leq 1 - R_{1}} (x - μ_{1})^{T} Σ_{1}^{- 1} (x - μ_{1}) . \end{matrix}$

If w^Tμ₁ + b ≤ 1 - R₁, then x = μ₁ is feasible, so $d_{1}^{2} = 0$ and $\sup_{X_{1} \sim (μ_{1}, Σ_{1})} H_{1} = 1$ . The chance constraint $\sup_{X_{1} \sim (μ_{1}, Σ_{1})} H_{1} \leq 1 - η_{1}$ in formulation (7) would then require 1 ≤ 1 - η₁, which cannot hold for η₁ ∈ (0, 1). Hence this case is excluded.

If w^Tμ₁ + b > 1 - R₁, let $u_{1} = S_{1}^{- 1} (x - μ_{1})$ , $v_{1} = S_{1}^{T} w$ and r₁ = - (w^Tμ₁ + b) +1 - R₁ < 0. We have

$\begin{matrix} d_{1}^{2} = inf_{v_{1}^{T} u_{1} \leq r_{1}} u_{1}^{T} u_{1}, \end{matrix}$ (12)

which is a convex quadratic programming problem with respect to u₁.

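The Lagrangian function of problem (12) is $L (u_{1}, λ) = u_{1}^{T} u_{1} + λ (v_{1}^{T} u_{1} - r_{1})$ with λ ≥ 0. Setting the partial derivative of L (u₁, λ) with respect to u₁ to zero, and noting that the constraint is active at the optimum because r₁ < 0, we have 2u₁ = - λv₁ and $v_{1}^{T} u_{1} = r_{1}$ , which gives $u_{1} = \frac{r_{1}}{v_{1}^{T} v_{1}} v_{1}$ .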

So, we have

$\begin{matrix} d_{1}^{2} & = \inf_{w^{T} x + b \leq 1 - R_{1}} (x - μ_{1})^{T} Σ_{1}^{- 1} (x - μ_{1}) \\ & = \inf_{v_{1}^{T} u_{1} \leq r_{1}} u_{1}^{T} u_{1} \\ & = \frac{r_{1}^{2}}{v_{1}^{T} v_{1}} \\ & = \frac{(w^{T} μ_{1} + b - 1 + R_{1})^{2}}{w^{T} Σ_{1} w} . \end{matrix}$

Since $sup_{X_{1} \in (μ_{1}, Σ_{1})} H_{1} = \frac{1}{1 + d_{1}^{2}} \leq 1 - η_{1}$ , we have

$\begin{matrix} d_{1}^{2} \geq \frac{η_{1}}{1 - η_{1}} . \end{matrix}$

Since w^Tμ₁ + b - 1 + R₁ > 0 in this case and $w^{T} Σ_{1} w = ∥ S_{1}^{T} w ∥^{2}$ , taking square roots gives the following inequality constraint:

$w^{T} μ_{1} + b - 1 + R_{1} \geq \sqrt{\frac{η_{1}}{1 - η_{1}}} ∥ S_{1}^{T} w ∥,$

where $Σ_{1} = S_{1} S_{1}^{T}$ .

Similarly, the constraint $\sup_{X_{2} \sim (μ_{2}, Σ_{2})} H_{2} \leq 1 - η_{2}$ can be written as

$- (w^{T} μ_{2} + b) - 1 + R_{2} \geq \sqrt{\frac{η_{2}}{1 - η_{2}}} ∥ S_{2}^{T} w ∥,$

where $Σ_{2} = S_{2} S_{2}^{T}$ .

The proof is completed.
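The key step of the proof, the closed form of the infimum in (12) and the resulting Chebyshev bound, can be checked numerically. The Python sketch below is only an illustration: the function names are ours, and the values of `v` and `r` are hypothetical stand-ins for S₁^T w and r₁ < 0.

```python
def min_norm_point(v, r):
    """Minimizer of u^T u subject to v^T u <= r, assuming r < 0 and v != 0.

    The constraint is active at the optimum, so u* = r * v / (v^T v).
    """
    vv = sum(x * x for x in v)
    return [r * x / vv for x in v]


def worst_case_probability(d_squared):
    """Bound from (10): sup Pr{x in S} = 1 / (1 + d^2)."""
    return 1.0 / (1.0 + d_squared)


v = [3.0, 4.0]   # plays the role of S_1^T w (hypothetical values)
r = -5.0         # plays the role of r_1 < 0 (hypothetical value)

u_star = min_norm_point(v, r)
d2 = sum(x * x for x in u_star)                 # squared norm of the minimizer
closed_form = r * r / sum(x * x for x in v)     # r_1^2 / (v_1^T v_1)

print("u* =", u_star)
print("d^2 from minimizer:", d2, " closed form:", closed_form)
print("worst-case probability:", worst_case_probability(d2))
```

The chance constraint then holds exactly when the worst-case probability is at most 1 - η₁, that is, when d₁² ≥ η₁ / (1 - η₁).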

Problem (8) is a quadratic second-order cone programming problem, which can be written as a linear SOCP problem by introducing a new variable t and a constraint ∥w ∥ ≤ t. We refer to formulation (8) as the R₁R₂-SOCP-SVM. Compared with formulation (5), the R₁R₂-SOCP-SVM (formulation (8)) contains only two second-order cone constraints and n + 3 variables, and therefore requires less running time.
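The difference in problem size can be made explicit with a short Python sketch. The helper names are hypothetical, and the sizes m = 690 and n = 14 are those listed for the Australian Credit data set in Table 1; the counts refer to the quadratic formulations (5) and (8), before the extra variable t is added.

```python
def problem_size_soft_socp(m, n):
    """Formulation (5): m SOC constraints and m + n + 1 variables (w, b, xi)."""
    return {"soc_constraints": m, "variables": m + n + 1}


def problem_size_r1r2(n):
    """Formulation (8): two SOC constraints and n + 3 variables (w, b, R1, R2)."""
    return {"soc_constraints": 2, "variables": n + 3}


m, n = 690, 14  # Australian Credit: 690 examples, 14 variables
print("soft-SOCP-SVM:", problem_size_soft_socp(m, n))
print("R1R2-SOCP-SVM:", problem_size_r1r2(n))
```

The size of formulation (8) depends only on the number of variables n, not on the number of training samples m.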

If the dual formulation of problem (8) is feasible, problem (8) can be solved by a primal-dual interior point method. We therefore derive the dual formulation of the R₁R₂-SOCP-SVM in Theorem 2.

Theorem 2. The dual formulation of formulation (8) has the following form:

$\begin{matrix} \min_{z_{i}} & ∥ z_{2} - z_{1} ∥^{2} \\ \text{s.t.} & z_{i} \in B_{i} (μ_{i}, S_{i}, κ_{i}), \; i = 1, 2, \end{matrix}$ (13)

where $B_{i} (μ_{i}, S_{i}, κ_{i}) = \{ μ_{i} + (- 1)^{i} κ_{i} S_{i} u_{i} : ∥ u_{i} ∥ \leq 1 \}$ are ellipsoids centered at μ_i, whose shape is determined by S_i and whose size is determined by κ_i, i = 1, 2.

Proof 2. The Lagrangian function of formulation (8) is:

$\begin{matrix} L (w, b, R_{1}, R_{2}, λ_{1}, λ_{2}, t_{1}, t_{2}) \\ = \frac{1}{2} ∥ w ∥^{2} + α R_{1} + β R_{2} + \\ λ_{1} (- w^{T} μ_{1} - b + 1 - R_{1} + κ_{1} ∥ S_{1}^{T} w ∥) + \\ λ_{2} (w^{T} μ_{2} + b + 1 - R_{2} + κ_{2} ∥ S_{2}^{T} w ∥) - t_{1} R_{1} - t_{2} R_{2}, \end{matrix}$ (14)

where λ₁, λ₂, t₁, t₂ ≥ 0. Since $∥ v ∥ = \max_{∥ u ∥ \leq 1} u^{T} v$ holds for any $v \in R^{n}$ , the Lagrangian function (14) can be equivalently written as:

$\begin{matrix} L (w, b, R_{1}, R_{2}, λ_{1}, λ_{2}, t_{1}, t_{2}) \\ = \max_{u_{i}} \{ L_{1} (w, b, R_{1}, R_{2}, λ_{1}, λ_{2}, t_{1}, t_{2}, u_{1}, u_{2}) \}, \end{matrix}$

where ∥u_i ∥ ≤ 1, i = 1, 2 and

$\begin{matrix} L_{1} (w, b, R_{1}, R_{2}, λ_{1}, λ_{2}, t_{1}, t_{2}, u_{1}, u_{2}) \\ = \frac{1}{2} ∥ w ∥^{2} + α R_{1} + β R_{2} + \\ λ_{1} (- w^{T} μ_{1} - b + 1 - R_{1} + κ_{1} w^{T} S_{1} u_{1}) + \\ λ_{2} (w^{T} μ_{2} + b + 1 - R_{2} + κ_{2} w^{T} S_{2} u_{2}) - t_{1} R_{1} - t_{2} R_{2} . \end{matrix}$ (15)

Thus, formulation (8) can be reformulated as:

$\min_{w, b, R_{i}} \max_{λ_{i}, t_{i}, u_{i}} \{ L_{1} (w, b, R_{1}, R_{2}, λ_{i}, t_{i}, u_{i}) : ∥ u_{i} ∥ \leq 1 \},$

where i = 1, 2. Then, the dual formulation of formulation (8) can be deduced:

$\max_{λ_{i}, t_{i}, u_{i}} \min_{w, b, R_{i}} \{ L_{1} (w, b, R_{1}, R_{2}, λ_{i}, t_{i}, u_{i}) : ∥ u_{i} ∥ \leq 1 \},$ (16)

where i = 1, 2.

For fixed λ_i, t_i and u_i with ∥u_i ∥ ≤ 1, i = 1, 2, the inner problem $\min_{w, b, R_{i}} L_{1} (w, b, R_{1}, R_{2}, λ_{i}, t_{i}, u_{i})$ is an unconstrained optimization problem in w, b and R_i. It has the following optimality condition:

$\begin{matrix} w + λ_{1} (- μ_{1} + κ_{1} S_{1} u_{1}) + λ_{2} (μ_{2} + κ_{2} S_{2} u_{2}) = 0, \\ - λ_{1} + λ_{2} = 0, \\ α - t_{1} - λ_{1} = 0, \\ β - t_{2} - λ_{2} = 0 . \end{matrix}$ (17)

Then, by substituting the optimality condition (17) into formulation (15), the dual formulation (16) can be stated as:

$\begin{matrix} \max_{λ, u_{i}} & - \frac{1}{2} λ^{2} ∥ μ_{2} - μ_{1} + κ_{1} S_{1} u_{1} + κ_{2} S_{2} u_{2} ∥^{2} + 2 λ \\ \text{s.t.} & λ \geq 0, \; ∥ u_{i} ∥ \leq 1, \; i = 1, 2, \end{matrix}$ (18)

where λ = λ₁ = λ₂.

For fixed u_i, the objective of formulation (18) is a concave quadratic function of λ. Setting its derivative with respect to λ to zero, we obtain its maximum at

$λ^{'} = \frac{2}{{∥ μ_{2} - μ_{1} + κ_{1} S_{1} u_{1} + κ_{2} S_{2} u_{2} ∥}^{2}}$

with the maximum value $\frac{2}{{∥ μ_{2} - μ_{1} + κ_{1} S_{1} u_{1} + κ_{2} S_{2} u_{2} ∥}^{2}} .$

Then, the dual problem of formulation (8) can be written as:

$\begin{matrix} max_{u_{i}} & \frac{2}{{∥ μ_{2} - μ_{1} + κ_{1} S_{1} u_{1} + κ_{2} S_{2} u_{2} ∥}^{2}} \\ s . t . & ∥ u_{i} ∥ \leq 1, i = 1, 2 . \end{matrix}$ (19)

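Maximizing the objective of problem (19) is equivalent to minimizing its denominator. Writing z₁ = μ₁ - κ₁S₁u₁ and z₂ = μ₂ + κ₂S₂u₂, so that z₂ - z₁ = μ₂ - μ₁ + κ₁S₁u₁ + κ₂S₂u₂, problem (19) can be simplified as

$\begin{matrix} \min_{z_{i}} & ∥ z_{2} - z_{1} ∥^{2} \\ \text{s.t.} & z_{i} \in B_{i} (μ_{i}, S_{i}, κ_{i}), \; i = 1, 2, \end{matrix}$ (20)

where $B_{i} (μ_{i}, S_{i}, κ_{i}) = \{ z_{i} \in R^{n} : z_{i} = μ_{i} + (- 1)^{i} κ_{i} S_{i} u_{i}, \; ∥ u_{i} ∥ \leq 1 \}$ , i = 1, 2.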

The proof is completed.
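The dual formulation has a simple geometric reading: it looks for the closest pair of points in the two ellipsoids B₁ and B₂. The following Python sketch evaluates the dual objective for given u₁, u₂. It is a minimal illustration with hypothetical names and toy data, and it takes S₁ = S₂ = I so that the ellipsoids are balls and the closest pair can be written down by hand; it does not solve the dual problem in general.

```python
import math


def mat_vec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def ellipsoid_point(mu, s, kappa, u, i):
    """Return mu_i + (-1)^i * kappa_i * S_i u_i, a point of B_i when ||u_i|| <= 1."""
    if math.sqrt(sum(x * x for x in u)) > 1.0 + 1e-12:
        raise ValueError("u must satisfy ||u|| <= 1")
    sign = -1.0 if i % 2 == 1 else 1.0
    su = mat_vec(s, u)
    return [m_k + sign * kappa * su_k for m_k, su_k in zip(mu, su)]


def dual_objective(z1, z2):
    """Objective of the dual problem: squared distance ||z2 - z1||^2."""
    return sum((a - b) ** 2 for a, b in zip(z2, z1))


# Hypothetical toy data: balls of radius kappa_i around the class means.
identity = [[1.0, 0.0], [0.0, 1.0]]
mu1, mu2 = [0.0, 0.0], [10.0, 0.0]
kappa1, kappa2 = 2.0, 3.0

# For balls, the closest pair lies on the segment joining the centers.
u = [-1.0, 0.0]
z1 = ellipsoid_point(mu1, identity, kappa1, u, 1)
z2 = ellipsoid_point(mu2, identity, kappa2, u, 2)
print("z1 =", z1, "z2 =", z2, "objective =", dual_objective(z1, z2))
```

Choosing u₁ = u₂ = 0 instead returns the class means themselves, whose squared distance is larger, so the objective indeed decreases as the points move towards each other inside their ellipsoids.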

The problem (8) can be solved by some SOCP solvers. Then, we can get the classification hyperplane f (x) = w^Tx + b. For a data point x′, if the value f(x′) is positive, x′ is classified as positive class; if the value f(x′) is negative, x′ is classified as negative class.
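The decision rule can be sketched in a few lines of Python. The function names and the values of w and b are hypothetical, and returning 0 for a point lying exactly on the hyperplane is our own convention, since the text leaves that case unspecified.

```python
def decision_value(w, b, x):
    """Return f(x) = w^T x + b for vectors given as lists."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 if f(x) > 0, -1 if f(x) < 0, and 0 on the hyperplane."""
    value = decision_value(w, b, x)
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0  # tie-breaking convention assumed for this sketch


# Hypothetical hyperplane and two test points.
w, b = [-0.4, 0.0], 1.8
for point in ([0.0, 0.0], [10.0, 0.0]):
    print(point, "->", classify(w, b, point))
```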

4 Experimental results

In this section, we report the numerical results of the R₁R₂-SOCP-SVM for binary classification. All numerical experiments are carried out in MATLAB R2018b 64-bit running on a PC.

4.1 The relevant preparation for our experiments

The benchmark data sets used in our work are shown in Table 1, which lists the number of variables, the sample size and the imbalance ratio (IR) of each data set. More information on these data sets can be found in the UCI Repository [19].

Table 1
The information for all data sets

Dataset	#variables	#examples	IR
Australian Credit (AUS)	14	690	1.2
Wisconsin Breast Cancer (WBC)	30	569	1.7
Ionosphere (IONO)	34	351	1.8
Heart/Statlog (HEART)	13	270	1.25
Vertebral Column (C2CW)	6	310	2.1
German Credit (GER)	24	1000	2.3
Qualitative Bankruptcy (QBCY)	6	250	1.3
seeds-1_vs_2-3 (SEED)	7	210	2

F-measure and G-mean are two of the most popular assessment metrics; both are functions of the confusion matrix in Table 2.

Table 2

Confusion matrix for classification

	Predicted positive class	Predicted negative class
Actual positive class	True Positive (TP)	False Negative (FN)
Actual negative class	False Positive (FP)	True Negative (TN)
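The text does not spell out the formulas for the two metrics, so the Python sketch below uses their usual definitions, which we assume are the ones intended: the F-measure is the harmonic mean of precision and recall, and the G-mean is the geometric mean of the true positive rate and the true negative rate. The function names and the example counts are hypothetical.

```python
import math


def f_measure(tp, fn, fp, tn):
    """Harmonic mean of precision TP/(TP+FP) and recall TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical confusion matrix entries in the layout of Table 2.
tp, fn, fp, tn = 40, 10, 20, 30
print("F-measure:", f_measure(tp, fn, fp, tn))
print("G-mean:", g_mean(tp, fn, fp, tn))
```

Unlike plain accuracy, the G-mean drops to zero when one class is never predicted correctly, which makes it informative for the imbalanced data sets in Table 1.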

The following model selection procedure was performed: training and test subsets were constructed using 10-fold cross-validation for all data sets. A grid search was performed for the parameter C in the linear SVM. The parameters η and ɛ were varied over {0.2, 0.4, 0.6, 0.8}, and the parameters C, α and β over {2^-7, 2^-6, 2^-5, 2^-4, 2^-3, 2^-2, 2^-1, 2⁰, 2¹, 2², 2³, 2⁴, 2⁵, 2⁶, 2⁷}.
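The search space described above can be enumerated with a short Python sketch. This is only an outline under stated assumptions: the function names are ours, using a single value η for both η₁ and η₂ is an assumption (the text does not say whether they are tuned separately), and the fold construction is a plain contiguous split without shuffling or stratification.

```python
from itertools import product

ETAS = [0.2, 0.4, 0.6, 0.8]
PENALTIES = [2.0 ** k for k in range(-7, 8)]  # 2^-7, ..., 2^7


def r1r2_grid(etas=ETAS, penalties=PENALTIES):
    """Return all (eta, alpha, beta) triples, assuming eta_1 = eta_2 = eta."""
    return list(product(etas, penalties, penalties))


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


grid = r1r2_grid()
folds = k_fold_indices(690)  # e.g. the 690 examples of the AUS data set
print(len(PENALTIES), "penalty values,", len(grid), "parameter triples,",
      len(folds), "folds")
```

Each triple would then be evaluated on every fold, with one fold held out for testing and the remaining folds used for training.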

For the above procedure, we used LIBSVM [20] for the SVM (formulation (6)) and the SeDuMi toolbox [16] for the SOCP-SVM (formulation (4)), the soft-SOCP-SVM (formulation (5)), the LP-SOCP-SVM (a SOCP-SVM formulation deduced from the LP-SVM) [12] and the R₁R₂-SOCP-SVM (formulation (8)).

4.2 Summary of classification performance for all data sets

In this subsection, we present the best classification results of all compared classification models. The F-measure and G-mean values are calculated to show the classification results.

Tables 3 and 4 summarize the best classification performance in terms of F-measure and G-mean over all combinations of parameters. In Table 3, the best classification results are achieved by the R₁R₂-SOCP-SVM on the AUS, HEART, C2CW, GER and SEED data sets, while the SVM has the best F-measure on the WBC and IONO data sets and the soft-SOCP-SVM is slightly better than the novel model on the QBCY data set. The results in Table 4 show that the R₁R₂-SOCP-SVM has the best G-mean on all data sets except IONO. Compared with the other SOCP-SVM models, the novel model gives better predictive results. Therefore, the novel second-order cone programming support vector machine has better classification performance than the compared classification methods with respect to F-measure and G-mean on most data sets.

Table 3
Predictive performance summary for all data sets with respect to F-measure

Dataset	SVM	SOCP-SVM	soft-SOCP-SVM	LP-SOCP-SVM	Novel model
AUS	0.8463	0.8634	0.8609	0.8583	0.8696
WBC	0.9719	0.9034	0.9005	0.8800	0.9716
IONO	0.9590	0.8392	0.8403	0.8276	0.9282
HEART	0.8597	0.8757	0.8643	0.8632	0.8865
C2CW	0.8962	0.7937	0.8068	0.7405	0.8980
GER	0.8448	0.7870	0.8234	0.7465	0.8468
QBCY	0.9968	0.9968	0.9974	0.9789	0.9971
SEED	0.9011	0.8042	0.7920	0.7211	0.9620

Table 4

Predictive performance summary for all data sets with respect to G-mean

Dataset	SVM	SOCP-SVM	soft-SOCP-SVM	LP-SOCP-SVM	Novel model
AUS	0.8573	0.8715	0.8708	0.8689	0.8785
WBC	0.9807	0.9307	0.9310	0.8907	0.9816
IONO	0.9543	0.7729	0.7729	0.7682	0.9273
HEART	0.8455	0.8629	0.8474	0.8511	0.8746
C2CW	0.8342	0.7238	0.7189	0.7024	0.8584
GER	0.7194	0.6706	0.6758	0.6433	0.7500
QBCY	0.9973	0.9975	0.9975	0.9797	0.9978
SEED	0.9376	0.8456	0.8336	0.7436	0.9638

4.3 Robustness analysis of the R₁R₂-SOCP-SVM

In this subsection, we carry out the robustness analysis [21] to compare the overall performance between the R₁R₂-SOCP-SVM and the other classification methods. The relative performance of a method M on a data set i is represented by the ratio of its precision p_i (M) and the highest precision $\max_{j} (p_{i} (j))$ among all the compared methods j:

$Precision {Ratio}_{i} (M) = \frac{p_{i} (M)}{\max_{j} (p_{i} (j))} .$

The larger the value of PrecisionRatio_i (M) is, the better the performance of the method M on the data set i is. Thus, a good measurement of the robustness of a method M is represented by the value of ∑_iPrecisionRatio_i (M). The larger its value is, the better the robustness and overall performance is [21].
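The robustness measure can be computed directly from a table of scores. The Python sketch below applies it to the F-measure values of Table 3; the function name is ours, and taking the maximum over all five methods, including the SVM, is an assumption about how the ratio was formed.

```python
# F-measure values from Table 3, in the order
# AUS, WBC, IONO, HEART, C2CW, GER, QBCY, SEED.
F_MEASURE = {
    "SVM":           [0.8463, 0.9719, 0.9590, 0.8597, 0.8962, 0.8448, 0.9968, 0.9011],
    "SOCP-SVM":      [0.8634, 0.9034, 0.8392, 0.8757, 0.7937, 0.7870, 0.9968, 0.8042],
    "soft-SOCP-SVM": [0.8609, 0.9005, 0.8403, 0.8643, 0.8068, 0.8234, 0.9974, 0.7920],
    "LP-SOCP-SVM":   [0.8583, 0.8800, 0.8276, 0.8632, 0.7405, 0.7465, 0.9789, 0.7211],
    "Novel model":   [0.8696, 0.9716, 0.9282, 0.8865, 0.8980, 0.8468, 0.9971, 0.9620],
}


def robustness_sums(scores):
    """Sum over data sets of score / best score, for each method."""
    methods = list(scores)
    n_datasets = len(scores[methods[0]])
    best = [max(scores[m][i] for m in methods) for i in range(n_datasets)]
    return {m: sum(scores[m][i] / best[i] for i in range(n_datasets))
            for m in methods}


totals = robustness_sums(F_MEASURE)
for method, total in totals.items():
    print(f"{method}: {total:.3f}")
```

A method that is best on every data set would reach a total equal to the number of data sets, here 8, so the totals can be read as a distance from that ideal.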

The results in Table 5 show the value of ∑_iPrecisionRatio_i on F-measure and G-mean for all SOCP-SVM models over all data sets. The R₁R₂-SOCP-SVM attains the largest total for both metrics, and thus has better overall performance and robustness than the other SOCP-SVM models used in this section.

Table 5
The total of PrecisionRatio_i of all classification methods with respect to F-measure and G-mean

	SOCP-SVM	soft-SOCP-SVM	LP-SOCP-SVM	Novel model
∑_i F-measureRatio_i	7.434	7.462	7.166	7.967
∑_i G-meanRatio_i	7.375	7.322	7.104	7.972

4.4 Influence of the parameters in the R₁R₂-SOCP-SVM

In this subsection, we analyse the sensitivity of the relevant parameters in the proposed model and characterize their influence on final results. We also discuss whether the classification results are stable with the different values of the parameters α, β and η_i, i = 1, 2.

Table 6 summarizes the predictive performance in terms of the G-mean for the R₁R₂-SOCP-SVM with different values of η_i, i = 1, 2, reporting the mean, minimum and maximum performance. Student's t test is used to assess how significant the difference between the maximum value and the mean value is. The maximum value is significantly higher than the respective mean value for seven data sets (p-value < 0.01). Only for the QBCY data set is the p-value 0.02, and there the difference between the maximum value and the mean value is slight. The parameters η₁, η₂ therefore strongly affect the classification results in seven data sets, so it is vital to set them using cross-validation in this experiment.

Table 6
Max, Min and Mean G-mean with different values of η, and t test for model selection stability, for the novel model, for all data sets

	AUS	WBC	IONO	HEART	C2CW	GER	QBCY	SEED
Max	0.8721	0.9809	0.9242	0.8561	0.8453	0.7285	0.9970	0.9627
Min	0.8334	0.8608	0.8088	0.7935	0.6353	0.5433	0.9785	0.7894
Mean	0.8599	0.9419	0.8598	0.8271	0.7418	0.6585	0.9936	0.8803
p-value	<0.01	<0.01	<0.01	<0.01	<0.01	<0.01	0.02	<0.01

Fig. 1

G-mean by varying the regularization parameters for the soft-SOCP-SVM and the R₁R₂-SOCP-SVM, in all data sets. (a) AUS. (b) WBC. (c) C2CW. (d) GER. (e) HEART. (f) IONO. (g) QBCY. (h) SEED.

Then, we study the influence of the hyperparameters α, β on predictive results. Fig. 1 presents the predictive performance in terms of the G-mean for soft-SOCP-SVM and the R₁R₂-SOCP-SVM by varying the regularization parameters with the set described earlier.

In subfigures (a) and (g), the parameter α has little influence on the prediction results of the R₁R₂-SOCP-SVM, and the parameter β has little influence on the model once its value is greater than 1. In subfigures (b), (f) and (h), the R₁R₂-SOCP-SVM is stable with respect to the parameter α, and the parameter β has little influence on our model when its value is greater than 10. The prediction results of the novel model are better than those of the soft-SOCP-SVM after selecting appropriate parameter values. Although the soft-SOCP-SVM model is more stable than the new model in subfigures (c) and (d), the prediction results of the novel model are better than those of the soft-SOCP-SVM model with appropriate parameter values. In subfigure (e), the two models have basically the same stability, and the prediction results of the new model are slightly better than those of the soft-SOCP-SVM. Therefore, it is still highly recommended to perform a grid search over the parameters α and β with the suggested values to obtain the best classification result for the R₁R₂-SOCP-SVM.

5 Conclusion

This paper proposes a novel second-order cone programming soft-margin support vector machine. We divide the relaxation variables of the soft-margin support vector machine into two parts, one for each class, and introduce two regularization parameters to control the trade-off. The proposed second-order cone programming formulation reduces the number of second-order cone constraints. The experimental results show that the proposed second-order cone programming support vector machine is better in F-measure and G-mean than the compared classification methods for most data sets. Moreover, the proposed formulation performs better in the robustness analysis than the compared models.

References

1. Cortes and Vapnik, Support-vector networks, Mach Learn 20(3) (1995), 273–297. Doi: 10.1007/BF00994018

2. Khemchandani and Jayadeva S.C., Fuzzy twin support vector machines for pattern classification, In: Mathematical Programming And Game Theory For Decision Making, (2008), 131–142. Doi: 10.1142/9789812813220_0009

3. Chen W.P., Ko C.H., Lee Y.J. and Chen J.S., Two smooth support vector machines for ε-insensitive regression, Comput Optim Appl 70(1) (2018), 171–199. Doi: 10.1007/s10589-017-9975-9

4. Huang, Shi and Suykens J.A., Ramp loss linear programming support vector machine, J Mach Learn Res 15(1) (2014), 2185–2211.

5. Weston and Watkins, Multi-class support vector machine, In: Proceedings of the seventh European Symposium on Artificial Neural Networks, (1999).

6. Nath J.S. and Bhattacharyya, Maximum margin classifiers with specified false positive and false negative error rates, In: Proceedings of the SIAM International Conference on Data Mining, (2007), 35–46. Doi: 10.1137/1.9781611972771.4

7. Kerenidis, Prakash and Szilágyi, Quantum algorithms for Second-Order Cone Programming and Support Vector Machines, arXiv preprint arXiv:1908.06720 (2019).

8. Maldonado, López and Carrasco, A second-order cone programming formulation for twin support vector machines, Appl Intell 45(2) (2016), 265–276. Doi: 10.1007/s10489-016-0764-4

9. Roshanfekr, Esmaeili, Ataeian and Amiri, Weighted second-order cone programming twin support vector machine for imbalanced data classification, arXiv preprint arXiv:1904.11634 (2019).

10. López and Maldonado, Multi-class second-order cone programming support vector machines, Inf Sci 330 (2016), 328–341. Doi: 10.1016/j.ins.2015.10.016

11. Iranmehr, Masnadi-Shirazi and Vasconcelos, Cost-sensitive support vector machines, Neurocomputing 343 (2019), 50–64. Doi: 10.1016/j.neucom.2018.11.099

12. Maldonado and López, Imbalanced data classification using second-order cone programming support vector machines, Pattern Recognit 47(5) (2014), 2070–2079. Doi: 10.1016/j.patcog.2013.11.021

13. Shivaswamy P.K., Bhattacharyya and Smola A.J., Second order cone programming approaches for handling missing and uncertain data, J Mach Learn Res 7(Jul) (2006), 1283–1314.

14. Wang, Fan and Pardalos P.M., Robust chance-constrained support vector machines with second-order moment information, Ann Oper Res 263(1-2) (2018), 45–68. Doi: 10.1007/s10479-015-2039-6

15. Lanckriet G.R., Ghaoui L.E., Bhattacharyya and Jordan M.I., A robust minimax approach to classification, J Mach Learn Res 3(Dec) (2002), 555–582. Doi: 10.1162/153244303321897726

16. Sturm J.F., Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, Optim Method Softw 11(1-4) (1999), 625–653. Doi: 10.1080/10556789908805766

17. Marshall A.W. and Olkin, Multivariate Chebyshev inequalities, The Annals of Mathematical Statistics 31(4) (1960), 1001–1014. https://www.jstor.org/stable/2237799

18. Bertsimas and Popescu, Optimal inequalities in probability theory: A convex optimization approach, SIAM J Optim 15(3) (2005), 780–804. Doi: 10.1137/S1052623401399903

19. Dua and Graff, UCI Machine Learning Repository, (2019), http://archive.ics.uci.edu/ml

20. Chang C.C. and Lin C.J., LIBSVM: A library for support vector machines, ACM Trans Intell Syst Technol 2(3) (2011), 1–27. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm

21. Geng, Zhan D.C. and Zhou Z.H., Supervised nonlinear dimensionality reduction for visualization and classification, IEEE Trans Syst Man Cybern B Cybern 35(6) (2005), 1098–1107. Doi: 10.1109/TSMCB.2005.850151