Abstract
The support vector machine is a classification approach in machine learning. The second-order cone optimization formulation for the soft-margin support vector machine can ensure that the misclassification rate of data points do not exceed a given value. In this paper, a novel second-order cone programming formulation is proposed for the soft-margin support vector machine. The novel formulation uses the l2-norm and two margin variables associated with each class to maximize the margin. Two regularization parameters α and β are introduced to control the trade-off between the maximization of margin variables. Numerical results illustrate that the proposed second-order cone programming formulation for the soft-margin support vector machine has a better prediction performance and robustness than other second-order cone programming support vector machine models used in this article for comparision.
Introduction
The support vector machine (SVM) proposed by Cortes and Vapnik [1] is one of the popular classification approaches in machine learning. In recent years, the support vector machine has an explosive development in science and technology [2–5].
The second-order cone programming (SOCP) problem is a family of structural optimization problem, which can be applied to the support vector machine. Thus, some scholars begin to make a study for the second-order cone programming support vector machines. The hard-margin support vector machine is written as a second-order cone programming support vector machine [6] that has received a lot of attention within support vector machine community. A quantum algorithm is proposed to solve the second-order cone programming problem in paper [7]. What’s more, the quantum algorithm is applied for the second-order cone programming support vector machine. In addition, the second-order cone programming formulation is extended for twin support vector machine [8], which constructs two nonparallel classifiers by solving two quadratic chance-constrained programming problems. The second-order cone programming formulations can also be used for the imbalanced data classification [9] and multivariate classification [10].
The cost-sensitive learning is an efficient classification approach, which considers the misclassification cost of a classifier. The total misclassification cost is divided into two parts, each part for each class, in cost-sensitive support vector machine [11]. Two different cost factors are used for each class respectively. Moreover, in paper [12], the second-order cone programming formulation for support vector machine uses two margin variables associated with each class to maximize the margin and introduces a parameter for controlling the maximization within these margin variables.
The second-order cone programming soft-margin support vector machine formulation presented in papers [13] and [14] has as many second-order cone constraints as training samples, resulting in that the classification models are computationally very expensive in terms of running time. This paper studies the second-order cone programming formulation for the soft-margin support vector machine. We divide the soft-margin errors into two parts. Moreover, a novel second-order cone programming soft-margin support vector machine formulation with two second-order cone constraints is presented. Compared with other second-order cone optimization methods for the soft-margin support vector machine, the novel model has the less number of second-order cone constraints. The experimental results show that the proposed method has a better classification results than the compared classification methods.
The paper is structured as follows: In Section 2, we introduce the second-order cone programming formulation deduced from the hard-margin support vector machine. The new second-order cone programming support vector machine is presented in Section 3. Section 4 provides experimental results. A conclusion of this paper is shown in Section 5.
The second-order cone programming support vector machine
In this section, we will introduce the second-order cone programming support vector machine formulation [6, 8].
Given a set of training points and their respective labels (x
i
, y
i
), where y
i
∈ {+1, - 1} and
The second-order cone programming support vector machine constructs a maximum margin classifier such that the false positive and the false negative error rates do not exceed 1 - η1 ∈ (0, 1) and 1 - η2 ∈ (0, 1). Let the random vector X1 and X2 be associated to the positive class and negative class, respectively. Suppose the mean and covariance matrix of the random vector X
i
are
The above formulation suggests that we can classify each pattern correctly. The correct rate is not lower than η i , i = 1, 2. The above probability constraint can be replaced with their robust counterparts [15], so the above quadratic chance-constrained programming problem can be written as:
where
A second-order cone (SOC) constraint on the variable
where
A second-order cone programming formulation deduced from the soft-margin support vector machine is presented in papers [13, 14]. It mainly solves the following optimization problem:
where
In soft margin support vector machine, the soft margin errors of the training points and the regularization parameter are introduced to improve the classification performance. Thus, the soft-margin support vector machine formulation is given [1]:
where ξ i , i = 1, ⋯ , m are slack variables that measure the degree of misclassification and C is a penalty parameter that controls the trade-off.
Let
where
where
where S is an arbitrary closed convex set.
Let S ={ w T X1 + b ≤ 1 - R1 }. Obviously, S is a closed convex set. Hence, we can get the following equations from formulation (10):
Then, we will solve the equation:
If w
T
μ1 + b ≤ 1 - R1, substituting x = μ1,
If w
T
μ1 + b > 1 - R1, let
which can be considered as a quadratic objective function with respect to u1.
In addition, the Lagrangian function of problem (12) is
So, we have
Since
Therefore, we have the following inequality constraint:
where
Similarly, the constraint
where
The proof is completed.
The problem (8) is a quadratic second-order cone programming problem,which can be written as a linear SOCP problem by introducing a new variable t and a constraint ∥w ∥ ≤ t. We refer to the formulation (8) as the R1R2-SOCP-SVM. Compared with formulation (5), the R1R2-SOCP-SVM (formulation (8)) only contains two second-order cone constrains and n + 3 variables, which requires less running time.
If the dual formulation of problem (8) are feasible, we can solve the problem (8) by primal-dual interior point method. Then, we discuss the dual formulation of the R1R2-SOCP-SVM by Theorem 2.
where λ1, λ2, t1, t2 ≥ 0. Since
Thus, formulation (8) can be reformulated as:
The problem
Then, by substituting optimality condition (17) to the formulation (15), the dual formulation (16) can be stated as:
where λ = λ1 = λ2.
The formulation (18) can be considered as a concave quadratic objective function with respect to λ. We can obtain its maximum at
with the maximum value
Then, the dual problem of formulation (8) can be written as:
The problem (19) can be simplified as
where
The proof is completed.
The problem (8) can be solved by some SOCP solvers. Then, we can get the classification hyperplane f (x) = w T x + b. For a data point x′, if the value f(x′) is positive, x′ is classified as positive class; if the value f(x′) is negative, x′ is classified as negative class.
In this section, we report the numerical results of the R1R2-SOCP-SVM for binary classification. All numerical experiments are carried out in MATLAB R2018b 64-bit running on a PC.
The relevant preparation for our experiments
The benchmark data sets implemented in our work are shown in Table 1. We can find the relevant information of those data sets including the number of variables, the size of sample and the imbalance ratio (IR). More information on these data sets can be found in the UCI Repository [19].
The information for all data sets
The information for all data sets
Traditionally, F-measure and G-mean are two types of the most popular assessment metrics which are functions of the confusion matrix in Table 2.
Confusion matrix for classification
The following model selection procedure was performed: training and test subsets were constructed using 10-flod cross-validation for all data sets. A grid search was performed for parameter C in linear SVM. The values of parameters η, ɛ ∈ {0.2, 0.4, 0.6, 0.8} were studied. We studied the values for parameters C, α, β ∈ {2-7, 2-6, 2-5, 2-4, 2-3, 2-2, 2-1, 20, 21, 22, 23, 24, 25, 26, 27}.
For the above procedure, we used the LIBSVM [20] for SVM (formulation (6)) and the SeDuMI Toolbox [16] for the SOCP-SVM (formulation (4)), the soft-SOCP-SVM (formulation (5)), the LP-SOCP-SVM (a SOCP-SVM formulation deduced from the LP-SVM) [12] and the R1R2-SOCP-SVM (formulation (8)).
In this subsection, we present the best classification results of all compared classification models. The F-measure and G-mean value are calculated to show the classification results.
By considering all possible combinations of parameters to summarize the best classification performance in terms of F-measure and G-mean, the results are shown in Tables 3 and 4. In Table 3, the best classification results are achieved by using the R1R2-SOCP-SVM in AUS, HEART, C2CW, GER and SEED data sets, while the SVM has a best F-measure value in WBC and IONO data sets and the soft-SOCP-SVM has slightly higher result than the novel model in QBCY data set. The results in Table 4 show that the R1R2-SOCP-SVM has a best G-mean value in all data sets besides IONO data set. Compared with other SOCP-SVM models, the novel model has a better predictive result. Therefore, the novel second-order cone programming support vector machine has a better classification performance than the compared classification methods with respect to the F-measure and G-mean.
Predictive performance summary for all data sets with respect to F-measure
Predictive performance summary for all data sets with respect to F-measure
Predictive performance summary for all data sets with respect to G-mean
In this subsection, we carry out the robustness analysis [21] to compare the overall performance between the R1R2-SOCP-SVM and the other classification methods. The relative performance of a method M on a data set i is represented by the ratio of its precision p
i
(M) and the highest precision
The larger the value of PrecisionRatio i (M) is, the better the performance of the method M on the data set i is. Thus, a good measurement of the robustness of a method M is represented by the value of ∑ i PrecisionRatio i (M). The larger its value is, the better the robustness and overall performance is [21].
The results in Table 5 show the value of ∑ i Precis-ionRatio i on F-measure and G-mean for all SOCP-SVM models for all datastes. It’s can be found that the R1R2-SOCP-SVM has the better overall performance and robustness compared with the compared SOCP-SVM methods. Thus, the proposed SOCP-SVM model has a better performance in robustness analysis than other SOCP-SVM models used in this section.
The total of PrecisionRatio
i
of all classification methods with respect to F-measure and G-mean
The total of PrecisionRatio i of all classification methods with respect to F-measure and G-mean
In this subsection, we analyse the sensitivity of the relevant parameters in the proposed model and characterize their influence on final results. We also discuss whether the classification results are stable with the different values of the parameters α, β and η i , i = 1, 2.
Table 6 summarizes the predictive performance in terms of the G-mean for the R1R2-SOCP-SVM with the different values of η i , i = 1, 2. In Table 6, the average, minimum and the maximum performance with different values of η i , i = 1, 2 are shown. It can be found that the maximum value is significantly higher than the respective mean value for seven datasets by Student’s t test. It can be known that how significant the difference between the maximum value and the mean value by the Student’s t test. Only in QBCY data set, the value of p-value is 0.02. Notice that there is a slight difference between the maximum value and the mean value for QBCY dataset. The parameters η1, η2 have strongly affected the classification results in seven data sets. Thus, it is vital to set them using cross validation in this experiment.
Max, Min and Mean G-mean with different values of η, and t test for model selection stability, for the novel model, for all data sets
Max, Min and Mean G-mean with different values of η, and t test for model selection stability, for the novel model, for all data sets

G-mean by varying regularization parameter for the soft-SOCP-SVM and R1R2-SOCP-SVM, in all datasets. (a) AUS. (b) WBC. (c) C2CW.(d) GER. (e) HEART. (f) IONO. (g) QBCY. (h) SEED.
Then, we study the influence of the hyperparameters α, β on predictive results. Fig. 1 presents the predictive performance in terms of the G-mean for soft-SOCP-SVM and the R1R2-SOCP-SVM by varying the regularization parameters with the set described earlier.
In subfigure (a) and (g), the parameter α has little influence on the prediction results of R1R2-SOCP-SVM, when the value of the parameter β is greater than 1, it has little influence on the model. The R1R2-SOCP-SVM is stable for the parameter α, and the parameter β has little influence on our model when its value is greater than 10 in subfigure (b), (f) and (h). The prediction results of the novel model are better than that of the soft-SOCP-SVM after selecting appropriate parameter values. Although the soft-SOCP-SVM model has better stability than the new model in subfigure (c) and (d), the prediction results of the novel model are better than that of the soft-SOCP-SVM model with appropriate parameter values. In subfigure (e), the two models have basically the same stability, and the prediction results of the new model are slightly better than that of soft-SOCP-SVM. Therefore, it is still highly recommended that ones should perform a grid search varying the parameters α and β with the suggested values to obtain the best classification result for the R1R2-SOCP-SVM.
This paper proposes a novel second-order cone programming soft-margin support vector machine. We divide the relaxation variables into two parts in the soft-margin support vector machine. Two regularization parameters are introduced to control the trade-off. The proposed second-order cone programming formulation reduces the number of the second-order cone constraints. The experimental results show that the proposed second-order cone programming support vector machine is better in F-measure and G-mean than the compared classification methods for most data sets. What’s more, the proposed formulation has a better performance in robustness analysis than the compared models.
