Abstract
In this paper, we propose a least squares support vector machine with parametric margin (Par-LSSVM) for binary classification, which only needs to solve a system of linear equation. Par-LSSVM is able to handle the datasets with heteroscedastic noise. And the closer hyperplane to the test data point gives the class label, and this makes Par-LSSVM capable of dealing with “Cross Planes” datasets. The experimental results on several artificial, benchmark and USPS datasets indicate that our proposed algorithm outperforms Par-ν-SVM for binary classification problem.
Introduction
Support vector machine(SVM) is a novel kernel-based tool for binary classification and regression problems [1–3]. SVM is based on the statistical learning theory and was introduced by Vapnik et al[3]. The standard SVM aims at minimizing an upper bound on the generalization error through maximizing the margin between two parallel hyperplanes for binary classification problem, resulting in a constrained optimization criterion using quadratic programming (QP) problem. This QP problem leads to higher computational cost. Suykens and Vandewalle established least squares support vector machine(LS-SVM)[4] by replacing inequality constraints with equality constraints in formulation of the standard SVM, which tries to avoid the above shortcoming and obtain an analytical solution directly from solving a set of linear equations.
The standard SVM generates a margin with tube shape by parallel transporting the decision hyperplane, however, a margin with arbitrary shape is more accord with the distribution of data points in most cases. Based upon the idea, Pei-Yi Hao[5] introduced a parametric margin ν-support vector machine (Par-ν-SVM), in which a parametric margin model of arbitrary shape by incorporating a change from the functional margin ρ in ν-support vector machine(ν-SVM)[6]. This model is more effective in many cases, especially when the noise is heteroscedastic, that is, the noise strongly depends on the input value. The main point of Par-ν-SVM is the utilization of nonparallel hyperplanes, which is similar to the the generalized eigenvalues in proximal support vector machine(7)[7], the twin support vector machine(8)[8], and the nonparallel hyperplane support vector machine(9)[9]. These algorithms need to solve one (9) or two (8) quadratic programming problems to obtain the two nonparallel hyperplanes. Experimental results presented in [7, 9] underscore their efficacy and their special abilities in dealing with the “Cross Planes” datasets.
Inspired by Pei-Yi Hao’s Par-ν-SVM and LS-SVM, in this paper, we introduce a parametric margin model into LS-SVM and keep the advantages of both Par-ν-SVM and LS-SVM, and finally develop a new algorithm, called least squares support vector machine with parametric margin, Par-LSSVM for short. Our Par-LSSVM uses the square of slack variable ξ
i
with weight instead of the slack variable ξ
i
with weight , i = 1, …, N . More precisely, for the binary classification problem, our algorithm aims at a pair of nonparallel hyperplanes such that each one is close to one of the two classes. These two hyperplanes are obtained by rotating and transporting the f (x) with h (x) and expressed as
The rest of this paper is organized as follows. A brief introduction of the Par-ν-SVM is given in Section 2. Section 3 develops our Par-LSSVM for binary classification problem. Section 4 deals with the experimental results of toy examples, several UCI benchmark datasets and USPS datasets. Section 5 concludes this paper.
The ν-support vector machine with parametric margin model(Par-ν-SVM) [5] is the extensions of ν-SVM for binary classification problem. This section gives a overview of the Par-ν-SVM for classification problem. Interested readers may consult Pei-Yi Hao [5] for details.
Consider a binary classification problem with training set
By exploiting the KKT conditions, the solution of the prime problem (3) and the solution of the dual problem (4) have the following relation
In this section, we propose the least squares version of SVM with parametric margin for binary classification problem in details. Our Par-LSSVM is a fast and simple algorithm and only needs to solve a system of linear equations for generating both linear and nonlinear classifiers. As earlier mentioned, the hyperplanes f (x) + h (x) =0 and f (x) - h (x) =0 are margins in Par-ν-SVM. While in our Par-LSSVM, which is inspired by clustering, these two hyperplanes are positive and negative hyperplanes which are closer to the positive and negative class, respectively. This lead to the following primal problem:
In order to get the solution of the prime problem (11), we introduce the Lagrangian function as follows
After choosing the proper kernel function, and the parameters C1, C2, the Lagrangian multiplier vector α and biases b, d can be get by solving the system of linear equations (15). Here, the Matlab command x = A \ b is used to solve the system of linear equations (15). And the functions f (x) = 〈
Note that the class of a test point x ∈ R
n
is assigned by the following
This leads to the following algorithm:
Given the training set (2); Select the appropriate kernel function and the parameters C1, C2; Solve the system of linear equations (15) and get its solution b*, d* and α*; Construct the decision function
In this section, we give the experimental results about our Par-LSSVM on toy examples, some UCI benchmark datasets[10] and USPS dataset, which are commonly used in testing machine learning algorithms. We compare our implementation with state of the standard SVM, ν-SVM and Par-ν-SVM. All numerical experiments are carried out on a personal notebook with Inter(R) Core TM i5 2450M CPU@2.50GHz, 4.00GB memory, and Windows 7 operation system such that the same platform is provided for simulations. The standard SVM and ν-SVM are implemented by LibSVM [14]. And the dual QP problems arising in Par-ν-SVM is solved by optimization toolbox for Matlab 2010. For the value of parameters in these algorithms, we select C and ν from the sets of values {2 i |i = -15, - 14, ⋯ , 15} and {0.1, 0.2, ⋯ , 0.9} by employing the standard 10-fold cross validation techique[11] respectively. In nonlinear case, we employ RBF kernel K (x, x′) = exp(- γ||x - x′||2) and the kernel parameter γ is selected from the set {2 i |i = -15, - 14, ⋯ , 15}. Note that all data sets are normalized such that the features locate in [-1, 1] before learning.
Toy examples
In this subsection, we first construct two examples of two-dimensional datasets to compare our Par-LSSVM with the other SVMs.
Example 1 is a simple two-dimensional “Cross Planes” dataset, which was also tested in [7, 12]. The “Cross Planes” dataset are generated by perturbing points lying on two intersecting lines. Figure 2(a) shows the result of SVM, ν-SVM and Par-ν-SVM in linear case, these algorithms are all single plane decision method; Figure 2(b) shows the result of SVM, ν-SVM and Par-ν-SVM in nonlinear case with RBF kernel; Figure 2(c) shows the result of our Par-LSSVM algorithm in linear case. It is ease to see that our Par-LSSVM can get a much better accuracy than the other algorithms even in linear case. And in order to get a comparable accuracy, we only need to utilize the linear kernel, which reduce the compute complexity much more.
Example 2, also utilized in [13], is a artificial dataset with heteroscedastic noise, where the positive class of training points consists of uniform points satisfying x1 ∈ U [- π/2, 2π], x2 ∈ U [0.6sinx1 - 0.25, 0.6sinx1 + 0.25], while the negative class of training points consists of uniform points satisfying x1 ∈ U [- π/2, 2π], x2 ∈ U [0.6sin(x1/1.05 + 0.4) -1.35, 0.6sin(x1/1.05 - 0.85)], where the input data X = (x1, x2) T (see Fig. 3). The width of the margin between the two classes of the dataset depends on the input location. Fig. 3(a–d) describe the results of SVM, ν-SVM, Par-ν-SVM and our Par-LSSVM with the RBF kernels on this data set. It can be seen that the separating hyperplanes of these classifications have similar results. However, our proposed Par-LSSVM gives a pair of more suitable hyperplanes which best describe the tendency of these two classes.
Benchmark datasets
In order to further investigate our proposed Par-LSSVM with SVM, ν-SVM and Par-ν-SVM, we choose eleven datasets from the UCI machine learning repository [10].
Table 1 is concerned with Par-LSSVM, SVM, and par-v-SVM and Par-ν-SVM with linear kernel. We report the accuary and the training time of these algorithms. We computed the means and standard errors of the accuracies through ten 10-fold cross-validation that generates then disjoint validation sets. We can see that Par-LSSVM is more than ten times faster than Par-v-SVM because it solves a linear equation in Par-LSSVM with same size. The classification accuracy of linear Par-LSSVM is the best on four UCI datasets:Australian, Haberman, Heart and Sonar. Although the performance of Par-LSSVM is not the best on the rest of UCI datasets, it is comparable with of SVM, ν-SVM and Par-ν-SVM.
Table 2 shows the comparison of classification accuracies and the training times of SVM, ν-SVM, Par-ν-SVM and Par-LSSVM with the nonlinear RBF kernel. We also give the means and standard errors of the accuracies through ten 10-fold cross-validation. Table 2 reveals that the accuracies of Par-LSSVM are better than that of Par-ν-SVM. And it is to be noted that computing time of Par-LSSVM is more than ten times faster than Par-ν-SVM. The classification accuracies of Par-LSSVM are the best on six of eleven UCI datasets. And the performance of Par-LSSVM are similar to other algorithms on the rest five UCI datasets.
By the way, in fact, our proposed Par-LSSVM requires solving a linear system of equations. The training time of Par-LSSVM should be faster than SVM, ν-SVM and Par-ν-SVM. However, in Table 1 and Table 2, the computing times of our Par-LSSVM are less compared to those of SVM and ν-SVM with both linear and nonlinear cases. This is because that SVM and ν-SVM are implemented by LibSVM. It is well known that LibSVM implements an SMO-type algorithm and its source code was wrote by C++.
In addition, our proposed Par-LSSVM is an approach based on nonparallel hyperplanes idea, so we give another set of numerical experiment results including nine UCI benchmark datasets, which compare with two classical nonparallel hyperplanes approaches: 7 and 8. Table 3 give the detail results. From the Table 3, we can see that the performance of Par-LSSVM is better than those of 7 and 8 in six datasets. And the classification accuracies of Par-LSSVM have greater difference than the rest two algorithms in the following datasets: BUPA liver, Australian and Pima-Indian.
USPS dataset
The USPS dataset is the well-known United States Postal Service(USPS) handwritten digits recognition corpus. Figure 4 shows the appearance of USPS dataset. We compute ten pairwise digits of varying difficulty for digit recognition. Table 4 shows the results of experiments for SVM, ν-SVM, Par-ν-SVM and Par-LSSVM with RBF kernel function. Similarly, we also give the means and standard errors of the accuracies through ten 10-fold cross-validation. Tables 3 discloses that the accuracies of Par-LSSVM are similar or better comparing to SVM, ν-SVM and Par-ν-SVM.
Conclusion
In this paper, we have proposed the least squares support vector machine with parametric margin model for the binary classification problem(Par-LSSVM). Compared with the previous Par-ν-SVM, its computational cost reduced since it needs only solving system of linear equation instead of solving quadratic programming problem. Par-LSSVM uses a pair of nonparallel hyperplanes to close each class respectively. The closer hyperplane to the test data point gives the class label, and this makes our algorithm capable for dealing with “Cross Planes” data sets. The experimental results on toy, UCI benchmark datasets and USPS dataset indicated that Par-LSSVM had a comparable generalization performance with SVM, ν-SVM and Par-ν-SVM, and faster learning speed than Par-ν-SVM.
