Fuzzy and interval-valued fuzzy nonparallel support vector machine

Abstract

By combining fuzzy set theory (FS), interval-valued fuzzy set theory (IVFS) and nonparallel support vector machine theory (NPSVM), we construct the fuzzy nonparallel support vector machine (F-NPSVM) and the interval-valued fuzzy nonparallel support vector machine (IVF-NPSVM). Both models weight the training points by membership degrees; they differ in how these degrees are determined. The dual problems and decision functions of both models are then derived. Experiments on an artificial data set and on benchmark data sets show that, in most cases, F-NPSVM and IVF-NPSVM classify more accurately than NPSVM, the support vector machine (SVM), the interval-valued fuzzy support vector machine (IVF-SVM), the generalized eigenvalue proximal support vector machine (GEPSVM) and the twin support vector machine (TWSVM). Finally, the Friedman test is used to verify that the difference between the two new models and the previous ones is statistically significant.

Keywords

Classification; fuzzy; interval-valued fuzzy; nonparallel support vector machine

1 Introduction

The support vector machine (SVM) is a classification technique that has drawn considerable attention in recent years [1–5]. The theory of SVM is based on the principle of structural risk minimization [3]. In many applications, SVM has been shown to outperform traditional learning machines [1] and has become a powerful tool for solving classification problems. SVM first maps the input points into a high-dimensional feature space and then finds a separating hyperplane that maximizes the margin between the two classes in this space. Maximizing the margin is a quadratic programming (QP) problem, which can be solved through its dual problem by introducing Lagrange multipliers. Because the dual depends on the data only through inner products, SVM finds the optimal hyperplane by means of kernel functions, without explicit knowledge of the mapping. The optimal hyperplane can be written as a combination of a few support vectors. Following SVM, other parallel support vector machines were introduced. The least squares support vector machine (LSSVM) [6] uses a least squares loss with equality constraints instead of the more complex QP problem of SVM, so it is relatively fast to solve. In the ν-support vector machine (ν-SVM) [7], a parameter ν effectively controls the number of support vectors.

Fuzzy set theory was introduced by Zadeh [8] in 1965 and has since been widely used in many fields [9]. However, a fuzzy membership degree is a single real number, which can only express support for or opposition to membership, for example in decision making. Using only the membership degree of a fuzzy set is therefore limiting in some practical problems. Intuitionistic fuzzy set theory, introduced by Atanassov [10, 11], extends classical fuzzy set theory by replacing the classical membership degree with a membership degree, a non-membership degree and a hesitation degree. Another well-known generalization is the interval-valued fuzzy set introduced by Gorzałczany [12–14], in which the classical membership degree is replaced by an interval-valued membership degree together with a hesitation degree. Several authors have since studied this topic and obtained meaningful results [15–17].

SVM techniques are used in more and more applications today. However, in many applications some input points cannot be assigned exactly to one of the two classes. Some points are important and should be assigned firmly to one class, so they must be separated correctly, whereas points corrupted by noise are better discarded: once they become support vectors, they can strongly affect the classification result. SVM lacks these abilities. To address this situation, Lin [18, 19] constructed the fuzzy support vector machine (FSVM), in which each input point is assigned a membership degree according to its contribution to its class, thereby weakening the influence of noise on classification. How to determine the membership degrees is the key issue in FSVM. At present, the most common method is based on the distance from a sample to its class center [20]: points far from the center receive a smaller membership degree and points close to it a greater one. The limitation of this approach is that it ignores how closely the points are packed together. Given a sparse group of points and a dense group, a point in the sparse group is more likely to be an outlier. If a sparse point and a dense point lie at the same distance from the class center and therefore receive the same membership degree, classification errors can easily arise. The interval-valued fuzzy support vector machine (IVF-SVM) [21] considers both the distance to the class center and the closeness among the points when defining the membership degree, and it improves the classification accuracy in experiments on both artificial and UCI data.

In recent years, several nonparallel hyperplane classifiers, which differ from methods that search for two parallel support hyperplanes, have been proposed. In the generalized eigenvalue proximal support vector machine (GEPSVM) [22], the points of each class are proximal to one of two nonparallel planes, which are obtained as the eigenvectors corresponding to the smallest eigenvalues of two related generalized eigenvalue problems. The twin support vector machine (TWSVM) [23–27] seeks two nonparallel proximal hyperplanes such that each hyperplane is close to one of the two classes and at a distance of at least one from the other class. As a result, TWSVM solves two smaller QPPs instead of the single larger QPP of SVM, which makes TWSVM training approximately four times faster than SVM training. The nonparallel support vector machine (NPSVM) [28–30] seeks two nonparallel hyperplanes such that each class lies as much as possible within the ɛ-band of its own hyperplane and each hyperplane is at a distance of at least one from the other class. Compared with GEPSVM and TWSVM, NPSVM has several advantages: it implements the structural risk minimization principle, it admits the kernel trick, and its solutions are sparse.

The application of fuzzy set theory to nonparallel support vector machines has attracted considerable interest. Combining the two has produced new classification models such as fuzzy proximal support vector classification via generalized eigenvalues [31], the fuzzy twin support vector machine [32] and the fuzzy least squares twin support vector machine [33]. These models improve performance to some extent in experiments on both artificial and UCI data.

In this paper, fuzzy set theory and interval-valued fuzzy set theory are applied to NPSVM. Two new models are proposed, referred to as the fuzzy NPSVM (F-NPSVM) and the interval-valued fuzzy NPSVM (IVF-NPSVM). From the distribution of the training points, we determine their fuzzy membership degrees and interval-valued fuzzy membership degrees. Finally, the points, weighted by these degrees, are classified within the NPSVM framework.

The rest of this paper is organized as follows. A brief review of FS, IVFS, SVM and NPSVM will be described in Section 2. Then F-NPSVM and IVF-NPSVM will be derived in Section 3. Experiments and a statistical test will be presented in Section 4. Some concluding remarks will be given in Section 5.

2 Preliminaries

2.1 FS and IVFS

2.1.1 FS [8]

Let X be a non-empty set. Then $F = \{\langle x, μ_F(x) \rangle \mid x \in X\}$ is called a fuzzy set in X, where $μ_F : X \to [0, 1]$ is the membership function of F and $μ_F(x)$ is the membership degree of x in F.

2.1.2 IVFS [12]

Let X be a non-empty set and let $L = \{[a, b] \mid 0 \leq a \leq b \leq 1\}$ be the set of closed subintervals of [0, 1]. Then $A = \{\langle x, [A^-(x), A^+(x)] \rangle \mid x \in X\}$ is called an interval-valued fuzzy set in X. Here $A^-, A^+ : X \to [0, 1]$ are the lower and upper membership functions of A, the map $x \mapsto [A^-(x), A^+(x)]$ sends X into L, and the interval $[A^-(x), A^+(x)]$, with $0 \leq A^-(x) \leq A^+(x) \leq 1$, represents the membership degree of x in A.

The difference $π_A(x) = A^+(x) - A^-(x)$ is called the interval-valued fuzzy index. It can be interpreted as a hesitancy margin, that is, the degree of uncertainty in deciding whether or not x belongs to A. This hesitancy margin is the key idea that distinguishes interval-valued fuzzy set theory from fuzzy set theory.
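The definitions above can be made concrete with a small Python sketch. The class `IntervalMembership` and its members are illustrative names, not part of the paper; the sketch only encodes the condition $0 \leq A^-(x) \leq A^+(x) \leq 1$ and the hesitancy $π_A(x)$, and treats an ordinary fuzzy degree μ as the degenerate interval [μ, μ], whose hesitancy is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalMembership:
    """Interval-valued membership [A^-(x), A^+(x)] of one element x in a set A."""
    lower: float  # A^-(x)
    upper: float  # A^+(x)

    def __post_init__(self):
        # The definition requires 0 <= A^-(x) <= A^+(x) <= 1.
        if not (0.0 <= self.lower <= self.upper <= 1.0):
            raise ValueError("need 0 <= lower <= upper <= 1")

    @property
    def hesitancy(self) -> float:
        # Interval-valued fuzzy index pi_A(x) = A^+(x) - A^-(x).
        return self.upper - self.lower

    @classmethod
    def from_fuzzy(cls, mu: float) -> "IntervalMembership":
        # An ordinary fuzzy membership degree mu is the degenerate interval [mu, mu].
        return cls(mu, mu)
```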

2.2 SVM and NPSVM

2.2.1 SVM [1]

Consider the binary classification problem with the training set $T = \{(x_1, y_1), \dots, (x_l, y_l)\}$, where $x_i \in R^n$, $y_i \in \{-1, +1\}$, $i = 1, \dots, l$. The aim of SVM is to find an optimal separating hyperplane $(w \cdot x) + b = 0$, with $w \in R^n$ and $b \in R$, that classifies the training points correctly or nearly correctly, while making the margin between the two parallel planes $(w \cdot x) + b = 1$ and $(w \cdot x) + b = -1$ as large as possible. Finding the optimal separating hyperplane amounts to solving the following optimization problem

$\begin{matrix} \min_{w, b, ξ} \frac{1}{2} \| w \|^{2} + c \sum_{i = 1}^{l} ξ_{i} \\ s.t. \; y_{i} ((w \cdot x_{i}) + b) \geq 1 - ξ_{i}, i = 1, \dots, l, \\ ξ_{i} \geq 0, i = 1, \dots, l, \end{matrix}$ (1)

where c > 0 is a penalty parameter that controls the tradeoff between maximizing the margin and minimizing the number of misclassifications.

Using Wolfe duality, the primal problem (1) can be transformed into its dual problem

$\begin{matrix} \max_{α} \sum_{i = 1}^{l} α_{i} - \frac{1}{2} \sum_{i = 1}^{l} \sum_{j = 1}^{l} α_{i} α_{j} y_{i} y_{j} (x_{i} \cdot x_{j}) \\ s.t. \; \sum_{i = 1}^{l} α_{i} y_{i} = 0, \; 0 \leq α_{i} \leq c, i = 1, \dots, l, \end{matrix}$ (2)

where $α = (α_1, \dots, α_l)^T$ is the vector of nonnegative Lagrange multipliers of problem (1). The training points corresponding to nonzero $α_i$ are called support vectors.

Given a solution $α^*$ of problem (2), the Karush-Kuhn-Tucker (KKT) conditions yield

$w^{*} = \sum_{i = 1}^{l} α_{i}^{*} y_{i} x_{i}$ (3)

and, choosing any component $α_j^* \in (0, c)$ of $α^*$, we obtain

$b^{*} = y_{j} - \sum_{i = 1}^{l} y_{i} α_{i}^{*} (x_{i} \cdot x_{j})$ (4)

Then a new testing point x can be classified according to the decision function

$f (x) = sgn (\sum_{i = 1}^{l} α_{i}^{*} y_{i} (x_{i} \cdot x) + b^{*})$ (5)
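As an illustration of Eqs. (3)-(5), the following Python/NumPy sketch recovers $w^*$ and $b^*$ from a given dual vector α and evaluates the decision function. It assumes α already solves problem (2); the QP solver itself is not shown, and the function names are hypothetical.

```python
import numpy as np


def svm_primal_from_dual(X, y, alpha, c, tol=1e-8):
    """Recover w* (Eq. 3) and b* (Eq. 4) from a dual solution alpha of problem (2)."""
    X, y, alpha = np.asarray(X, float), np.asarray(y, float), np.asarray(alpha, float)
    w = (alpha * y) @ X                       # w* = sum_i alpha_i y_i x_i
    # Eq. (4) needs one index j with 0 < alpha_j < c (a free support vector).
    free = np.flatnonzero((alpha > tol) & (alpha < c - tol))
    if free.size == 0:
        raise ValueError("no alpha_j strictly inside (0, c)")
    j = free[0]
    b = y[j] - np.sum(y * alpha * (X @ X[j]))  # b* = y_j - sum_i y_i alpha_i (x_i . x_j)
    return w, b


def svm_decide(X, y, alpha, b, x_new):
    """Decision function (5): sgn(sum_i alpha_i y_i (x_i . x) + b*)."""
    X, y, alpha = np.asarray(X, float), np.asarray(y, float), np.asarray(alpha, float)
    return np.sign((alpha * y) @ (X @ np.asarray(x_new, float)) + b)
```

For the two points (1, 0) with label +1 and (−1, 0) with label −1, the dual (2) is maximized by α = (0.5, 0.5) whenever c > 0.5, and the sketch then returns w* = (1, 0) and b* = 0.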

When a linear decision function is inadequate, a common strategy is to map the input points into a high-dimensional feature space $F = \{φ(x) \mid x \in X\}$ and find a separating hyperplane there. The optimization problem corresponding to problem (2) then becomes

$\begin{matrix} max_{α} \sum_{i = 1}^{l} α_{i} - \frac{1}{2} \sum_{i = 1}^{l} \sum_{j = 1}^{l} α_{i} α_{j} y_{i} y_{j} K (x_{i}, x_{j}) \\ s . t . \sum_{i = 1}^{l} α_{i} y_{i} = 0, \\ 0 \leq α_{i} \leq c, i = 1, \dots, l . \end{matrix}$ (6)

where $K(x_i, x_j) = φ(x_i) \cdot φ(x_j)$ denotes a kernel function.

The decision function is determined by

$f (x) = sgn (\sum_{i = 1}^{l} α_{i}^{*} y_{i} K (x_{i}, x) + b^{*})$ (7)
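A sketch of the kernel decision function (7) with the Gaussian (RBF) kernel, the kernel used in the experiments of Section 4. The parameterization $K(u, v) = \exp(-γ\|u - v\|^2)$ with bandwidth `gamma` is one common choice and an assumption here, as are the function names; the coefficients used in the check below are illustrative and are not a solution of problem (6).

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2) for every row pair of A and B."""
    A = np.atleast_2d(np.asarray(A, float))
    B = np.atleast_2d(np.asarray(B, float))
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from round-off


def kernel_svm_decide(X, y, alpha, b, X_new, gamma=1.0):
    """Decision function (7) for each row x of X_new: sgn(sum_i alpha_i y_i K(x_i, x) + b*)."""
    K = rbf_kernel(X, X_new, gamma)             # shape (l, number of new points)
    coef = np.asarray(alpha, float) * np.asarray(y, float)
    return np.sign(coef @ K + b)
```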

2.2.2 NPSVM [28]

Consider the binary classification problem with the training set $T = \{(x_1, +1), \dots, (x_p, +1), (x_{p+1}, -1), \dots, (x_{p+q}, -1)\}$, where $x_i \in R^n$, $i = 1, \dots, p$, are the positive inputs and $x_i \in R^n$, $i = p + 1, \dots, p + q$, are the negative inputs. Among nonparallel support vector machines, NPSVM has been shown to be theoretically superior to and more efficient than TWSVM and GEPSVM. In the linear case, NPSVM seeks two nonparallel hyperplanes by solving the following two convex quadratic programming problems (QPPs):

$\begin{matrix} \min_{w_{+}, b_{+}, η_{+}^{(*)}, ξ_{-}} \frac{1}{2} \| w_{+} \|^{2} + c_{1} \sum_{i = 1}^{p} (η_{i} + η_{i}^{*}) + c_{2} \sum_{j = p + 1}^{p + q} ξ_{j} \\ s.t. \; (w_{+} \cdot x_{i}) + b_{+} \leq ɛ + η_{i}, i = 1, \dots, p, \\ - (w_{+} \cdot x_{i}) - b_{+} \leq ɛ + η_{i}^{*}, i = 1, \dots, p, \\ (w_{+} \cdot x_{j}) + b_{+} \leq - 1 + ξ_{j}, j = p + 1, \dots, p + q, \\ η_{i}, η_{i}^{*} \geq 0, i = 1, \dots, p, \\ ξ_{j} \geq 0, j = p + 1, \dots, p + q, \end{matrix}$ (8)

and

$\begin{matrix} \min_{w_{-}, b_{-}, η_{-}^{(*)}, ξ_{+}} \frac{1}{2} \| w_{-} \|^{2} + c_{3} \sum_{i = p + 1}^{p + q} (η_{i} + η_{i}^{*}) + c_{4} \sum_{j = 1}^{p} ξ_{j} \\ s.t. \; (w_{-} \cdot x_{i}) + b_{-} \leq ɛ + η_{i}, i = p + 1, \dots, p + q, \\ - (w_{-} \cdot x_{i}) - b_{-} \leq ɛ + η_{i}^{*}, i = p + 1, \dots, p + q, \\ (w_{-} \cdot x_{j}) + b_{-} \geq 1 - ξ_{j}, j = 1, \dots, p, \\ η_{i}, η_{i}^{*} \geq 0, i = p + 1, \dots, p + q, \\ ξ_{j} \geq 0, j = 1, \dots, p. \end{matrix}$ (9)

In the objective function of the primal problem (8), the term $\frac{1}{2} \| w_{+} \|^{2}$ maximizes the margin between the hyperplanes $(w_+ \cdot x) + b_+ = \pm ɛ$. The vectors $ξ_- = (ξ_{p+1}, \dots, ξ_{p+q})^T$ and $η_{+}^{(*)} = (η_{1}, \dots, η_{p}, η_{1}^{*}, \dots, η_{p}^{*})^{T}$ are slack variables. The term $\sum_{i = 1}^{p} (η_{i} + η_{i}^{*})$ is an empirical risk that encourages the positive points to lie in the ɛ-band of the hyperplane $(w_+ \cdot x) + b_+ = 0$, and the term $\sum_{j = p + 1}^{p + q} ξ_{j}$ is another empirical risk that encourages the negative points to lie below the bounding plane $(w_+ \cdot x) + b_+ = -1$. The parameters $c_1, c_2 > 0$ control the tradeoff between maximizing the margin and minimizing the number of misclassifications.
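The role of the slack variables can be seen by fixing $(w_+, b_+)$ and choosing each slack as small as the constraints of (8) allow: $η_i + η_i^*$ then equals the ɛ-insensitive loss $\max(0, |(w_+ \cdot x_i) + b_+| - ɛ)$ of a positive point, and $ξ_j$ equals the hinge loss $\max(0, (w_+ \cdot x_j) + b_+ + 1)$ of a negative point. The Python sketch below (function name hypothetical) evaluates the objective of (8) in this way; it does not optimize over $(w_+, b_+)$.

```python
import numpy as np


def npsvm_plus_objective(w, b, X_pos, X_neg, eps, c1, c2):
    """Objective of primal (8) at a fixed (w+, b+), with every slack set to
    its smallest value that satisfies the constraints."""
    w, X_pos, X_neg = (np.asarray(a, float) for a in (w, X_pos, X_neg))
    f_pos = X_pos @ w + b                      # (w+ . x_i) + b+ for positive points
    f_neg = X_neg @ w + b                      # (w+ . x_j) + b+ for negative points
    eta = np.maximum(0.0, f_pos - eps)         # positive point above the eps-band
    eta_star = np.maximum(0.0, -f_pos - eps)   # positive point below the eps-band
    xi = np.maximum(0.0, f_neg + 1.0)          # negative point above the plane f = -1
    return 0.5 * w @ w + c1 * np.sum(eta + eta_star) + c2 * np.sum(xi)
```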

With the Lagrange multipliers $α_{+}^{(*)} = (α_{1}, \dots, α_{p}, α_{1}^{*}, \dots, α_{p}^{*})^{T}$ and $β_- = (β_{p+1}, \dots, β_{p+q})^T$, the Wolfe dual of (8) can be written as

$\begin{matrix} \min_{α_{+}^{(*)}, β_{-}} \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) (x_{i} \cdot x_{j}) \\ - \sum_{i = 1}^{p} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) β_{j} (x_{i} \cdot x_{j}) \\ + \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} β_{i} β_{j} (x_{i} \cdot x_{j}) + ɛ \sum_{i = 1}^{p} (α_{i}^{*} + α_{i}) - \sum_{j = p + 1}^{p + q} β_{j} \\ s.t. \; \sum_{i = 1}^{p} (α_{i} - α_{i}^{*}) + \sum_{j = p + 1}^{p + q} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq c_{1}, i = 1, \dots, p, \\ 0 \leq β_{j} \leq c_{2}, j = p + 1, \dots, p + q. \end{matrix}$ (10)

In a similar way, we could obtain the dual formulation of (9) as follows:

$\begin{matrix} \min_{α_{-}^{(*)}, β_{+}} \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) (x_{i} \cdot x_{j}) \\ + \sum_{i = p + 1}^{p + q} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) β_{j} (x_{i} \cdot x_{j}) + \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} β_{i} β_{j} (x_{i} \cdot x_{j}) \\ + ɛ \sum_{i = p + 1}^{p + q} (α_{i}^{*} + α_{i}) - \sum_{j = 1}^{p} β_{j} \\ s.t. \; \sum_{i = p + 1}^{p + q} (α_{i} - α_{i}^{*}) - \sum_{j = 1}^{p} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq c_{3}, i = p + 1, \dots, p + q, \\ 0 \leq β_{j} \leq c_{4}, j = 1, \dots, p. \end{matrix}$ (11)

After solving the problem (10), w₊ can be expressed as

$w_{+} = \sum_{i = 1}^{p} (α_{i}^{*} - α_{i}) x_{i} - \sum_{j = p + 1}^{p + q} β_{j} x_{j}$ (12)

and, choosing a component $α_i \in (0, c_1)$ of $α_{+}^{(*)}$, $i \in \{1, \dots, p\}$, we obtain

$b_{+} = - (w_{+} \cdot x_{i}) + ɛ$ (13)

Similarly, w_- can be expressed as

$w_{-} = \sum_{i = p + 1}^{p + q} (α_{i}^{*} - α_{i}) x_{i} + \sum_{j = 1}^{p} β_{j} x_{j}$ (14)

and, choosing a component $α_i \in (0, c_3)$ of $α_{-}^{(*)}$, $i \in \{p + 1, \dots, p + q\}$, we obtain

$b_{-} = - (w_{-} \cdot x_{i}) + ɛ$ (15)

A new testing point x is classified as +1 or −1 depending on which of the two hyperplanes it lies closer to, i.e.,

$\text{class}(x) = \arg\min_{k \in \{+, -\}} \frac{| (w_{k} \cdot x) + b_{k} |}{\| w_{k} \|},$ (16)

where | · | is the absolute value.
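Equations (12)-(16) translate directly into code. In the Python sketch below (names hypothetical), the multipliers are assumed to come from a QP solver applied to (10) and (11), which is not shown; a tie in (16) is broken in favor of +1, a choice the text does not specify.

```python
import numpy as np


def w_plus(X_pos, X_neg, alpha, alpha_star, beta):
    # Eq. (12): w+ = sum_i (alpha*_i - alpha_i) x_i - sum_j beta_j x_j
    return (alpha_star - alpha) @ X_pos - beta @ X_neg


def w_minus(X_neg, X_pos, alpha, alpha_star, beta):
    # Eq. (14): w- = sum_i (alpha*_i - alpha_i) x_i + sum_j beta_j x_j
    return (alpha_star - alpha) @ X_neg + beta @ X_pos


def bias(w, X_own, alpha, c, eps, tol=1e-8):
    # Eqs. (13)/(15): for alpha_i strictly inside (0, c), (w . x_i) + b = eps.
    free = np.flatnonzero((alpha > tol) & (alpha < c - tol))
    if free.size == 0:
        raise ValueError("no alpha_i strictly inside (0, c)")
    return eps - w @ X_own[free[0]]


def classify(X_new, w_p, b_p, w_m, b_m):
    # Rule (16): label each row of X_new by the nearer hyperplane (ties -> +1).
    d_p = np.abs(X_new @ w_p + b_p) / np.linalg.norm(w_p)
    d_m = np.abs(X_new @ w_m + b_m) / np.linalg.norm(w_m)
    return np.where(d_p <= d_m, 1, -1)
```

For example, with $X_+ = \{(1, 0), (0, 1)\}$, $X_- = \{(1, 1)\}$, α = (0.1, 0), α* = (0.3, 0.2) and β = (0.4), which satisfy the equality constraint of (10) but are not claimed to be optimal, Eq. (12) gives $w_+ = (-0.2, -0.2)$.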

In the nonlinear case, the dual problems corresponding to (8) and (9) become

$\begin{matrix} \min_{α_{+}^{(*)}, β_{-}} \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) K(x_{i}, x_{j}) \\ - \sum_{i = 1}^{p} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) β_{j} K(x_{i}, x_{j}) \\ + \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} β_{i} β_{j} K(x_{i}, x_{j}) + ɛ \sum_{i = 1}^{p} (α_{i}^{*} + α_{i}) - \sum_{j = p + 1}^{p + q} β_{j} \\ s.t. \; \sum_{i = 1}^{p} (α_{i} - α_{i}^{*}) + \sum_{j = p + 1}^{p + q} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq c_{1}, i = 1, \dots, p, \\ 0 \leq β_{j} \leq c_{2}, j = p + 1, \dots, p + q. \end{matrix}$ (17)

and

$\begin{matrix} \min_{α_{-}^{(*)}, β_{+}} \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) K(x_{i}, x_{j}) \\ + \sum_{i = p + 1}^{p + q} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) β_{j} K(x_{i}, x_{j}) \\ + \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} β_{i} β_{j} K(x_{i}, x_{j}) + ɛ \sum_{i = p + 1}^{p + q} (α_{i}^{*} + α_{i}) - \sum_{j = 1}^{p} β_{j} \\ s.t. \; \sum_{i = p + 1}^{p + q} (α_{i} - α_{i}^{*}) - \sum_{j = 1}^{p} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq c_{3}, i = p + 1, \dots, p + q, \\ 0 \leq β_{j} \leq c_{4}, j = 1, \dots, p. \end{matrix}$ (18)

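where $K(x_i, x_j) = φ(x_i) \cdot φ(x_j)$ denotes a kernel function. The decision functions and the classification rule are obtained in the same way as in the linear case, with each inner product $(x_i \cdot x)$ replaced by $K(x_i, x)$.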
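One way to carry the rule (16) over to the kernel case is sketched below. If $w_+ = \sum_i θ_i φ(x_i)$ with coefficients θ taken from (12) (that is, $α_i^* - α_i$ for the positive points and $-β_j$ for the negative points), then $(w_+ \cdot φ(x)) = \sum_i θ_i K(x_i, x)$ and $\|w_+\|^2 = θ^T K θ$, so the distance in (16) can be evaluated without computing φ. The RBF kernel, its parameter `gamma` and the function names are assumptions; the negative plane is handled in the same way with the coefficients from (14).

```python
import numpy as np


def rbf(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2) (RBF kernel assumed)."""
    A = np.atleast_2d(np.asarray(A, float))
    B = np.atleast_2d(np.asarray(B, float))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def kernel_plane_distance(X_train, coef, b, X_new, gamma=1.0):
    """|(w . phi(x)) + b| / ||w|| for w = sum_i coef[i] * phi(X_train[i]).

    Positive plane: X_train = [X_pos; X_neg], coef = [alpha* - alpha, -beta].
    Negative plane: X_train = [X_neg; X_pos], coef = [alpha* - alpha, +beta].
    """
    coef = np.asarray(coef, float)
    f = coef @ rbf(X_train, X_new, gamma) + b                      # (w . phi(x)) + b
    w_norm = np.sqrt(coef @ rbf(X_train, X_train, gamma) @ coef)   # sqrt(coef^T K coef)
    return np.abs(f) / w_norm


def classify_kernel(dist_plus, dist_minus):
    """Nearest-plane rule in feature space; ties go to +1 (an assumption)."""
    return np.where(np.asarray(dist_plus) <= np.asarray(dist_minus), 1, -1)
```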

3 Fuzzy nonparallel support vector machine (F-NPSVM) and interval-valued fuzzy nonparallel support vector machine (IVF-NPSVM)

3.1 F-NPSVM

In many real applications, the training points do not all have the same effect: some are more important than others for the classification problem. We would like the meaningful training points to be classified correctly and the influence of noise-like points to be weakened.

That is, a training point no longer belongs exactly to one of the two classes. It may belong to one class with degree 0.7 and be meaningless with degree 0.3, or belong to one class with degree 0.1 and be meaningless with degree 0.9. In other words, each training point has an associated fuzzy membership degree, which can be regarded as the attitude of that point toward one class in the classification problem. Extending NPSVM with fuzzy membership degrees in this way yields F-NPSVM.

To find two nonparallel hyperplanes that classify the training points more accurately, we propose the following Algorithm 1 (F-NPSVM).

Algorithm 1: F-NPSVM

Step 1: Compute the centers of the positive and negative classes.

$C^{+} = \frac{1}{p} \sum_{i = 1}^{p} x_{i};$ (19)

$C^{-} = \frac{1}{q} \sum_{i = p + 1}^{p + q} x_{i}$ (20)

Step 2: Find the distance between x_i and the center of the same class.

For positive point x_i (i = 1, 2, ⋯ , p),

$D(x_{i}, C^{+}) = \| x_{i} - C^{+} \|, \quad i = 1, 2, \dots, p$ (21)

The distance between each negative point and $C^-$ is computed in the same way.

Step 3: Define the fuzzy membership degree of x_i.

For a positive point $x_i$ ($i = 1, 2, \dots, p$), denote its fuzzy membership degree by $m_i$, $0 \leq m_i \leq 1$, and let

$m_{i} = 1 - \frac{D (x_{i}, C^{+})}{max_{k} D (x_{k}, C^{+})}$ (22)

where the maximum is taken over the positive points $x_k$, $k = 1, \dots, p$.

The fuzzy membership degrees of the negative points are determined in the same way, using $C^-$.
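Steps 1-3 of Algorithm 1 amount to a few array operations per class. The Python sketch below (function name hypothetical) computes $m_i$ for the points of one class. Returning all ones when every point coincides with the center is our own guard against division by zero and is not part of (22). Note that, under (22), the point farthest from the center always receives $m_i = 0$.

```python
import numpy as np


def fuzzy_membership(X_class):
    """m_i = 1 - D(x_i, C) / max_k D(x_k, C) for the points of one class (Eqs. 19-22)."""
    X_class = np.asarray(X_class, float)
    center = X_class.mean(axis=0)                      # C = (1/p) sum_i x_i
    dist = np.linalg.norm(X_class - center, axis=1)    # D(x_i, C) = ||x_i - C||
    d_max = dist.max()
    if d_max == 0.0:                                   # all points coincide (our guard)
        return np.ones(len(X_class))
    return 1.0 - dist / d_max
```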

Step 4: Construct and solve two convex QPPs

Since $m_i$ is the membership degree of point $x_i$ toward its class and the slack variables measure the error at each point, a product such as $m_j ξ_j$ is an error weighted by the importance of the point. The optimal hyperplanes are therefore obtained from the following two QPPs.

$\begin{matrix} \min_{w_{+}, b_{+}, η_{+}^{(*)}, ξ_{-}} \frac{1}{2} \| w_{+} \|^{2} + c_{1} \sum_{i = 1}^{p} m_{i} (η_{i} + η_{i}^{*}) + c_{2} \sum_{j = p + 1}^{p + q} m_{j} ξ_{j} \\ s.t. \; (w_{+} \cdot x_{i}) + b_{+} \leq ɛ + η_{i}, i = 1, \dots, p, \\ - (w_{+} \cdot x_{i}) - b_{+} \leq ɛ + η_{i}^{*}, i = 1, \dots, p, \\ (w_{+} \cdot x_{j}) + b_{+} \leq - 1 + ξ_{j}, j = p + 1, \dots, p + q, \\ η_{i}, η_{i}^{*} \geq 0, i = 1, \dots, p, \\ ξ_{j} \geq 0, j = p + 1, \dots, p + q, \end{matrix}$ (23)

and

$\begin{matrix} \min_{w_{-}, b_{-}, η_{-}^{(*)}, ξ_{+}} \frac{1}{2} \| w_{-} \|^{2} + c_{3} \sum_{i = p + 1}^{p + q} m_{i} (η_{i} + η_{i}^{*}) + c_{4} \sum_{j = 1}^{p} m_{j} ξ_{j} \\ s.t. \; (w_{-} \cdot x_{i}) + b_{-} \leq ɛ + η_{i}, i = p + 1, \dots, p + q, \\ - (w_{-} \cdot x_{i}) - b_{-} \leq ɛ + η_{i}^{*}, i = p + 1, \dots, p + q, \\ (w_{-} \cdot x_{j}) + b_{-} \geq 1 - ξ_{j}, j = 1, \dots, p, \\ η_{i}, η_{i}^{*} \geq 0, i = p + 1, \dots, p + q, \\ ξ_{j} \geq 0, j = 1, \dots, p. \end{matrix}$ (24)

With the Lagrange multipliers $α_{+}^{(*)} = (α_{1}, \dots, α_{p}, α_{1}^{*}, \dots, α_{p}^{*})^{T}$ and $β_- = (β_{p+1}, \dots, β_{p+q})^T$ for (23), and the analogous multipliers $α_{-}^{(*)}$ and $β_+$ for (24), the primal problems (23) and (24) can be transformed into their dual problems

$\begin{matrix} \min_{α_{+}^{(*)}, β_{-}} \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) (x_{i} \cdot x_{j}) \\ - \sum_{i = 1}^{p} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) β_{j} (x_{i} \cdot x_{j}) \\ + \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} β_{i} β_{j} (x_{i} \cdot x_{j}) + ɛ \sum_{i = 1}^{p} (α_{i}^{*} + α_{i}) - \sum_{j = p + 1}^{p + q} β_{j} \\ s.t. \; \sum_{i = 1}^{p} (α_{i} - α_{i}^{*}) + \sum_{j = p + 1}^{p + q} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq m_{i} c_{1}, i = 1, \dots, p, \\ 0 \leq β_{j} \leq m_{j} c_{2}, j = p + 1, \dots, p + q. \end{matrix}$ (25)

and

$\begin{matrix} min_{α_{-}^{(*)}, β_{+}} \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) (x_{i} \cdot x_{j}) \\ + \sum_{i = p + 1}^{p + q} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) β_{j} (x_{i} \cdot x_{j}) \\ + \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} β_{i} β_{j} (x_{i} \cdot x_{j}) + ɛ \sum_{i = p + 1}^{p + q} (α_{i}^{*} + α_{i}) - \sum_{j = 1}^{p} β_{j} \\ s . t . \sum_{i = p + 1}^{p + q} (α_{i} - α_{i}^{*}) - \sum_{j = 1}^{p} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq m_{i} c_{3}, i = p + 1, \dots, p + q, \\ 0 \leq β_{j} \leq m_{j} c_{4}, j = 1, \dots, p . \end{matrix}$ (26)
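Compared with (10) and (11), the duals (25) and (26) change only the upper bounds of the box constraints, from $c_1, c_2$ (or $c_3, c_4$) to $m_i c_1, m_j c_2$ (or $m_i c_3, m_j c_4$). The Python sketch below (names hypothetical) writes the objective of (25) with Gram matrices for the linear kernel and builds its bounds; any QP solver that handles box and equality constraints could then minimize it, which is not shown. A useful sanity check is that the quadratic part equals $\frac{1}{2}\|w_+\|^2$ with $w_+$ from (27).

```python
import numpy as np


def fnpsvm_dual_plus_objective(X_pos, X_neg, alpha, alpha_star, beta, eps):
    """Objective of dual (25) for the linear kernel, written with Gram matrices."""
    g = alpha_star - alpha                                   # alpha*_i - alpha_i
    K_pp, K_pn, K_nn = X_pos @ X_pos.T, X_pos @ X_neg.T, X_neg @ X_neg.T
    quad = 0.5 * g @ K_pp @ g - g @ K_pn @ beta + 0.5 * beta @ K_nn @ beta
    return quad + eps * np.sum(alpha_star + alpha) - np.sum(beta)


def fnpsvm_dual_plus_bounds(m_pos, m_neg, c1, c2):
    """Box constraints of (25): alpha_i, alpha*_i <= m_i c1 and beta_j <= m_j c2."""
    return np.asarray(m_pos, float) * c1, np.asarray(m_neg, float) * c2
```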

Step 5: Construct the decision functions

After solving the dual problem (25), w₊ can be expressed as

$w_{+} = \sum_{i = 1}^{p} (α_{i}^{*} - α_{i}) x_{i} - \sum_{j = p + 1}^{p + q} β_{j} x_{j}$ (27)

Choosing a component $α_i \in (0, m_i c_1)$ of $α_{+}^{(*)}$, $i \in \{1, \dots, p\}$, we obtain

$b_{+} = - (w_{+} \cdot x_{i}) + ɛ$ (28)

Similarly, w_- can be expressed as

$w_{-} = \sum_{i = p + 1}^{p + q} (α_{i}^{*} - α_{i}) x_{i} + \sum_{j = 1}^{p} β_{j} x_{j}$ (29)

Choosing a component $α_i \in (0, m_i c_3)$ of $α_{-}^{(*)}$, $i \in \{p + 1, \dots, p + q\}$, we obtain

$b_{-} = - (w_{-} \cdot x_{i}) + ɛ$ (30)

The decision functions can then be expressed as

$f_{+} (x) = \sum_{i = 1}^{p} (α_{i}^{*} - α_{i}) (x_{i} \cdot x) - \sum_{j = p + 1}^{p + q} β_{j} (x_{j} \cdot x) + b_{+}$ (31)

$f_{-} (x) = \sum_{i = p + 1}^{p + q} (α_{i}^{*} - α_{i}) (x_{i} \cdot x) + \sum_{j = 1}^{p} β_{j} (x_{j} \cdot x) + b_{-}$ (32)

Step 6: Make the judgment

A new testing point x is classified as +1 or −1 depending on which of the two hyperplanes it lies closer to, i.e.,

$\text{class}(x) = \arg\min_{k \in \{+, -\}} \frac{| (w_{k} \cdot x) + b_{k} |}{\| w_{k} \|},$ (33)

where | · | is the absolute value.

In the nonlinear case, the dual problems corresponding to (23) and (24) become

$\begin{matrix} \min_{α_{+}^{(*)}, β_{-}} \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) K(x_{i}, x_{j}) \\ - \sum_{i = 1}^{p} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) β_{j} K(x_{i}, x_{j}) \\ + \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} β_{i} β_{j} K(x_{i}, x_{j}) + ɛ \sum_{i = 1}^{p} (α_{i}^{*} + α_{i}) - \sum_{j = p + 1}^{p + q} β_{j} \\ s.t. \; \sum_{i = 1}^{p} (α_{i} - α_{i}^{*}) + \sum_{j = p + 1}^{p + q} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq m_{i} c_{1}, i = 1, \dots, p, \\ 0 \leq β_{j} \leq m_{j} c_{2}, j = p + 1, \dots, p + q. \end{matrix}$ (34)

and

$\begin{matrix} \min_{α_{-}^{(*)}, β_{+}} \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) K(x_{i}, x_{j}) \\ + \sum_{i = p + 1}^{p + q} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) β_{j} K(x_{i}, x_{j}) \\ + \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} β_{i} β_{j} K(x_{i}, x_{j}) + ɛ \sum_{i = p + 1}^{p + q} (α_{i}^{*} + α_{i}) - \sum_{j = 1}^{p} β_{j} \\ s.t. \; \sum_{i = p + 1}^{p + q} (α_{i} - α_{i}^{*}) - \sum_{j = 1}^{p} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq m_{i} c_{3}, i = p + 1, \dots, p + q, \\ 0 \leq β_{j} \leq m_{j} c_{4}, j = 1, \dots, p. \end{matrix}$ (35)

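where $K(x_i, x_j) = φ(x_i) \cdot φ(x_j)$ denotes a kernel function. The decision functions and the classification rule are obtained in the same way as in the linear case, with each inner product $(x_i \cdot x)$ replaced by $K(x_i, x)$.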

3.2 IVF-NPSVM

Combining NPSVM with fuzzy set theory yields the F-NPSVM classifier. However, its way of determining the membership degree has a limitation: it does not take into account the neighborhood of each point. In this part, we propose a new classifier, IVF-NPSVM, which combines NPSVM with interval-valued fuzzy set theory to address this problem.

Consider the binary classification problem with the training set $T = \{(x_1, +1), \dots, (x_p, +1), (x_{p+1}, -1), \dots, (x_{p+q}, -1)\}$, where $x_i \in R^n$, $i = 1, \dots, p$, are the positive inputs and $x_i \in R^n$, $i = p + 1, \dots, p + q$, are the negative inputs.

To find two nonparallel hyperplanes that classify the training points more accurately, we propose the following Algorithm 2 (IVF-NPSVM).

Algorithm 2: IVF-NPSVM

Step 1: Compute the centers of the positive and negative classes.

$C^{+} = \frac{1}{p} \sum_{i = 1}^{p} x_{i};$ (36)

$C^{-} = \frac{1}{q} \sum_{i = p + 1}^{p + q} x_{i}$ (37)

Step 2: Find the distance between x_i and the center of the same class.

For positive point x_i (i = 1, 2, ⋯ , p)

$D(x_{i}, C^{+}) = \| x_{i} - C^{+} \|, \quad i = 1, 2, \dots, p$ (38)

The distance between each negative point and $C^-$ is computed in the same way.

Step 3: For each point $x_i$, count all points, the points of the same class and the points of the other class within radius R (R > 0) of $x_i$.

For positive point x_i (i = 1, 2, ⋯ , p),

$ρ(x_{i}, R) = \mathrm{Num}\{x_{j} \mid D(x_{i}, x_{j}) \leq R\}$ (39)

$ρ^{+}(x_{i}, R) = \mathrm{Num}\{x_{j} \mid D(x_{i}, x_{j}) \leq R, \; y_{j} = +1\}$ (40)

$ρ^{-}(x_{i}, R) = \mathrm{Num}\{x_{j} \mid D(x_{i}, x_{j}) \leq R, \; y_{j} = -1\}$ (41)

where Num{·} denotes the number of elements of a set, $D(x_i, x_j) = \| x_i - x_j \|$, and R is an adjustable neighborhood radius.

Obviously,

$ρ (x_{i}, R) = ρ^{+} (x_{i}, R) + ρ^{-} (x_{i}, R)$ (42)

The negative points are processed in a similar way.
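Counting the neighbors in Eqs. (39)-(42) can be sketched as follows. The sketch reads (39) literally, so the neighborhood of $x_i$ includes $x_i$ itself (its distance to itself is 0 ≤ R), which also keeps $ρ(x_i, R) \geq 1$. The function name is hypothetical and labels are assumed to be ±1.

```python
import numpy as np


def neighborhood_counts(X, y, i, R):
    """rho(x_i, R), rho+(x_i, R), rho-(x_i, R) of Eqs. (39)-(41); labels y are +1/-1."""
    X, y = np.asarray(X, float), np.asarray(y)
    inside = np.linalg.norm(X - X[i], axis=1) <= R   # D(x_i, x_j) <= R, includes j = i
    rho_pos = int(np.sum(inside & (y == 1)))
    rho_neg = int(np.sum(inside & (y == -1)))
    return rho_pos + rho_neg, rho_pos, rho_neg        # Eq. (42): rho = rho+ + rho-
```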

Step 4: Define the interval-valued membership degree of x_i.

For a positive point $x_i$ ($i = 1, 2, \dots, p$), denote its interval-valued fuzzy membership degree by $[m_{i}^{-}, m_{i}^{+}]$, $0 \leq m_{i}^{-} \leq m_{i}^{+} \leq 1$, and let

$m_{i}^{-} = 1 - \frac{D(x_{i}, C^{+})}{\max_{k} D(x_{k}, C^{+})},$ (43)

$m_{i}^{+} = m_{i}^{-} + \frac{ρ^{+}(x_{i}, R)}{\max_{k} ρ^{+}(x_{k}, R)} \cdot \frac{ρ^{+}(x_{i}, R)}{ρ(x_{i}, R)} (1 - m_{i}^{-}),$ (44)

where the maxima are taken over the positive points $x_k$, $k = 1, \dots, p$. Since both fractions in (44) lie in [0, 1], this construction guarantees $m_i^- \leq m_i^+ \leq 1$.

The interval-valued fuzzy membership degree of negative points is determined in a similar way.
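Given the distances from Step 2 and the counts from Step 3, Eqs. (43) and (44) can be evaluated per class as in the Python sketch below (function name hypothetical). For negative points we assume, following "in a similar way", that $C^-$ and $ρ^-(x_i, R)$ take the roles of $C^+$ and $ρ^+(x_i, R)$.

```python
import numpy as np


def interval_membership(dist, rho, rho_same):
    """[m_i^-, m_i^+] of Eqs. (43)-(44) for all points of one class.

    dist[i]     : D(x_i, C), distance to the class center (Step 2)
    rho[i]      : rho(x_i, R), number of points within radius R (Step 3)
    rho_same[i] : same-class count, rho+(x_i, R) for the positive class
    """
    dist, rho, rho_same = (np.asarray(a, float) for a in (dist, rho, rho_same))
    m_lo = 1.0 - dist / dist.max()                     # Eq. (43)
    density = rho_same / rho_same.max()                # first fraction in (44)
    purity = rho_same / rho                            # second fraction in (44)
    m_hi = m_lo + density * purity * (1.0 - m_lo)      # Eq. (44)
    return m_lo, m_hi
```

For example, two points at the same distance from the center (both with $m_i^- = 0$) receive very different upper degrees when one lies in a dense, pure neighborhood ($ρ^+ = ρ = 4$, giving $m_i^+ = 1$) and the other in a sparse, mixed one ($ρ^+ = 1$, $ρ = 2$, giving $m_i^+ = 0.125$ when $\max_k ρ^+ = 4$).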

Step 5: Construct and solve two convex QPPs.

$\begin{matrix} \min_{w_{+}, b_{+}, η_{+}^{(*)}, ξ_{-}} \frac{1}{2} \| w_{+} \|^{2} + c_{1} \sum_{i = 1}^{p} (m_{i}^{-} + t m_{i}^{+}) (η_{i} + η_{i}^{*}) + c_{2} \sum_{j = p + 1}^{p + q} (m_{j}^{-} + k m_{j}^{+}) ξ_{j} \\ s.t. \; (w_{+} \cdot x_{i}) + b_{+} \leq ɛ + η_{i}, i = 1, \dots, p, \\ - (w_{+} \cdot x_{i}) - b_{+} \leq ɛ + η_{i}^{*}, i = 1, \dots, p, \\ (w_{+} \cdot x_{j}) + b_{+} \leq - 1 + ξ_{j}, j = p + 1, \dots, p + q, \\ η_{i}, η_{i}^{*} \geq 0, i = 1, \dots, p, \\ ξ_{j} \geq 0, j = p + 1, \dots, p + q, \end{matrix}$ (45)

and

$\begin{matrix} \min_{w_{-}, b_{-}, η_{-}^{(*)}, ξ_{+}} \frac{1}{2} \| w_{-} \|^{2} + c_{3} \sum_{i = p + 1}^{p + q} (m_{i}^{-} + k m_{i}^{+}) (η_{i} + η_{i}^{*}) \\ + c_{4} \sum_{j = 1}^{p} (m_{j}^{-} + t m_{j}^{+}) ξ_{j} \\ s.t. \; (w_{-} \cdot x_{i}) + b_{-} \leq ɛ + η_{i}, i = p + 1, \dots, p + q, \\ - (w_{-} \cdot x_{i}) - b_{-} \leq ɛ + η_{i}^{*}, i = p + 1, \dots, p + q, \\ (w_{-} \cdot x_{j}) + b_{-} \geq 1 - ξ_{j}, j = 1, \dots, p, \\ η_{i}, η_{i}^{*} \geq 0, i = p + 1, \dots, p + q, \\ ξ_{j} \geq 0, j = 1, \dots, p, \end{matrix}$ (46)

where t (t ≥ 0) and k (k ≥ 0) are adjustable combination coefficients for the positive and negative points, respectively.

With the Lagrange multipliers $α_{+}^{(*)} = (α_{1}, \dots, α_{p}, α_{1}^{*}, \dots, α_{p}^{*})^{T}$ and $β_- = (β_{p+1}, \dots, β_{p+q})^T$ for (45), and the analogous multipliers $α_{-}^{(*)}$ and $β_+$ for (46), the primal problems (45) and (46) can be transformed into their dual problems

$\begin{matrix} \min_{α_{+}^{(*)}, β_{-}} \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) (x_{i} \cdot x_{j}) \\ - \sum_{i = 1}^{p} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) β_{j} (x_{i} \cdot x_{j}) \\ + \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} β_{i} β_{j} (x_{i} \cdot x_{j}) + ɛ \sum_{i = 1}^{p} (α_{i}^{*} + α_{i}) - \sum_{j = p + 1}^{p + q} β_{j} \\ s.t. \; \sum_{i = 1}^{p} (α_{i} - α_{i}^{*}) + \sum_{j = p + 1}^{p + q} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq (m_{i}^{-} + t m_{i}^{+}) c_{1}, i = 1, \dots, p, \\ 0 \leq β_{j} \leq (m_{j}^{-} + k m_{j}^{+}) c_{2}, j = p + 1, \dots, p + q. \end{matrix}$ (47)

and

$\begin{matrix} \min_{α_{-}^{(*)}, β_{+}} \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) (x_{i} \cdot x_{j}) \\ + \sum_{i = p + 1}^{p + q} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) β_{j} (x_{i} \cdot x_{j}) + \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} β_{i} β_{j} (x_{i} \cdot x_{j}) \\ + ɛ \sum_{i = p + 1}^{p + q} (α_{i}^{*} + α_{i}) - \sum_{j = 1}^{p} β_{j} \\ s.t. \; \sum_{i = p + 1}^{p + q} (α_{i} - α_{i}^{*}) - \sum_{j = 1}^{p} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq (m_{i}^{-} + k m_{i}^{+}) c_{3}, i = p + 1, \dots, p + q, \\ 0 \leq β_{j} \leq (m_{j}^{-} + t m_{j}^{+}) c_{4}, j = 1, \dots, p. \end{matrix}$ (48)
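In the duals (47) and (48), the interval-valued degrees enter only through per-point upper bounds. The Python sketch below (names hypothetical) computes the bounds of (47). With t = k = 0 the weights reduce to $m_i^-$, which by (43) equals the F-NPSVM degree (22), so in that setting IVF-NPSVM coincides with F-NPSVM.

```python
import numpy as np


def ivf_dual_plus_bounds(mlo_pos, mhi_pos, mlo_neg, mhi_neg, t, k, c1, c2):
    """Upper bounds of the box constraints of dual (47).

    Positive points are weighted by m^- + t m^+, negative points by m^- + k m^+.
    """
    s_pos = np.asarray(mlo_pos, float) + t * np.asarray(mhi_pos, float)
    s_neg = np.asarray(mlo_neg, float) + k * np.asarray(mhi_neg, float)
    return s_pos * c1, s_neg * c2   # bounds for alpha_i, alpha*_i and for beta_j
```

The bounds of (48) follow from the same call with c1, c2 replaced by c4, c3: the first returned array then bounds $β_j$ (positive points) and the second bounds $α_i, α_i^*$ (negative points).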

Step 6: Construct the decision functions

After solving the dual problem (47), $w_+$ can be expressed as

$w_{+} = \sum_{i = 1}^{p} (α_{i}^{*} - α_{i}) x_{i} - \sum_{j = p + 1}^{p + q} β_{j} x_{j}$ (49)

Choosing a component $α_i \in (0, (m_i^- + t m_i^+) c_1)$ of $α_{+}^{(*)}$, $i \in \{1, \dots, p\}$, we obtain

$b_{+} = - (w_{+} \cdot x_{i}) + ɛ$ (50)

Similarly, $w_-$ can be expressed as

$w_{-} = \sum_{i = p + 1}^{p + q} (α_{i}^{*} - α_{i}) x_{i} + \sum_{j = 1}^{p} β_{j} x_{j}$ (51)

Choosing a component $α_i \in (0, (m_i^- + k m_i^+) c_3)$ of $α_{-}^{(*)}$, $i \in \{p + 1, \dots, p + q\}$, we obtain

$b_{-} = - (w_{-} \cdot x_{i}) + ɛ$ (52)

The decision functions can then be expressed as

$f_{+}(x) = \sum_{i = 1}^{p} (α_{i}^{*} - α_{i}) (x_{i} \cdot x) - \sum_{j = p + 1}^{p + q} β_{j} (x_{j} \cdot x) + b_{+}$ (53)

$f_{-}(x) = \sum_{i = p + 1}^{p + q} (α_{i}^{*} - α_{i}) (x_{i} \cdot x) + \sum_{j = 1}^{p} β_{j} (x_{j} \cdot x) + b_{-}$ (54)

Step 7: Make the judgment.

A new testing point x is classified as +1 or −1 depending on which of the two hyperplanes it lies closer to, i.e.,

$\text{class}(x) = \arg\min_{k \in \{+, -\}} \frac{| (w_{k} \cdot x) + b_{k} |}{\| w_{k} \|},$ (55)

where | · | is the absolute value.

In the nonlinear case, the dual problems corresponding to (45) and (46) become

$\begin{matrix} \min_{α_{+}^{(*)}, β_{-}} \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) K(x_{i}, x_{j}) \\ - \sum_{i = 1}^{p} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) β_{j} K(x_{i}, x_{j}) \\ + \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} β_{i} β_{j} K(x_{i}, x_{j}) + ɛ \sum_{i = 1}^{p} (α_{i}^{*} + α_{i}) - \sum_{j = p + 1}^{p + q} β_{j} \\ s.t. \; \sum_{i = 1}^{p} (α_{i} - α_{i}^{*}) + \sum_{j = p + 1}^{p + q} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq (m_{i}^{-} + t m_{i}^{+}) c_{1}, i = 1, \dots, p, \\ 0 \leq β_{j} \leq (m_{j}^{-} + k m_{j}^{+}) c_{2}, j = p + 1, \dots, p + q. \end{matrix}$ (56)

and

$\begin{matrix} \min_{α_{-}^{(*)}, β_{+}} \frac{1}{2} \sum_{i = p + 1}^{p + q} \sum_{j = p + 1}^{p + q} (α_{i}^{*} - α_{i}) (α_{j}^{*} - α_{j}) K(x_{i}, x_{j}) \\ + \sum_{i = p + 1}^{p + q} \sum_{j = 1}^{p} (α_{i}^{*} - α_{i}) β_{j} K(x_{i}, x_{j}) \\ + \frac{1}{2} \sum_{i = 1}^{p} \sum_{j = 1}^{p} β_{i} β_{j} K(x_{i}, x_{j}) + ɛ \sum_{i = p + 1}^{p + q} (α_{i}^{*} + α_{i}) - \sum_{j = 1}^{p} β_{j} \\ s.t. \; \sum_{i = p + 1}^{p + q} (α_{i} - α_{i}^{*}) - \sum_{j = 1}^{p} β_{j} = 0, \\ 0 \leq α_{i}, α_{i}^{*} \leq (m_{i}^{-} + k m_{i}^{+}) c_{3}, i = p + 1, \dots, p + q, \\ 0 \leq β_{j} \leq (m_{j}^{-} + t m_{j}^{+}) c_{4}, j = 1, \dots, p. \end{matrix}$ (57)

where $K(x_i, x_j) = φ(x_i) \cdot φ(x_j)$ denotes a kernel function. The decision functions and the classification rule are obtained in the same way as in the linear case, with each inner product $(x_i \cdot x)$ replaced by $K(x_i, x)$.

4 Experimental results

In this section, to validate the performance of F-NPSVM and IVF-NPSVM, we compare them with SVM, IVF-SVM, GEPSVM, TWSVM and NPSVM on different types of datasets. All methods are implemented in MATLAB 2012a on a PC with an Intel(R) Core(TM) i7-3520M CPU at 2.90 GHz and 8 GB of RAM.

4.1 Experiments conducted on artificial dataset

Fig. 1(a) shows the true distribution of 80 artificial points: 24 positive points (red) and 56 negative points (blue), a ratio of 3:7. The two classes are generated from Gaussian distributions centered at (4, 1) and (6, 2), respectively. Panels (b)-(h) show the linear classifiers obtained by SVM, IVF-SVM, GEPSVM, TWSVM, NPSVM, F-NPSVM and IVF-NPSVM, respectively; F-NPSVM and IVF-NPSVM take the membership degrees of the input points into account. In (b)-(c), the two black lines are the support lines and the blue line is the decision line. In (d)-(h), the two black lines are the positive and negative proximal lines and the blue line is the decision line. Panels (b)-(h) show that NPSVM, F-NPSVM and IVF-NPSVM achieve the highest accuracy. Although their decision lines are very similar, the positive proximal lines of F-NPSVM and IVF-NPSVM fit the positive points better.

Fig. 1

(a) The true data distribution. (b) SVM. (c) IVF-SVM. (d) GEPSVM. (e) TWSVM. (f) NPSVM. (g) F-NPSVM. (h) IVF-NPSVM.
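A dataset of the same shape as Fig. 1(a) can be approximated with a short NumPy script. Only the class sizes (24 and 56) and the centres (4, 1) and (6, 2) come from the text; the isotropic standard deviation `sigma`, the seed and the function name are assumptions, so the points will not coincide with those in the figure.

```python
import numpy as np

def make_artificial_data(n_pos=24, n_neg=56, sigma=1.0, seed=0):
    """Two Gaussian classes (ratio 3:7) centred at (4, 1) and (6, 2).

    sigma (isotropic standard deviation) and seed are assumptions; the
    paper gives only the centres and class sizes.
    """
    rng = np.random.default_rng(seed)
    X_pos = rng.normal(loc=(4.0, 1.0), scale=sigma, size=(n_pos, 2))
    X_neg = rng.normal(loc=(6.0, 2.0), scale=sigma, size=(n_neg, 2))
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])  # +1 positive, -1 negative
    return X, y


X, y = make_artificial_data()
```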

4.2 Experiments conducted on UCI datasets

We run the seven algorithms on 22 UCI datasets. The UCI Machine Learning Repository is a collection of databases, domain theories and data generators used by the machine learning community for the empirical analysis of machine learning algorithms. Before training, all features were scaled to [0, 1]. For each method, the parameters are tuned for the best classification accuracy. Each experiment is repeated five times, and Table 1 reports the accuracy, training time and p-value of SVM, IVF-SVM, GEPSVM, TWSVM, NPSVM, F-NPSVM and IVF-NPSVM with the RBF kernel.
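The paper does not state how the features were mapped to [0, 1]; per-feature min-max scaling is one common choice and is assumed in the sketch below (the helper name is hypothetical).

```python
import numpy as np

def minmax_scale(X):
    """Map every feature (column) of X linearly onto [0, 1].

    Constant columns would divide by zero, so they are mapped to 0.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


Z = minmax_scale([[1, 10, 5], [3, 20, 5], [2, 15, 5]])
```

Here the first two columns become 0, 1 and 0.5, and the constant third column becomes all zeros.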

Table 1
Comparisons on UCI datasets with RBF kernel

Each cell lists accuracy (%), training time (s) and the p-value against IVF-NPSVM, separated by slashes.
Datasets	SVM	IVF-SVM	GEPSVM	TWSVM	NPSVM	F-NPSVM	IVF-NPSVM
WPBC (198 × 34)	82.36 ± 4.67 / 0.17 / 0.6655	82.88 ± 4.27 / 0.34 / 0.7035	80.85 ± 3.02 / 0.004 / 0.5765	78.82 ± 2.79 / 0.30 / 0.3655	81.85 ± 5.02 / 0.34 / 0.8587	81.97 ± 4.50 / 0.39 / 0.8655	82.22 ± 4.37 / 0.51 / -
Teaching (151 × 5)	74.84 ± 4.31 / 0.15 / 0.0832	75.01 ± 4.40 / 0.28 / 0.1047	73.33 ± 4.25 / 0.003 / 0.2764	76.81 ± 3.72 / 0.25 / 0.2079	80.16 ± 3.97 / 0.19 / 0.9931	80.20 ± 3.97 / 0.21 / 0.9942	80.29 ± 3.25 / 0.32 / -
Sonar (208 × 60)	88.44 ± 3.29 / 0.20 / 0.8558	88.65 ± 3.83 / 0.44 / 0.8691	81.31 ± 2.68 / 0.001 / 0.3856	86.55 ± 4.95 / 0.34 / 0.4678	88.46 ± 3.52 / 0.34 / 0.8671	88.74 ± 2.90 / 0.38 / 0.8906	89.03 ± 3.71 / 0.70 / -
Seeds (210 × 7)	93.81 ± 4.42 / 0.21 / 0.2220	94.16 ± 4.25 / 0.48 / 0.3775	94.03 ± 2.66 / 0.002 / 0.5338	97.14 ± 2.33 / 0.32 / 0.8327	96.67 ± 2.43 / 0.39 / 0.6776	96.84 ± 2.78 / 0.43 / 0.8417	97.32 ± 3.69 / 0.70 / -
Parkinsons (195 × 22)	94.41 ± 2.42 / 0.19 / 0.7896	95.12 ± 2.68 / 0.37 / 0.7901	96.40 ± 1.86 / 0.003 / 0.6672	93.30 ± 3.94 / 0.42 / 0.5177	95.90 ± 1.26 / 0.22 / 0.4561	96.17 ± 1.91 / 0.26 / 0.7466	96.37 ± 2.30 / 0.65 / -
Fertility (100 × 9)	88.08 ± 4.89 / 0.09 / 0.3955	88.11 ± 4.14 / 0.14 / 0.4533	87.03 ± 4.98 / 0.002 / 0.4618	88.08 ± 4.89 / 0.14 / 0.3955	90.03 ± 4.46 / 0.10 / 0.7430	91.13 ± 4.74 / 0.12 / 0.8089	91.18 ± 4.74 / 0.15 / -
Heartstatlog (270 × 13)	82.96 ± 4.12 / 0.29 / 0.6454	83.41 ± 5.02 / 0.56 / 0.6454	83.29 ± 5.63 / 0.024 / 0.5228	83.33 ± 2.87 / 0.44 / 0.6938	83.70 ± 7.26 / 0.60 / 0.7676	83.88 ± 5.86 / 0.71 / 0.7735	84.44 ± 4.63 / 0.88 / -
Echocardiogram (131 × 10)	88.45 ± 5.06 / 0.12 / 0.8316	89.06 ± 6.22 / 0.29 / 0.8864	88.27 ± 6.42 / 0.002 / 0.5643	87.68 ± 6.39 / 0.19 / 0.7104	89.25 ± 6.69 / 0.16 / 0.9938	89.27 ± 7.59 / 0.22 / 0.9955	89.30 ± 4.81 / 0.50 / -
Hepatitis (155 × 19)	83.90 ± 8.99 / 0.12 / 0.7405	83.95 ± 7.94 / 0.22 / 0.7539	80.97 ± 7.50 / 0.001 / 0.6296	84.57 ± 3.54 / 0.18 / 0.7512	86.45 ± 8.91 / 0.19 / 0.8035	86.45 ± 5.41 / 0.21 / 0.8097	86.84 ± 6.88 / 0.30 / -
Vertebral (310 × 6)	84.52 ± 2.99 / 0.42 / 0.7396	84.64 ± 3.25 / 0.89 / 0.7726	83.33 ± 2.78 / 0.005 / 0.5613	84.19 ± 4.61 / 0.59 / 0.5648	85.48 ± 4.78 / 0.80 / 0.8706	85.48 ± 4.98 / 0.95 / 0.8715	85.48 ± 3.53 / 1.08 / -
Australian (690 × 14)	86.81 ± 2.68 / 1.83 / 0.4021	87.21 ± 2.37 / 4.09 / 0.5532	75.47 ± 2.98 / 0.016 / 0.5358	86.81 ± 2.28 / 3.84 / 0.4820	87.24 ± 1.77 / 5.06 / 0.7299	87.83 ± 2.16 / 7.65 / 0.8043	88.23 ± 3.25 / 9.65 / -
Diabetes (768 × 8)	77.21 ± 2.83 / 3.78 / 0.6559	77.43 ± 3.84 / 7.65 / 0.7020	71.30 ± 3.19 / 0.145 / 0.3137	77.34 ± 3.59 / 4.72 / 0.7368	77.80 ± 2.15 / 6.87 / 0.8140	78.01 ± 3.34 / 8.04 / 0.8341	78.65 ± 2.98 / 10.04 / -
Haberman (306 × 3)	74.19 ± 1.50 / 0.40 / 0.7039	74.44 ± 2.28 / 0.88 / 0.7398	75.43 ± 3.76 / 0.007 / 0.7153	74.84 ± 2.19 / 0.57 / 0.8369	75.17 ± 2.30 / 0.53 / 0.8080	75.17 ± 3.76 / 0.70 / 0.8226	75.38 ± 5.10 / 0.90 / -
Heartcancer (303 × 14)	96.38 ± 5.90 / 0.39 / 0.0074	96.55 ± 4.22 / 0.79 / 0.0132	90.05 ± 4.46 / 0.001 / 0.0399	100.0 ± 0.00 / 0.55 / NaN	100.0 ± 0.00 / 0.43 / NaN	100.0 ± 0.00 / 0.65 / NaN	100.0 ± 0.00 / 0.99 / -
Heartdisease (294 × 13)	76.87 ± 4.91 / 0.36 / 0.4106	76.99 ± 3.19 / 0.65 / 0.4334	82.66 ± 6.07 / 0.009 / 0.5158	84.01 ± 3.10 / 0.53 / 0.6128	80.27 ± 4.42 / 0.45 / 0.8092	81.29 ± 2.63 / 0.52 / 0.8792	81.62 ± 2.63 / 0.88 / -
Ionosphere (351 × 33)	95.44 ± 0.58 / 0.55 / 0.3645	95.73 ± 0.58 / 0.92 / 0.3645	90.16 ± 2.42 / 0.006 / 0.4177	93.17 ± 1.85 / 0.99 / 0.2014	95.02 ± 1.86 / 0.62 / 0.6228	96.02 ± 2.07 / 0.68 / 0.8473	96.22 ± 2.47 / 0.99 / -
Spect (267 × 44)	81.27 ± 5.98 / 0.29 / 0.6928	81.46 ± 4.55 / 0.48 / 0.7015	81.75 ± 3.57 / 0.004 / 0.2433	81.66 ± 4.07 / 0.49 / 0.7140	80.92 ± 4.51 / 0.62 / 0.5613	81.94 ± 2.49 / 0.59 / 0.8236	82.40 ± 2.76 / 0.50 / -
BUPA (345 × 6)	73.04 ± 3.50 / 0.57 / 0.4313	73.19 ± 4.63 / 1.12 / 0.5364	68.96 ± 4.24 / 0.096 / 0.6721	72.18 ± 4.36 / 0.85 / 0.5133	71.30 ± 5.19 / 0.54 / 0.7759	71.72 ± 5.30 / 1.04 / 0.8018	71.88 ± 7.19 / 1.50 / -
Breast (683 × 9)	96.20 ± 3.61 / 3.16 / 0.2139	96.78 ± 2.91 / 8.33 / 0.2845	95.43 ± 2.43 / 0.032 / 0.3864	97.07 ± 0.46 / 4.08 / 0.5944	97.07 ± 1.22 / 4.69 / 0.8572	97.87 ± 1.22 / 6.69 / 0.8922	98.02 ± 1.22 / 10.69 / -
BTSC (748 × 4)	78.74 ± 1.51 / 3.48 / 0.5187	78.87 ± 1.34 / 9.62 / 0.5447	78.27 ± 1.75 / 0.035 / 0.7428	78.88 ± 3.24 / 4.40 / 0.5921	78.75 ± 2.57 / 5.72 / 0.8123	78.76 ± 2.75 / 8.72 / 0.8265	78.86 ± 3.57 / 11.72 / -
Balancescale (576 × 4)	97.92 ± 1.60 / 1.74 / 0.5162	98.12 ± 0.67 / 5.09 / 0.5263	96.45 ± 0.72 / 0.026 / 0.6396	100.00 ± 0.00 / 2.64 / NaN	99.65 ± 0.42 / 2.39 / 0.7839	100.00 ± 0.00 / 5.98 / NaN	100.00 ± 0.00 / 9.04 / -
WDBC (569 × 30)	97.72 ± 1.42 / 1.98 / 0.4154	97.88 ± 2.01 / 4.52 / 0.4199	98.22 ± 1.33 / 0.028 / 0.5627	98.07 ± 1.02 / 2.70 / 0.5608	98.24 ± 1.23 / 4.09 / 0.7954	98.44 ± 2.23 / 6.09 / 0.8022	98.89 ± 2.55 / 9.09 / -

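The p-values are obtained from a t-test comparing each algorithm with IVF-NPSVM; NaN marks the cases where both methods reach 100.00 ± 0.00, so the test statistic is undefined. The best accuracy on each dataset is shown in bold red and the second best in bold black.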
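The text does not specify which t-test was used. Assuming a two-sample Student t-test with pooled variance over the five repetitions, the sketch below computes the t statistic and handles the zero-variance case explicitly. The function name is hypothetical, and the two-sided p-value would then follow from the t distribution with $n_1 + n_2 - 2$ degrees of freedom, for example as `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Two-sample Student t statistic with pooled variance.

    Returns nan when both samples are constant with equal means (0/0),
    e.g. two methods that both score 100.00 in every run.
    """
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    diff = mean(a) - mean(b)
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        # Zero spread: the statistic is undefined if the means agree, infinite otherwise
        return math.nan if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se
```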

4.3 Friedman test

To further analyze the performance of the seven algorithms, the Friedman test [34] is used to test whether there is a significant difference among the experimental results.

The Friedman test is a simple non-parametric statistical test. To compute the Friedman statistic, the average ranks of the seven algorithms on accuracy over the twenty-two UCI datasets are calculated and listed in Table 2. Under the null hypothesis that all the algorithms are equivalent, the Friedman statistic is

$\chi_{F}^{2} = \frac{12 N}{m (m + 1)} \left[ \sum_{j} R_{j}^{2} - \frac{m (m + 1)^{2}}{4} \right]$ (58)

where $R_{j} = \frac{1}{N} \sum_{i} r_{i}^{j}$ and $r_{i}^{j}$ denotes the rank of the jth of m algorithms on the ith of N datasets. A less conservative statistic is then derived:

$F_{F} = \frac{(N - 1) χ_{F}^{2}}{N (m - 1) - χ_{F}^{2}}$ (59)

which is distributed according to the F-distribution with m - 1 and (m - 1) (N - 1) degrees of freedom.

Table 2

Average rank on classification accuracy of nonlinear classifiers

Datasets	SVM	IVF-SVM	GEPSVM	TWSVM	NPSVM	F-NPSVM	IVF-NPSVM
WPBC	2	1	6	7	5	4	3
Teaching	6	5	7	4	3	2	1
Sonar	4	3	6	5	4	2	1
Seeds	7	5	6	2	4	3	1
Parkinsons	6	5	1	7	4	3	2
Fertility	5.5	4	7	5.5	3	2	1
Heartstatlog	7	4	6	5	3	2	1
Echocardiogram	5	4	6	7	3	2	1
Hepatitis	6	5	7	4	2.5	2.5	1
Vertebral	5	4	7	6	2	2	2
Australian	4.5	6	7	4.5	3	2	1
Diabetes	6	4	7	5	3	2	1
Haberman	7	6	1	5	3.5	3.5	2
Heartcancer	6	5	7	2.5	2.5	2.5	2.5
Heartdisease	7	6	2	1	5	4	3
Ionosphere	4	3	7	6	5	2	1
Spect	5	4	6	3	7	2	1
BUPA	2	1	7	6	5	4	3
Breast	6	5	7	3.5	3.5	2	1
BTSC	5	2	7	1	6	4	3
Balancescale	6	5	7	2	4	2	2
WDBC	7	6	4	5	3	2	1
Average rank	5.41	4.23	5.82	4.41	3.82	2.57	1.61
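Ranking the algorithms on a single dataset, with tied accuracies sharing the mean of the ranks they span, can be sketched in plain Python. The tie convention and the function name are assumptions, but the convention reproduces, for example, the Fertility row (5.5 for SVM and TWSVM) and the Vertebral row (2 for the three NPSVM variants) of Table 2 from the accuracies in Table 1.

```python
def rank_with_ties(accuracies):
    """Rank algorithms on one dataset: 1 = highest accuracy, ties share the mean rank."""
    order = sorted(range(len(accuracies)), key=lambda j: -accuracies[j])
    ranks = [0.0] * len(accuracies)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next algorithm has the same accuracy
        while end + 1 < len(order) and accuracies[order[end + 1]] == accuracies[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the ranks pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


# Fertility accuracies from Table 1 (SVM, IVF-SVM, GEPSVM, TWSVM, NPSVM, F-NPSVM, IVF-NPSVM)
print(rank_with_ties([88.08, 88.11, 87.03, 88.08, 90.03, 91.13, 91.18]))
```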

From (58) and (59), for the nonlinear case we obtain $\chi_{F}^{2} = 57.50$ and $F_{F} = 16.20$, where $F_{F}$ follows the F-distribution with (6, 126) degrees of freedom. From a table of critical values of the F-distribution, the critical values of F(6, 126) are about 2.51, 2.17 and 1.82 for the significance levels α_F = 0.025, α_F = 0.05 and α_F = 0.1, respectively.
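Equations (58) and (59) are easy to check numerically. The sketch below takes the per-dataset ranks exactly as printed in Table 2, recomputes the average ranks and returns $\chi_{F}^{2} \approx 57.50$ and $F_{F} \approx 16.2$; $F_{F}$ is the Iman-Davenport correction described in [34]. The function name is hypothetical, and critical values can be read from an F table or obtained, for instance, with `scipy.stats.f.ppf(1 - alpha, 6, 126)`.

```python
def friedman(ranks):
    """Friedman chi^2_F (58) and F_F (59) from an N x m table of ranks."""
    N, m = len(ranks), len(ranks[0])
    R = [sum(row[j] for row in ranks) / N for j in range(m)]  # average ranks R_j
    chi2 = 12 * N / (m * (m + 1)) * (sum(r * r for r in R) - m * (m + 1) ** 2 / 4)
    ff = (N - 1) * chi2 / (N * (m - 1) - chi2)
    return R, chi2, ff


# Per-dataset ranks copied from Table 2 (columns: SVM, IVF-SVM, GEPSVM,
# TWSVM, NPSVM, F-NPSVM, IVF-NPSVM), one row per UCI dataset.
TABLE2 = [
    [2, 1, 6, 7, 5, 4, 3],          # WPBC
    [6, 5, 7, 4, 3, 2, 1],          # Teaching
    [4, 3, 6, 5, 4, 2, 1],          # Sonar
    [7, 5, 6, 2, 4, 3, 1],          # Seeds
    [6, 5, 1, 7, 4, 3, 2],          # Parkinsons
    [5.5, 4, 7, 5.5, 3, 2, 1],      # Fertility
    [7, 4, 6, 5, 3, 2, 1],          # Heartstatlog
    [5, 4, 6, 7, 3, 2, 1],          # Echocardiogram
    [6, 5, 7, 4, 2.5, 2.5, 1],      # Hepatitis
    [5, 4, 7, 6, 2, 2, 2],          # Vertebral
    [4.5, 6, 7, 4.5, 3, 2, 1],      # Australian
    [6, 4, 7, 5, 3, 2, 1],          # Diabetes
    [7, 6, 1, 5, 3.5, 3.5, 2],      # Haberman
    [6, 5, 7, 2.5, 2.5, 2.5, 2.5],  # Heartcancer
    [7, 6, 2, 1, 5, 4, 3],          # Heartdisease
    [4, 3, 7, 6, 5, 2, 1],          # Ionosphere
    [5, 4, 6, 3, 7, 2, 1],          # Spect
    [2, 1, 7, 6, 5, 4, 3],          # BUPA
    [6, 5, 7, 3.5, 3.5, 2, 1],      # Breast
    [5, 2, 7, 1, 6, 4, 3],          # BTSC
    [6, 5, 7, 2, 4, 2, 2],          # Balancescale
    [7, 6, 4, 5, 3, 2, 1],          # WDBC
]

R, chi2, ff = friedman(TABLE2)
print([round(r, 2) for r in R], round(chi2, 2), round(ff, 2))
```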

Since F_F is much larger than all three critical values, the null hypothesis is rejected: there is a significant difference among the seven algorithms. GEPSVM has the worst average rank but the shortest training time. IVF-SVM ranks better than SVM, and F-NPSVM and IVF-NPSVM both rank better than NPSVM. Overall, IVF-NPSVM ranks first and F-NPSVM second. These results indicate that the classification models with fuzzy information have better average performance than the corresponding original models, and that the two algorithms proposed in this paper achieve the best average ranks in classification accuracy among the seven algorithms. We also note that, because of the additional parameters, the models with fuzzy information generally train more slowly than their counterparts without fuzzy information.

5 Conclusion

To further improve the performance of NPSVM, this paper introduces F-NPSVM and IVF-NPSVM, two extensions of NPSVM built on the theories of FS, IVFS and NPSVM. In addition to the distance from each point to its class center, the close degree of the points is proposed, from which the fuzzy membership degree and the interval-valued fuzzy membership degree of each point are determined. Because each point contributes according to its membership degree, the effect of outliers is reduced and the classification accuracy of NPSVM is improved.

Although the experimental results show that the two proposed algorithms outperform NPSVM, their time complexity remains a problem. In future work we will study fast algorithms to improve training efficiency, especially for large-scale classification problems, and extend the algorithms to multi-class pattern recognition.


Acknowledgement

This work was supported by the National Natural Science Foundation of China (Nos. 61571052 and 71771028), the Beijing Municipal University’s High-level Innovation Team Construction Project (IDHT20180510) and the Beijing Intelligent Logistics System Collaborative Innovation Center.

References

1. Cortes, Vapnik V.N. (1995) Support vector networks [J]. Machine Learning, 20: 273–297.

2. Vapnik V.N. (1995) The Nature of Statistical Learning Theory [M]. New York: Springer-Verlag.

3. Vapnik V.N. (1998) Statistical Learning Theory [M]. New York: Wiley.

4. Burges (1998) A tutorial on support vector machines for pattern recognition [J]. Data Mining and Knowledge Discovery, 2(2): 121–167.

5. Schölkopf, Burges, Smola (1999) Advances in Kernel Methods: Support Vector Learning [M]. Cambridge, MA: MIT Press.

6. Suykens J.A.K., Vandewalle (1999) Least Squares Support Vector Machine Classifiers [J]. Neural Processing Letters, 9(3): 293–300.

7. Schölkopf, Smola, Williamson R.C., Bartlett P.L. (2000) New support vector algorithms [J]. Neural Computation, 12: 1207–1245.

8. Zadeh L.A. (1965) Fuzzy sets [J]. Information and Control, 8: 338–353.

9. Shuili Chen, Jinggong, Xianggong Wang (2005) Fuzzy set theory and its application [M]. Beijing: Science Press.

10. Atanassov (1986) Intuitionistic fuzzy sets [J]. Fuzzy Sets and Systems, 20: 87–96.

11. Atanassov (1999) Intuitionistic Fuzzy Sets: Theory and Applications [M]. Heidelberg, New York: Physica-Verlag.

12. Gorzałczany (1983) Approximate Inference with Interval-Valued Fuzzy Sets: An Outline [J]. In: Proc. Polish Symp. on Interval and Fuzzy Math., Poznan: 89–95.

13. Dziech, Gorzałczany (1987) Decision Making in Signal Transmission Problems with Interval-valued Fuzzy Sets [J]. Fuzzy Sets and Systems, 23(2): 191–203.

14. Gorzałczany (1988) Interval-valued Fuzzy Controller Based on Verbal Model of Object [J]. Fuzzy Sets and Systems, 28(1): 45–53.

15. Atanassov (2006) Strategies for decision making in the conditions of intuitionistic fuzziness [J]. Advances in Soft Computing, 33: 263–269.

16. Z.S., Chen (2011) A multi-criteria decision-making procedure based on interval-valued intuitionistic fuzzy Bonferroni means [J]. Journal of Systems Science and Systems Engineering, 20(2): 217–228.

17. Hongmei, Fenghua (2014) Multiattribute decision making models and methods using interval valued fuzzy sets [J]. Journal of Chemical and Pharmaceutical Research, 6(7): 465–473.

18. Lin C.F., Wang S.D. (2002) Fuzzy support vector machines [J]. IEEE Transactions on Neural Networks, 13: 464–471.

19. Lin C.F., Wang S.D. (2005) Fuzzy support vector machines with automatic membership setting [J]. Studies in Fuzziness and Soft Computing, 177: 233–254.

20. Xiang Zhang, Xiaoling Xiao, Guangyou (2006) Determination and analysis of membership degree in fuzzy support vector machine [J]. Journal of Image and Graphics, 11(8): 1188–1192.

21. Hong-Mei, Qiu-Ling Hou, Ling Jing (2017) Interval-valued Fuzzy Support Vector Machine [J]. 2017 3rd International Conference on Computer Science and Mechanical Automation, 429–435.

22. Olvi Mangasarian, Edward W. Wild (2006) Multisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(1): 69–74.

23. Jayadeva, Khemchandani, Chandra (2007) Twin Support Vector Machines for Pattern Classification [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5): 905–910.

24. Shao Y.H., Chen W.J., Zhang J.J., Wang, Deng N.Y. (2014) An efficient weighted Lagrangian twin support vector machine for imbalanced data classification [J]. Pattern Recognition, 47(9): 3158–3167.

25. Shao Y.H., Chen W.J., Huang W.B., Yang Z.M., Deng N.Y. (2013) The best separating decision tree twin support vector machine for multi-class classification [J]. Procedia Computer Science, 17: 1032–1038.

26. Ding S.F., Zhang X.K., An Y.X. (2017) Weighted linear loss multiple birth support vector machine based on information granulation for multi-class classification [J]. Pattern Recognition, 67: 32–46.

27. Yang Z.M., Wu H.J., Li C.N., Shao Y.H. (2016) Least squares recursive projection twin support vector machine for multi-class classification [J]. International Journal of Machine Learning and Cybernetics, 7(3): 411–426.

28. Yingjie Tian, Zhiquan, Xuchan, Yong Shi, Xiaohui Liu (2014) Nonparallel Support Vector Machines for Pattern Classification [J]. IEEE Transactions on Cybernetics, 44(7): 1067–1079.

29. Tian Y.J., Ping (2014) Large-scale linear nonparallel support vector machine solver [J]. Neural Networks, 50: 166–174.

30. Dandan Chen, Yingjie Tian, Xiaohui Liu (2016) Structural nonparallel support vector machine for pattern recognition [J]. Pattern Recognition, 60: 296–305.

31. Jayadeva, Reshma Khemchandani, Suresh Chandra (2005) Fuzzy Proximal Support Vector Classification Via Generalized Eigenvalues [J]. Pattern Recognition and Machine Intelligence, 3776: 360–363.

32. Gao, Bin-Bin (2015) Coordinate descent fuzzy twin support vector machine for classification [J]. Machine Learning and Applications, 7–12.

33. Sartakhti, Ghadiri, Afrabandpey (2016) Fuzzy Least squares twin support vector machines [J].

34. Demšar (2006) Statistical comparisons of classifiers over multiple data sets [J]. Journal of Machine Learning Research, 7: 1–30.