Abstract
In this paper, we develop a novel support vector algorithm with fuzzy hyperplane for pattern classification. We first introduce the concepts of fuzzy hyperplane and fuzzy linear separability. Then, the proposed approach seeks a fuzzy hyperplane that best separates the positive class from the negative class with the widest margin in the feature space. Further, the decision function of the proposed approach is generalized so the values assigned to the individuals fall within a specified range and indicate the membership degree of these individuals in a given category. This integration preserves the benefits of fuzzy set theory and SVM theory, where the use of the fuzzy hyperplane provides the SVM with effective means for capturing the approximate, imprecise nature of the real world. On the other hand, the SVM provides the advantage to minimize the structural risk and effectively generalize the unseen data. Experimental results are then presented which indicate the performance of the proposed approach.
Introduction
In many real applications, the observed information is often imprecise, uncertain and incomplete, which can be represented by fuzzy set. Fuzzy set theory provides a strict mathematical framework in which vague conceptual phenomena can be precisely and rigorously studied [9, 28]. Since its inception in 1965, the fuzzy sets theory has advanced in a variety of ways and in many disciplines. Fuzzy set theory lets us work at a high abstraction level and offers a means for dealing with imprecise data measurement. Most important, fuzzy set theory allows us to deal with vague or inexact nature of the real world.
The Support Vector Machine (SVM) is a promising kernel-based learning algorithm for data classification and regression. It was first introduced by Vapnik et al. in 1995 as an approximate implementation of the structure risk minimization [8, 25]. For achieving best generalization performance, SVM finds the best tradeoff between the model complexity and the learning ability according to the principle of the statistical learning theory. In many applications, SVM has been shown to outperform most traditional learning machines and has been introduced as powerful tools for solving classification and regression problems [3]. Since SVM and fuzzy set theory have been very successful in machine learning problems, Lin et al. [17, 18] introduced a fuzzy membership as a weighting factor to reduce the influence of noises or outliers over the training procedure and proposed the first fuzzy SVM. Whereas Chiang et al. [7] applied SVM theory for the fuzzy rules based modeling. Lin et al. [19] proposed the support vector based fuzzy neural network for pattern classification. Hao et al. [10] applied the fuzzy function to the support vector regression machine. Ji et al. [Ji, 2014] incorporated the concept of the possibility measure and fuzzy number to the support vector based fuzzy classifier.
In this paper, we incorporate the concept of fuzzy set theory into the support vector classification model. We first introduce the concepts of fuzzy hyperplane and fuzzy linear separability. Then, we derive a new support vector classification algorithm with the fuzzy hyperplane. The proposed approach seeks an optimal fuzzy hyperplane in the feature space to separate the positive class from the negative class with the widest margin. Further, the decision function of the proposed approach is generalized so the values assigned to the individuals fall within a specified range and represent the membership degree of these individuals in a given category. The proposed approach combines the superior classification power of support vector machine in high dimensional data spaces and the efficient means of fuzzy set theory in handling imprecise information. The experimental results show the proposed approach can yield a satisfactory generalization performance and meanwhile can estimate the membership grade that an individual belongs to a given class.
The rest of this paper is organized as follows. A brief review of the theory of the support vector classification machine is described in Section 2. The proposed support vector classification with the fuzzy hyperplane is derived in Section 3. Experiments are presented in Section 4, and some concluding remarks are given in Section 5.
Review of support vector classification machine
Suppose we are given a set of labeled training data vectors {
The regularization parameter C controls the trade-off between the maximization of the margin (the width of the margin in C-SVC is 2/∥
By introducing the Mercer kernel k (
We can easily check that all training points are considered as the same important during the learning procedure of SVM. This leads to a higher sensitivity to some special cases, such as noises or outliers. Besides, the decision function f (
In this section, we incorporated the concept of fuzzy set theory into the support vector classification model. The proposed approach attempt to find a “fuzzy hyperplane” that best separates the positive class from the negative class. The parameters to be identified in the fuzzy hyperplane, such as the components of the weight vector and bias term, are fuzzy numbers. The fuzzy parameters studied in this work are restricted to a class of symmetric “triangular” fuzzy numbers.
The quadratic programming problem
To seek a fuzzy hyperplane that best separates the positive class from the negative class with the widest margin, we need the following preliminaries.
[a1, a2] ≥ [b1, b2] iff a1 ≥ b1 and a2 ≥ b2
then for any fuzzy number A, B, we have
Let X = (m, c) be a symmetric triangular fuzzy number where m is the center and c is the width. From Preliminary 1, for any two symmetric triangular fuzzy numbers A = (m A , c A ) and B = (m B , c B ), we have
Consider a set of N data vectors {
The minimization of ∥
Specifically, from the above preliminaries, our goal is to identify the fuzzy weight vector
We can find the solution of this optimization problem given by Equation (10) in dual variables by finding the saddle point of the Lagrangian:
where α1i, α2i, ρ1i, ρ2i and γ are the nonnegative Lagrange multipliers. Differentiating L with respect to
Substituting Equations. (12–17) into (11), we obtain the dual problem as
As for the pair (
While parameters b and d can be determined from the following Karush-Kuhn-Tucker (KKT)conditions
For some and , we have ξ1i = ξ2j = 0 and moreover the second factors in Equations (19) and (20) have to vanish. Hence, b and d can be computed as follows:
The fuzzy hyperplane is defined by the following membership function:
For any
where α = (m A + c A ) - (m B + c B ) and β = (m A - c A ) - (m B - c B ).
Note, R≥B (A) =0.5 if m
A
= m
B
, R≥B (A) <0.5 if m
A
< m
B
, and R≥B (A) >0.5 if m
A
> m
B
. And the decision function of the proposed fuzzy SV classification model is
This decision function takes a value between 0.0 and 1.0 and represents the degree of membership that the new point
The decision boundary given by (25) is a hyperplane in R
n
. More complex decision surfaces can be extended by employing a nonlinear mapping Φ : R
n
→ F to map the data points into a higher dimensional feature space F, and finding the maximal-margin fuzzy separating hyperplane in the feature space. Note, the data point
The fuzzy surface is defined by the following membership function:
The set of labeled training patterns is said to be fuzzy linearly separable if the following inequality
According to the Karush-Kuhn-Tucker (KKT) conditions given by Equations (19–22), the training data points can be classified into the following categories.
We will discuss the influences of model parameters on the number of SVs and MEVs in the experimental section.
In this section, several examples are used to verify the effectiveness of the proposed fuzzy SV classification algorithm. We apply the proposed method to benchmark datasets, handwritten digits problem, and stock market trend prediction problem. In these simulations, we only consider the radial basis function (RBF) kernel k (
Benchmark dataset
We evaluate the classification performance of the proposed fuzzy SV classification algorithm on several well-known benchmark datasets from the PROBEN1 [22], UCI repository [2], and the Statlog collection [20], respectively. We scale all data to be in [–1, 1]. Some of those datasets have more than two classes. For simplicity, we treat all data not in the first class as in the negative class. We compare the proposed fuzzy SV classification approach with the original C-SVC [25], v-SVC [23], par-v-SVC [11], and WCS-FSVM [1]. For fair comparison with the C-SVC and v-SVC, we set fuzzy membership μ i = 1, ∀ i for all datasets. The most important criterion for evaluating the performance of those algorithms is their accuracy rate. Note, the generalized accuracy of the SVM classifier depends on the values of the regularization parameter C, the kernel parameter q, the vagueness parameter v, I w , and O w . For simplicity, we set I w = O w for all problems. Although there exits many model-parameters selection methods for SVMs [4], the most popular method for choosing the model-parameters of SVMs is still the exhaustive search [12]. In the following experiments, we estimate the generalized accuracy using different regularization parameters C = [10–1,100,21, … ,106], kernel parameters q = [24,23, … ,2–10], vagueness v = [0.1, 0.2, 0.3, … ,0.8], and I w = O w = [0.1, 0.2, 0.3, …, 0.9] for each algorithms. We apply the ten-fold cross-validation method to the whole training data to select the model parameters and estimate the generalized accuracy.
Table 1 reports the comparison of C-SVC, v-SVC, par-v-SVC, WCS-FSVM and the proposed approach in terms of accuracy rate, training time, and number of SVs. The training time complexity for solving the SVM-type quadratic optimization problem with N variables is O(N3). Because the number of variables to be solved in the proposed quadratic optimization problem is double that of the original SVM, the training speed of the proposed fuzzy SVC approach approximately eight times slower than that of a classical SVM (as shown in Table 1).
As for the test time complexity, the classification time of a SVM model increases linearly with the number of support vectors. Although the number of variables in the proposed quadratic optimization problem is double that of the original SVM, the numbers of support vectors do not increase significantly in our approach. The proposed fuzzy SVC algorithm achieves a better accuracy rate with fewer support vectors on wdbc and new-thyroid datasets, indicating the proposed approach are more accurate and faster on testing in those datasets. In many real time applications, the generalization ability and classification speed are more important than training speed. Hence, it is necessary to present our fuzzy SVC that yields comparable accuracy and number of SVs to the classical SVM approaches, demonstrating that it is suitable for real-world classification problems. The experimental results demonstrate our approach performs fairly well on those benchmark datasets.
Analysis of the influence of model parameters
In the part, we analyze how the model parameters influence the results of the proposed fuzzy SV classification model. We consider the following measures: the number of SVs/MEVs, training/test accuracy rate, and the width of the fuzzy hyperplane. The width of the fuzzy hyperplane is defined as . Figure 2 (a) illustrates the influence of fuzzy width on the degree of vagueness of the fuzzy boundary. As seen from Fig. 2(a), is completely larger than Θ (i.e., ) when the width of is small. As the value of 〈
Figures 3 through 6 illustrate the results of the proposed fuzzy SVC algorithm on the iris dataset with different settings of the following parameters: v (vagueness parameter), I w (width of symmetric triangular fuzzy numbers one), C (regularization parameter), and q (RBF kernel parameter). The arithmetic means and standard deviation error bars are estimated by employing the ten-fold cross-validation mechanism. Figure 3 illustrates the results of the proposed fuzzy SV classification model for different values of the vagueness parameter v. It can be seen the numbers of SVs/MEVs and training/test accuracy rates are insensitive toward changes in v. Moreover, the width of the fuzzy hyperplane increases when the vagueness parameter v is decreased. Namely, more data points would be located on the fuzzy boundary and the resulting classification model gets vaguer when the vagueness parameter v is decreased.
Figure 4 shows how the parameter I
w
(= O
w
) settings influence the proposed fuzzy SVC model. The parameter I
w
(= O
w
) also controls the vagueness of the resulting fuzzy boundary. As illustrated in Fig. 2(b), the membership degree that is satisfied is when O
w
is small. Namely, is completely larger than Θ. As the parameter O
w
increases, becomes partially larger than Θ (i.e., ). In other words, point
Figure 5 gives an illustration of the proposed fuzzy SVC model for different values of the regularization parameter C (the horizontal-axis represents the logarithm of C to the base 10). The chosen C mainly controls the cost that one is willing to pay for the amount of variation of the fuzzy separable constraints in Equation 10. As shown in Fig. 5, the numbers of SVs/MEVs decrease when the regularization parameter C is increased, largely due to more training points would completely satisfy the fuzzy separable constraint given in Equation (30) as the penalty assigned to the misclassified patterns is increased. The accuracy rate increases as the parameter C increases initially and decreases afterwards, largely because increasing the penalty to the misclassified error by too much would lead to overfitting. Moreover, the FuzzyWidth varies independently with C.
Figure 6 shows how the RBF kernel parameter q settings influence the proposed fuzzy SVC model (the horizontal-axis represents the logarithm of q to thebase 2). With increasing q, the decision boundary fits the data more tightly. On the contrary, the decision boundary gets smoother as q is decreased. As shown in Fig. 6, the number of SVs decreases as the parameter q initially increases and increases afterwards, while the number of MEVs decreases as the parameter q increases. This is caused by the fact that as the decision boundary fits the data more tightly, the number of training patterns that partially/completely violate the fuzzy separable constraint (i.e., SVs or MEVs) is reduced. However, as the decision boundary become more sophisticated later on, it requires more SVs to construct the fuzzy hyperplane. Besides, the accuracy rate among training patterns increases as the parameter q increases. The test accuracy rate increases as the parameter q initially increases and decreases afterwards. Largely due to as the decision boundary fitting the data more tightly, it could increase the learning ability. However, it would also lead to overfitting due to memorizing the errors and noises among the training data points. Moreover, the FuzzyWidth is insensitive toward changes in q. In summary, parameters v and I w mainly control the vagueness of the resulting model while parameters C and q mainly control the accuracy rate of the proposed approach.
Handwritten digits dataset
We now evaluate the classification performance of the proposed fuzzy SV classification approach on the handwritten digits problem. This dataset consists of isolated binary handwritten digits partially extracted from the BR digit set of the SUNY CDROM-1 [13] and the ITRI database [5].
We use two categories of confusion group, class “2” and class “7”, with 600 samples per class. Each digit has a different size, and a sample of some of the digits is shown in Fig. 7. For handwritten digit problem, most of the digit classes consist of different writing styles. Besides, the noises or outliers patterns make the handwritten digit recognition even more difficult. Since the optimal classifier constructed by the SVM depends on only a small part of the training patterns, it may become sensitive to outliers or noises in the training set [17, 18]. To overcome this problem, we can assign lower fuzzy membership to those outliers or noises and treat them as less important.
In this experiment, we apply the multi-sphere fuzzy SV clustering algorithm [6] to estimate the fuzzy membership μ
i
for each digit patterns. The multi-sphere fuzzy SV clustering approach first maps data points into a higher dimensional feature space through a desired kernel function. It then applies an adaptive cell growing model to essentially identify dense regions in the original space by finding their corresponding hyperspheres with minimal radius in the feature space. Unlike original support vector clustering, this approach regards each hypersphere in the feature space as representing a single cluster of arbitrary shapes in the original space. To determine the membership grade of a point belonging to a certain cluster, the multi-sphere fuzzy SV clustering approach considers not only the distance to the corresponding spherical center, but also the radius of the hypersphere in the feature space. The multi-sphere fuzzy SV clustering approach uses the following fuzzy membership function to compute the degree of a given point
where R
j
is the spherical radius in feature space corresponding to cluster j; D
j
(x) is the distance between
For class “2” and “7”, we randomly select 300 samples as training instances and set the remaining samples as testing instances. We employ a ten-fold cross-validation over the training set to obtain the optimal model parameters. Then we train the whole training set using the model parameters that achieve the best validation rate and predict the test set. Table 2 reports the result of comparing the original C-SVC, par-v-SVC [11], lin’s FSVM [18], WCS-FSVM [1], and the proposed fuzzy SV classification approach (without/with assigning fuzzy membership to each data points). We report the optimal model parameters, the corresponding cross-validation rates on the training set, the accuracy rates on the test set, and the numbers of support vectors. As seen from Table 2, associating a fuzzy membership to each digit pattern can effectively reduce the influences of noises, and incorporating the concept of fuzzy hyperplane into the SVM might be very useful for dealing with the vague and inexact nature of the real world. As a whole, the experimental results demonstrate the proposed method performs fairly well on the handwritten digit dataset.
Though this experiment achieves satisfactory results, it did not really address the actual goal the fuzzy SV classification algorithm was designed for. The main advantage of the proposed approach is that there exists an imprecise boundary between the positive and the negative classes to capture the uncertain nature of the real world. Figure 8 shows patterns located in the fuzzy boundary constructed by the proposed approach. Namely, the values of the fuzzy decision function f (
In this section, we apply the proposed fuzzy SVC approach to the stock market trend prediction problem. The financial market is characterized by complex, evolutionary, and non-linear dynamical systems. Predicting the next-day stock trend (either increase or decrease) is therefore a difficult task. However, it is important in the sense that it provides tangible information for investment decisions and it is a highly attractive issue to researchers from different fields. Numerous studies of financial economics have reported that technical indicators can be used to predict the directions of daily change of the stock price index. We select 17 technical indicators to construct the initial attributes, as determined by the review of domain experts and prior research [15], and classify the day point into two classes: increase and decrease.
The stock trend prediction problem is a really ambiguous classification problem. Clearly, the day point whose stock price changes +5% has a higher membership degree in the “increase” class than the day point whose stock price changes +0.05% . Moreover, the day point whose stock price changes +0.001% seems to be located on the neutral and ambiguous boundary between the “increase” and “decrease” classes. We assign the fuzzy membership to each day point by using the following S-shaped membership function.
where Δp i is the daily change of the stock price index in the ith day point, a = 0 and b = median (|Δp i |) is the median value of |Δp i |, i = 1, …, N.
The stock price index data set is collected from TEJ+ database (http://www.finasia.biz/ensite/). We randomly select four companies: China Steel Co., Ltd. (CSC), Taiwan Semiconductor Manufacturing Co., Ltd. (TSMC), United Microelectronics Co., Ltd. (UMC), and Winbond Electronics Co., Ltd. (WEC). The daily stock price data covers from 1-Jan-2011 to 1-Jan -2013. We compare the proposed fuzzy SVC approach with original C-SVC, par-v-SVC [11], lin’s FSVM [18], and WCS-FSVM [1]. We apply the ten-fold cross-validation method on the whole training data to select the model parameters and estimate the generalized accuracy. Table 3 presents the result of comparing these methods in terms of cross-validation rate and number of SVs. As seen from Table 3, the proposed fuzzy SVC yields comparable results to the traditional SVM and FSVM approaches, demonstrating that incorporating the concept of fuzzy hyperplane into SVM might be very usefully for dealing with ambiguous boundary in the stock market trend prediction problem. In summary, rather than either belonging to the “increase” class or “decrease” class, tolerating a reasonable amount of imprecision, vagueness, and uncertainty during the estimation of the optimal separable hyperplane provides an effective way for solving the complex stock market trend prediction problem (neither increase nor decrease is meaningful). The experimental results show that the fuzzy hyperplane provides an effective means of capturing the inexact, approximate nature of the real world.
In this paper, by combining the fuzzy set theory with SVM, we propose a novel support vector algorithm with fuzzy hyperplane for pattern classification. We first introduce the concepts of fuzzy linear separability and fuzzy hyperplane. Then, the proposed approach constructs a fuzzy hyperplane that best separates the positive class from the negative class with the widest margin in the feature space. Further, the decision function of the proposed approach is generalized so the values assigned to the individuals fall within a specified range and represent the membership degree of these individuals in a given category. This integration preserves the benefits of fuzzy set theory and SVM theory, where the use of the fuzzy hperplane provides the SVM with effective means for handling the approximate, imprecise nature of the real world. On the other hand, the SVM provides the advantage to minimize the structural risk and effectively generalize the unseen data. The experimental results show the proposed approach can yield a satisfactory generalization performance and meanwhile can estimate the membership grade that an individual belongs to a given class.
Footnotes
Acknowledgments
This research work was supported in part by the Ministry of Science and Technology Research Grant MOST 103-2221-E-151-032.
