Abstract
In Rough margin-based ν-Twin Support Vector Machine (Rν-TSVM) algorithm, the rough theory is introduced. Rν-TSVM gives different penalties to the corresponding misclassified samples according to their positions, so it avoids the overfitting problem to some extent. While the input data is a tensor, Rν-TSVM cannot handle it directly and may not utilize the data information effectively. Therefore, we propose a novel classifier based on tensor data, termed as Rough margin-based ν-Twin Support Tensor Machine (Rν-TSTM). Similar to Rν-TSVM, Rν-TSTM constructs rough lower margin, rough upper margin and rough boundary in tensor space. Rν-TSTM not only retains the superiority of Rν-TSVM, but also has its unique advantages. Firstly, the data topology is retained more efficiently by the direct use of tensor representation. Secondly, it has better classification performance compared to other classification algorithms. Thirdly, it can avoid overfitting problem to a great extent. Lastly, it is more suitable for high dimensional and small sample size problem. To solve the corresponding optimization problem in Rν-TSTM, we adopt the alternating iteration method in which the parameters corresponding to the hyperplanes are estimated by solving a series of Rν-TSVM optimization problem. The efficiency and superiority of the proposed method are demonstrated by computational experiments.
Introduction
Pattern recognition problems are often encountered in practical fields, and their learning styles become various, such as deterministic learning, dictionary learning [1–3]. Support Vector Machine (SVM) [4] is one of the powerful classifier for pattern recognition problem today. It obtains the optimal hyperplane by maximizing the margin between two parallel boundary hyperplanes. After solving the Quadratic Programming Problem (QPP), the optimal solution is obtained, which is also the unique global solution. Although SVM has many advantages, it has many challenges, such as high computational complexity. Jayadeva et al. [5] proposed a Twin Support Vector Machine (TSVM) to improve the solving speed. TSVM seeks two nonparallel proximal hyperplanes by solving two smaller-sized QPPs while SVM solves a larger one. Hence TSVM works faster than SVM. By introducing parameter ν, Peng proposed a ν-TSVM [6]. The parameter ν is used to control the fractions of support vectors and margin errors. Later, many variants of TSVM [7–13] have been proposed to further improve its solving speed or generalization performance.
However, the aforementioned algorithms are all based on vector space. When the tensor is considered as input data, the traditional vector-based algorithms cannot handle it directly and effectively. In pattern recognition, machine learning, image processing and other fields, lots of the original objects are represented as tensor form [14]. For example, gray face image is represented by the second order tensor [15]; color image is expressed as third order tensor [16]. Although there are methods converting tensor directly into vector, it may cause structural information lose and data correlation damage [17]. Besides, it often leads to overfitting, curse of dimensionality problem and Small Sample Size (S3) problem.
For solving aforementioned problems, Cai proposed Support Tensor Machine (STM) [18, 19] which can deal with the input image directly without vectorization. Besides, the experimental results also verified that the classification accuracy of STM is superior to that of traditional SVM, and STM is especially suitable for S3 problem. Zhang et al. proposed a twin version of STM, and applied it for micro calcification clusters detection [20]. Later, some researchers studied the performance of twin STMs [21–23]. Kotsia et al. formulated the higher rank STMs [24], in which the parameters defining the separating hyperplane form a tensor that was constrained to be the sum of rank one tensors. Khemchandani et al. introduced a proximal STM [25] by solving a system of linear equations in each iteration, which greatly saved the running time. Jiang et al. outlined a novel learning frameworks-multiple rank multi-linear twin support matrix classification machine [26]. For one-class classification problem, a one-class STM [27] and a one-class Support Higher Order Tensor Machine [28] were proposed. Ye presented a kernel support matrix machine which performs a matrix-form inner product with maximum margin classifier [29]. For tensor regression problem, Shu and Yang proposed a Least Square Support Tensor Regression machine [30] and used a fixed point algorithm to solve it. The tensor-based algorithms have been applied in numerous fields, such as pedestrian detection [31], image classification [32], crowd density estimation [33], fault diagnosis [34], and so on.
The Rough margin-based ν-Twin Support Vector Machine (Rν-TSVM) [8] is an improved algorithm of ν-TSVM. When constructing one hyperplane of ν-TSVM, it considers a large number of samples of this class and only fewer samples in the other class, which easily leads to over-fitting problem. Besides, it gives same penalties to each misclassified samples. By introducing the rough set theory, the Rν-TSVM can effectively overcome these two shortcomings. As mentioned above, because the Rν-TSVM is a vector-based algorithm, it cannot deal with tensor data directly.
Based on the aforementioned works, in this paper, we propose a new tensor-based algorithm, called Rough margin-based ν-Twin Support Tensor Machine (Rν-TSTM). The main idea of Rν-TSTM is to find two nonparallel hyperplane in tensor space by solving a pair of smaller-sized QPPs, which can greatly reduce its computational complexity substantially compared with STM. Similar to Rν-TSVM, our Rν-TSTM firstly constructs rough lower margin, rough upper margin, and rough boundary. Then it gives different penalties to the misclassified samples according to their positions. Therefore, Rν-TSTM retains the superiorities of Rν-TSVM. Unlike Rν-TSVM, the proposed Rν-TSTM retains the data topology efficiently by the direct use of tensor representation. It has acceptable or better classification performance compared to SVM, STM, ν-TSVM, ν-TSTM and Rν-TSVM. What’s more, it can avoid overfitting problem to a great extent, and is more suitable for high dimensional and S3 problems. Besides, we analyze its theoretical interpretation of parameters in tensor space. We do computational experiments on 17 different kinds of datasets to verify the efficiency and superiority of the proposed method.
The remainder of the paper is organized as follows. In Section 2, we give a brief review of ν-TSVM. In Section 3, we introduce our Rν-TSTM algorithm, theoretical derivation, its solving method, theoretical interpretation and generalized form. The experimental results on various datasets are presented in Section 4. Finally, we make conclusions in Section 5.
ν-Twin Support Vector Machine
The ν-TSVM [6] is an improved algorithm of TSVM. The parameters ν1 and ν2 in ν-TSVM are used to control the bounds of the fractions of the support vectors and the margin errors. Suppose the training dataset is: T = {(
For linear case, ν-TSVM seeks the following pair of nonparallel hyperplanes:
In linear ν-TSVM, if the number positive and negative samples are equal, i.e. p = q = l/2 , then its computational complexity is:
The most obvious defect of ν-TSVM is that when constructing the classification plane for one class, a large number of samples of this class are considered in the objective function and it ignores most samples in the other class. Different samples have different effects on the decision of the hyperplane. Rough margin-based ν-TSVM (Rν-TSVM) [8] was proposed to overcome these disadvantages.
The Rν-TSVM algorithm aims to learn two non-parallel hyperplanes by solving two smaller-sized QPPs. Because different points in different positions have different effects on the separating hyperplane, Rν-TSVM constructs rough lower margin, rough upper margin and rough boundary, and gives different penalties to the misclassified samples according to their positions.
Our Rough margin-based ν-Twin Support Tensor Machine (Rν-TSTM) is fundamentally based on the same idea. In this section, we propose a new Rν-TSTM based on tensor space, which can utilize the structural information of tensor directly. Firstly, we establish the model of the 2nd-order Rank-one Rν-TSTM, then its algorithm is given in detail. Secondly, we give its theoretical interpretation and a generalized form. Thirdly, we analyze its convergence and computaion complexity. Finally, the high-order Rank-one Rν-TSTM is concerned.
Rν-TSTM and its algorithm
For binary classification of tensor data, suppose we are given the training dataset:
The two QPPs of second-order Rank-one Rν-TSTM are represented as follows:
We use the positive QPP (6) to understand the meaning of variables, the corresponding explanation is shown in Fig. 1. The black line denotes the positive hyperplane

Illustration for QPP (6) in Rν-TSTM.
To make the two QPPs of Rν-TSTM simpler and easier to be solved, we transform the positive and negative training samples into 3rd-order tensors, and use
On account of the similar structure of the QPP (8) and (9), we only describe the solving process of the problem (8). By introducing the Lagrangian multipliers
Setting the derivatives with respect to the primal variables
From Eq.(11) and Eq.(12), we get that
Firstly, we initialize
Secondly, once the value
Let
By the similar way, we can obtain the optimal solution
The flowchart of Rν-TSTM is described as follows.
Inputs: The parameters ν1, ν2, σ1, σ2, the maximum number of iteration, training samples
Outputs: The optimal solutions (
Step 1: Initialization. Let
Step 2: Calculate (
Step 3: Update (
Step 4: Update (
Step 5: Do the similar steps 2 ∼ 4,
Step 6: Calculate ∥
Step 7: For the testing samples, output their labels by Eq.(30).
Step 8: End.
In this subsection, we still use the first optimization problem of Rν-TSTM to interpret. After solving QPP (8), we can get its optimal solution
If If If If If
According to KKT conditions and Proposition 1, we can calculate the parameter ρ1, ρ2, respectively.
The following proposition gives the theoretical interpretation of (ν1, σ1).
(1) we can get 2ν1/σ1 ≤ q1/q, which means 2ν1/σ1 is a lower bound on the fraction of negative support tensors.
(2) we can get 2ν1/σ1 > q2/q, which means 2ν1/σ1 is an upper bound on the fraction of entirely negative margin errors.
The proof of proposition 2 is similar to Ref. [8], here we will not give the detail.
Similarly, we can derive the theoretical interpretations for positive samples.
In this subsection, we analyze the convergence of Algorithm 1.
Similarly, we can analyze QPP (7) and get the same conclusion. Therefore, Algorithm 1 is convergent. □
In order to discuss the computational complexity of Algorithm 1, we firstly assume that the number of samples in the two classes is approximately equal, and set p = q = l/2. In addition, we set n1 = n2 = n. At each iteration, the algorithm need to calculate a dual matrix and a QPP, which spend at least
The input data of linear Rν-TSTM described above is second order. In this subsection, we briefly describe a generalized high-order form for Rν-TSTM. Let
Similarly, we can use the Alternating Iteration method to obtain the optimal solutions.
In this section, we compare the performance of Rν-TSTM with tensor-based algorithms STM and ν-TSTM, and vector-based algorithms SVM, ν-TSVM and Rν-TSVM. We firstly evaluate tensor-based algorithms on Australian dataset for a trial of detailed discussion on the impact of different tensor sizes. Then we give an overall comparison on 10 vector-based and 7 tensor-based datasets.
Datasets and preparation of experiments
All the datasets come from UCI 1 and ORL 2 database. Table 1 includes the detailed information of these datasets, where the first 10 datasets are vector-based and the last 7 are tensor-based. In all experiments, the 17 datasets are used to construct the binary classification problems. We randomly choose two classes from the multi-class dataset.
Description of datasets
Description of datasets
All algorithms are written in Matlab 2014a. These programs are operated on 3.60GHz Inter Core i3-4160 Duo CPU with 4.0GB RAM. We use 10-fold cross-validation on the training set to find the best parameters. Namely, we randomly select the subsets from the datasets to train the model, and the rest data is used for testing. This process is repeated ten times.
There are 5 running parameters: C, ν1, ν2, σ1 and σ2. The parameters C in SVM and STM are searched from {2-10, 2-9, …, 210}. We set ν1 = ν2 = ν in the ν-TSVM, ν-TSTM, Rν-TSVM and Rν-TSTM, which are searched in the range {0.1, 0.2, …, 0.9}. We set σ1 = σ2 = σ and the parameters σ in Rν-TSVM and Rν-TSTM are tuned in the range {21, 21.5, …, 25}.
We report the test accuracy to evaluate the performance of classifiers, its form is mean value μ and plus or minus the standard deviation σ, where:
To verify the effectiveness dealt with overfitting problem, we use the evaluation indexes: sensitivity and specificity, where:
In this subsection, we use Australian dataset as an example, which has 383, 307 positive and negative samples respectively with 14 attributes, to discuss the impact of different tensor sizes on classification performance.
For a vector sample
Averaged percentages of testing accuracy in different tensor sizes on Australian dataset.
Averaged percentages of testing accuracy in different tensor sizes on Australian dataset.
The experimental results indicate that the closer of n1 and n2, the better classification performance. Based on this principle, we establish the tensor sizes of the 10 vector-based datasets involved in this paper, which are also shown Table 1.
We evaluate the six algorithms: SVM, STM, ν-TSVM, ν-TSTM, Rν-TSVM, and Rν-TSTM on 10 vector datasets and 7 tensor datasets in Table 1. We focus on small training sets and record the testing accuracy and Gmeans with various training sample sizes. The averaged percentages of testing accuracy of the six algorithms on 10 vector datasets and on 7 tensor datasets are reported in Table 3 and Table 4, respectively. The averaged Gmeans of the six algorithms on 10 vector datasets and on 7 tensor datasets are reported in Table 5 and Table 6, respectively. Similarly, the bold values in Table 3 and Table 4 indicate the best ones. To get a more visual image of the classification performance and running time with respect to the training set size, we illustrate the experimental results in Figs. 2-18. In order to see the trend of running time clearly, we draw the logarithmic relationship between the averaged running time and training number in these figures.
Averaged percentages of testing accuracy on 10 vector-based datasets.
Averaged percentages of testing accuracy on 10 vector-based datasets.
Averaged percentages of testing accuracy on 7 tensor-based datasets.
The averaged Gmeans of six algorithms on 10 vector-based datasets.
The averaged Gmeans of six algorithms on 7 tensor-based datasets.

(a) Test accuracy and (b) Time costing of six algorithms on Iris dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Heart dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Wine dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Australian dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Vote dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Lung dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Banknote dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Mushrooms dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Dbworld dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Dlbcl dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Letters dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Libras dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Robot dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Hand dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on ORL dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Eyes dataset with respect to training sample sizes, respectively.

(a) Test accuracy and (b) Time costing of six algorithms on Yale dataset with respect to training sample sizes, respectively.
To avoid unnecessary repetition, we give an overall comparison on these datasets. From these results, we have the following observations: From the perspective of testing accuracy, our Rν-TSTM is obviously higher than the other five algorithms in most cases. It is also clear that tensor-based algorithms have a better performance compared with vector-based algorithms in most cases, i.e. STM is better than SVM; ν-TSTM is better than ν-TSVM; Rν-TSTM is better than Rν-TSVM. From Figs. 2-18(a), we get that as the training samples increase, the testing accuracies of the six algorithms arise. Our proposed Rν-TSTM performs best, followed by Rν-TSVM, ν-TSTM, ν-TSVM, STM and SVM in most cases. However, the proposed Rν-TSTM performs worse than Rν-TSVM on datasets Banknote, Mushrooms, and Letters (seen Fig. 8, Fig. 9 and Fig. 12) which have larger training samples compared with the number of attributes, which indicate that tensor-based algorithms are not suitable for this kind of datasets. It is clear that tensor-based algorithms STM, ν-TSTM and Rν-TSTM take more time than vector-based algorithms SVM, ν-TSVM and Rν-TSVM in most cases. While in the 3 tensor-based algorithms, our Rν-TSTM and ν-TSTM spend less time compared to STM in the majority of cases. It is worthwhile to find that on the datasets with larger dimension (attribute), tensor-based algorithms Rν-TSTM and ν-TSTM take less time compared with vector-based algorithms Rν-TSVM and ν-TSVM, such as on the datsets Dbworld (Fig. 10(b)), Dlbcl(Fig. 11(b)), ORL(Fig. 16(b)), Eyes(Fig. 18(b)) and Yale(Fig. 19(b)). Especially in Yale dataset, the dimension is 10,000 for vector-based algorithm, the solving speed of tensor-based algorithms becomes faster comparatively. The proposed Rν-TSTM takes the least running time except SVM. Besides, the proposed Rν-TSTM yields the best accuracy and Gmeans on these 5 datasets. That means our Rν-TSTM can deal with high dimensional and S3 problems effectively and efficiently. The Gmeans of Rν-TSTM is higher than Rν-TSVM under the same training number in most comparisons. We can get similar results when compare ν-TSTM with ν-TSVM, compare STM with SVM. The proposed Rν-TSTM scores best among the six compared algorithms in most cases, which also indicates our Rν-TSTM can deal with overfitting problem to a great extent.
Generally speaking, with the increase of training number, the testing accuracy, running time and Gmeans will increase in most cases. The proposed Rν-TSTM makes full use of structural information of data. Its computational cost is basically less than that of STM. It has evident advantages on dealing high dimensional and S3 problems. It has acceptable or better classification accuracy compared to SVM, STM, ν-TSVM, ν-TSTM and Rν-TSVM. It can overcome the overfitting problem to a great extent.
In this paper, we propose a novel tensor based algorithm termed as Rough margin-based ν-Twin Support Tensor Machine. The proposed Rν-TSTM uses tensor data as input data, and gives different penalties according to the samples’ positions. Compared with vector-based algorithms, Rν-TSTM utilizes the data structural information sufficiently, which is crucial and beneficial to get better classification performance. It avoids overfitting problem to a great extent and especially suits for high dimensional and small sample size problems. As respected, the computational experiments on 17 datasets testified the mentioned advantages.
