Rough margin-based ν-twin support tensor machine in pattern recognition

Abstract

The Rough margin-based ν-Twin Support Vector Machine (Rν-TSVM) algorithm introduces rough set theory into ν-TSVM. Rν-TSVM gives different penalties to misclassified samples according to their positions, and thus avoids the overfitting problem to some extent. When the input data are tensors, however, Rν-TSVM cannot handle them directly and may not use the data information effectively. We therefore propose a novel tensor-based classifier, termed the Rough margin-based ν-Twin Support Tensor Machine (Rν-TSTM). Similar to Rν-TSVM, Rν-TSTM constructs a rough lower margin, a rough upper margin and a rough boundary, here in tensor space. Rν-TSTM not only retains the strengths of Rν-TSVM but also has its own advantages. First, the direct use of the tensor representation preserves the data topology more efficiently. Second, it achieves better classification performance than the compared algorithms in our experiments. Third, it avoids the overfitting problem to a great extent. Finally, it is better suited to high-dimensional and small sample size problems. To solve the optimization problem of Rν-TSTM, we adopt an alternating iteration method in which the parameters of each hyperplane are estimated by solving a series of Rν-TSVM optimization problems. Computational experiments demonstrate the efficiency and superiority of the proposed method.

Keywords

Classification problem; rough margin; tensor learning

1 Introduction

Pattern recognition problems are frequently encountered in practice, and the learning paradigms used to solve them have diversified, e.g., deterministic learning and dictionary learning [1–3]. The Support Vector Machine (SVM) [4] is one of the most powerful classifiers for pattern recognition today. It obtains the optimal hyperplane by maximizing the margin between two parallel boundary hyperplanes. Its optimal solution is obtained by solving a Quadratic Programming Problem (QPP) and is the unique global solution. Despite these advantages, SVM faces challenges such as high computational complexity. To improve the solving speed, Jayadeva et al. [5] proposed the Twin Support Vector Machine (TSVM). TSVM seeks two nonparallel proximal hyperplanes by solving two smaller-sized QPPs, whereas SVM solves a single larger one; hence TSVM runs faster than SVM. By introducing a parameter ν, Peng proposed ν-TSVM [6], in which ν controls the fractions of support vectors and margin errors. Since then, many variants of TSVM [7–13] have been proposed to further improve its solving speed or generalization performance.

However, the aforementioned algorithms all operate in vector space. When the input data are tensors, traditional vector-based algorithms cannot handle them directly and effectively. In pattern recognition, machine learning, image processing and other fields, many objects are naturally represented as tensors [14]. For example, a grayscale face image is represented by a second-order tensor [15], and a color image by a third-order tensor [16]. Although tensors can be converted directly into vectors, doing so may lose structural information and damage the correlations within the data [17]. Moreover, vectorization often leads to overfitting, the curse of dimensionality and the Small Sample Size (S3) problem.

To address these problems, Cai proposed the Support Tensor Machine (STM) [18, 19], which handles an input image directly without vectorization. Experiments verified that STM achieves higher classification accuracy than traditional SVM and is especially suitable for the S3 problem. Zhang et al. proposed a twin version of STM and applied it to microcalcification cluster detection [20]. Later, several researchers studied the performance of twin STMs [21–23]. Kotsia et al. formulated higher-rank STMs [24], in which the parameters defining the separating hyperplane form a tensor constrained to be a sum of rank-one tensors. Khemchandani et al. introduced a proximal STM [25] that solves a system of linear equations in each iteration, which greatly reduces the running time. Jiang et al. outlined a novel learning framework, the multiple rank multi-linear twin support matrix classification machine [26]. For the one-class classification problem, a one-class STM [27] and a one-class Support Higher Order Tensor Machine [28] were proposed. Ye presented a kernel support matrix machine that uses a matrix-form inner product within a maximum margin classifier [29]. For tensor regression, Shu and Yang proposed a Least Square Support Tensor Regression machine [30] and solved it with a fixed point algorithm. Tensor-based algorithms have been applied in numerous fields, such as pedestrian detection [31], image classification [32], crowd density estimation [33] and fault diagnosis [34].

The Rough margin-based ν-Twin Support Vector Machine (Rν-TSVM) [8] is an improved version of ν-TSVM. When ν-TSVM constructs the hyperplane of one class, it considers a large number of samples of this class but only a few samples of the other class, which easily leads to overfitting. In addition, it gives the same penalty to every misclassified sample. By introducing rough set theory, Rν-TSVM effectively overcomes these two shortcomings. However, as a vector-based algorithm, Rν-TSVM cannot deal with tensor data directly.

Based on these works, we propose in this paper a new tensor-based algorithm, called the Rough margin-based ν-Twin Support Tensor Machine (Rν-TSTM). The main idea of Rν-TSTM is to find two nonparallel hyperplanes in tensor space by solving a pair of smaller-sized QPPs, which substantially reduces the computational complexity compared with STM. Similar to Rν-TSVM, Rν-TSTM first constructs a rough lower margin, a rough upper margin and a rough boundary, and then gives different penalties to misclassified samples according to their positions. Therefore, Rν-TSTM retains the strengths of Rν-TSVM. Unlike Rν-TSVM, Rν-TSTM preserves the data topology efficiently through the direct use of the tensor representation. Its classification performance is comparable to or better than that of SVM, STM, ν-TSVM, ν-TSTM and Rν-TSVM. Moreover, it avoids the overfitting problem to a great extent and is better suited to high-dimensional and S3 problems. We also give a theoretical interpretation of its parameters in tensor space. Computational experiments on 17 datasets verify the efficiency and superiority of the proposed method.

The remainder of the paper is organized as follows. Section 2 briefly reviews ν-TSVM. Section 3 introduces the Rν-TSTM algorithm, including its derivation, solving method, theoretical interpretation, convergence analysis and generalized form. Section 4 presents the experimental results on various datasets. Finally, Section 5 concludes the paper.

2 ν-Twin Support Vector Machine

ν-TSVM [6] is an improved version of TSVM. Its parameters ν₁ and ν₂ control the bounds on the fractions of support vectors and margin errors. Suppose the training dataset is $T = \{(x_1, +1), \dots, (x_p, +1), (x_{p+1}, -1), \dots, (x_{p+q}, -1)\}$, where $x_i \in \mathbb{R}^n$, with p samples belonging to class +1 and q samples belonging to class -1. Let the matrices $A \in \mathbb{R}^{p \times n}$ and $B \in \mathbb{R}^{q \times n}$ collect the positive and negative samples, respectively.

In the linear case, ν-TSVM seeks the following pair of nonparallel hyperplanes:

$$\langle w_+, x \rangle + b_+ = 0 \quad \text{and} \quad \langle w_-, x \rangle + b_- = 0, \tag{1}$$

where $w_+, w_- \in \mathbb{R}^n$ and $b_+, b_- \in \mathbb{R}$, such that each hyperplane is close to one class and as far as possible from the other. ν-TSVM is obtained by solving the following pair of QPPs:

$$\begin{aligned} \min_{w_+, b_+, \rho_+, \xi_-} \quad & \frac{1}{2} \| A w_+ + e_+ b_+ \|^2 - \nu_1 \rho_+ + \frac{1}{q} e_-^T \xi_- \\ \text{s.t.} \quad & -(B w_+ + e_- b_+) \geq \rho_+ e_- - \xi_-, \\ & \rho_+ \geq 0, \ \xi_- \geq 0, \end{aligned} \tag{2}$$

and

$$\begin{aligned} \min_{w_-, b_-, \rho_-, \xi_+} \quad & \frac{1}{2} \| B w_- + e_- b_- \|^2 - \nu_2 \rho_- + \frac{1}{p} e_+^T \xi_+ \\ \text{s.t.} \quad & A w_- + e_+ b_- \geq \rho_- e_+ - \xi_+, \\ & \rho_- \geq 0, \ \xi_+ \geq 0, \end{aligned} \tag{3}$$

where $\xi_+ \in \mathbb{R}^p$ and $\xi_- \in \mathbb{R}^q$ are slack variables, ν₁ and ν₂ are parameters, ρ₊ and ρ₋ are additional variables, and $e_+ \in \mathbb{R}^p$ and $e_- \in \mathbb{R}^q$ are vectors of ones. Solving the dual problems of these two QPPs yields the optimal (w₊, b₊) and (w₋, b₋). A new sample x is classified by

$$\text{class } k = \arg\min_{k \in \{+, -\}} \frac{|\langle w_k, x \rangle + b_k|}{\| w_k \|}. \tag{4}$$
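Rule (4) assigns a sample to the class whose hyperplane is nearer in perpendicular distance. A minimal sketch, assuming NumPy; the function name and the tie-breaking toward +1 are our choices, not the authors':

```python
import numpy as np


def tsvm_predict(x, w_pos, b_pos, w_neg, b_neg):
    """Label a vector sample by Eq. (4): pick the nearer of the two hyperplanes.

    Ties are assigned to +1 (an arbitrary choice for this sketch).
    """
    x = np.asarray(x, dtype=float)
    # Perpendicular distance from x to each hyperplane <w, x> + b = 0.
    d_pos = abs(np.dot(w_pos, x) + b_pos) / np.linalg.norm(w_pos)
    d_neg = abs(np.dot(w_neg, x) + b_neg) / np.linalg.norm(w_neg)
    return 1 if d_pos <= d_neg else -1


# Toy hyperplanes x1 = 0 (class +1) and x1 = 2 (class -1).
w = np.array([1.0, 0.0])
print(tsvm_predict([0.5, 3.0], w, 0.0, w, -2.0))  # 1
print(tsvm_predict([1.8, 0.0], w, 0.0, w, -2.0))  # -1
```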

In linear ν-TSVM, if the numbers of positive and negative samples are equal, i.e., p = q = l/2, the computational complexity is $2 \times (O(n(l/2)^2) + O((l/2)^3)) = \frac{1}{4} O(2nl^2 + l^3)$.
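Written out, the arithmetic behind this estimate is as follows, reading the first term as the cost of forming the (l/2) × (l/2) dual matrix and the second as the cost of solving each QPP of size l/2 (our reading; constants inside O(·) are dropped):

$$2\left[n\left(\frac{l}{2}\right)^2 + \left(\frac{l}{2}\right)^3\right] = \frac{nl^2}{2} + \frac{l^3}{4} = \frac{1}{4}\left(2nl^2 + l^3\right).$$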

The most obvious defect of ν-TSVM is that, when constructing the hyperplane of one class, its objective function considers a large number of samples of this class while most samples of the other class are ignored. Moreover, different samples affect the hyperplane differently, yet ν-TSVM penalizes all misclassified samples equally. Rough margin-based ν-TSVM (Rν-TSVM) [8] was proposed to overcome these disadvantages.

3 A Rough margin-based ν-Twin Support Tensor Machine

The Rν-TSVM algorithm learns two nonparallel hyperplanes by solving two smaller-sized QPPs. Because points in different positions affect the separating hyperplane differently, Rν-TSVM constructs a rough lower margin, a rough upper margin and a rough boundary, and gives different penalties to misclassified samples according to their positions.

Our Rough margin-based ν-Twin Support Tensor Machine (Rν-TSTM) is based on the same idea. In this section, we propose Rν-TSTM in tensor space, which can use the structural information of tensors directly. First, we establish the model of the second-order rank-one Rν-TSTM and give its algorithm in detail. Second, we give its theoretical interpretation. Third, we analyze its convergence and computational complexity. Finally, we discuss the high-order rank-one Rν-TSTM.

3.1 Rν-TSTM and its algorithm

For binary classification of tensor data, suppose we are given the training dataset $T = \{(X_1, +1), \dots, (X_p, +1), (X_{p+1}, -1), \dots, (X_{p+q}, -1)\}$, where $X_i \in \mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2}$ is a second-order tensor (matrix). The proposed Rν-TSTM seeks the following pair of nonparallel positive and negative hyperplanes:

$$u^T X v + b_+ = 0 \quad \text{and} \quad \tilde{u}^T X \tilde{v} + b_- = 0, \tag{5}$$

where $u, \tilde{u} \in \mathbb{R}^{n_1}$, $v, \tilde{v} \in \mathbb{R}^{n_2}$ and $b_+, b_- \in \mathbb{R}$, such that each hyperplane is as close as possible to one class and as far as possible from the other.
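The hyperplane (5) is still linear in X: $u^T X v = \langle u v^T, X \rangle$, so it is an ordinary linear classifier whose weight matrix is constrained to rank one, with n₁ + n₂ parameters instead of n₁n₂. A minimal NumPy check of this identity (function names are illustrative):

```python
import numpy as np


def bilinear_score(X, u, v, b):
    """Signed score u^T X v + b of a matrix sample X, as in hyperplane (5)."""
    return float(u @ X @ v + b)


def rank_one_weight(u, v):
    """Weight matrix W = u v^T implied by the rank-one hyperplane."""
    return np.outer(u, v)


rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
u, v = rng.standard_normal(3), rng.standard_normal(4)
W = rank_one_weight(u, v)
# u^T X v equals the Frobenius inner product <W, X> = sum_ij W_ij X_ij.
print(np.isclose(bilinear_score(X, u, v, 0.5), np.sum(W * X) + 0.5))  # True
```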

The two QPPs of the second-order rank-one Rν-TSTM are as follows:

$$\begin{aligned} \min \quad & \frac{1}{2} \sum_{i=1}^{p} (u^T X_i v + b_+)^2 - \nu_1 (\rho_1 + \rho_2) + \frac{1}{q} \sum_{j=p+1}^{p+q} \xi_j^{(1)} + \frac{\sigma_1}{q} \sum_{j=p+1}^{p+q} \xi_j^{(2)} \\ \text{s.t.} \quad & -(u^T X_j v + b_+) \geq \rho_2 - \xi_j^{(1)} - \xi_j^{(2)}, \\ & 0 \leq \xi_j^{(1)} \leq \rho_2 - \rho_1, \ \xi_j^{(2)} \geq 0, \\ & \rho_1 + \rho_2 \geq 0, \ j = p+1, \dots, p+q, \end{aligned} \tag{6}$$

and

$$\begin{aligned} \min \quad & \frac{1}{2} \sum_{j=p+1}^{p+q} (\tilde{u}^T X_j \tilde{v} + b_-)^2 - \nu_2 (\rho_3 + \rho_4) + \frac{1}{p} \sum_{i=1}^{p} \xi_i^{(3)} + \frac{\sigma_2}{p} \sum_{i=1}^{p} \xi_i^{(4)} \\ \text{s.t.} \quad & \tilde{u}^T X_i \tilde{v} + b_- \geq \rho_4 - \xi_i^{(3)} - \xi_i^{(4)}, \\ & 0 \leq \xi_i^{(3)} \leq \rho_4 - \rho_3, \ \xi_i^{(4)} \geq 0, \\ & \rho_3 + \rho_4 \geq 0, \ i = 1, \dots, p, \end{aligned} \tag{7}$$

where p and q denote the numbers of positive and negative samples, respectively; σ₁, σ₂ > 1 are penalty coefficients given a priori; $\xi^{(k)}$ (k = 1, …, 4) are slack variables; ν₁ and ν₂ are parameters; and ρ_k (k = 1, …, 4) are additional variables.

We use the positive QPP (6) to explain the meaning of the variables, as illustrated in Fig. 1. The black line denotes the positive hyperplane $u^T X v + b_+ = 0$. Negative training samples inside the lower margin (on the positive side, $u^T X v + b_+ > -\rho_1$) are outliers and receive larger penalties; those beyond the upper margin (in the negative region, $u^T X v + b_+ < -\rho_2$) are correctly classified and receive no penalty; and those inside the rough boundary ($-\rho_2 \leq u^T X v + b_+ \leq -\rho_1$) are possible outliers of the negative class and receive smaller penalties.
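To make the three regimes concrete, the following sketch computes, for fixed ρ₁ ≤ ρ₂, the cheapest slack split that a single negative sample incurs under the constraints of QPP (6), and the resulting penalty. Because σ₁ > 1, the cheaper slack ξ⁽¹⁾ (weight 1/q) is used first, up to its cap ρ₂ - ρ₁, and only the remainder goes to ξ⁽²⁾ (weight σ₁/q). This is our reading of the constraints, not the authors' code; the function name is illustrative:

```python
def rough_penalty(score, rho1, rho2, q, sigma1):
    """Region and penalty of one negative sample under QPP (6).

    score = u^T X_j v + b_+. Assumes 0 <= rho1 <= rho2 and sigma1 > 1.
    The required slack max(0, score + rho2) is split so that xi1
    (weight 1/q) is filled first up to rho2 - rho1; the rest is xi2
    (weight sigma1/q).
    """
    need = max(0.0, score + rho2)
    xi1 = min(need, rho2 - rho1)
    xi2 = need - xi1
    if xi2 > 0:
        region = "outlier"   # above the lower margin: score > -rho1
    elif xi1 > 0:
        region = "boundary"  # inside the rough boundary
    else:
        region = "correct"   # at or beyond the upper margin
    return region, xi1 / q + sigma1 * xi2 / q


for f in (-3.0, -1.5, 0.0):
    print(f, rough_penalty(f, rho1=1.0, rho2=2.0, q=10, sigma1=4.0))
```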

Fig. 1

Illustration for QPP (6) in Rν-TSTM.

To make the two QPPs of Rν-TSTM simpler to solve, we stack the positive and negative training samples into third-order tensors, denoted A and B, respectively. The two QPPs can then be written as

$$\begin{aligned} \min \quad & \frac{1}{2} \| A \times u \times v + e_+ b_+ \|^2 - \nu_1 (\rho_1 + \rho_2) + \frac{1}{q} e_-^T \xi_1 + \frac{\sigma_1}{q} e_-^T \xi_2 \\ \text{s.t.} \quad & -(B \times u \times v + e_- b_+) \geq \rho_2 e_- - \xi_1 - \xi_2, \\ & 0 \leq \xi_1 \leq (\rho_2 - \rho_1) e_-, \ \xi_2 \geq 0, \\ & \rho_1, \rho_2 \geq 0, \end{aligned} \tag{8}$$

and

$$\begin{aligned} \min \quad & \frac{1}{2} \| B \times \tilde{u} \times \tilde{v} + e_- b_- \|^2 - \nu_2 (\rho_3 + \rho_4) + \frac{1}{p} e_+^T \xi_3 + \frac{\sigma_2}{p} e_+^T \xi_4 \\ \text{s.t.} \quad & A \times \tilde{u} \times \tilde{v} + e_+ b_- \geq \rho_4 e_+ - \xi_3 - \xi_4, \\ & 0 \leq \xi_3 \leq (\rho_4 - \rho_3) e_+, \ \xi_4 \geq 0, \\ & \rho_3, \rho_4 \geq 0, \end{aligned} \tag{9}$$

where $\xi_1, \xi_2 \in \mathbb{R}^q$, $\xi_3, \xi_4 \in \mathbb{R}^p$, and $e_+ \in \mathbb{R}^p$ and $e_- \in \mathbb{R}^q$ are vectors of ones.
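Stacking the samples and contracting one or both modes can be sketched with `numpy.einsum`. The axis order (sample, n₁, n₂) is our assumption, since the paper does not fix one; with it, `contract_u` returns the p × n₂ matrix A₁ = A × u used in QPP (18), and `contract_v` returns a p × n₁ matrix that plays the role of A₂ in QPP (27). Function names are illustrative:

```python
import numpy as np


def stack_samples(samples):
    """Stack matrices of shape (n1, n2) into a third-order tensor of shape (m, n1, n2)."""
    return np.stack([np.asarray(S, dtype=float) for S in samples], axis=0)


def contract_u(T, u):
    """T x u: contract the n1-mode with u, giving an (m, n2) matrix such as A_1 = A x u."""
    return np.einsum("mij,i->mj", T, u)


def contract_v(T, v):
    """T x v: contract the n2-mode with v, giving the (m, n1) matrix used as A_2."""
    return np.einsum("mij,j->mi", T, v)


def contract_uv(T, u, v):
    """T x u x v: the vector of scores u^T X_i v over all stacked samples."""
    return np.einsum("mij,i,j->m", T, u, v)


rng = np.random.default_rng(2)
samples = [rng.standard_normal((3, 4)) for _ in range(5)]  # five 3 x 4 samples
T = stack_samples(samples)
u, v = rng.standard_normal(3), rng.standard_normal(4)
print(T.shape, contract_u(T, u).shape, contract_v(T, v).shape, contract_uv(T, u, v).shape)
```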

Since QPPs (8) and (9) have the same structure, we only describe how to solve problem (8). Introducing the Lagrange multipliers $\alpha, \beta, \gamma, \eta \in \mathbb{R}^q$ and $\varsigma, \tau \in \mathbb{R}$, the Lagrangian function of QPP (8) can be written as

$$\begin{aligned} L = {} & \frac{1}{2} \| A \times u \times v + e_+ b_+ \|^2 - \nu_1 (\rho_1 + \rho_2) + \frac{1}{q} e_-^T \xi_1 + \frac{\sigma_1}{q} e_-^T \xi_2 \\ & - \alpha^T [-(B \times u \times v + e_- b_+) - \rho_2 e_- + \xi_1 + \xi_2] - \beta^T \xi_1 + \gamma^T (\xi_1 - (\rho_2 - \rho_1) e_-) \\ & - \eta^T \xi_2 - \varsigma \rho_2 - \tau \rho_1. \end{aligned} \tag{10}$$

Setting the derivatives with respect to the primal variables u, v, b₊, ρ₁, ρ₂, ξ₁, ξ₂ equal to zero, we get

$$\partial L / \partial u = (A \times v)(A \times u \times v + e_+ b_+) + (B \times v) \alpha = 0, \tag{11}$$

$$\partial L / \partial v = (A \times u)(A \times u \times v + e_+ b_+) + (B \times u) \alpha = 0, \tag{12}$$

$$\partial L / \partial b_+ = e_+^T (A \times u \times v + e_+ b_+) + e_-^T \alpha = 0, \tag{13}$$

$$\partial L / \partial \rho_1 = e_-^T \gamma - \nu_1 - \tau = 0, \tag{14}$$

$$\partial L / \partial \rho_2 = e_-^T \alpha - e_-^T \gamma - \nu_1 - \varsigma = 0, \tag{15}$$

$$\partial L / \partial \xi_1 = \frac{1}{q} e_- - \alpha - \beta + \gamma = 0, \tag{16}$$

$$\partial L / \partial \xi_2 = \frac{\sigma_1}{q} e_- - \alpha - \eta = 0. \tag{17}$$

Eqs. (11) and (12) show that u and v are coupled, so the problem cannot be solved directly by the usual methods. We therefore resort to the alternating iteration algorithm, which is also used to solve STM [19] and twin STM [21].

First, we initialize u. Let $A_1 = A \times u$ and $B_1 = B \times u$. Then QPP (8) becomes

$$\begin{aligned} \min \quad & \frac{1}{2} \| A_1 v + e_+ b_+ \|^2 - \nu_1 (\rho_1 + \rho_2) + \frac{1}{q} e_-^T \xi_1 + \frac{\sigma_1}{q} e_-^T \xi_2 \\ \text{s.t.} \quad & -(B_1 v + e_- b_+) \geq \rho_2 e_- - \xi_1 - \xi_2, \\ & 0 \leq \xi_1 \leq (\rho_2 - \rho_1) e_-, \ \xi_2 \geq 0, \\ & \rho_1, \rho_2 \geq 0. \end{aligned} \tag{18}$$

To solve (18), we consider its Lagrangian function

$$\begin{aligned} L_1 = {} & \frac{1}{2} \| A_1 v + e_+ b_+ \|^2 - \nu_1 (\rho_1 + \rho_2) + \frac{1}{q} e_-^T \xi_1 + \frac{\sigma_1}{q} e_-^T \xi_2 \\ & - \alpha^T [-(B_1 v + e_- b_+) - \rho_2 e_- + \xi_1 + \xi_2] - \beta^T \xi_1 + \gamma^T (\xi_1 - (\rho_2 - \rho_1) e_-) \\ & - \eta^T \xi_2 - \varsigma \rho_2 - \tau \rho_1. \end{aligned} \tag{19}$$

Based on the Karush-Kuhn-Tucker (KKT) conditions,

$$\partial L_1 / \partial v = A_1^T (A_1 v + e_+ b_+) + B_1^T \alpha = 0, \tag{20}$$

$$\partial L_1 / \partial b_+ = e_+^T (A_1 v + e_+ b_+) + e_-^T \alpha = 0, \tag{21}$$

we can get

$$\begin{bmatrix} A_1^T \\ e_+^T \end{bmatrix} \begin{bmatrix} A_1 & e_+ \end{bmatrix} \begin{bmatrix} v \\ b_+ \end{bmatrix} + \begin{bmatrix} B_1^T \\ e_-^T \end{bmatrix} \alpha = 0. \tag{22}$$

Combining Eq. (14) and Eq. (15), and using ς ≥ 0 and τ ≥ 0, we have

$$e_-^T \alpha = 2\nu_1 + \varsigma + \tau \geq 2\nu_1. \tag{23}$$

From Eq. (16) and Eq. (17), we also get

$$0 \leq \alpha \leq \frac{\sigma_1}{q} e_-. \tag{24}$$

Then the dual problem of QPP (18) is

$$\begin{aligned} \min_{\alpha} \quad & \frac{1}{2} \alpha^T G_1 (H_1^T H_1)^{-1} G_1^T \alpha \\ \text{s.t.} \quad & 0 \leq \alpha \leq \frac{\sigma_1}{q} e_-, \\ & e_-^T \alpha \geq 2\nu_1, \end{aligned} \tag{25}$$

where $H_1 = [A_1 \ e_+]$ and $G_1 = [B_1 \ e_-]$. After solving problem (25) for the Lagrange multipliers α, we obtain (v, b₊) as

$$[v^T \ b_+]^T = -(H_1^T H_1)^{-1} G_1^T \alpha. \tag{26}$$
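The paper does not state which QP solver is used for (25). As a minimal sketch only, the following solves (25) by projected gradient descent and then recovers (v, b₊) with Eq. (26). The small ridge term δI added to $H_1^T H_1$ for invertibility, the step size 1/L and the bisection-based projection are our assumptions; here c = σ₁/q and s = 2ν₁, and the data are random stand-ins for A × u and B × u:

```python
import numpy as np


def project_box_sum(a, c, s, iters=100):
    """Euclidean projection onto {x : 0 <= x <= c, sum(x) >= s}.

    Assumes the set is non-empty, i.e. s <= len(a) * c. The projection has
    the form clip(a + lam, 0, c) with lam >= 0; lam is found by bisection.
    """
    x = np.clip(a, 0.0, c)
    if x.sum() >= s:
        return x
    lo, hi = 0.0, c - a.min()          # at lam = hi every entry is clipped to c
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.clip(a + lam, 0.0, c).sum() < s:
            lo = lam
        else:
            hi = lam
    return np.clip(a + hi, 0.0, c)     # hi always keeps sum(x) >= s


def solve_dual(H, G, c, s, delta=1e-6, steps=500):
    """Projected-gradient sketch of dual (25), then (v, b_+) via Eq. (26).

    H = [A_1 e_+], G = [B_1 e_-], c = sigma_1 / q, s = 2 * nu_1.
    delta * I is a small ridge on H^T H (our assumption) to keep it invertible.
    """
    M = H.T @ H + delta * np.eye(H.shape[1])
    MinvGt = np.linalg.solve(M, G.T)   # (H^T H + delta I)^{-1} G^T
    Q = G @ MinvGt
    Q = 0.5 * (Q + Q.T)                # symmetrize against round-off
    L = np.linalg.eigvalsh(Q).max() + 1e-12  # Lipschitz constant of the gradient Q a
    a = project_box_sum(np.full(G.shape[0], c), c, s)  # feasible starting point
    for _ in range(steps):
        a = project_box_sum(a - (Q @ a) / L, c, s)
    z = -MinvGt @ a                    # Eq. (26): [v; b_+] = -(H^T H)^{-1} G^T a
    return a, z


rng = np.random.default_rng(0)
A1 = rng.standard_normal((6, 3)) + 1.0  # toy stand-in for A x u
B1 = rng.standard_normal((5, 3)) - 1.0  # toy stand-in for B x u
H = np.hstack([A1, np.ones((6, 1))])
G = np.hstack([B1, np.ones((5, 1))])
q, nu1, sigma1 = 5, 0.3, 2.0
alpha, z = solve_dual(H, G, c=sigma1 / q, s=2 * nu1)
v, b_pos = z[:-1], z[-1]
print(alpha.round(3), v.round(3), round(float(b_pos), 3))
```

Any off-the-shelf QP solver could replace the projected-gradient loop; the recovery step (26) stays the same.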

Second, once v is obtained, u can be solved for in a similar way.

Let $A_2 = (A \times v)^T$ and $B_2 = (B \times v)^T$. According to QPP (8), u can be calculated by solving the following QPP:

$$\begin{aligned} \min \quad & \frac{1}{2} \| A_2 u + e_+ b_+ \|^2 - \nu_1 (\rho_1 + \rho_2) + \frac{1}{q} e_-^T \xi_1 + \frac{\sigma_1}{q} e_-^T \xi_2 \\ \text{s.t.} \quad & -(B_2 u + e_- b_+) \geq \rho_2 e_- - \xi_1 - \xi_2, \\ & 0 \leq \xi_1 \leq (\rho_2 - \rho_1) e_-, \ \xi_2 \geq 0, \\ & \rho_1, \rho_2 \geq 0. \end{aligned} \tag{27}$$

In exactly the same way, its dual problem is

$$\begin{aligned} \min_{\hat{\alpha}} \quad & \frac{1}{2} \hat{\alpha}^T G_2 (H_2^T H_2)^{-1} G_2^T \hat{\alpha} \\ \text{s.t.} \quad & 0 \leq \hat{\alpha} \leq \frac{\sigma_1}{q} e_-, \\ & e_-^T \hat{\alpha} \geq 2\nu_1, \end{aligned} \tag{28}$$

where $H_2 = [A_2 \ e_+]$, $G_2 = [B_2 \ e_-]$, and $\hat{\alpha}$ is the corresponding Lagrange multiplier vector. Similarly, (u, b₊) can be obtained by

$$[u^T \ b_+]^T = -(H_2^T H_2)^{-1} G_2^T \hat{\alpha}. \tag{29}$$

In the same way, we can obtain the optimal solution $(\tilde{u}, \tilde{v}, b_-)$ of QPP (9). The decision function of Rν-TSTM is

$$\text{class } k = \arg\min_{k \in \{+, -\}} \left\{ \frac{|u^T X v + b_+|}{\| u v^T \|}, \ \frac{|\tilde{u}^T X \tilde{v} + b_-|}{\| \tilde{u} \tilde{v}^T \|} \right\}. \tag{30}$$
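A sketch of the decision rule (30), assuming NumPy. We take the normalizer to be the Frobenius norm of the rank-one weight matrix $u v^T$, which Step 6 of Algorithm 1 computes, and use the identity $\|u v^T\|_F = \|u\| \, \|v\|$; the function name and tie-breaking toward +1 are our choices:

```python
import numpy as np


def rtstm_predict(X, u, v, b_pos, u_t, v_t, b_neg):
    """Label +1/-1 for a matrix sample X by the nearer rank-one hyperplane.

    Distances are normalized by ||u v^T||_F = ||u|| * ||v||.
    """
    d_pos = abs(u @ X @ v + b_pos) / (np.linalg.norm(u) * np.linalg.norm(v))
    d_neg = abs(u_t @ X @ v_t + b_neg) / (np.linalg.norm(u_t) * np.linalg.norm(v_t))
    return 1 if d_pos <= d_neg else -1


# Toy hyperplanes: the positive one scores X[0, 0], the negative one X[1, 1].
u = v = np.array([1.0, 0.0])
u_t = v_t = np.array([0.0, 1.0])
print(rtstm_predict(np.array([[0.0, 0.0], [0.0, 5.0]]), u, v, 0.0, u_t, v_t, 0.0))  # 1
print(rtstm_predict(np.array([[3.0, 0.0], [0.0, 0.1]]), u, v, 0.0, u_t, v_t, 0.0))  # -1
```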

The procedure of Rν-TSTM is summarized as follows.

Algorithm 1. The Alternating Iteration method for Rν-TSTM.

Inputs: the parameters ν₁, ν₂, σ₁, σ₂, the maximum number of iterations, training samples $X_i \in \mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2}$ (i = 1, …, p + q) and testing samples $X_j \in \mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2}$ (j = 1, …, m).

Outputs: The optimal solutions (u, v, b₊) and $(\tilde{u}, \tilde{v}, b_{-})$ , the labels of testing samples.

Step 1: Initialization. Set $u^0 = (1, \dots, 1)^T$, $\tilde{u}^0 = (1, \dots, 1)^T$, t = 0, and choose a tolerance ε > 0.

Step 2: Update (v, b₊). Solve the dual (25) with u = u^t to obtain α^t, then obtain $(v^t, b_+^t)$ from Eq. (26) with α = α^t.

Step 3: Update (u, b₊). With v = v^t from Step 2, solve the dual (28) to obtain $\hat{\alpha}^t$, then obtain the updated (u, b₊) from Eq. (29) with $\hat{\alpha} = \hat{\alpha}^t$.

Step 4: Check convergence. Alternate Steps 2 and 3. If $\|u^t - u^{t-1}\| \leq ε$, $\|v^t - v^{t-1}\| \leq ε$ and $|b_+^t - b_+^{t-1}| \leq ε$ hold simultaneously, or the number of iterations exceeds the maximum, stop and take the current values as the optimal solution $(u^*, v^*, b_+^*)$. Otherwise, set t = t + 1 and return to Step 2.

Step 5: Repeat Steps 2-4 for QPP (9) to obtain $(\tilde{u}^*, \tilde{v}^*, b_-^*)$.

Step 6: Calculate $\|u v^T\|^2$ and $\|\tilde{u} \tilde{v}^T\|^2$.

Step 7: For the testing samples, output their labels by Eq.(30).

Step 8: End.
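The control flow of Steps 1-4 for one hyperplane can be sketched as a generic loop. The callbacks `update_v` and `update_u` are hypothetical stand-ins for solvers of the duals (25)/(26) and (28)/(29). To keep the example self-contained, the usage below plugs in the least-squares updates of a rank-one matrix approximation, $\min \|M - u v^T\|_F$, which have the same alternating structure but are not the Rν-TSTM subproblems:

```python
import numpy as np


def alternate(update_v, update_u, n1, eps=1e-4, max_iter=50):
    """Alternating iteration skeleton for one hyperplane (Steps 1-4 of Algorithm 1).

    update_v(u) -> (v, b): hypothetical solver of the v-subproblem (QPP (18)).
    update_u(v) -> (u, b): hypothetical solver of the u-subproblem (QPP (27)).
    Returns (u, v, b, iterations_used).
    """
    u = np.ones(n1)                       # Step 1: u^0 = (1, ..., 1)^T
    prev = None
    for t in range(1, max_iter + 1):
        v, _ = update_v(u)                # Step 2: update (v, b_+) for fixed u
        u, b = update_u(v)                # Step 3: update (u, b_+) for fixed v
        if prev is not None:              # Step 4: stopping test on u, v and b
            pu, pv, pb = prev
            if (np.linalg.norm(u - pu) <= eps and np.linalg.norm(v - pv) <= eps
                    and abs(b - pb) <= eps):
                return u, v, b, t
        prev = (u, v, b)
    return u, v, b, max_iter


# Toy callbacks: alternating least squares for the best rank-one approximation of M.
M = np.array([[3.0, 0.0], [0.0, 1.0]])
u, v, b, iters = alternate(lambda u: (M.T @ u / (u @ u), 0.0),
                           lambda v: (M @ v / (v @ v), 0.0), n1=2)
print(iters, np.round(np.outer(u, v), 3))
```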

3.2 Theoretical interpretation of Rν-TSTM

In this subsection, we again use the first optimization problem of Rν-TSTM for the interpretation. Solving QPP (8) gives the optimal solution $\alpha^* = (\alpha_{p+1}^*, \alpha_{p+2}^*, \dots, \alpha_{p+q}^*)^T$ in the last iteration. Different values of α* correspond to different positions of the negative samples. The complementary slackness conditions of the KKT system are

$$\alpha^T [-B \times u \times v - e_- b_+ - \rho_2 e_- + \xi_1 + \xi_2] = 0, \tag{31}$$

$$\gamma^T (\xi_1 - (\rho_2 - \rho_1) e_-) = 0, \tag{32}$$

$$\beta^T \xi_1 = 0, \tag{33}$$

$$\eta^T \xi_2 = 0. \tag{34}$$

From these equations, we can derive the following proposition.

Proposition 1. Negative samples located in different positions have different values of $\alpha_j^*$, j = p + 1, …, p + q, and can be divided into five cases.

If $\alpha_j^* = 0$, then $\xi_j^{(1)} = \xi_j^{(2)} = 0$; the sample lies beyond the rough upper margin and satisfies $u^T X_j v + b_+ < -\rho_2$. It is a correctly classified negative sample.

If $0 < \alpha_j^* < 1/q$, then $\xi_j^{(1)} = \xi_j^{(2)} = 0$; the sample lies on the upper margin and satisfies $u^T X_j v + b_+ = -\rho_2$. It is a support tensor of the negative class on the upper margin.

If $\alpha_j^* = 1/q$, then $\xi_j^{(1)} > 0$ and $\xi_j^{(2)} = 0$; the sample lies within the rough boundary and satisfies $-\rho_2 < u^T X_j v + b_+ < -\rho_1$. It is a support tensor of the negative class inside the rough margin.

If $1/q < \alpha_j^* < \sigma_1/q$, then $\xi_j^{(1)} = \rho_2 - \rho_1$ and $\xi_j^{(2)} = 0$; the sample lies on the lower margin and satisfies $u^T X_j v + b_+ = -\rho_1$. It is a support tensor of the negative class on the lower margin.

If $\alpha_j^* = \sigma_1/q$, then $\xi_j^{(2)} > 0$; the sample lies above the lower margin and satisfies $u^T X_j v + b_+ > -\rho_1$. Such samples are usually outliers or noise of the negative class.
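A numerical QP solver returns α* only up to a tolerance, so in code the five cases of Proposition 1 are best assigned with a small tolerance `tol` (our assumption). A minimal sketch with an illustrative function name:

```python
def alpha_case(alpha, q, sigma1, tol=1e-8):
    """Return the case (1-5) of Proposition 1 for one multiplier alpha_j^*."""
    c1, c2 = 1.0 / q, sigma1 / q
    if alpha <= tol:
        return 1  # correctly classified, beyond the upper margin
    if alpha < c1 - tol:
        return 2  # support tensor on the upper margin
    if alpha <= c1 + tol:
        return 3  # support tensor inside the rough boundary
    if alpha < c2 - tol:
        return 4  # support tensor on the lower margin
    return 5      # alpha = sigma1/q: outlier above the lower margin


print([alpha_case(a, q=10, sigma1=4.0) for a in (0.0, 0.05, 0.1, 0.2, 0.4)])  # [1, 2, 3, 4, 5]
```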

According to the KKT conditions and Proposition 1, the parameters ρ₁ and ρ₂ can be calculated as follows.

$$\rho_2 = -\frac{1}{N_1} \sum_{j=1}^{N_1} (u^T X_j v + b_+), \tag{35}$$

where the sum runs over the N₁ negative support tensors satisfying $0 < \alpha_j^* < 1/q$.

$$\rho_1 = -\frac{1}{N_2} \sum_{j=1}^{N_2} (u^T X_j v + b_+), \tag{36}$$

where the sum runs over the N₂ negative support tensors satisfying $1/q < \alpha_j^* < \sigma_1/q$.
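Eqs. (35) and (36) average the scores of the support tensors lying exactly on each margin. A NumPy sketch, assuming `scores` holds $u^T X_j v + b_+$ for the q negative samples and `alpha` their multipliers $\alpha_j^*$; the tolerance and the NaN fallback when no such support tensor exists are our choices:

```python
import numpy as np


def estimate_rhos(scores, alpha, q, sigma1, tol=1e-8):
    """Estimate (rho_1, rho_2) by Eqs. (36) and (35)."""
    scores = np.asarray(scores, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    c1, c2 = 1.0 / q, sigma1 / q
    upper = (alpha > tol) & (alpha < c1 - tol)       # on the upper margin: score = -rho_2
    lower = (alpha > c1 + tol) & (alpha < c2 - tol)  # on the lower margin: score = -rho_1
    rho2 = -scores[upper].mean() if upper.any() else np.nan
    rho1 = -scores[lower].mean() if lower.any() else np.nan
    return rho1, rho2


# q = 4, sigma1 = 2: thresholds 1/q = 0.25 and sigma1/q = 0.5.
print(estimate_rhos([-2.0, -1.0, -3.0, 0.5], [0.1, 0.3, 0.0, 0.5], q=4, sigma1=2.0))
```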

The following proposition gives the theoretical interpretation of (ν₁, σ₁).

Proposition 2. Suppose we solve problem (8) with the p + q samples of dataset T and obtain ρ₁ and ρ₂. Let q₁ denote the number of support tensors in the negative class, and q₂ the number of negative samples lying above the lower margin ($u^T X v + b_+ > -\rho_1$). Then,

(1) 2ν₁/σ₁ ≤ q₁/q, i.e., 2ν₁/σ₁ is a lower bound on the fraction of negative support tensors;

(2) 2ν₁/σ₁ > q₂/q, i.e., 2ν₁/σ₁ is an upper bound on the fraction of negative margin errors.

The proof of Proposition 2 is similar to that in [8] and is omitted here.
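For the reader's convenience, here is a sketch of the standard ν-type argument, assembled from (23), (24) and Proposition 1 (our reconstruction, not the omitted proof of [8]). Every negative support tensor has $0 < \alpha_j^* \leq \sigma_1/q$ and all other multipliers vanish, so

$$2\nu_1 \leq e_-^T \alpha^* = \sum_{j:\, \alpha_j^* > 0} \alpha_j^* \leq q_1 \frac{\sigma_1}{q} \quad \Longrightarrow \quad \frac{2\nu_1}{\sigma_1} \leq \frac{q_1}{q}.$$

If, in addition, ρ₁ > 0 and ρ₂ > 0, complementary slackness gives ς = τ = 0, so (23) holds with equality. Every sample above the lower margin has $\alpha_j^* = \sigma_1/q$ (the last case of Proposition 1), hence

$$q_2 \frac{\sigma_1}{q} \leq \sum_{j:\, \alpha_j^* = \sigma_1/q} \alpha_j^* \leq e_-^T \alpha^* = 2\nu_1 \quad \Longrightarrow \quad \frac{q_2}{q} \leq \frac{2\nu_1}{\sigma_1},$$

which is part (2) in its non-strict form.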

Similarly, we can derive the theoretical interpretations for positive samples.

3.3 Convergence and computational complexity analysis

In this subsection, we analyze the convergence and computational complexity of Algorithm 1.

Theorem 1. Algorithm 1 is convergent: the objective values it generates for QPP (6) and QPP (7) converge, yielding the solutions $(u^*, v^*, b_+^*)$ and $(\tilde{u}^*, \tilde{v}^*, b_-^*)$.

Proof. Let f₁(u, v, b₊) be the objective of QPP (6):

$$f_1(u, v, b_+) = \frac{1}{2} \sum_{i=1}^{p} (u^T X_i v + b_+)^2 - \nu_1 (\rho_1 + \rho_2) + \frac{1}{q} \sum_{j=p+1}^{p+q} \xi_j^{(1)} + \frac{\sigma_1}{q} \sum_{j=p+1}^{p+q} \xi_j^{(2)}. \tag{37}$$

Starting from the initial value u⁰ given in Step 1 of Algorithm 1, we obtain the optimal $(v^0, b_+^0)$ by solving the dual (25). Then, solving the dual (28) with v = v⁰, we obtain the optimal $(u^1, b_+^{1'})$, and so on. Since each step minimizes f₁ over one block of variables with the other block fixed, the objective values form a monotonically non-increasing sequence:

$$f_1(u^0, v^0, b_+^0) \geq f_1(u^1, v^0, b_+^{1'}) \geq f_1(u^1, v^1, b_+^1) \geq f_1(u^2, v^1, b_+^{2'}) \geq \cdots. \tag{38}$$

Provided f₁ is bounded below on the feasible set, this sequence has a limit, since every monotonically non-increasing sequence that is bounded below converges. Denoting the solution at termination by $(u^*, v^*, b_+^*)$, we have

$$f_1(u^0, v^0, b_+^0), \ f_1(u^1, v^0, b_+^{1'}), \ f_1(u^1, v^1, b_+^1), \ f_1(u^2, v^1, b_+^{2'}), \ \dots \to f_1(u^*, v^*, b_+^*). \tag{39}$$

Because QPP (18) and QPP (27) are convex, each alternating step attains the global minimum of its subproblem. Note, however, that (6) is not jointly convex in (u, v) because of the bilinear term $u^T X_i v$, so $(u^*, v^*, b_+^*)$ is in general the point at which the alternating iteration stabilizes rather than a guaranteed global optimum of (6).

Similarly, we can analyze QPP (7) and reach the same conclusion. Therefore, Algorithm 1 is convergent. □

To discuss the computational complexity of Algorithm 1, we assume that the two classes have approximately equal numbers of samples, i.e., p = q = l/2, and that n₁ = n₂ = n. At each iteration, the algorithm needs to compute a dual matrix and solve a QPP, which costs at least $2 \times (O(n(l/2)^2) + O((l/2)^3)) = \frac{1}{4} O(2nl^2 + l^3)$ for linear Rν-TSTM. The total complexity is therefore the number of iterations multiplied by this per-iteration cost.

3.4 The general algorithm for Rν-TSTM

The input data of the linear Rν-TSTM described above are second-order tensors. In this subsection, we briefly describe a generalized high-order form of Rν-TSTM. Let $\{X_i\}$, i = 1, …, p + q, denote the training samples, where $X_i \in \mathbb{R}^{n_1} \otimes \dots \otimes \mathbb{R}^{n_K}$ is a K-th order tensor. Based on the tensor operations

$$\langle W_+, X_i \rangle = \langle a_1 \otimes \dots \otimes a_K, X_i \rangle = X_i \times a_1 \times \dots \times a_K, \tag{40}$$

$$\langle W_-, X_i \rangle = \langle \tilde{a}_1 \otimes \dots \otimes \tilde{a}_K, X_i \rangle = X_i \times \tilde{a}_1 \times \dots \times \tilde{a}_K, \tag{41}$$

where $a_k, \tilde{a}_k \in \mathbb{R}^{n_k}$, k = 1, …, K, the optimization problems of the high-order rank-one Rν-TSTM are

$$\begin{aligned} \min \quad & \frac{1}{2} \sum_{i=1}^{p} (X_i \times a_1 \times \dots \times a_K + b_+)^2 - \nu_1 (\rho_1 + \rho_2) + \frac{1}{q} \sum_{j=p+1}^{p+q} \xi_j^{(1)} + \frac{\sigma_1}{q} \sum_{j=p+1}^{p+q} \xi_j^{(2)} \\ \text{s.t.} \quad & -(X_j \times a_1 \times \dots \times a_K + b_+) \geq \rho_2 - \xi_j^{(1)} - \xi_j^{(2)}, \\ & 0 \leq \xi_j^{(1)} \leq \rho_2 - \rho_1, \ \xi_j^{(2)} \geq 0, \\ & \rho_1 + \rho_2 \geq 0, \ j = p+1, \dots, p+q, \end{aligned} \tag{42}$$

and

$$\begin{aligned} \min \quad & \frac{1}{2} \sum_{j=p+1}^{p+q} (X_j \times \tilde{a}_1 \times \dots \times \tilde{a}_K + b_-)^2 - \nu_2 (\rho_3 + \rho_4) + \frac{1}{p} \sum_{i=1}^{p} \xi_i^{(3)} + \frac{\sigma_2}{p} \sum_{i=1}^{p} \xi_i^{(4)} \\ \text{s.t.} \quad & X_i \times \tilde{a}_1 \times \dots \times \tilde{a}_K + b_- \geq \rho_4 - \xi_i^{(3)} - \xi_i^{(4)}, \\ & 0 \leq \xi_i^{(3)} \leq \rho_4 - \rho_3, \ \xi_i^{(4)} \geq 0, \\ & \rho_3 + \rho_4 \geq 0, \ i = 1, \dots, p. \end{aligned} \tag{43}$$
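The contraction $X_i \times a_1 \times \dots \times a_K$ in (40)-(43) can be sketched with `numpy.tensordot`, contracting one mode at a time; the result equals the inner product with the rank-one tensor $a_1 \otimes \dots \otimes a_K$. The function name is illustrative:

```python
import numpy as np


def multilinear_score(X, factors, b):
    """X x a_1 x ... x a_K + b: contract every mode of a K-th order tensor X."""
    out = np.asarray(X, dtype=float)
    for a in factors:
        # Contract the current leading mode of `out` with its factor vector.
        out = np.tensordot(a, out, axes=([0], [0]))
    return float(out) + b


rng = np.random.default_rng(3)
X = rng.standard_normal((2, 3, 4))              # a third-order sample
a = [rng.standard_normal(k) for k in X.shape]   # one factor per mode
W = np.einsum("i,j,k->ijk", *a)                 # rank-one weight tensor
print(np.isclose(multilinear_score(X, a, 0.25), np.sum(W * X) + 0.25))  # True
```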

Similarly, we can use the Alternating Iteration method to obtain the optimal solutions.

4 Experimental results

In this section, we compare Rν-TSTM with the tensor-based algorithms STM and ν-TSTM and the vector-based algorithms SVM, ν-TSVM and Rν-TSVM. We first evaluate the tensor-based algorithms on the Australian dataset to examine in detail the impact of different tensor sizes. We then give an overall comparison on 10 vector-based and 7 tensor-based datasets.

4.1 Datasets and preparation of experiments

All datasets come from the UCI¹ and ORL² databases. Table 1 gives their details: the first 10 datasets are vector-based and the last 7 are tensor-based. All 17 datasets are used to construct binary classification problems; for a multi-class dataset, we randomly choose two classes.

Table 1
Description of datasets

Datasets	Samples	Positive	Negative	Attribute	Matrix-size
Iris	150	50	100	4	(2,2)
Heart	270	120	150	13	(3,5)
Wine	130	59	71	13	(3,5)
Australian	690	383	307	14	(4,4)
Vote	435	168	267	16	(4,4)
Lung	23	10	13	56	(7,8)
Banknote	1372	762	610	4	(2,2)
Mushrooms	8124	3916	4208	112	(14,8)
Dbworld	64	35	29	4702	(157,30)
Dlbcl	58	32	26	7129	(115,62)
Letters	1555	789	766	(4,4)	(4,4)
Libras	48	24	24	(45,2)	(45,2)
Robot	47	20	27	(15,6)	(15,6)
Hand	40	20	20	(16,16)	(16,16)
ORL32	20	10	10	(32,32)	(32,32)
Eyes	80	40	40	(84,56)	(84,56)
Yale	22	11	11	(100,100)	(100,100)

All algorithms are implemented in MATLAB R2014a and run on a PC with a 3.60 GHz Intel Core i3-4160 CPU and 4.0 GB RAM. The best parameters are selected by 10-fold cross-validation on the training set. Specifically, we randomly select a subset of each dataset to train the model and use the remaining data for testing; this process is repeated ten times.

There are five tuning parameters: C, ν₁, ν₂, σ₁ and σ₂. The parameter C in SVM and STM is searched in $\{2^{-10}, 2^{-9}, \dots, 2^{10}\}$. We set ν₁ = ν₂ = ν in ν-TSVM, ν-TSTM, Rν-TSVM and Rν-TSTM and search ν in {0.1, 0.2, …, 0.9}. We set σ₁ = σ₂ = σ in Rν-TSVM and Rν-TSTM and tune σ in $\{2^{1}, 2^{1.5}, \dots, 2^{5}\}$.

We report the test accuracy of each classifier as mean ± standard deviation, where

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \tag{44}$$

and TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.

To verify how well overfitting is handled, we also use sensitivity and specificity:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \tag{45}$$

$$\text{Specificity} = \frac{TN}{TN + FP}. \tag{46}$$

The index “Gmeans” considers sensitivity and specificity simultaneously:

$$\text{Gmeans} = \sqrt{\text{Sensitivity} \cdot \text{Specificity}}, \tag{47}$$

so we record “Gmeans” for comparison. “Time” denotes the mean running time in seconds over the 10 experiments, where each experiment's time includes both training and testing.
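The evaluation indexes can be computed from predicted labels as below. This sketch uses the standard definitions, sensitivity = TP/(TP + FN) (true positive rate) and specificity = TN/(TN + FP) (true negative rate), with labels in {+1, -1}; returning 0 for a class with no samples is our convention:

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and Gmeans for labels in {+1, -1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    spec = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return acc, sens, spec, math.sqrt(sens * spec)


acc, sens, spec, gm = binary_metrics([1, 1, 1, -1, -1], [1, 1, -1, -1, 1])
print(acc, sens, spec, gm)
```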

4.2 Experiments with different tensor sizes on Australian dataset

In this subsection, we use the Australian dataset, which has 383 positive and 307 negative samples with 14 attributes, to discuss the impact of different tensor sizes on classification performance.

A vector sample $x \in \mathbb{R}^n$ can be converted into a matrix (second-order tensor) $X \in \mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2}$ with $n_1 \times n_2 \approx n$; Cai [18] proposed a method for this conversion. For the Australian dataset (n = 14), we consider five candidate tensor sizes: 2 × 7, 3 × 5, 4 × 4, 5 × 3 and 7 × 2. To find the best conversion, we record the testing accuracy of the three tensor-based algorithms for each tensor size and several training sizes. The results are shown in Table 2, where the best results are in bold. With the 4 × 4 size, ν-TSTM and Rν-TSTM achieve their highest accuracy for every training size, and STM does so for two of the four training sizes.
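The conversion itself is attributed to Cai [18] and is not spelled out here. Purely as an illustration, the sketch below fills an n₁ × n₂ matrix row by row and zero-pads when n₁n₂ > n; the row-major order and zero padding are our assumptions, not necessarily Cai's scheme. For n = 14 it accepts all five sizes of Table 2:

```python
import numpy as np


def vector_to_matrix(x, n1, n2):
    """Reshape a length-n vector into an (n1, n2) matrix row by row.

    Zero-pads when n1 * n2 > n (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float).ravel()
    if n1 * n2 < x.size:
        raise ValueError("n1 * n2 must be at least len(x)")
    padded = np.zeros(n1 * n2)
    padded[: x.size] = x
    return padded.reshape(n1, n2)


x = np.arange(1, 15)  # 14 attributes, as in the Australian dataset
for n1, n2 in [(2, 7), (3, 5), (4, 4), (5, 3), (7, 2)]:
    print((n1, n2), vector_to_matrix(x, n1, n2).shape)
print(vector_to_matrix(x, 4, 4))
```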

Table 2
Averaged testing accuracy (%) for different tensor sizes on the Australian dataset; P+N gives the numbers of positive and negative training samples.

P+N	Tensor size	2×7	3×5	4×4	5×3	7×2
10+8	STM	70.85±5.99	72.60±7.29	71.34±4.51	72.41±5.68	68.42±5.11
	ν-TSTM	80.35±0.59	72.61±0.21	80.57±0.59	71.98±3.71	68.63±2.94
	Rν-TSTM	85.39±0.23	81.58±0.34	85.45±0.17	85.19±0.55	73.75±0.43
20+15	STM	74.38±5.15	75.02±4.59	76.49±7.16	76.37±7.13	70.78±3.37
	ν-TSTM	79.86±0.35	78.11±4.76	84.01±4.76	76.43±1.24	67.73±0.15
	Rν-TSTM	85.39±0.48	84.73±1.31	85.54±0.36	85.41±0.38	76.43±1.68
28+23	STM	76.85±2.59	77.17±5.47	81.05±5.48	82.39±4.52	70.11±3.14
	ν-TSTM	77.48±0.71	72.82±5.85	83.79±0.24	76.82±2.77	62.14±2.13
	Rν-TSTM	85.37±0.44	81.58±1.46	86.07±0.57	84.81±0.41	74.39±10.53
38+30	STM	79.26±4.68	78.47±8.41	82.70±4.23	80.35±5.59	72.61±4.59
	ν-TSTM	70.47±3.43	77.04±6.24	82.01±0.78	73.75±4.31	62.14±2.14
	Rν-TSTM	85.47±0.34	85.35±0.92	85.71±0.35	85.61±0.46	75.16±1.09

These results suggest that, in general, the closer n₁ and n₂ are, the better the classification performance. Following this principle, we set the tensor sizes of the 10 vector-based datasets used in this paper; they are also listed in Table 1.
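One way to turn the "n₁ close to n₂" principle into a default choice is sketched below, assuming zero padding fills any extra entries. The helper `near_square_shape` is hypothetical: it takes n₁ = ⌈√n⌉ and the smallest n₂ with n₁n₂ ≥ n, which guarantees n₂ ∈ {n₁ − 1, n₁}. For the Australian dataset (n = 14) it returns 4 × 4, the size that performed best in Table 2; it is not claimed to reproduce the sizes listed in Table 1.

```python
import math


def near_square_shape(n):
    """Return a near-square (n1, n2) with n1 * n2 >= n.

    n1 = ceil(sqrt(n)) and n2 is the smallest value with n1 * n2 >= n,
    so n2 is either n1 - 1 or n1. Extra cells are assumed to be zero-padded.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    n1 = math.isqrt(n - 1) + 1  # exact integer ceil(sqrt(n))
    n2 = -(-n // n1)            # integer ceil(n / n1)
    return n1, n2


print(near_square_shape(14))  # (4, 4)
```

Integer arithmetic (`math.isqrt`, floor division) avoids floating-point rounding in the ceiling computations for large n.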

4.3 Experiments on various datasets

We evaluate six algorithms, SVM, STM, ν-TSVM, ν-TSTM, Rν-TSVM and Rν-TSTM, on the 10 vector datasets and 7 tensor datasets listed in Table 1. We focus on small training sets and record the testing accuracy and Gmeans for various training sample sizes. The averaged testing accuracies of the six algorithms on the 10 vector datasets and on the 7 tensor datasets are reported in Table 3 and Table 4, respectively, and the averaged Gmeans in Table 5 and Table 6, respectively. As before, bold values in Table 3 and Table 4 indicate the best results. To visualize how classification performance and running time vary with the training set size, we plot the results in Figs. 2-18. To show the trend in running time clearly, the averaged running time is plotted on a logarithmic scale against the number of training samples.

Table 3
Averaged percentages of testing accuracy on 10 vector-based datasets.

Datasets	Num	SVM	STM	ν-TSVM	ν-TSTM	Rν-TSVM	Rν-TSTM
Iris	2+3	100.00±0.00	99.72±0.67	99.79±0.47	100.00±0.00	99.93±0.22	100.00±0.00
	3+5	100.00±0.00	100.00±0.00	99.86±0.30	100.00±0.00	100.00±0.00	100.00±0.00
	4+8	100.00±0.00	100.00±0.00	100.00±0.23	100.00±0.00	100.00±0.00	100.00±0.00
	5+10	100.00±0.00	100.00±0.00	100.00±0.23	100.00±0.00	100.00±0.00	100.00±0.00
Heart	4+3	62.28±3.99	63.54±8.75	70.38±24.25	77.63±2.09	77.34±19.96	79.93±0.35
	8+6	69.92±11.2	71.29±5.12	74.61±21.25	77.74±1.14	81.37±13.89	81.74±1.28
	12+9	74.02±21.39	71.21±10.16	77.63±18.36	77.81±0.86	81.21±18.61	81.97±1.38
	15+12	76.59±4.99	76.94±3.01	79.68±12.61	81.55±5.13	81.18±4.04	83.99±6.17
Wine	3+4	88.05±5.21	88.21±5.36	88.70±9.29	88.54±0.61	95.85±6.76	93.58±0.46
	4+5	88.60±9.96	88.43±5.23	90.08±19.65	90.25±11.99	96.36±5.39	93.72±0.43
	5+6	88.82±5.60	89.41±2.96	91.76±7.80	92.44±0.49	96.72±3.18	97.23±1.77
	6+7	89.23±2.83	89.49±9.22	92.48±6.67	94.87±11.42	96.32±1.56	97.61±0.36
Australian	10+8	68.99±4.61	71.34±4.51	75.46±15.75	80.57±0.59	80.83±14.05	85.45±0.17
	20+15	74.72±4.96	76.49±7.16	83.17±5.03	84.00±4.76	85.44±6.83	85.54±0.36
	28+23	78.84±6.53	81.05±5.48	84.87±7.49	83.79±0.24	87.78±2.97	86.07±0.57
	38+30	79.89±5.92	82.70±4.23	85.82±8.09	82.01±0.78	90.39±2.27	85.71±0.36
Vote	4+6	91.98±2.27	90.28±5.04	95.44±3.25	90.56±0.38	97.27±1.85	94.87±0.22
	8+12	93.18±2.33	92.60±2.40	96.63±2.90	94.00±0.29	97.90±1.80	95.37±0.25
	12+20	93.37±2.21	92.95±3.97	96.58±1.16	95.78±0.35	98.06±1.17	95.66±0.39
	16+26	93.44±2.28	93.31±1.46	96.49±3.35	95.45±0.39	98.88±0.54	95.95±0.25
Lung	3+2	71.11±10.73	67.78±11.94	74.44±21.31	78.33±6.11	79.44±15.06	81.61±5.19
	6+4	70.00±11.15	70.00±6.74	75.38±18.42	80.77±9.07	82.31±14.97	86.15±7.94
	10+6	75.71±9.64	71.43±9.52	77.14±26.26	85.71±11.66	81.43±17.88	90.00±13.55
	11+8	75.00±16.67	75.00±20.41	80.00±22.97	85.00±21.08	82.50±20.58	92.50±12.08
Banknote	19+15	95.72±1.54	92.03±6.55	97.53±0.51	96.34±0.42	97.50±0.61	97.58±0.49
	38+30	95.64±3.56	92.63±8.52	97.56±0.41	96.62±0.21	97.88±0.55	97.10±0.12
	56+48	95.53±2.36	94.07±3.36	97.59±0.46	96.70±0.42	97.77±0.51	95.96±0.18
	76+61	92.08±6.91	95.27±4.32	97.55±0.38	96.43±0.42	97.62±0.35	96.83±0.21
Mushrooms	98+105	98.36±0.83	94.82±2.36	98.99±0.51	98.41±0.32	99.52±0.46	98.44±0.22
	195+210	98.79±2.02	98.42±0.20	99.75±0.31	98.45±0.25	99.81±0.23	98.51±0.08
	292+315	99.80±0.16	98.51±0.04	99.86±0.23	98.49±0.25	99.95±0.07	98.49±0.13
	390+420	99.41±0.45	98.52±0.07	99.91±0.10	98.40±0.25	99.98±0.05	98.23±0.06
Dbworld	2+2	75.83±13.84	74.00±6.05	88.33±2.36	77.50±9.27	91.67±2.36	92.24±0.91
	5+5	83.89±6.88	80.74±5.40	91.30±1.96	83.33±5.86	92.96±2.59	95.00±0.89
	8+8	88.13±5.01	85.42±4.71	90.21±1.98	88.13±3.11	93.13±1.98	95.40±1.65
Dlbcl	2+2	49.63±5.91	51.30±4.19	54.63±8.52	51.85±7.25	58.33±19.58	71.67±1.52
	3+3	50.77±7.26	53.27±6.48	53.27±6.48	54.04±6.75	67.69±13.83	68.27±2.90
	4+4	51.00±6.13	54.20±6.49	55.20±7.32	52.20±6.70	70.40±11.65	74.40±2.80
	5+5	53.96±2.68	55.21±2.99	53.33±5.99	52.08±7.28	60.83±24.92	70.62±3.47

Table 4

Averaged percentages of testing accuracy on 7 tensor-based datasets.

Datasets	Num	SVM	STM	ν-TSVM	ν-TSTM	Rν-TSVM	Rν-TSTM
Letters	10+10	95.77±1.40	93.65±1.28	95.13±2.93	94.92±0.94	97.06±2.05	96.90±0.73
	30+30	96.27±1.49	94.32±2.45	97.96±1.24	97.40±0.36	98.82±1.08	97.62±0.77
	50+50	96.23±2.25	95.21±1.23	98.47±1.04	97.72±0.42	99.40±0.39	98.15±0.14
	70+70	95.72±3.27	95.36±4.02	98.88±0.48	97.85±0.41	99.50±0.34	98.16±0.09
Libras	2+2	65.68±10.30	66.14±4.35	71.14±17.74	74.55±2.09	78.86±10.05	79.77±4.73
	4+4	74.75±9.46	68.00±12.57	78.25±20.24	78.00±3.07	83.50±5.55	85.25±2.49
	6+6	78.61±6.68	74.44±8.36	84.44±8.09	83.61±3.06	88.06±6.29	89.44±1.76
	8+8	82.19±13.01	81.56±2.74	90.00±1.98	90.94±2.31	90.63±4.42	91.88±3.67
Robot	4+5	59.21±5.44	66.58±8.78	66.32±35.04	74.74±4.33	78.95±29.38	85.56±2.19
	6+8	61.52±8.34	69.09±9.56	76.67±28.21	80.91±4.53	78.48±27.23	86.36±3.85
	8+10	62.07±7.80	73.45±10.03	81.03±17.68	81.72±4.00	83.45±24.37	88.97±3.91
	10+13	63.33±6.75	72.50±9.86	82.92±18.05	84.17±5.83	83.75±20.64	89.17±4.03
Hand	4+4	98.06±2.64	98.33±1.43	98.89±1.94	95.62±1.61	100.00±0.00	96.56±0.99
	6+6	99.69±0.99	98.13±1.61	99.38±1.98	98.13±2.19	100.00±0.00	97.14±1.51
	8+8	99.64±1.13	99.29±1.51	100.00±0.00	99.29±1.51	100.00±0.00	99.58±1.32
	10+10	100.00±0.00	100.00±0.00	100.00±0.00	99.58±1.32	100.00±0.00	100.00±0.00
ORL	2+2	98.75±2.64	98.13±3.02	98.13±3.02	98.75±2.64	98.13±3.02	100.00±0.00
	5+5	100.00±0.00	100.00±0.00	100.00±0.00	99.00±3.16	100.00±0.00	100.00±0.00
	8+8	100.00±0.00	100.00±0.00	100.00±0.00	100.00±0.00	100.00±0.00	100.00±0.00
Eyes	2+2	96.97±8.28	97.37±8.32	100.00±0.00	92.11±5.48	94.74±11.10	100.00±0.00
	6+6	100.00±0.00	100.00±0.00	100.00±0.00	100.00±0.00	100.00±0.00	100.00±0.00
	10+10	100.00±0.00	100.00±0.00	100.00±0.00	100.00±0.00	100.00±0.00	100.00±0.00
Yale	2+2	90.00±6.83	93.33±8.20	97.78±2.87	93.33±2.34	97.78±2.87	97.78±2.87
	5+5	92.50±6.15	97.50±4.03	98.33±3.51	98.33±3.51	100.00±0.00	100.00±0.00
	8+8	93.33±11.65	98.33±5.27	100.00±0.00	100.00±0.00	100.00±0.00	100.00±0.00

Table 5

The averaged Gmeans of six algorithms on 10 vector-based datasets.

Datasets	Num	SVM	STM	ν-TSVM	ν-TSTM	Rν-TSVM	Rν-TSTM
Iris	2+3	100.00	99.79	99.85	100.00	99.95	100.00
	3+5	100.00	100.00	99.90	100.00	100.00	100.00
	4+8	100.00	100.00	100.00	100.00	100.00	100.00
	5+10	100.00	100.00	100.00	100.00	100.00	100.00
Heart	4+3	60.67	63.87	61.52	77.60	70.03	81.02
	8+6	70.34	71.56	68.36	77.70	80.78	83.05
	12+9	68.22	70.16	74.53	78.43	81.16	82.39
	15+12	75.58	76.41	71.59	77.31	73.04	76.04
Wine	3+4	87.21	87.61	91.80	89.85	96.86	94.20
	4+5	87.64	87.82	89.00	93.09	97.15	93.68
	5+6	87.44	88.16	93.83	93.28	97.25	97.46
	6+7	88.26	88.79	94.22	96.20	96.83	97.88
Australian	10+8	66.91	69.57	77.01	81.01	80.34	85.49
	20+15	74.47	76.61	85.47	86.00	87.28	85.59
	28+23	79.17	81.11	86.88	83.58	88.67	85.97
	38+30	80.37	82.62	87.66	81.92	90.76	85.81
Vote	4+6	91.55	90.09	96.58	89.68	97.88	94.34
	8+12	92.64	92.40	97.45	93.09	98.37	94.78
	12+20	93.28	92.43	97.33	95.05	98.47	95.00
	16+26	93.61	92.87	97.36	94.74	99.10	95.23
Lung	3+2	70.99	66.11	76.26	79.54	83.64	82.07
	6+4	69.25	64.81	77.11	81.57	86.63	87.05
	10+6	75.39	70.71	80.95	90.51	88.05	90.99
	11+8	75.00	72.28	82.73	86.60	88.72	94.99
Banknote	19+15	96.02	92.55	97.36	96.12	97.31	97.45
	38+30	95.97	92.41	97.38	96.40	97.71	96.89
	56+48	95.90	94.46	97.40	96.47	97.58	95.73
	76+61	92.18	95.53	97.35	96.21	97.43	96.62
Mushrooms	98+105	98.32	94.82	99.00	98.48	99.54	98.51
	195+210	98.74	98.36	99.74	98.52	99.82	98.54
	292+315	99.80	98.44	99.86	98.53	99.95	98.57
	390+420	99.39	98.45	99.91	98.48	99.98	98.32
Dbworld	2+2	74.94	72.83	88.81	81.05	91.93	92.34
	5+5	82.61	79.57	91.25	84.90	93.02	94.82
	8+8	87.70	84.75	90.19	88.02	93.03	95.55
Dlbcl	2+2	46.77	50.80	54.31	51.71	58.57	71.58
	3+3	49.77	52.71	52.47	54.17	72.08	67.85
	4+4	51.23	54.40	55.20	51.97	73.29	79.84
	5+5	53.94	55.53	52.17	50.70	60.60	71.65

Table 6

The averaged Gmeans of six algorithms on 7 tensor-based datasets.

Datasets	Num	SVM	STM	ν-TSVM	ν-TSTM	Rν-TSVM	Rν-TSTM
Letters	10+10	95.78	93.56	95.52	95.04	97.20	96.98
	30+30	96.27	94.32	98.02	97.42	98.84	97.66
	50+50	96.23	95.20	98.50	97.72	99.40	98.15
	70+70	95.71	95.34	98.89	97.87	99.50	98.17
Libras	2+2	63.43	64.36	75.98	75.20	84.57	78.10
	4+4	73.74	65.71	80.39	78.70	87.00	85.77
	6+6	77.94	74.04	87.88	86.87	90.27	90.84
	8+8	81.69	79.93	91.33	91.43	91.99	92.91
Robot	4+5	56.72	67.04	64.68	74.16	77.78	86.23
	6+8	59.36	69.12	75.95	83.25	77.88	86.17
	8+10	62.13	74.38	83.17	81.46	80.56	88.66
	10+13	63.29	72.57	84.04	85.34	85.09	89.00
Hand	4+4	98.06	98.32	98.97	96.01	100.00	96.74
	6+6	99.69	98.12	99.44	98.25	100.00	97.00
	8+8	99.64	99.28	100.00	99.33	100.00	99.23
	10+10	100.00	100.00	100.00	99.67	100.00	100.00
ORL32	2+2	98.74	98.11	98.32	98.88	98.32	100.00
	5+5	100.00	100.00	100.00	99.16	100.00	100.00
	8+8	100.00	100.00	100.00	100.00	100.00	100.00
Eyes	2+2	96.95	97.33	100.00	93.34	96.49	100.00
	6+6	100.00	100.00	100.00	100.00	100.00	100.00
	10+10	100.00	100.00	100.00	100.00	100.00	100.00
Yale	2+2	89.44	93.33	97.98	94.00	97.98	97.98
	5+5	92.20	97.50	98.56	98.56	100.00	100.00
	8+8	93.09	98.32	100.00	100.00	100.00	100.00

Fig. 2

(a) Testing accuracy and (b) running time of the six algorithms on the Iris dataset versus the training sample size.

Fig. 3

(a) Testing accuracy and (b) running time of the six algorithms on the Heart dataset versus the training sample size.

Fig. 4

(a) Testing accuracy and (b) running time of the six algorithms on the Wine dataset versus the training sample size.

Fig. 5

(a) Testing accuracy and (b) running time of the six algorithms on the Australian dataset versus the training sample size.

Fig. 6

(a) Testing accuracy and (b) running time of the six algorithms on the Vote dataset versus the training sample size.

Fig. 7

(a) Testing accuracy and (b) running time of the six algorithms on the Lung dataset versus the training sample size.

Fig. 8

(a) Testing accuracy and (b) running time of the six algorithms on the Banknote dataset versus the training sample size.

Fig. 9

(a) Testing accuracy and (b) running time of the six algorithms on the Mushrooms dataset versus the training sample size.

Fig. 10

(a) Testing accuracy and (b) running time of the six algorithms on the Dbworld dataset versus the training sample size.

Fig. 11

(a) Testing accuracy and (b) running time of the six algorithms on the Dlbcl dataset versus the training sample size.

Fig. 12

(a) Testing accuracy and (b) running time of the six algorithms on the Letters dataset versus the training sample size.

Fig. 13

(a) Testing accuracy and (b) running time of the six algorithms on the Libras dataset versus the training sample size.

Fig. 14

(a) Testing accuracy and (b) running time of the six algorithms on the Robot dataset versus the training sample size.

Fig. 15

(a) Testing accuracy and (b) running time of the six algorithms on the Hand dataset versus the training sample size.

Fig. 16

(a) Testing accuracy and (b) running time of the six algorithms on the ORL dataset versus the training sample size.

Fig. 17

(a) Testing accuracy and (b) running time of the six algorithms on the Eyes dataset versus the training sample size.

Fig. 18

(a) Testing accuracy and (b) running time of the six algorithms on the Yale dataset versus the training sample size.

To avoid unnecessary repetition, we give an overall comparison across these datasets, which leads to the following observations:

In terms of testing accuracy, Rν-TSTM is higher than the other five algorithms in most cases. Tensor-based algorithms also tend to perform better than their vector-based counterparts: in most cases STM is better than SVM, ν-TSTM is better than ν-TSVM, and Rν-TSTM is better than Rν-TSVM. Figs. 2-18(a) show that the testing accuracies of all six algorithms rise as the number of training samples increases. In most cases, Rν-TSTM performs best, followed by Rν-TSVM, ν-TSTM, ν-TSVM, STM and SVM. However, Rν-TSTM performs worse than Rν-TSVM on the Banknote, Mushrooms and Letters datasets (see Figs. 8, 9 and 12), where the number of training samples is large relative to the number of attributes; this suggests that tensor-based algorithms are less suitable for such datasets.

The tensor-based algorithms STM, ν-TSTM and Rν-TSTM take more time than the vector-based algorithms SVM, ν-TSVM and Rν-TSVM in most cases. Among the three tensor-based algorithms, however, Rν-TSTM and ν-TSTM take less time than STM in the majority of cases. Notably, on datasets of higher dimension (more attributes), the tensor-based algorithms Rν-TSTM and ν-TSTM take less time than the vector-based algorithms Rν-TSVM and ν-TSVM, for example on Dbworld (Fig. 10(b)), Dlbcl (Fig. 11(b)), ORL (Fig. 16(b)), Eyes (Fig. 17(b)) and Yale (Fig. 18(b)). In particular, on the Yale dataset, where the vector-based algorithms work with 10,000-dimensional inputs, the tensor-based algorithms are comparatively faster. On these five datasets, Rν-TSTM has the shortest running time of all algorithms except SVM, and it also yields the best accuracy and Gmeans. This indicates that Rν-TSTM can handle high-dimensional and S3 problems effectively and efficiently.

Rν-TSTM achieves a higher Gmeans than Rν-TSVM at the same training size in many comparisons, notably on the Heart, Lung, Dbworld, Dlbcl and Robot datasets. Similar comparisons can be made between ν-TSTM and ν-TSVM and between STM and SVM. Rν-TSTM often achieves the highest Gmeans among the six algorithms, which suggests that it mitigates overfitting to a considerable extent.
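Comparative statements of this kind can be checked by tallying wins, ties and losses at matched training sizes. The Python sketch below does this for the Rν-TSVM and Rν-TSTM Gmeans columns of Table 5; only four datasets are included for brevity, the values are copied from the table, and the names `TABLE5_GMEANS` and `tally` are illustrative.

```python
# Gmeans (%) copied from Table 5: (Rν-TSVM, Rν-TSTM) per dataset,
# one value per training size in the order listed in the table.
TABLE5_GMEANS = {
    "Iris": ([99.95, 100.00, 100.00, 100.00], [100.00, 100.00, 100.00, 100.00]),
    "Heart": ([70.03, 80.78, 81.16, 73.04], [81.02, 83.05, 82.39, 76.04]),
    "Vote": ([97.88, 98.37, 98.47, 99.10], [94.34, 94.78, 95.00, 95.23]),
    "Dbworld": ([91.93, 93.02, 93.03], [92.34, 94.82, 95.55]),
}


def tally(results):
    """Count wins, ties and losses of the second method against the first."""
    wins = ties = losses = 0
    for baseline, proposed in results.values():
        if len(baseline) != len(proposed):
            raise ValueError("rows must cover the same training sizes")
        for b, p in zip(baseline, proposed):
            if p > b:
                wins += 1
            elif p < b:
                losses += 1
            else:
                ties += 1
    return wins, ties, losses


print(tally(TABLE5_GMEANS))  # (8, 3, 4)
```

Extending the dictionary to every row of Tables 5 and 6 gives the full count behind such comparisons.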

Generally speaking, as the number of training samples increases, the testing accuracy, running time and Gmeans increase in most cases. The proposed Rν-TSTM makes full use of the structural information in the data, and its computational cost is generally lower than that of STM. It has clear advantages on high-dimensional and S3 problems, achieves comparable or better classification accuracy than SVM, STM, ν-TSVM, ν-TSTM and Rν-TSVM, and largely overcomes the overfitting problem.

5 Conclusions

In this paper, we propose a novel tensor-based algorithm termed Rough margin-based ν-Twin Support Tensor Machine (Rν-TSTM). Rν-TSTM takes tensor data as input and assigns different penalties to samples according to their positions. Compared with vector-based algorithms, it makes fuller use of the structural information in the data, which is crucial to achieving better classification performance. It avoids overfitting to a great extent and is especially suited to high-dimensional and small sample size problems. As expected, computational experiments on 17 datasets confirm these advantages.


Acknowledgment

This work was supported in part by the Fundamental Research Funds for the Central Universities (No. BLX201928) and National Natural Science Foundation of China (No. 11671010).

References

1. Wang and Hill D.J., Deterministic learning and rapid dynamical pattern recognition, IEEE Transactions on Neural Networks 18(3) (2007), 617–630. DOI: 10.1109/TNN.2006.889496

2. Ozawa, Roy and Roussinov, A multitask learning model for online pattern recognition, IEEE Transactions on Neural Networks 20(3) (2009), 430–445. DOI: 10.1109/TNN.2008.2007961

3. …, Shen, Zhang, Yuan and Yang, Recovering quantitative remote sensing products contaminated by thick clouds and shadows using multitemporal dictionary learning, IEEE Transactions on Geoscience and Remote Sensing 52(11) (2014), 7086–7098. DOI: 10.1109/TGRS.2014.2307354

4. Vapnik V.N., The nature of statistical learning theory, Springer, Berlin, 1995.

5. Jayadeva, Khemchandani and Chandra, Twin support vector machines for pattern classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007), 905–910. DOI: 10.1109/TPAMI.2007.1068

6. Peng X.J., A ν-twin support vector machine (ν-TSVM) classifier and its geometric algorithms, Information Sciences 180(20) (2010), 3863–3875. DOI: 10.1016/j.ins.2010.06.039

7. Kumar M.A. and Gopal, Least squares twin support vector machines for pattern classification, Expert Systems with Applications 36(4) (2009), 7535–7543. DOI: 10.1016/j.eswa.2008.09.066

8. …, Wang and Zhong, A rough margin-based ν-twin support vector machine, Neural Computing and Applications 21(6) (2012), 1307–1317. DOI: 10.1007/s00521-011-0565-y

9. … Z.Q., Tian Y.J. and Shi, Robust twin support vector machine for pattern classification, Pattern Recognition 46(1) (2013), 305–316. DOI: 10.1016/j.patcog.2012.06.019

10. Wang and Zhou, An improved rough margin-based ν-twin bounded support vector machine, Knowledge-Based Systems 128 (2017), 125–138. DOI: 10.1016/j.knosys.2017.05.004

11. Rastogi, Saigal and Chandra, Angle-based twin parametric-margin support vector machine for pattern classification, Knowledge-Based Systems 139 (2018), 64–77. DOI: 10.1016/j.knosys.2017.10.008

12. Lopez, Maldonado and Carrasco, Robust nonparallel support vector machines via second-order cone programming, Neurocomputing 364 (2019), 227–238. DOI: 10.1016/j.neucom.2019.07.072

13. Mello A.R., Stemmer M.R. and Koerich A.L., Incremental and decremental fuzzy bounded twin support vector machine, Information Sciences 526 (2020), 20–38. DOI: 10.1016/j.ins.2020.03.038

14. Kolda T.G. and Bader B.W., Tensor Decompositions and Applications, SIAM Review 51(3) (2009), 455–500. DOI: 10.1137/07070111X

15. Etemad and Chellappa, Discriminant analysis for recognition of human face images, Journal of the Optical Society of America A: Optics, Image Science, and Vision 14(8) (1997), 1724–1733. DOI: 10.1007/BFb0015988

16. Green R.D. and Guan, Quantifying and recognizing human movement patterns from monocular video images-part II: applications to biometrics, IEEE Transactions on Circuits & Systems for Video Technology 14(2) (2004), 191–198. DOI: 10.1109/TCSVT.2003.821977

17. Tao, Li, Wu and Maybank S.J., Supervised tensor learning, Knowledge and Information Systems 13(1) (2005), 450–457. DOI: 10.1007/s10115-006-0050-6

18. Cai, He, Wen, Han and Ma, Support tensor machines for text categorization, Technical Report UIUCDCS-R-200-(2714), Computer Science Department, 2006.

19. Cai, Hei and Han, Learning with tensor representation, Technical Report UIUCDCS-R-2006-2716, Department of Computer Science, University of Illinois at Urbana-Champaign, 2006.

20. Zhang, Gao and Wang, Twin support tensor machines for MCS detection, Journal of Electronics (China) 26(3) (2009), 318–325. DOI: 10.1007/s11767-007-0211-0

21. Gao X.Z., Fan and Xu, NLS-TSTM: A novel and fast nonlinear image classification method, WSEAS Transactions on Mathematics 13 (2014), 626–635.

22. Shi, Zhao, Zhen and Jing, Twin bounded support tensor machine for classification, International Journal of Pattern Recognition and Artificial Intelligence 30(1) (2016), 1650002.1–1650002.20. DOI: 10.1142/S0218001416500026

23. Wang, Wu and Zhou, A ν-twin support tensor machine, International Conference on Information Engineering and Communications Technology (2016). DOI: 10.12783/dtetr/iect2016/3732

24. Kotsia, Guo W.W. and Patras, Higher rank support tensor machines for visual recognition, Pattern Recognition 45(12) (2012), 4192–4203. DOI: 10.1016/j.patcog.2012.04.033

25. Khemchandani, Karpatne and Chandra, Proximal support tensor machines, International Journal of Machine Learning & Cybernetics 4(6) (2013), 703–712. DOI: 10.1007/s13042-012-0132-6

26. Jiang and Yang, Multiple rank multi-linear twin support matrix classification machine, Journal of Intelligent & Fuzzy Systems 35(5) (2018), 5741–5754. DOI: 10.3233/JIFS-17414

27. Chen, Wang and Zhong, One-class support tensor machine, Knowledge-Based Systems 96 (2016), 14–28. DOI: 10.1016/j.knosys.2016.01.007

28. Chen, Lu and Zhong, One-class support higher order tensor machine classifier, Applied Intelligence 47(4) (2017), 1022–1030. DOI: 10.1007/s10489-017-0945-9

29. …, A nonlinear kernel support matrix machine for matrix learning, International Journal of Machine Learning & Cybernetics 10(10) (2019), 2725–2738. DOI: 10.1007/s13042-018-0896-4

30. Shu and Yang, Least square support tensor regression machine based on submatrix of the tensor, Mathematical Problems in Engineering 2017 (2017), 1–11. DOI: 10.1155/2017/3818949

31. Biswas S.K. and Milanfar, Linear support tensor machine with LSK channels: Pedestrian detection in thermal infrared images, IEEE Transactions on Image Processing 26(9) (2017), 4229–4242. DOI: 10.1109/TIP.2017.2705426

32. Liu and Wang, A sparse tensor-based classification method of hyperspectral image, Signal Processing 168 (2020), 107361. DOI: 10.1016/j.sigpro.2019.107361

33. Zhou, Song, Hassan M.M. and Alamri, Multilinear rank support tensor machine for crowd density estimation, Engineering Applications of Artificial Intelligence 72 (2018), 382–392. DOI: 10.1016/j.engappai.2018.04.011

34. …, Shao, Cheng, Zhao and Yang, Support tensor machine with dynamic penalty factors and its application to the fault diagnosis of rotating machinery with unbalanced data, Mechanical Systems and Signal Processing 141 (2020), 106441. DOI: 10.1016/j.ymssp.2019.106441