A linear support higher order tensor domain description for one-class classification

Abstract

One-class classification is an important problem that arises in many applications. Datasets extracted from real-world problems are often naturally represented as tensors. The classical support vector domain description (SVDD) for one-class classification cannot handle such data directly because its inputs are vectors. This paper develops a linear tensor-based algorithm, named Linear Support Tensor Domain Description (LSTDD), which finds a closed hypersphere of minimal volume in the tensor space that contains almost all of the target samples. LSTDD preserves the data topology and reduces the number of parameters to be estimated, which makes it better suited to high dimensional, small sample size problems. We first detail the LSTDD model with 2nd-order tensors and then extend it to higher order tensors. Experiments on real-world datasets show that LSTDD is a promising method for one-class classification problems with both 2nd-order and higher order tensor inputs.

Keywords

One-class classification, support vector domain description, support tensor machine, support tensor domain description

1 Introduction

Support tensor machine (STM) [1, 2], proposed by Cai et al. based on the Supervised Tensor Learning (STL) framework [3, 4], has attracted considerable attention. It extends support vector machines (SVMs) [5, 6] by taking tensors instead of vectors as inputs. STM can mitigate the over-fitting problem that often occurs when traditional SVMs deal with small sample datasets of high dimensionality. In addition, STM needs fewer parameters than SVMs. For example, an SVM takes a vector $x \in ℝ^{n}$ as input, and a linear classifier in the vector space $ℝ^{n}$ can be represented as w^Tx + b, so there are n + 1 parameters (w_i and b, i = 1, …, n). The input of STM is a tensor, such as a 2nd-order tensor $Z \in ℝ^{n_{1}} \otimes ℝ^{n_{2}}$ , where n₁ × n₂ ≈ n. A linear classifier in the tensor space $ℝ^{n_{1}} \otimes ℝ^{n_{2}}$ can be represented as μ^TZϑ + b, where $μ \in ℝ^{n_{1}}$ and $ϑ \in ℝ^{n_{2}}$ , so there are only n₁ + n₂ + 1 parameters (μ_i, ϑ_j and b, i = 1, …, n₁, j = 1, …, n₂).
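The parameter counts above can be illustrated with a few lines of Python. This is only a sketch of the arithmetic; the function names are hypothetical, and the 16 × 16 size is the one listed for USPS in Table 1.

```python
def svm_param_count(n: int) -> int:
    """Weights w_1..w_n plus the bias b of a linear classifier on R^n."""
    return n + 1


def stm_param_count(n1: int, n2: int) -> int:
    """Entries of mu (n1), entries of vartheta (n2), plus the bias b."""
    return n1 + n2 + 1


# A 16 x 16 input, i.e. n = 256 features when vectorized.
n1, n2 = 16, 16
print(svm_param_count(n1 * n2))  # 257
print(stm_param_count(n1, n2))   # 33
```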

With the development of tensor theory, STM has received growing attention in recent years. Many tensor-based learning algorithms have been proposed, such as C-STM and ν-STM [4], Twin STM [7], Proximal STM [8], Transductive STM [9, 10], higher rank STM [11], a higher-order STM for classification [12], and the hyperplane-based one-class STMs [13–15].

One-class classification is an important problem in machine learning and data mining, with wide applications in areas such as image retrieval, document classification, and fault detection and diagnosis. There are basically three categories of one-class classification solutions: density estimation methods, reconstruction methods, and boundary methods [16]. Boundary methods differ from the other two kinds in that they find a closed boundary that contains the target samples and discards outliers, without attempting a full description of the underlying distribution. Support Vector Domain Description (SVDD) [17], derived from SVM, is one of the most classical techniques of this type. It finds the minimum volume enclosing hypersphere that contains almost all the target objects, and it has been widely applied [18–25]. Another line of research focuses on improving the algorithm [26–29]. All of these algorithms are grounded in vector space and cannot directly take a tensor as input.

In many applications, the original representations of objects are tensors. For example, a gray image can be considered as a second order tensor, while a color image, a grayscale video sequence, and a gait contour sequence can be considered as third order tensors. Generally speaking, vector-based algorithms learn from tensor data by first transforming them into vectors, which has two main disadvantages. The first is that transforming tensors into vectors may lose structural information in the data. For instance, the correlation between adjacent columns of a matrix may be destroyed when the matrix is converted into a vector. The second is that if a tensor is converted into a high dimensional vector, a substantial number of training samples is required to overcome the over-fitting problem. However, for some tensor data, such as video data streams, it is hard to obtain plenty of samples.

In this paper, we extend SVDD to the tensor-based model, and present a Linear Support Tensor Domain Description (LSTDD) to find a minimal volume hypersphere in the tensor space to contain target samples. We first detail LSTDD model with 2nd-order tensors, and then generalize it to the higher order tensor space. We present the specific iterative algorithm and analyse its computation complexity. We test LSTDD by experiments on both the 2nd-order tensor datasets and the higher order tensor datasets, which are really high dimensional and small sample size datasets. Experimental results confirm that LSTDD is a promising method for handling one-class classification with tensor inputs.

In the remainder of the paper, Section 2 concisely introduces the tensor algebra used in the paper, STM, and SVDD. Section 3 describes the new method: after a detailed description of LSTDD with 2nd-order tensors, we extend it to the higher order tensor space, and the specific algorithm is given at the end of that section. The experimental results on real-world vector datasets and tensor datasets are given in Sections 4 and 5, respectively. Finally, Section 6 gives the conclusions.

2 Preliminaries

In this section, we briefly review some relevant concepts of tensors, support tensor machine (STM), and support vector domain description (SVDD).

2.1 Some basic operations of tensors

Simply, a 1st-order tensor is a vector and a 2nd-order tensor is a matrix. Suppose a q-th order tensor $Z \in ℝ^{n_{1} \times n_{2} \times \dots \times n_{q}}$ , and denote an element of Z as Z_{p₁×p₂×…×p_q}, where 1 ≤ p_k ≤ n_k, 1 ≤ k ≤ q.

Definition 1. The tensor product (or outer product) $Z \otimes \tilde{Z}$ of two tensors $Z \in ℝ^{n_{1} \times n_{2} \times \dots \times n_{q}}$ and $\tilde{Z} \in ℝ^{n_{1}^{'} \times n_{2}^{'} \times \dots \times n_{q}^{'}}$ is

$\begin{matrix} (Z \otimes \tilde{Z})_{p_{1} \times p_{2} \times \dots \times p_{q} \times p_{1}^{'} \times p_{2}^{'} \times \dots \times p_{q}^{'}} \\ = Z_{p_{1} \times p_{2} \times \dots \times p_{q}} {\tilde{Z}}_{p_{1}^{'} \times p_{2}^{'} \times \dots \times p_{q}^{'}} \end{matrix}$ (1) for all index values.

When $μ \in ℝ^{n_{1}}$ and $ϑ \in ℝ^{n_{2}}$ are two vectors, the result of their tensor product is a matrix $W \in ℝ^{n_{1} \times n_{2}}$ , where W = μ ⊗ ϑ = μ ϑ^T.

Definition 2. The mode-d product of a tensor $Z \in ℝ^{n_{1} \times n_{2} \times \dots \times n_{q}}$ and a vector $z \in ℝ^{n_{d}} (1 \leq d \leq q)$ is a tensor of size $n_{1} \times \dots \times n_{d - 1} \times n_{d + 1} \times \dots \times n_{q}$ with entries

$\begin{matrix} (Z \times_{d} z)_{p_{1} \times p_{2} \times \dots \times p_{d - 1} \times p_{d + 1} \times \dots \times p_{q}} \\ = \sum_{p_{d} = 1}^{n_{d}} Z_{p_{1} \times p_{2} \times \dots \times p_{d - 1} \times p_{d} \times p_{d + 1} \times \dots \times p_{q}} z_{p_{d}} \end{matrix}$ (2) for all index values.

Definition 3. A q-th order tensor Z is a rank-1 tensor if it is the tensor product of q vectors $z_{i} \in ℝ^{n_{i}} (1 \leq i \leq q)$ , where $Z = z_{1} \otimes z_{2} \dots \otimes z_{q} = \prod_{i = 1}^{q} \otimes z_{i}$ (3)

Definition 4. The Frobenius norm of a tensor $Z \in ℝ^{n_{1} \times n_{2} \times \dots \times n_{q}}$ is $∥ Z ∥ = \sqrt{\sum_{p_{1} = 1}^{n_{1}} \dots \sum_{p_{q} = 1}^{n_{q}} {Z^{2}}_{p_{1} \times p_{2} \times \dots \times p_{q}}}$ (4)

The Frobenius norm of Z gives the size measurement of Z, and its square measures the energy of Z.
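Definitions 1, 2 and 4 map directly onto standard NumPy operations. The sketch below is one possible way to express them, not the authors' implementation; the helper names are hypothetical, and the mode index d is 1-based as in the text.

```python
import numpy as np


def outer(Z, Z_tilde):
    """Tensor (outer) product of Definition 1: indices of Z, then those of Z_tilde."""
    return np.tensordot(Z, Z_tilde, axes=0)


def mode_product(Z, z, d):
    """Mode-d product of Definition 2 with a vector z (d is 1-based)."""
    return np.tensordot(Z, z, axes=([d - 1], [0]))


def frobenius_norm(Z):
    """Frobenius norm of Definition 4."""
    return np.sqrt(np.sum(Z ** 2))


mu = np.array([1.0, 2.0, 3.0])
theta = np.array([4.0, 5.0])
W = outer(mu, theta)                          # rank-1 matrix mu theta^T, shape (3, 2)

Z = np.arange(24, dtype=float).reshape(2, 3, 4)
Y = mode_product(Z, np.ones(3), d=2)          # contracts the second index, shape (2, 4)
```

For a rank-1 matrix the Frobenius norm factorizes as the product of the two vector norms, which is the identity behind λ₁ = ∥μ∥² and λ₂ = ∥ϑ∥² used later in the paper.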

2.2 STM

Suppose the training set is $\{(Z_{i}, y_{i})\}_{i = 1}^{l}$ , where $Z_{i} \in ℝ^{n_{1}} \otimes ℝ^{n_{2}}$ is a 2nd-order tensor and y_i ∈ {-1, 1} is the class label of Z_i. STM seeks the following classifier, which separates the two classes with maximum margin: $f (Z) = sign (μ^{T} Z ϑ + b), μ \in ℝ^{n_{1}}, ϑ \in ℝ^{n_{2}}$ (5)

The corresponding mathematical program is $\begin{matrix} \min_{μ \in ℝ^{n_{1}}, ϑ \in ℝ^{n_{2}}, b \in ℝ, η \in ℝ^{l}} & \frac{1}{2} ∥ μ ϑ^{T} ∥^{2} + C \sum_{i = 1}^{l} η_{i} \\ s . t . & y_{i} (μ^{T} Z_{i} ϑ + b) \geq 1 - η_{i}, \\ & η_{i} \geq 0, i = 1, \dots, l \end{matrix}$ (6)

Cai et al. [1] presented an iterative method to solve (6) effectively. First, fix μ and let λ₁ = ∥μ∥², z_i = Z_i^Tμ. Problem (6) then becomes a classical support vector machine in the variable ϑ: $\begin{matrix} \min_{ϑ \in ℝ^{n_{2}}, b \in ℝ, η \in ℝ^{l}} & \frac{1}{2} λ_{1} ∥ ϑ ∥^{2} + C \sum_{i = 1}^{l} η_{i} \\ s . t . & y_{i} (ϑ^{T} z_{i} + b) \geq 1 - η_{i}, \\ & η_{i} \geq 0, i = 1, \dots, l \end{matrix}$ (7) Problem (7) can be solved easily. Once ϑ has been calculated, let λ₂ = ∥ϑ∥² and ${\tilde{z}}_{i} = Z_{i} ϑ$ , so that problem (6) becomes a standard support vector machine in μ: $\begin{matrix} \min_{μ \in ℝ^{n_{1}}, b \in ℝ, η \in ℝ^{l}} & \frac{1}{2} λ_{2} ∥ μ ∥^{2} + C \sum_{i = 1}^{l} η_{i} \\ s . t . & y_{i} (μ^{T} {\tilde{z}}_{i} + b) \geq 1 - η_{i}, \\ & η_{i} \geq 0, i = 1, \dots, l \end{matrix}$ (8)

Then one can get μ and ϑ by solving the two standard SVMs (7) and (8) iteratively.

2.3 SVDD

Suppose the training data are $x_{i} \in ℝ^{n}, i = 1, \dots, l$ , and let c and r be the center and radius of the hypersphere, respectively. SVDD minimizes the volume of the hypersphere that contains almost all the target data. It solves the following optimization problem: $\begin{matrix} \min_{r, c, η} & r^{2} + \frac{1}{ν l} \sum_{i = 1}^{l} η_{i} \\ s . t . & ∥ x_{i} - c ∥^{2} \leq r^{2} + η_{i}, \\ & η_{i} \geq 0, i = 1, \dots, l \end{matrix}$ (9) where ν > 0 is a trade-off parameter. The dual of optimization problem (9) is $\begin{matrix} \min_{τ} & \sum_{i, j = 1}^{l} τ_{i} τ_{j} x_{i}^{T} x_{j} - \sum_{i = 1}^{l} τ_{i} x_{i}^{T} x_{i} \\ s . t . & \sum_{i = 1}^{l} τ_{i} = 1, \\ & 0 \leq τ_{i} \leq \frac{1}{ν l}, i = 1, \dots, l \end{matrix}$ (10)

Then we can get $c = \sum_{i = 1}^{l} τ_{i} x_{i}$ , and the decision function

$f (x) = sign \left( r^{2} - \sum_{i, j = 1}^{l} τ_{i} τ_{j} x_{i}^{T} x_{j} + 2 \sum_{i = 1}^{l} τ_{i} x^{T} x_{i} - x^{T} x \right)$ (11) where r² can be calculated from a support vector x_i with $0 < τ_{i} < \frac{1}{ν l}$ .
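As an illustration of the dual (10) and the decision function (11), the following sketch solves the dual with projected gradient descent. The solver choice is an assumption made here for compactness (the paper does not prescribe one; any quadratic programming routine would do), and all function names are hypothetical. The decision value is computed as r² − ∥x − c∥², which equals the expression inside sign(·) in (11) once c = Σ τ_i x_i is substituted.

```python
import numpy as np


def project_capped_simplex(v, cap, iters=50):
    """Euclidean projection onto {t : sum(t) = 1, 0 <= t_i <= cap}, by bisection."""
    lo, hi = v.min() - cap, v.max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.clip(v - mid, 0.0, cap).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    return np.clip(v - 0.5 * (lo + hi), 0.0, cap)


def svdd_dual(X, nu, iters=1000):
    """Approximately solve dual (10); requires 0 < nu <= 1 so the constraints are feasible."""
    l = X.shape[0]
    K = X @ X.T                       # Gram matrix of inner products x_i^T x_j
    d = np.diag(K)                    # x_i^T x_i
    cap = 1.0 / (nu * l)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(K)[-1] + 1e-12)
    tau = np.full(l, 1.0 / l)         # feasible starting point
    for _ in range(iters):
        tau = project_capped_simplex(tau - step * (2.0 * K @ tau - d), cap)
    return tau


def svdd_fit(X, nu, tol=1e-8):
    """Return center c, squared radius r2 and multipliers tau."""
    tau = svdd_dual(X, nu)
    c = tau @ X
    cap = 1.0 / (nu * X.shape[0])
    boundary = (tau > tol) & (tau < cap - tol)   # support vectors with 0 < tau_i < 1/(nu l)
    # Note: if no multiplier lies strictly between the bounds, this mean is undefined (nan).
    r2 = np.mean(np.sum((X[boundary] - c) ** 2, axis=1))
    return c, r2, tau


def svdd_predict(x, c, r2):
    """+1 inside the hypersphere, -1 outside."""
    return np.sign(r2 - np.sum((x - c) ** 2))


# Toy data (made up for illustration): four corners of a square and one interior point.
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0], [0.2, 0.1]])
c, r2, tau = svdd_fit(X, nu=0.2)
```

On this toy set the interior point receives a zero multiplier, so the center is determined by the four corners alone.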

3 Linear support tensor domain description

In this section, we firstly present LSTDD model with 2nd-order tensors, then extend it to the higher order tensor space, and finally give the specific iterative algorithm.

3.1 LSTDD with 2nd-order tensors

Suppose the training set is $\{Z_{i}\}_{i = 1}^{l}, Z_{i} \in ℝ^{n_{1}} \otimes ℝ^{n_{2}}$ , where $ℝ^{n_{1}}$ and $ℝ^{n_{2}}$ are two vector spaces. Denote by r the radius of the hypersphere. Let C be the center of the hypersphere in the 2nd-order tensor space, expressed as the rank-1 tensor μϑ^T $(μ \in ℝ^{n_{1}}, ϑ \in ℝ^{n_{2}})$ . LSTDD solves the following optimization problem: $\begin{matrix} \min_{μ \in ℝ^{n_{1}}, ϑ \in ℝ^{n_{2}}, r \in ℝ, η \in ℝ^{l}} & r^{2} + \frac{1}{ν l} \sum_{i = 1}^{l} η_{i} \\ s . t . & ∥ Z_{i} - μ ϑ^{T} ∥^{2} \leq r^{2} + η_{i}, \\ & η_{i} \geq 0, i = 1, \dots, l \end{matrix}$ (12)

The decision function is given by $f (Z) = sign (r^{2} - ∥ Z - μ ϑ^{T} ∥^{2})$ (13)

By introducing Lagrange multipliers into optimization problem (12), we get the Lagrangian function:

$\begin{array}{l} ℒ (μ, ϑ, η, r, τ, β) = r^{2} + \frac{1}{ν l} \sum_{i = 1}^{l} η_{i} \\ + \sum_{i = 1}^{l} τ_{i} \left( ∥ Z_{i} - μ ϑ^{T} ∥^{2} - r^{2} - η_{i} \right) \\ - \sum_{i = 1}^{l} β_{i} η_{i} \end{array}$ (14) where the Lagrange multipliers τ_i, β_i ≥ 0, i = 1, …, l.

Since

$\begin{array}{l} ∥ Z_{i} - μ ϑ^{T} ∥^{2} = trace [(Z_{i} - μ ϑ^{T})^{T} (Z_{i} - μ ϑ^{T})] \\ = trace (Z_{i}^{T} Z_{i}) - 2 trace (ϑ μ^{T} Z_{i}) \\ + (ϑ^{T} ϑ) (μ^{T} μ) \end{array}$ (15) we get

$\begin{array}{l} ℒ (μ, ϑ, η, r, τ, β) = r^{2} + \frac{1}{ν l} \sum_{i = 1}^{l} η_{i} - \sum_{i = 1}^{l} τ_{i} r^{2} \\ + \sum_{i = 1}^{l} τ_{i} trace (Z_{i}^{T} Z_{i}) + \sum_{i = 1}^{l} τ_{i} (ϑ^{T} ϑ) (μ^{T} μ) \\ - 2 \sum_{i = 1}^{l} τ_{i} trace (ϑ μ^{T} Z_{i}) - \sum_{i = 1}^{l} τ_{i} η_{i} \\ - \sum_{i = 1}^{l} β_{i} η_{i} \end{array}$ (16)
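The trace expansion (15) is easy to check numerically. The snippet below is only a sanity check on random data (the sizes 4 × 3 and the seed are arbitrary choices for illustration); it also confirms that trace(ϑμ^T Z) equals the bilinear form μ^T Z ϑ that appears in the later derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 3))      # a 2nd-order tensor with n1 = 4, n2 = 3
mu = rng.standard_normal(4)
theta = rng.standard_normal(3)

# Left-hand side of (15): squared Frobenius distance to the rank-1 center.
lhs = np.sum((Z - np.outer(mu, theta)) ** 2)

# Right-hand side of (15): the three trace terms.
rhs = (np.trace(Z.T @ Z)
       - 2.0 * np.trace(np.outer(theta, mu) @ Z)
       + (theta @ theta) * (mu @ mu))

print(np.isclose(lhs, rhs))
```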

By the Karush-Kuhn-Tucker (KKT) conditions, we have $\frac{\partial ℒ}{\partial η_{i}} = 0 \Rightarrow \frac{1}{ν l} - τ_{i} - β_{i} = 0$ (17) $\frac{\partial ℒ}{\partial r} = 0 \Rightarrow \sum_{i = 1}^{l} τ_{i} = 1$ (18) $\frac{\partial ℒ}{\partial μ} = 0 \Rightarrow μ = \frac{1}{∥ ϑ ∥^{2}} \sum_{i = 1}^{l} τ_{i} Z_{i} ϑ$ (19) $\frac{\partial ℒ}{\partial ϑ} = 0 \Rightarrow ϑ = \frac{1}{∥ μ ∥^{2}} \sum_{i = 1}^{l} τ_{i} Z_{i}^{T} μ$ (20)

It can be seen from (19) and (20) that μ and ϑ are not independent. Following the idea proposed by Cai et al. [1], we solve the optimization problem (12) iteratively.

First, we fix μ and denote λ₁ = ∥μ∥², $z_{i} = Z_{i}^{T} μ$ . According to (17)–(20), the Lagrangian function (16) becomes

$\begin{array}{l} ℒ (μ, ϑ, η, r, τ, β) = \sum_{i = 1}^{l} τ_{i} trace (Z_{i}^{T} Z_{i}) \\ + λ_{1} ϑ^{T} ϑ - 2 \sum_{i = 1}^{l} τ_{i} trace (ϑ z_{i}^{T}) \end{array}$ (21) where $\begin{matrix} trace (ϑ z_{i}^{T}) = z_{i}^{T} ϑ = \frac{1}{λ_{1}} \sum_{j = 1}^{l} τ_{j} z_{i}^{T} z_{j}, \\ ϑ^{T} ϑ = \frac{1}{λ_{1}^{2}} \sum_{i, j = 1}^{l} τ_{i} τ_{j} z_{i}^{T} z_{j} \end{matrix}$ (22)

Thus we have

$\begin{matrix} ℒ (μ, ϑ, η, r, τ, β) = \sum_{i = 1}^{l} τ_{i} trace (Z_{i}^{T} Z_{i}) \\ - \frac{1}{λ_{1}} \sum_{i, j = 1}^{l} τ_{i} τ_{j} z_{i}^{T} z_{j} \end{matrix}$ (24)
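The step from (21) and (22) to (24) is a direct substitution. Written out, with z_i = Z_i^Tμ in lowercase as defined above, the two terms that depend on ϑ combine as follows:

```latex
\begin{aligned}
λ_{1} ϑ^{T} ϑ
  &= \frac{1}{λ_{1}} \sum_{i, j = 1}^{l} τ_{i} τ_{j} z_{i}^{T} z_{j}, \\
2 \sum_{i = 1}^{l} τ_{i} \, \mathrm{trace} (ϑ z_{i}^{T})
  &= \frac{2}{λ_{1}} \sum_{i, j = 1}^{l} τ_{i} τ_{j} z_{i}^{T} z_{j}, \\
λ_{1} ϑ^{T} ϑ - 2 \sum_{i = 1}^{l} τ_{i} \, \mathrm{trace} (ϑ z_{i}^{T})
  &= - \frac{1}{λ_{1}} \sum_{i, j = 1}^{l} τ_{i} τ_{j} z_{i}^{T} z_{j}.
\end{aligned}
```

Maximizing the resulting Lagrangian over τ is the same as minimizing its negative, which is why the quadratic term enters the dual problem with a positive sign.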

Since the definition of the Frobenius norm gives $trace (Z_{i}^{T} Z_{i}) = ∥ Z_{i} ∥^{2}$ , we can get the dual problem: $\begin{matrix} \min_{τ} & \frac{1}{λ_{1}} \sum_{i, j = 1}^{l} τ_{i} τ_{j} z_{i}^{T} z_{j} - \sum_{i = 1}^{l} τ_{i} ∥ Z_{i} ∥^{2} \\ s . t . & \sum_{i = 1}^{l} τ_{i} = 1, \\ & 0 \leq τ_{i} \leq \frac{1}{ν l}, i = 1, \dots, l \end{matrix}$ (25)

Formulation (25) has the same form as the standard dual problem (10) of SVDD, so it can be solved easily. Solving (25) determines the Lagrange multipliers τ_i^*, and then we can get $ϑ = \frac{1}{λ_{1}} \sum_{i = 1}^{l} {τ_{i}}^{*} z_{i}$ .

Similarly, let λ₂ = ∥ϑ∥² and $\hat{z}_{i} = Z_{i} ϑ$ ; then we can calculate μ by solving the dual quadratic program: $\begin{matrix} \min_{\hat{τ}} & \frac{1}{λ_{2}} \sum_{i, j = 1}^{l} \hat{τ}_{i} \hat{τ}_{j} \hat{z}_{i}^{T} \hat{z}_{j} - \sum_{i = 1}^{l} \hat{τ}_{i} ∥ Z_{i} ∥^{2} \\ s . t . & \sum_{i = 1}^{l} \hat{τ}_{i} = 1, \\ & 0 \leq \hat{τ}_{i} \leq \frac{1}{ν l}, i = 1, \dots, l \end{matrix}$ (26)

And we get a new $μ = \frac{1}{λ_{2}} \sum_{i = 1}^{l} \hat{τ}_{i}^{*} \hat{z}_{i}$ . Thus, μ and ϑ can be obtained by iteratively solving formulations (25) and (26).

Finally, we can get the optimal boundary as follows: $f (Z) = sign (r^{2} - ∥ Z - μ ϑ^{T} ∥^{2})$ (27) with

$r^{2} = mean (∥ Z_{i_{st}} - μ ϑ^{T} ∥^{2})$ (28) where Z_{i_st} denotes a support tensor, i.e. a training sample with non-zero τ_i^*, and mean (·) averages the squared distances between the support tensors and the center of the hypersphere.

3.2 LSTDD with higher order tensors

The inputs of LSTDD described above are 2nd-order tensors, namely matrices. However, we often encounter higher order tensors, for example color images and gait contour sequences, which are represented as third order tensors. Therefore, it is necessary to extend LSTDD to tensors of order higher than 2.

Denote by $\{S_{i}\}_{i = 1}^{l}$ the training set, where $S_{i} \in ℝ^{n_{1} \times \dots \times n_{k}}$ is a k-th order tensor. In the k-th order tensor space, the center C of the hypersphere is represented by the rank-1 tensor μ₁ ⊗ ⋯ ⊗ μ_k $(μ_{1} \in ℝ^{n_{1}}, \dots, μ_{k} \in ℝ^{n_{k}})$ . In the higher order tensor space, LSTDD solves the following optimization problem: $\begin{matrix} \min_{μ_{1}, \dots, μ_{k}, r, η} & r^{2} + \frac{1}{ν l} \sum_{i = 1}^{l} η_{i} \\ s . t . & ∥ S_{i} - μ_{1} \otimes \dots \otimes μ_{k} ∥^{2} \leq r^{2} + η_{i}, \\ & η_{i} \geq 0, i = 1, \dots, l \end{matrix}$ (29)

The corresponding decision function is given as follows:

$\begin{matrix} f (S) = sign (r^{2} - ∥ S - μ_{1} \otimes \dots \otimes μ_{k} ∥^{2}), \\ μ_{1} \in ℝ^{n_{1}}, \dots, μ_{k} \in ℝ^{n_{k}} \end{matrix}$ (30)

We use the alternating projection algorithm to optimize the variables μ₁, ⋯, μ_k. First, we fix μ₂, ⋯, μ_k, and let λ₂ = ∥μ₂∥², ⋯, λ_k = ∥μ_k∥². Using the mode-d product of a tensor and a vector, we define $s_{i} = S_{i} \times_{2} μ_{2} \dots \times_{k} μ_{k}$ (31)

Thus, the dual of mathematical program (29) can be derived as follows: $\begin{matrix} \min_{τ} & \frac{1}{λ_{2} \dots λ_{k}} \sum_{i, j = 1}^{l} τ_{i} τ_{j} s_{i}^{T} s_{j} - \sum_{i = 1}^{l} τ_{i} ∥ S_{i} ∥^{2} \\ s . t . & \sum_{i = 1}^{l} τ_{i} = 1, \\ & 0 \leq τ_{i} \leq \frac{1}{ν l}, i = 1, \dots, l \end{matrix}$ (32)

Noticing that (32) is a convex quadratic program, we can solve it with classical optimization algorithms, and μ₁ is then recovered as $μ_{1} = \frac{1}{λ_{2} \dots λ_{k}} \sum_{i = 1}^{l} τ_{i} s_{i}$ , in analogy with the 2nd-order case. Once μ₁ is obtained, we fix μ₁, μ₃, ⋯, μ_k to calculate μ₂. All the μ_i can be calculated in this iterative way.

Table 1

Description of 12 vector datasets

Dataset	#Sample	#Feature	n₁ × n₂	Source	Targ. Class	#Targ. Sample
BREAST-CANCER	683	11	3×4	UCI	2	444
ABALONE	4177	10	3×4	UCI	1	2770
HEART	297	13	4×4	UCI	1	160
HEPATITIS	155	19	4×5	OC	2	123
IMPORTS	159	25	5×5	OC	1	88
IONOSPHERE	351	34	6×6	OC	1	225
SPECTF	349	44	7×7	OC	2	95
DELFTPUMP	240	64	8×8	OC	2	64
USPS	7291	256	16×16	UCI	2	1005
COLON	62	1908	44 × 44	OC	1	40
METAS	145	4919	70 × 71	OC	2	99
LEU	72	7130	84 × 85	UCI	1	47

3.3 The general algorithm

Next, we present the general algorithm of LSTDD.

Input: The training points $S_{i} \in ℝ^{n_{1} \times \dots \times n_{k}} (i = 1, \dots, l)$ , testing point $S \in ℝ^{n_{1} \times \dots \times n_{k}}$ , parameter ν, the maximum iterative number N, and the tolerance θ.

Output: The optimal parameters $μ_{i} \in ℝ^{n_{i}}, i = 1, \dots, k$ , r², and the class label of testing point.

Step 1 Initialization: set μ_i = (1, …, 1) ^T, i = 1, …, k;

Step 2 For n = 1, 2, …, k, run Steps 3–5 to calculate μ_n;

Step 3 Calculate $s_{i}^{n} = S_{i} \times_{1} μ_{1} \dots \times_{n - 1} μ_{n - 1} \times_{n + 1} μ_{n + 1} \dots \times_{k} μ_{k}$ ;

Step 4 Obtain τⁿ by solving $\begin{matrix} \min_{τ^{n}} & \frac{1}{\prod_{1 \leq i \leq k, i \neq n} λ_{i}} \sum_{i, j = 1}^{l} τ_{i}^{n} τ_{j}^{n} {s_{i}^{n}}^{T} s_{j}^{n} - \sum_{i = 1}^{l} τ_{i}^{n} ∥ S_{i} ∥^{2} \\ s . t . & \sum_{i = 1}^{l} τ_{i}^{n} = 1, \\ & 0 \leq τ_{i}^{n} \leq \frac{1}{ν l}, i = 1, \dots, l \end{matrix}$ (33) with λ_i = ∥μ_i∥², i = 1, …, k;

Step 5 Calculate $μ_{n} = \frac{1}{\prod_{1 \leq i \leq k, i \neq n} λ_{i}} \sum_{i = 1}^{l} τ_{i}^{n} s_{i}^{n}$ ;

Step 6 Repeat Steps 2–5 until convergence, i.e. until the number of iterations exceeds N or $∥ {μ_{n}}_{new} - {μ_{n}}_{old} ∥ \leq θ, n = 1, \dots, k;$ (34)

Step 7 Calculate the label of the test point S: $f (S) = sign (r^{2} - ∥ S - μ_{1} \otimes \dots \otimes μ_{k} ∥^{2})$ (35) with r² = mean (∥ S_{i_st} - μ₁ ⊗ ⋯ ⊗ μ_k ∥²);

Step 8 End.
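Steps 1 to 7 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the quadratic program (33) is solved approximately by projected gradient descent (an assumed solver), all function names are hypothetical, and the demo data are synthetic. The sketch also assumes that no μ_n collapses to the zero vector, since Step 5 divides by the product of the λ_i.

```python
import numpy as np


def project_capped_simplex(v, cap, iters=40):
    """Projection onto {t : sum(t) = 1, 0 <= t_i <= cap}, by bisection."""
    lo, hi = v.min() - cap, v.max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.clip(v - mid, 0.0, cap).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    return np.clip(v - 0.5 * (lo + hi), 0.0, cap)


def solve_dual(G, d, scale, cap, iters=200):
    """Approximate minimizer of tau^T G tau / scale - tau^T d, as in (33)."""
    tau = np.full(len(d), 1.0 / len(d))
    step = scale / (2.0 * np.linalg.eigvalsh(G)[-1] + 1e-12)
    for _ in range(iters):
        tau = project_capped_simplex(tau - step * (2.0 * G @ tau / scale - d), cap)
    return tau


def contract_all_but(S, mus, n):
    """Step 3: mode products of S with every mu_m, m != n (0-based n)."""
    out = S
    for m in reversed(range(len(mus))):      # highest mode first keeps axis numbers valid
        if m != n:
            out = np.tensordot(out, mus[m], axes=([m], [0]))
    return out


def rank1(mus):
    """Rank-1 tensor mu_1 (x) ... (x) mu_k."""
    out = mus[0]
    for mu in mus[1:]:
        out = np.tensordot(out, mu, axes=0)
    return out


def lstdd_fit(S, nu, max_iter=50, theta=1e-4, tol=1e-8):
    """S has shape (l, n_1, ..., n_k). Returns the list of mu_n and r^2."""
    l, k = S.shape[0], S.ndim - 1
    cap = 1.0 / (nu * l)
    d = np.sum(S.reshape(l, -1) ** 2, axis=1)               # ||S_i||^2
    mus = [np.ones(S.shape[m + 1]) for m in range(k)]       # Step 1
    for _ in range(max_iter):                               # Step 6
        old = list(mus)
        for n in range(k):                                  # Step 2
            s = np.stack([contract_all_but(S[i], mus, n) for i in range(l)])  # Step 3
            scale = np.prod([mus[m] @ mus[m] for m in range(k) if m != n])
            tau = solve_dual(s @ s.T, d, scale, cap)        # Step 4
            mus[n] = tau @ s / scale                        # Step 5
        if all(np.linalg.norm(a - b) <= theta for a, b in zip(mus, old)):
            break
    dist2 = np.sum((S - rank1(mus)).reshape(l, -1) ** 2, axis=1)
    r2 = dist2[tau > tol].mean()                            # mean over support tensors
    return mus, r2


def lstdd_predict(S_new, mus, r2):
    """Step 7: +1 inside the hypersphere, -1 outside."""
    return np.sign(r2 - np.sum((S_new - rank1(mus)) ** 2))


# Synthetic demo: noisy copies of a rank-1 third-order tensor of size 3 x 4 x 2.
rng = np.random.default_rng(0)
base = rank1([np.array([1.0, 2.0, 3.0]), np.array([1.0, 1.0, 2.0, 2.0]), np.array([1.0, 2.0])])
S_train = base + 0.05 * rng.standard_normal((8, 3, 4, 2))
mus, r2 = lstdd_fit(S_train, nu=0.5, max_iter=10)
```

For k = 1 the product over the other modes is empty, so the same routine reduces to the vector SVDD of Section 2.3.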

Suppose there are l training samples with n features; the computational complexity of SVDD is then O (l²n). The computational complexity of LSTDD in the k-th order tensor space is O (l² (n₁ + ⋯ + n_k)), where n₁ × ⋯ × n_k ≈ n. Clearly, compared with SVDD, LSTDD has lower computational complexity, although its training time also depends on the number of iterations. In the following experiments, we set the maximum number of iterations N = 50 and the tolerance θ = 10⁻⁴.
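To make the two complexity expressions concrete, the sketch below evaluates their leading terms for the LEU dataset of Table 1 (7130 features, reshaped to 84 × 85) with l = 10 training samples, the training size used in Section 4.2. Constants are ignored and the function names are hypothetical, so the numbers only indicate relative scale.

```python
def svdd_cost(l: int, n: int) -> int:
    """Leading term l^2 * n of the SVDD complexity (constants ignored)."""
    return l * l * n


def lstdd_cost(l: int, dims: tuple) -> int:
    """Leading term l^2 * (n_1 + ... + n_k) of the LSTDD complexity (constants ignored)."""
    return l * l * sum(dims)


l = 10
print(svdd_cost(l, 7130))         # 713000
print(lstdd_cost(l, (84, 85)))    # 16900
```

Because LSTDD repeats its pass up to N times, the overall training time can still exceed that of SVDD.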

4 Experiments on vector datasets

4.1 Experiment setup

The experiments are performed on 12 publicly available datasets from the UCI repository on the LIBSVM webpage [30] and David Tax’s homepage [31]. Table 1 gives the description of these datasets. All the features of these datasets were scaled to [-1, 1]. For one-class classification problems, we concentrate on the target classes; Table 1 gives the target classes and the number of target samples. A vector $x \in ℝ^{n}$ is transformed into a 2nd-order tensor $Z \in ℝ^{n_{1} \times n_{2}}$ as follows: the first column of the tensor is filled with the first n₁ elements of the vector, the second column with the next n₁ elements, and so on. If n < n₁ × n₂, the remaining entries are filled with 0. Following the idea in [1, 13], the proper 2nd-order tensor sizes n₁, n₂ are also listed in Table 1.
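The column-wise filling with zero padding described above corresponds to a Fortran-order reshape in NumPy. The following is a small sketch of that conversion; the function name is hypothetical, and the 10-feature, 3 × 4 example matches the sizes listed for ABALONE in Table 1.

```python
import numpy as np


def vector_to_matrix(x, n1, n2):
    """Fill an n1 x n2 matrix column by column with x, padding the tail with zeros."""
    x = np.asarray(x, dtype=float)
    if x.size > n1 * n2:
        raise ValueError("vector does not fit into an n1 x n2 matrix")
    padded = np.zeros(n1 * n2)
    padded[: x.size] = x
    return padded.reshape((n1, n2), order="F")   # order="F" fills columns first


Z = vector_to_matrix(np.arange(1, 11), 3, 4)
print(Z)
```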

As we know, it is difficult to deal with high dimensional, small sample size classification problems. To test the effectiveness of LSTDD, small training sizes are chosen as in [1, 13]. In one-class classification problems, the training sets are formed from target samples, and the True Positive Rate (TPR) is used as the performance evaluation criterion. We employ 5-fold cross validation to optimize the parameter. The optimal parameter ν is chosen from {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. For statistical significance, the averaged results over fifty random splits of the target class are reported. All the algorithms are implemented in MATLAB R2011b on a PC with an Intel Core i3 (2.4 GHz) and 6 GB of RAM.

4.2 Experiment results and analysis

We compare LSTDD with the standard SVDD on all datasets in Table 1. The training set consists of 10 samples, and Table 2 reports the averaged results. As can be seen, the TPRs of LSTDD are higher than those of SVDD on all 12 datasets. Averaged over the 12 datasets, the TPR is 0.7403 for LSTDD and 0.6222 for SVDD. In particular, COLON, METAS, and LEU are truly small sample size datasets with high dimensionality, and on them LSTDD obtains far better TPRs than SVDD. This shows that, although vector datasets can be trained directly by a vector-based algorithm, much better performance can be achieved by the tensor-based algorithm when they are trained in the form of tensors.

Table 2
Averaged TPRs on various datasets

Dataset	Targ. class	LSTDD	SVDD
BREAST-CANCER	2	0.6892	0.6135
ABALONE	1	0.5989	0.5691
HEART	1	0.6550	0.5377
HEPATITIS	2	0.6187	0.4665
IMPORTS	1	0.6778	0.5722
IONOSPHERE	1	0.7233	0.6503
SPECTF	2	0.6935	0.5112
DELFTPUMP	2	0.6467	0.3922
USPS	2	0.7877	0.7231
COLON	1	0.9000	0.7250
METAS	2	0.9478	0.8696
LEU	1	0.9455	0.8364

4.3 Parameter analysis

The performance of LSTDD and SVDD with special reference to parameter ν is discussed in this subsection. In the standard SVDD, parameter ν can control the trade-off between the volume and the errors. Noticing that the training sets are pretty small, it is more meaningful to find the regularity corresponding with ν in the tensor space rather than to validate how ν can be used to control the above trade-off.

Three datasets in Table 1, namely COLON, METAS and LEU, are truly high dimensional, small sample size datasets, so it makes sense to also convert their samples to higher order tensors. Thus, based on the idea of [1], the third order tensor sizes of the three datasets are selected as 12 × 13 × 13, 17 × 17 × 18, and 19 × 19 × 20, respectively.

Fig.1

The comparisons of 2nd-order LSTDD, 3rd-order LSTDD and SVDD on averaged TPRs with different values of ν.

We train LSTDD and SVDD with different choices of the parameter ν ∈ {0.1, 0.2, ⋯, 0.9}, and evaluate LSTDD with both 2nd-order tensors and 3rd-order tensors. We denote these as 2nd-order LSTDD and 3rd-order LSTDD, respectively. To better understand the experimental results, we illustrate the averaged TPRs of the three algorithms for each dataset in Fig. 1. We can see that in all the six target classes, the TPRs of 2nd-order LSTDD are much better than those of SVDD, especially when ν = 0.1. Meanwhile, Fig. 1 shows that the TPRs of 2nd-order LSTDD and SVDD depend strongly on the parameter ν: they tend to decrease as ν increases. For 2nd-order LSTDD the decrease is more obvious, and the best TPR appears at ν = 0.1 in all experiments. We conclude that the parameter ν can control the volume of the hypersphere that contains most of the target samples in the 2nd-order tensor space.

In contrast, the TPR of 3rd-order LSTDD does not differ significantly across the values of ν. In addition, the performance of 3rd-order LSTDD is worse than that of SVDD and 2nd-order LSTDD, so raising the tensor order does not improve the accuracy of the algorithm. Unlike real tensor-based datasets, vector-based datasets retain no structural information. Although vector-based samples can be converted to higher order tensors, a 2nd-order tensor is good enough for converting high dimensional, small sample size vector-based datasets to a tensor representation.

Table 3

Averaged TPRs on 40 target classes in ORL dataset

Targ.cls	LSTDD	SVDD	Targ.cls	LSTDD	SVDD	Targ.cls	LSTDD	SVDD
Face01	0.5	0.3	Face15	0.7	0.4	Face29	0.8	0.4
Face02	0.7	0.4	Face16	0.4	0.4	Face30	0.8	0.7
Face03	0.7	0.6	Face17	0.6	0.5	Face31	0.8	0.5
Face04	0.5	0.5	Face18	0.6	0.5	Face32	0.6	0.3
Face05	0.8	0.5	Face19	0.7	0.7	Face33	0.8	0.8
Face06	0.6	0.4	Face20	0.6	0.5	Face34	0.8	0.7
Face07	0.5	0.5	Face21	0.7	0.4	Face35	0.5	0.4
Face08	0.8	0.4	Face22	0.8	0.5	Face36	0.6	0.3
Face09	0.6	0.4	Face23	0.6	0.3	Face37	0.6	0.5
Face10	0.8	0.5	Face24	0.8	0.6	Face38	0.6	0.5
Face11	0.7	0.7	Face25	0.6	0.4	Face39	0.5	0.1
Face12	0.8	0.5	Face26	0.7	0.4	Face40	0.5	0.5
Face13	0.6	0.4	Face27	0.8	0.7
Face14	0.5	0.3	Face28	0.6	0.4

Table 4

Averaged TPRs on 15 target classes in YALE dataset

Targ. cls	LSTDD	SVDD	Targ.cls	LSTDD	SVDD
Face01	0.8667	0.7333	Face09	0.7333	0.6667
Face02	0.7333	0.6667	Face10	0.9333	0.5333
Face03	0.7333	0.8000	Face11	0.7333	0.7333
Face04	0.8667	0.7333	Face12	0.8667	0.8000
Face05	0.8667	0.8667	Face13	0.7333	0.8667
Face06	0.8667	0.8667	Face14	0.8000	0.8667
Face07	0.7333	0.4667	Face15	0.4000	0.5333
Face08	0.8000	0.7333

5 Experiments on tensor datasets

In this section, we focus on real-world tensor datasets: human face datasets and gait silhouette sequence datasets. The human face datasets are the 2nd-order tensor datasets, and gait silhouette sequence datasets are the 3rd-order tensor datasets. The 2nd-order and 3rd-order tensor datasets are the current major datasets for the experiments of tensor-based machine learning algorithms.

5.1 Face recognition

We consider two 2nd-order tensor datasets: the ORL [32] and the YALE [33] datasets. The ORL dataset includes face images of 40 individuals, with 10 different images for each. Each image is 28 × 23 with 256 grayscale levels per pixel. The YALE dataset includes images of fifteen persons, with eleven images of size 100 × 100 for each. All the features have been scaled to [0, 1], and we do not carry out cropping or resizing to reduce the number of features of the images.

Each person’s face images are considered as a target class. ORL has 40 target classes, and each target class has 10 samples. YALE has 15 target classes, and each target class has 11 samples. We adopt the 5-fold cross validation for optimizing parameters, and report the averaged results. Tables 3 and 4 have summarized the averaged results of TPRs on all target classes of these two datasets, respectively.

As can be seen in Table 3, the TPRs of LSTDD are much better than those of SVDD on the target classes of the ORL dataset. Specifically, the TPRs of LSTDD are better than those of SVDD in 33 out of 40 comparisons and equal in the remaining 7. Averaged over the 40 experiments, the TPR is 0.655 for LSTDD and 0.47 for SVDD.

LSTDD also performs well on the YALE dataset. Table 4 shows that the TPRs of LSTDD are better than those of SVDD in 8 out of 15 comparisons and equal in 3. In the remaining 4 comparisons, the TPRs of LSTDD are not as good as those of SVDD. Averaged over the 15 experiments, the TPR is 0.7778 for LSTDD and 0.7244 for SVDD.

In brief, the tensor-based classifier LSTDD improves the identification of the target class in comparison with SVDD.
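The YALE comparison counts and averages can be recomputed from the entries of Table 4. The short script below does so, with the values copied in the order Face01 to Face15; the variable names are only illustrative.

```python
# Averaged TPRs from Table 4, Face01 .. Face15.
lstdd = [0.8667, 0.7333, 0.7333, 0.8667, 0.8667, 0.8667, 0.7333, 0.8000,
         0.7333, 0.9333, 0.7333, 0.8667, 0.7333, 0.8000, 0.4000]
svdd = [0.7333, 0.6667, 0.8000, 0.7333, 0.8667, 0.8667, 0.4667, 0.7333,
        0.6667, 0.5333, 0.7333, 0.8000, 0.8667, 0.8667, 0.5333]

wins = sum(a > b for a, b in zip(lstdd, svdd))
ties = sum(a == b for a, b in zip(lstdd, svdd))
losses = sum(a < b for a, b in zip(lstdd, svdd))
mean_lstdd = sum(lstdd) / len(lstdd)
mean_svdd = sum(svdd) / len(svdd)

print(wins, ties, losses)                            # 8 3 4
print(round(mean_lstdd, 4), round(mean_svdd, 4))     # 0.7778 0.7244
```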

5.2 Gait recognition

The aim of gait recognition is to identify people by analyzing gait patterns extracted from video. In this subsection, we consider three gait recognition datasets, USFGait17_32×22×10, USFGait17_64×44×20, and USFGait17_128×88×20, which are represented as third order tensors. They all come from the USF HumanID Gait Challenge dataset version 1.7 [34].

Table 5
Averaged TPRs on 3 gait recognition datasets

Targ.cls	USFGait17_32×22×10		USFGait17_64×44×20		USFGait17_128×88×20	
	LSTDD	SVDD	LSTDD	SVDD	LSTDD	SVDD
3	0.6467	0.4074	0.6467	0.4074	0.6467	0.4074
9	0.7032	0.4662	0.6337	0.4607	0.6865	0.5023
11	0.6955	0.4389	0.6102	0.4943	0.6117	0.4954
16	0.6334	0.5002	0.6258	0.2919	0.6258	0.2918
20	0.7547	0.5291	0.6880	0.3991	0.7176	0.4086
45	0.5250	0.1902	0.5195	0.1946	0.5195	0.1900
64	0.6586	0.2091	0.4924	0.2091	0.4924	0.2091
66	0.7961	0.2558	0.7498	0.2558	0.7498	0.2975
68	0.7027	0.5349	0.6458	0.5071	0.6458	0.5499
71	0.6315	0.1150	0.6312	0.1982	0.6312	0.1984

There are 71 sequences in each dataset, and each sequence comes from one person walking along an elliptical path in front of the camera. In our experiments, for simplicity, we randomly choose 10 sequences from each dataset as target classes, since there are too many target classes to use them all. Table 5 summarizes the averaged results of 5-fold cross validation on each dataset. As can be seen in Table 5, the averaged TPRs of LSTDD are markedly higher on every considered target class of all three gait datasets. Averaged over the 10 experiments, the TPRs of LSTDD are 0.6747, 0.6243 and 0.6327, compared with 0.3647, 0.3418 and 0.3550 for SVDD, on the three datasets respectively. We conclude that LSTDD outperforms SVDD on the recognition of the target classes.

6 Conclusions

In this paper, a new data domain description with the tensor input has been proposed, which is termed as Linear Support Tensor Domain Description. We first describe LSTDD based on the second order tensor space, and then extend it into the higher order tensor space. The experiments on both the second order datasets and the higher order tensor datasets show that LSTDD has good generalization capability.

As we know, in a wide spectrum of real world problems, such as handwriting detection, image database retrieval, and face recognition, we often encounter one-class classification problems. The original objects of these fields can be represented as multidimensional arrays, namely tensors. The tensor-based one classification algorithms like LSTDD have potential on the practical applications. In addition, there are some extensions of SVDD, such as Frequency-based SVDD and Write-Related SVDD [35]. Although LSTDD is developed based on SVDD, the idea of constructing the algorithm can provide for reference in establishment of other tensor-based one-class classification algorithms.

However, the proposed method has a drawback: LSTDD costs more training time because its optimization problems are solved iteratively. Finding efficient methods to solve the corresponding optimization problem in LSTDD is left for further research. Moreover, the proposed LSTDD is a linear algorithm, and more research is needed to develop a nonlinear algorithm for higher order tensors.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 11171346, 11626186), the New Start Academic Research Projects of Beijing Union University (No. Zk10201513), and the Xi’an Shiyou University Youth Science and Technology Innovation Fund Project (No. 2016BS17).

References

1. Cai, He and J. Han, Learning with tensor representation, Technical report, UIUCDCS, Department of Computer Science, University of Illinois at Urbana-Champaign (2006), R2716.

2. Cai, He, Wen J.R., Han and W.Y., Support tensor machines for text categorization, Technical report, UIUCDCS-R, Department of Computer Science, University of Illinois at Urbana-Champaign (2006), 2714.

3. Maybank S.J., Hu, Li and D. Tao, Supervised tensor learning, IEEE International Conference on Data Mining, IEEE Computer Society 13 (2005), 450–457.

4. Tao, Li, Wu and S.J. Maybank, Supervised tensor learning, Knowledge and Information Systems 13(1) (2007), 1–42.

5. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, USA (1995).

6. Muller K.R., Mika, Ratsch, Tsuda and B. Schölkopf, An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks 12(2) (2001), 181–201.

7. Zhang, Gao and Y. Wang, Twin support tensor machines for MCS detection, Journal of Electronics (China) 26(3) (2009), 318–325.

8. Khemchandani, Karpatne and S. Chandra, Proximal support tensor machines, International Journal of Machine Learning and Cybernetics 4 (2013), 703–712.

9. Liu and Y. Zhuang, Tensor-based transductive learning for multimodality video semantic concept detection, IEEE Transactions on Multimedia 11(5) (2009), 868–878.

10. Liu, Guo, He and X. Yang, A low-rank approximation-based transductive support tensor machine for semisupervised classification, IEEE Transactions on Image Processing 24(6) (2015), 1825–1838.

11. Kotsia, Guo and I. Patras, Higher rank support tensor machines for visual recognition, Pattern Recognition 45(12) (2012), 4192–4203.

12. Hao, He, Chen and X. Yang, A linear support higher-order tensor machine for classification, IEEE Transactions on Image Processing 22(7) (2013), 2911–2920.

13. Chen Y.Y., Wang K.N. and P. Zhong, One-class support tensor machine, Knowledge-Based Systems 96 (2016), 14–28.

14. Chen Y.Y. and P. Zhong, Linear one-class support tensor machine, International Journal of Signal Processing, Image Processing and Pattern Recognition 9(9) (2016), 379–388.

15. Chen Y.Y., Lu L.Y. and P. Zhong, One class support higher order tensor machine classifier, Applied Intelligence 7 (2017), 1–9.

16. Khan S.S. and M.G. Madden, One-class classification: Taxonomy of study and review of techniques, The Knowledge Engineering Review 29(3) (2014), 345–374.

17. Tax D.M.J. and R.P.W. Duin, Support vector domain description, Machine Learning 54(1) (2004), 45–66.

18. Banerjee, Burlina and C. Diehl, A support vector method for anomaly detection in hyperspectral imagery, IEEE Transactions on Geoscience and Remote Sensing 44(8) (2006), 2282–2291.

19. Park, Kang, Kim, Kwok J.T. and I.W. Tsang, SVDD-based pattern denoising, Neural Computation 19 (2007), 1919–1938.

20. H.G., Wang and X.B. Huang, Fabric defect detection based on multiple fractal features and support vector data description, Engineering Applications of Artificial Intelligence 22 (2009), 224–235.

21. Zhao, Wang and F. Xiao, Pattern recognition-based chillers fault detection method using support vector data description (SVDD), Applied Energy 112(4) (2013), 1041–1048.

22. Lee S.W. and J. Park, Low resolution face recognition based on support vector data description, Pattern Recognition 39(9) (2006), 1809–1812.

23. Seo and H., Face detection using support vector domain description in color images, IEEE International Conference on Acoustics, Speech, and Signal Processing 5 (2007), 729–732.

24. Lai, Tax D.M.J., Duin R.P.W., Pekalska and P. Paclik, A study on combining image representations for image classification and retrieval, International Journal of Pattern Recognition and Artificial Intelligence 18(5) (2004), 867–890.

25. Sjostrand, Hansen M.S., Larsson H.B. and R. Larsen, A path algorithm for the support vector domain description and its application to medical imaging, Medical Image Analysis 11 (2007), 417–428.

26. Liu, Xiao, Cao, Hao and F. Deng, SVDD-based outlier detection on uncertain data, Knowledge and Information Systems 34(3) (2013), 597–618.

27. Huang, Chen, Zhou, Yin and K. Guo, Two-class support vector data description, Pattern Recognition 44 (2011), 320–329.

28. Zhu and P. Zhong, A new one-class SVM based on hidden information, Knowledge-Based Systems 60 (2014), 35–43.

29. Zhu and P. Zhong, Minimum Class Variance SVM+ for data classification, Advances in Data Analysis and Classification 11 (2017), 79–96.

30. Chang and Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2(27) (2011), 1–27.

31. Tax D.M.J., Pattern Recognition Laboratory, http://prlab.tudelft.nl/users/david-tax.

32. AT&T Labs Cambridge, The Olivetti and Oracle Research Laboratory database of faces, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html (1994).

33. http://vision.ucsd.edu/content/yale-face-database.

34. Plataniotis K.N. and A.N. Venetsanopoulos, MPCA: Multilinear principal component analysis of tensor objects, IEEE Transactions on Neural Networks 19(1) (2008), 18–39.

35. Luo, Ding, Pan, Ni and G., Research on cost-sensitive learning in one-class anomaly detection algorithms, in: Autonomic and Trusted Computing, Lecture Notes in Computer Science 4610, Springer (2007), 259–268.