Dimensionality reduction of tensor data based on local linear embedding and mode product

Abstract

This paper makes three contributions. (1) A tensor version of the local linear embedding (LLE) algorithm is derived and presented. LLE is one of the best-known manifold learning algorithms, and improvements to it have appeared continuously since its proposal. However, these improvements apply only to vector data, not to tensor data. The proposed tensor LLE can also serve as a bridge for transferring these improvements from vector data to tensor data. (2) A framework for tensor dimensionality reduction based on the tensor mode product is proposed, in which the mode matrices can be determined according to specific criteria. (3) A novel dimensionality reduction algorithm for tensor data based on LLE and mode product (LLEMP-TDR) is proposed, in which LLE is used as the criterion to determine the mode matrices. Benefiting from local LLE and global mode product, the proposed LLEMP-TDR can preserve both local and global features of high-dimensional tensor data during dimensionality reduction. Experimental results on data clustering and classification tasks demonstrate that our method performs better than five related algorithms recently published in top academic journals.

Keywords

Tensor decomposition; local linear embedding; mode product; dimensionality reduction

1 Introduction

In the current era of big data, learning from high-dimensional data suffers from the so-called curse of dimensionality, which has become a serious problem for machine learning. For example, text analysis and feature extraction are based on high-dimensional data. However, most current algorithms handle multidimensional data by first converting them to vectors and then applying algorithms designed for vector data. This vectorization discards the multi-aspect structure that the multidimensional description was meant to capture. Therefore, this paper proposes a dimensionality reduction method that takes into account the distinct features of each dimension of multidimensional data.

Tensor algebra is a powerful mathematical tool. Tensor models can handle multidimensional data with index sets. Compared with matrix models, they are therefore closer to the structure of real problems and can describe them better. However, most current machine learning algorithms are not designed for tensors, so research on tensor methods is an urgent task. Several tensor decomposition models have been widely used, such as Canonical Polyadic (CP) decomposition [1] and Tucker decomposition [2, 3]. Depending on the application, adding different constraints to these two basic forms yields a variety of tensor decompositions, for example non-negative tensor factorization applied to hyperspectral separation [4, 5], optimization algorithms based on non-negative tensor decomposition [6], non-negative Tucker factorization [7, 8], orthonormal Tucker factorization, and higher-order singular value decomposition (HOSVD) [9]. Recently, tensor decomposition has developed rapidly and many new methods have emerged. Sun et al. [10] exploited orthogonal and non-negative constraints to propose a Tucker decomposition form based on polynomial manifold tensor clustering. Li et al. [11] introduced manifold regularization non-negative Tucker decomposition (MR-NTD) in 2017, which incorporates manifold regularization into the core tensor to maintain the nearest-neighbor relationships of the data points. To achieve a more robust and regular result, Jiang et al. [12] proposed graph-Laplacian tensor decomposition (GLTD), which simultaneously considers the attributes of the graph and the pairwise similarity information.

Local linear embedding (LLE), a classic manifold learning algorithm, preserves the local linear structure of data and has long been an active research topic. In recent years, improved algorithms based on LLE have emerged continuously and are widely used in fields such as data mining, image processing, pattern recognition, and error detection. The works closest to this paper, such as the locality preserving projections algorithm for hyperspectral image dimensionality reduction (LPP-DR) proposed by Wang et al. [13] and the image representation method using Laplacian regularized non-negative tensor factorization (LRNTF) proposed by Wang et al. [14], are applied to vector data. This paper proposes an extension of LLE to the tensor case, that is, it combines LLE with tensor decomposition to reduce dimensionality.

This paper uses local linear embedding, a classic manifold learning algorithm, to preserve the local linear structure of the data. Moreover, drawing on tensor decomposition, we seek a set of matrices that form a projection subspace, so that the mode product of the high-dimensional tensor with these projection matrices reduces the dimensions of the tensor globally. In other words, this paper develops a dimensionality reduction method for tensor data based on local linear embedding and mode product. The paper makes three contributions, as follows:

(1) Local linear embedding (LLE) is one of the best-known manifold learning algorithms for dimensionality reduction. Since LLE was proposed, improvements to it have appeared continuously. However, they all apply to vector data and cannot be applied directly to tensor data. Therefore, this paper develops a tensor version of LLE, which also provides a bridge for transferring these improvements from vector data to tensor data.

(2) Inspired by the fact that the mode product of a tensor and a matrix can change the size of any single mode of the tensor, this paper proposes a dimensionality reduction scheme for tensor data, in which the dimension-reduced tensor is the mode product of the high-dimensional tensor with a set of matrices. As long as the sizes of these matrices meet the dimensionality reduction requirements, they can be used to carry out the reduction. The elements of the matrices are determined according to specific criteria.

(3) Finally, a new dimensionality reduction algorithm for tensor data combining LLE and mode product, called LLEMP-TDR for short, is proposed, in which LLE is used as the criterion to determine the elements of the matrices in the mode product. To the best of our knowledge, no similar algorithm has been reported. Since LLE is a locality-preserving algorithm while the mode product is a global-preserving algorithm, LLEMP-TDR can maintain both local and global features during dimensionality reduction.

The structure of the rest of this paper is as follows: In Section 2, we will introduce some preliminary knowledge. The latest works related to ours are introduced in Section 3. Section 4 mainly includes the derivation, solution, complexity analysis of LLEMP-TDR and comparison with contrast algorithms. In Section 5, we use MATLAB for implementation and verify the performance of LLEMP-TDR through sufficient experiments. Finally, the conclusion of this paper is presented in Section 6.

2 Preliminaries

2.1 Symbols

In this paper, a lowercase letter x denotes a scalar, a bold lowercase letter x a vector, a bold uppercase letter X a matrix, and a calligraphic letter $X$ a higher-order tensor. Element-wise division and the Hadamard product (element-wise multiplication, in which corresponding elements of two matrices of the same size are multiplied) are denoted by / and .* respectively. The transpose, trace, inverse and pseudo-inverse of a matrix are denoted by X ^T, tr ( X ), X ^-1, and X ^† respectively. The basic operators are shown in Table 1.

Table 1
Basic operators

Symbols	Description
$X$ , X , x , x	Tensor, matrix, vector, scalar
I_n, n = 1, ⋯ , N - 1	Size of mode n of the original tensor
J_n, n = 1, ⋯ , N - 1	Size of mode n of the core tensor
M, N	Number of samples, order of the tensor
$X^{(n)} \in R^{I_{n} \times (I_{1} I_{2} \dots I_{n - 1} I_{n + 1} \dots I_{N - 1} M)}$	Mode-n unfolding of tensor $X \in R^{I_{1} \times I_{2} \times \dots \times I_{N - 1} \times M}$
$X^{(N)} \in R^{M \times (I_{1} I_{2} \dots I_{N - 1})}$	Mode-N unfolding of tensor $X \in R^{I_{1} \times I_{2} \times \dots \times I_{N - 1} \times M}$
$G \in R^{J_{1} \times J_{2} \times \dots \times J_{N - 1} \times M}$	Core tensor
$A_{n} \in R^{I_{n} \times J_{n}}$, n = 1, ⋯ , N - 1	Projection matrices
×_n	Mode-n product of a tensor and a matrix
⊗	Kronecker product
.*	Hadamard product (pointwise product)
〈A, B〉	Inner product of A and B
X ^T, X ^-1, X ^†	Transpose, inverse and Moore-Penrose pseudo-inverse

2.2 Tensor operations

In mathematics, a tensor is a multilinear functional. In data processing applications, a tensor is usually represented as a multi-indexed set of real numbers. For example, $X = \{ x_{i_{1} \dots i_{N}} \mid x_{i_{1} \dots i_{N}} \in R, 1 \le i_{n} \le L_{n}, n = 1, \dots, N \} \subseteq R$ , that is, the elements of $X$ are real numbers. Each element $x_{i_{1} \dots i_{N}}$ has N indices $i_{1}, \dots, i_{N}$ , and the range of each index $i_{n}$ is $1 \le i_{n} \le L_{n}$ , n = 1, ⋯ , N. Therefore, the number of real numbers contained in $X$ is $| X | = \prod_{n = 1}^{N} L_{n}$ .

From the perspective of data processing, an N-order tensor $X$ can represent N-dimensional data, where the size of the n-th mode is L_n. For example, a second-order tensor can represent a digital image, with its two modes corresponding to the rows and columns of the image. In other words, any element $x_{i_{1} i_{2}}$ represents the gray value of the image at row $i_{1}$ and column $i_{2}$ . Therefore, we write a tensor as a multidimensional array: $X \in R^{L_{1} \times \dots \times L_{N}}$ .

Tensors can be represented by matrices [15]. For example, the mode-n unfolding of a tensor $X$ is expressed as: $X^{(n)} = \begin{bmatrix} x_{\underbrace{1 \dots 1}_{n - 1} \, 1 \, \underbrace{1 \dots 1}_{N - n}} & \dots & x_{L_{1} \dots L_{n - 1} \, 1 \, L_{n + 1} \dots L_{N}} \\ \vdots & \ddots & \vdots \\ x_{\underbrace{1 \dots 1}_{n - 1} \, L_{n} \, \underbrace{1 \dots 1}_{N - n}} & \dots & x_{L_{1} \dots L_{n - 1} \, L_{n} \, L_{n + 1} \dots L_{N}} \end{bmatrix} \in R^{L_{n} \times (L_{1} \dots L_{n - 1} L_{n + 1} \dots L_{N})}$ (1) Row $i_{n}$ of $X^{(n)}$ collects all elements whose n-th index equals $i_{n}$ , while the remaining N - 1 indices range over the columns.

Definition (mode product of a tensor and a matrix): Given a tensor $X \in R^{L_{1} \times \dots \times L_{N}}$ and a matrix $A_{n} \in R^{J_{n} \times L_{n}}$ , the mode-n product of $X$ and A_n is another tensor $G \in R^{L_{1} \times \dots \times L_{n - 1} \times J_{n} \times L_{n + 1} \times \dots \times L_{N}}$ whose mode-n unfolding is: $G^{(n)} = A_{n} X^{(n)} \in R^{J_{n} \times (L_{1} \dots L_{n - 1} L_{n + 1} \dots L_{N})}$ (2) The mode-n product of $X$ and A_n is usually expressed as: $G = X \times_{n} A_{n}$ (3) From the definition of the mode product, it can be seen that the mode product has the following properties:

(1) The mode-n product alters the size of the n-th mode of the tensor, from L_n to J_n, and leaves the sizes of the other modes unchanged.

(2) If A_n has full column rank, then the mode-n product gives: $\begin{matrix} G = X \times_{n} A_{n} \Rightarrow G^{(n)} = A_{n} X^{(n)} \\ \Rightarrow X^{(n)} = (A_{n}^{T} A_{n})^{\dagger} A_{n}^{T} G^{(n)} \\ \Rightarrow X = G \times_{n} ((A_{n}^{T} A_{n})^{\dagger} A_{n}^{T}) \end{matrix}$ Here $(A_{n}^{T} A_{n})^{\dagger}$ is the pseudo-inverse of $A_{n}^{T} A_{n}$ . In particular, when $A_{n}^{T} A_{n} = I_{L_{n}}$ , that is, when the column vectors of A_n are orthonormal, we have $X = G \times_{n} ((A_{n}^{T} A_{n})^{\dagger} A_{n}^{T}) = G \times_{n} A_{n}^{T}$ . If span A_n denotes the linear subspace spanned by the column vectors of A_n, then $X^{(n)} = A_{n}^{T} G^{(n)}$ means that each column vector of $X^{(n)}$ is the coordinate vector of the projection of the corresponding column vector of $G^{(n)}$ onto this subspace. Therefore, the matrix A_n is often called a projection matrix.

(3) Associative law: suppose $X \in R^{L_{1} \times \dots \times L_{N}}$ , $A_{n} \in R^{J_{n} \times L_{n}}$ , $B_{n} \in R^{K_{n} \times J_{n}}$ , then: $X \times_{n} A_{n} \times_{n} B_{n} = X \times_{n} (B_{n} A_{n}) \in R^{L_{1} \times \dots \times L_{n - 1} \times K_{n} \times L_{n + 1} \times \dots \times L_{N}}$

(4) Commutative law: suppose $X \in R^{L_{1} \times \dots \times L_{N}}$ , $A_{n} \in R^{J_{n} \times L_{n}}$ , $A_{m} \in R^{J_{m} \times L_{m}}$ , n ≠ m, then: $X \times_{n} A_{n} \times_{m} A_{m} = X \times_{m} A_{m} \times_{n} A_{n}$

The following Lemma will be used in this paper.

Lemma [15]: Suppose $G \in R^{J_{1} \times \dots \times J_{N}}$ and $A_{n} \in R^{L_{n} \times J_{n}}$ , n = 1, ⋯ , N - 1. If $X = G \times_{1} A_{1} \dots \times_{N - 1} A_{N - 1}$ (4) then $X^{(N)} = G^{(N)} (A_{1} \otimes \dots \otimes A_{N - 1})^{T}$ (5)
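The definitions and properties above can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the helper names `unfold`, `fold` and `mode_product` are hypothetical, modes are 0-based in the code, and the unfolding is assumed to flatten the remaining indices in row-major (C) order, which is the ordering under which the Kronecker factor in the Lemma appears as A_1 ⊗ ⋯ ⊗ A_{N-1}.

```python
import numpy as np

def unfold(T, mode):
    """Mode unfolding: rows index `mode`, remaining indices flattened in C order."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(mat, mode, shape):
    """Inverse of `unfold` for a tensor with the given full shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(mat.reshape([shape[mode]] + rest), 0, mode)

def mode_product(T, A, mode):
    """Mode product defined through Eq. (2): unfold, multiply by A, fold back."""
    new_shape = list(T.shape)
    new_shape[mode] = A.shape[0]
    return fold(A @ unfold(T, mode), mode, new_shape)

rng = np.random.default_rng(0)
J1, J2, M = 2, 3, 5            # core sizes and number of samples (illustrative)
L1, L2 = 4, 6                  # mode sizes after applying A_1 and A_2
G = rng.standard_normal((J1, J2, M))
A1 = rng.standard_normal((L1, J1))
A2 = rng.standard_normal((L2, J2))

# Eq. (4): X = G x_1 A_1 x_2 A_2
X = mode_product(mode_product(G, A1, 0), A2, 1)

# Eq. (5): unfolding along the sample mode equals G^(N) (A_1 kron A_2)^T
lemma_holds = np.allclose(unfold(X, 2), unfold(G, 2) @ np.kron(A1, A2).T)

# Commutative law: products along different modes can be swapped
commutes = np.allclose(X, mode_product(mode_product(G, A2, 1), A1, 0))

# Associative law: X x_n A x_n B = X x_n (B A)
B1 = rng.standard_normal((3, L1))
associates = np.allclose(mode_product(mode_product(G, A1, 0), B1, 0),
                         mode_product(G, B1 @ A1, 0))
print(lemma_holds, commutes, associates)
```

With a different column ordering in the unfolding (for example column-major), the Kronecker factors would appear in reverse order, so the ordering convention matters when applying the Lemma.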

3 Related works

3.1 LLE

Local linear embedding (LLE) [16] is known as one of the earliest manifold learning algorithms, and it is a dimensionality reduction method for nonlinear data. The low-dimensional data obtained after processing maintain the original topological relationships. In recent years, LLE has been widely employed in the classification and clustering of image data, character recognition, visualization of multidimensional data, bioinformatics and other fields. N. Jain et al. proposed patch-based LLE for node localization in sensor networks [17], which can localize nodes accurately in dense networks. To extract the global and local characteristics of observation data more accurately, Y. Zhang et al. proposed a modified kernel semi-supervised LLE named MK-SSLLE [18]. C. Yao et al. proposed the LLE score [19], a new filter-based feature selection method. In addition, Hessian Local Linear Embedding (HLLE) [20] and Adaptive Local Linear Embedding (ALLE) [21] have been used to solve localization problems. H. Rajaguru et al. [22] used LLE and HLLE for epilepsy classification from EEG signals.

3.2 Tensor decomposition

More and more algorithms incorporate manifold learning into the dimensionality reduction model, usually as a regularization term that preserves the intrinsic geometric information of the data during dimensionality reduction. For example, graph regularized non-negative matrix factorization (GNMF) [23] adds a manifold regularization term to non-negative matrix factorization and learns the neighborhood information through a Laplacian regularization term, so that the local geometric information of high-dimensional data is retained in the new subspace. Subsequently, Zhang et al. proposed low-rank matrix approximation with manifold regularization (MMF) [24], which considers the neighborhood information of the data while seeking a low-rank factorization and learns the geometric information of the data through undirected graphs.

Recently, researchers have studied the Tucker decomposition of tensors in depth in the field of machine learning. Let $X \in R^{L_{1} \times \dots \times L_{N}}$ be a tensor; then the Tucker decomposition of $X$ can be represented by Equation (6). $\min_{G, A_{1}, \dots, A_{N}} \| X - G \times_{1} A_{1} \dots \times_{N} A_{N} \|^{2}$ (6) where $A_{n} \in R^{L_{n} \times J_{n}}$ , n = 1, 2, ⋯ , N, and $G \in R^{J_{1} \times \dots \times J_{N}}$ .

Research on Tucker decomposition mainly focuses on orthogonal or non-negative constraints and on the addition of various regularization terms. For example, non-negative Tucker decomposition (NTD) [25] adds non-negativity constraints. Zhang et al. [2] proposed a new tensor decomposition model, low-rank regularized heterogeneous tensor decomposition for subspace clustering (LRR-HTD). For all modes except the last, LRR-HTD seeks a set of orthogonal projection matrices that project the original tensor data onto a low-dimensional subspace. For the last mode, it obtains the lowest-rank representation under the low-rank constraint, which reveals the global structure of the samples. In 2017, Li et al. [11] proposed manifold regularization non-negative Tucker decomposition (MR-NTD) for tensor data dimension reduction and representation. It presents a method for the Tucker decomposition of multiple tensor objects under non-negative constraints. At the same time, it takes the manifold structure of the data into account and introduces the geometric relationship of the core tensor into the objective function to maintain the geometry of the tensor before and after dimensionality reduction. A more efficient algorithm, graph-Laplacian tensor decomposition (GLTD) [12], was then proposed in 2019. This method also studies the characteristics of, and similarities between, images. In addition, the added LE regularization term improves robustness when processing occluded or abnormal images.

4 Dimensionality reduction of tensor data based on local linear embedding and mode product (LLEMP-TDR)

4.1 Tensor representation of tensor datasets

Since dimensionality reduction algorithms must exploit the relationships between data points, this paper, unlike many similar algorithms, represents a multidimensional dataset as a single tensor rather than as individual data items. Here, $X \in R^{L_{1} \times \dots \times L_{N - 1} \times M}$ represents a tensor for a multidimensional dataset, in which L₁, ⋯ , L_N-1 are the sizes of the modes of each tensor sample and M is the number of samples in the dataset. Then, the mode-N unfolding of the tensor $X$ is expressed as follows: $X^{(N)} = \begin{bmatrix} x_{\underbrace{1 \dots 1}_{N - 1} \, 1} & \dots & x_{L_{1} \dots L_{N - 1} \, 1} \\ \vdots & \ddots & \vdots \\ x_{\underbrace{1 \dots 1}_{N - 1} \, M} & \dots & x_{L_{1} \dots L_{N - 1} \, M} \end{bmatrix} = \begin{bmatrix} X_{1 Row}^{(N)} \\ \vdots \\ X_{MRow}^{(N)} \end{bmatrix}$ (7) The row vectors $X_{1 Row}^{(N)}, \dots, X_{MRow}^{(N)}$ of $X^{(N)}$ are the M high-dimensional data points contained in the tensor $X$ .

To explain the dimensionality reduction of tensor data more clearly: given a tensor $X \in R^{L_{1} \times \dots \times L_{N - 1} \times M}$ , the goal is to obtain the corresponding dimension-reduced tensor $G \in R^{J_{1} \times \dots \times J_{N - 1} \times M}$ , in which L_n ⩾ J_n, n = 1, ⋯ , N - 1. The mode-N unfolding of $G$ is as follows: $G^{(N)} = \begin{bmatrix} g_{\underbrace{1 \dots 1}_{N - 1} \, 1} & \dots & g_{J_{1} \dots J_{N - 1} \, 1} \\ \vdots & \ddots & \vdots \\ g_{\underbrace{1 \dots 1}_{N - 1} \, M} & \dots & g_{J_{1} \dots J_{N - 1} \, M} \end{bmatrix} = \begin{bmatrix} G_{1 Row}^{(N)} \\ \vdots \\ G_{MRow}^{(N)} \end{bmatrix} \in R^{M \times \prod_{i = 1}^{N - 1} J_{i}}$ (8) where the row vector $G_{iRow}^{(N)}$ of $G^{(N)}$ is the dimensionality reduction result of $X_{iRow}^{(N)}$ .
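As an illustration of this representation, the sketch below stacks M samples along a last mode and forms the mode-N unfolding with NumPy. The sample sizes and the row-major flattening order are assumptions made for the example; the paper does not fix the column ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = [rng.standard_normal((3, 4)) for _ in range(5)]   # M = 5 samples of size 3 x 4

# Stack the samples along a new last mode: X has shape (L_1, L_2, M)
X = np.stack(samples, axis=-1)

# Mode-N unfolding: move the sample mode to the front and flatten the rest
X_N = np.moveaxis(X, -1, 0).reshape(X.shape[-1], -1)

# Row i of X^(N) is the vectorized i-th sample
rows_match = all(np.array_equal(X_N[i], samples[i].ravel()) for i in range(5))
print(X.shape, X_N.shape, rows_match)
```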

4.2 Tensor version of local linear embedding (LLE)

Local linear embedding is an important idea in manifold learning. It first decomposes the high-dimensional dataset into several local parts, then computes the linear pattern of each part, and finally solves for the dimension-reduced dataset under the criterion that the local linear patterns are preserved. This paper applies the idea of local linear embedding to the dimensionality reduction of high-dimensional tensor datasets.

4.2.1 Local linear approximation

For any high-dimensional data point $X_{iRow}^{(N)}$ , let $X_{i_{1} Row}^{(N)}, \dots, X_{i_{K} Row}^{(N)}$ be its K nearest neighbors. We want to approximate $X_{iRow}^{(N)}$ with a linear combination of $X_{i_{1} Row}^{(N)}, \dots, X_{i_{K} Row}^{(N)}$ whose coefficients sum to one, that is: $\begin{aligned} & \| (X_{iRow}^{(N)})^{T} - \sum_{k = 1}^{K} w_{i_{k}} (X_{i_{k} Row}^{(N)})^{T} \|^{2} \\ = & \| - (X_{iRow}^{(N)})^{T} + \sum_{k = 1}^{K} w_{i_{k}} (X_{i_{k} Row}^{(N)})^{T} \|^{2} \\ = & \| \sum_{k = 1}^{K} w_{i_{k}} ((X_{i_{k} Row}^{(N)})^{T} - (X_{iRow}^{(N)})^{T}) \|^{2} \\ = & \| {\tilde{X}}_{i}^{(N)} w_{i} \|^{2} = w_{i}^{T} ({\tilde{X}}_{i}^{(N)})^{T} {\tilde{X}}_{i}^{(N)} w_{i} \underset{w_{i}}{\longrightarrow} \min \end{aligned}$ (9) where the third line uses the constraint that the coefficients sum to one, $w_{i} = \begin{bmatrix} w_{i_{1}} \\ \vdots \\ w_{i_{K}} \end{bmatrix} \in R^{K}$ , $\sum_{k = 1}^{K} w_{i_{k}} = w_{i}^{T} \Gamma_{K} = 1$ , $\Gamma_{K} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \in R^{K}$ , and ${\tilde{X}}_{i}^{(N)} = [ (X_{i_{1} Row}^{(N)})^{T} - (X_{iRow}^{(N)})^{T} \; \dots \; (X_{i_{K} Row}^{(N)})^{T} - (X_{iRow}^{(N)})^{T} ]$ . $X_{iRow}^{(N)}$ and its K nearest neighbors $X_{i_{1} Row}^{(N)}, \dots, X_{i_{K} Row}^{(N)}$ are called a locality of the high-dimensional dataset $X$ , and the vector w_i of linear approximation coefficients of these neighbors with respect to $X_{iRow}^{(N)}$ is called the linear pattern of this locality. Using the Lagrange multiplier method, w_i is obtained from Equations (10) and (11).
$\phi (w_{i}, \lambda) = w_{i}^{T} ({\tilde{X}}_{i}^{(N)})^{T} {\tilde{X}}_{i}^{(N)} w_{i} + \lambda (w_{i}^{T} \Gamma_{K} - 1) \underset{w_{i}}{\longrightarrow} \min$ (10) $\begin{cases} \frac{\partial \phi (w_{i}, \lambda)}{\partial \lambda} = w_{i}^{T} \Gamma_{K} - 1 = 0 \\ \nabla_{w_{i}} \phi (w_{i}, \lambda) = 2 ({\tilde{X}}_{i}^{(N)})^{T} {\tilde{X}}_{i}^{(N)} w_{i} + \lambda \Gamma_{K} = 0 \end{cases}$ $\Rightarrow \begin{cases} w_{i}^{T} \Gamma_{K} = 1 \\ w_{i} = - \frac{\lambda}{2} (({\tilde{X}}_{i}^{(N)})^{T} {\tilde{X}}_{i}^{(N)})^{-1} \Gamma_{K} \end{cases}$ $1 = \Gamma_{K}^{T} w_{i} = - \frac{\lambda}{2} \Gamma_{K}^{T} (({\tilde{X}}_{i}^{(N)})^{T} {\tilde{X}}_{i}^{(N)})^{-1} \Gamma_{K} \Rightarrow \lambda = - \frac{2}{\Gamma_{K}^{T} (({\tilde{X}}_{i}^{(N)})^{T} {\tilde{X}}_{i}^{(N)})^{-1} \Gamma_{K}}$ $w_{i} = \frac{(({\tilde{X}}_{i}^{(N)})^{T} {\tilde{X}}_{i}^{(N)})^{-1} \Gamma_{K}}{\Gamma_{K}^{T} (({\tilde{X}}_{i}^{(N)})^{T} {\tilde{X}}_{i}^{(N)})^{-1} \Gamma_{K}}$ (11)
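The closed form in Equation (11) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name `lle_weights` is hypothetical, neighbors are found by brute-force Euclidean distance, and a small regularization term `reg` (not part of Equation (11)) is added because the local Gram matrix is singular whenever K exceeds the data dimension.

```python
import numpy as np

def lle_weights(X_rows, K, reg=1e-3):
    """Return neighbor indices (M, K) and reconstruction weights (M, K).

    X_rows plays the role of X^(N): one vectorized sample per row.
    `reg` is a hypothetical regularization strength, not part of Eq. (11).
    """
    M = X_rows.shape[0]
    # Pairwise squared Euclidean distances between samples
    sq = np.sum(X_rows**2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X_rows @ X_rows.T
    np.fill_diagonal(dist, np.inf)             # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :K]

    ones = np.ones(K)                           # the all-ones vector Gamma_K
    weights = np.empty((M, K))
    for i in range(M):
        # Columns of X_tilde are (neighbor - sample), as in Eq. (9)
        X_tilde = (X_rows[neighbors[i]] - X_rows[i]).T
        gram = X_tilde.T @ X_tilde              # K x K local Gram matrix
        gram += reg * max(np.trace(gram), 1e-12) * np.eye(K)
        v = np.linalg.solve(gram, ones)         # gram^{-1} Gamma_K
        weights[i] = v / (ones @ v)             # Eq. (11): normalize to sum to one
    return neighbors, weights

rng = np.random.default_rng(0)
X_rows = rng.standard_normal((20, 6))           # 20 samples, 6 features (illustrative)
neighbors, weights = lle_weights(X_rows, K=4)
print(weights.sum(axis=1))
```

Solving the linear system and then normalizing avoids forming the inverse explicitly, which is the usual numerical choice for this step.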

4.2.2 Preservation of local linear patterns

Let $G_{iRow}^{(N)}$ , $G_{i_{1} Row}^{(N)}, \dots, G_{i_{K} Row}^{(N)}$ be the dimension-reduced tensor data corresponding to $X_{iRow}^{(N)}$ , $X_{i_{1} Row}^{(N)}, \dots, X_{i_{K} Row}^{(N)}$ . We require that the local linear pattern remain unchanged during the dimensionality reduction of the tensor data, that is, $G_{iRow}^{(N)}$ , $G_{i_{1} Row}^{(N)}, \dots, G_{i_{K} Row}^{(N)}$ are selected according to the criterion in Equation (12). $\| (G_{iRow}^{(N)})^{T} - \sum_{k = 1}^{K} w_{i_{k}} (G_{i_{k} Row}^{(N)})^{T} \|^{2} \underset{G_{iRow}^{(N)}, G_{i_{1} Row}^{(N)}, \dots, G_{i_{K} Row}^{(N)}}{\longrightarrow} \min$ (12) To take all the localities $G_{iRow}^{(N)}$ , $G_{i_{1} Row}^{(N)}, \dots, G_{i_{K} Row}^{(N)}$ , i = 1, ⋯ , M, into account, we let: $\zeta_{ij} = \begin{cases} w_{i_{k}}, & j = i_{k} \text{ for some } k \in \{ 1, \dots, K \} \\ 0, & j \neq i_{k} \text{ for all } k \in \{ 1, \dots, K \} \end{cases} \quad i, j = 1, \dots, M$ This gives the objective function in Equation (13). $\begin{aligned} LLE (G) = & \sum_{i = 1}^{M} \| (G_{iRow}^{(N)})^{T} - \sum_{k = 1}^{K} w_{i_{k}} (G_{i_{k} Row}^{(N)})^{T} \|^{2} \\ = & \sum_{i = 1}^{M} \| (G_{iRow}^{(N)})^{T} - \sum_{j = 1}^{M} \zeta_{ij} (G_{jRow}^{(N)})^{T} \|^{2} \\ = & \sum_{i = 1}^{M} \| (G^{(N)})^{T} s_{i} - \sum_{j = 1}^{M} \zeta_{ij} (G^{(N)})^{T} s_{j} \|^{2} \\ = & \sum_{i = 1}^{M} \| (G^{(N)})^{T} (s_{i} - \sum_{j = 1}^{M} \zeta_{ij} s_{j}) \|^{2} \\ = & \sum_{i = 1}^{M} \| (G^{(N)})^{T} \theta_{i} \|^{2} = \sum_{i = 1}^{M} \mathrm{tr} ((G^{(N)})^{T} \theta_{i} \theta_{i}^{T} G^{(N)}) \\ = & \mathrm{tr} ((G^{(N)})^{T} (\sum_{i = 1}^{M} \theta_{i} \theta_{i}^{T}) G^{(N)}) \\ = & \mathrm{tr} ((G^{(N)})^{T} \Pi G^{(N)}) \to \min \end{aligned}$ (13) where $s_{i} \in R^{M}$ is the selection vector whose i-th element is 1 and whose other elements are 0, and $\theta_{i}$ and Π are given by Equation (14). $\theta_{i} = s_{i} - \sum_{j = 1}^{M} \zeta_{ij} s_{j} \in R^{M}, \quad \Pi = \sum_{i = 1}^{M} \theta_{i} \theta_{i}^{T} \in R^{M \times M}$ (14) Obviously, Π is a symmetric positive semi-definite matrix. Under the constraint $(G^{(N)})^{T} G^{(N)} = I_{J_{1} \dots J_{N - 1}}$ , the columns of the minimizing $G^{(N)}$ are J₁ ⋯ J_N-1 orthonormal eigenvectors of the matrix Π, namely those associated with its smallest eigenvalues.
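To make the construction of Π concrete, the sketch below builds it from given neighbor indices and weights and checks it against the explicit sum in Equation (14) and the trace form in Equation (13). The function name `lle_matrix` and the small hand-made inputs are hypothetical. The code uses the observation that the row vectors $\theta_{i}^{T}$ stacked together form I - Z with Z = [ζ_ij], so that Π = (I - Z)^T (I - Z).

```python
import numpy as np

def lle_matrix(neighbors, weights):
    """Build Pi = sum_i theta_i theta_i^T = (I - Z)^T (I - Z), with Z = [zeta_ij]."""
    M, K = neighbors.shape
    Z = np.zeros((M, M))
    rows = np.repeat(np.arange(M), K)
    Z[rows, neighbors.ravel()] = weights.ravel()    # zeta_{i, i_k} = w_{i_k}
    I_minus_Z = np.eye(M) - Z                        # row i is theta_i^T
    return I_minus_Z.T @ I_minus_Z

# Tiny hand-made example (hypothetical values): 4 samples, K = 2, weights sum to one
neighbors = np.array([[1, 2], [0, 2], [1, 3], [2, 1]])
weights = np.array([[0.5, 0.5], [0.3, 0.7], [0.6, 0.4], [1.2, -0.2]])
Pi = lle_matrix(neighbors, weights)

# Explicit sum of outer products from Eq. (14)
Pi_sum = np.zeros((4, 4))
for i in range(4):
    theta = np.eye(4)[i].copy()                      # selection vector s_i
    for k in range(2):
        theta[neighbors[i, k]] -= weights[i, k]
    Pi_sum += np.outer(theta, theta)

# Objective of Eq. (13) computed directly, for an arbitrary low-dimensional G^(N)
G_N = np.random.default_rng(0).standard_normal((4, 3))
direct = sum(np.sum((G_N[i] - weights[i] @ G_N[neighbors[i]]) ** 2) for i in range(4))
via_trace = np.trace(G_N.T @ Pi @ G_N)
print(np.allclose(Pi, Pi_sum), np.isclose(direct, via_trace))
```

Because each weight vector sums to one, the all-ones vector lies in the null space of Π, which is consistent with Π being positive semi-definite rather than positive definite.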

4.3 Dimensionality reduction of tensor data based on mode product (MP-TDR)

This paper proposes to use the mode product of a tensor and matrices to achieve dimensionality reduction of tensor data. The matrices here are projection matrices. Given high-dimensional tensor data and the dimensionality reduction requirement for each mode, the dimensionality reduction algorithm proposed in this paper (referred to as MP-TDR) is as follows. As mentioned above, for an original tensor $X \in R^{L_{1} \times \dots \times L_{N - 1} \times M}$ we want to obtain the corresponding dimension-reduced tensor $G \in R^{J_{1} \times \dots \times J_{N - 1} \times M}$ , in which L_n ⩾ J_n, n = 1, ⋯ , N - 1, and $G$ is expressed as follows: $G = X \times_{1} A_{1}^{T} \dots \times_{N - 1} A_{N - 1}^{T}$ (15) where $A_{n} \in R^{L_{n} \times J_{n}}$ . According to the Lemma in Section 2 of this paper: $G = X \times_{1} A_{1}^{T} \dots \times_{N - 1} A_{N - 1}^{T} \Longrightarrow G^{(N)} = X^{(N)} (A_{1} \otimes \dots \otimes A_{N - 1}) = X^{(N)} W$ (16) where $W = A_{1} \otimes \dots \otimes A_{N - 1} \in R^{(L_{1} \dots L_{N - 1}) \times (J_{1} \dots J_{N - 1})}$ . Furthermore, if we set $A_{n}^{T} A_{n} = I_{J_{n}}, n = 1, \dots, N - 1$ , then $W^{T} W = (A_{1} \otimes \dots \otimes A_{N - 1})^{T} (A_{1} \otimes \dots \otimes A_{N - 1}) = (A_{1}^{T} A_{1}) \otimes \dots \otimes (A_{N - 1}^{T} A_{N - 1}) = I_{J_{1} \dots J_{N - 1}}$ . Therefore, from the perspective of data reduction, the meaning of $G^{(N)} = X^{(N)} W$ is as follows. The column vectors of W span a $\prod_{n = 1}^{N - 1} J_{n}$ -dimensional subspace of the $\prod_{n = 1}^{N - 1} L_{n}$ -dimensional Euclidean space $R^{\prod_{n = 1}^{N - 1} L_{n}}$ , that is, $spanW \subseteq R^{\prod_{n = 1}^{N - 1} L_{n}}$ , and they form an orthonormal basis of this subspace. The row vectors of $X^{(N)}$ are vectors in $R^{\prod_{n = 1}^{N - 1} L_{n}}$ , and their projections onto spanW are still high-dimensional vectors with $\prod_{n = 1}^{N - 1} L_{n}$ components. However, the coordinates of these projections with respect to the orthonormal basis of spanW are low-dimensional vectors with $\prod_{n = 1}^{N - 1} J_{n}$ components, and these low-dimensional vectors are the row vectors of $G^{(N)}$ . In this way, $G$ achieves the dimensionality reduction of $X$ .

The above dimensionality reduction algorithm has three characteristics:

(1) The algorithm is global. It does not decompose $X$ into several parts and then construct a dimensionality reduction algorithm for each part. Instead, it uses $X$ as a whole, so we call it a global dimensionality reduction algorithm;

(2) The algorithm is subspace-dependent: it is constructed on the subspace determined by A₁, ⋯ , A_N-1, so we call it a subspace-based dimensionality reduction algorithm;

(3) Any vector in the subspace spanW is a high-dimensional vector with $\prod_{n = 1}^{N - 1} L_{n}$ components, but its coordinates with respect to the orthonormal basis of spanW form a low-dimensional vector with $\prod_{n = 1}^{N - 1} J_{n}$ components, which can be used as the dimensionality reduction result of the high-dimensional vector. The algorithm uses the coordinates of the projection of a high-dimensional vector onto spanW as its dimensionality reduction result.

It must be pointed out that, as long as the numbers of rows and columns of these matrices meet the specified dimensionality reduction requirements, the specific values of the matrix elements are left open. Therefore, MP-TDR is less an algorithm than an algorithmic framework: any specific method for determining the mode product matrices turns the framework into a specific algorithm. A natural and intuitive way to select the subspace is to minimize the distance between the high-dimensional tensor data and its projection onto the subspace, that is: $\| X - X \times_{1} (A_{1} A_{1}^{T}) \dots \times_{N - 1} (A_{N - 1} A_{N - 1}^{T}) \|^{2} \underset{A_{1}, \dots, A_{N - 1}}{\longrightarrow} \min$ (17) Here $X \times_{n} (A_{n} A_{n}^{T})$ is the projection along mode n, which has the same size as $X$ . It can be proved that in this case the column vectors of W must be principal components of the covariance matrix of the row vectors of $X^{(N)}$ .
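The framework can be illustrated with arbitrary orthonormal mode matrices. In the sketch below, which is an assumption-laden illustration rather than the authors' procedure, A_1 and A_2 are taken from the QR factorization of random matrices, the mode products are written with `np.einsum`, and the unfolding uses row-major flattening. It checks the identity $G^{(N)} = X^{(N)} W$, the orthonormality of W, and the squared distance between the data and its projection onto spanW.

```python
import numpy as np

rng = np.random.default_rng(0)
L1, L2, M = 6, 5, 8        # sample sizes and number of samples (illustrative)
J1, J2 = 3, 2              # target sizes
X = rng.standard_normal((L1, L2, M))

# Any A_n with orthonormal columns fits the framework; here QR of random matrices
A1, _ = np.linalg.qr(rng.standard_normal((L1, J1)))
A2, _ = np.linalg.qr(rng.standard_normal((L2, J2)))

# Eq. (15): G = X x_1 A_1^T x_2 A_2^T
G = np.einsum('abm,ai,bj->ijm', X, A1, A2)

# Eq. (16): G^(N) = X^(N) W with W = A_1 kron A_2
X_N = np.moveaxis(X, -1, 0).reshape(M, -1)
G_N = np.moveaxis(G, -1, 0).reshape(M, -1)
W = np.kron(A1, A2)

# Projection of every sample onto span W: still L1*L2 components per row
P_N = G_N @ W.T
residual = np.linalg.norm(X_N - P_N) ** 2      # squared distance to the projection

print(G.shape, W.shape, residual)
```

Since the projection is orthogonal, the squared norm of the data splits into the squared norm of the projected part plus the residual, so minimizing the residual is equivalent to maximizing the energy retained in $G^{(N)}$.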

4.4 Tensor dimensionality reduction based on tensor mode product and LLE (LLEMP-TDR)

4.4.1 The proposed algorithm

Dimensionality reduction algorithms fall into two categories: locality-preserving algorithms, such as LLE, and global-preserving algorithms, such as PCA and LDA. LLE has two characteristics. First, it is local: the dataset is decomposed into local parts, and the dimensionality reduction is constructed for each part separately. Second, there is no explicit relationship between the high-dimensional tensor dataset and the dimension-reduced tensor dataset, so the out-of-sample problem cannot be solved. MP-TDR also has two characteristics. First, it is global: it does not decompose the tensor dataset into parts, and the entire tensor dataset is considered as a whole when the dimensionality reduction is constructed. Second, the high-dimensional tensor dataset has an explicit relationship with the dimension-reduced tensor dataset, namely, the dimension-reduced tensor is the mode product of the high-dimensional tensor with the projection matrices. A good dimensionality reduction algorithm should produce a dimension-reduced dataset that maintains both some global features and some local features of the high-dimensional dataset. Moreover, it is better to establish an explicit relationship between the high-dimensional data and the dimension-reduced data in order to solve the out-of-sample problem. From the analysis of LLE and MP-TDR in this paper, it can be seen that LLE and MP-TDR complement each other in these two respects. Therefore, this paper proposes a combination of LLE and MP-TDR, referred to as LLEMP-TDR. LLEMP-TDR determines the matrix W according to the local linear embedding criterion derived in Section 4.2 of this paper, that is: $\begin{matrix} \mathrm{tr} ((G^{(N)})^{T} \Pi G^{(N)}) \underset{G^{(N)}}{\longrightarrow} \min \underset{G^{(N)} = X^{(N)} W}{\Longrightarrow} \\ \mathrm{tr} (W^{T} (X^{(N)})^{T} \Pi X^{(N)} W) \underset{W^{T} W = I_{J_{1} \dots J_{N - 1}}}{\longrightarrow} \min \end{matrix}$ (18) The resulting algorithm is the dimensionality reduction of tensor data based on local linear embedding and mode product, called LLEMP-TDR for short.

LLEMP-TDR not only considers the properties of local preservation but also the properties of global preservation. Therefore, LLEMP-TDR is the dimensionality reduction of tensor data with local and global attributes. The local preservation of LLEMP-TDR refers to the preservation of the local linear approximation patterns, while the global preservation refers to the preservation of the minimum distance between the high-dimensional tensor data and its subspace projection.

4.4.2 Solution to LLEMP-TDR

Since $(X^{(N)})^{T} \Pi X^{(N)}$ is a symmetric positive semi-definite matrix, under the constraint $W^{T} W = I_{J_{1} \dots J_{N - 1}}$ the solution of LLEMP-TDR can be transformed into a Rayleigh quotient problem, and the column vectors of W are the orthonormal eigenvectors of the matrix $(X^{(N)})^{T} \Pi X^{(N)}$ associated with its smallest eigenvalues.
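The eigenvector solution of this trace minimization can be illustrated numerically. In the following sketch a random symmetric positive semi-definite matrix S stands in for $(X^{(N)})^{T} \Pi X^{(N)}$; the helper name `smallest_eigvecs` and all sizes are hypothetical. It shows that the eigenvectors for the d smallest eigenvalues attain a trace equal to the sum of those eigenvalues, and that another matrix with orthonormal columns does not do better.

```python
import numpy as np

def smallest_eigvecs(S, d):
    """Orthonormal eigenvectors of symmetric S for its d smallest eigenvalues."""
    vals, vecs = np.linalg.eigh(S)      # eigh returns eigenvalues in ascending order
    return vals[:d], vecs[:, :d]

rng = np.random.default_rng(0)
D, d = 10, 3                             # ambient and target dimensions (illustrative)
B = rng.standard_normal((D, D))
S = B.T @ B                              # stand-in for (X^(N))^T Pi X^(N)

vals, W = smallest_eigvecs(S, d)
best = np.trace(W.T @ S @ W)             # equals the sum of the d smallest eigenvalues

# Any other W with orthonormal columns gives a trace at least as large
Q, _ = np.linalg.qr(rng.standard_normal((D, d)))
other = np.trace(Q.T @ S @ Q)
print(best, vals.sum(), other)
```

Note that a W obtained this way is a general matrix with orthonormal columns; this sketch does not address how it would be factored into individual mode matrices A_n.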

The procedure of LLEMP-TDR is summarized in Algorithm 1.

Algorithm 1 Procedures of LLEMP-TDR
1. Input: Dataset $X \in R^{L_{1} \times \dots \times L_{N - 1} \times M}$, target dimensionality d, number of nearest neighbors K.
2. Output: Tensor $G \in R^{J_{1} \times \dots \times J_{N - 1} \times M}$.
3. Find the neighborhood of each sample $X_{i}^{(N)}$ (the i-th row of $X^{(N)}$) using the K-nearest-neighbor method and calculate the corresponding local linear pattern $w_{i}$;
4. Compute the LLE matrix Π by Equation (14);
5. Compute the projection matrix $W \in R^{I_{n} \times J_{n}}$ (n = 1, ⋯, N - 1) by Equation (18);
6. Return $G \in R^{J_{1} \times \dots \times J_{N - 1} \times M}$.
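Steps 3 and 4 of Algorithm 1 can be sketched in NumPy as below. Equation (14) is not reproduced in this section, so the sketch assumes it has the standard LLE form Π = (I − R)ᵀ(I − R), where row i of R holds the reconstruction weights $w_{i}$. The function name `lle_matrix`, the symbol R and the regularization constant `reg` are illustrative assumptions.

```python
import numpy as np

def lle_matrix(X_unf, K, reg=1e-3):
    """Build an LLE matrix from samples stored as rows of X_unf (M, D).

    Assumes the standard LLE form Pi = (I - R)^T (I - R).
    """
    M = X_unf.shape[0]
    sq = np.sum(X_unf ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * X_unf @ X_unf.T   # squared distances
    np.fill_diagonal(dist, np.inf)                           # a sample is not its own neighbor
    R = np.zeros((M, M))
    for i in range(M):
        nbrs = np.argsort(dist[i])[:K]                       # K nearest neighbors of sample i
        Z = X_unf[nbrs] - X_unf[i]                           # neighbors centered on sample i
        C = Z @ Z.T                                          # local Gram matrix (K, K)
        C += reg * (np.trace(C) + 1e-12) * np.eye(K)         # regularize for stability
        w = np.linalg.solve(C, np.ones(K))
        R[i, nbrs] = w / w.sum()                             # weights sum to one
    I = np.eye(M)
    return (I - R).T @ (I - R)

rng = np.random.default_rng(0)
X_unf = rng.standard_normal((20, 6))
Pi = lle_matrix(X_unf, K=5)
print(Pi.shape)
```

Each of the M local systems is K × K, which is consistent with the cubic dependence on K in the complexity analysis.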

4.5 Complexity analysis

The computational cost of LLEMP-TDR mainly consists of the calculation of Π and $G$. Assume the original tensor data is $X \in R^{L_{1} \times \dots \times L_{N - 1} \times M}$ and the reduced-dimension tensor data is $G \in R^{J_{1} \times \dots \times J_{N - 1} \times M}$. For Π, the time complexity of using the K-nearest-neighbor algorithm to find the K nearest neighbors of the M data points is $O(M^{2} L^{N - 1})$, the time complexity of computing the $w_{i}$ is $O(JMK^{3})$, and the time complexity of obtaining Π is $O(M^{3})$. For $G$, the time complexity of the eigenvalue decomposition that yields the projection matrix W is $O((N - 1) MJ^{2})$, and the time complexity of computing $G$ is $O(MJ^{2})$. The total time complexity is therefore $O(M^{2} L^{N - 1} + JMK^{3} + M^{3} + (N - 1) MJ^{2} + MJ^{2})$.

Unlike the comparison algorithms (such as GLTD [12]), our algorithm does not require an iterative solution. According to the complexity analysis above, the complexity of our algorithm is roughly the same as, or even lower than, that of most similar algorithms.

4.6 Comparison with other state-of-the-art related works

In this section, we introduce some of the latest works published in top journals that are most relevant to our work, and explain the differences between them and our work.

A matrix is a second-order tensor, so the well-known non-negative matrix factorization is a special case of Tucker decomposition of tensors. In [23], the graph regularized non-negative matrix factorization (GNMF) algorithm is formulated as follows: $\min_{U ⩾ 0, Y ⩾ 0} \| X - UY \|^{2} + λ tr (Y Π Y^{T})$ (19) where $X \in R^{D \times N}$, $U \in R^{D \times d}$, $Y \in R^{d \times N}$, and $Π \in R^{N \times N}$ is the so-called graph Laplacian matrix, calculated from the similarity between the column vectors of X. If the column vectors of X are regarded as data, then the column vectors of Y are the compact representation of the data, as claimed in [23]. GNMF is an improvement of Nonnegative Matrix Factorization; when λ is 0, it degenerates into Nonnegative Matrix Factorization. Moreover, it is a locality-preserving algorithm, which only considers the nearest-neighbor information of the data and merges it into the regularization term.
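To make the role of the regularization term concrete, the following sketch evaluates an objective of the form (19). It assumes the unnormalized graph Laplacian (degree matrix minus similarity matrix); the function names and the random test matrices are hypothetical and only serve the illustration.

```python
import numpy as np

def graph_laplacian(S):
    """Unnormalized Laplacian D - S of a symmetric, non-negative similarity matrix S."""
    return np.diag(S.sum(axis=1)) - S

def gnmf_objective(X, U, Y, Pi, lam):
    """Reconstruction error plus graph regularization, as in an objective of form (19)."""
    residual = X - U @ Y
    return np.sum(residual ** 2) + lam * np.trace(Y @ Pi @ Y.T)

rng = np.random.default_rng(0)
X = rng.random((6, 5))                      # 5 data points as columns
U = rng.random((6, 2))
Y = rng.random((2, 5))                      # compact representations as columns
S = rng.random((5, 5))
S = (S + S.T) / 2                           # symmetric similarities
np.fill_diagonal(S, 0.0)
Pi = graph_laplacian(S)
value = gnmf_objective(X, U, Y, Pi, lam=0.5)
print(value)
```

With this Laplacian, the trace term equals half the similarity-weighted sum of squared distances between the columns of Y, so similar data points are pushed toward similar representations; setting `lam=0` leaves only the reconstruction error.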

Zhang et al. proposed an algorithm called Low-Rank Matrix Approximation with Manifold Regularization (MMF) [24]. MMF is a low-rank decomposition model that considers the neighborhood information of the data while seeking a low-rank factorization, and learns the geometric information of the data through undirected graphs. The objective function of the algorithm is as follows: $\min_{U^{T} U = I_{d}, Y} \| X - UY \|^{2} + λ tr (Y Π Y^{T})$ (20) where the requirement $U^{T} U = I_{d}$ represents the low-rank approximation. It is argued in [25] that MMF is better than GNMF in certain application scenarios. However, MMF is essentially a global preserving algorithm, which lacks effective retention of local information. Moreover, the effectiveness of the low-rank decomposition depends on how well the decomposition approximates low rank and maintains nonlinearity, so the algorithm is not stable.

Orthogonally constrained Tucker decomposition is another research topic of Tucker decomposition. In 2018, Zhang et al. proposed low-rank regularized heterogeneous tensor decomposition for subspace clustering (LRR-HTD) [2]. LRR-HTD seeks a set of orthogonal projection matrices to map the original tensor data to a low-dimensional subspace; for the last mode, however, a low-rank constraint must be satisfied in order to obtain the lowest-rank representation that reveals the overall structure of the samples. The model of LRR-HTD is as follows: $\min_{G; A_{n}^{T} A_{n} = I_{J_{n}}, n = 1, \dots, N - 1; A_{N} ⩾ 0} \| X - G \times_{1} A_{1} \dots \times_{N} A_{N} \|^{2}$ (21) LRR-HTD is a heterogeneous Tucker decomposition model that is designed only for subspace clustering and may not be suitable for classification scenarios, which is one of its shortcomings.

In [11], manifold regularization non-negative Tucker decomposition (MR-NTD) is proposed as follows: $\begin{matrix} \min_{G ⩾ 0, A_{n} ⩾ 0, n = 1, \dots, N - 1} \| X - G \times_{1} A_{1} \dots \times_{N - 1} A_{N - 1} \|^{2} \\ + λ tr ((G^{(N)})^{T} Π G^{(N)}) \end{matrix}$ (22) where $X \in R^{L_{1} \times \dots \times L_{N - 1} \times M}$, $G \in R^{J_{1} \times \dots \times J_{N - 1} \times M}$, $A_{n} \in R^{L_{n} \times J_{n}}$, n = 1, ⋯, N - 1, and again $Π \in R^{M \times M}$ is the so-called graph Laplacian matrix. If the row vectors of $X^{(N)}$ are regarded as data, then the row vectors of $G^{(N)}$ are the compact or dimension-reduced representation of the data. MR-NTD can be seen as the tensor version of GNMF. The algorithm incorporates the manifold structure through a simple regularization term, although other ways of exploiting this manifold information may exist. Like our algorithm, it considers the preservation of both global and local information.

Recently, a relatively novel dimensionality reduction algorithm for tensors was proposed by Jiang et al.: an image representation and learning method based on graph-Laplacian Tucker tensor decomposition (GLTD) [12]. The difference between this method and other algorithms is that the properties of and resemblances between the images are considered, and an LE regularization term is added to improve robustness to occluded or abnormal images. The objective function is shown in Equation (23). $\begin{matrix} \underset{A_{1}, \dots, A_{N}}{\arg \min} \| X - G \times_{1} A_{1} \dots \times_{N - 1} A_{N - 1} \times_{N} A_{N} \|^{2} \\ + λ tr (A_{N}^{T} L A_{N}) \end{matrix}$ (23) where L is the graph Laplacian matrix. Compared with Tucker decomposition, the GLTD representation is more regular due to the Laplacian regularization, which can consistently improve classification and clustering results. However, GLTD is a non-convex problem and needs to be solved iteratively. An optimization method for solving GLTD is derived in [12], but its computational complexity is relatively high.

The algorithm proposed in this paper not only maintains global and local information during dimensionality reduction, but also has lower computational complexity. Therefore, it has a slight advantage over the above algorithms. The experimental comparison between our algorithm and these algorithms is presented in Section 5.

5 Experimental results

The experiments were run on a PC with the following configuration:

(1) Hardware: CPU: Intel Core i5-6200U @ 2.30 GHz; Memory (RAM): 4.00 GB; Operating system: Windows 7, 64-bit;

(2) Software: MATLAB R2019a.

At present, there is no unified and widely-recognized criterion for evaluating various algorithms of dimensionality reduction. The evaluation of various dimensionality reduction algorithms is mainly indirect. After dimensionality reduction, classification or clustering experiments are carried out to see which algorithm can get the highest accuracy.

This section presents the experimental results of the proposed LLEMP-TDR and of the other five related algorithms published in top journals in recent years on five common datasets. The theoretical analysis of these five related algorithms has been given in Section 3. The performance of LLEMP-TDR is demonstrated by evaluating the clustering accuracy, classification accuracy and normalized mutual information (NMI) on five standard datasets (Faces94 male, ETH80, ORL, Olivetti, USPS). In addition, this section also examines the influence of the embedding dimension on the performance of LLEMP-TDR.

5.1 Datasets

(1) ETH80-1: ETH80 [26] is a multi-view image dataset that includes eight categories such as apples, cars and cows, as shown in Fig. 1. Each category contains 10 objects, and each object is represented by 41 images taken from different angles. The original image size is 128 × 128, and each image is resized to 32 × 32 in our experiments. We select the images of the first object in each category, for a total of 328 images.

Fig. 1

Example images from the ETH-80 dataset.

(2) ORL: The ORL face dataset [22] contains 400 images of 40 different people, varying in lighting, angle, facial expression, and whether glasses are worn. The original image size is 92 × 112. In this experiment, we resize each image to 32 × 32. The ORL dataset is shown in Fig. 2.

Fig. 2

Example images from the ORL dataset.

(3) Olivetti Faces: The Olivetti Faces dataset consists of 400 face images of 40 people, with 10 images per person, including frontal faces, side faces, and different expressions. Each image is resized to 64 × 64.

(4) Faces 94 male: Faces 94 male [12] contains 113 male subjects. Each subject sat in front of the camera and was asked to speak while 20 images were taken; the speech was intended to introduce changes in facial expression. The original image size is 200 × 180. In our experiment, each image is resized to 50 × 45. We select 30 subjects, for a total of 600 images.

(5) USPS: USPS (United States Postal Service) [27] is a handwritten digit dataset, shown in Fig. 3. It contains ten classes of handwritten digit images, from 0 to 9, and each class has 1100 sample images. The size of each image is 16 × 16. In this paper, we randomly select 200 images from each class, for a total of 2000 images.

Fig. 3

Example images from the USPS dataset.

5.2 Experimental settings

For all five datasets, we perform clustering experiments on the embedding results using K-means with the Euclidean distance metric, and classification experiments using KNN classifiers. We evaluate LLEMP-TDR and the other algorithms with three indicators, namely clustering accuracy, NMI, and classification accuracy, to verify the effectiveness of LLEMP-TDR.

The parameters of all algorithms are set as follows: the number of neighbors for all methods is set to 8; the LLE regularization coefficient λ of LLEMP-TDR is set to 10; the number of iterations of GNMF [23] is set to 500; for MMF [24], the numbers of outer and inner loops are set to 40 and 25, respectively; the parameter γ of LRR-HTD [2] is 1; the parameter setting of MR-NTD [11] follows [11]; the maximum number of iterations of NTD [28] is 50, and that of GLTD [12] is also 50.

This paper uses five common datasets to verify the performance of the algorithm. The main statistical parameters are shown in Table 3.

Table 2
Abbreviation

Abbreviation	Full name
LLE	Local Linear Embedding
HOSVD	High-Order Singular Value Decomposition
MR-NTD	Manifold Regularization Non-negative Tucker Decomposition
GLTD	Graph-Laplacian Tucker Tensor Decomposition
LPP-DR	Locality Perserving Projections Algorithm for Hyperspectral Image Dimensionality Reduction
LRNTF	Laplacian Regularized Non-negative Tensor Factorization
MK-SSLLE	Modified Kernel Semi-Supervised Local Linear Embedding
HLLE	Hessian Local Linear Embedding
ALLE	Adaptive Local Linear Embedding
LRR-HTD	Low-Rank Regularized Heterogeneous Tensor Decomposition
NTD	Non-negative Tucker Decomposition
GNMF	Graph Regularized Nonnegative Matrix Factorization
MMF	Low-Rank Matrix Approximation with Manifold Regularization

Table 3

Statistical parameters of datasets

Datasets	Size (M)	Original dimensionality	Reduced dimensionality	Number of classes
ETH80	328	32×32	8×8	8
ORL	400	32×32	6×6	40
Olivetti	400	64×64	8×8	40
Faces94 male	600	50×45	10×8	30
USPS	2000	16×16	7×8	10

5.3 The result of clustering experiments

Dimension-reduced datasets are usually evaluated through two tasks: clustering and classification. In the clustering experiment, each of the five high-dimensional datasets is first reduced to a given dimensionality using the dimensionality reduction algorithms considered in this paper (LLEMP-TDR, GNMF, MMF, LRR-HTD, MR-NTD, NTD, and GLTD). Then, K-means with the Euclidean distance measure is used to cluster the dimension-reduced datasets, with the number of clusters equal to the number of classes in the dataset. Since K-means is sensitive to the initial cluster centers, we repeat each clustering experiment 50 times with different initial values, and the average over the 50 runs is used as the final result.

Finally, the clustering accuracy and normalized mutual information (NMI) of each dimension-reduced dataset [29] are calculated to indirectly judge the performance of each algorithm of dimensionality reduction.

The NMI is calculated as follows. Let $θ_{i}$ denote the i-th class of the original tensor dataset $X$ and $ϕ_{j}$ the j-th cluster obtained by clustering the dimension-reduced tensor dataset $G$, with C the number of classes. Then $NMI (X, G) = \frac{MI (X, G)}{\max \{H (Θ), H (Φ)\}}$ (24) $MI (X, G) = \sum_{j = 1}^{C} \sum_{i = 1}^{C} p (θ_{i}, ϕ_{j}) \log_{2} \frac{p (θ_{i}, ϕ_{j})}{p_{Θ} (θ_{i}) p_{Φ} (ϕ_{j})}$ (25) $H (Θ) = - \sum_{i = 1}^{C} p_{Θ} (θ_{i}) \log_{2} p_{Θ} (θ_{i})$ (26) $H (Φ) = - \sum_{i = 1}^{C} p_{Φ} (ϕ_{i}) \log_{2} p_{Φ} (ϕ_{i})$ (27) $p_{Θ} (θ_{i}) = \frac{P_{i}}{M}$ (28) $p_{Φ} (ϕ_{i}) = \frac{Q_{i}}{M}$ (29) where $p (θ_{i}, ϕ_{j})$ is the joint probability that a sample belongs to class $θ_{i}$ and cluster $ϕ_{j}$, $P_{i}$ is the number of high-dimensional tensor data in the i-th class, and $Q_{i}$ is the number of dimension-reduced tensor data in the i-th cluster.
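The NMI definition in Equations (24)–(29) can be computed directly from two label lists, as in the following pure-Python sketch. The function name `nmi`, the estimation of the joint probability from co-occurrence counts, and the convention of returning 1.0 when both entropies are zero are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def nmi(true_labels, cluster_labels):
    """Normalized mutual information with max-entropy normalization."""
    M = len(true_labels)
    p_theta = {c: n / M for c, n in Counter(true_labels).items()}     # class probabilities P_i / M
    p_phi = {c: n / M for c, n in Counter(cluster_labels).items()}    # cluster probabilities Q_i / M
    joint = Counter(zip(true_labels, cluster_labels))                 # co-occurrence counts
    mi = sum((n / M) * log2((n / M) / (p_theta[t] * p_phi[c]))
             for (t, c), n in joint.items())
    h_theta = -sum(p * log2(p) for p in p_theta.values())
    h_phi = -sum(p * log2(p) for p in p_phi.values())
    denom = max(h_theta, h_phi)
    return mi / denom if denom > 0 else 1.0   # assumed convention for the degenerate case

perfect = nmi([0, 0, 1, 1], [1, 1, 0, 0])      # same partition, permuted cluster names
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])    # clusters independent of classes
print(perfect, unrelated)
```

The first call shows that NMI is invariant to how the clusters are named, which is why it can be used without first matching cluster indices to class labels.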

Here, we perform clustering experiments on subsets of each of the five datasets that contain different numbers of classes. On the ETH80 dataset (see Figs. 4 and 5), LLEMP-TDR achieves good clustering accuracy and NMI on the sub-datasets with different numbers of classes. Its performance is far superior to that of GNMF and GLTD, and MMF, MR-NTD and NTD also perform worse than LLEMP-TDR on this dataset. In addition, the clustering accuracy and NMI of LLEMP-TDR decrease as the number of classes increases. This may be caused by the similar intrinsic structure of different classes, which leads the algorithm to confuse similar classes during learning. The other comparison algorithms show similar trends.

Fig. 4

Clustering Accuracy on subsets of ETH80.

Fig. 5

Clustering NMI on subsets of ETH80.

On the ORL dataset (see Figs. 6 and 7), the clustering accuracy and NMI of LLEMP-TDR and of the comparison algorithms are all lower than 75%. However, LLEMP-TDR clearly outperforms the other algorithms, by 6%-20% across all numbers of classes. In particular, it is 23% higher than NTD and MR-NTD. In addition, the curve of LLEMP-TDR fluctuates less than those of the other algorithms, which indicates that it is more robust.

Fig. 6

Clustering Accuracy on subsets of ORL

Fig. 7

Clustering NMI on subsets of ORL

The behavior of LLEMP-TDR on the Olivetti dataset (see Figs. 8 and 9) is similar to that on the ORL dataset. LLEMP-TDR obtains the best clustering accuracy and NMI, and clearly outperforms the other algorithms.

Fig. 8

Clustering Accuracy on subsets of Olivetti.

Fig. 9

Clustering NMI on subsets of Olivetti.

On the Faces 94 male dataset (see Figs. 10 and 11), LLEMP-TDR performs well on the subsets with every number of classes and is significantly better than MMF and MR-NTD. LLEMP-TDR, GNMF, NTD and GLTD all maintain high clustering accuracy and NMI; among them, LLEMP-TDR achieves the best results. Moreover, the curves in the figures show that the performance of LLEMP-TDR is relatively stable, which indicates that the algorithm is robust on this dataset.

Fig. 10

Clustering Accuracy on subsets of Faces 94 male.

Fig. 11

Clustering NMI on subsets of Faces 94 male.

For the USPS dataset (see Figs. 12 and 13), the clustering results of NTD, MR-NTD and LRR-HTD fluctuate strongly across the sub-datasets with different numbers of classes, and their clustering accuracy and NMI are lower than those of LLEMP-TDR. The clustering performance of LLEMP-TDR decreases as the number of classes increases. Its clustering accuracy is slightly lower than that of GLTD, but its clustering NMI is better than that of GLTD. This may be due to the similar geometric structure of different classes, which causes LLEMP-TDR to confuse similar classes during learning.

Fig. 12

Clustering Accuracy on subsets of USPS.

Fig. 13

Clustering NMI on subsets of USPS.

Tables 4 and 5 summarize the average clustering results of each algorithm over the different subsets of each dataset. As can be seen from the tables, LLEMP-TDR achieves the highest average clustering accuracy and the highest average NMI on four of the five datasets.

Table 4

Average clustering accuracy on subsets of each dataset

Methods	ETH80	ORL	Olivetti	Faces94 male	USPS
LLEMP-TDR	81.02%	55.84%	56.12%	86.82%	68.55%
GNMF	52.09%	49.04%	45.78%	82.41%	64.08%
MMF	74.9%	75.67%	50.80%	69.74%	68.39%
LRR-HTD	78.48%	45.56%	50.86%	84.71%	53.22%
MR-NTD	71.63%	32.34%	31.02%	47.21%	43.39%
NTD	47.75%	32.34%	44.39%	80.5%	47.52%

Table 5

Average clustering NMI on subsets of each dataset

Methods	ETH80	ORL	Olivetti	Faces94 male	USPS
LLEMP-TDR	76.82%	69.32%	69.15%	94.6%	55.05%
GNMF	33.08%	49.04%	60.11%	93.64%	58.96%
MMF	74.58%	58.29%	64.61%	87.76%	66.70%
LRR-HTD	75.82%	59.50%	55.42%	93.71%	35.84%
MR-NTD	70.66%	45.98%	44.45%	58.47%	29.11%
NTD	47.75%	42.92%	60.25%	86.19%	27.35%

5.4 The result of classification experiments

In this paper, we use the classical KNN classifier with K = 8 to test the dimension-reduced data. For each dataset, 80% of the dimension-reduced data are randomly selected as the training set and the remaining 20% are used as the test set; this is repeated 10 times, and the average classification accuracy over the 10 runs is reported as the classification accuracy of the dataset. We compare the experimental results with GNMF, MMF, LRR-HTD, MR-NTD, and NTD. The algorithm parameter settings in the classification experiment are consistent with those described in Section 5.2. The results of the experiment are shown in Figs. 14 to 18.
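The evaluation protocol can be sketched as a repeated random 80/20 hold-out with a majority-vote KNN classifier. This is an illustration only: the original experiments were run in MATLAB, and the function names, the tie-breaking rule (smallest label wins) and the synthetic two-class data below are assumptions.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=8):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    preds = []
    for row in train_y[nearest]:
        labels, counts = np.unique(row, return_counts=True)
        preds.append(labels[np.argmax(counts)])      # ties resolved toward the smallest label
    return np.array(preds)

def repeated_holdout_accuracy(features, labels, k=8, train_frac=0.8, repeats=10, seed=0):
    """Average test accuracy over `repeats` random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    n_train = int(round(train_frac * n))
    accs = []
    for _ in range(repeats):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        pred = knn_predict(features[tr], labels[tr], features[te], k)
        accs.append(np.mean(pred == labels[te]))
    return float(np.mean(accs))

# Synthetic stand-in for flattened dimension-reduced samples: two well-separated classes.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.5, (50, 4)), rng.normal(10.0, 0.5, (50, 4))])
labels = np.array([0] * 50 + [1] * 50)
acc = repeated_holdout_accuracy(features, labels)
print(acc)
```

In practice each reduced tensor sample would be flattened into one row of `features` before classification.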

Fig. 14

Classification Results on subsets of Faces94 male.

Fig. 15

Classification Results on subsets of ETH80.

Fig. 16

Classification Results on subsets of ORL.

Fig. 17

Classification Results on subsets of Olivetti.

Fig. 18

Classification Results on subsets of USPS.

As can be seen from Figs. 14 to 18, LLEMP-TDR shows relatively good classification results for the different numbers of classes on the five datasets. In particular, on the Faces94 male, ORL and Olivetti datasets, the classification accuracy of LLEMP-TDR is much better than that of the comparison algorithms and can reach up to 100%, and a classification accuracy of more than 90% is maintained on the ETH80 and USPS datasets. In general, LLEMP-TDR not only has the best classification effect on the five datasets, but is also robust.

Table 6 summarizes the average classification accuracy of each algorithm over the different subsets of each dataset. According to the table, the LLEMP-TDR proposed in this paper performs better than the other algorithms in most cases.

Table 6

Average classification accuracy on subsets of each dataset

Methods	ETH80	ORL	Olivetti	Faces94 male	USPS
LLEMP-TDR	88.84%	85.89%	86.35%	99.94%	95.87%
GNMF	74.72%	41.56%	44.90%	100%	92.03%
MMF	88.65%	55.96%	57.31%	99.89%	94.38%
LRR-HTD	90.22%	53.05%	61.69%	98.76%	89.46%
MR-NTD	84.87%	31.31%	29.75%	76.69%	78.27%
NTD	84.89%	28.56%	45.52%	94.83%	79.73%

5.5 Embedding dimensionality analysis

In this section, we analyze the accuracy of LLEMP-TDR for different embedding dimensions n × n on the test datasets. The experimental results are shown in Figs. 19–21. On the USPS dataset, the performance in the clustering experiment is poor while the performance in the classification experiment is better, which may be because the dataset is more suitable for classification tasks. On the Faces 94 male dataset, the performance of LLEMP-TDR in both clustering and classification experiments is essentially unaffected by the embedding dimension. Overall, LLEMP-TDR is robust to different embedding dimensions.

Fig. 19

Embedding dimensionality analysis results on Clustering Accuracy

Fig. 20

Embedding dimensionality analysis results on Clustering NMI.

Fig. 21

Embedding dimensionality analysis results on Classification Accuracy.

6 Conclusion

This paper proposes a new dimensionality reduction algorithm for tensor data.

A dimensionality reduction algorithm for tensor data based on LLE is proposed. Although LLE is popular for the dimensionality reduction of vector data, it is not directly applicable to tensor data. In the derivation of tensor LLE, some special techniques, such as the expression of tensor datasets, are exploited.

A dimensionality reduction framework of tensor data based on tensor mode product is proposed. Tensor mode product is an operation in tensor algebra, in which a tensor is expressed as the mode product of another tensor and a number of matrices. Careful selection of the number of rows and columns of matrices can achieve the effect of tensor dimension reduction.

A new tensor dimensionality reduction algorithm based on LLE and mode product is proposed, in which the matrices in the mode product are determined by the LLE criterion. This algorithm preserves both local and global properties of high-dimensional tensor data during dimensionality reduction.

Acknowledgments

This work was supported in part by Science and Technology Program of Guangzhou, China under Grant 68000-42050001 and National Natural Science Foundation of China under Grant 61773022.

References

1. Cheng Y.-C.W. and Poor H.V., Probabilistic Tensor Canonical Polyadic Decomposition With Orthogonal Factors, IEEE Trans Signal Processing 65(3) (2017), 663–676.
2. Zhang, Li, Jing, Liu and Su, Low-Rank Regularized Heterogeneous Tensor Decomposition for Subspace Clustering, IEEE Signal Process Lett 25(3) (2018), 333–337.
3. Jiang, He, Yu, Shao and Peng, Perceptual stereoscopic image quality assessment method with tensor decomposition and manifold learning, IET Image Process 12(5) (2018), 810–818.
4. H.-C., Liu, Feng X.-R. and Zhang S.-Q., Sparsity-Constrained Coupled Nonnegative Matrix–Tensor Factorization for Hyperspectral Unmixing, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13 (2020), 5061–5073.
5. Xiong, Qian, Zhou and Tang Y.Y., Hyperspectral Unmixing via Total Variation Regularized Nonnegative Tensor Factorization, IEEE Transactions on Geoscience and Remote Sensing 57(4) (2019), 2341–2357.
6. Liavas A.P., Kostoulas, Lourakis, Huang and Sidiropoulos N.D., Nesterov-Based Alternating Optimization for Nonnegative Tensor Factorization: Algorithm and Parallel Implementation, IEEE Transactions on Signal Processing 66(4) (2018), 944–953.
7. Fonał and Zdunek, Fast Recursive Nonnegative Standard and Hierarchical Tucker Decomposition, IEEE Signal Processing Letters 26(9) (2019), 1265–1269.
8. Zhou, Cichocki, Zhao, et al., Efficient Nonnegative Tucker Decompositions: Algorithms and Uniqueness, IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society 24(12) (2014), 4990–5003.
9. Bergqvist and Larsson E.G., The higher-order singular value decomposition: theory and an application, IEEE Signal Process Mag 27(3) (2010), 151–154.
10. Sun, Gao, Hong, Mishra and Yin, Heterogeneous Tensor Decomposition for Clustering via Manifold Optimization, IEEE Trans Pattern Anal Mach Intell 38(3) (2016), 476–489.
11. Ng M.K., Cong, Ye and Wu, Manifold Regularization Nonnegative Tucker Decomposition for Tensor Data Dimension Reduction and Representation, IEEE Trans Neural Netw Learn Syst 28(8) (2017), 1787–1800.
12. Jiang, Ding, Tang and Luo, Image Representation and Learning With Graph-Laplacian Tucker Tensor Decomposition, IEEE Trans Cybern 49(4) (2019), 1417–1426.
13. Wang and He, Locality perserving projections algorithm for hyperspectral image dimensionality reduction, 2011 19th International Conference on Geoinformatics, Shanghai, (2011).
14. Wang, He, Bu, Chen and Guan, Image representation using Laplacian regularized nonnegative tensor factorization, Pattern Recognit 44(10–11) (2011), 2516–2526.
15. Flex, Halperin and Thomas J.-C., Available online 2 September (2007).
16. Roweis S.T. and Saul L.K., Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science 290 (2000), 2323–2326.
17. Jain, Verma and Kumar, Patch-Based LLE With Selective Neighborhood for Node Localization, IEEE Sensors Journal 18(9) (2018), 3891–3899.
18. Zhang, Fu, Wang and Feng, Fault Detection Based on Modified Kernel Semi-Supervised Locally Linear Embedding, IEEE Access 6 (2018), 479–48.
19. Yao, Liu, Jiang, Han and Han, LLE Score: A New Filter-Based Unsupervised Feature Selection Method Based on Nonlinear Manifold Embedding and Its Application to Image Recognition, IEEE Transactions on Image Processing 26(11) (2017), 5257–5269.
20. Prabhakar S.K. and Rajaguru, Factor Analysis, Hessian Local Linear Embedding and Isomap for Epilepsy, 2016 Electrical Engineering Conference (EECon), University of Moratuwa, Sri Lanka, (2016), 1–6.
21. Jain, Verma and Kumar, Adaptive Locally Linear Embedding for Node Localization in Sensor Networks, IEEE Sensors Journal 17(9) (2017), 2949–2956.
22. Rajaguru and Kumar Prabhakar, Performance Analysis of Local Linear Embedding (LLE) and Hessian LLE with Hybrid ABC-PSO for Epilepsy Classification from EEG signals, 2018 International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, (2018), 1084–1088.
23. Cai and Han, Graph Regularized Nonnegative Matrix Factorization for Data Representation, IEEE Trans Pattern Anal Mach Intell 33(8) (2011), 1548–1560.
24. Zhang and Zhao, Low-Rank Matrix Approximation with Manifold Regularization, IEEE Trans Pattern Anal Mach Intell 35(7) (2013), 1717–1729.
25. Zhang and Zhao, Low-Rank Matrix Approximation with Manifold Regularization, IEEE Trans Pattern Anal Mach Intell 35(7) (2013), 1717–1729.
26. Liu and Gong, Document Clustering Based on Non-Negative Matrix Factorization, Proc. Int’l Conf. Research and Development in Information Retrieval (2003), 267–273.
27. Yao, Liu, Jiang, Han and Han.
28. Zhou, Cichocki, Zhao, et al., Efficient Nonnegative Tucker Decompositions: Algorithms and Uniqueness, IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society 24(12) (2014), 4990–5003.
29. [Online]. http://cmp.felk.cvut.cz/spacelib/faces/faces94.html.