Local quasi-linear embedding based on Kronecker product expansion of vectors

Abstract

Locally Linear Embedding (LLE) is widely regarded as the first manifold learning algorithm. In general, the relation between a data point and its nearest neighbors is nonlinear, and LLE extracts only its linear part. Local nonlinear embedding is therefore an important direction for improving LLE, but attempts in this direction may significantly increase computational complexity. In this paper, a novel algorithm called local quasi-linear embedding (LQLE) is proposed. In LQLE, each high-dimensional data vector is first expanded using the Kronecker product. The expanded vector contains not only the components of the original vector but also polynomials of those components. Each expanded vector is then linearly approximated by the expanded vectors of its nearest neighbors. In this way, LQLE captures a certain degree of local nonlinearity and learns the dimensionality reduction under the principle of keeping these local nonlinear patterns unchanged. More importantly, LQLE does not increase the computational complexity, because it only replaces the data vectors with their Kronecker product expansions in the original LLE procedure. Experimental comparisons with three other algorithms (LLE, WLLE, and ONPP) on various datasets demonstrate the good performance of the proposed method.

Keywords

Dimensionality reduction, locally linear embedding, local quasi-linear

1 Introduction

Machine learning is a hot topic in artificial intelligence research and has been increasingly and successfully applied to various engineering problems, such as materials engineering [1], satellite control systems [36], supply chain networks [3, 8], bioengineering [5], social engineering [6], energy vehicles [4], cloud models [2], and so on. Research on machine learning [9, 37] usually has to process huge amounts of high-dimensional data. Although high-dimensional data can contain plenty of information, it also includes a large amount of invalid or disturbing information and may even cause the curse of dimensionality. Therefore, data dimensionality reduction is attracting more and more attention. The task of dimensionality reduction is to effectively transform a given high-dimensional data set into a meaningful low-dimensional one. Roughly speaking, dimensionality reduction algorithms can be divided into two kinds: linear and nonlinear. Most traditional algorithms are linear, such as Principal Component Analysis (PCA) [10] and Linear Discriminant Analysis (LDA) [11–13], while the emerging manifold learning algorithms are nonlinear, such as LLE [14], LE [15], HLLE [16], LTSA [17], and ISOMAP [18].

LLE (Locally Linear Embedding), published in Science in 2000, is widely regarded as the first manifold learning algorithm. LLE is a locally linear but globally nonlinear algorithm, in which each high-dimensional data point is linearly approximated by its nearest neighbors. The approximation coefficients are called local linear patterns, and LLE keeps them unchanged during dimensionality reduction. Since the idea of local feature preservation is intuitive and effective, LLE has been developed in many applications and methods [1–35]. However, with the development of manifold learning, the defects of LLE have been exposed. For example, the original LLE is sensitive to neighborhood selection. If the neighborhood is too small, the local geometric structure of the manifold cannot be well preserved. If the neighborhood is too large, the local reconstruction error becomes too large, causing the algorithm to fail. A series of algorithms has been proposed to solve this problem [20–23]. The learning of local linear patterns in LLE is actually a local reconstruction process. This process may cause a relatively large error when the local curvature of the original data is too large, and when the center point of a neighborhood is far away from the plane spanned by its neighbors, local geometric information is also lost. Therefore, improvements of the local reconstruction process focus on finding more reasonable methods to compute the reconstruction weights (approximation coefficients) [26]. LLE linearly reconstructs each high-dimensional data point from its neighbors and the corresponding reconstruction weights in order to learn the local features of the data set. This is the first and most critical step in LLE. In practice, however, an exact linear reconstruction is rarely achievable; in other words, in some cases a high-dimensional data point cannot be linearly reconstructed by its neighbors. To solve this problem, we propose a local quasi-linear embedding method.

In theory, local linear patterns are only an approximation of local nonlinear patterns, so local nonlinear embedding is a natural improvement. Inspired by this, we propose a novel nonlinear dimensionality reduction algorithm named local quasi-linear embedding (LQLE). LQLE uses the Kronecker product to nonlinearly expand each high-dimensional data vector and learns the local geometric structure of the data by exploiting the merits of LLE. The novelty of our method can be summarized as follows: 1) the proposed method can learn an effective low-dimensional embedded representation of high-dimensional data; 2) the proposed method captures some form of local nonlinear patterns in the data and keeps them unchanged during dimensionality reduction; 3) the proposed method does not increase the computational complexity, because it only replaces the data vectors with their Kronecker product expansions in the original LLE procedure; 4) the proposed method can be easily transplanted to the application fields of LLE.

The remainder of the paper is organized as follows. Section 2 introduces related work. Our main work is presented in Section 3, including the extended vectors of data vectors, the local quasi-linearity of high-dimensional data, and the novel LQLE. Section 4 presents the experimental results and analysis. Finally, Section 5 summarizes our work.

2 Related work

LLE has attracted the attention of researchers ever since it was proposed, and it has high value in both research and application. Many improved algorithms have emerged since then, and detailed overviews are available [19].

2.1 Local linear embedding

The main idea of LLE is to approximate each data point by a linear combination of its neighbors and to preserve the combination weights in the low-dimensional space. Let X = {x₁, ⋯ , x_N} ⊆ R^D be a high-dimensional data set of N points in D dimensions. The purpose of LLE is to find the low-dimensional embedding of X in the embedding space R^d (d ⪡ D). We denote the low-dimensional embedding of the N points by Y = {y₁, ⋯ , y_N} ⊆ R^d.

LLE first finds the k nearest neighbors of each data point x_i, i = 1, ⋯ , N. The neighbor set is defined as N_i = {x_i1, ⋯ , x_ik}. For each sample x_i, the linear weights {w_ij, j = 1, ⋯ , k} can be computed by solving the following problem:

$\begin{matrix} \min_{W} \sum_{i = 1}^{N} \| x_{i} - \sum_{j = 1}^{k} w_{ij} x_{ij} \|_{F}^{2} \\ \text{s.t.} \; \sum_{j = 1}^{k} w_{ij} = 1, \quad i = 1, \dots, N \end{matrix}$ (1)

After the linear weights W = {w_ij, j = 1, ⋯ , k, i = 1, ⋯ , N} are learned, the low-dimensional embedding that best preserves the geometric properties of the original space can be found. Denoting by y_ij the embedding of the neighbor x_ij, this amounts to solving:

$\begin{matrix} \min_{Y} \sum_{i = 1}^{N} \| y_{i} - \sum_{j = 1}^{k} w_{ij} y_{ij} \|_{F}^{2} \\ \text{s.t.} \; \sum_{i = 1}^{N} y_{i} = 0, \; \frac{1}{N} \sum_{i = 1}^{N} y_{i} y_{i}^{T} = I \end{matrix}$ (2)

2.2 Improvements of LLE

LLE reduces the dimension of data by locally preserving the relationship between each data point and its neighborhood, so the selection of the neighborhood plays an important role in the performance of the algorithm. The original LLE uses the Euclidean distance directly to find the neighbors. Some later works introduce other distance measures for selecting neighbors.

Kernel Local Linear Embedding (KLLE) [20], proposed by Dennis DeCoste, finds the k nearest neighbors with a kernel distance (linear kernel) instead of the Euclidean distance used in the original LLE algorithm. KLLE demonstrated the feasibility of substituting the Euclidean distance with a kernel distance. Varini observed that selecting neighbors by Euclidean distance may pair two points that are close in Euclidean distance but far apart in geodesic distance. To avoid this situation, LLE with geodesic distance (ISOLLE) [21] was proposed, which searches for neighbors using the geodesic distance. Experiments show that ISOLLE handles this situation effectively. Weighted Local Linear Embedding (WLLE) [22] modifies LLE with a weighted distance [23], which gives data in high-density regions a smaller weight scaling and data in low-density regions a larger one. WLLE works well in preventing the selected neighbors from over-congregating in unevenly distributed localities. Rank-Order Distance [24] was proposed to solve the problem that different sparsity in different clusters leads to inaccurate image classification in face recognition. It was later applied to neighbor selection in LLE: like the preceding algorithms, LLE based on Rank-Order Distance [25] alleviates the problem of uneven density to some extent.

Some works focus on using different reconstruction weights to improve LLE. Because the embedding is determined by the weight matrix, one direction of improvement is to find more reasonable methods for computing the reconstruction weights. Nonlinear Embedding Preserving Multiple Local-Linearities (NEML) [26] proposed a nonlinear pattern with multiple weights and showed that multiple sets of approximately optimal local reconstruction weights can be used to improve the stability, and hence the performance, of LLE.

2.3 Out-of-sample extension of LLE

The original LLE algorithm did not provide an effective solution to the out-of-sample problem. To solve this problem, NPE [27], NPP [28] and ONPP [29] were proposed. These algorithms try to match the result of LLE with a linear projection from the high-dimensional space to the low-dimensional space. Given the high-dimensional data set X ⊆ R^D, they attempt to find its low-dimensional embedding Y ⊆ R^d and an explicit projection from R^D to R^d at the same time, that is, a projection matrix V ∈ R^{D×d} that satisfies Y = V^TX. To this end, optimization problem (2) is transformed into

$\begin{matrix} \min_{V} \sum_{i = 1}^{N} \| V^{T} x_{i} - \sum_{j = 1}^{k} w_{ij} V^{T} x_{ij} \|_{F}^{2} \\ \text{s.t.} \; \sum_{i = 1}^{N} V^{T} x_{i} = 0, \; \frac{1}{N} \sum_{i = 1}^{N} V^{T} x_{i} x_{i}^{T} V = I \quad \text{(NPP, NPE)} \\ \text{s.t.} \; \sum_{i = 1}^{N} V^{T} x_{i} = 0, \; V^{T} V = I \quad \text{(ONPP)} \end{matrix}$ (3)

After training, the out-of-sample points x_o can be directly projected into low-dimensional space by y_o = V^Tx_o.

Based on these works, Hong Qiao proposed a nonlinear projection method, NPPE, and its simplified version, SNPPE [23], which extend the linear projection to a vector polynomial. The components of y_i are then not simply linear combinations of the components of x_i but polynomials of them. The projection y_i = V^Tx_i is rewritten as $y_{i} = V_{p}^{T} x_{ip}$, where x_ip is defined as $x_{ip} = \left[\begin{array}{l} x_{i} \\ x_{i} \otimes x_{i} \\ \vdots \\ \underbrace{x_{i} \otimes \dots \otimes x_{i}}_{p} \end{array}\right]$ and ⊗ stands for the Kronecker product. V_p denotes V extended to the corresponding number of rows. The idea of NPPE is similar to that of NPE or NPP, except that x_ip must be computed from x_i before embedding or projection.
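As a rough illustration of the expansion x_ip (not the NPPE implementation itself), the following NumPy sketch stacks the full Kronecker powers of a vector. The helper names `kron_power` and `nppe_expand` are hypothetical and chosen only for this example.

```python
import numpy as np

def kron_power(x, order):
    """Full Kronecker power x ⊗ x ⊗ ... ⊗ x with `order` factors."""
    out = x
    for _ in range(order - 1):
        out = np.kron(out, x)
    return out

def nppe_expand(x, p):
    """Stack x, x ⊗ x, ..., up to the p-fold Kronecker power (hypothetical helper)."""
    return np.concatenate([kron_power(x, k) for k in range(1, p + 1)])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# For D = 2 and p = 2 the entries are x1, x2, x1*x1, x1*x2, x2*x1, x2*x2.
x2 = nppe_expand(x, 2)

# Inner products of Kronecker powers are powers of the inner product.
lhs = kron_power(x, 2) @ kron_power(z, 2)
rhs = (x @ z) ** 2
```

The expanded vector has D + D² + ⋯ + D^p entries, so the full expansion grows quickly with p. The last two lines illustrate the standard identity (x ⊗ x)ᵀ(z ⊗ z) = (xᵀz)², which is why inner products of full Kronecker expansions can be evaluated without forming the expansions.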

In recent years, besides improvements to the LLE algorithm itself, the development of LLE has mostly concerned its applications [24, 31]; for example, a sparse linear embedding based on LLE and ONPP has been proposed. Besides manifold learning itself, another important branch of LLE application is image processing, such as feature extraction [25, 32]. Moreover, the value of LLE is not limited to algorithm research; it has also been demonstrated in various engineering problems, which confirms its practicability [1, 33–35].

3 Local quasi-linear embedding (LQLE)

Given the high-dimensional dataset X = [x₁ ⋯ x_N] ∈ R^D×N, the task of dimensionality reduction is to obtain a low-dimensional dataset Y = [y₁ ⋯ y_N] ∈ R^d×N, where y_n is the low-dimensional representation of x_n, n = 1, ⋯ , N, and d ⪡ D. In this paper, we propose a local quasi-linear embedding (preserving) algorithm for data dimensionality reduction. Note: in this section, a dataset is represented by a matrix, each column of which is a data vector.

3.1 The extended vectors of data vectors

Let x = [x⁽¹⁾ ⋯ x^(D)]^T ∈ R^D denote a data vector, in which x⁽ⁱ⁾ represents the ith component of x, i = 1, ⋯ , D. Then $\underbrace{x ⊙ \dots ⊙ x}_{p}$ denotes the p-order simplified Kronecker product of x, p ≥ 2. Here 'simplified' means that repeated components are removed, so that each distinct product of p components is kept once and the p-order simplified Kronecker product has $\binom{D + p - 1}{p} = \frac{(D + p - 1)!}{p! (D - 1)!}$ components. For example, for D = 2 and p = 2,

$\begin{matrix} x \otimes x = \left[\begin{matrix} x^{(1)} \left[\begin{matrix} x^{(1)} \\ x^{(2)} \end{matrix}\right] \\ x^{(2)} \left[\begin{matrix} x^{(1)} \\ x^{(2)} \end{matrix}\right] \end{matrix}\right] = \left[\begin{matrix} x^{(1)} x^{(1)} \\ x^{(1)} x^{(2)} \\ x^{(2)} x^{(1)} \\ x^{(2)} x^{(2)} \end{matrix}\right] \\ \Rightarrow_{\text{simplified}} \\ x ⊙ x = \left[\begin{matrix} x^{(1)} x^{(1)} \\ x^{(2)} x^{(2)} \\ x^{(1)} x^{(2)} \end{matrix}\right] \end{matrix}$ (4)

The p-order extended vector x^p of x is then defined as

$\begin{matrix} x = \left[\begin{matrix} x^{(1)} \\ \vdots \\ x^{(D)} \end{matrix}\right] \in R^{D} \Rightarrow x^{p} = \left[\begin{matrix} x \\ x ⊙ x \\ \vdots \\ \underbrace{x ⊙ \dots ⊙ x}_{p} \end{matrix}\right] \in R^{D_{p}} \end{matrix}$ (5) where $D_{p} = D + \sum_{k = 2}^{p} \binom{D + k - 1}{k}$, p ≥ 2. Furthermore, for p = 1 we set x^1 = x, so that D_1 = D.
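The construction of the extended vector can be sketched in a few lines of Python. This is only a minimal sketch under the assumption that 'simplified' means keeping each distinct product of components exactly once; the function names `simplified_kron`, `extend`, and `extended_dim` are hypothetical. Under that assumption the order-k block has `comb(D + k - 1, k)` entries. The ordering of the entries inside a block is an arbitrary choice here; it does not matter as long as the same ordering is used for every data vector.

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def simplified_kron(x, order):
    """Each distinct product of `order` components of x, kept once."""
    x = np.asarray(x, dtype=float)
    index_sets = combinations_with_replacement(range(x.size), order)
    return np.array([np.prod(x[list(c)]) for c in index_sets])

def extend(x, p):
    """p-order extended vector: x stacked with its simplified products of order 2..p."""
    x = np.asarray(x, dtype=float)
    blocks = [x] + [simplified_kron(x, k) for k in range(2, p + 1)]
    return np.concatenate(blocks)

def extended_dim(D, p):
    """Length of the p-order extended vector of a D-dimensional vector."""
    return D + sum(comb(D + k - 1, k) for k in range(2, p + 1))

x = np.array([2.0, 3.0])
x2 = extend(x, 2)   # entries: x1, x2, x1*x1, x1*x2, x2*x2
```

For a 256-dimensional vector and p = 2, `extended_dim(256, 2)` gives 33152, which shows how quickly the expanded dimension grows even at the lowest nonlinear order.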

3.2 The local quasi-linearity for high-dimensional data

For each x_n, let x_n1, ⋯ , x_nK be its K nearest neighbors in the given data set, and let $x_{n}^{p}$ and $x_{n1}^{p}, \dots, x_{nK}^{p}$ be their respective extended vectors. In LQLE, $x_{n}^{p}$ is linearly approximated by solving

$\begin{matrix} \min_{w_{n}} \| x_{n}^{p} - \sum_{k = 1}^{K} w_{nk} x_{nk}^{p} \|_{F}^{2} \\ \text{s.t.} \; \sum_{k = 1}^{K} w_{nk} = 1 \end{matrix}$ (6) where w_n = [w_n1 ⋯ w_nK]^T. It is clear that w_n reflects some nonlinear relation between x_n and its nearest neighbors {x_n1, ⋯ , x_nK}.

The derivation of w_n is as follows. Using the constraint $\sum_{k = 1}^{K} w_{nk} = 1$ to write $x_{n}^{p} = \sum_{k = 1}^{K} w_{nk} x_{n}^{p}$, we have

$\begin{matrix} \| x_{n}^{p} - \sum_{k = 1}^{K} w_{nk} x_{nk}^{p} \|_{F}^{2} = \| \sum_{k = 1}^{K} w_{nk} (x_{nk}^{p} - x_{n}^{p}) \|_{F}^{2} \\ = \| \left[\begin{matrix} x_{n1}^{p} - x_{n}^{p} & \dots & x_{nK}^{p} - x_{n}^{p} \end{matrix}\right] \left[\begin{matrix} w_{n1} \\ \vdots \\ w_{nK} \end{matrix}\right] \|_{F}^{2} \\ = \| {\bar{X}}_{n}^{p} w_{n} \|_{F}^{2} = w_{n}^{T} ({\bar{X}}_{n}^{p})^{T} {\bar{X}}_{n}^{p} w_{n} \end{matrix}$ (7) where ${\bar{X}}_{n}^{p} = [x_{n1}^{p} - x_{n}^{p} \; \dots \; x_{nK}^{p} - x_{n}^{p}] \in R^{D_{p} \times K}$. Then Eq. (6) can be solved by minimizing

$φ (w_{n}) = w_{n}^{T} ({\bar{X}}_{n}^{p})^{T} {\bar{X}}_{n}^{p} w_{n} + λ (w_{n}^{T} Γ_{K} - 1)$ (8) By using the Lagrange method, i.e., taking the partial derivative of the above function (8) with respect to w_n and λ respectively, we get

$w_{n} = \frac{{(({\bar{X}}_{n}^{p})^{T} {\bar{X}}_{n}^{p})}^{- 1} Γ_{K}}{Γ_{K}^{T} {(({\bar{X}}_{n}^{p})^{T} {\bar{X}}_{n}^{p})}^{- 1} Γ_{K}}$ (9) where Γ_K = [1 ⋯ 1]^T ∈ R^K. It should be noted that although ${\bar{X}}_{n}^{p}$ may be a huge matrix, $({\bar{X}}_{n}^{p})^{T} {\bar{X}}_{n}^{p} \in R^{K \times K}$ is a small one. Moreover, ${\bar{X}}_{n}^{p}$ is generated by known operations on x_n and {x_n1, ⋯ , x_nK} (see Eqs. (5) and (7)). Therefore, it is unnecessary to generate ${\bar{X}}_{n}^{p}$ explicitly in practice: we can compute $({\bar{X}}_{n}^{p})^{T} {\bar{X}}_{n}^{p}$ directly and take its inverse.
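A minimal NumPy sketch of Eq. (9) for a single sample is given below, assuming the extended vectors are already available as arrays. The function name `lqle_weights` is hypothetical, and the ridge term `reg` is an assumption borrowed from common LLE practice to keep the K×K matrix invertible (for instance when K exceeds the local dimension); it is not part of Eq. (9). For readability the sketch forms the difference matrix explicitly and solves a linear system instead of inverting the matrix.

```python
import numpy as np

def lqle_weights(xp_n, xp_neighbors, reg=1e-3):
    """Reconstruction weights of Eq. (9) for one sample (hypothetical helper).

    xp_n:         extended vector of the sample, shape (Dp,)
    xp_neighbors: extended vectors of its K neighbors, shape (K, Dp)
    reg:          relative ridge term (assumption, not in the paper)
    """
    diffs = np.asarray(xp_neighbors, dtype=float) - np.asarray(xp_n, dtype=float)
    Xbar = diffs.T                           # columns are x_nk^p - x_n^p, shape (Dp, K)
    G = Xbar.T @ Xbar                        # K x K Gram matrix
    K = G.shape[0]
    G = G + reg * (np.trace(G) / K + 1e-12) * np.eye(K)
    ones = np.ones(K)
    u = np.linalg.solve(G, ones)             # G^{-1} 1 without forming the inverse
    return u / (ones @ u)                    # normalize so the weights sum to one

# Toy check: the sample is the midpoint of its two neighbors.
w = lqle_weights([0.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]])
```

In the toy check the sample lies exactly halfway between its two neighbors, so the weights come out equal. Applying the function to extended vectors instead of the raw data vectors is the only change relative to the corresponding LLE step.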

LQLE takes into account the fitting relationship between the high-order components, so the W computed in LQLE differs from the weight matrix in LLE.

3.3 Data dimensionality reduction based on local quasi-linear representation

Corresponding to the local data x_n and {x_n1, ⋯ , x_nK} in the high-dimensional space, the dimension-reduced data in the low-dimensional space are y_n and {y_n1, ⋯ , y_nK}. The LQLE proposed in this paper preserves the same local pattern w_n between y_n and {y_n1, ⋯ , y_nK}, i.e.,

$\begin{matrix} \min_{y_{n}, y_{n1}, \dots, y_{nK}} \| y_{n}^{p} - \sum_{k = 1}^{K} w_{nk} y_{nk}^{p} \|_{F}^{2} \\ \text{s.t.} \; \sum_{k = 1}^{K} w_{nk} = 1 \end{matrix}$ (10)

Expanding the objective further,

$\begin{matrix} \| y_{n}^{p} - \sum_{k = 1}^{K} w_{nk} y_{nk}^{p} \|_{F}^{2} \\ = \| \left[\begin{matrix} y_{n}^{p} & y_{n1}^{p} & \dots & y_{nK}^{p} \end{matrix}\right] \left[\begin{matrix} 1 \\ - w_{n1} \\ \vdots \\ - w_{nK} \end{matrix}\right] \|_{F}^{2} \\ = \| Y_{n}^{p} \left[\begin{matrix} 1 \\ - w_{n} \end{matrix}\right] \|_{F}^{2} = \| Y^{p} S_{n} \left[\begin{matrix} 1 \\ - w_{n} \end{matrix}\right] \|_{F}^{2} \\ = \| Y^{p} L_{n} \|_{F}^{2} \\ = \mathrm{tr} (Y^{p} L_{n} L_{n}^{T} (Y^{p})^{T}) \end{matrix}$ (11) where $Y_{n}^{p} = \left[\begin{matrix} y_{n}^{p} & y_{n1}^{p} & \dots & y_{nK}^{p} \end{matrix}\right]$, $Y^{p} = \left[\begin{matrix} y_{1}^{p} & \dots & y_{N}^{p} \end{matrix}\right]$, and $L_{n} = S_{n} \left[\begin{matrix} 1 \\ - w_{n} \end{matrix}\right] \in R^{N}$. Here $S_{n} \in R^{N \times (K + 1)}$ is a selection matrix such that $Y_{n}^{p} = Y^{p} S_{n}$. Eq. (11) only considers the local nonlinear preservation of a single sample. Taking into account the local nonlinear preservation of all data, we get

$\begin{matrix} \min \sum_{n = 1}^{N} \| y_{n}^{p} - \sum_{k = 1}^{K} w_{nk} y_{nk}^{p} \|_{F}^{2} \\ = \min \sum_{n = 1}^{N} \mathrm{tr} (Y^{p} L_{n} L_{n}^{T} (Y^{p})^{T}) \\ = \min \mathrm{tr} (Y^{p} (\sum_{n = 1}^{N} L_{n} L_{n}^{T}) (Y^{p})^{T}) \\ = \min \mathrm{tr} (Y^{p} L L^{T} (Y^{p})^{T}) \\ \text{s.t.} \; Y^{p} (Y^{p})^{T} = I_{d} \end{matrix}$ (12) where $L = [L_{1} \; \dots \; L_{N}] \in R^{N \times N}$. Note that although Y^p is a huge matrix, we only need the low-dimensional data matrix Y, which consists of the first d rows of Y^p. Therefore, we only need to find the d eigenvectors of LL^T corresponding to its d smallest eigenvalues. Problem (12) is a typical generalized Rayleigh quotient problem, so it can be solved in closed form, i.e., Y can be obtained directly from the eigenvalue decomposition of LL^T.
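The embedding step can be sketched as follows, assuming the weights and neighbor indices have already been computed. The function name `lqle_embed` and its arguments are hypothetical. One further assumption is labeled in the code: because each w_n sums to one, the constant vector is an eigenvector of LL^T with eigenvalue zero, and the sketch skips it by default, as is common practice for LLE; the text above does not state this step explicitly.

```python
import numpy as np

def lqle_embed(W, neighbors, d, skip_constant=True):
    """Embedding from the eigenvectors of L L^T (hypothetical helper).

    W:         reconstruction weights, shape (N, K), each row sums to one
    neighbors: indices of the K neighbors of each sample, shape (N, K)
    d:         target dimension
    Returns Y with shape (d, N), one column per sample.
    """
    W = np.asarray(W, dtype=float)
    neighbors = np.asarray(neighbors)
    N = W.shape[0]
    L = np.zeros((N, N))
    for n in range(N):
        L[n, n] = 1.0                    # column n of L is L_n = S_n [1; -w_n]
        L[neighbors[n], n] = -W[n]
    M = L @ L.T
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    # Assumption: drop the constant eigenvector (eigenvalue 0), as in standard LLE.
    start = 1 if skip_constant else 0
    return eigvecs[:, start:start + d].T

# Toy example: 8 samples on a ring, each the average of its two ring neighbors.
N = 8
neighbors = np.array([[(n - 1) % N, (n + 1) % N] for n in range(N)])
W = np.full((N, 2), 0.5)
Y = lqle_embed(W, neighbors, d=2)
```

Since `numpy.linalg.eigh` returns orthonormal eigenvectors, the rows of `Y` satisfy the orthonormality constraint of Eq. (12) automatically. For large N a sparse eigensolver would be the natural replacement for the dense decomposition used here.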

4 Experimental results

In this section, we verify the effectiveness of the proposed algorithm through dimensionality reduction and classification experiments on both toy and real-world data sets, comparing it with the LLE [14], WLLE [22], and ONPP [29] algorithms. Among these, LLE [14] is the classical local linear embedding algorithm, WLLE [22] is a representative improved LLE algorithm based on neighborhood selection, and ONPP [29] is a typical out-of-sample extension of LLE.

4.1 Datasets description and experimental setting

Toy datasets: two data sets are generated by randomly selecting 2000 data samples from the Swiss Roll and Swiss Hole, respectively.

USPS [38]: a well-known handwritten digit data set (Fig. 1) that contains handwritten images of 10 classes corresponding to the digits 0 to 9, with 1100 sample images per class. The size of each sample image is 16×16. We randomly select 200 images from each class to construct the experimental data set, with a total of 2000 images.

Binary Alphadigits [39]: another handwritten data set (Fig. 2) that contains binary 20×16 images of the digits 0 through 9 and the capital letters A through Z, with 39 examples per class. We only use the capital letters A through Z and randomly select 30 images from each class to construct the experimental data set, with a total of 780 images.

Fig. 1. Partial data of USPS data set.

Fig. 2. Partial data of Binary Alphadigits data set.

COIL20 [40]: an object data set (Fig. 3) that contains 20 objects photographed from different angles, with 72 images per object. Each image is uniformly resized to 32×32.

Fig. 3. Partial data of COIL20 data set.

Table 1 lists the parameter settings of the three real-world datasets in the experiments.

Table 1. Parameter settings of the data sets

Parameter	USPS	Binary Alphadigits	COIL20
Number of Samples	2000	780	1400
Original Dimensionality	256	320	256
Number of Categories	10	26	20
Number of Train Set	1000	390	700
Number of Testing Set	1000	390	700

4.2 Dimensionality reduction on toy datasets

We first compare the LQLE algorithm with the LLE algorithm on the two toy data sets. The number of neighbors is set to K = 10 and the polynomial order to p = 2. The experimental results are shown in Fig. 4. It can be seen that LLE performs poorly when K = 10, whereas the proposed LQLE achieves a better result. This suggests that LQLE introduces more local information into the dimensionality reduction.

Fig. 4. The dimensionality reduction results of LLE and LQLE on the toy data sets.

4.3 Classification experimental results

In the classification experiments, we apply the KNN classifier with K = 8 to the dimension-reduced data sets and use a cross-validation procedure to compute the classification accuracy. We randomly select 50% of the dimension-reduced data from each category as the training set and use the remaining 50% as the test set. This process is repeated 5 times, and the average classification accuracy over the 5 runs is reported as the final result. For each compared algorithm, the number of neighborhood points is chosen to be about 1% of the number of samples in the corresponding data set. Dim. denotes the dimension of the dimensionality reduction result. Dim. ranges from 2 to 100 on the USPS and Binary Alphadigits data sets and from 2 to 200 on the COIL20 data set. The polynomial order in LQLE is set to p = 2.
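The evaluation protocol described above can be sketched with plain NumPy as follows. This is an illustrative sketch, not the code used for the reported experiments: the function names, the random seed, and the synthetic two-class data are all assumptions, and samples are stored as rows here, so an embedding stored column-wise would be passed in transposed. Labels are assumed to be non-negative integers.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=8):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]                       # shape (n_test, k)
    return np.array([np.bincount(v).argmax() for v in votes])

def stratified_half_split(y, rng):
    """Randomly put half of each class in the training set, the rest in the test set."""
    train = []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        train.extend(idx[: len(idx) // 2])
    mask = np.zeros(len(y), dtype=bool)
    mask[np.array(train)] = True
    return np.flatnonzero(mask), np.flatnonzero(~mask)

def repeated_accuracy(Y, labels, repeats=5, k=8, seed=0):
    """Average KNN accuracy over `repeats` random 50/50 per-class splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(repeats):
        tr, te = stratified_half_split(labels, rng)
        pred = knn_predict(Y[tr], labels[tr], Y[te], k)
        accs.append(np.mean(pred == labels[te]))
    return float(np.mean(accs))

# Synthetic stand-in for a dimension-reduced data set: two well-separated classes.
rng = np.random.default_rng(0)
Y_demo = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(20.0, 1.0, (40, 2))])
labels_demo = np.repeat([0, 1], 40)
acc = repeated_accuracy(Y_demo, labels_demo)
```

Because the split is drawn independently in each repetition, the procedure is a repeated random hold-out; fixing the seed makes the reported average reproducible.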

Table 2 lists the average classification results of the four compared algorithms on the USPS data set; for easier observation, the same results are also plotted in Fig. 5. When Dim. is less than 60, ONPP has the worst performance on USPS, which may be caused by its strict global linear constraint. In most cases WLLE gives the best results, our LQLE is second, and LLE is last, but when Dim. is less than 15 our method works better than WLLE. When Dim. is less than 50, the performance of almost all algorithms improves as Dim. increases. Overall, LQLE performs better than LLE (average accuracy 0.7598 versus 0.7062), which indicates that the proposed method is an improvement over LLE. Moreover, as the dimension increases, the performance of LQLE remains fairly stable, which suggests that our algorithm is robust.

Table 2. Average classification accuracy of the four compared algorithms versus embedding dimension (Dim.) on the USPS data set. LQLE is the proposed algorithm.

Dim.	LLE	WLLE	ONPP	LQLE
2	0.601	0.398	0.159	0.579
5	0.743	0.647	0.251	0.705
10	0.73	0.747	0.323	0.77
20	0.697	0.828	0.382	0.816
30	0.729	0.821	0.456	0.797
40	0.734	0.842	0.577	0.807
50	0.745	0.825	0.691	0.786
60	0.756	0.827	0.781	0.781
80	0.701	0.819	0.866	0.778
100	0.626	0.801	0.898	0.779
Average	0.7062	0.7555	0.5384	0.7598

Fig. 5. The average classification results of four compared algorithms on USPS data set. LQLE is the proposed algorithm.

Table 3 lists the average classification results of the four compared algorithms on the Binary Alphadigits data set for different embedding dimensions, and Fig. 6 plots them. When Dim. is less than 60, WLLE has the worst performance on Binary Alphadigits, which means that on this data set the weighted distance used in WLLE does not improve on LLE. When Dim. is less than 10, ONPP performs worse than LLE and our LQLE. At Dim. = 40 and for Dim. of 60 or more, LQLE performs better than LLE. Overall, LQLE maintains good performance as the dimension increases, and the averages reported in the last row of Table 3 further illustrate the robustness of our algorithm.

Table 3. Average classification accuracy of the four compared algorithms versus embedding dimension (Dim.) on the Binary Alphadigits data set. LQLE is the proposed algorithm.

Dim.	LLE	WLLE	ONPP	LQLE
2	0.78	0.0986	0.3357	0.78
5	0.8086	0.2643	0.73	0.79
10	0.8757	0.5	0.8814	0.8743
20	0.9429	0.6571	0.95	0.93
30	0.9486	0.7386	0.9557	0.9471
40	0.9586	0.7629	0.9557	0.9629
50	0.9586	0.7971	0.9629	0.95
60	0.9543	0.85	0.9657	0.9586
80	0.9471	0.95	0.9686	0.9629
100	0.91	0.9529	0.9714	0.95
Average	0.9084	0.6572	0.8677	0.9106

Fig. 6. The average classification results of four compared algorithms on Binary Alphadigits data set. LQLE is the proposed algorithm.

Table 4 lists the average classification results of the four compared algorithms on the COIL20 data set for different embedding dimensions, and Fig. 7 plots them. Unlike on Binary Alphadigits, WLLE has the best or nearly the best performance when Dim. is less than 70, followed by our algorithm, which also works well on COIL20. However, the performance of WLLE is not stable as Dim. increases. Overall, LQLE performs better than LLE in most cases and maintains good performance on COIL20 as Dim. increases. Furthermore, the averages reported in the last row of Table 4 show that our algorithm has the best overall result. To sum up, LQLE maintains good performance on three different data sets, which further supports the robustness of the proposed algorithm.

Fig. 7. The average classification results of four compared algorithms on COIL20 data set. LQLE is the proposed algorithm.

Table 4. Average classification accuracy of the four compared algorithms versus embedding dimension (Dim.) on the COIL20 data set. LQLE is the proposed algorithm.

Dim.	LLE	WLLE	ONPP	LQLE
2	0.3513	0.3487	0.0667	0.3308
5	0.5128	0.5205	0.1077	0.5231
10	0.6179	0.6359	0.1128	0.6
20	0.5897	0.6462	0.2026	0.6462
30	0.5897	0.6641	0.2949	0.6205
40	0.6	0.6615	0.341	0.6231
50	0.5974	0.641	0.3974	0.6231
60	0.6103	0.6385	0.4667	0.6308
80	0.5846	0.6179	0.5564	0.6026
100	0.5846	0.5641	0.6256	0.6128
150	0.6179	0.4487	0.7026	0.5769
200	0.559	0.3872	0.7026	0.5744
Average	0.5679	0.5645	0.3814	0.5804

5 Conclusions

In this paper, we propose a novel nonlinear dimensionality reduction algorithm named local quasi-linear embedding (LQLE). The proposed method nonlinearly expands each data vector using the Kronecker product and then performs local linear embedding on the nonlinearly expanded data vectors. LQLE explores the local nonlinear structure of the data and learns an effective low-dimensional embedded representation of the high-dimensional data. In general, nonlinear improvements to the local linear patterns of LLE may lead to a significant increase in computational complexity. Compared with LLE, LQLE hardly increases the computational complexity and inherits the merits of LLE. Therefore, it can be easily transplanted to the application fields of LLE. We also conduct dimensionality reduction and classification experiments on both toy and real-world data sets to demonstrate the effectiveness of the proposed method.

Acknowledgment

This work is supported in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2019A1515111143 and by the Natural Science Foundation of China under Grant 62006042.

References

Lopez

, Gonzalez

, Aguado

J.V.

, Abisset-Chavanne

, Cueto

, Binetruy

, Chinesta

, A Manifold Learning Approach for Integrated Computational Materials Engineering, Archives of Computational Methods in Engineering 25(1) (2018), 59–68.

Zhang

, Tian

G.D.

, Fathollahi-Fard

A.M.

, Wang

W.J.

, Wu

, Li

Z.W.

, Interval-Valued Intuitionistic Uncertain Linguistic Cloud Petri Net and Its Application to Risk Assessment for Subway Fire Accident, IEEE Transactions on Automation Science and Engineering (2020). DOI: 10.1109/TASE.2020.3014907

Fathollahi-Fardab

A.M.

, Ahmadia

, Mirzapour Ale-Hashemac

S.M.J.

, Sustainable Closed-loop Supply Chain Network for an Integrated Water Supply and Wastewater Collection System under Uncertainty, Journal of Environmental Management 275(111277) (2020).

Yua

H.J.

, Dai

H.L.

, Tian

G.D.

, Wu

B.B.

, Zhang

T.Z.

, Fathollahi-Fard

A.M.

, He

, Tang

, Key technology and application analysis of quick coding for recovery of retired energy vehicle battery, Renewable and Sustainable Energy Reviews 135(110129) (2021).

Fathollahi-Fard

A.M.

, Hajiaghaei-Keshteli

, Tavakkoli-Moghaddam

, Red deer algorithm (RDA): a new nature-inspired meta-heuristic, Soft Computing 24 (2020), 14637–14665.

Fathollahi-Fard

A.M.

, Hajiaghaei-Keshteli

, Tavakkoli-Moghaddam

, The social engineering optimizer (SEO), Engineering Applications of Artificial Intelligence 72 (2018), 267–293.

Moosavi

, Naeni

L.M.

, Fathollahi-Fard

A.M.

, Blockchain in supply chain management: a review, bibliometric, and network analysis, Environmental Science and Pollution Research (2021), 1–15.

Islam

M.R.

, Ali

S.M.

, Fathollahi-Fard

A.M.

, Kabir

, A novel particle swarm optimization-based grey model for the prediction of warehouse performance, Journal of Computational Design and Engineering 8(2) (2021), 705–727.

Izonin

, Kryvinska

, Tkachenko

, Zub

, Vitynskyi

, An Extended-Input GRNN and its Application, Procedia Computer Science 160 (2019), 578–583.

10.

Turk

, Pentland

, Eigenfaces for recognition, J Cognit Neurosci 3(1) (1991), 71–86.

11.

Belhumeur

P.N.

, Hespanha

J.P.

, Kriegman

, Eigenfaces vs. Fisherfaces: Recognition using class specifific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7) (1997), 711–720.

12.

Fraley

, Raftery

A.E.

, Model-based clustering, discriminant analysis, and density estimation, Journal of the American Statistical Association 97(458) (2002), 611–631.

13.

Liu

, Lu

, Ma

, Improving kernel Fisher discriminant analysis for face recognition, IEEE Transactions on Circuits and Systems for Video Technology 14(1) (2004), 42–49.

14.

Roweis

S.T.

, Saul

L.K.

, Nonlinear dimensionality reduction by locally linear embedding, Science 290(5500) (2000), 23230–2326.

15.

Belkin

, Niyogi

, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computing 15(6) (2003), 1373–1396.

16. Donoho D.L., Grimes, Hessian Eigenmaps: Locally Linear Embedding Techniques for High Dimensional Data, Proceedings of the National Academy of Sciences of the United States of America 100(10) (2003), 5591–5596.

17. Zhang, Zha, Principal manifolds and nonlinear dimensionality reduction via tangent space alignment, SIAM Journal on Scientific Computing 26(1) (2004), 313–338.

18. Tenenbaum J.B., Silva V.d., et al., A global geometric framework for nonlinear dimensionality reduction, Science 290(5500) (2000), 2319–2323.

19. Chen, Ma Z.M., Locally Linear Embedding: A Review, International Journal of Pattern Recognition and Artificial Intelligence 25(7) (2011), 985–1008.

20. De Coste, Visualizing Mercer kernel feature spaces via kernelized locally linear embeddings, in Proceedings of the Eighth International Conference on Neural Information Processing, Shanghai, China (2001), 14–18.

21. Varini, Degenhard, Nattkemper T.W., ISOLLE: LLE with geodesic distance, Neurocomputing 69(13) (2006), 1768–1771.

22. Pan, Ge S.S., Al Mamun, Weighted locally linear embedding for dimension reduction, Pattern Recognition 42(5) (2009).

23. Zhou C.Y., Chen Y.Q., Improving nearest neighbor classification with cam weighted distance, Pattern Recognition 39(4) (2006), 635–645.

24. Zhang, Wen, Sun, A rank-order distance based clustering algorithm for face tagging, IEEE Conference on Computer Vision and Pattern Recognition (2011), 481–488.

25. Sun X.L., Lu Y.G., Locally Linear Embedding based on Rank-order Distance, International Conference on Pattern Recognition and Methods (2016), 162–169.

26. Wang, Zhang Z.Y., Nonlinear embedding preserving multiple local-linearities, Pattern Recognition 43(4) (2010), 1257–1268.

27. , Cai, Yan, Zhang, Neighborhood preserving embedding, Proceedings in International Conference on Computer Vision (ICCV) 2 (2005), 1208–1213.

28. Pang, Zhang, Liu, Yu, Li, Neighborhood preserving projections (NPP): A novel linear dimension reduction method, Proceedings in ICIC (1) (2005), 117–125.

29. Kokiopoulou, Saad, Orthogonal neighborhood preserving projections: A projection-based dimensionality reduction technique, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(12) (2007), 2143–2156.

30. Qiao, Zhang, et al., An Explicit Nonlinear Mapping for Manifold Learning, IEEE Transactions on Cybernetics 43(1) (2013), 51–63.

31. Lai Z.H., Wong W.K., Xu, Yang, Zhang, Approximate Orthogonal Sparse Embedding for Dimensionality Reduction, IEEE Transactions on Neural Networks and Learning Systems 27(4) (2015), 723–735.

32. Wang M.D., Yu, Niu L.J., Sun W.D., Unsupervised feature extraction for hyperspectral images using combined low rank representation and locally linear embedding, IEEE ICASSP (2017).

33. , Chen X.F., Zhang X.L., Ding B.Q., Wang S.B., Locally Linear Embedding on Grassmann Manifold for Performance Degradation Assessment of Bearings, IEEE Transactions on Reliability 66(2) (2017), 467–477.

34. Cheng Y.H., Jiang, Lu N.Y., Wang, Xing, Incremental locally linear embedding-based fault detection for satellite attitude control systems, Journal of The Franklin Institute 353(1) (2016), 17–36.

35. Liu Y.H., Zhang Y.S., Yu Z.W., Zeng, Incremental supervised locally linear embedding for machinery fault diagnosis, Engineering Applications of Artificial Intelligence 50 (2016), 60–70.

36. , Xu, Chen Z.G., He, Xie Y.H., Liu M.M., Li, Han, Fault Detection Method of Luojia1-01 Satellite Attitude Control System Based on Supervised Local Linear Embedding, IEEE Access 7 (2019), 105489–105502.

37. Zhang, Ma Z.M., Gan, Dimensionality reduction for tensor data based on local decision margin maximization, IEEE Transactions on Image Processing 30 (2020), 234–248.

38. https://cs.nyu.edu/roweis/data/.

39. http://www.cs.toronto.edu/roweis/data.html.

40. http://dataju.cn/Dataju/web/datasetInstanceDetail/62.