Abstract
To solve the shortcomings of local linear embedding (LLE), such as sensitive to noise and poor generalization ability for new samples, an improved weighted local linear embedding algorithm based on Laplacian eigenmaps (IWLLE-LE) is proposed in this paper. In the proposed algorithm, Laplacian eigenmaps are used to reconstruct the objective function of dimensionality reduction. The weights of it are introduced by combining the geodesic distance with Euclidean distance, which can effectively represent the manifold structure of nonlinear data. Compared the existing LLE algorithm, the proposed one better maintains the original manifold structure of the data. The merit of the proposal is enhanced by the theoretical analysis and numerical experiments, where the classification recognition rate is 2%–8% higher than LLE.
Introduction
Many problems in pattern recognition begin with the pre-processing of raw dimensional signals, such as face images, speech spectrograms, medical image signals for medical diagnose [1, 2, 3]. For convenience of subsequent operations such as classification, regression, image processing and error detection, the reprocessing should extract and highlight the inherent properties hidden in the high-dimensional observations and represent the intrinsic structures in a more compact and efficient way. However, the representations need be learned or discovered automatically in the case that no prior knowledge about the data is known. Dimension reduction is an unsupervised learning method, which can achieve effective dimension reduction. Dimensionality reduction methods are mainly divided into two types, linear dimensionality reduction and nonlinear dimensionality reduction. Linear dimensionality reduction includes principal component analysis (PCA) [4] and multidimensional scaling (MDS) [5].
However, with the improvement of data acquisition technology, the data that needs to be processed is mostly non-linear in reality. As a result, nonlinear dimensionality reduction methods such as kernel-PCA (KPCA) [6], generalized discriminant analysis (GDA), and kernel direct discriminant analysis (KDDA) [7] were proposed to solve the problem of nonlinearity in data distribution. However, these methods may have trouble when very large number of observations are needed to be processed. All of these kernel-based algorithms have some further problems of large training sample, overfitting, and finding suitable kernel functions for each specific data set.
Another important nonlinear dimensionality reduction is mainly represented by manifold learning. It assumes that the data is located at or close to one or more smooth manifolds, and the dimensionality reduction is performed by maintaining the geometric topology implied by the data. Classical manifold learning has locally linear embedding (LLE) [8], Laplacian eigenmaps (LE) [9, 10], Isometric mapping (ISOMAP) [11] and Local tangent space alignment (LTSA) [12].
As a classic nonlinear learning algorithm, the LLE algorithm is intuitive, easy to implement, and globally optimal without the need for iteration. Unlike clustering methods for local dimension reduction, LLE maps its inputs to a low dimensional space, and its optimizations do not involve local minima. By keeping the local symmetries of local linear relation, LLE is able to learn the global structure of nonlinear manifolds.
However, LLE also has many problems. The algorithm is sensitive to noise and has poor generalization ability for new samples. It does not make good use of category information at the same time. In recent years, researchers have proposed a number of improved algorithms. Zou et al. have proposed an improved LLE algorithm [13] using geodesic distance instead of Euclidean distance, which can better describe the flow between neighbours. Donoho et al. proposed a Hessian local linear embedding [14], which replaced the Laplacian operator with a Hessian operator, and improved the ability to process data with unevenness. Kouropteva et al. proposed a supervised local linear embedding algorithm (SLLE) using class labels [15]. By adding weights, they considered the influence of category information on the neighbour distance, where it made the same-class samples closer in the low-dimensional space and non-similar samples farther apart.
LE is also a nonlinear dimensionality reduction algorithm [16, 17, 18] which uses the concept of Laplacian Spectrum to construct the local nearest neighbour relationship of sample points, and map local domain relations into low-dimensional space. By keeping the local distance relation in the low-dimensional space, LE can learn the whole structure of the non-linear manifold. In [19], a Laplacian eigenmaps of adaptive distance weights was proposed, which was better than the original Laplacian eigenmaps for the distance of the manifold structure.
Though there are many improved LLE at present, it still uses K-nearest neighbour method to select points instead of choosing the nearest neighbour points. Hence the shortcomings such as sensitive to noise and sparse sampling are not fully overcame. It can not achieve good dimension reduction effect in these complex data environments. How to solve this shortcoming is a challenge at present. LE is insensitive to noise and has a lower time complexity for obtaining the distance weight by the thermonuclear function. In this paper, an improved weighted local linear embedding algorithm based on Laplacian eigenmaps (IWLLE-LE) is proposed with the characteristics of Laplacian map reconstruction. Furthermore, the geodesic distance is introduced in the IWLLE-LE, which can effectively describe the distance of the sample neighbours on the manifold.
Main related technology introduction
Locally linear embedding
LLE algorithm consists of three steps: firstly, the K-nearest neighbour method is used to get the nearest neighbour points of each sample. Then the weight coefficients of linear representation of each sample point in the original space are solved. Finally, the image of the sample points in the low-dimensional space is obtained by using the weight coefficients of the sample points in the original space. They are described as follows.
Considering the data set
Assuming that the projection of D-dimension sample set
where
where
Given the data set
Let
Utilizing the Lagrangian multiplier method, the optimization problem Eq. (2.2) can be translated into the following generalized eigenvalue problem
The feature vector corresponding to the second to (
In this section, LLE algorithm is improved from the following aspects: (1) Combining with the LE algorithm, we construct a new optimization problem in low-dimensional mapping, which can improve the robustness of the algorithm when the amount of data is insufficient or noisy. (2) To overcome the difficulties of LLE parameter selection, we introduce a new method of weight construction by combining geodesic distance with Euclidean distance. It can reduce the error caused by sparsity and large curvature problems.
Selection of adjacent points for manifold learning. (a) Euclidean distance. (b) Geodesic distance.
In a low-dimensional objective function of LLE, two weights are used. However, only the distance factor is considered and the structural factor is ignored in the two weights. In addition, only the thermonuclear weight is used in the second weight. We take into account both the distance factor and the structural factor in the second weight, and define the weight
In Eq. (7),
Processing results of sparse Swiss Roll data by different algorithms. (a) Sparse Swiss Roll, (b) IWLE-LLE, (c) LLE, (d) LE.
For the distance weight
Processing results of noisy Swiss Roll data by different algorithms. (a) Swiss Roll data with Gauss noise, (b) IWLE-LLE, (c) LLE, (d) LE.
Mapping high-dimensional data to low-dimensional space requires not only maintaining a local linear structure, but also introducing a local neighbor relationship. By adding the weight of Eq. (7) above to increase the robustness of the model, we construct the following optimization problem:
where
where
Following the Lagrange multiplier method, Eq. (3.2) can be transformed into a generalized eigenvalue problem:
The feature vector corresponding to the second to the
To evaluate the performance of dimensionality reduction of the IWLLE-LE, several experiments are made to compare IWLLE-LE to two manifold learning algorithms LLE and LE on manifold datasets. In the first experiment, the artificial dataset Swiss roll of manifold learning is used to verify the performance of IWLLE-LE for sparse samples and anti-noise points. In the second experiment, five real-world data sets including two face datasets, Wine dataset, heart dataset and Iris dataset [20], are used to demonstrate the performance of dimensionality reduction and classification about the three algorithms. In the experiments, the geodesic distance is calculated by Dijkstra method.
Artificial data experiment
In the first experiment, the performance of sparse samples is tested by 400 data from the Swiss roll dataset, and 800 noise-added data are used to test the anti-noise performance of IWLLE-LE.
Figure 2 shows the results that 400 sparse Swiss roll data are processed by the three different algorithms. The IWLLE-LE algorithm is used to reduce the dimensionality of the sparse Swiss roll data, and it is well developed in the two-dimensional space. Although the uppermost yellow sample points and the blue-green sample points have partial overlap, there is still a good data neighbour relationship. The colour gradation is more gradual, and the sample points of each colour are more evenly distributed in the low-dimensional space. The original characteristics of the data and the relationship between the neighbours are well maintained. The colour mixing phenomenon doesn’t occur, and the structural information of the data is well preserved. However, the LLE and LE algorithms don’t maintain well relationship between the neighbours no matter how we adjust the parameter of neighbour points
Figure 3 shows results that 800 data with Gaussian noise are processed by three algorithms. The Fig. 3b shows that after dimensionality reduction, the data points of IWLLE-LE are evenly distributed, the colour of the sample points is gradually expanded, only the yellow part overlaps slightly, and the structure information of the original data remains intact. The Gaussian noise makes the neighbour relationship between data more confusing, therefore, the dimensionality reduction results of LLE and LE algorithms are still not ideal, but the IWLLE-LE algorithm is minimally affected by Gaussian noise and still has good results and strong robustness.
Separability results on the heart dataset.
Separability results on the wine dataset.
ORL face image example.
Yale face image example.
In this part, there are two different methods to measure the performance of dimensionality reduction. One is to use the separability criterion to measure the performance of dimensionality reduction and the other is to use support vector machine to classify dimensionality reduction data.
First, we define the following separability criterion:
where
It can be seen from the Figs 4 and 5 that the separability of IWLLE-LE algorithm is almost higher than LLE, HLLE regardless of the selection of the number of nearest neighbors
The second, we use support vector machine to classify samples after dimensionality reduction. Two face databases and two classification databases are used to verify the performance of the proposed method, which are Olivetti Research Laboratory Database (ORL), Yale Face Database (Yale), wine dataset and Iris dataset. As shown in Table 1, the information description of each dataset is given.
Information description of relevant datasets
ORL database contains 400 images of 40 persons (10 images per person). Facial expressions and facial details are different variations among images. Yale database contains 165 images of 15 persons (11 images per person). Some samples of ORL and Yale are listed in Figs 6 and 7. Each database is divided into two parts as training sets and testing sets. Although testing sets contain all persons, the sample images between the training set and testing set don’t overlap. For the ORL database testing, the first 50% of samples per person are selected as training samples and the rest of samples are used for test. For the Yale face images, the first six samples per person are used for training and the remaining five samples per person are used for testing.
Wine dataset contains 178 samples of three types and Iris dataset contains 150 samples of three types. Every sample from Wine and Iris is 13-dimensional and 4-dimensional, respectively. Hundred samples of both wine and Iris are randomly selected as training samples and the rest of samples are used for test.
Real data dimension reduction recognition rate
Table 2 shows the best results and the optimal dimensionality which are obtained by LLE, HLLE and IWLLE-LE. Compared with the other algorithms, IWLLE-LE algorithm has much highest classification accuracy on the four data sets. It can be seen that our IWLLE-LE algorithm is average 4%–6% higher than LLE and HLLE. IWLLE-LE also has certain advantages in the wine dataset and the Iris dataset.
The local linear embedding (LLE) algorithm is sensitive to the number of nearest neighbors
Future studies should include considering the sample category information in designing classifier after dimensionality reduction and research on dimension reduction of real-time data.
Footnotes
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants (61472307, 51875457), the Key Research Project of Shaanxi Province (2018GY-018, 2019GY-061), the International S&T Cooperation Program of Shaanxi Province (2019KW-056), and the Xi’an Science and Technology Plan Project (2020KJRC0109).
