Image recognition method based on supervised multi-manifold learning

Abstract

In image recognition, the within-class matrix used by some multi-manifold learning algorithms is singular, which degrades recognition performance. To solve this problem, a supervised multi-manifold learning method is proposed. It extracts multi-manifold features of images by maximizing the between-class Laplacian graph and, by introducing the class labels, embeds the minimization of the within-class Laplacian graph in that maximization. The method provides an explicit mapping between the high dimensional images and the low dimensional features, so samples outside the training set can be projected into the low dimensional space, and it avoids the singularity of the within-class matrix. The proposed algorithm is tested on pavement distress images and on the ORL and FERET face images. Experiments show that the recognition accuracy is improved, and a method for determining the dimension of the low dimensional features is given. The influence of the Euclidean distance and the angle cosine distance on the recognition results is also compared using KNN.

Keywords

Multi-manifold, discriminant analysis, image recognition, Laplacian graph, singular matrix

1 Introduction

With the rapid development of multimedia technology and the Internet, the number of images is increasing explosively. Although this large number of images is convenient to use, processing them quickly has become a problem. Image recognition, as one of the image processing technologies, is very important in computer vision and has long been a hot research topic. Images are typical high-dimensional data, with hundreds, thousands or even tens of thousands of dimensions in the pixel space, and handling such data leads to the curse of dimensionality. Reducing the dimension of images is therefore an important way to process them effectively. Manifold learning, as a nonlinear dimensionality reduction technique, can discover the intrinsic structure hidden in high dimensional data and may thereby overcome the curse of dimensionality.

Representative manifold learning algorithms include isometric mapping (ISOMAP) [1], local linear embedding (LLE) [2], Laplacian eigenmaps (LE) [3], and local tangent space alignment (LTSA) [4]. These methods can effectively map high dimensional data into a low dimensional space and have been widely applied in image recognition, image retrieval, text classification, biological information processing and other fields [5–15]. However, they work in batch mode and provide no explicit function between the two spaces, so samples outside the training set cannot be projected into the low dimensional space. In addition, these algorithms map all data onto a single manifold, whereas data in real pattern classification problems usually lie on multiple sub-manifolds, which greatly limits their use in classification. Researchers have therefore proposed methods [16–34] to solve these problems, such as locality preserving projections (LPP) [16] and multi-manifold discriminant analysis (MMDA) [22]. LPP preserves the local structure of data and provides an explicit mapping between the two spaces. However, it does not use the class labels, and the matrix in its generalized eigenvalue problem is usually singular. When classes are relatively close to each other, the sub-manifolds of different classes may intersect in the low dimensional space. MMDA is a supervised method that maps samples of different classes into different sub-manifolds by maximizing the between-class Laplacian matrix and minimizing the within-class Laplacian matrix. Nevertheless, the within-class Laplacian matrix is usually singular.

In image recognition, images from different categories can be correctly separated as long as the between-class distinction is large enough. A classification method based on supervised multi-manifold learning (SMML) is presented, which extracts multi-manifold features through maximizing the between-class Laplacian graph and projects samples from different classes into the respective sub-manifolds. Experimental results on face recognition and pavement crack recognition show that SMML effectively improves the recognition accuracy.

2 Related works

2.1 Locality preserving projections

LPP preserves the local structure of data well and projects data from different classes into multiple sub-manifolds when the classes are relatively far apart. The basic idea is to map the data set X in the high dimensional space R^D into the low dimensional space R^d with a mapping matrix P. Let X = [x₁, x₂, …, x_N], x_i ∈ R^D be the training set and Y = [y₁, y₂, …, y_N], y_i ∈ R^d be its corresponding low dimensional embedding, where d << D. The mapping can be written as Y = P^TX. The projection matrix P can be obtained by minimizing the following objective function: $\sum_{i, j} (y_{i} - y_{j})^{2} w_{ij} \Rightarrow \sum_{i, j} (P^{T} x_{i} - P^{T} x_{j})^{2} w_{ij}$ (1) where w_ij is the weight between x_i and x_j, defined on the k-nearest-neighbor graph. If x_i is one of the k nearest neighbors of x_j or x_j is one of the k nearest neighbors of x_i, w_ij is calculated with Equation (2); otherwise it is 0. $w_{ij} = \exp (- \| x_{i} - x_{j} \|^{2} / t)$ (2) where t > 0 is an empirically chosen parameter. Equation (1) can be rewritten as:

$\begin{aligned} \frac{1}{2} \sum_{i, j} (P^{T} x_{i} - P^{T} x_{j})^{2} w_{ij} &= \sum_{i} P^{T} x_{i} D_{ii} x_{i}^{T} P - \sum_{i, j} P^{T} x_{i} w_{ij} x_{j}^{T} P \\ &= P^{T} X (D - W) X^{T} P \\ &= P^{T} X L X^{T} P \end{aligned}$ (3) where D is an N × N diagonal matrix whose entries are the sums of the corresponding columns (or rows) of W, namely, D_ii = ∑_j w_ij, and L = D - W is the Laplacian matrix. The minimization is subject to the constraint YDY^T = I ⇒ P^TXDX^TP = I. The matrix P can be obtained by solving the following generalized eigenvalue problem. $X L X^{T} P = λ X D X^{T} P$ (4)

The matrix P consists of the eigenvectors corresponding to the d smallest nonzero eigenvalues in Equation (4), i.e. P = [p₁, p₂, …, p_d].
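As an illustration of Equations (1) to (3), the following sketch builds the k-nearest-neighbor heat-kernel weights and checks numerically that the weighted sum of squared distances equals the Laplacian form. It is a minimal example on random data, not the implementation used in the experiments. The function name `heat_kernel_knn_weights`, the random data and the values of k and t are assumptions made for the example. For d > 1 the scalar on the right-hand side of Equation (3) is read as the trace of P^TXLX^TP.

```python
import numpy as np

def heat_kernel_knn_weights(X, k, t):
    """X is D x N (one sample per column). Returns the N x N weights of Eq. (2)."""
    N = X.shape[1]
    # Squared Euclidean distances between all pairs of columns.
    diff = X[:, :, None] - X[:, None, :]
    sq = np.sum(diff ** 2, axis=0)
    # Exclude each sample from its own neighbour list before sorting.
    masked = sq.copy()
    np.fill_diagonal(masked, np.inf)
    nn = np.argsort(masked, axis=1)[:, :k]
    adj = np.zeros((N, N), dtype=bool)
    adj[np.repeat(np.arange(N), k), nn.ravel()] = True
    adj |= adj.T  # edge if i is a neighbour of j OR j is a neighbour of i
    return np.where(adj, np.exp(-sq / t), 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))      # hypothetical data: D = 5, N = 8
P = rng.normal(size=(5, 2))      # arbitrary projection matrix, d = 2
W = heat_kernel_knn_weights(X, k=3, t=10.0)
D = np.diag(W.sum(axis=1))       # D_ii = sum_j w_ij
L = D - W                        # Laplacian matrix
Y = P.T @ X                      # low dimensional embedding

# Left-hand side of Eq. (3): half the weighted sum of squared distances.
lhs = 0.5 * sum(W[i, j] * np.sum((Y[:, i] - Y[:, j]) ** 2)
                for i in range(8) for j in range(8))
# Right-hand side of Eq. (3), taken as a trace because d > 1.
rhs = np.trace(P.T @ X @ L @ X.T @ P)
print(np.isclose(lhs, rhs))
```

The identity relies on W being symmetric, which the "or" rule in the neighbor definition guarantees.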

LPP provides an explicit mapping between the two spaces, so samples outside the training set can be projected into the low dimensional space. This removes the limitation of LLE, LE, ISOMAP and similar methods, which cannot map new samples into the low dimensional space. However, LPP is an unsupervised method and does not use the class labels. When classes are relatively close to each other, the sub-manifolds of different classes may intersect in the low dimensional space, which affects the final recognition results. Moreover, when the number of training samples is smaller than the dimension of the data, the matrix XDX^T in Equation (4) is singular, so the generalized eigenvalue problem cannot be solved directly. To avoid this, principal components analysis (PCA) is usually applied first to reduce the dimension of the data.

2.2 Linear discriminant analysis

Unlike LPP, linear discriminant analysis (LDA) is a supervised method, which fully utilizes the class labels of the training samples. The basic idea is to extract the most discriminative low dimensional features from the high dimensional feature space by maximizing the between-class distances and minimizing the within-class distances. LDA needs to build the within-class scatter matrix and the between-class scatter matrix. Let X = [x₁, x₂, …, x_N], x_i ∈ R^D be the sample set, N be the number of samples, c be the number of classes, and n_i be the number of samples in the i-th class. The within-class scatter matrix S_W is defined as follows: $S_{W} = \sum_{i = 1}^{c} \sum_{x_{k} \in \text{class}_{i}} (x_{k} - u_{i}) (x_{k} - u_{i})^{T}$ (5) where $u_{i} = \frac{1}{n_{i}} \sum_{x_{k} \in \text{class}_{i}} x_{k}$ is the mean of the samples in the i-th class. The between-class scatter matrix S_B is defined as follows: $S_{B} = \sum_{i = 1}^{c} n_{i} (u_{i} - u) (u_{i} - u)^{T}$ (6) where $u = \frac{1}{N} \sum_{i = 1}^{N} x_{i}$ is the mean of all samples.

LDA uses the Fisher criterion to find the optimal projection matrix, namely, $P_{opt} = \underset{P}{\arg\max} \frac{| P^{T} S_{B} P |}{| P^{T} S_{W} P |}$ (7)

The matrix P can be obtained by solving the following generalized eigenvalue problem. $S_{B} P = λ S_{W} P$ (8)

If S_W is not singular, Equation (8) can be rewritten as $S_{W}^{- 1} S_{B} P = λ P$. The matrix P consists of the eigenvectors corresponding to the d largest eigenvalues of $S_{W}^{- 1} S_{B}$, i.e. P = [p₁, p₂, …, p_d].

LDA can greatly reduce the dimension of the pattern space, and the resulting low dimensional features have the maximum between-class scatter and the minimum within-class scatter. However, when the number of samples is smaller than the dimension of the data, the within-class scatter matrix is singular, which affects the quality of the extracted features and makes the method unstable. In addition, the method uses the arithmetic mean of each class, which does not reflect the importance of individual samples.
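The singularity of S_W can be observed directly on a small example. The sketch below computes the scatter matrices of Equations (5) and (6) with NumPy for random data with fewer samples than dimensions. The function name `scatter_matrices` and the data are assumptions made for illustration only.

```python
import numpy as np

def scatter_matrices(X, labels):
    """X is D x N, labels has length N. Returns (S_W, S_B) of Eqs. (5) and (6)."""
    labels = np.asarray(labels)
    D = X.shape[0]
    u = X.mean(axis=1, keepdims=True)            # mean of all samples
    S_W = np.zeros((D, D))
    S_B = np.zeros((D, D))
    for cls in np.unique(labels):
        Xc = X[:, labels == cls]                 # samples of one class
        u_c = Xc.mean(axis=1, keepdims=True)     # arithmetic class mean
        S_W += (Xc - u_c) @ (Xc - u_c).T
        S_B += Xc.shape[1] * (u_c - u) @ (u_c - u).T
    return S_W, S_B

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 6))                     # hypothetical data: D = 10, N = 6
labels = [0, 0, 0, 1, 1, 1]                      # c = 2 classes
S_W, S_B = scatter_matrices(X, labels)

# Total scatter, used only to check the decomposition S_T = S_W + S_B.
Xm = X - X.mean(axis=1, keepdims=True)
S_T = Xm @ Xm.T
rank_W = np.linalg.matrix_rank(S_W)
print(rank_W, S_W.shape[0])                      # rank is below the dimension D
```

Because S_W is a sum of N centered outer products with c linear constraints, its rank cannot exceed N - c, so it cannot be inverted whenever N - c < D.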

2.3 Multi-manifold discriminant analysis

The basic idea of MMDA is that the points from the same class are still close and the samples from different classes are as far as possible after reducing the dimension. To do this, MMDA uses the between-class Laplacian matrix and the within-class Laplacian matrix to express the between-class separation and the within-class compactness respectively. The projection matrix can be obtained by maximizing the between-class matrix and minimizing the within-class matrix together.

Let X = [x₁, x₂, …, x_N], x_i ∈ R^D be the sample set, N be the number of samples, c be the number of classes, and l_i ∈ {1, 2, …, c} be the class labels. Given a projection matrix P, the low dimensional embedding Y = [y₁, y₂, …, y_N], y_i ∈ R^d is obtained by mapping X into the low dimensional space with P, namely, Y = P^TX. The algorithm is as follows:

1) Construct the within-class Laplacian graph. If x_i and x_j belong to the same class, an edge is built between the two nodes. The weight of the edge is computed with the following equation. $C_{ij} = \begin{cases} \exp (- \| x_{i} - x_{j} \|^{2} / t), & l_{i} = l_{j} \\ 0, & \text{otherwise} \end{cases}$ (9)

The matrix P is calculated according to the within-class graph preserving criterion, which makes each within-class sub-manifold more compact. It is defined as follows: $\underset{P}{\arg\min} P^{T} X L_{C} X^{T} P$ (10) where L_C = D_C - C is the Laplacian matrix and D_C is a diagonal matrix. Each diagonal element of D_C is the sum of the corresponding column (or row) of C, i.e. (D_C)_ii = ∑_j C_ij. The matrix D_C can be partitioned into c diagonal blocks D₁, D₂, …, D_c, each of which corresponds to one class.

2) Build the between-class Laplacian graph. From the diagonal blocks D₁, D₂, …, D_c, the weighted sample mean of each class can be calculated. The mean of the k-th class is denoted by $m_{k} = \sum_{i} (D_{k})_{ii} x_{ki} / \sum_{i} (D_{k})_{ii}$ (11)

The means of all classes can be written as M = [m₁, m₂, …, m_c]. This weighted mean emphasizes the samples that are important to the within-class matrix. The weight between classes can then be defined as follows: $B_{ij} = \exp (- \| m_{i} - m_{j} \|^{2} / t)$ (12)

B_ij represents the similarity between the i-th class and the j-th class. The bigger it is, the smaller the distance between the two classes is. The projection matrix P is calculated according to the between-class Laplacian graph penalizing criterion, which maximizes the distances between all sub-manifolds. It is defined as follows: $\underset{P}{\arg\max} P^{T} M L_{B} M^{T} P$ (13) where L_B = D_B - B is the Laplacian matrix and D_B is a diagonal matrix with (D_B)_ii = ∑_j B_ij.

3) MMDA satisfies the following two optimization criteria:

$\left\{ \begin{matrix} \arg\min_{P} P^{T} X L_{C} X^{T} P \\ \arg\max_{P} P^{T} M L_{B} M^{T} P \end{matrix} \right. \Rightarrow P = \arg\max_{P} \frac{P^{T} M L_{B} M^{T} P}{P^{T} X L_{C} X^{T} P}$ (14)

MMDA extracts features under the framework of Fisher discriminant analysis. The within-class Laplacian operator can be defined as $J_{C} (P) = P^{T} α_{C} P$ (15) where $α_{C} \propto \sum_{i, j} C_{ij} (x_{i} - x_{j}) (x_{i} - x_{j})^{T} \propto X L_{C} X^{T}$. The between-class Laplacian operator can be defined as $J_{B} (P) = P^{T} α_{B} P$ (16) where $α_{B} \propto \sum_{i, j = 1}^{c} B_{ij} (m_{i} - m_{j}) (m_{i} - m_{j})^{T} \propto M L_{B} M^{T}$. To maximize the between-class Laplacian matrix and minimize the within-class Laplacian matrix together, the objective function can be defined as $J (P) = \frac{J_{B} (P)}{J_{C} (P)} = \frac{P^{T} α_{B} P}{P^{T} α_{C} P}$, with $P = \arg\max_{P} J (P)$ (17)

The matrix P can be obtained by solving the generalized eigenvalue problem α_BP = λα_CP. It consists of the eigenvectors corresponding to the d largest eigenvalues.

In MMDA, the within-class matrix emphasizes the contribution of each sample to its sub-manifold, and the distances between different sub-manifolds are maximized. MMDA uses the class labels and is a supervised multi-manifold learning method, and it has achieved better results in face recognition [22]. In MMDA the within-class matrix is minimized while the between-class matrix is maximized. However, the within-class matrix is usually singular when the number of samples is much smaller than the dimension of the data, which loses some effective information and affects the recognition results. The principal components are therefore often extracted first with PCA, which may discard some discriminative information. In addition, some between-class information is lost when the between-class matrix is maximized and the within-class matrix is minimized together. Both effects may degrade the final recognition results.

3 Supervised multi-manifold learning method

3.1 The principle of the algorithm

In pattern classification, the category information of some samples is usually known, but LPP does not use it. LDA and MMDA use the class information of the training samples when maximizing the between-class distances. However, the within-class matrix that they minimize is often singular, which affects the classification results. In addition, LDA ignores the importance of individual samples because it uses the arithmetic mean of the samples to compute the between-class scatter. To avoid the singular matrix, we only maximize the between-class matrix and embed the minimization of the within-class matrix in that maximization. Next, we construct the between-class Laplacian scatter matrix.

Let X = [x₁, x₂, …, x_N], x_i ∈ R^D be the sample set, N be the number of samples, c be the number of classes, n_i be the number of samples in the i-th class and l_i ∈ {1, 2, …, c} be the class labels. Suppose that the low dimensional embedding Y = [y₁, y₂, …, y_N], y_i ∈ R^d is obtained with the linear projection matrix P, namely, Y = P^TX. Suppose further that M = [m₁, m₂, …, m_c] contains the means of all classes in the high dimensional space and that B_ij denotes the weight between the i-th class and the j-th class. B_ij is defined as follows: $B_{ij} = \exp (- \| m_{i} - m_{j} \|^{2} / t)$ (18)

Let $\tilde{M} = [\tilde{m}_{1}, \tilde{m}_{2}, \dots, \tilde{m}_{c}]$ be the projection of M in the low dimensional space. The between-class Laplacian graph is maximized by maximizing the following sum: $\frac{1}{2} \sum_{i, j} (\tilde{m}_{i} - \tilde{m}_{j})^{2} B_{ij} = \tilde{M} L_{B} \tilde{M}^{T}$ (19) where L_B = D_B - B is the Laplacian matrix and D_B is a diagonal matrix with (D_B)_ii = ∑_j B_ij. Therefore, the between-class graph penalizing criterion can be defined as $\arg\max_{\tilde{M}} \tilde{M} L_{B} \tilde{M}^{T}$ (20)

$\tilde{m}_{i}$ is the mean of the samples of the i-th class in the low dimensional space, which can be computed as either the arithmetic mean or a weighted mean. Here, we adopt the weighted mean to increase the coupling of samples in the same class. Let $\tilde{m}_{i} = \sum_{k = 1}^{n_{i}} w_{ik} y_{ik}$. Because y_ik = P^Tx_ik, we can conclude that

$\begin{aligned} \tilde{m}_{i} &= \sum_{k = 1}^{n_{i}} w_{ik} y_{ik} = \sum_{k = 1}^{n_{i}} w_{ik} P^{T} x_{ik} \\ &= P^{T} \sum_{k = 1}^{n_{i}} w_{ik} x_{ik} = P^{T} m_{i} \end{aligned}$ (21) where m_i is the weighted mean of the samples of the i-th class in the high dimensional space. Collecting all classes, this gives $\tilde{M} = P^{T} M$ (22)

Substituting Equation (22) into Equation (19) gives $\tilde{M} L_{B} \tilde{M}^{T} = P^{T} M L_{B} M^{T} P$ (23)

Therefore, the between-class graph penalizing criterion can be transformed into: $\arg\max_{P} P^{T} M L_{B} M^{T} P$ (24)

The above maximization problem can be solved through the following eigenvalue problem: $M L_{B} M^{T} P = λ P$ (25)

The matrix P consists of the eigenvectors corresponding to the d largest eigenvalues of ML_BM^T. From Equation (21), the mean of the samples of the i-th class in the high dimensional space can be computed with the following equation. $m_{i} = \sum_{k = 1}^{n_{i}} w_{ik} x_{ik}$ (26)

w_ik is the weight of the sample x_ik in the i-th class, which determines the contribution of x_ik to the mean. How should it be calculated? Obviously, the more important the sample is, the bigger its weight should be, and less important samples should receive small weights. How, then, do we determine whether a sample is important? The importance of samples can be distinguished by minimizing the within-class Laplacian graph. To build the within-class Laplacian graph, the similarity between x_i and x_j is defined as follows: $C_{ij} = \begin{cases} \exp (- \| x_{i} - x_{j} \|^{2} / t), & l_{i} = l_{j} \\ 0, & \text{otherwise} \end{cases}$ (27)

The bigger the distance between x_i and x_j is, the smaller the weight between the two points is. The weight function is strictly monotonically decreasing with respect to the distance between nodes, and 0 ≤ C_ij ≤ 1 always holds. The within-class graph penalizing criterion can be defined as: $\underset{P}{\arg\min} P^{T} X L_{C} X^{T} P$ (28) where L_C = D_C - C is the Laplacian matrix and D_C is a diagonal matrix with (D_C)_ii = ∑_j C_ij. The matrix D_C can be partitioned into c diagonal blocks D₁, D₂, …, D_c, each of which belongs to one class. The diagonal elements of D_C indicate the importance of the corresponding samples: the bigger the value is, the more important the sample is, and the bigger its weight should be. Thus, the weight w_ik can be defined as follows: $w_{ik} = (D_{i})_{kk} / \sum_{l = 1}^{n_{i}} (D_{i})_{ll}$ (29)
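To make Equations (26), (27) and (29) concrete, the sketch below computes the sample weights of a single class and the resulting weighted mean. It is a toy illustration under stated assumptions: the function name `sample_weights`, the one-dimensional data and t = 1 are hypothetical, and the diagonal entries C_kk = 1 are kept as Equation (27) reads when i = j.

```python
import numpy as np

def sample_weights(Xc, t):
    """Xc is D x n_i, the samples of one class. Returns the weights w_ik of Eq. (29)."""
    diff = Xc[:, :, None] - Xc[:, None, :]
    C = np.exp(-np.sum(diff ** 2, axis=0) / t)   # Eq. (27), same-class block
    deg = C.sum(axis=1)                          # diagonal entries of D_i
    return deg / deg.sum()                       # normalise so the weights sum to 1

# Hypothetical class with two nearby samples and one distant sample (D = 1).
Xc = np.array([[0.0, 0.1, 5.0]])
w = sample_weights(Xc, t=1.0)
m = Xc @ w                                       # weighted mean, Eq. (26)
print(w, m, Xc.mean())
```

In this example the distant sample has few similar neighbors, so it receives the smallest weight, and the weighted mean lies closer to the dense part of the class than the arithmetic mean does.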

3.2 Description of the algorithm

The proposed algorithm is described as follows:

1) Calculate the similarity between x_i and x_j with Equation (27) and construct the weight matrix of each class. Let N be the number of training samples. The weight matrix of each class has size N × N, and the c matrices are denoted C₁, …, C_c. A diagonal matrix is derived from each of them, giving D₁, D₂, …, D_c.

2) Compute the weight w_ik of each sample with Equation (29) from the diagonal matrices D₁, D₂, …, D_c. Then calculate the mean of each class with Equation (26) to obtain the matrix M = [m₁, m₂, …, m_c].

3) Construct the between-class Laplacian matrix. The matrix B is calculated with Equation (18), and its order is c × c. The matrix L_B is obtained with the formula L_B = D_B - B.

4) Compute the d largest eigenvalues of ML_BM^T. Their corresponding eigenvectors form the projection matrix P.

5) Compute the low dimensional representation of the training set with Y = P^TX.

6) Obtain the low dimensional features of the testing samples with Y′ = P^TX′.

In the above algorithm, multi-manifold features of each class are extracted with the between-class Laplacian graph by introducing the class labels. The projection matrix is obtained by maximizing the between-class Laplacian scatter matrix, so that samples from different classes are separated as far as possible. Unlike the between-class scatter in traditional linear discriminant analysis, the between-class Laplacian graph considers the structure of the sub-manifolds and emphasizes the importance of different samples. This improves the within-class compactness and preserves the local adjacency relationship between samples. It also eliminates the problem that the within-class matrix is singular in MMDA and LDA. The method can thus extract the most discriminative features from the high dimensional data, which improves the recognition accuracy.
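The procedure of Section 3.2 can be sketched in a few lines of NumPy. This is a minimal reading of Equations (18), (25), (26), (27) and (29) on synthetic data, not the code used in the experiments. The names `sample_weights` and `smml_fit`, the synthetic two-class data and the value t = 10 are assumptions made for the example.

```python
import numpy as np

def sample_weights(Xc, t):
    """Weights w_ik of Eq. (29) for the samples Xc (D x n_i) of one class."""
    diff = Xc[:, :, None] - Xc[:, None, :]
    C = np.exp(-np.sum(diff ** 2, axis=0) / t)   # Eq. (27)
    deg = C.sum(axis=1)
    return deg / deg.sum()

def smml_fit(X, labels, t, d):
    """X is D x N. Returns (P, eigenvalues, M) for the d largest eigenvalues."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Steps 1-2: weighted class means, Eq. (26).
    M = np.column_stack([X[:, labels == c] @ sample_weights(X[:, labels == c], t)
                         for c in classes])
    # Step 3: between-class weights and Laplacian, Eq. (18).
    diff = M[:, :, None] - M[:, None, :]
    B = np.exp(-np.sum(diff ** 2, axis=0) / t)
    L_B = np.diag(B.sum(axis=1)) - B
    # Step 4: eigenvectors of the symmetric matrix M L_B M^T, Eq. (25).
    S = M @ L_B @ M.T
    evals, evecs = np.linalg.eigh((S + S.T) / 2)  # symmetrise against rounding
    order = np.argsort(evals)[::-1][:d]
    return evecs[:, order], evals[order], M

rng = np.random.default_rng(0)
shift = np.array([[5.0], [0.0], [0.0]])
X_train = np.hstack([rng.normal(0.0, 0.3, size=(3, 10)),
                     rng.normal(0.0, 0.3, size=(3, 10)) + shift])
y_train = np.array([0] * 10 + [1] * 10)

P, evals, M = smml_fit(X_train, y_train, t=10.0, d=1)
Y_train = P.T @ X_train                           # step 5
X_test = rng.normal(0.0, 0.3, size=(3, 4)) + shift
Y_test = P.T @ X_test                             # step 6
```

Because every row of L_B sums to zero, ML_BM^T has at most c - 1 nonzero eigenvalues, so values of d above c - 1 add no further discriminative directions in this formulation. No matrix inversion is required at any step.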

The time cost of the algorithm mainly consists of three parts: computing the matrix C, calculating the matrix B, and solving the eigenvalue problem. Let N be the number of samples, c be the number of classes, and D be the dimension of the samples. The time complexity of calculating C is O(N²D), the time complexity of computing B is O(c²D), and the time complexity of solving the eigenvalue problem of ML_BM^T is O(D³). The cost of computing B can be ignored because c is usually much smaller than N. Therefore, the total time complexity is O(N²D + D³). The matrices C and ML_BM^T need to be stored, so the space complexity is O(N² + D²).

3.3 The method of determining the dimension d of the low dimensional space

The matrix P consists of the eigenvectors corresponding to the d largest eigenvalues of ML_BM^T, so selecting a proper d is very important for the subsequent classification. At present, there is no well-established method for determining d, and most methods select d within a certain range through experiments. In PCA, d is estimated from the cumulative contribution rate of the eigenvalues, and we adopt the same idea here. Firstly, all eigenvalues are sorted in descending order. Then the contribution rate of each eigenvalue, i.e. its share of the sum of all eigenvalues, is calculated. Lastly, d is taken as the smallest number of leading eigenvalues whose cumulative contribution rate exceeds a threshold. When the rate is more than 90%, the corresponding eigenvectors express the most important information for distinguishing classes. The experimental results in the next section also show that the accuracy is relatively high when the cumulative contribution rate is more than 95%.
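The selection rule can be written as a short function. The sketch below is one possible reading of the procedure: the name `choose_dimension`, the example eigenvalues, the clipping of slightly negative eigenvalues to zero and the use of "greater than or equal to" at the threshold are assumptions, not details given in the text.

```python
def choose_dimension(eigenvalues, threshold=0.95):
    """Smallest d whose leading eigenvalues reach the cumulative contribution rate."""
    # Sort in descending order; clip tiny negative values caused by rounding.
    vals = sorted((max(v, 0.0) for v in eigenvalues), reverse=True)
    total = sum(vals)
    if total <= 0.0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for d, v in enumerate(vals, start=1):
        running += v
        if running / total >= threshold:
            return d
    return len(vals)

# Hypothetical spectrum with cumulative rates of roughly 0.60, 0.90, 0.97, 0.99, 1.00.
spectrum = [6.0, 3.0, 0.7, 0.2, 0.1]
d95 = choose_dimension(spectrum, threshold=0.95)
d80 = choose_dimension(spectrum, threshold=0.80)
print(d95, d80)
```

A higher threshold keeps more eigenvectors, so the threshold trades feature dimension against the amount of between-class information retained.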

4 Experimental results

To verify the effectiveness of SMML, it is tested on the FERET face image database, the ORL (Olivetti Research Laboratory) face image database and the pavement distress image database, and compared with MMDA, LDA and LPP. FERET contains 200 classes. Each class has 7 samples, and each image size is 80 × 80. ORL includes 400 images, consisting of different face images of 40 people. Each image has 256 gray levels and its size is 112 × 92. The pavement distress image database includes 360 samples divided into 4 categories: transverse cracks, longitudinal cracks, alligator cracks, and block cracks. Each category has 90 images, and each image size is 70 × 93.

The experiments consist of four parts. The first part tests the effectiveness of the method for determining the dimension of the low dimensional features. The other three parts validate the effectiveness of SMML on FERET, ORL and the pavement distress image database. On the pavement distress image data set, SMML is compared with the 2D projection, the fusion density factor and the invariant moment in addition to MMDA, LDA and LPP. In each experiment, we first obtain the matrix P with SMML, MMDA, LDA and LPP. Then the low dimensional features of the training set and the testing set are computed with P. Lastly, samples are recognized on the low dimensional features with KNN and SVM, and the results from the different methods are compared.

4.1 Testing the method to determine the dimension of the low dimensional features

In this section, the method for determining the dimension of the low dimensional features is verified on ORL. The first 5 images of each class are used as the training samples and the remaining images are taken as the testing samples, giving 200 training samples and 200 testing samples. Each image is rescaled to 40 × 40. Firstly, the eigenvalues of ML_BM^T are calculated. Then the cumulative contribution rate of the eigenvalues is computed, and the features obtained with the corresponding eigenvectors are recognized with KNN, where K = 1. Finally, the recognition accuracy is calculated. The results are shown in Fig. 1, in which the vertical coordinate denotes the recognition accuracy and the horizontal coordinate is the cumulative contribution rate. Figure 1(a) gives the recognition accuracy for cumulative contribution rates from 0% to 100%. Figure 1(b) is a local zoom of Fig. 1(a) and shows the trend of the recognition accuracy for cumulative contribution rates from 80% to 100%.

From Fig. 1(a), it can be concluded that the accuracy increases stepwise with the cumulative contribution rate and becomes basically stable once the rate reaches 80%. From Fig. 1(b), the accuracy increases gradually when the rate goes from 80% to 90% and is stable from 90% to 95%. The accuracy is highest when the rate reaches 95%. Therefore, in the following experiments the dimension of the low dimensional features is the number of leading eigenvalues needed for the cumulative contribution rate to exceed 95%, and the eigenvectors corresponding to these d largest eigenvalues are used to obtain the low dimensional features. According to this method, d = 39 on ORL, d = 185 on FERET, and d = 3 on the pavement distress images.

4.2 Recognition results on ORL

The ORL face data set is divided into two subsets: the training set and the testing set. The first l = 3, 4, 5, 6 samples of each class are selected in turn as the training set, and the remaining samples form the testing set. Firstly, the matrix P is learned from the training samples with SMML, MMDA, LDA and LPP. Then the low dimensional coordinates of the testing samples are calculated with P. Finally, the testing samples are classified with SVM and KNN. In SMML and MMDA, t = 0.03 × 1600, and d = 39 in SMML, MMDA and LDA. For LPP, the dimension of the data is first reduced with PCA because the matrix in its eigenvalue problem is singular. In LPP + PCA, the cumulative contribution rate is 80%, d = 20 and k = 4. Experimental results are shown in Tables 1 to 5.

The number K of nearest neighbors and the distance measure need to be determined when KNN is used to identify the samples. To improve the recognition results, the Euclidean distance and the angle cosine distance are compared, as are the results for different K, where K ranges from 1 to 5. In Tables 2 to 5, the first column of each method gives the results from the Euclidean distance and the second column gives the results from the angle cosine distance. “ED” denotes the Euclidean distance and “CD” denotes the angle cosine distance.
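The two distance measures can rank neighbors differently, which is why they are compared. The following sketch shows a small KNN classifier with both measures. The name `knn_predict` and the toy points are hypothetical, and the angle cosine distance is taken here as one minus the cosine of the angle between two vectors, which is an assumption about its exact form.

```python
import numpy as np
from collections import Counter

def knn_predict(Y_train, labels, y, K=1, metric="euclidean"):
    """Y_train is d x N, labels has length N, y has length d. Returns the majority label."""
    if metric == "euclidean":
        dist = np.linalg.norm(Y_train - y[:, None], axis=0)
    elif metric == "cosine":
        num = Y_train.T @ y
        den = np.linalg.norm(Y_train, axis=0) * np.linalg.norm(y)
        dist = 1.0 - num / den                   # assumed form of the cosine distance
    else:
        raise ValueError("metric must be 'euclidean' or 'cosine'")
    nearest = np.argsort(dist, kind="stable")[:K]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Two hypothetical training points in a 2-D feature space (one per column).
Y_train = np.array([[1.0, 10.0],
                    [0.0, 10.0]])
labels = ["A", "B"]
query = np.array([1.0, 1.0])

by_ed = knn_predict(Y_train, labels, query, K=1, metric="euclidean")
by_cd = knn_predict(Y_train, labels, query, K=1, metric="cosine")
print(by_ed, by_cd)
```

Here the query is nearest to the first point in Euclidean terms but points in exactly the same direction as the second, so the two measures return different labels.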

From the experimental results, SMML obtains the best results for l = 3, 5, 6 training samples per class when SVM is used to classify the low dimensional samples, and its results are close to those from MMDA and LDA for l = 4. With KNN, the accuracy of the four methods gradually decreases as K increases, and in most cases the accuracy is highest when K = 1. The highest accuracy for l = 3, 4, 5, 6 is given in Table 6. From Table 6, the accuracy from SMML is the highest in most cases on ORL, although the advantage is not obvious. The results from the Euclidean distance are superior to those from the angle cosine distance. No singular matrices arise in SMML.

4.3 Recognition results on FERET

On FERET, the first l = 3, 4, 5, 6 samples of each class are selected as the training samples and the remaining samples are used as the testing samples. Each image is rescaled to 80 × 80. The experimental method is the same as that in Section 4.2. In SMML and MMDA, t = 0.03 × 6400 and d = 185. In LPP + PCA, the cumulative contribution rate is 90%, k = 7 and d = 68. In LDA, d = 193. In KNN, K ranges from 1 to 5, and both the Euclidean distance and the angle cosine distance are used. Results are shown in Tables 7 to 11.

From the experimental results, SMML achieves the highest accuracy for l = 3, 4, 5, 6 training samples per class when SVM is used to classify the low dimensional samples, and its results are greatly superior to those from the other three feature extraction methods. When the low dimensional samples are classified with KNN, the highest accuracy of each method is achieved for K = 1, and the results from the angle cosine distance are better than those from the Euclidean distance. The highest accuracy of each method in the different cases is given in Table 12.

As shown in Table 12, the results from SMML are superior to those from the other methods, and the advantage is more obvious when the number of training samples per class is bigger than 3. The main reason is that SMML does not suffer from a singular matrix when solving the eigenvalue problem. In addition, the sub-manifolds of different classes are separated as far as possible when the face images are projected into the low dimensional space.

4.4 Recognition results on the pavement distress images

The pavement distress image data set is divided into two subsets: the training set and the testing set. The former consists of 280 samples, with 70 samples per class. The latter includes 80 samples, with 20 samples per class. The experimental method is the same as that in Section 4.2. In SMML, t = 0.03 × 6510 and d = 3 because the cumulative contribution rate of the first three eigenvalues reaches 99%. In MMDA, the values of t and d are the same as those in SMML. In LPP + PCA, k = 7 and d = 62. In LDA, d = 3. In KNN, K ranges from 1 to 5, and the Euclidean distance and the angle cosine distance are compared. The results are shown in Tables 13 and 14.

From the results, the recognition accuracy from SMML is higher than that from the other methods when SVM and KNN are used to classify the low dimensional samples. In KNN, the results from the Euclidean distance are greatly superior to those from the angle cosine distance. With the Euclidean distance, the accuracy of SMML is much higher than that of the other methods, and its results are close to each other for K from 1 to 5. With the angle cosine distance, the results from SMML are close to those from LPP + PCA and superior to those from the other methods. This shows that SMML can extract more effective features from the pavement distress images.

We call the low dimensional features extracted with SMML multi-manifold features. To further verify their effectiveness, we compare SMML with traditional pavement feature extraction methods. Firstly, the multi-manifold feature is compared with the coordinate projection [35], the fusion density factor [36] and the 2-order invariant moment [37]. Then these features are fused and the results are compared. In the experiments, we first extract two projection features, four fusion density factors and two 2-order invariant moments, and then combine them in different ways. KNN is used with K = 5. The results are shown in Table 15.

As shown in Table 15, the accuracy is improved by fusing several kinds of features, but it remains far lower than that from SMML. The main reason is that each traditional pavement feature extraction method extracts only one aspect of the pavement images. For example, the two dimensional projection only considers the projection difference, the fusion density factor only accounts for the damage density, and the 2-order invariant moment only takes into account the geometrical features of the image region. None of these methods adequately extracts the effective information of pavement features. To further discuss the effect of the different feature extraction methods on the results, we give the distribution of the coordinate projection, fusion density factor, 2-order invariant moment and multi-manifold features in the low dimensional space. The results are shown in Fig. 2.

Figure 2 shows the within-class compactness and the between-class distance of each feature distribution, both of which affect the classification results. The larger the distance between classes, the clearer the class boundaries and the better the recognition results. The low dimensional features from coordinate projection, 2-order invariant moment and fusion density factor overlap heavily across classes and are loosely distributed within each class, which degrades the classification results. The multi-manifold features reduce the overlap of the sub-manifolds in the low dimensional space by maximizing the between-class distance and increasing the within-class compactness. Therefore, SMML obtains higher accuracy.
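The two quantities discussed above can be summarized numerically. The sketch below uses simple centroid-based measures: the mean distance of each sample to its class centroid (within-class spread, where smaller means more compact) and the mean distance between class centroids (between-class distance). These are stand-ins chosen for illustration, not the Laplacian graph criteria optimized by SMML, and the function names are assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def class_separation(samples, labels):
    """Return (mean within-class spread, mean between-centroid distance)."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    if len(groups) < 2:
        raise ValueError("at least two classes are required")
    centers = {label: centroid(points) for label, points in groups.items()}
    within = sum(math.dist(sample, centers[label])
                 for sample, label in zip(samples, labels)) / len(samples)
    pairs = list(combinations(centers.values(), 2))
    between = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return within, between


# Hypothetical 2-D features: two tight clusters that lie far apart.
samples = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = ["a", "a", "b", "b"]
within, between = class_separation(samples, labels)
```

A feature set with a small within-class spread and a large between-centroid distance corresponds to the well separated distribution described for the multi-manifold features.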

5 Conclusion

In the era of big data, images have very high dimension, which seriously affects both the accuracy and the efficiency of recognition on large image collections. Reducing the dimension of images is an important way to deal with such high dimensional data. This paper puts forward a supervised multi-manifold learning method (SMML), which seeks the projection matrix by maximizing the between-class Laplacian scatter matrix and thereby builds an explicit mapping between the high dimensional and low dimensional spaces. It overcomes the singular matrix problem that exists in MMDA and LDA. Experimental results on the pavement images and face images show that the recognition accuracy of SMML is superior to that of the other methods. Experimental results on ORL also demonstrate that the cumulative contribution rate of eigenvalues is an effective way to estimate the dimension of the low dimensional features. In future work, we will apply the method to other fields such as text classification to further validate its effectiveness.
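As a reminder of how the cumulative contribution rate can be used to pick the dimension, the following minimal sketch returns the smallest d for which the d largest eigenvalues account for a given share of the eigenvalue sum. The function name, the assumption of non-negative eigenvalues and the default threshold of 0.95 are illustrative choices, not values taken from the experiments.

```python
def estimate_dimension(eigenvalues, threshold=0.95):
    """Smallest d whose top-d eigenvalues reach `threshold` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for d, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return d
    return len(ordered)


# Hypothetical eigenvalue spectrum of the scatter matrix.
spectrum = [5.0, 3.0, 1.0, 0.5, 0.5]
d = estimate_dimension(spectrum, threshold=0.85)
```

With this spectrum the cumulative rates are 0.5, 0.8, 0.9, 0.95 and 1.0, so a threshold of 0.85 keeps three dimensions.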

Acknowledgments

This work was supported in part by a grant from NSF of Hebei province of China (No. F2016202144), TSTC of Tianjin of China (Nos. 14JCZDJC31600 and 13JCQNJC00200) and KPSTRHE of Hebei province of China (No. ZD2014030).

References

1. Tenenbaum J.B., de Silva and Langford J.C., A global geometric framework for nonlinear dimensionality reduction, Science 290(5500) (2000), 2319–2323.

2. Roweis S.T. and Saul L.K., Nonlinear dimensionality reduction by locally linear embedding, Science 290(5500) (2000), 2323–2326.

3. Belkin and Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation 15(6) (2003), 1373–1396.

4. Zhang and Zha, Principal manifolds and nonlinear dimensionality reduction via tangent space alignment, SIAM Journal on Scientific Computing 26(1) (2005), 313–338.

5. Kouropteva, Okun and Pietikäinen, Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine, in Proceedings of ESANN, 2003, pp. 229–234.

6. Pless and Souvenir, A survey of manifold learning for images, IPSJ Transactions on Computer Vision and Applications 1 (2009), 83–94.

7. Venna, Peltonen, Nybo, Aidos and Kaski, Information retrieval perspective to nonlinear dimensionality reduction for data visualization, The Journal of Machine Learning Research 11 (2010), 451–490.

8. Elhamifar and Vidal, Sparse manifold clustering and embedding, in Proceedings of NIPS, 24, 2011, pp. 55–63.

9. Criminisi, Shotton and Konukoglu, Decision forests: A unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning, Foundations and Trends in Computer Graphics and Vision 7(2-3) (2011), 81–227.

10. Huang and Feng, Gene classification using parameter-free semi-supervised manifold learning, IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(3) (2012), 818–827.

11. Cai and He, Manifold adaptive experimental design for text categorization, IEEE Transactions on Knowledge and Data Engineering 24(4) (2012), 707–719.

12. Cheng, Cheng and Guo, Supervised ISOMAP based on pairwise constraints, LNCS 7663 (2012), 447–454.

13. Cui, Zheng and Yang, Dimensionality reduction for microarray data using local mean based discriminant analysis, Biotechnology Letters 35(3) (2013), 331–336.

14. Tomar V.S. and Rose R.C., Efficient manifold learning for speech recognition using locality sensitive hashing, in Proceedings of ICASSP, 2013, pp. 6995–6999.

15. Golchin and Maghooli, Overview of manifold learning and its application in medical data set, International Journal of Biomedical Engineering and Science (IJBES) 1(2) (2014), 23–33.

16. … and Niyogi, Locality preserving projections, in Proceedings of NIPS, 2003, pp. 153–160.

17. Geng, Zhan and Zou, Supervised nonlinear dimensionality reduction for visualization and classification, IEEE Transactions on Systems, Man and Cybernetics 35(6) (2005), 1098–1107.

18. Zhang, Yang, Zhao and Ge, Linear local tangent space alignment and application to face recognition, Neurocomputing 70(7-9) (2007), 1547–1553.

19. Zhang, Tao and Yang, Discriminative locality alignment, in Proceedings of ECCV: Part I, 2008, pp. 725–738.

20. Wang, Tiňo and Fardal M.A., Multiple manifolds learning framework based on hierarchical mixture density model, in Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Part II, Springer-Verlag, 2008, pp. 566–581.

21. …, Wang and Song, Multi-manifold learning using locally linear embedding (LLE) nonlinear dimensionality reduction, Qinghua Daxue Xuebao/Journal of Tsinghua University 48(4) (2008), 582–585.

22. Yang, Sun and Zhang, A multi-manifold discriminant analysis method for image feature extraction, Pattern Recognition 44(8) (2011), 1649–1657.

23. Valencia-Aguirre, Álvarez-Meza, Daza-Santacoloma, et al., Multiple manifold learning by nonlinear dimensionality reduction, Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications 7042 (2011), 206–213.

24. Fan, Qiao, Zhang, et al., Isometric multi-manifold learning for feature extraction, in 2012 IEEE 12th International Conference on Data Mining, IEEE Computer Society, 2012, pp. 241–250.

25. Jain V.K., Tapaswi and Shukla, Location estimation based on semi-supervised locally linear embedding (SSLLE) approach for indoor wireless networks, Wireless Personal Communications 67(4) (2012), 879–893.

26. Xing, Yu, Jiang, et al., A multi-manifold semi-supervised Gaussian mixture model for pattern classification, Pattern Recognition Letters 34(16) (2013), 2118–2125.

27. Gao and Liang, Manifold learning algorithm DC-ISOMAP of data lying on the well-separated multi-manifold with same intrinsic dimension, Journal of Computer Research and Development 50(8) (2013), 1690–1699.

28. Yan, Lu, Zhou, et al., Multi-feature multi-manifold learning for single-sample face recognition, Neurocomputing 143(16) (2014), 134–143.

29. Lunga, Prasad, Crawford M.M., et al., Manifold-learning-based feature extraction for classification of hyperspectral data: A review of advances in manifold learning, IEEE Signal Processing Magazine 31(1) (2014), 55–66.

30. Huang, Lu, Tan, et al., Multi-manifold metric learning for face recognition based on image sets, Journal of Visual Communication & Image Representation 25(7) (2014), 1774–1783.

31. Fan, Zhang, Lin, et al., A regularized approach for geodesic-based semisupervised multimanifold learning, IEEE Transactions on Image Processing 23(5) (2014), 2133–2147.

32. …, Li and Zhang, Nonparametric discriminant multi-manifold learning, Intelligent Computing Theory (2014), 113–119.

33. Hettiarachchi and Peters J.F., Multi-manifold LLE learning in pattern recognition, Pattern Recognition 48(9) (2015), 2947–2960.

34. Kellkr H.H., Semi-supervised dimensionality reduction of hyperspectral image based on sparse multi-manifold learning, Journal of Computer & Communications 3(11) (2015), 33–39.

35. Zuo, Research on the key technology of intelligent detection system for pavement distress, Changchun, China, Jilin University, 2013.

36. Xiao, Yan and Zhang, Research on the automatic pavement distress recognition based on synthetically distress density factor, Journal of Transportation Engineering and Information 3(2) (2005), 19–26.

37. Liu and Jiang, Recognition of porcelain bottle crack based on modified ART-2 network and invariant moment, Chinese Journal of Scientific Instrument 30(7) (2009), 1420–1425.