Abstract
In image recognition, the within-class matrix in some multi-manifold learning algorithms is singular, which affects the recognition effectiveness. To solve the problem, a supervised multi-manifold learning method is proposed, which extracts multi-manifold features of images by maximizing the between-class Laplacian graph and hides the minimization of the within-class Laplacian graph in the maximization of the between-class Laplacian graph by introducing the class labels. This method provides an explicit mapping between the high dimensional images and the low dimensional features, which can project samples out of the training set into the low dimensional space and also overcomes the singular problem of the within-class matrix. The proposed algorithm is tested on the pavement distress images, ORL and FERET face images. Experiments show that the recognition accuracy is greatly improved, and the dimension of the low dimensional features is determined. And the influence of Euclidean distance and the angle cosine distance on the recognition results is compared by using KNN.
Introduction
Nowadays, with the rapid development of multimedia technology and Internet, the number of images is explosively increasing. Though it is convenient for people to use these large number of images, it has been a problem to deal with them quickly. Image recognition, as one of image processing technologies, is very important in computer vision and has been being a hot research topic. Images are typical high-dimensional data, which have hundreds, thousands or even ten thousands of dimensions in the pixel space. It will lead to the dimension curse when we treat these high dimensional data. Reducing dimensions of images is an important method to process images effectively. Manifold learning, as a nonlinear dimensionality reduction technique, can discover the intrinsic structure hidden in the high dimensional data, which may overcome the dimension curse.
At present, there are some representative manifold learning algorithms such as isometric mapping (ISOMAP) [1], local linear embedding (LLE) [2], Laplacian eigenmaps (LE) [3], local tangent space alignment (LTSA) [4], and so on. These methods can effectively map the high dimensional data into the low dimensional space, which have been widely applied in image recognition, image retrieval, text classification, biological information processing and other fields [5–15]. Because all these methods use batch mode and have no explicit functions between two spaces, it is impossible to project samples out of the training set into the low dimensional space. At the same time, all data are mapped on single manifold in these algorithms. However, data in the real pattern classification usually belongs to multiple sub-manifolds, so that these methods suffer from great limitations in classification. Therefore, researchers have proposed some methods [16–34] to solve these problems, such as locality preserving projections (LPP) [16], multi-manifold discriminant analysis (MMDA) [22], and so on. Among these methods, LPP can preserve the local structure of data and provides an explicit mapping relationship between two spaces. However, it doesn’t use the class labels, and the matrix is usually singular. When the distances between classes are relatively close, it may cause the sub-manifolds of different classes to intersect together in the low dimensional space. MMDA is a supervised method, which maps samples of different classes into various sub-manifolds by maximizing the between-class Laplacian matrix, and minimizing the within-class Laplacian matrix. Nevertheless, the within-class Laplacian matrix is usually singular.
In image recognition, images from different categories can be correctly separated as long as the between-class distinction is large enough. A classification method based on supervised multi-manifold learning (SMML) is presented, which extracts multi-manifold features through maximizing the between-class Laplacian graph and projects samples from different classes into the respective sub-manifolds. Experimental results on face recognition and pavement crack recognition show that SMML effectively improves the recognition accuracy.
Related works
Locality preserving projections
LPP can better preserve the local structure of data and project data from different classes into multiple sub-manifolds when the distances between classes are relatively far. The basic idea is to map the data set X in the high dimensional space R
D
into the low dimensional space R
d
to reduce the dimension of data by using a mapping matrix P. Let X = [x1, x2, …, x
N
], x
i
∈ R
D
be the training set, Y = [y1, y2, …, y
N
], y
i
∈ R
d
be its corresponding low dimensional embedding where d << D, the mapping can be written as Y = PTX. The projection matrix P can be obtained by minimizing the following objective function.
The matrix P consists of the corresponding eigenvectors of the smallest d nonzero eigenvalues in Equation (4), i.e. P = [p1, p2, …, p d ].
LPP provides an explicit mapping between two spaces, which can project samples out of the training set into the low dimensional space. It eliminates the problem of LLE, LE, ISOMAP, and other methods that the new samples cannot be mapped into the low dimensional space. However, LPP doesn’t use the class labels and is an unsupervised method. It may cause the sub-manifolds of different classes to intersect together in the low dimensional space when the distances between classes are relatively close, which will affect the final recognition results. And when the number of the training samples is smaller than the dimension of data, the matrix is often singular while solving the generalized eigenvalue problem. To solve the problem, principal components analysis (PCA) is usually used to reduce the dimension of data at first.
Unlike LPP, linear discriminant analysis (LDA) is a supervised method, which fully utilizes the class labels of the training samples. The basic idea is to extract the low dimensional features with the most discriminative ability from the high dimensional feature space by maximizing the between-class distances and minimizing the within-class distances. LDA needs to build the within-class scatter matrix and the between-class scatter matrix. Let X = [x1, x2, …, x
N
], x
i
∈ R
D
be the sample set, N be the number of samples, c be the number of classes, and n
i
be the number of samples in the i-th class, then the within-class scatter matrix S
W
is defined as follows:
LDA uses Fisher criterion to find the optimal projection matrix P, namely,
The matrix P can be obtained by solving the following generalized eigenvalue problem.
If S W is not singular, Equation (8) can be rewritten to . The matrix P is the corresponding eigenvectors of the first largest d eigenvalues of , i.e. P = [p1, p2, …, p d ].
LDA can greatly reduce the dimension of the pattern space. And the low dimensional features have the maximum between-class scatter and the minimum within-class scatter. However, the within-class scatter matrix is usually singular, which affects the quality of the extracted features. In this case, the method has poor stability. In addition, the method utilizes the arithmetic average, which does not reflect the importance of samples.
The basic idea of MMDA is that the points from the same class are still close and the samples from different classes are as far as possible after reducing the dimension. To do this, MMDA uses the between-class Laplacian matrix and the within-class Laplacian matrix to express the between-class separation and the within-class compactness respectively. The projection matrix can be obtained by maximizing the between-class matrix and minimizing the within-class matrix together.
Let X = [x1, x2, …, x N ], x i ∈ R D be the sample set, N be the number of samples, c be the number of classes, and l i ∈ {1, 2, …, c} be the class labels. Suppose that there is a projection matrix P, the low dimensional embedding Y = [y1, y2, …, y N ], y i ∈ R d can be achieved through mapping X into the low dimensional space with P, namely, Y = P T X. The algorithm is as follows:
1) Construct the within-class Laplacian graph. If x
i
and x
j
belong to the same class, an edge is built between the two nodes. The weight of the edge is computed with the following equation.
The matrix P is calculated according to the within-class graph preserving criterion, which makes the manifold of the within-class matrix more compact. It is defined as follows:
2) Build the between-class Laplacian graph. According to the sub-diagonal matrix D1, D2, …, D
c
, the sample mean of each class can be calculated. The mean of the k-class is denoted by
The means of all classes can be written as M = [m1, m2, …, m
c
]. The mean emphasizes the important samples to the within-class matrix. Then the weight between classes can be defined as follows:
B
ij
represents the similarity between the i-th class and the j-th class. The bigger it is, the smaller the distance between the two classes is. The projection matrix P is calculated according to the between-class Laplacian graph penalizing criterion, which maximizes the distances between all sub-manifolds. The definition is as follows:
3) MMDA satisfies the following two optimization criteria:
MMDA extracts features under the framework of Fisher discriminant analysis. The within-class Laplacian operator can be defined as
The matrix P can be obtained by solving the eigenvalues of α B P = λα C P. It includes the corresponding eigenvectors of the first largest d eigenvalues.
In MMDA, the within-class matrix emphasizes the contribution of each sample to the corresponding sub-manifold. The distances between different sub-manifolds are maximized. It uses the class labels and is a supervised multi-manifold learning method. Better results from MMDA have achieved in face recognition [22]. The within-class matrix is minimized while maximizing the between-class matrix in MMDA. However, the within-class matrix is usually singular when the number of samples is much smaller than the dimension of data, which will lose some effective information and affect the recognition results. So the principal components are often extracted with PCA, which possibly miss some information with discriminatory ability. At the same time, some between-class information will also be missed while maximizing the between-class matrix and minimizing the within-class matrix together. These possibly affect the final recognition results.
The principle of the algorithm
In pattern classification, the category information of some samples is usually known. However, LPP does not use the information. In LDA and MMDA, the class information of the training samples is utilized while maximizing the between-class distances. But the matrix is often singular during minimizing the within-class matrix, which affects the classification results. Also, the importance of samples is ignored because LDA uses the arithmetic mean of samples to compute the between-class scatter. To avoid the singular matrix, we only maximize the between-class matrix and hide the minimization of the within-class matrix in the procedure of maximizing the between-class matrix. Next, we construct the between-class Laplacian scatter matrix.
Let X = [x1, x2, …, x
N
], x
i
∈ R
D
be the sample set, N be the number of samples, c be the number of classes, n
i
be the sample number of the i-th class and l
i
∈ {1, 2, …, c} be the class labels. Suppose that the low dimensional embedding Y = [y1, y2, …, y
N
], y
i
∈ R
d
can be gained by the linear projection matrix P, namely, Y = PTX. Suppose that M = [m1, m2, …, m
c
] are the means of all classes in the high dimensional space and B
ij
denotes the weight between the i-th class and the j-th class. B
ij
is defined as follows:
Let be the projection of M in the low dimensional space. The between-class Laplacian graph is maximized by maximizing the following sum function:
is the mean of samples of the i-th class in the low dimensional space, which can be figured out with the arithmetic mean or the weighted mean. Here, we adopt the weighted mean to increase the coupling of samples in the same class. Let . Because y ik = PTx ik , we can conclude that
The following equation can be gained by introducing Equation (22) into Equation (19)
Therefore, the between-class graph penalizing criterion can be transformed into:
The above maximizing problem can be solved by solving the following generalized eigenvalue problem,
The matrix P contains the corresponding eigenvectors of the first largest d eigenvalues of ML
B
MT. From Equation (21), the mean of samples in the i-th class in the high dimensional space can be computed with the following equation.
w
ik
is the weight of the sample x
ik
in the i-th class, which determines the contribution of x
ik
to the mean. Then how should it be calculated? Obviously, the more important the sample is, the bigger the weight should be. The weight should be small for those less important samples. However, how should we determine whether the sample is important? The importance of samples can be distinguished by minimizing the within-class Laplacian graph. To build the within-class Laplacian graph, the similarity between x
i
and x
j
is defined as follows:
The bigger the distance between x
i
and x
j
is, the smaller the weight between two points is. The weight function is a strictly monotonically decreasing function with respect to the distances between nodes, and 0 ≤ C
ij
≤ 1 always works. The within-class graph penalizing criterion can be defined as:
The proposed algorithm is described as follows: Calculate the similarity between x
i
and x
j
with Equation (27), and construct the weight matrix of each class. Let N be the number of the training samples, the size of matrix C of each class is N * N and these c matrixes are named as C1, … C
c
. We can derive a matrix D
c
from each matrix C, i.e. D1, D2, …, D
c
. Compute the weight w
ik
of each sample with Equation (29) according to the diagonal matrix D1, D2, …, D
c
. Then the mean of each class is calculated with Equation (26). Finally, the matrix M = [m1, m2, …, m
c
] is gained. Construct the between-class Laplacian matrix. The matrix B can be calculated according to Equation (18), and its order is c * c. The matrix L
B
is obtained with the formula L
B
= D
B
- B. Solve the first largest d eigenvalues of ML
B
MT. Their corresponding eigenvectors are taken as the projection matrix P. Figure out the low dimensional representation of the training set with Y = PTX. Gain the low dimensional features of the testing samples with Y′ = P
T
X′.
In the above algorithm, multi-manifold features of each class are extracted with the between-class Laplacian graph by introducing the class labels. The projection matrix is obtained by maximizing the between-class Laplacian scatter matrix where the samples from different classes are separated as far as possible. In the method, the between-class Laplacian graph is different from the traditional linear discriminant analysis, which considers the structure of the sub-manifold and emphasizes the importance of different samples. It improves the within-class compactness and preserves the local adjacency relationship between samples. And it eliminates the problem that the within-class matrix is singular in MMDA and LDA. It can extract the most discriminative features from the high dimensional data, which improves the recognition accuracy.
The time cost of the algorithm mainly consists of three parts: computing the matrix C, calculating the matrix B, and solving the eigenvalues. Let N be the number of samples, c be the number of classes, and D be the dimension of samples, the time complexity of calculating C is O (N2D), the time complexity of computing B is O (c2D) and the time complexity of solving the eigenvalues of ML B MT is O (D3). The time consumption of figuring out B can be ignored for c is usually much smaller than N. Therefore, the total time complexity is O (N2D + D3). In the method, the matrix C and ML B MT needs to be stored; therefore the space complexity is O (N2 + D2).
The method of determining the dimension d of the low dimensional space
The matrix P consists of the corresponding eigenvectors of the first largest d eigenvalues of ML B MT. Therefore, it is very important for the following classification to select a proper d. At present, there is not a better method to determine d. Most of the methods are both to select d in a certain range through experiments. In PCA, d is estimated by calculating the cumulative contribution rate of eigenvalues. Here, we use the idea of PCA to choose d. Firstly, all eigenvalues are sorted by the descending order. Then the contribution rate of each eigenvalue is calculated. Lastly, the corresponding eigenvectors can express the most important information to distinguish classes when the cumulative contribution rate is more than 90%. In next section, the experimental results also show that the accuracy is relatively high when the cumulative contribution rate is more than 95%.
Experimental results
To verify the effectiveness of SMML, it is tested on FERET face image database, ORL (Olivetti research laboratory) face image database and the pavement distress image database. And we compare the method with MMDA, LDA and LPP. FERET contains 200 classes. Each class has 7 samples, and each image size is 80*80. ORL includes 400 images, which consists of different face images of 40 people. Each image is 256 gray levels and its size is 112*92. The pavement distress image database includes 360 samples, which is divided into 4 categories: transverse cracks, longitudinal cracks, alligator cracks, and block cracks. Each category has 90 images and each image size is 70*93.
Experiments include four parts. The first part is to test the effectiveness of the method of determining the dimension of the low dimensional features. Other three parts are to validate the effectiveness of SMML on FERET, ORL and the pavement distress image database. On the pavement distress image data set, SMML is compared with the 2D projection, the fusion density factor and the invariant moment besides MMDA, LDA and LPP. In experiments, we firstly obtain the matrix P with SMML, MMDA, LDA and LPP. Then the low dimensional features of the training set and the testing set are computed according to P. Lastly, samples are recognized on the low dimensional features with KNN and SVM. And the results from different methods are compared.
Testing the method to determine the dimension of the low dimensional features
In the section, the method of determining the dimension of the low dimensional features is verified on ORL. In experiments, the first 5 images of each class are used as the training samples and the remains are taken as the testing samples. Thus, there are 200 training samples and 200 testing samples. Each image is rescaled to 40*40. Firstly, the eigenvalues of ML B MT is calculated. Then the cumulative contribution rate of the eigenvalues is figured out, and the corresponding eigenvectors are taken as the low features to be recognized with KNN. Finally, the recognition accuracy is calculated. The results are shown in Fig. 1. In KNN, K = 1. In Fig. 1, the vertical coordinate denotes the recognition accuracy, and the horizontal coordinate is the cumulative contribution rate. Figure 1(a) gives the recognition accuracy of the cumulative contribution rate from 0% to 100%. Figure 1(b) is the local zoom of Fig. 1(a). It gives the growth trend of the recognition accuracy of the accumulation contribution rate from 80% to 100%.
From Fig. 1(a), it could be concluded that the accuracy stepwise increases with the increasing cumulative contribution rate and it basically tends to stable when the cumulative contribution rate reaches to 80%. From Fig. 1(b), the accuracy gradually increases when the rate is from 80% to 90%, and it becomes stable when the rate is from 90% to 95%. The accuracy reaches the highest when the rate arrives at 95%. Therefore, the number of eigenvalues is taken as the dimension of the low dimensional features in next experiments when the cumulative contribution rate is higher than 95%. The corresponding eignvectors of the first largest d eigenvalues are taken as the low dimensional features. According to the method, d = 39 on ORL, d = 185 on FERET, and d = 3 on the pavement distress images in next experiments.
Recognition results on ORL
ORL face data set is divided into two subsets: the training set and the testing set. We respectively select the first 3,4,5,6 samples as the training set. The remaining samples are as the testing set. Firstly, the matrix P is obtained by mapping the training samples into the low dimensional space with SMML, MMDA, LDA and LPP. Then the low dimensional coordinates of the testing samples are calculated according to P. Finally, the testing samples are classified with SVM and KNN. In SMML and MMDA t = 0.03*1600. And d = 39 in SMML, MMDA and LDA. The data are reduced the dimension with PCA because the within-class matrix is singular in LPP. The cumulative contribution rate is 80%, d = 20 and k = 4 in LPP + PCA. Experimental results are shown in Tables 1 to 5.
The number K of the nearest neighbors and the distance measure need to be determined while using KNN to identify the samples. To improve the recognization results, Euclidean distance and the angle cosine distance are compared. And the results from different K are compared. Here, K takes from 1 to 5. In Tables 2 to 5, the first column of each method is the results from Euclidean distance and the second column is the results from the angle cosine distance. “ED” denotes Euclidean distance and “CD” is the angle cosine distance.
From the experimental results, SSML obtains the best results for l = 3, 5, 6 when SVM is used to classify the low dimensional samples. And the results from SSML are close to those from MMDA and LDA for l = 4. The accuracy from four methods gradually decreases with the increasing K in KNN. In most cases, the accuracy is the highest when K = 1. The highest accuracy for l = 3, 4, 5, 6 is in Table 6. From Table 6, the accuracy from SMML is the highest for most cases on ORL. But the advantage is not obvious. The results from Euclidean distance are superior to those from the angle cosine distance. In SMML, there exist no singular matrixes.
Recognition results on FERET
On FERET, the first 3,4,5,6 samples of each class are selected as the training samples and the remaining samples are as the testing samples. Each image is rescaled to 80*80. The experimental method is the same as that in the Section 4.2. In SMML and MMDA, t = 0.03*6400 and d = 185. In LPP+PCA, the cumulative contribution rate is 90%, k = 7 and d = 68. In LDA d = 193. In KNN, K is from 1 to 5. Euclidean distance and the angle cosine distance is used. Results are shown in Tables 7 to 11.
From the experimental results, the results from SSML are the highest for l = 3, 4, 5, 6 when SVM is used to classify the low dimensional samples. And the results from SSML are great superior to those from other three feature extracted methods. When the low dimensional samples are classified with KNN, the highest accuracy of the various methods is achieved for K = 1 and the results from the angle cosine distance are better than those from Euclidean distance. The highest accuracy of the various methods is given in Table 12 for different cases.
As shown in Table 12, the results from SMML are superior to those from other methods, especially the advantage is more obvious when the number of the training samples is bigger than 3. The main reason is that it doesn’t suffer from the singular matrix while solving the eigenvalues with SMML. The sub-manifolds of different classes are separated as far as possible while projecting the face images into the low dimensional space.
Recognition results on the pavement distress images
The pavement distress image data set is divided into two subsets: the training set and the testing set. The former consists of 280 samples and each class has 70 samples. The latter includes 80 samples and each class has 20 samples. Experimental method is the same as that in the Section 4.2. In SMML, t = 0.03*6510 and d = 3 because the cumulative contribution rate of the first three eigenvalues reaches to 99%. In MMDA, the values of t and d are the same as those in SMML. In LPP + PCA, k = 7 and d = 62. In LDA d = 3. In KNN, K takes from 1 to 5. Euclidean distance and the angle cosine distance are compared. The results are shown in Tables 13 and 14.
From the results, the recognition accuracy from SMML is higher than that from other methods when SVM and KNN are used to classify the low dimensional samples. And the results from Euclidean distance are greatly superior to those from the angle cosine distance in KNN. The accuracy is much higher than other methods while using Euclidean distance and the results are close when K takes from 1 to 5. The results from SMML are close to those from LLP + PCA when the angle cosine distance is used in KNN. And the results are superior to those from other methods. It shows that SMML can extract more effective features from the pavement distress images.
We call the low dimensional feature extracted with SMML as multi-manifold features. To further verify the effectiveness of the multi-manifold feature, we compare SMML with the traditional pavement feature extraction methods. Firstly, the multi-manifold feature is compared with coordinate projection [35], fusion density factor [36] and 2-order invariant moment [37]. Then these features are fused and the results are compared. In experiments, we firstly extract two projection features, four fusion density factors and two 2-order invariant moments. Then they are combined with different means. KNN is used and K = 5. The results are shown in Table 15.
As shown in Table 15, the accuracy is improved by fusing several kinds of features. But it is far lower than that from SMML. The main reason is that the traditional pavement feature extraction method only extracts one aspect of features from pavement images. For example, the two dimensional projection only considers the projection difference; the fusion density factors only account for the damage density, and 2-order invariant moment only takes into account the geometrical features of the image region. These methods cannot all adequately extract the effective information of pavement features. To further discuss the effect of different feature extraction methods on the results, we give the distribution of coordinate projection, fusion density factor, 2-order invariant moment and multi-manifold features in the low dimensional space. The results are shown in Fig. 2.
Figure 2 gives the within-class compactness and the between-class distance of the feature distribution, which will both affect the classification results. The bigger the distance between the classes is, the more obvious the boundaries between the classes are and the better the recognition results are. The low dimensional features from coordinate projection, 2-order invariant moment and fusion density factor have a bigger crossover between various classes and the within-class distribution is looser, which affect the classification results. The multi-manifold learning features decrease the crossover of various sub-manifolds in the low dimensional space by maximizing the between-class distance and increasing the within-class compactness. Therefore, SMML can obtain higher accuracy.
Conclusion
In the era of big data, images have very high dimension, which has seriously affected the recognition results and efficiency of big image data. It is an important method to deal with the high dimensional data by reducing the dimension of images. A supervised multi-manifold learning method is put forward, which builds an explicit mapping between two spaces by maximizing the between-class Laplacian scatter matrix to seek the projection matrix. It overcomes the problem of the singular matrix existing in MMDA and LDA. Experimental results on the pavement images and face images show that the recognition accuracy of SMML is superior to that of other methods. Experimental results on ORL also demonstrate that it is effective to estimate the dimension of the low dimensional features by using the cumulative contribution rate of eigenvalues. Next, we will use the method in other fields such as text classification and further validate its effectiveness.
Footnotes
Acknowledgments
This work was supported in part by a grant from NSF of Hebei province of China (No. F2016202144), TSTC of Tianjin of China (Nos. 14JCZDJC31600 and 13JCQNJC00200) and KPSTRHE of Hebei province of China (No. ZD2014030).
