Application of image recognition in civil aviation security based on tensor learning

Abstract

Image recognition is a hot topic in computer vision and pattern recognition, and it is widely used in identification, automatic control and human-computer interaction systems. With the development of civil aviation, image recognition has become an important tool for ensuring civil aviation security. In this article, a tensor is first used to represent the image, which preserves more of the image's structure information than the traditional vector representation. Then, by combining a new tensor distance (NTD) with multilinear discriminant subspace analysis (MLDSA), a novel dimensionality reduction approach named NTD-MLDSA is proposed, whose transformation matrices are obtained by an iterative strategy. Unlike the Euclidean distance (ED), which is based on an orthogonality assumption, NTD takes into account the spatial relationships of elements and can reflect the real distance between tensors. Experimental results on the benchmark recognition databases Yale, ORL and USPS show that the proposed approach is more appropriate for dimensionality reduction of image objects than other classical dimensionality reduction methods, and that the low-dimensional data obtained by NTD-MLDSA improve the classification accuracy.

Keywords

Image recognition, aviation security, dimensionality reduction, new tensor distance, discriminant subspace analysis

1 Introduction

With the rapid development of civil aviation, civil aviation security is facing many challenges, such as increased passenger traffic, a huge airport security workload and a sharp increase in the number of flights. Because traditional manual identification can no longer meet current needs, automatic recognition has become the main issue for many researchers [27], and image recognition is an important part of it. The development of civil aviation is closely related to the progress of image recognition technology.

Image recognition is essentially a classification problem [19]. To improve the recognition accuracy, the following two problems need to be solved:

Which data form can be used to represent the image reasonably [8, 11].

Images exist in a high-dimensional space. Classification methods operating directly on this space suffer from the so-called curse of dimensionality [6]. Handling high-dimensional samples is computationally expensive and requires large storage; it easily leads to high-dimensionality small-sample-size (S3) problems, on which many classifiers perform poorly.

In computer vision research, the natural representation of an image is a tensor. For example, as shown in Fig. 1, a gray image can be represented by a second-order tensor (matrix).

However, traditional image recognition methods always represent an image in vector form, or scan a tensor into a vector [5, 22]. This greatly increases the computational cost and seriously destroys the structure information of the tensor data [7, 17]. As shown in Fig. 2, (a) is the result of row-based vectorization, where the spatial relationship between pixels x₍₂₎ and x₍₃₎ is clearly not well preserved. A similar result for x₍₁₎ and x₍₂₎ appears in (b), which is obtained by column-based vectorization.
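The loss of neighborhood structure under vectorization can be checked on a small example. The following minimal sketch assumes NumPy and a hypothetical 3 × 3 image whose entries are simply pixel labels; it is an illustration only and does not reproduce the specific pixels of Fig. 2.

```python
import numpy as np

# Hypothetical 3 x 3 "image" whose entries are pixel labels 0..8.
X = np.arange(9).reshape(3, 3)

row_vec = X.reshape(-1, order="C")  # row-based vectorization
col_vec = X.reshape(-1, order="F")  # column-based vectorization

def position_gap(vec, a, b):
    """Distance between the positions of labels a and b inside a vector."""
    pos_a = int(np.where(vec == a)[0][0])
    pos_b = int(np.where(vec == b)[0][0])
    return abs(pos_a - pos_b)

# Labels 0 and 3 are vertical neighbours in X (rows 0 and 1 of column 0).
gap_vertical_in_row_vec = position_gap(row_vec, 0, 3)
# Labels 0 and 1 are horizontal neighbours in X (columns 0 and 1 of row 0).
gap_horizontal_in_col_vec = position_gap(col_vec, 0, 1)
```

In both cases, pixels that are adjacent in the matrix end up one full row (or column) length apart in the vector, and this gap grows with the image size.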

Fig.1

A gray image can be represented by a second order tensor (matrix).

Fig.2

Vectorization of a matrix: (a) Row-based vectorization of an image; (b) Column-based vectorization of an image.

To avoid the curse of dimensionality, the dimensionality of the image needs to be reduced adequately before recognition [1].

With the development of tensor theory and machine learning, multilinear algebra has been introduced into dimensionality reduction for tensors [13]. Since then, many tensor dimensionality reduction methods have been proposed. Classical approaches include tensor principal component analysis (Tensor PCA) and tensor linear discriminant analysis (Tensor LDA) [2], both of which reduce a high-dimensional matrix to a low-dimensional matrix. Extending from second order to higher order, Lu and Yan proposed multilinear principal component analysis (MPCA) [7] and multilinear discriminant analysis (MLDA) [17], respectively. Different preserving embedding strategies lead to different methods, such as tensor-based locality preserving projections (TLPP) [4], tensor-based neighborhood preserving embedding (TNPE) [2], the tensor-based graph embedding framework [26] and so on [15, 25].

Moreover, the performance of dimensionality reduction for tensor data depends not only on the strategy but also on the distance metric [24]. There are many ways to define the distance between tensors, such as the tangent distance [14] and the generalized Hausdorff distance [3]. Owing to the simplicity of the Euclidean distance (ED), however, all the dimensionality reduction methods mentioned above are based on ED, which is constrained by an orthogonality assumption. Unfortunately, this assumption ignores the correlations among the elements of tensor data, such as the spatial relationships of pixels in images, and thus degrades the performance of tensor dimensionality reduction algorithms. To remedy this deficiency in measuring the distance between tensors, this paper proposes a new tensor distance (NTD), which can be regarded as a generalization of ED. Unlike ED, NTD considers the relationships among different coordinates of the tensor. Based on NTD, a novel dimensionality reduction method for tensors, called NTD-based Multilinear Discriminant Subspace Analysis (NTD-MLDSA), is further presented. NTD-MLDSA tries to find the most discriminative tensor subspace: after the data points are projected into this subspace, points of different classes are farther from each other while points of the same class are closer to each other. The proposed framework is shown in Fig. 3.

Fig.3

Block diagram of NTD-MLDSA system for image recognition.

The rest of this paper is organized as follows. Section 2 proposes the NTD metric and compares it with ED. The model and algorithm of NTD-MLDSA are presented in Section 3. The experimental results are reported in Section 4. Finally, Section 5 gives conclusions and future work.

2 Distance metric

This section begins with a brief review of the Euclidean distance (ED) and analyzes its defect when used to define the distance between gray images. Then, a new tensor distance (NTD) metric is introduced and analyzed.

2.1 Euclidean distance

Let the matrices X and Y represent two I₁ × I₂ gray images, and let x_(kl) and y_(kl) be their gray levels at location (k, l). Then the ED between X and Y, which is the same as the ED between the vectors x and y, is

$d_{E} (X, Y) = \sqrt{\langle X - Y, X - Y \rangle} = \sqrt{(x - y)^{T} (x - y)} = \sqrt{\sum_{k = 1}^{I_{1}} \sum_{l = 1}^{I_{2}} (x_{(kl)} - y_{(kl)})^{2}} = d_{E} (x, y),$ (1) where x = vec (X) and y = vec (Y) are the vector-form representations of X and Y.

From definition (1), it is obvious that the weight of the cross term (x_(kl) - y_(kl)) (x_(k′l′) - y_(k′l′)) is one if the location (k, l) is the same as (k′, l′), and zero otherwise. From another point of view, the vector x in (1) is represented by the elements x₍₁₎, ⋯ , x_(I₁×I₂) under the corresponding basis vectors e₁, ⋯ , e_(I₁×I₂), where $\begin{cases} e_{k}^{T} e_{l} = 1, & \text{if } k = l; \\ e_{k}^{T} e_{l} = 0, & \text{otherwise}. \end{cases}$ (2)

In Euclidean space, (2) indicates that any two different basis vectors e_k and e_l are assumed to be mutually perpendicular, so that the elements x_(k) and x_(l) are treated as independent of each other. Unfortunately, this orthogonality assumption ignores the correlation among different coordinates of a data point, such as the spatial relationships of pixels in gray images. The following example indicates that ED cannot reflect the real distance between tensor data.

Figure 4 shows three gray images normalized to a resolution of 32 × 32 pixels and represented by the matrices A, B and C, where A and B belong to the same class and C belongs to another class. Since data points of the same class should be close to each other while data points of different classes should be far apart, a reasonable tensor distance should give a smaller distance between A and B than between A and C, but ED gives the opposite result. Formula (1) yields d_E (A, B) = 7.9804 and d_E (A, C) = 7.6414, so A and B, although in the same class, have the larger distance. This unreasonable result is caused by the orthogonality assumption of ED. In other words, for images, ED considers only the values of the pixels and not the spatial relationships between them.
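The same insensitivity of ED to pixel locations can be reproduced on synthetic data. The sketch below assumes NumPy and uses hypothetical 8 × 8 images containing a single bright pixel (these are not the Yale images of Fig. 4): moving the pixel by one column or moving it across the image gives exactly the same ED.

```python
import numpy as np

def euclidean_distance(X, Y):
    """ED of Eq. (1): square root of the summed squared pixel differences."""
    return float(np.sqrt(np.sum((X - Y) ** 2)))

def one_hot_image(row, col, size=8):
    """Hypothetical test image: all zeros except one pixel set to 1."""
    img = np.zeros((size, size))
    img[row, col] = 1.0
    return img

A = one_hot_image(3, 3)
B = one_hot_image(3, 4)   # bright pixel moved by one column
C = one_hot_image(7, 7)   # bright pixel moved far away

d_AB = euclidean_distance(A, B)
d_AC = euclidean_distance(A, C)

# The matrix form and the vectorized form of Eq. (1) agree.
d_AB_vec = float(np.linalg.norm(A.ravel() - B.ravel()))
```

Here `d_AB` and `d_AC` coincide, although B is visually much closer to A than C is.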

Fig.4

Three gray images of two individuals in the Yale face dataset.

2.2 New tensor distance

In [12], Wang proposed a new Euclidean distance for images, IMED, which represents the image as a vector and considers the spatial relationships of pixels. Inspired by this work, a distance metric called the New Tensor Distance (NTD) is proposed, which considers not only the values of the pixels but also the spatial relationships between them.

Let p_i (i = 1, ⋯ , I₁ × I₂) represent the pixels of the image. If p_i is at location (k, l) (corresponding to the element x_(kl) of the matrix) and p_j is at location (k′, l′) (corresponding to the element x_(k′l′) of the matrix), then $| p_{i} - p_{j} | = \sqrt{(k - k^{'})^{2} + (l - l^{'})^{2}},$ (3) denotes the location distance between p_i and p_j, i.e. the location distance between x_(kl) and x_(k′l′).

Next, the metric coefficients are introduced. According to [12], the metric coefficients g_(ij) and |p_i - p_j| should satisfy the following relationship: $g_{(ij)} = f (| p_{i} - p_{j} |), i, j = 1, 2, \dots, I_{1} \times I_{2} .$ (4)

Then the NTD between X and Y is given by

$\begin{matrix} d_{NTD} (X, Y) & = & \sqrt{\sum_{i, j = 1}^{I_{1} \times I_{2}} g_{(ij)} (x_{(i)} - y_{(i)}) (x_{(j)} - y_{(j)})} \\ & = & \sqrt{(x - y)^{T} G (x - y)}, \end{matrix}$ (5) where $G = (g_{(ij)})_{I_{1} I_{2} \times I_{1} I_{2}}$ is the metric matrix, x_(i) is the ith element of the vector x = vec (X), which corresponds to the element x_(kl) (at location (k, l)) of the matrix X, and x_(j) has the same interpretation. To preserve the positive definiteness of the metric matrix G and satisfy the three conditions mentioned in [12], the function f must be a positive definite function [9], that is, for arbitrary n and p₁, ⋯ , p_n, the matrix $(f (| p_{i} - p_{j} |))_{n \times n}$ is positive definite. The most popular and important positive definite function is the Gaussian function, $g_{(ij)} = f (| p_{i} - p_{j} |) = \frac{1}{2 π σ^{2}} e^{- \frac{{| p_{i} - p_{j} |}^{2}}{2 σ^{2}}},$ (6) where σ is the width parameter. Based on (6), d_NTD (X, Y) can be rewritten as

$d_{NTD} (X, Y) = \sqrt{\frac{1}{2 π σ^{2}} \sum_{i, j = 1}^{I_{1} \times I_{2}} e^{- \frac{{| p_{i} - p_{j} |}^{2}}{2 σ^{2}}} (x_{(i)} - y_{(i)}) (x_{(j)} - y_{(j)})},$ (7) where x_(i) and y_(i) are the ith components of x = vec (X) and y = vec (Y), respectively.
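Equations (3) to (7) translate directly into a few lines of array code. The following is a minimal sketch assuming NumPy; the function names `metric_matrix` and `ntd`, the row-major pixel ordering and the 8 × 8 single-pixel test images are illustrative choices, not part of the original MATLAB implementation.

```python
import numpy as np

def metric_matrix(I1, I2, sigma=1.0):
    """Gaussian metric matrix G of Eq. (6) for an I1 x I2 image.

    Pixels are enumerated in row-major order, matching X.ravel().
    """
    rows, cols = np.meshgrid(np.arange(I1), np.arange(I2), indexing="ij")
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)            # |p_i - p_j|^2 from Eq. (3)
    return np.exp(-sq_dist / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def ntd(X, Y, G):
    """New tensor distance of Eq. (5): sqrt((x - y)^T G (x - y))."""
    d = (X - Y).ravel()
    return float(np.sqrt(max(d @ G @ d, 0.0)))     # clamp round-off below zero

def one_hot_image(row, col, size=8):
    """Hypothetical test image: all zeros except one pixel set to 1."""
    img = np.zeros((size, size))
    img[row, col] = 1.0
    return img

G = metric_matrix(8, 8, sigma=1.0)
A, B, C = one_hot_image(3, 3), one_hot_image(3, 4), one_hot_image(7, 7)
d_AB, d_AC = ntd(A, B, G), ntd(A, C, G)
```

For these test images ED cannot separate the two cases, whereas `d_AB` is smaller than `d_AC` under NTD, because the off-diagonal entries of G couple neighboring pixels.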

2.3 Analysis of the matrix distance

Actually, NTD can be seen as a generalization of ED that contains not only the values of the elements but also their location distances; in other words, when the metric matrix G is the identity matrix, NTD reduces to ED. To save computational cost for large databases, the positive definite matrix G can be decomposed as $G = G^{1 / 2} G^{1 / 2},$ (8) where $G^{1 / 2}$ is symmetric. Applying the transformation $G^{1 / 2}$ to (5) and defining $x^{'} = G^{1 / 2} x$, $y^{'} = G^{1 / 2} y$, the proposed NTD between X and Y reduces to the ED between x′ and y′:

$\begin{matrix} d_{NTD} (X, Y) & = & \sqrt{{(x - y)}^{T} G (x - y)} \\ & = & \sqrt{{(x - y)}^{T} G^{1 / 2} G^{1 / 2} (x - y)} \\ & = & \sqrt{{(x^{'} - y^{'})}^{T} (x^{'} - y^{'})} . \end{matrix}$ (9)
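The reduction in (8) and (9) can be verified numerically. The sketch below assumes NumPy and computes the symmetric square root of G through an eigendecomposition, which is one possible choice (the paper does not state how $G^{1/2}$ is computed); the helper names and the random 6 × 6 test images are hypothetical.

```python
import numpy as np

def gaussian_metric(I1, I2, sigma=1.0):
    """Gaussian metric matrix of Eq. (6), pixels in row-major order."""
    r, c = np.meshgrid(np.arange(I1), np.arange(I2), indexing="ij")
    p = np.stack([r.ravel(), c.ravel()], axis=1).astype(float)
    sq = np.sum((p[:, None, :] - p[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def sqrt_psd(G):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(G)
    w = np.clip(w, 0.0, None)          # guard against tiny negative round-off
    return (V * np.sqrt(w)) @ V.T      # V diag(sqrt(w)) V^T

rng = np.random.default_rng(0)
X, Y = rng.random((6, 6)), rng.random((6, 6))
G = gaussian_metric(6, 6)
G_half = sqrt_psd(G)

# Direct evaluation of Eq. (5).
d = (X - Y).ravel()
d_ntd = float(np.sqrt(d @ G @ d))

# Eq. (9): plain Euclidean distance after the transformation x' = G^{1/2} x.
x_prime, y_prime = G_half @ X.ravel(), G_half @ Y.ravel()
d_ed_transformed = float(np.linalg.norm(x_prime - y_prime))
```

Since G depends only on the image size and σ, the square root needs to be computed once, after which every sample can be transformed and compared with ordinary ED.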

Using NTD (7), the new tensor distances for the images in Fig. 4 are d_NTD (A, B) = 4.3198 and d_NTD (A, C) = 6.1824, which indicates that similar images have the smaller distance and dissimilar images the larger one. Therefore, the new metric NTD provides intuitively reasonable results.

3 Multilinear discriminant subspace analysis based on the new tensor distance

In this section, based on the new tensor distance, a novel supervised dimensionality reduction method, NTD-MLDSA, is presented. To preserve the structure information of tensors, it works directly on tensor data points and iteratively learns the transformation matrices. To maintain the manifold structure, NTD-MLDSA considers both the similarity and the dissimilarity among all data points.

Consider a set $𝕏$ of $\bar{L}$ data points in the 2nd-order tensor space $ℝ^{I_{1}} \otimes ℝ^{I_{2}}$ coming from C classes: $𝕏 = \{ X_{1}^{1}, X_{2}^{1}, \dots, X_{L_{1}}^{1}, X_{1}^{2}, X_{2}^{2}, \dots, X_{L_{2}}^{2}, \dots, X_{1}^{C}, X_{2}^{C}, \dots, X_{L_{C}}^{C} \},$ (10) where $X_{m}^{c} \in ℝ^{I_{1} \times I_{2}}$ is the mth point in the cth class, L_c is the number of points in the cth class, and $\sum_{c = 1}^{C} L_{c} = \bar{L}$ . NTD-MLDSA aims to find two transformation matrices $U_{1} \in ℝ^{I_{1}} \otimes ℝ^{L_{1}}$ and $U_{2} \in ℝ^{I_{2}} \otimes ℝ^{L_{2}} (L_{r} \ll I_{r}, r = 1, 2)$ such that $\bar{L}$ low-dimensional data points $Y_{1}^{1}, \dots, Y_{L_{C}}^{C}$ in the space $ℝ^{L_{1}} \otimes ℝ^{L_{2}}$ can be obtained by

$Y_{m}^{c} = U_{1}^{T} X_{m}^{c} U_{2}, \quad m = 1, 2, \dots, L_{c}, \quad c = 1, 2, \dots, C .$ (11)

The method is based on the idea that data points of different classes should be far from each other while data points of the same class should be close to each other. If two points $X_{m}^{c}$ and $X_{n}^{c}$ are in the same class, then the corresponding projected points $Y_{m}^{c}$ and $Y_{n}^{c}$ should be close. This gives the following optimization problem: $\min_{U_{1}, U_{2}} \sum_{c = 1}^{C} \sum_{m, n = 1}^{L_{c}} {∥ Y_{m}^{c} - Y_{n}^{c} ∥}_{F}^{2} W_{(mn)}^{c},$ (12)

where $W_{(mn)}^{c} = \exp (- {∥ X_{m}^{c} - X_{n}^{c} ∥}_{NTD}^{2} / t)$ is the similarity between the samples $X_{m}^{c}$ and $X_{n}^{c}$ , ∥ · ∥ _NTD is the new tensor distance (NTD) between tensors, and $W^{c} = {[W_{(mn)}^{c}]}_{L_{c} \times L_{c}}$ is the within-class similarity matrix of the cth class. Additionally, for different classes, a reasonable criterion is $\max_{U_{1}, U_{2}} \sum_{α, β = 1}^{C} {∥ {\bar{Y}}_{α} - {\bar{Y}}_{β} ∥}_{F}^{2} B_{(α β)},$ (13) where $B_{(α β)} = \exp (- {∥ {\bar{X}}_{α} - {\bar{X}}_{β} ∥}_{NTD}^{2} / t)$ is the similarity between the mean points ${\bar{X}}_{α} = \frac{1}{L_{α}} \sum_{m = 1}^{L_{α}} X_{m}^{α}$ and ${\bar{X}}_{β} = \frac{1}{L_{β}} \sum_{m = 1}^{L_{β}} X_{m}^{β}$ , and $B = {[B_{(α β)}]}_{C \times C}$ is the between-class similarity matrix.
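The similarity weights in (12) and (13) can be computed for all pairs at once. The following sketch assumes NumPy; the helper names, the random 6 × 6 samples and the two-class layout are hypothetical, and t = 1000 is taken from the experimental settings of Section 4.

```python
import numpy as np

def gaussian_metric(I1, I2, sigma=1.0):
    """Gaussian metric matrix of Eq. (6), pixels in row-major order."""
    r, c = np.meshgrid(np.arange(I1), np.arange(I2), indexing="ij")
    p = np.stack([r.ravel(), c.ravel()], axis=1).astype(float)
    sq = np.sum((p[:, None, :] - p[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def ntd_similarity(samples, G, t=1000.0):
    """S[m, n] = exp(-d_NTD(samples[m], samples[n])^2 / t) for all pairs."""
    V = samples.reshape(len(samples), -1)          # one vectorized sample per row
    VG = V @ G
    q = np.sum(VG * V, axis=1)                     # v^T G v for every sample
    sq_dist = q[:, None] + q[None, :] - 2.0 * (VG @ V.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / t)

rng = np.random.default_rng(0)
class_samples = [rng.random((4, 6, 6)), rng.random((3, 6, 6))]  # two toy classes
G = gaussian_metric(6, 6)

# Within-class similarity matrices W^c, one per class.
W = [ntd_similarity(Xc, G) for Xc in class_samples]
# Between-class similarity matrix B, built from the class means.
means = np.stack([Xc.mean(axis=0) for Xc in class_samples])
B = ntd_similarity(means, G)
```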

The optimization problem of NTD-MLDSA can be deduced from the above: $max_{U_{1}, U_{2}} \frac{\sum_{α, β = 1}^{C} {∥ {\bar{Y}}_{α} - {\bar{Y}}_{β} ∥}_{F}^{2} B_{(α β)}}{\sum_{c = 1}^{C} \sum_{m, n = 1}^{L_{c}} {∥ Y_{m}^{c} - Y_{n}^{c} ∥}_{F}^{2} W_{(mn)}^{c}},$ (14)

According to ${∥ A ∥}_{F}^{2} = tr (A^{T} A)$ , the objective function of (12) can, up to the constant factor 1/2, be rewritten as

$\begin{matrix} \frac{1}{2} \sum_{c = 1}^{C} \sum_{m, n = 1}^{L_{c}} {∥ Y_{m}^{c} - Y_{n}^{c} ∥}_{F}^{2} W_{(mn)}^{c} \\ = \frac{1}{2} \sum_{c = 1}^{C} \sum_{m, n = 1}^{L_{c}} tr [(Y_{m}^{c} - Y_{n}^{c})^{T} (Y_{m}^{c} - Y_{n}^{c})] W_{(mn)}^{c} \\ = tr [\frac{1}{2} \sum_{c = 1}^{C} \sum_{m, n = 1}^{L_{c}} (U_{1}^{T} X_{m}^{c} U_{2} - U_{1}^{T} X_{n}^{c} U_{2})^{T} (U_{1}^{T} X_{m}^{c} U_{2} - U_{1}^{T} X_{n}^{c} U_{2}) W_{(mn)}^{c}] \\ = tr \{ U_{2}^{T} [\frac{1}{2} \sum_{c = 1}^{C} \sum_{m, n = 1}^{L_{c}} (U_{1}^{T} X_{m}^{c} - U_{1}^{T} X_{n}^{c})^{T} (U_{1}^{T} X_{m}^{c} - U_{1}^{T} X_{n}^{c}) W_{(mn)}^{c}] U_{2} \} \\ = tr \{ U_{2}^{T} \sum_{c = 1}^{C} \sum_{m, n = 1}^{L_{c}} [(U_{1}^{T} X_{m}^{c})^{T} (U_{1}^{T} X_{m}^{c}) - (U_{1}^{T} X_{m}^{c})^{T} (U_{1}^{T} X_{n}^{c})] W_{(mn)}^{c} U_{2} \} \\ = tr \{ U_{2}^{T} \sum_{c = 1}^{C} [\sum_{m = 1}^{L_{c}} (U_{1}^{T} X_{m}^{c})^{T} (U_{1}^{T} X_{m}^{c}) \sum_{n = 1}^{L_{c}} W_{(mn)}^{c} - \sum_{m, n = 1}^{L_{c}} (U_{1}^{T} X_{m}^{c})^{T} (U_{1}^{T} X_{n}^{c}) W_{(mn)}^{c}] U_{2} \} \\ = tr \{ U_{2}^{T} \sum_{c = 1}^{C} [(P_{U_{1}}^{c})^{T} (D^{c} \otimes I_{L_{1}}) P_{U_{1}}^{c} - (P_{U_{1}}^{c})^{T} (W^{c} \otimes I_{L_{1}}) P_{U_{1}}^{c}] U_{2} \} \\ = tr \{ U_{2}^{T} P_{U_{1}}^{T} [(D - W) \otimes I_{L_{1}}] P_{U_{1}} U_{2} \} \\ = tr \{ U_{2}^{T} P_{U_{1}}^{T} [\tilde{L} \otimes I_{L_{1}}] P_{U_{1}} U_{2} \} \\ = tr \{ U_{2}^{T} S_{L}^{U_{1}} U_{2} \} \end{matrix}$ (15) where the fifth line uses the symmetry $W_{(mn)}^{c} = W_{(nm)}^{c}$ , $\begin{matrix} S_{L}^{U_{1}} & = & P_{U_{1}}^{T} (\tilde{L} \otimes I_{L_{1}}) P_{U_{1}}, \\ P_{U_{1}} & = & {[(P_{U_{1}}^{1})^{T}, \dots, (P_{U_{1}}^{C})^{T}]}^{T}, \\ P_{U_{1}}^{c} & = & {[(U_{1}^{T} X_{1}^{c})^{T}, \dots, (U_{1}^{T} X_{L_{c}}^{c})^{T}]}^{T}, \end{matrix}$ $I_{L_{1}}$ is an L₁ × L₁ identity matrix, ⊗ is the Kronecker product [18], D^c is a diagonal matrix with entries $D_{(mm)}^{c} = \sum_{n} W_{(mn)}^{c}$ , D = diag (D¹, D², ⋯ , D^C), W = diag (W¹, W², ⋯ , W^C) and $\tilde{L} = D - W$ .
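The chain of equalities in (15) can be sanity-checked numerically by evaluating its first and last expressions on random data. The sketch below assumes NumPy; all sizes, the random matrices and the helper `block_diag` are hypothetical, and the weights are arbitrary symmetric matrices, since the identity only requires the symmetry of each W^c.

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, L1, L2 = 5, 4, 3, 2            # hypothetical tensor and subspace sizes
class_sizes = [3, 2]                   # two classes with 3 and 2 samples

U1 = rng.standard_normal((I1, L1))
U2 = rng.standard_normal((I2, L2))
Xs = [rng.standard_normal((n, I1, I2)) for n in class_sizes]
Ws = []
for n in class_sizes:
    M = rng.random((n, n))
    Ws.append((M + M.T) / 2)           # symmetric within-class weights

# First expression of (15): (1/2) sum_c sum_{m,n} ||Y_m - Y_n||_F^2 W_mn.
lhs = 0.0
for Xc, Wc in zip(Xs, Ws):
    Yc = [U1.T @ X @ U2 for X in Xc]
    for m in range(len(Yc)):
        for n in range(len(Yc)):
            lhs += 0.5 * np.sum((Yc[m] - Yc[n]) ** 2) * Wc[m, n]

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    k = 0
    for b in blocks:
        n = b.shape[0]
        out[k:k + n, k:k + n] = b
        k += n
    return out

# Last expression of (15): tr(U2^T P^T [(D - W) kron I_{L1}] P U2).
W = block_diag(Ws)
D = np.diag(W.sum(axis=1))
P = np.vstack([U1.T @ X for Xc in Xs for X in Xc])   # stacked U1^T X_m^c blocks
S_L = P.T @ np.kron(D - W, np.eye(L1)) @ P
rhs = float(np.trace(U2.T @ S_L @ U2))
```

Up to floating-point round-off, `lhs` and `rhs` agree.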

Similarly, using the same technique, the objective function of (13) can be rewritten as $\frac{1}{2} \sum_{α, β = 1}^{C} {∥ {\bar{Y}}_{α} - {\bar{Y}}_{β} ∥}_{F}^{2} B_{(α β)} = tr (U_{2}^{T} S_{H}^{U_{1}} U_{2}),$ (16) where $\begin{matrix} S_{H}^{U_{1}} & = & Q_{U_{1}}^{T} (H \otimes I_{L_{1}}) Q_{U_{1}}, \\ Q_{U_{1}} & = & {[(U_{1}^{T} {\bar{X}}_{1})^{T}, \dots, (U_{1}^{T} {\bar{X}}_{C})^{T}]}^{T}, \end{matrix}$ E is a diagonal matrix with entries E_(αα) = ∑_β B_(αβ), B is the between-class similarity matrix defined in (13), and H = E - B.

Therefore, the optimization problem (14) can be converted into the following problem $max_{U_{1}, U_{2}} \frac{tr (U_{2}^{T} S_{H}^{U_{1}} U_{2})}{tr (U_{2}^{T} S_{L}^{U_{1}} U_{2})} .$ (17)

For a given U₁, the optimal U₂ can be obtained from the following generalized eigenvalue problem: $S_{H}^{U_{1}} U_{2} = λ S_{L}^{U_{1}} U_{2},$ (18) where the L₂ generalized eigenvectors corresponding to the L₂ largest generalized eigenvalues form the matrix $U_{2} = [\begin{matrix} u_{2}^{1}, & u_{2}^{2}, & \dots, & u_{2}^{L_{2}} \end{matrix}]$ .
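A generalized eigenvalue problem of the form (18) can be solved with standard dense linear algebra. The sketch below assumes NumPy and reduces the problem to an ordinary symmetric eigenproblem through a Cholesky factor of the right-hand matrix; the function name, the optional ridge term (a safeguard for a nearly singular $S_{L}$ that is not part of the paper) and the random test matrices are assumptions made for illustration.

```python
import numpy as np

def top_generalized_eigvecs(S_H, S_L, k, ridge=1e-8):
    """Solve S_H u = lambda S_L u and keep the k largest eigenvalues.

    Assumes S_H is symmetric and S_L is symmetric positive semi-definite.
    The ridge term is an added safeguard, scaled by the mean diagonal of S_L.
    """
    n = S_L.shape[0]
    C = np.linalg.cholesky(S_L + ridge * np.trace(S_L) / n * np.eye(n))
    # Whitened problem: (C^-1 S_H C^-T) v = lambda v, with u = C^-T v.
    M = np.linalg.solve(C, np.linalg.solve(C, S_H).T).T
    lam, V = np.linalg.eigh((M + M.T) / 2)     # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:k]          # indices of the k largest
    U = np.linalg.solve(C.T, V[:, order])
    return lam[order], U

# Hypothetical symmetric test matrices.
rng = np.random.default_rng(0)
R, Q = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
S_H = R @ R.T
S_L = Q @ Q.T + np.eye(5)

lam, U = top_generalized_eigvecs(S_H, S_L, k=2, ridge=0.0)
```

The columns of `U` play the role of $u_{2}^{1}, \dots, u_{2}^{L_{2}}$ in (18), with k standing for L₂.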

In the same way, for a given U₂, the optimal U₁ can be obtained from $S_{H}^{U_{2}} U_{1} = λ S_{L}^{U_{2}} U_{1},$ (19) where the L₁ generalized eigenvectors corresponding to the L₁ largest generalized eigenvalues form the matrix $U_{1} = [\begin{matrix} u_{1}^{1}, & u_{1}^{2}, & \dots, & u_{1}^{L_{1}} \end{matrix}]$ , and

$\begin{matrix} S_{H}^{U_{2}} & = & Q_{U_{2}} (H \otimes I_{L_{2}}) Q_{U_{2}}^{T}, \\ S_{L}^{U_{2}} & = & P_{U_{2}} (\tilde{L} \otimes I_{L_{2}}) P_{U_{2}}^{T}, \\ P_{U_{2}} & = & [P_{U_{2}}^{1}, \dots, P_{U_{2}}^{C}], \\ P_{U_{2}}^{c} & = & [X_{1}^{c} U_{2}, \dots, X_{L_{c}}^{c} U_{2}], \\ Q_{U_{2}} & = & [{\bar{X}}_{1} U_{2}, \dots, {\bar{X}}_{C} U_{2}], \end{matrix}$ Here $I_{L_{2}}$ is an L₂ × L₂ identity matrix and ⊗ is the Kronecker product.

From the above analysis, it is easy to see that the optimizations of U₁ and U₂ depend on each other. Therefore, an alternating iterative algorithm can be utilized to get the optimal U₁ and U₂. Table 1 describes the procedure of the algorithm for NTD-MLDSA.

Table 1

The procedure of the alternating iterative algorithm for NTD-MLDSA

Input: Training dataset $𝕏 = \{X_{m}^{c} \in ℝ^{I_{1} \times I_{2}}\}_{m = 1, \dots, L_{c}}^{c = 1, \dots, C}$ , maximum iteration number T_max, width parameter σ, low dimensions L₁ and L₂, the threshold parameter ɛ.

Output: Transformation matrices $U_{1} \in ℝ^{I_{1}} \otimes ℝ^{L_{1}}$ and $U_{2} \in ℝ^{I_{2}} \otimes ℝ^{L_{2}}$ .

1. Construct $\{W_{(mn)}^{c}\}_{m, n = 1, \dots, L_{c}}^{c = 1, \dots, C}$ and $\{B_{(α β)}\}_{α, β = 1, \dots, C}$ based on NTD (7);

2. Initialize t = 0, $U_{1}^{0} = I_{I_{1}}$ ;

3. For t = 0, ⋯ , T_max do

4. According to $S_{H}^{U_{1}^{t}} U_{2} = λ S_{L}^{U_{1}^{t}} U_{2}$ , obtain $U_{2}^{t}$ ;

5. According to $S_{H}^{U_{2}^{t}} U_{1} = λ S_{L}^{U_{2}^{t}} U_{1}$ , obtain $U_{1}^{t + 1}$ ;

6. If ${∥ U_{1}^{t + 1} - U_{1}^{t} ∥}_{F} \leq ɛ$ and ${∥ U_{2}^{t + 1} - U_{2}^{t} ∥}_{F} \leq ɛ$ then

7. Break;

8. End for.
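The alternating procedure of Table 1 can be sketched end to end as follows. This is a minimal illustration assuming NumPy, not the authors' MATLAB implementation: the function names, the ridge term added before the Cholesky factorization, the sign normalization of eigenvectors (needed so that successive iterates are comparable) and the synthetic 6 × 6 data are all assumptions. The convergence test is skipped while the shapes of successive iterates differ, which is the case in the first pass because $U_{1}^{0}$ is an identity matrix.

```python
import numpy as np

def gaussian_metric(I1, I2, sigma=1.0):
    """Gaussian metric matrix of Eq. (6), pixels in row-major order."""
    r, c = np.meshgrid(np.arange(I1), np.arange(I2), indexing="ij")
    p = np.stack([r.ravel(), c.ravel()], axis=1).astype(float)
    sq = np.sum((p[:, None, :] - p[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def ntd_similarity(A, G, t):
    """exp(-d_NTD^2 / t) between all pairs of matrices in A, shape (n, I1, I2)."""
    V = A.reshape(len(A), -1)
    VG = V @ G
    q = np.sum(VG * V, axis=1)
    return np.exp(-np.maximum(q[:, None] + q[None, :] - 2 * VG @ V.T, 0) / t)

def top_gen_eigvecs(S_H, S_L, k, ridge=1e-6):
    """k leading eigenvectors of S_H u = lambda S_L u (ridge is an added safeguard)."""
    n = len(S_L)
    C = np.linalg.cholesky(S_L + ridge * np.trace(S_L) / n * np.eye(n))
    M = np.linalg.solve(C, np.linalg.solve(C, S_H).T).T
    _, V = np.linalg.eigh((M + M.T) / 2)
    U = np.linalg.solve(C.T, V[:, ::-1][:, :k])
    # Fix the sign of each column so that successive iterates are comparable.
    signs = np.sign(U[np.abs(U).argmax(axis=0), np.arange(k)])
    return U * signs

def ntd_mldsa(X, y, L1, L2, sigma=1.0, t=1000.0, T_max=100, eps=1e-5):
    """X: (N, I1, I2) training samples, y: (N,) integer class labels."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    G = gaussian_metric(X.shape[1], X.shape[2], sigma)
    # Step 1: within-class similarities W (zero across classes) and between-class B.
    W = ntd_similarity(X, G, t) * (y[:, None] == y[None, :])
    B = ntd_similarity(means, G, t)
    Lap = np.diag(W.sum(axis=1)) - W           # L~ = D - W
    H = np.diag(B.sum(axis=1)) - B             # H = E - B
    U1, U2 = np.eye(X.shape[1]), None          # Step 2: U1^0 = I
    for _ in range(T_max):                     # Step 3
        # Step 4: fix U1 and solve Eq. (18) for U2.
        A = np.einsum("ia,nib->nab", U1, X)        # U1^T X_m
        Ab = np.einsum("ia,nib->nab", U1, means)   # U1^T Xbar_alpha
        S_H = np.einsum("mn,mab,nac->bc", H, Ab, Ab)
        S_L = np.einsum("mn,mab,nac->bc", Lap, A, A)
        U2_new = top_gen_eigvecs(S_H, S_L, L2)
        # Step 5: fix U2 and solve Eq. (19) for U1.
        A, Ab = X @ U2_new, means @ U2_new
        S_H = np.einsum("mn,mab,ncb->ac", H, Ab, Ab)
        S_L = np.einsum("mn,mab,ncb->ac", Lap, A, A)
        U1_new = top_gen_eigvecs(S_H, S_L, L1)
        # Steps 6-7: stop once both matrices have stabilised.
        done = (U2 is not None and U1.shape == U1_new.shape
                and np.linalg.norm(U1_new - U1) <= eps
                and np.linalg.norm(U2_new - U2) <= eps)
        U1, U2 = U1_new, U2_new
        if done:
            break
    return U1, U2

# Synthetic data: 3 classes, 4 noisy copies of a random 6 x 6 template each.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 4)
templates = rng.random((3, 6, 6))
X = np.clip(templates[y] + 0.05 * rng.standard_normal((12, 6, 6)), 0, 1)

U1, U2 = ntd_mldsa(X, y, L1=2, L2=2)
Y = np.einsum("ia,nib,bk->nak", U1, X, U2)     # Eq. (11): Y = U1^T X U2
```

The scatter matrices are accumulated with `einsum` instead of forming the Kronecker products of (15) explicitly; both give the same matrices, but the explicit form needs much more memory for large training sets.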

4 Experimental results

In this section, NTD-MLDSA is evaluated on gray image binary pattern classification tasks using the following three benchmark databases: the Yale face database, the Olivetti Research Laboratory (ORL) face database, and the United States Postal Service (USPS) digit database. To verify the effectiveness of the proposed method, some experiments compare its performance with other classical dimensionality reduction methods, such as tensor PCA, tensor LDA, PCA and LDA. The best test accuracies are highlighted in bold type.

The experimental process is composed of two main steps. First, the sample points are embedded into an $\hat{L}$ -dimensional subspace (vector-based methods) or an (L₁ × L₂)-dimensional subspace (tensor-based methods). Second, a k-nearest-neighbor classifier is applied in the low-dimensional subspace for classification. To facilitate a fair comparison, each attribute of the datasets is linearly scaled to the range [0, 1], the threshold parameter ɛ is set to 10^-5, the maximum number of iterations T_max is 100, and t in $W_{(mn)}^{c}$ and B_(αβ) is set to 1000; for the proposed NTD, σ = 1 is used. Note that L₁ = L₂ = L in all tensor-based methods. Each experiment is repeated 20 times on randomly selected training and test sets, and the average recognition accuracy is calculated. The accuracy is defined as follows: $Accuracy = \frac{\sum_{c = 1}^{C} {NT}_{c}}{\sum_{c = 1}^{C} {NT}_{c} + \sum_{c = 1}^{C} {NF}_{c}},$ (20)

where NT_c and NF_c represent the number of sample points in the cth class that are correctly and falsely classified, respectively.
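The classification step and the accuracy measure (20) can be sketched as follows. This is a minimal illustration assuming NumPy, non-negative integer class labels and the Frobenius distance between projected samples; the function names and the toy 2 × 2 data are hypothetical.

```python
import numpy as np

def knn_predict(train_Y, train_labels, test_Y, k=1):
    """k-nearest-neighbour vote using the Frobenius distance between projected samples."""
    tr = train_Y.reshape(len(train_Y), -1)
    te = test_Y.reshape(len(test_Y), -1)
    d2 = ((te[:, None, :] - tr[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]        # indices of the k closest
    votes = train_labels[nearest]
    # Majority vote; ties are resolved in favour of the smaller label.
    return np.array([np.bincount(v).argmax() for v in votes])

def accuracy(true_labels, predicted):
    """Eq. (20): correctly classified samples over all classified samples."""
    n_true = int(np.sum(true_labels == predicted))
    n_false = int(np.sum(true_labels != predicted))
    return n_true / (n_true + n_false)

# Toy projected samples of size 2 x 2.
train_Y = np.array([np.zeros((2, 2)), np.ones((2, 2))])
train_labels = np.array([0, 1])
test_Y = np.array([np.full((2, 2), 0.1), np.full((2, 2), 0.9), np.full((2, 2), 0.8)])
true_labels = np.array([0, 1, 0])

predicted = knn_predict(train_Y, train_labels, test_Y, k=1)
acc = accuracy(true_labels, predicted)
```

In this toy case the third test sample is closer to the class-1 training sample, so two of the three samples are classified correctly.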

All the experiments are carried out on a computer with an Intel Core(TM) 2 1.8 GHz processor and 2 GB of main memory running Microsoft Windows 7. All the programs are written in the MATLAB language and run using MATLAB 2010a.

4.1 Yale face database

The Yale gray face database contains 15 different individuals, each with 11 images that vary in illumination conditions, facial expression and whether glasses are worn. All images are grayscale and normalized to a resolution of 32 × 32 pixels in our experiments. The sample images of two individuals from the database are shown in Fig. 5.

Fig.5

Sample images from the Yale database.

Three experiments are conducted on this database. In the first two experiments, p = 2 labeled images per individual (hence, 30 images in total) are randomly selected to form the training set, the rest of the data form the testing set, and the neighborhood size is k = 1.

In the first experiment, NTD-MLDSA and MLDSA are compared with five other representative dimensionality reduction algorithms: LDSA, PCA, PCA+LDA, tensor PCA (t-PCA) and tensor LDA (t-LDA). Table 2 lists the best recognition results and the corresponding optimal reduced dimensions of all algorithms.

Table 2

Comparison of recognition accuracy (%) and optimal reduced dimensions on the Yale database

Method	Dimension	Recognition accuracy (%)
NTD-MLDSA	4 × 4	76.19
MLDSA	4 × 4	73.44
LDSA	1	18
t-PCA	6 × 6	46.2
t-LDA	22 × 22	52.1
PCA	25	48.3
PCA+LDA	9	45.7

NTD-MLDSA performs better than the other algorithms, which suggests that the improvement in recognition accuracy is due to the combination of NTD and MLDSA, rather than to NTD or MLDSA alone.

In the second experiment, to show the effectiveness of both the NTD and the tensor representation, the classification accuracies of NTD-MLDSA, MLDSA and LDSA are compared for different reduced dimensions. Figure 6 shows the comparison results.

Fig.6

Recognition accuracy of NTD-MLDSA, MLDSA, LDSA on the Yale database.

In Fig. 6, the horizontal axis represents the value of L. The reduced dimensions of the tensor-based methods NTD-MLDSA and MLDSA range from 1 × 1 to 31 × 31, whereas the reduced dimension of the vector-based method LDSA ranges from 1² to 31², namely, $\hat{L} = L^{2}$ .

From Fig. 6, the conclusions summarized in Fig. 7 can be drawn:

Fig.7

The conclusion of experiments between NTD-MLDSA, MLDSA and LDSA on the Yale database.

The main difference between NTD-MLDSA and MLDSA is the distance measure. To highlight the superiority of NTD, Fig. 8 compares the precision of NTD-MLDSA and MLDSA at the corresponding optimal reduced dimension 4 × 4 for various values of p. NTD-MLDSA achieves higher recognition accuracy than MLDSA, with a maximum increase in accuracy of 7.33%; the main reason is that NTD is more suitable than ED for measuring the distance between tensor data.

Fig.8

The precision comparison between NTD-MLDSA and MLDSA on the Yale database.

In the third experiment, for each individual, p = 5 images are randomly selected for training (hence, 75 images in total), and the remaining 90 images are used for testing. From Fig. 9, it can be seen that NTD-MLDSA achieves better results than both MLDSA and LDSA for different reduced dimensionalities under various values of k.

Fig.9

Recognition accuracy of NTD-MLDSA, MLDSA and LDSA on the Yale database with various values of k (a) k = 1. (b) k = 2. (c) k = 3. (d) k = 4.

In Fig. 9, the reduced dimensionalities of NTD-MLDSA, MLDSA and LDSA are L × L, L × L and $\hat{L} = L^{2}$ , respectively, where L = 1, 2, ⋯ , 31.

The results can be summarized as follows:

The image recognition accuracy of NTD-MLDSA and MLDSA has a significant advantage over that of LDSA. The main reason is that the vector representation destroys the structure information of the image.

Owing to the superiority of NTD in measuring the distance between tensors, the low-dimensional data obtained by NTD-MLDSA give better classification results than those obtained by MLDSA.

The first two results are hardly affected by the neighborhood size k.

4.2 ORL face database

The ORL database contains 400 images of 40 different individuals (10 images for each). Some images have different variations including expression (open or closed eyes, smiling or non-smiling) and facial details (glasses or no glasses) and were captured at different times. All images are grayscale and normalized to a resolution of 64 × 64 pixels in our experiments. The sample images of two individuals from the database are shown in Fig. 10.

Fig.10

Sample images of two individuals from the ORL database.

Two experiments are conducted on this database. With the neighborhood size k = 1, p = 2 images per individual (hence, 80 images in total) are randomly selected for training and the rest are used for testing.

In the first experiment, the recognition accuracies of NTD-MLDSA, MLDSA, LDSA, PCA, PCA+LDA, tensor PCA (t-PCA) and tensor LDA (t-LDA) are compared at the corresponding optimal reduced dimensions. The results are shown in Table 3.

Table 3

Performance comparisons on the ORL database

Method	Dimension	Recognition accuracy (%)
NTD-MLDSA	13 × 13	95.63
MLDSA	8 × 8	92.5
LDSA	3	59.2
t-PCA	10 × 10	69.22
t-LDA	40 × 40	76.59
PCA	79	66.5
PCA+LDA	21	70.16

NTD-MLDSA significantly outperforms the other six representative methods. For the vector-based methods LDSA, PCA and PCA+LDA, the differences in accuracy from NTD-MLDSA are 36.43%, 29.13% and 25.47%, respectively, which indicates that the tensor representation of the image is more reasonable than the vector representation. Since NTD is more suitable than ED for measuring the distance between tensor data, the differences between NTD-MLDSA and the other three tensor-based methods MLDSA, t-PCA and t-LDA are 3.13%, 26.41% and 19.04%, respectively.

The second experiment mainly analyzes the recognition accuracy of NTD-MLDSA and MLDSA under different reduced dimensionalities L × L. In Fig. 11(a), the performance of NTD-MLDSA and MLDSA is similar for various L; they achieve maximum precision of 95.63% and 90.34% at L = 13, respectively. To reflect the differences between these two approaches, Fig. 11(b) shows the comparison for recognition accuracies above 80%, which also illustrates that NTD is superior to ED for tensor data.

Fig.11

Precision rate vs. dimensionality reduction of NTD-MLDSA and MLDSA on ORL database.

4.3 USPS digit database

An aircraft registration number is made up of letters and digits. Arabic numerals are the major part of it, and digits show different shapes when viewed from different angles. Therefore, in this section, the USPS database is used to carry out the experiments.

The USPS database of handwritten digits (0–9) contains 10 classes and 11000 normalized grayscale images of size 16 × 16; each class has 1100 images. To illustrate the matrix structure of the USPS data, some grayscale images are shown in Fig. 12:

Fig.12

Illustration of the images of USPS recognition database.

Two experiments are conducted on this database. For both experiments, the whole dataset is randomly split into 11 subgroups, each of which includes 1000 images (100 images for each class). One subgroup is selected at random; within it, a random subset of 50 labeled images per handwritten digit (hence, 500 images in total) forms the training set, and the rest of the subgroup forms the testing set.
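The sampling protocol described above can be sketched with index arrays. The following minimal sketch assumes NumPy and uses a stand-in label vector instead of the real USPS data; the variable names and the fixed random seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, per_class = 10, 1100
labels = np.repeat(np.arange(n_classes), per_class)    # stand-in for the USPS labels

# Split every class into 11 random chunks of 100 images, so that each
# subgroup holds 100 images per class (1000 images in total).
subgroups = [[] for _ in range(11)]
for c in range(n_classes):
    idx = rng.permutation(np.where(labels == c)[0])
    for g, chunk in enumerate(np.split(idx, 11)):
        subgroups[g].append(chunk)

chosen = subgroups[rng.integers(11)]                    # one subgroup at random

# The chunks are already shuffled, so the first 50 indices of each class
# form a random training subset and the remaining 50 the test subset.
train_idx = np.concatenate([chunk[:50] for chunk in chosen])
test_idx = np.concatenate([chunk[50:] for chunk in chosen])
```

Repeating the last three statements with fresh random draws gives the repeated runs over which the average accuracy is computed.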

In the first experiment, the seven algorithms are compared in recognition accuracy with the neighborhood size k = 1.

As shown in Fig. 13, NTD-MLDSA, MLDSA, t-LDA and t-PCA have similar recognition accuracy distributions, and NTD-MLDSA shows a certain advantage. When the value of L is relatively small (L = 1, 2, ⋯ , 6), LDA outperforms NTD-MLDSA, but in terms of the stability of the recognition accuracy, as shown in Fig. 14, LDA is the worst and NTD-MLDSA is the best. In other words, NTD-MLDSA is more robust than the other six methods for tensor dimensionality reduction.

Fig.13

Recognition accuracy of the seven dimensionality reduction methods on the USPS database.

Fig.14

The standard deviation (SD) of the accuracies.

The quality of an algorithm lies not only in its recognition accuracy but also in its computation time and number of iterations. Table 4 lists the total computation time (training time plus testing time), the size of the matrix in the generalized singular value problems solved by each algorithm, and the maximum number of iterations T_max, with L = 1 and k = 1. For the vector-based methods LDSA, PCA and LDA, although the generalized singular value matrices are larger, the total computation times are shorter than those of the tensor-based methods NTD-MLDSA, MLDSA, t-PCA and t-LDA, because LDSA, PCA and LDA do not iterate. Compared with the other three tensor-based methods, NTD-MLDSA has certain advantages in total computation time and iteration speed. However, since NTD-MLDSA needs to compute the transformation matrix $G^{1 / 2}$ , it takes more total computation time than MLDSA. t-LDA, as proposed in [2], requires matrix inversion when solving for the eigenvectors, so any eigenvector sufficiently close to zero greatly affects its convergence rate. For this reason, the maximum number of iterations T_max of t-LDA is larger than that of the other methods, and its total computation time is the longest.

Table 4

Comparison of the algorithm process

Method	Time	Size	Iteration	T_max
NTD-MLDSA	5.5 s	16 × 16	Yes	12
MLDSA	4.7 s	16 × 16	Yes	15
LDSA	1.5 s	256 × 256	No	1
t-PCA	6.3 s	16 × 16	Yes	12
t-LDA	2637.3 s	16 × 16	Yes	100
PCA	1.1 s	256 × 256	No	1
PCA+LDA	4.1 s	256 × 256	No	1

In the second experiment, as shown in Fig. 15, NTD-MLDSA achieves better recognition accuracy than MLDSA across the different sizes of the dimension-reduced subspace and the various values of the neighborhood size k. These results again indicate that NTD is a more reasonable measure than ED of the distance between images.

Fig. 15

Recognition accuracy of NTD-MLDSA and MLDSA on the USPS database with various values of k. (a) k = 1. (b) k = 2. (c) k = 3. (d) k = 4.
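Why a distance that accounts for spatial relationships can be more reasonable than ED can be illustrated with a metric-weighted distance d_G(x, y) = sqrt((x − y)^T G (x − y)), where G couples nearby pixels. The Gaussian form of G, the value of sigma and all function names below are assumptions chosen for illustration; this is a sketch of the general idea, not the paper's definition of NTD.

```python
import math

def euclidean(x, y):
    """Plain Euclidean distance between two vectorised images."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def spatial_metric(rows, cols, sigma=1.0):
    """G[i][j] = exp(-||p_i - p_j||^2 / (2 sigma^2)) for pixel positions p_i, p_j."""
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    return [[math.exp(-((r1 - r2) ** 2 + (c1 - c2) ** 2) / (2 * sigma ** 2))
             for (r2, c2) in coords]
            for (r1, c1) in coords]

def weighted_distance(x, y, G):
    """sqrt((x - y)^T G (x - y)); reduces to ED when G is the identity."""
    d = [a - b for a, b in zip(x, y)]
    q = sum(d[i] * G[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(q, 0.0))

def one_hot_image(rows, cols, r, c):
    """Vectorised image that is zero except for one bright pixel at (r, c)."""
    img = [0.0] * (rows * cols)
    img[r * cols + c] = 1.0
    return img

rows = cols = 5
G = spatial_metric(rows, cols, sigma=1.0)
a = one_hot_image(rows, cols, 2, 2)
b = one_hot_image(rows, cols, 2, 3)  # bright pixel moved by one position
c = one_hot_image(rows, cols, 0, 0)  # bright pixel moved to a far corner

print(euclidean(a, b), euclidean(a, c))
print(weighted_distance(a, b, G), weighted_distance(a, c, G))
```

ED treats b and c as equally far from a, because it regards all pixels as mutually orthogonal coordinates. The weighted distance ranks the slightly shifted image b as closer to a than c, which matches visual similarity.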

5 Conclusions and future work

In civil aviation security, image recognition is becoming increasingly important; it is widely used to recognize passengers and aircraft registration numbers. This paper proposes NTD-MLDSA, a novel method for tensor-based dimensionality reduction that uses tensors to represent images. The contributions of this article are as follows:

NTD-MLDSA operates on tensor-based data, so it preserves the intrinsic structure information of the image.

To reflect the distribution and structure of the data accurately, NTD-MLDSA uses both a similarity measure and a dissimilarity measure, so it preserves not only the local information of the datasets but also the global information.

A new tensor distance is introduced, which preserves intrinsic structure information and reflects reasonable distance relationships between images.

A series of numerical experiments shows that NTD-MLDSA achieves better dimensionality reduction, with faster computation and convergence.

Some work in this field remains to be explored. Two significant directions are as follows:

Reducing the time cost of NTD-based dimensionality reduction algorithms.

Applying the NTD to other image processing problems.

Acknowledgments

This work was supported by China National Natural Science Foundation of Civil Aviation Joint Fund Project (No. U1533120), and the National Natural Science Foundation of China (Nos. 11371365, 11301535).

References

1. Raducanu and Dornaika, A supervised non-linear dimensionality reduction approach for manifold learning, Pattern Recognition 45(6) (2012), 2432–2444.

2. Cai, He X.F. and Han J.W., Subspace learning based on tensor analysis, Technical report, Computer Science Department, UIUC, UIUCDCS-R-2005-2572, 2005.

3. Huttenlocher D.P., Klanderman G.A. and Rucklidge W.J., Comparing images using the Hausdorff distance, IEEE Trans. Pattern Analysis and Machine Intelligence 15(9) (1993), 850–863.

4. Dai and Yeung D.Y., Tensor embedding methods, In Proc 21st AAAI Conf Artificial Intell, 2006, pp. 330–335.

5. Foody G.M. and Mathur, A relative evaluation of multiclass image classification by support vector machines, IEEE Transactions on Geoscience and Remote Sensing 42(6) (2004), 1335–1343.

6. Shakhnarovich and Moghaddam, Face recognition in subspaces, in Handbook of Face Recognition, Li S.Z. and Jain A.K., Eds., Springer-Verlag, 2004, pp. 141–168.

7. […], Plataniotis K.N. and Venetsanopoulos A.N., MPCA: Multilinear principal component analysis of tensor objects, IEEE Trans Neural Netw 19(1) (2008), 18–39.

8. Zhao H.T. and Sun S.Y., Sparse tensor embedding based multispectral face recognition, Neurocomputing 133 (2014), 427–436.

9. Schoenberg I.J., Metric spaces and completely monotone functions, Annals of Mathematics 39(4) (1938), 811–841.

10. Kotsia, Guo W.W. and Patras, Higher rank support tensor machines for visual recognition, Pattern Recognition 45 (2012), 4192–4203.

11. Zhang J.G., Han Y.H. and Jiang J.M., Tensor rank selection for multimedia analysis, J Vis Commun Image R 30 (2015), 376–392.

12. Wang L.W., Zhang and Feng J.F., On the Euclidean distance of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8) (2005), 1334–1339.

13. Vasilescu M.A.O. and Terzopoulos, Multilinear subspace analysis of image ensembles, In Proc IEEE Comput Soc Conf Comput Vis Pattern Recog, 2003, pp. 93–99.

14. Simard, Cun Y.L. and Dender, Efficient Pattern Recognition Using a New Transformation Distance, Advances in Neural Information Processing Systems, Hanson, Cowan and Giles, eds., 1993, pp. 50–58.

15. […] and Schonfeld, Multilinear discriminant analysis for higher-order tensor data classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 36(12) (2014), 2524–2537.

16. […] S.S., Jing X.Y., Wei Z.S., Yang and Yang J.Y., Learning image manifold via local tensor subspace alignment, Neurocomputing 139 (2014), 22–33.

17. Yan, Xu, Yang, Zhang, Tang and Zhang H.-J., Multilinear discriminant analysis for face recognition, IEEE Trans Image Process 16(1), 212–220.

18. Kolda and Bader, Tensor decompositions and applications, SIAM Review 51(3) (2009), 455–500.

19. […], Tran and Ma W.L., Tensor decomposition and application in image classification with histogram of oriented gradients, Neurocomputing 165 (2015), 38–45.

20. Gao and Tian, Multi-view face recognition based on tensor subspace analysis and view manifold modeling, Neurocomputing 72(16–18) (2009), 3742–3750.

21. Geng, Smith-Miles, Zhou Z.H. and Wang, Face image modeling by multilinear subspace analysis with missing values, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 41(3) (2011), 881–892.

22. […] and Niyogi, Locality preserving projections, In Advances in Neural Information Processing Systems 16, MIT Press, Cambridge, MA, 2004.

23. […] and Huang T.S., Image classification using correlation tensor analysis, IEEE Trans Image Process 2(17) (2008), 226–234.

24. Lee Y.K. and Teoh A.B.J., A Study on Distance Measures of Tensor Manifold for Face Recognition, International Conference on Electronics, Information and Communications, 2014, pp. 1–2.

25. Liu, Liu and Chan K.C.C., Multilinear isometric embedding for visual pattern analysis, In Proc 12th IEEE Int Conf Comput Vis, Workshop Subspace Methods, 2009, pp. 212–218.

26. Yan, Xu, Zhang H.J., Yang and Lin, Graph embedding and extensions: A general framework for dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(1) (2007), 40–51.

27. Qian Y.Z., Chen W.B. and Shen I.F., Action recognition from pose signature in static image, International Journal of Pattern Recognition and Artificial Intelligence 30(03) (2016), 1655010.