Kernel Analysis based on SVDD for Face Recognition from Image Set

Abstract

This paper addresses the problem of Face Recognition based on Image Set (FRIS) by kernel learning and proposes an extended kernel discriminant analysis framework for FRIS. Using support vector learning, an image set in the original input space is mapped into a model space and described with Support Vector Domain Description (SVDD) to handle the underlying non-linearity of the data. In the model space, a hyper-sphere encloses most of the mapped data, and the outliers lie outside it. By exploring an efficient metric between data domains in the model space, we derive a kernel function that maps the domains from the model space to a high-dimensional feature space, to which many Euclidean algorithms can be generalized. The proposed method is evaluated on face recognition tasks, with comparisons against several state-of-the-art FRIS methods on the ChokePoint and CMU MoBo video databases, where it demonstrates promising performance.

Keywords

support vector domain description (SVDD); graph embedding; discriminant analysis; kernel method; face recognition

1. Introduction

Recently, Face Recognition based on Image Set (FRIS) has attracted increasing attention in the field of pattern recognition [9, 29]. FRIS methods treat the entire image collection as a whole and build a single global model of it, such as a linear subspace, an affine hull [4] or a manifold [10]. Hence, they can fully utilize the information provided by multiple images to obtain better recognition accuracy.

Modelling image sets by linear subspaces has been shown to deliver improved performance in the presence of practical issues such as misalignment as well as variations in pose and illumination [9]. A convenient way of dealing with subspaces is to represent them as points on Riemannian manifolds [9–11, 24]. However, classifiers based on discriminative approaches are not directly applicable to features that lie on Riemannian manifolds. Hence, classification is often performed in an extrinsic manner, by first mapping the manifold to a Euclidean space using kernels and then learning classifiers in the new space [19]. One popular approach of this kind maps the manifold to a reproducing kernel Hilbert space (RKHS) through a kernel function.

The key steps of FRIS are obtaining a suitable data description model to represent an image set and measuring the similarity between pairs of models. A subspace, a Gaussian Mixture Model or a covariance matrix, for example, may be a suitable model. As discussed in previous work [31], Support Vector Domain Description (SVDD) [18] maps the data points from the input space to a high-dimensional feature space and finds the smallest sphere enclosing most of the mapped points there. However, SVDD is mainly used to describe a data set: it emphasizes description ability without considering discriminant ability.

In general, the success of kernel-based methods is often determined by the choice of kernel [19]. Hence, we address the issue of learning a kernel from image sets and propose a kernel-based learning framework for face recognition. With support vector learning, each image set is mapped into a high-dimensional model space, where a new representation model is established. To obtain better separability between hyper-spheres, we define a novel kernel based on the distance between two domains and use it to map the models into a feature space. Through this kernel function, a measure in the domain space is converted into a distance metric in a Euclidean space; the resulting kernel matrix is then embedded into the graph embedding discriminant analysis framework to enhance classification performance.

The major contributions of the paper are as follows:

Kernel learning in the feature space is discussed, and a new kernel based on SVDD is proposed. By mapping the data points from the input space to a high-dimensional model space, SVDD finds the minimal sphere enclosing most of the mapped points; this sphere can correspond to a class boundary of arbitrary shape in the input space, which gives rise to enhanced classification performance.

Classifiers based on discriminative approaches are not directly applicable to features that lie in the model space. After the models are mapped to a feature space using kernels, a graph embedding discriminant analysis framework is used to unify different discriminant analysis methods in that feature space.

The remainder of this paper is organized as follows. Section 2 briefly reviews kernel functions on Grassmann and Riemannian manifolds. Section 3 introduces how to model an image set with SVDD and discusses a new kernel based on SVDD. Section 4.1 introduces graph embedding discriminant analysis for FRIS, and Section 4.2 integrates Sections 2, 3 and 4.1 into a kernel-based learning framework for FRIS tasks. Section 5 reports the experimental results on two video-based face databases; in that section, we also compare the computational complexity of the different methods and report their time cost. Section 6 concludes the paper.

2. Image set modeling and kernel learning

Let $X = [x_1, x_2, \ldots, x_N] \in ℝ^{D \times N}$ be the matrix representation of an image set consisting of N samples, where $x_i \in ℝ^D$ is the i-th image. A conventional approach is either to treat the subspace spanned by X as a point on a Grassmann manifold, or to model the set by the covariance matrix of X. It is well known that the symmetric positive definite matrices (denoted ${Sym}_{d}^{+}$ for short) lie on a Riemannian manifold. Hence, assuming the covariance matrix of X is positive definite, the transformation $φ : X \mapsto \mathrm{cov}(X) \in R$ (1) maps X onto a Riemannian manifold $R$. As a result, the similarity of any two image sets $X_i$ and $X_j$ can be measured by the distance between $\mathrm{cov}(X_i)$ and $\mathrm{cov}(X_j)$ on $R$.

On ${Sym}_{d}^{+}$, the distance between $C_i$ and $C_j$ can be measured by the Log-Euclidean distance (LED), $d_{led}(C_i, C_j) = \| \log(C_i) - \log(C_j) \|_F$ (2), where $\|\cdot\|_F$ denotes the matrix Frobenius norm [2]. Let $C = U Λ U^T$ be the eigendecomposition of $C = φ(X) = \mathrm{cov}(X)$, which coincides with its SVD since C is symmetric positive definite. Then $\log(C) = U \log(Λ) U^T$, where $\log(Λ) = \mathrm{diag}(\log λ_1, \log λ_2, \ldots, \log λ_n)$ is the diagonal matrix of eigenvalue logarithms.
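To make Eqs. (1) and (2) concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code). It computes log(C) through the eigendecomposition described above and then the LED. The helper names and the small ridge `eps` added to the covariance, which keeps cov(X) positive definite when N ≤ D, are assumptions; the text instead assumes cov(X) is positive definite.

```python
import numpy as np


def spd_log(C):
    """Matrix logarithm of a symmetric positive definite matrix, log(C) = U log(Lambda) U^T."""
    lam, U = np.linalg.eigh(C)        # ascending eigenvalues, orthonormal eigenvectors in columns
    return (U * np.log(lam)) @ U.T    # scales column j of U by log(lam_j), then multiplies by U^T


def set_covariance(X, eps=1e-3):
    """Covariance of an image set X (D x N, one image per column) plus a small ridge (assumption)."""
    C = np.cov(X)                     # rows are variables (pixels), columns are observations (images)
    return C + eps * np.eye(C.shape[0])


def log_euclidean_distance(Ci, Cj):
    """d_led(C_i, C_j) = ||log(C_i) - log(C_j)||_F, Eq. (2)."""
    return np.linalg.norm(spd_log(Ci) - spd_log(Cj), ord="fro")


rng = np.random.default_rng(0)
C1 = set_covariance(rng.normal(size=(5, 40)))
C2 = set_covariance(rng.normal(size=(5, 40)) * 2.0)
print(log_euclidean_distance(C1, C2))
```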

A differentiable manifold $R$ is a topological space that is locally similar to Euclidean space and has a globally defined differential structure. The tangent space $T_p R$ at a point p on the manifold is a vector space consisting of the tangent vectors of all possible curves passing through p. A Riemannian manifold is a differentiable manifold equipped with a smoothly varying inner product on each tangent space. In particular, considering the tangent space at the identity matrix I, the Log-Euclidean metric in Equation (2) can be understood as projecting a point C on the Riemannian manifold $R$ to a Euclidean space via the logarithm map $Ψ_{log} : C \mapsto \log(C) \in T_I R$ (3), where $C \in R$.

For any $C_i, C_j \in R$, this yields a Riemannian kernel function on the manifold $R$: $κ_{log}(φ(X_i), φ(X_j)) = κ_{log}(C_i, C_j) = \mathrm{tr}(\log(C_i) \log(C_j))$ (4). $κ_{log}$ is a symmetric positive definite kernel [23].
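A short sketch of the Gram matrix of Eq. (4) under the same assumptions (NumPy, hypothetical helper names, random SPD matrices standing in for set covariances):

```python
import numpy as np


def spd_log(C):
    """log(C) = U log(Lambda) U^T for a symmetric positive definite C."""
    lam, U = np.linalg.eigh(C)
    return (U * np.log(lam)) @ U.T


def log_euclidean_gram(covs):
    """Gram matrix with entries kappa_log(C_i, C_j) = tr(log(C_i) log(C_j)), Eq. (4)."""
    logs = [spd_log(C) for C in covs]
    n = len(logs)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # for symmetric matrices, tr(A B) is the elementwise (Frobenius) inner product
            K[i, j] = np.sum(logs[i] * logs[j])
    return K


rng = np.random.default_rng(1)
covs = []
for _ in range(4):
    A = rng.normal(size=(3, 20))
    covs.append(A @ A.T / 20 + 0.1 * np.eye(3))   # random SPD matrices for illustration
K = log_euclidean_gram(covs)
```

Because κ_log is the Frobenius inner product of the log matrices, K_ii + K_jj − 2K_ij equals the squared LED of Eq. (2), which makes a convenient sanity check.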

3. SVDD based kernel learning

SVDD is a sphere-shaped data description originally proposed for outlier detection. By mapping the data points from the input space to a high-dimensional feature space, it finds the smallest sphere enclosing most of the mapped points in the feature space. SVDD is mainly used to describe a data set, and it emphasizes description ability without considering discriminant ability.

In this paper, we address face recognition from image sets with a kernel learning approach. First, using support vector learning, each image set is mapped into a high-dimensional model space, and a new representation model is established in which most positive samples are enclosed by a sphere of radius R. Then, in the model space, we define a new measure of the similarity between two spheres. To obtain better separability between hyper-spheres, we derive an SVDD-based kernel function on the data domains from this measure. Through this kernel function, a measure in the domain space is converted into a distance metric in a Euclidean space, and the kernel matrix computed from this similarity is embedded into the graph embedding discriminant analysis framework for classification. Since the kernel defined this way is not guaranteed to be positive semi-definite, we further discuss how to convert it into a positive semi-definite kernel.

3.1 Support vector domain description

Let $X = [x_1, x_2, \ldots, x_N]$ be the data matrix of an image set with N samples, where $x_i \in ℝ^D$ is the vectorized representation of the i-th image with a D-dimensional feature description. SVDD represents the image set as an enclosing sphere of radius R, using a nonlinear transformation φ that maps the input data points into a high-dimensional model space. The sphere contains most of the mapped points, and the outliers lie outside it. It is described by the following model [18]: $\min_{R, μ, ξ} \; R^2 + J \sum_{i=1}^{N} ξ_i$ (5) subject to $\|φ(x_i) - μ\|^2 \le R^2 + ξ_i$ and $ξ_i \ge 0$, $\forall i = 1, \ldots, N$, where $\|\cdot\|$ is the Euclidean norm, μ is the center of the sphere, R is its radius, the slack variables $ξ_i$ allow some data points to lie outside the sphere, J is a constant, and $J \sum_{i=1}^{N} ξ_i$ is a penalty term. Introducing Lagrange multipliers $α_i \ge 0$ and $β_i \ge 0$ gives the Lagrangian $L = R^2 + J \sum_{i=1}^{N} ξ_i - \sum_{i=1}^{N} α_i ξ_i - \sum_{i=1}^{N} β_i \left(R^2 + ξ_i - \|φ(x_i) - μ\|^2\right)$ (6). Setting ∂L/∂R = 0 and ∂L/∂μ = 0 leads to $\sum_{i=1}^{N} β_i = 1$ and $μ = \sum_{i=1}^{N} β_i φ(x_i)$, respectively, while ∂L/∂ξ_i = 0 gives $β_i = J - α_i$, which together with $α_i \ge 0$ implies $0 \le β_i \le J$. Substituting these relations, the objective becomes a function of the $β_i$ only, and the solution of (5) is obtained by solving the Wolfe dual problem [25]:

$\max_{β} \; \sum_{i=1}^{N} β_i κ(x_i, x_i) - \sum_{i,j=1}^{N} β_i β_j κ(x_i, x_j)$ (7) subject to $\sum_{i=1}^{N} β_i = 1$ and $0 \le β_i \le J$, $\forall i = 1, \ldots, N$, where $κ(x_i, x_j) = \langle φ(x_i), φ(x_j) \rangle$ is a kernel function, e.g., the Gaussian kernel.
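The paper does not say which quadratic programming solver it uses for (7). As an assumption, the sketch below uses plain projected gradient ascent with step 1/L, projecting onto the feasible set {Σβ_i = 1, 0 ≤ β_i ≤ J} by bisection; any standard QP solver could be substituted. Note that the constraints are only feasible when J ≥ 1/N. All function names are hypothetical.

```python
import numpy as np


def gaussian_gram(X, Y, q):
    """kappa(x, y) = exp(-q ||x - y||^2) between the columns of X (D x N) and Y (D x M)."""
    sq = (X * X).sum(0)[:, None] + (Y * Y).sum(0)[None, :] - 2.0 * X.T @ Y
    return np.exp(-q * np.maximum(sq, 0.0))


def project_capped_simplex(v, J, iters=100):
    """Euclidean projection of v onto {b : sum(b) = 1, 0 <= b_i <= J}, via bisection on a shift tau."""
    lo, hi = v.min() - J, v.max()     # at lo every entry clips to J (sum N*J >= 1); at hi the sum is 0
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, J).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, J)


def svdd_dual(K, J, iters=500):
    """Maximise sum_i b_i K_ii - b^T K b subject to sum(b) = 1, 0 <= b_i <= J (Eq. 7)."""
    N = K.shape[0]
    if N * J < 1.0:
        raise ValueError("infeasible: the constraints need J >= 1/N")
    beta = np.full(N, 1.0 / N)                          # feasible starting point
    step = 1.0 / (2.0 * np.linalg.eigvalsh(K).max())    # 1/L, L = Lipschitz constant of the gradient
    for _ in range(iters):
        grad = np.diag(K) - 2.0 * K @ beta
        beta = project_capped_simplex(beta + step * grad, J)
    return beta


rng = np.random.default_rng(0)
X = rng.normal(size=(2, 30))          # 30 toy two-dimensional "images"
K = gaussian_gram(X, X, q=0.5)
beta = svdd_dual(K, J=0.1)
```

Projected gradient ascent with step 1/L never decreases a concave, L-smooth objective, so the returned β is at least as good as the uniform start; it is a simple choice, not necessarily a fast one.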

According to the values of the Lagrange multipliers $β_i$ (i = 1, ..., N), the data points in the input space fall into three types: (1) inner points (IPs), with $β_i = 0$, which lie inside the sphere; (2) support vectors (SVs), with $0 < β_i < J$, which lie on the surface of the sphere and can be used to describe the cluster contour in the original data space; (3) bounded support vectors (BSVs), with $β_i = J$, which lie outside the sphere and are also called external points.

Note that since the $β_i$ sum to one, if the constant J ≥ 1 the upper bound $β_i \le J$ is never active in a non-degenerate solution, so no external points exist. Hence, the value of J controls whether external points exist.
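When (7) is solved numerically, the multipliers are rarely exactly 0 or J, so a tolerance is needed to sort points into the three types. The tolerance `tol` and the function name below are our assumptions:

```python
import numpy as np


def classify_points(beta, J, tol=1e-6):
    """Label each point 'IP' (beta = 0), 'SV' (0 < beta < J) or 'BSV' (beta = J), up to a tolerance."""
    beta = np.asarray(beta, dtype=float)
    labels = np.full(beta.shape, "SV", dtype=object)
    labels[beta <= tol] = "IP"          # inner points
    labels[beta >= J - tol] = "BSV"     # bounded support vectors (external points)
    return labels


print(classify_points([0.0, 0.05, 0.1, 1e-9], J=0.1))
```

Because the β_i sum to one and each bounded support vector contributes J, there can be at most 1/J external points (here at most 10 for J = 0.1), which is another way of seeing how J controls their existence.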

3.2 Modeling image sets

In this study, the Gaussian kernel $κ(x_i, x_j) = \exp(-q \|x_i - x_j\|^2) = \langle φ(x_i), φ(x_j) \rangle$ is adopted, where $φ(x_i)$ is the image of $x_i$ in the high-dimensional feature space and q is the Gaussian kernel width parameter. The trained Gaussian kernel support function, defined as the squared radial distance of the image of x from the sphere center μ, is given by $f(x) = \|φ(x) - μ\|^2 = \|φ(x) - \sum_{i=1}^{N} β_i φ(x_i)\|^2 = κ(x, x) - 2 \sum_{i=1}^{N} β_i κ(x_i, x) + \sum_{i,j=1}^{N} β_i β_j κ(x_i, x_j)$ (8)
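A vectorised NumPy sketch of Eq. (8) for the Gaussian kernel, where κ(x, x) = 1. The multipliers here are illustrative rather than an SVDD solution, and the helper names are hypothetical:

```python
import numpy as np


def gaussian_gram(X, Y, q):
    """kappa(x, y) = exp(-q ||x - y||^2) between the columns of X (D x N) and Y (D x M)."""
    sq = (X * X).sum(0)[:, None] + (Y * Y).sum(0)[None, :] - 2.0 * X.T @ Y
    return np.exp(-q * np.maximum(sq, 0.0))


def svdd_sq_distance(Z, X, beta, q):
    """f(z) of Eq. (8) for every column z of Z, given training set X (D x N) and multipliers beta."""
    K_xx = gaussian_gram(X, X, q)
    K_xz = gaussian_gram(X, Z, q)                # N x M cross-kernel
    # kappa(z, z) = exp(0) = 1 for the Gaussian kernel
    return 1.0 - 2.0 * beta @ K_xz + beta @ K_xx @ beta


rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))
beta = np.full(8, 1.0 / 8)                       # illustrative multipliers, not an SVDD solution
print(svdd_sq_distance(X, X, beta, q=0.5))
```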

With X defined as in Section 3.1, an image set can be represented as a data domain learned by SVDD, described by its radius R, its center μ and the squared Gaussian kernel distance function f(x). The squared length $\|μ\|^2$ of the hyper-sphere center is used to compute the squared distance between hyper-sphere centers in Equation (12).

Let SV and BSV denote the sets of all support vectors and bounded support vectors, respectively. The SVDD of the set is defined as $D(μ, R, f(x)) = \{SV, BSV, μ, R, f(x)\}$ (9), where the squared kernel distance function f(x) is defined in Equation (8), the domain radius is obtained by $R = \max\{\sqrt{f(x_i)} \mid x_i \in SV\}$, and the squared length of the sphere center is $\|μ\|^2 = \|\sum_{i=1}^{N} β_i φ(x_i)\|^2 = \sum_{i,j=1}^{N} β_i β_j κ(x_i, x_j)$ (10). Since $β_i = 0$ for inner points, the sums effectively run only over $x_i, x_j \in SV \cup BSV$.
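Given a Gram matrix K over the training points and the multipliers β, both R and ‖μ‖² of Eqs. (9) and (10) follow directly, as in the sketch below. The tolerance used to identify support vectors and the function name are assumptions. In the toy case, ‖μ‖² = 4 × 0.25² = 0.25 and f(x_i) = 1 − 0.5 + 0.25 = 0.75, so R = √0.75.

```python
import numpy as np


def domain_summary(K, beta, J, tol=1e-6):
    """Radius R and squared centre length ||mu||^2 of an SVDD domain, Eqs. (9)-(10)."""
    mu_sq = beta @ K @ beta                          # Eq. (10); inner points (beta = 0) drop out
    f = np.diag(K) - 2.0 * K @ beta + mu_sq          # Eq. (8) evaluated at the training points
    sv = (beta > tol) & (beta < J - tol)             # support vectors lie on the sphere surface
    if not sv.any():
        raise ValueError("no support vectors found")
    R = np.sqrt(np.maximum(f[sv], 0.0).max())        # R = max sqrt(f(x_i)) over x_i in SV
    return R, mu_sq


# Toy case: orthonormal feature vectors (K = I) with uniform multipliers.
R, mu_sq = domain_summary(np.eye(4), np.full(4, 0.25), J=1.0)
print(R, mu_sq)
```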

3.3 The distance metric between support vector domains

Let the domains $D_p(μ_p, R_p, f_p(x))$ and $D_q(μ_q, R_q, f_q(x))$ be the SVDDs of the image sets $X_p = [x_1^{(p)}, x_2^{(p)}, \ldots, x_{N_p}^{(p)}]$ and $X_q = [x_1^{(q)}, x_2^{(q)}, \ldots, x_{N_q}^{(q)}]$. A natural measure of the similarity between two domains should take into account both the distance between their sphere centers and their sphere radii. The domain distance between $D_p$ and $D_q$ is defined as the ratio of the center distance $\|μ_p - μ_q\|$ to the sum of the radii $R_p + R_q$: $d_{dd}(D_p, D_q) = \frac{\|μ_p - μ_q\|}{R_p + R_q}$ (11). With SV, BSV and the multipliers $β_i^{(p)}, β_j^{(q)}$ as described in Sections 3.1 and 3.2, $\|μ_p - μ_q\|$ can be computed as follows,

$\begin{array}{l} \|μ_p - μ_q\|^2 = \left\| \sum_{i=1}^{N_p} β_i^{(p)} φ(x_i^{(p)}) - \sum_{j=1}^{N_q} β_j^{(q)} φ(x_j^{(q)}) \right\|^2 \\ = \sum_{i,j=1}^{N_p} β_i^{(p)} β_j^{(p)} κ(x_i^{(p)}, x_j^{(p)}) + \sum_{i,j=1}^{N_q} β_i^{(q)} β_j^{(q)} κ(x_i^{(q)}, x_j^{(q)}) - 2 \sum_{i=1}^{N_p} \sum_{j=1}^{N_q} β_i^{(p)} β_j^{(q)} κ(x_i^{(p)}, x_j^{(q)}) \\ = \|μ_p\|^2 + \|μ_q\|^2 - 2 \sum_{i=1}^{N_p} \sum_{j=1}^{N_q} β_i^{(p)} β_j^{(q)} κ(x_i^{(p)}, x_j^{(q)}) \end{array}$ (12) where the sums effectively run over $x_i^{(p)} \in SV_p \cup BSV_p$ and $x_j^{(q)} \in SV_q \cup BSV_q$, the only points with nonzero multipliers $β_i^{(p)}$ and $β_j^{(q)}$.
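Eq. (12) needs only kernel evaluations, so the domain distance of Eq. (11) can be computed without the explicit map φ. A minimal NumPy sketch, assuming the Gaussian kernel of Section 3.2 and multipliers and radii already obtained from SVDD (here fixed by hand for illustration; the names are hypothetical):

```python
import numpy as np


def gaussian_gram(X, Y, q):
    """kappa(x, y) = exp(-q ||x - y||^2) between the columns of X (D x N) and Y (D x M)."""
    sq = (X * X).sum(0)[:, None] + (Y * Y).sum(0)[None, :] - 2.0 * X.T @ Y
    return np.exp(-q * np.maximum(sq, 0.0))


def domain_distance(Xp, beta_p, R_p, Xq, beta_q, R_q, q):
    """d_dd = ||mu_p - mu_q|| / (R_p + R_q), Eq. (11), with the centre distance expanded as in Eq. (12)."""
    mu_p_sq = beta_p @ gaussian_gram(Xp, Xp, q) @ beta_p
    mu_q_sq = beta_q @ gaussian_gram(Xq, Xq, q) @ beta_q
    cross = beta_p @ gaussian_gram(Xp, Xq, q) @ beta_q
    centre_sq = max(mu_p_sq + mu_q_sq - 2.0 * cross, 0.0)   # clip tiny negative round-off
    return np.sqrt(centre_sq) / (R_p + R_q)


# Two single-point "sets" far apart: the cross term vanishes and ||mu_p - mu_q||^2 = 1 + 1.
Xp, Xq, b = np.array([[0.0]]), np.array([[100.0]]), np.array([1.0])
print(domain_distance(Xp, b, 0.5, Xq, b, 0.5, q=1.0))
```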

3.4 The kernel matrix in support vector domain

Let $X_p = [x_1^{(p)}, \ldots, x_{N_p}^{(p)}]$ and $X_q = [x_1^{(q)}, \ldots, x_{N_q}^{(q)}]$ be two image sets in the input space containing $N_p$ and $N_q$ elements, respectively, where each $x_i^{(p)} \in X_p \subset ℝ^D$ and $x_j^{(q)} \in X_q \subset ℝ^D$ is the vectorization of an image. We define the SVDD pseudo kernel function on the data domains as follows: $κ_{dd}(φ(X_p), φ(X_q)) = κ_{dd}(D_p, D_q) = \exp(-d_{dd}(D_p, D_q))$ (13)
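Eq. (13) need not produce a positive semi-definite matrix. The toy example below uses explicit centres and radii that we chose for illustration (they are not produced by an SVDD): three domains with centres at 0, 1 and 2 on a line and radii 0.01, 10 and 0.01. Dividing by the radii breaks the triangle inequality (d_13 = 100, while d_12 + d_23 ≈ 0.2), and the resulting kernel matrix has a negative eigenvalue. The function name is hypothetical.

```python
import numpy as np


def dd_kernel(centres, radii):
    """kappa_dd = exp(-||mu_p - mu_q|| / (R_p + R_q)) for explicit toy centres and radii, Eqs. (11) and (13)."""
    centres = np.asarray(centres, dtype=float)
    radii = np.asarray(radii, dtype=float)
    dist = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    return np.exp(-dist / (radii[:, None] + radii[None, :]))


# Three hypothetical domains with centres at 0, 1, 2 on a line; the middle one is much larger.
K = dd_kernel([[0.0], [1.0], [2.0]], [0.01, 10.0, 0.01])
print(np.linalg.eigvalsh(K))   # the smallest eigenvalue is negative, so K is not PSD
```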

It is easy to check that $κ_{dd}$ is a symmetric real-valued function, i.e., $κ_{dd}(D_p, D_q) = κ_{dd}(D_q, D_p)$ for all $D_p, D_q$. The proposed kernel is not guaranteed to be positive definite, but in this paper we show that it can nevertheless be quite useful. As discussed in [5, 7], it is possible to convert pseudo kernels into true kernels.

Theorem 1 [7] (Spectral flip transformation). Given a symmetric matrix K with eigenvalue decomposition $K = U Λ U^T$, where U is an orthogonal matrix and $Λ = \mathrm{diag}(λ_1, λ_2, \ldots, λ_N)$ with $λ_i$ the i-th eigenvalue, “flipping” the axes of negative signature, i.e. setting $\tilde{Λ} = \mathrm{diag}(|λ_1|, |λ_2|, \ldots, |λ_N|)$, yields $\tilde{K} = U \tilde{Λ} U^T$, which is a positive semi-definite matrix.

The spectral flip transformation (SFT) flips the sign of the negative eigenvalues of a symmetric matrix K to form a positive semi-definite kernel matrix $\tilde{K}$. A symmetric K can be regarded as the inner-product matrix of points embedded in a pseudo-Euclidean space $ℝ^{N^{+}, N^{-}}$, where $N^{+}$ and $N^{-}$ are the numbers of positive and negative eigenvalues of K ($N^{+} + N^{-} = N$ when K has no zero eigenvalues). SFT flips the signature of the negative axes of this space, so $\tilde{K}$ can be considered the ordinary Euclidean inner-product matrix of the same embedded points.

The flip method can also be explained from the perspective of the singular value decomposition (SVD) [27]. Given a symmetric matrix K, its spectral decomposition is $K = U Λ U^T$ and its SVD is $K = U \mathrm{diag}(σ_1, \ldots, σ_N) V^T$, where $U^T U = I$ and $V^T V = I$.

For a symmetric K, the left singular vectors U in the SVD can be taken to be the eigenvectors in the spectral decomposition $K = U Λ U^T$. On the other hand, from the spectral decomposition we have $K K^T = U Λ U^T U Λ U^T = U Λ^2 U^T$, where the entries of $Λ^2 = \mathrm{diag}(λ_1^2, \ldots, λ_N^2)$ are the eigenvalues of $K K^T$. Since the singular values of K are the square roots of the eigenvalues of $K K^T$, $σ_i = \sqrt{λ_i^2} = |λ_i|$. This means that the flip method effectively replaces the eigenvalues of K by the corresponding singular values [27]. The induced matrix $\tilde{K} = U \mathrm{diag}(σ_1, \ldots, σ_N) U^T = U \mathrm{diag}(|λ_1|, |λ_2|, \ldots, |λ_N|) U^T$ is then used as the kernel matrix in kernel methods.
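A minimal NumPy sketch of the spectral flip of Theorem 1, applied to a small symmetric but indefinite matrix chosen for illustration. It also checks numerically that the singular values of K equal the absolute values of its eigenvalues; the function name is hypothetical.

```python
import numpy as np


def spectral_flip(K):
    """K_tilde = U diag(|lambda|) U^T for symmetric K = U diag(lambda) U^T (Theorem 1)."""
    K = 0.5 * (K + K.T)                   # remove round-off asymmetry before eigh
    lam, U = np.linalg.eigh(K)
    return (U * np.abs(lam)) @ U.T


K = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.9],
              [0.0, 0.9, 1.0]])           # symmetric but indefinite
K_tilde = spectral_flip(K)
# The singular values of K coincide with the absolute eigenvalues, as argued in the text.
print(np.sort(np.linalg.svd(K, compute_uv=False)), np.sort(np.abs(np.linalg.eigvalsh(K))))
```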

Consider a test sample $X_t$ that is identical to a training sample $X_i$ in the input space. If a classifier trained with the modified kernel $\tilde{K}$ is applied to the unmodified test vector $K_t = [κ(φ(X_1), φ(X_t)), \ldots, κ(φ(X_N), φ(X_t))]^T$, the same sample is treated inconsistently. The spectral flip modification can be represented by a linear transformation [5], $\tilde{K} = P K$, where $P = U \mathrm{diag}(\mathrm{sgn}(λ_1), \mathrm{sgn}(λ_2), \ldots, \mathrm{sgn}(λ_N)) U^T$ is the corresponding transformation matrix; we therefore apply the same transformation P to the test sample $X_t$, i.e. $\tilde{K}_t = P K_t$.
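The sketch below (NumPy, hypothetical names) builds P and checks the consistency argument: a test set identical to a training set receives, after P is applied, exactly the flipped training column, whereas the unmodified K_t does not match it. The small indefinite Gram matrix is a toy example. Because np.sign(0) = 0, zero eigenvalues make P singular, but PK still equals the flipped matrix.

```python
import numpy as np


def flip_transform(K):
    """P = U diag(sgn(lambda)) U^T, so that P @ K is the spectrally flipped kernel matrix."""
    lam, U = np.linalg.eigh(0.5 * (K + K.T))
    return (U * np.sign(lam)) @ U.T


K = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.9],
              [0.0, 0.9, 1.0]])           # training Gram matrix (indefinite)
P = flip_transform(K)
K_tilde = P @ K                           # flipped training kernel
k_t = K[:, 1]                             # K_t of a test set identical to the second training set
k_t_tilde = P @ k_t                       # apply the same P to the test vector
print(np.allclose(k_t_tilde, K_tilde[:, 1]))   # consistent with the flipped training column
print(np.allclose(k_t, K_tilde[:, 1]))         # the unmodified test vector does not match
```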

4. A kernel analysis framework for FRIS

4.1 Graph embedding discriminant analysis for image set

4.1.1 Graph embedding discriminant analysis

A graph G = (V, W) consists of a set of vertices V and a set of edges connecting pairs of vertices, where $W = [W(i, j)]_{N \times N}$ is the adjacency matrix representing the similarity between pairs of vertices. The diagonal degree matrix D and the Laplacian matrix L of the graph are defined as $L = D - W$, $D_{ii} = \sum_{j \ne i} W(i, j)$ (14)
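A small NumPy sketch of Eq. (14), which, following the text, excludes self-loops from the degree. It also checks the standard identity linking the Laplacian to the graph-preserving criterion used later in Eq. (18). The toy graph and the function name are our own.

```python
import numpy as np


def graph_laplacian(W):
    """Degree matrix D with D_ii = sum_{j != i} W(i, j) and Laplacian L = D - W, Eq. (14)."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1) - np.diag(W))    # exclude self-loops, as in Eq. (14)
    return D - W, D


W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
L, D = graph_laplacian(W)
y = np.array([1.0, 2.0, 4.0])
# For a graph without self-loops, y^T L y = 1/2 sum_ij W(i, j) (y_i - y_j)^2, the criterion of Eq. (18).
print(y @ L @ y, 0.5 * np.sum(W * (y[:, None] - y[None, :]) ** 2))
```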

Given N labelled image sets $\{(X_i, l_i)\}_{i=1}^{N}$ in the input image space, where $X_i \in ℝ^{D \times N_i}$ and $l_i \in \{1, 2, \ldots, c\}$ is the class label, let the k-th class contain $N_k$ image sets, so that $N_1 + N_2 + \ldots + N_c = N$. Suppose there exists a non-linear function φ that maps these sets into a high-dimensional model space $M$, and another non-linear function ψ that maps each model M in the model space $M$ to a new feature space $F$: $φ : X \mapsto M = φ(X) \in M$, where $X \in ℝ^{D \times N}$ (15), and $ψ : M \mapsto ψ(M) \in F$, where $M \in M$ (16).

Composing the two non-linear functions gives a single mapping, which, with a slight abuse of notation, we again denote by φ: $φ := ψ \circ φ : X \mapsto ψ(φ(X)) \in F$, where $X \in ℝ^{D \times N}$ (17)

We now obtain N models $\{(M_i, l_i)\}_{i=1}^{N}$ in the model space $M$, where the model label $l_i \in \{1, 2, \ldots, C\}$ is the same as the class label in the input image space. The models can be treated as the N vertices of a graph G. The task of graph embedding is to determine a low-dimensional representation of the vertex set, $Y = [Y_1, Y_2, \ldots, Y_N]^T$, that preserves the similarities between pairs of data in the original high-dimensional space [28]. It does so by minimizing the graph-preserving criterion [30] $\frac{1}{2} \sum_{i,j} \|Y_i - Y_j\|^2 W(i, j)$ (18)

The kernel function on the model space is defined by the inner product in the feature space: $κ_{ij} = κ(φ(X_i), φ(X_j)) = \langle φ(X_i), φ(X_j) \rangle$ (19)

According to the representer theorem [17], the solution can be expressed as a linear combination of the mapped data points, i.e. $Γ_l = \sum_{j=1}^{N} a_{jl} φ(X_j)$. Hence $\langle Γ_l, φ(X_i) \rangle = \sum_{j=1}^{N} a_{jl} κ(φ(X_j), φ(X_i)) = α_l^T K_i$ (20), where $α_l = (a_{1l}, a_{2l}, \ldots, a_{Nl})^T$ and $K_i = (κ_{1i}, κ_{2i}, \ldots, κ_{Ni})^T$. Suppose that the final solution has the form $Y_i = (\langle Γ_1, φ(X_i) \rangle, \ldots, \langle Γ_r, φ(X_i) \rangle)^T = (α_1^T K_i, α_2^T K_i, \ldots, α_r^T K_i)^T = A^T K_i$ (21), where $A = [α_1, α_2, \ldots, α_r]$.

Equation (18) can then be rewritten as $\frac{1}{2} \sum_{i,j} \|Y_i - Y_j\|^2 W(i, j) = \frac{1}{2} \sum_{i,j} \|A^T K_i - A^T K_j\|^2 W(i, j) = \sum_i \mathrm{tr}(A^T K_i K_i^T A) D(i, i) - \sum_{i,j} \mathrm{tr}(A^T K_j K_i^T A) W(i, j) = \mathrm{tr}(A^T K D K^T A) - \mathrm{tr}(A^T K W K^T A)$ (22), where $K = [K_1, K_2, \ldots, K_N]$ and we have used $W(i, j) = W(j, i)$ and $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. We add the normalising constraint $\mathrm{tr}(A^T K D K^T A) = 1$ to the problem. Under this constraint the first term of (22) is fixed, so the minimisation problem becomes the maximisation problem $\arg\max_A \mathrm{tr}(A^T K W K^T A)$ s.t. $\mathrm{tr}(A^T K D K^T A) = 1$ (23)

The optimal $A_{opt}$ is formed from the eigenvectors of the generalised eigenproblem $K W K^T α = λ K D K^T α$: grouping the eigenvectors $α_i$ (i = 1, 2, ..., r) corresponding to the r largest eigenvalues gives $A_{opt} = [α_1, α_2, \ldots, α_r]$. Given a test image set $X_t \in ℝ^{D \times N_t}$ in the input space, its projection is $Z_t = A_{opt}^T K_t$, where $K_t = [κ(φ(X_1), φ(X_t)), \ldots, κ(φ(X_N), φ(X_t))]^T$.
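The generalised eigenproblem above can be reduced to a standard symmetric one through a Cholesky factorisation of KDK^T, a common technique for symmetric-definite pencils. The sketch below is an illustration under assumptions, not the authors' implementation: a small ridge `reg` makes KDK^T positive definite, each eigenvector is scaled so that a^T(KDK^T)a = 1, and the Gram matrix and graph are random toy data. Names are hypothetical.

```python
import numpy as np


def kge_fit(K, W, D, r, reg=1e-6):
    """Leading r solutions of K W K^T a = lambda K D K^T a (Eq. 23) via a Cholesky reduction.

    reg * I is added to K D K^T (our assumption) so that the right-hand matrix is positive definite.
    """
    S_w = K @ W @ K.T
    S_d = K @ D @ K.T + reg * np.eye(K.shape[0])
    S_w, S_d = 0.5 * (S_w + S_w.T), 0.5 * (S_d + S_d.T)   # symmetrise against round-off
    L = np.linalg.cholesky(S_d)                            # S_d = L L^T
    L_inv = np.linalg.inv(L)
    lam, V = np.linalg.eigh(L_inv @ S_w @ L_inv.T)         # equivalent standard symmetric problem
    order = np.argsort(lam)[::-1][:r]                      # indices of the r largest eigenvalues
    A = L_inv.T @ V[:, order]                              # map back: a = L^{-T} v, so a^T S_d a = 1
    return A, lam[order]


rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
K = B @ B.T + 6.0 * np.eye(6)          # a well-conditioned toy Gram matrix over N = 6 image sets
W = np.ones((6, 6)) - np.eye(6)        # toy similarity graph
D = np.diag(W.sum(axis=1))
A_opt, lam = kge_fit(K, W, D, r=2)
Y = A_opt.T @ K                        # training embeddings Y = A_opt^T K
```

For a test set, the same `A_opt` is applied to its kernel vector K_t, giving Z_t = A_opt^T K_t.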

For the convenience of description, the graph embedding framework is denoted as KGE (W, D). By choosing different W and D, the Kernel Fisher Discriminant (KFD) analysis, Kernel Locality Preserving Projection (KLPP) and Kernel Margin Fisher Analysis (KMFA) can be embedded into this framework.

4.1.2 Fisher Discriminant Analysis

Kernel Fisher’s Discriminant Analysis (KFD) is equivalent to KGE(W, I), where I is the identity matrix. KFD is obtained by solving the eigenvalue problem $K W K^T A = λ K K^T A$, where the graph W is defined as $W(i, j) = \begin{cases} \frac{1}{N_k} & \text{if } l_i = l_j = k \\ 0 & \text{if } l_i \ne l_j \end{cases}$ (24)
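A sketch of the Fisher graph of Eq. (24) built from a label vector (NumPy, hypothetical name). With this W, KFD uses KK^T as the right-hand matrix, i.e. D = I in the KGE(W, D) notation.

```python
import numpy as np


def fisher_graph(labels):
    """W(i, j) = 1/N_k if l_i = l_j = k and 0 otherwise (Eq. 24)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]      # same-class indicator
    n_k = same.sum(axis=1).astype(float)           # N_k of the class of each row
    return same / n_k[:, None]


print(fisher_graph([0, 0, 1]))
```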

4.1.3 Locality Preserving Projection

Locality preserving projection minimizes an objective function that incurs a heavy penalty if neighbouring points in the original space are mapped far apart in the transformed space [12]. KLPP is equivalent to KGE(W, D) and can be solved through the generalised eigenvalue problem $K W K^T A = λ K D K^T A$. The local geometrical structure of $F$ is modelled by a similarity graph $W = [W(i, j)]_{N \times N}$ defined as $W(i, j) = \begin{cases} 1 & \text{if } X_i \in N(X_j) \text{ or } X_j \in N(X_i) \\ 0 & \text{otherwise} \end{cases}$ (25), where $N(X_i)$ is the set of k nearest neighbours (KNN) of $X_i$.
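The text does not specify which distance defines the neighbourhoods N(·). As an assumption, the sketch below takes a precomputed matrix of pairwise distances between image sets; one natural choice, shown as a helper, is the distance induced by a positive semi-definite Gram matrix, √(K_ii + K_jj − 2K_ij). Names and toy data are hypothetical.

```python
import numpy as np


def knn_graph(dist, k):
    """Symmetric KNN graph of Eq. (25): W(i, j) = 1 if X_i is in N(X_j) or X_j is in N(X_i)."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        d = dist[i].copy()
        d[i] = np.inf                      # a set is not its own neighbour
        W[i, np.argsort(d)[:k]] = 1.0      # X_j in N(X_i)
    return np.maximum(W, W.T)              # the "or" in Eq. (25) symmetrises the graph


def kernel_distance(K):
    """Distance induced by a PSD Gram matrix: sqrt(K_ii + K_jj - 2 K_ij) (illustrative choice)."""
    diag = np.diag(K)
    return np.sqrt(np.maximum(diag[:, None] + diag[None, :] - 2.0 * K, 0.0))


x = np.array([0.0, 1.0, 3.0, 10.0])      # toy 1-D positions of four image sets
W = knn_graph(np.abs(x[:, None] - x[None, :]), k=1)
```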

4.1.4 Margin Fisher Analysis

Let $N_w(X_i)$ be the set of k nearest neighbours of $X_i$ that share its label $l_i$. The within-class similarity graph $W_w$ is defined as $W_w(i, j) = \begin{cases} 1 & \text{if } X_i \in N_w(X_j) \text{ or } X_j \in N_w(X_i) \\ 0 & \text{otherwise} \end{cases}$ (26)

Let $N_b(X_i)$ be the set of k nearest neighbours of $X_i$ whose labels $l_j$ differ from $l_i$. The between-class similarity graph $W_b$ is defined as $W_b(i, j) = \begin{cases} 1 & \text{if } X_i \in N_b(X_j) \text{ or } X_j \in N_b(X_i) \\ 0 & \text{otherwise} \end{cases}$ (27)
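Assuming, as before, a precomputed matrix of pairwise distances between image sets, the two graphs of Eqs. (26) and (27) can be built as follows; the neighbourhood sizes `k_w` and `k_b`, the toy data and all names are illustrative.

```python
import numpy as np


def mfa_graphs(dist, labels, k_w, k_b):
    """Within-class graph W_w (Eq. 26) and between-class graph W_b (Eq. 27) from pairwise distances."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    W_w = np.zeros((n, n))
    W_b = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        diff = np.flatnonzero(labels != labels[i])
        W_w[i, same[np.argsort(dist[i, same])[:k_w]]] = 1.0   # N_w(X_i)
        W_b[i, diff[np.argsort(dist[i, diff])[:k_b]]] = 1.0   # N_b(X_i)
    return np.maximum(W_w, W_w.T), np.maximum(W_b, W_b.T)


x = np.array([0.0, 1.0, 5.0, 6.0])       # toy 1-D positions of four image sets
W_w, W_b = mfa_graphs(np.abs(x[:, None] - x[None, :]), [0, 0, 1, 1], k_w=1, k_b=1)
```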

Our aim is to maximize the discriminatory power while simultaneously preserving the geometrical structure, by mapping the points in the model space $M$ to a new feature space $F$. A suitable transformation places points connected in $W_w$ as close together as possible, while moving points connected in $W_b$ as far apart as possible. Such a mapping can be described by optimizing the following two objective functions [10, 30]: $f_1 = \min_A \frac{1}{2} \sum_{i,j} \|Y_i - Y_j\|^2 W_w(i, j)$ (28) and $f_2 = \max_A \frac{1}{2} \sum_{i,j} \|Y_i - Y_j\|^2 W_b(i, j)$ (29), where $Y_i = A^T K_i$. Note that $\frac{1}{2} \sum_{i,j} \|Y_i - Y_j\|^2 W_w(i, j) = \frac{1}{2} \sum_{i,j} \|A^T K_i - A^T K_j\|^2 W_w(i, j) = \mathrm{tr}(A^T K D_w K^T A) - \mathrm{tr}(A^T K W_w K^T A)$ (30), where $D_w(i, i) = \sum_{j \ne i} W_w(i, j)$.

We add the normalising constraint $\mathrm{tr}(A^T K D_w K^T A) = 1$ to the problem. Since the first term of (30) is then fixed, the minimisation problem becomes a maximisation one: $\min_A \frac{1}{2} \sum_{i,j} \|Y_i - Y_j\|^2 W_w(i, j) = \min_A \{\mathrm{tr}(A^T K D_w K^T A) - \mathrm{tr}(A^T K W_w K^T A)\} \Leftrightarrow \max_A \mathrm{tr}(A^T K W_w K^T A)$ (31)

The maximisation problem of Equation (29) can be simplified as $\max_A \frac{1}{2} \sum_{i,j} \|Y_i - Y_j\|^2 W_b(i, j) = \max_A \{\mathrm{tr}(A^T K D_b K^T A) - \mathrm{tr}(A^T K W_b K^T A)\} = \max_A \mathrm{tr}(A^T K L_b K^T A)$ (32), where $L_b = D_b - W_b$ and $D_b(i, i) = \sum_{j \ne i} W_b(i, j)$.

Combining the two maximisation problems into one gives $\max_A \mathrm{tr}(A^T K (L_b + β W_w) K^T A)$ (33) subject to $\mathrm{tr}(A^T K D_w K^T A) = 1$, where β is a Lagrange multiplier that balances the two terms. The solution is given by the eigenvectors associated with the r largest eigenvalues of the generalised eigenvalue problem $K (L_b + β W_w) K^T A = λ K D_w K^T A$ (34)
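Given W_w and W_b, the two matrices of Eq. (34) can be formed as in the sketch below (NumPy, hypothetical names). Here `beta` is the weight of Eq. (33), unrelated to the SVDD multipliers. The toy output shows that KD_wK^T can be singular when a set has no within-class neighbour, so in practice some regularisation before a generalised eigensolver is likely needed (our assumption).

```python
import numpy as np


def kmfa_matrices(K, W_w, W_b, beta):
    """Left- and right-hand matrices of Eq. (34): K (L_b + beta W_w) K^T and K D_w K^T."""
    D_b = np.diag(W_b.sum(axis=1) - np.diag(W_b))   # D_b(i, i) = sum_{j != i} W_b(i, j)
    D_w = np.diag(W_w.sum(axis=1) - np.diag(W_w))   # D_w(i, i) = sum_{j != i} W_w(i, j)
    L_b = D_b - W_b
    return K @ (L_b + beta * W_w) @ K.T, K @ D_w @ K.T


W_w = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
W_b = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
left, right = kmfa_matrices(np.eye(3), W_w, W_b, beta=0.5)
```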

KMFA is equivalent to KGE(L_b + βW_w, D_w). Note that Grassmann Graph embedding Discriminant Analysis (GGDA) [10, 11] and KMFA [30] are equivalent in nature.

If we define $W_w$ and $W_b$ as follows, $W_w(i, j) = \begin{cases} \frac{1}{N_k} & \text{if } l_i = l_j = k \\ 0 & \text{if } l_i \ne l_j \end{cases}$ (35)

$W_b(i, j) = \begin{cases} \frac{1}{N} - \frac{1}{N_k} & \text{if } l_i = l_j = k \\ \frac{1}{N} & \text{if } l_i \ne l_j \end{cases}$ (36)

Then Grassmann Discriminant Analysis (GDA) [9] is a special case of GGDA [10, 11].
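A sketch of the GDA weights of Eqs. (35) and (36) (NumPy, hypothetical name):

```python
import numpy as np


def gda_graphs(labels):
    """W_w of Eq. (35) and W_b of Eq. (36), the choice under which GGDA reduces to GDA."""
    labels = np.asarray(labels)
    N = len(labels)
    same = labels[:, None] == labels[None, :]
    n_k = same.sum(axis=1).astype(float)[:, None]    # N_k of the class of each row
    W_w = np.where(same, 1.0 / n_k, 0.0)
    W_b = np.where(same, 1.0 / N - 1.0 / n_k, 1.0 / N)
    return W_w, W_b


W_w, W_b = gda_graphs([0, 0, 1])
```

Every entry of W_w + W_b equals 1/N, so the two graphs split the uniform graph (1/N)11^T into within-class and between-class parts.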

4.2 The kernel learning framework

In computer vision applications, multi-view facial images are non-linearly distributed [6]. Popular learning algorithms such as discriminant analysis and support vector machines are not directly applicable to such features because of the non-Euclidean nature of the underlying spaces [19]. To overcome this limitation, each image set is mapped by a nonlinear mapping function into a high-dimensional feature space [6, 14], e.g. a Reproducing Kernel Hilbert Space (RKHS), to which many Euclidean algorithms can be generalized [14]. In $ℝ^D$, kernel methods have proven effective for many computer vision tasks. Mapping to an RKHS relies on a kernel function, which, according to Mercer’s theorem, must be positive definite. This section presents an FRIS framework in kernel space; Figure 1 shows the basic workflow of the proposed kernel learning framework.

Fig. 1

The workflow of the proposed FRIS system. The left dashed box is the learning procedure, and the right one is the testing procedure.

Algorithm 1 summarizes the main scheme of the proposed kernel learning framework for image set-based classification. We are given a training set $\{(X_i, l_i)\}_{i=1}^{N}$, where $X_i \in ℝ^{D \times N_i}$ is a matrix whose columns are vectorized images (or features extracted from the images, e.g. LBP [16] features), and $l_i \in \{1, 2, \ldots, C\}$ is the class label. As shown in Figure 1, we first learn a non-linear mapping function φ from the training image sets $X_i$ (i = 1, 2, ..., N), which maps the samples into the model space $M$; φ is defined in Equation (15).

By exploring an efficient metric for the data in the model space, a kernel function is used to map each model to a high-dimensional feature space $F$, to which many Euclidean algorithms can be generalized. Through this mapping, a kernel version of the graph embedding discriminant analysis framework is extended to image set matching to obtain the projection matrix $A_{opt}$. Within this framework, the kernel versions of Fisher discriminant analysis, locality preserving projection and marginal Fisher analysis are embedded and applied to preserve the local geometrical structure of the image sets in the feature space. After the projection matrix is obtained, all training image sets are mapped to the feature space by $Y = A_{opt}^T K = [Y_1, Y_2, \ldots, Y_N]$, where $Y_i = A_{opt}^T K_i \in F$, i = 1, 2, ..., N.

During the test stage, a new test image set $X_t \in ℝ^{D \times N_t}$ is projected to the feature space by $Z_t = A_{opt}^T K_t \in F$, where $K_t = [κ(φ(X_1), φ(X_t)), \ldots, κ(φ(X_N), φ(X_t))]^T$. Finally, an NN classifier is applied to obtain the class label of $Z_t$.
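The decision step can be sketched as follows. The text only says that an NN classifier is used; the Euclidean distance in the feature space, the toy matrices and the function name are our assumptions. Columns of `K_train` are the K_i and columns of `K_test` are the K_t vectors, so both have one row per training image set.

```python
import numpy as np


def nn_classify(A_opt, K_train, K_test, train_labels):
    """Nearest-neighbour decision in the feature space: Y = A^T K, Z_t = A^T K_t, Euclidean distance."""
    Y = A_opt.T @ K_train                                    # r x N training embeddings
    Z = A_opt.T @ K_test                                     # r x M test embeddings, one column per test set
    d2 = ((Y[:, :, None] - Z[:, None, :]) ** 2).sum(axis=0)  # N x M squared distances
    return np.asarray(train_labels)[np.argmin(d2, axis=0)]


A_opt = np.eye(2)                                            # toy projection
K_train = np.array([[0.0, 10.0], [0.0, 0.0]])                # columns K_1, K_2
K_test = np.array([[1.0, 9.0], [0.0, 0.0]])                  # columns K_t for two test sets
print(nn_classify(A_opt, K_train, K_test, ["a", "b"]))
```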

Algorithm 1: A kernel learning framework for FRIS

1: Input:
   1. Training set $\{(X_i, l_i)\}_{i=1}^{N}$, where $X_i \in ℝ^{D \times N_i}$ is an image set, $l_i \in \{1, 2, \ldots, c\}$ denotes its class label, and $N_i$ is the number of samples in $X_i$.
   2. A kernel function $κ_{ij}$ in the feature space $F$, which is used to measure the similarity between two image sets.
   3. A test image set $X_t \in ℝ^{D \times N_t}$ in the input space, where $N_t$ is the number of samples in $X_t$.
2: Output: The class label of $Z_t$.
3: Training procedure:
   1. Model the image sets: for each $X_i$ (i = 1, 2, ..., N), learn a nonlinear mapping function $φ : X_i \mapsto φ(X_i) \in M$, where $X_i \in ℝ^{D \times N_i}$ and $N_i$ is the number of samples in $X_i$;
   2. Define the similarity metric $d(φ(X_i), φ(X_j))$ in the model space;
   3. Calculate the Gram kernel matrix $K = [κ_{ij}]_{i,j=1}^{N}$;
   4. Select a discriminant analysis method in the graph embedding framework to obtain the projection matrix $A_{opt}$;
   5. Project the training image sets into the feature space by $Y = A_{opt}^T K$;
4: Testing procedure: Calculate the projection of $X_t \in ℝ^{D \times N_t}$ as $Z_t = A_{opt}^T K_t \in F$, where $K_t = [κ(φ(X_1), φ(X_t)), \ldots, κ(φ(X_N), φ(X_t))]^T$;
5: Decision process: An NN classifier is applied to obtain the class label of $Z_t$.

In the proposed kernel learning framework, a few points should be noted. (1) Image set modelling. We need to learn a suitable nonlinear function φ that maps an image set from the input space to the model space $M$, in which the set is expressed by a more descriptive model and becomes more discriminative under a new metric. Most existing methods either treat an image set as a subspace, regard the subspace as a point on a Grassmann manifold and use the principal angles between subspaces as the similarity measure, or take the covariance matrix of the image set as a point on the manifold ${Sym}_{d}^{+}$ of symmetric positive definite matrices and use the Log-Euclidean metric as the similarity measure. In this paper, we propose a new method that models an image set with SVDD: by mapping the data points from the input space to a high-dimensional model space, it finds the smallest sphere enclosing most of the mapped points. (2) Kernel learning. Different data descriptions require different kernel functions, because different kernels correspond to different types of data distribution; on Grassmann manifolds, for instance, the kernel matrix is usually computed with the projection kernel. This paper defines a pseudo kernel that is not guaranteed to be positive definite. It computes the kernel matrix in the feature space from the distance between two spheres, and flips the negative eigenvalues to convert the pseudo kernel into a true kernel that satisfies the conditions of Mercer’s theorem. (3) Separability. Samples in the model space are generally more separable than those in the original space, but this property is not guaranteed.

5. Experiments

The proposed approach was compared with previous state-of-the-art methods on two public video-based face databases, ChokePoint [26] and CMU MoBo [8], to evaluate the methods under variations in image resolution, facial expression and illumination. In the following sections, we first briefly review the databases used in the experiments, and then describe and discuss the experiments.

5.1 Experimental settings

ChokePoint dataset

The first dataset, ChokePoint, is a video dataset designed for academic research on video-based face matching. It consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2, recorded one month apart. In total, the dataset contains 54 video sequences and 64,204 labeled face images. In all sequences, only one subject is present in the image at a time.

In this paper, we use the videos captured by cameras 1 and 2 as subjects enter portal 1. There are 8 videos per person; in each experiment, the 8 videos of each person are randomly partitioned into two sets of four, one for training and the other for testing. The experiments are repeated 5 times. Histogram equalization is applied to reduce lighting effects, and all images are resized to 32 × 32 pixels to reduce computation time.
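The ChokePoint protocol above (a random 4/4 split of each person's 8 videos, repeated 5 times) can be written as a short routine. This is a sketch under assumptions: the video identifiers and the function name `random_video_splits` are hypothetical, and drawing the split independently for every person and every repetition is our reading of the text.

```python
import numpy as np

def random_video_splits(videos_per_subject, n_train, n_repeats, seed=0):
    """Return n_repeats independent (train, test) splits; in each split every
    subject's videos are shuffled and the first n_train go to training."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_repeats):
        train, test = {}, {}
        for subject, videos in videos_per_subject.items():
            order = rng.permutation(len(videos))
            train[subject] = [videos[i] for i in order[:n_train]]
            test[subject] = [videos[i] for i in order[n_train:]]
        splits.append((train, test))
    return splits

# Hypothetical identifiers: 3 subjects with 8 videos each, as in the ChokePoint setting.
videos = {f"subject{s:02d}": [f"subject{s:02d}_video{k}" for k in range(8)] for s in range(3)}
splits = random_video_splits(videos, n_train=4, n_repeats=5)
train, test = splits[0]
print(len(splits), len(train["subject00"]), len(test["subject00"]))  # 5 4 4
```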

Motion of Body dataset

The second dataset, CMU MoBo (Motion of Body), is one of the most commonly used datasets in video-based face recognition research. It was originally collected for human identification at a distance. For each person, video recordings were made for 4 walking styles (slow walk, fast walk, inclined walk, and slow walk while holding a ball), viewed from a set of fixed cameras. The subset considered here contains 96 sequences of 24 subjects walking on a treadmill; each person has 4 videos, 2 for training and the remaining 2 for testing. Each sequence has 300 frames. A cascaded face detector [20] is used to detect faces in each video sequence. Each face is then resized to a 30 × 30 gray-level image, followed by histogram equalization to reduce lighting effects.
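The MoBo preprocessing (resize to 30 × 30, then histogram equalization) and the vectorization of each face into one column of an image set matrix can be sketched as follows. Assumptions: the face crops have already been produced by a detector such as [20] and are grayscale `uint8` arrays; nearest-neighbour resizing is used because the interpolation method is not specified; all function names are hypothetical.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D gray image (the interpolation choice is an assumption)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols[None, :]]

def equalize_histogram(img):
    """Global histogram equalization of a uint8 gray image via its cumulative histogram."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(hist)[0][0]]      # cumulative count at the darkest occupied level
    denom = cdf[-1] - cdf_min
    if denom == 0:                              # constant image: nothing to stretch
        return img.copy()
    lut = np.clip(np.round((cdf - cdf_min) * 255.0 / denom), 0, 255).astype(np.uint8)
    return lut[img]

def face_to_vector(face, size=30):
    """Resize a detected gray face crop to size x size, equalize it, and flatten it."""
    small = resize_nearest(face, size, size)
    return equalize_histogram(small).astype(np.float64).ravel()

# An image set becomes a d x n matrix whose columns are vectorized faces (random toy crops here).
rng = np.random.default_rng(0)
crops = [rng.integers(0, 256, size=(64, 48), dtype=np.uint8) for _ in range(3)]
X = np.stack([face_to_vector(c) for c in crops], axis=1)
print(X.shape)  # (900, 3)
```

For ChokePoint the same helpers apply with `size=32`; the text does not fix the order of equalization and resizing there.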

5.2 Comparative methods and settings

We compared the proposed SVDD-based kernel, combined with different discriminant analysis methods, with several recently proposed image set classification methods: Mutual Subspace Method (MSM) [29], Manifold-Manifold Distance (MMD) [24], Sparse Approximated Nearest Points between image sets (SANP) [13], Affine Hull based Image Set Distance (AHISD) [4], Grassmann Discriminant Analysis (GDA) [9], Graph embedding discriminant analysis on Grassmann manifolds (GGDA) [10], Covariance Discriminative Learning (CDL) [23], and multi-local model image set matching based on domain description [31]. The standard implementations of all methods from the original authors were used. To allow comparison with the literature, we followed the simple protocol of [24]: the detected face images were histogram equalized, but no further preprocessing such as alignment or background removal was performed, and the image features were raw pixel (gray-level) values for ChokePoint and Local Binary Pattern (LBP) [1, 16] features for CMU MoBo.
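LBP features for CMU MoBo [1] can be sketched as below. The text does not state which LBP variant is used, so radius 1 with 8 neighbours, the plain 256-code histogram (no uniform-pattern mapping) and a 3 × 3 grid of cells are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour, radius-1 LBP code for every interior pixel of a 2-D gray image."""
    img = img.astype(np.int32)
    center = img[1:-1, 1:-1]
    # neighbour offsets (dy, dx), visited clockwise starting at the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit   # set bit if neighbour >= centre
    return codes

def lbp_histogram(img, grid=(3, 3)):
    """Concatenate normalized 256-bin LBP histograms computed over a grid of cells."""
    codes = lbp_codes(img)
    feats = []
    for band in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(band, grid[1], axis=1):
            hist = np.bincount(cell.ravel(), minlength=256).astype(np.float64)
            feats.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(30, 30))   # stands in for a preprocessed 30 x 30 face
feature = lbp_histogram(face)
print(feature.shape)  # (2304,)
```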

Parameter setting: The important parameters of each method were empirically tuned according to the recommendations in the original references and the source code provided by the original authors. In MSM/MMD, PCA was performed to learn the single linear subspace or the mixture of linear subspaces, preserving 95% of the data energy. For MMD, the parameters were configured according to [24]: the ratio between Euclidean distance and geodesic distance was set to 2.0, the maximum canonical correlation was used in defining MMD, and the number of connected nearest neighbors for computing geodesic distance was fixed to its default value, i.e., 12. For SANP, we adopted the same weight parameters as [13] for the convex optimization and retained 95% of the energy by PCA. For AHISD [4], we used the linear version, in which the best separating hyperplane is determined with the affine subspace estimation formulation, and the subspace dimensions were set by retaining enough leading eigenvectors to account for 95% of the overall energy in the eigen-decomposition. For GDA, the projection kernel (κ_proj) was used. For GGDA [10], the simple binary graphs and the canonical correlation kernel of [10] were used with the default settings of [10], and the number of neighbours k was set to 2.
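Several of the baselines above build a linear subspace per image set by PCA retaining 95% of the energy and then compare subspaces through canonical correlations, while GDA uses the projection kernel. A minimal sketch: centering the data before PCA is an assumption (MSM may instead be formulated with the uncentered autocorrelation matrix), "energy" is taken as the cumulative sum of squared singular values, and the projection kernel is ‖U1ᵀU2‖²_F as defined by Hamm and Lee [9]. The function names are hypothetical.

```python
import numpy as np

def pca_basis(X, energy=0.95):
    """Orthonormal basis (d x k) of the subspace of an image set X (d x n), keeping the
    fewest leading components whose cumulative energy reaches the given fraction."""
    Xc = X - X.mean(axis=1, keepdims=True)            # mean-centering (assumption)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(ratio, energy)) + 1, len(s))
    return U[:, :k]

def canonical_correlations(U1, U2):
    """Cosines of the principal angles between span(U1) and span(U2), largest first."""
    return np.clip(np.linalg.svd(U1.T @ U2, compute_uv=False), 0.0, 1.0)

def projection_kernel(U1, U2):
    """Projection kernel ||U1^T U2||_F^2, i.e. the sum of squared canonical correlations."""
    return float(np.sum((U1.T @ U2) ** 2))

# Toy check: two planes in R^3 that share one direction.
U1 = np.eye(3)[:, :2]          # span{e1, e2}
U2 = np.eye(3)[:, [0, 2]]      # span{e1, e3}
print(canonical_correlations(U1, U2))  # [1. 0.]
print(projection_kernel(U1, U2))       # 1.0
```

The first canonical correlation returned here is the "maximum canonical correlation" mentioned for MMD.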

Our implementation: The important parameters of the proposed algorithm are (i) the Gaussian kernel parameter and (ii) the parameter J in Equation (5). Their values are given in the description of each experiment.

Note that GDA [9] was included in the comparison because it was the top-performing method in the comparative experiments of [9], where it was compared with DCC [15] and MSM [29]. A nearest neighbor (NN) classifier was used for all methods.
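Each discriminant method is followed by the NN classifier mentioned above. For KFD this can be sketched as follows, under stated assumptions: the input is an already valid (for instance eigenvalue-flipped) kernel matrix between training image sets plus one between test and training sets; LDA is written in the graph-embedding view of [30], where the intrinsic graph W has weight 1/n_c between sets of class c, so that K(I − W)K and K(W − 11ᵀ/n)K are the kernelized within- and between-class scatter matrices; a small ridge term `reg` (our assumption) keeps the within-class matrix invertible. The function names are hypothetical.

```python
import numpy as np

def kfd_graph_embedding(K, y, reg=1e-6):
    """Multi-class kernel Fisher discriminant written with graph-embedding matrices.
    K: n x n kernel matrix between training image sets; y: their labels.
    Returns expansion coefficients alpha (n x (c - 1)), one column per direction."""
    y = np.asarray(y)
    n = len(y)
    classes = np.unique(y)
    W = np.zeros((n, n))
    for c in classes:
        idx = np.flatnonzero(y == c)
        W[np.ix_(idx, idx)] = 1.0 / len(idx)               # intrinsic graph: 1/n_c within class c
    S_between = K @ (W - np.ones((n, n)) / n) @ K          # kernelized between-class scatter
    S_within = K @ (np.eye(n) - W) @ K + reg * np.eye(n)   # kernelized within-class scatter + ridge
    # Solve S_between a = lam * S_within a by Cholesky whitening into a symmetric problem.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_within))
    M = L_inv @ S_between @ L_inv.T
    _, V = np.linalg.eigh(0.5 * (M + M.T))                 # eigenvalues in ascending order
    top = V[:, ::-1][:, : len(classes) - 1]
    return L_inv.T @ top

def nn_classify(K_train, K_test_train, y_train, alpha):
    """Embed training and test sets with alpha; label each test set by its nearest training set."""
    Z_train = K_train @ alpha
    Z_test = K_test_train @ alpha
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(y_train)[np.argmin(d2, axis=1)]

# Toy stand-in: a linear kernel on 2-D points replaces the set-level kernel.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = [0, 0, 0, 1, 1, 1]
X_test = np.array([[0.5, 0.5], [5.5, 5.5]])
alpha = kfd_graph_embedding(X_train @ X_train.T, y_train)
pred = nn_classify(X_train @ X_train.T, X_test @ X_train.T, y_train, alpha)
print(pred)  # [0 1]
```

Replacing W and the penalty matrix with the neighbourhood graphs of MFA [28] or LPP [12] would give kernel MFA or LPP variants; those graphs are not shown here.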

5.3 Identification results and analysis

In this experiment, we run the FRIS task on the ChokePoint database. For the proposed SVDD kernel, the Gaussian kernel bandwidth q and the parameter C were empirically set to q = 4 and C = 0.2. Table 1 and Figure 2 show the average recognition rate and standard deviation of the proposed framework on the ChokePoint database for each discriminant method combined with the SVDD kernel (SVDD), projection kernel (Proj), canonical correlation kernel (CC) and logarithm kernel (LOG). Figure 2(a-c) show the results of the 5 random trials, with the average in the last column, and Figure 2(d) shows the average recognition rate of each combination of discriminant method and kernel.
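The SVDD description of a single image set can be sketched as follows. This is a minimal sketch, not the authors' implementation: it solves the standard SVDD dual of Tax and Duin [18] (minimize αᵀKα − Σ α_i K(x_i, x_i) subject to 0 ≤ α_i ≤ C and Σ α_i = 1) with a plain Frank-Wolfe iteration, a generic solver chosen only because it needs nothing beyond NumPy. It assumes the Gaussian kernel is written K(x, y) = exp(−q‖x − y‖²), following the notation of [3], and that C is the SVDD upper bound on α_i; the function names are hypothetical. The squared radius is estimated by averaging the distances of the unbounded support vectors (0 < α_i < C) to the centre.

```python
import numpy as np

def gaussian_kernel(A, B, q):
    """K(x, y) = exp(-q * ||x - y||^2) between the rows of A and the rows of B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-q * np.maximum(d2, 0.0))

def fit_svdd(X, q, C, n_iter=5000, tol=1e-10):
    """SVDD dual solved by Frank-Wolfe; X holds one sample per row.
    Returns (alpha, R2, dist2), dist2 = squared feature-space distance of each sample to the centre."""
    n = X.shape[0]
    if C * n < 1.0:
        raise ValueError("infeasible: C must be at least 1/n")
    K = gaussian_kernel(X, X, q)
    kdiag = np.diag(K).copy()
    alpha = np.full(n, 1.0 / n)                    # feasible starting point
    for _ in range(n_iter):
        grad = 2.0 * K @ alpha - kdiag             # gradient of alpha'K alpha - kdiag'alpha
        s, remaining = np.zeros(n), 1.0
        for i in np.argsort(grad):                 # linear minimization over {0<=s<=C, sum(s)=1}
            s[i] = min(C, remaining)
            remaining -= s[i]
            if remaining <= 0.0:
                break
        d = s - alpha
        gap = -grad @ d                            # Frank-Wolfe duality gap
        if gap <= tol:
            break
        curv = d @ K @ d
        step = 1.0 if curv <= 0.0 else min(1.0, gap / (2.0 * curv))  # exact line search
        alpha = alpha + step * d
    dist2 = kdiag - 2.0 * K @ alpha + alpha @ K @ alpha
    eps = 1e-6
    on_boundary = (alpha > eps) & (alpha < C - eps)    # unbounded support vectors
    if not on_boundary.any():
        on_boundary = alpha > eps
    R2 = float(np.mean(dist2[on_boundary]))
    return alpha, R2, dist2

rng = np.random.default_rng(0)
X = np.vstack([0.1 * rng.standard_normal((20, 2)), [[10.0, 10.0]]])   # toy set plus one outlier
alpha, R2, dist2 = fit_svdd(X, q=0.5, C=0.2)
print(int(np.argmax(dist2)))  # 20, the outlier is farthest from the centre
```

The values q = 4 and C = 0.2 reported above depend on how the authors scaled their pixel features, so the toy call uses q = 0.5 instead. Frank-Wolfe converges slowly near the optimum; an SMO-style or general QP solver is the more usual choice for larger sets.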

Table 1
Average recognition rate (%) and standard deviation on the ChokePoint database with the SVDD kernel (SVDD), projection kernel (Proj), canonical correlation kernel (CC) and logarithm kernel (LOG)

Method	SVDD	Proj	CC	LOG
KFD	99.8(0.45)	99.0(0.71)	94.0(1.41)	98.6(1.67)
KMFA	98.8(0.84)	85.0(2.35)	94.8(0.84)	81.6(7.83)
KLPP	97.6(1.67)	87.4(4.87)	91.2(3.42)	91.2(3.35)

Fig. 2

Average recognition rate with the projection kernel (Proj), canonical correlation kernel (CC), logarithm kernel (LOG) and SVDD kernel (SVDD) on the ChokePoint database. In (a-c), the last column is the average recognition rate over the 5 trials; in (d), the last column shows the average recognition rate obtained with each discriminant method. (a) KFD, (b) KMFA, (c) KLPP, (d) different algorithms with different kernels.

As shown in Table 1, the proposed framework obtains its best performance with the SVDD kernel and KFD (99.8%), and the SVDD kernel outperforms the other kernels for every discriminant analysis method. Table 1 and Figure 2(a-c) also show that the recognition rate of KFD is more stable across kernels (94.0% to 99.8%) than those of KMFA (81.6% to 98.8%) and KLPP (87.4% to 97.6%). As shown in Figure 2(c), when KLPP is used, the projection kernel yields the worst performance. Figure 2(d) indicates that the SVDD kernel also performs best on average. Overall, in this experiment the SVDD kernel gives the highest recognition rate among all the kernel and discriminant method combinations compared.

Moreover, several recently proposed image set classification methods are closely related to the proposed framework: each of them corresponds to a particular combination of kernel function and discriminant analysis method within it. Table 2 lists, for each method, the kernel type, discriminant analysis method and image set model, together with its result. From Table 2, the proposed framework with the SVDD kernel and KFD obtains the best performance, and CDL performs best among the competing methods. With the proposed SVDD-based kernel, KFD performs best among the three discriminant analysis methods used in this paper, followed by KMFA, while KLPP performs worst. Even so, KLPP with the SVDD-based kernel is outperformed only by CDL among the competing methods. These results show that the proposed SVDD-based kernel delivers promising performance.

Table 2

Recognition rate (RR, %), standard deviation (STD), kernel type, discriminant analysis method (DA) in the proposed framework, and image set model on the ChokePoint dataset

Method	RR	STD	Kernel	DA	Model
GDA [9]	95.60	0.89	Proj	KFD	Subspace
GGDA [10]	95.40	0.55	CC	KMFA	Subspace
CDL [23]	98.20	0.84	LOG	KFD	COV
Ours(KFD)	99.80	0.45	SVDD	KFD	SVDD
Ours(KMFA)	98.80	0.84	SVDD	KMFA	SVDD
Ours(KLPP)	97.60	1.67	SVDD	KLPP	SVDD

Lastly, we compare the proposed method with other recently proposed image set classification methods; some of the ChokePoint results in Table 2 are repeated here for comparison. The running times of the different methods on the ChokePoint and CMU MoBo databases are also reported, measured on a machine with two Intel(R) Xeon(R) E7-4807 CPUs @ 1.87 GHz and 64 GB of memory. The experiments are repeated 5 times, and the average recognition rate, standard deviation and average time cost of each method are tabulated in Table 3. On ChokePoint, the proposed method outperforms all competing methods except AHISD and SANP; on CMU MoBo, it outperforms all competing methods. The proposed method also matches image sets efficiently: its running time (8.66 s on ChokePoint and 47.30 s on CMU MoBo) is of the same order as those of AHISD, GDA and Zeng [31], whereas SANP and CDL are far slower than all other methods. As discussed in Section 1, SANP [13] matches the closest virtual points of two sets via convex optimization, and this sample-based matching mechanism and complex optimization procedure make it much less efficient.
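The reporting protocol (each experiment repeated 5 times; mean recognition rate, standard deviation and average time per method) can be sketched as a small harness. Assumptions: `run_once` is a hypothetical callable standing for one complete train-and-test trial that returns a recognition rate in percent, the standard deviation is the sample standard deviation (`ddof=1`; the estimator is not stated in the text), and time is wall-clock time from `time.perf_counter`.

```python
import time
import numpy as np

def repeat_and_time(run_once, n_repeats=5):
    """Call run_once(trial) n_repeats times (n_repeats >= 2).
    Returns (mean rate, sample standard deviation, mean seconds per trial)."""
    rates, times = [], []
    for trial in range(n_repeats):
        start = time.perf_counter()
        rates.append(run_once(trial))
        times.append(time.perf_counter() - start)
    rates = np.asarray(rates, dtype=np.float64)
    return float(rates.mean()), float(rates.std(ddof=1)), float(np.mean(times))

# Dummy trial with made-up rates, only to exercise the harness (not results from the paper).
dummy_rates = [100.0, 100.0, 98.0, 100.0, 100.0]
mean, std, secs = repeat_and_time(lambda trial: dummy_rates[trial])
print(round(mean, 2), round(std, 2))  # 99.6 0.89
```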

Table 3

Face recognition rate (RR, %) with standard deviation (STD), and average computation time (seconds) on the ChokePoint and CMU MoBo datasets

Method	ChokePoint RR (STD)	ChokePoint Time (s)	CMU MoBo RR (STD)	CMU MoBo Time (s)
MSM [29]	97.40(1.52)	1.04	97.14(1.12)	4.98
MMD [24]	95.20(0.84)	22.24	94.69(2.74)	181.56
SANP [13]	99.84(0.36)	31258.00	96.73(1.12)	2576.00
AHISD [4]	100.00(0.00)	7.29	95.92(0.00)	23.32
GDA [9]	95.60(0.89)	5.78	96.73(1.12)	41.09
GGDA [10]	95.40(0.55)	34.96	95.51(0.91)	242.75
CDL [23]	98.20(0.84)	333.89	95.51(0.91)	21095.00
Zeng [31]	97.80(1.30)	4.96	97.55(0.91)	26.47
Ours(KFD)	99.80(0.45)	8.66	98.78(1.12)	47.30

Compared with our previous study [31], the proposed method achieves higher recognition rates on both databases (99.80% vs. 97.80% on ChokePoint and 98.78% vs. 97.55% on CMU MoBo), at a somewhat higher running time (8.66 s vs. 4.96 s and 47.30 s vs. 26.47 s). Overall, the proposed method solves the set-based face recognition problem accurately and efficiently.

6. Conclusions

The success of kernel-based methods is often determined by the choice of kernel. This paper addressed kernel learning for image set-based classification tasks and used SVDD to model each image set. After modeling the image sets, we defined a new kernel based on the distance between two support vector domains, which maps the domains into a Euclidean space. In addition, recent classic kernel-based FRIS methods are consolidated into a unified framework, into which FDA, LPP and MFA are integrated to improve the discriminative ability in the feature space. The proposed methods were tested on face recognition tasks and compared with several state-of-the-art methods. On both databases, ChokePoint and CMU MoBo, the proposed methods demonstrated promising performance.

Footnotes

Any kernel (e.g., polynomial, sigmoid or Gaussian) works here; however, as discussed in [3, ], Gaussian kernels provide tighter contour representations and are used in most SVDD-based approaches.

Acknowledgments

This work is partially supported by the Funds for University Innovation and Entrepreneurship Education of Guangzhou City (No. 2019PT204), the Special Funds for Welfare Research and Capacity Building Project of Guangdong Province (No. 2015A030402003), and the Funds for Science and Technology Project of Guangdong Province (No. 2017ZC0117).

References

1. Ahonen, Hadid and Pietikainen, Face description with local binary patterns: Application to face recognition, IEEE Trans Pattern Anal Machine Intell 28(12) (2006), 2037–2041.

2. Arsigny, Fillard, Pennec and Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM J on Matrix Analysis and Applications 29(1) (2007), 328–347.

3. Ben-Hur, Horn, H.T. Siegelmann and Vapnik, Support vector clustering, J Mach Learn Res 2 (2001), 125–137.

4. Cevikalp and Triggs, Face recognition based on image sets, In IEEE Conf. on Computer Vision and Pattern Recognition, IEEE (2010), pp. 2567–2573.

5. Chen, E.K. Garcia, M.R. Gupta, Rahimi and Cazzanti, Similarity-based classification: Concepts and algorithms, J Mach Learn Res 10 (2009), 747–776.

6. W.-S. Chu, J.-C. Chen and J.-J. Lien, Kernel discriminant analysis based on canonical differences for face recognition in image sets, In Yagi, Kang, Kweon and Zha, editors, Computer Vision – ACCV 2007, volume 4844 of Lecture Notes in Computer Science, Springer Berlin Heidelberg (2007), pp. 700–711.

7. Graepel, Herbrich, Bollmann-Sdorra and Obermayer, Classification on pairwise proximity data, Advances in Neural Information Processing Systems (1999), 438–444.

8. Gross and Shi, The CMU Motion of Body (MoBo) database, Technical Report CMU-RI-TR-01-18, Robotics Institute, Pittsburgh, PA, June 2001.

9. Hamm and D.D. Lee, Grassmann discriminant analysis: A unifying view on subspace-based learning, In Proc. of the 25th Int. Conf. on Machine Learning, ACM (2008), pp. 376–383.

10. M.T. Harandi, Sanderson, Shirazi and B.C. Lovell, Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching, In IEEE Conf. on Computer Vision and Pattern Recognition (2011), pp. 2705–2712.

11. M.T. Harandi, Sanderson, Shirazi and B.C. Lovell, Kernel analysis on Grassmann manifolds for action recognition, Pattern Recogn Lett 34(15) (2013), 1906–1915.

12. He and Niyogi, Locality preserving projections, In Thrun, L.K. Saul and Schölkopf, editors, NIPS, Vancouver and Whistler, British Columbia, Canada, MIT Press (2003).

13. Hu, A.S. Mian and Owens, Face recognition using sparse approximated nearest points between image sets, IEEE Trans Pattern Anal Machine Intell 34(10) (2012), 1992–2004.

14. Jayasumana, Hartley, Salzmann, Li and Harandi, Kernel methods on the Riemannian manifold of symmetric positive definite matrices, In IEEE Conf. on Computer Vision and Pattern Recognition, Portland, Oregon, June 2013, IEEE, pp. 73–80.

15. T.-K. Kim, Kittler and Cipolla, Discriminative learning and recognition of image set classes using canonical correlations, IEEE Trans Pattern Anal Machine Intell 29 (2007), 1005–1018.

16. Liao, Zhu, Lei, Zhang and Li, Learning multiscale block local binary patterns for face recognition, In S.-W. Lee and Li, editors, Advances in Biometrics, volume 4642 of Lecture Notes in Computer Science, Springer Berlin Heidelberg (2007), pp. 828–837.

17. Shawe-Taylor and Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.

18. D.M.J. Tax and R.P.W. Duin, Support vector domain description, Pattern Recogn Lett 20 (1999), 1191–1199.

19. Vemulapalli, J.K. Pillai and Chellappa, Kernel learning for extrinsic classification of manifold features, In IEEE Conf. on Computer Vision and Pattern Recognition, Portland, Oregon, June 2013, IEEE, pp. 1782–1789.

20. Viola and M.J. Jones, Robust real-time face detection, Int J Comput Vision 57(2) (2004), 137–154.

21. C.-D. Wang and J.-H. Lai, Position regularized support vector domain description, Pattern Recogn 46(3) (2013), 875–884.

22. C.-D. Wang, W.-S. Zheng, J.-H. Lai and Huang, SVStream: A support vector-based algorithm for clustering data streams, IEEE Trans on Knowledge and Data Engineering 25(6) (2013), 1410–1424.

23. Wang, Guo, L.S. Davis and Dai, Covariance discriminative learning: A natural and efficient approach to image set classification, In IEEE Conf. on Computer Vision and Pattern Recognition (2012), pp. 2496–2503.

24. Wang, Shan, Chen and Gao, Manifold-manifold distance with application to face recognition based on image set, In IEEE Conf. on Computer Vision and Pattern Recognition, IEEE (2008), pp. 1–8.

25. Wolfe, A duality theorem for nonlinear programming, Quarterly of Applied Mathematics 19(3) (1961), 239–244.

26. Wong, Chen, Mau, Sanderson and B.C. Lovell, Patch-based probabilistic image quality assessment for face selection and improved video-based face recognition, In IEEE Conf. on Computer Vision and Pattern Recognition Workshops, IEEE (2011), pp. 74–81.

27. Wu, E.Y. Chang and Zhang, An analysis of transformation on non-positive semidefinite similarity matrix for kernel machines, In Proc. of the 22nd Int. Conf. on Machine Learning, volume 8, Citeseer (2005).

28. Xu, Yan, Tao, Lin and H.-J. Zhang, Marginal Fisher analysis and its variants for human gait recognition and content-based image retrieval, IEEE Trans Image Processing 16(11) (2007), 2811–2821.

29. Yamaguchi, Fukui and K. Maeda, Face recognition using temporal image sequence, In Proc. of the 3rd Int. Conf. on Automatic Face & Gesture Recognition, Washington, DC, USA, IEEE Computer Society (1998), pp. 318–323.

30. S.-C. Yan, Xu, B.-Y. Zhang, H.-J. Zhang, Yang and Lin, Graph embedding and extensions: A general framework for dimensionality reduction, IEEE Trans Pattern Anal Machine Intell 29 (2007), 40–51.

31. Q.-S. Zeng, J.-H. Lai and C.-D. Wang, Multi-local model image set matching based on domain description, Pattern Recognition 47(2) (2014), 694–704.
