A novel fuzzy linear discriminant analysis for face recognition

Abstract

In practical applications, face recognition performance is affected by variations in expression, illumination and other conditions. To address this problem, an interval type-2 fuzzy linear discriminant analysis (IT2FLDA) method is proposed. We first propose a supervised interval type-2 fuzzy C-Means (IT2FCM) algorithm and then incorporate it into linear discriminant analysis (LDA). In this method, the membership degree matrix of the training samples with respect to each class and the mean of each class are first calculated by the supervised IT2FCM algorithm. They are then used to define the fuzzy within-class scatter matrix and the fuzzy between-class scatter matrix. In doing so, the class means estimated by the supervised IT2FCM can converge to more desirable locations than those obtained by the class sample average or the fuzzy k-nearest neighbor (FKNN) method. Furthermore, IT2FLDA inherits the benefits of the supervised IT2FCM and LDA: it is able to minimize the effects of uncertainties, find the optimal projective directions and make the feature subspace discriminative and robust. The experimental results show that IT2FLDA improves the recognition rate and reduces sensitivity to variations compared with previous techniques.

Keywords

Type-2 fuzzy set; linear discriminant analysis; supervised interval type-2 fuzzy C-Means; membership degree matrix

1. Introduction

In face recognition applications, it is in most cases impossible to obtain perfect information about a given face image because of substantial variation in light direction, face pose and facial expression. This uncertain information leads to imperfect representations of face images in face recognition algorithms. Therefore, various types of uncertainty should be taken into account when performing face recognition [3, 9, 15, 22].

The best-known feature extraction methods used for face recognition are principal component analysis (PCA) [16] and linear discriminant analysis (LDA) [21]. PCA projects the high-dimensional face image space into a low-dimensional feature space by calculating a projection matrix from the eigenvectors of the covariance matrix. Since PCA is an unsupervised algorithm, it does not use class information, which limits its performance in classification problems. PCA can also be affected by variations in illumination and facial expression. LDA is a supervised alternative to PCA. It seeks the projection matrix by maximizing the ratio of the between-class scatter matrix to the within-class scatter matrix. LDA produces well-separated classes in a low-dimensional subspace, even under large variation in illumination and facial expression. However, LDA suffers from some limitations, one of which is the small sample size problem. Researchers have explored a number of effective approaches to solve this problem [8, 10, 25].

In the above algorithms, the relationship of each face to a class is assumed to be crisp. However, as faces are significantly affected by numerous environmental conditions, it is advantageous to investigate these factors and quantify their impact on the “internal” class assignment. In other words, substantial variation can cause face images belonging to different classes to have similar features. Hence, Kwak and Pedrycz [13] introduced the membership degree of each face pattern to a class into the LDA method, based on the fuzzy k-nearest neighbor (FKNN) algorithm [11]. Yang et al. [23, 24] extended the idea by incorporating the membership degree of each face pattern into the definition of the between-class and within-class scatter matrices in LDA and 2DLDA. Li [26] applied FKNN to the computation of the mean matrix to improve the recognition performance of traditional 2DPCA. Khoukhi and Ahmed [1] further modified the approach by using a genetic algorithm to optimize the parameters of the membership functions of the training images and the number of nearest neighbors used to calculate the membership degrees.

In practice, face recognition is a very difficult problem because uncertainty is inherently present. Type-2 fuzzy theory is a good choice for addressing these uncertainties and improving recognition performance. In fact, the management of uncertainty using type-2 fuzzy theory has been applied in various fields where satisfactory performance cannot be obtained with type-1 fuzzy theory [5, 19, 20, 27, 28, 29]. Inspired by these successful applications, an interval type-2 fuzzy linear discriminant analysis (IT2FLDA) method is proposed. In this paper, we first propose the supervised IT2FCM, which introduces class information into the IT2FCM algorithm. The supervised IT2FCM is then incorporated into traditional LDA to reduce these external effects and obtain the correct local distribution information, which ensures good performance. IT2FLDA uses the supervised IT2FCM algorithm to assign each face pattern a membership degree in each class and to calculate the mean of each class. These are then used to define the fuzzy within-class scatter matrix and the fuzzy between-class scatter matrix. In doing so, IT2FLDA is able to find the optimal projective directions by maximizing the ratio of the fuzzy between-class scatter matrix to the fuzzy within-class scatter matrix. The resulting embedding subspace is more discriminative and more robust.

This paper is organized as follows. Section 2 introduces related work. Section 3 describes the proposed IT2FLDA in detail. Section 4 compares the performance of the proposed method with that of previous feature extraction techniques in face recognition. Conclusions are drawn in Section 5.

2. Related works

In this section, a brief overview of the type-1 FLDA algorithm is given first. The shortcomings of type-1 FLDA are then analyzed.

2.1 The type-1 FLDA

FLDA was first presented by Kwak and Pedrycz in [13]. In the traditional LDA approach, every vector is assumed to have a crisp membership in the class to which it belongs. This does not take into account the resemblance of images belonging to different classes, which occurs under varying conditions. In FLDA, a vector is assigned a membership grade for every class based on the class labels of its k nearest neighbors. In this manner, the inter-class image resemblance is accounted for.

FLDA suffers from the small sample size problem, which can be avoided by using PCA as a preprocessing step. Let a face image $z_{i}$ be a large one-dimensional vector of pixels, constructed from a two-dimensional facial image by concatenating successive columns of the image. Consider a set of feature vectors transformed by PCA, $X=\left\{{x_{1},x_{2},\cdots,x_{N}}\right\}$ , where $N$ is the number of training vectors and $n$ is their dimension. The fuzzy c-class partitioning of these vectors defines the membership degrees of each vector to all the classes. The partition matrix is denoted by $U=\left[{\mu_{ij}}\right]$ for $i=1,2,\cdots,c$ and $j=1,2,\cdots,N$ , where $\mu_{ij}$ stands for the membership degree of the $j$ th vector in the $i$ th class. The membership functions satisfy two obvious properties:

$\displaystyle\sum\limits_{i=1}^{c}\mu_{ij}=1$ (1)

$\displaystyle 0<\sum\limits_{j=1}^{N}\mu_{ij}<N$ (2)

During the training phase, the membership degrees are computed through a sequence of steps: first, calculate the matrix of Euclidean distances between each pair of feature vectors; second, set the diagonal elements of this matrix to infinity and sort each column of the distance matrix separately in ascending order; third, collect the class labels of the $k$ vectors located in the closest neighborhood of each vector; last, compute the membership degree of the $j$ th vector to the $i$ th class using the expression proposed in [11]:

$\displaystyle\mu_{ij}=\left\{\begin{array}{ll}0.51+0.49\left(\frac{n_{ij}}{k}\right)&\text{if $i$ is the same as the label of the $j$th vector}\\ 0.49\left(\frac{n_{ij}}{k}\right)&\text{otherwise}\end{array}\right.$ (3)

In the above expression, $n_{ij}$ stands for the number of neighbors of the $j$ th vector that belong to the $i$ th class. Intuitively, if very few neighbors of the vector belong to its own class, its membership grade in that class stays close to 0.51; if $n_{ij}=k$ , which means that all the neighbors are in the same class as the vector under consideration, then $\mu_{ij}$ equals 1.
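The four training steps and Eq. (3) can be illustrated with a small pure-Python sketch. It is not the authors' MATLAB implementation; the function names `euclidean` and `fknn_memberships` and the toy data are assumptions made for this example, and class labels are assumed to be integers starting at 0.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fknn_memberships(vectors, labels, num_classes, k):
    """Return U with U[i][j] = membership of vector j in class i, per Eq. (3)."""
    n = len(vectors)
    U = [[0.0] * n for _ in range(num_classes)]
    for j in range(n):
        # Skipping m == j has the same effect as setting the diagonal of the
        # distance matrix to infinity before sorting.
        ranked = sorted(
            (euclidean(vectors[j], vectors[m]), m) for m in range(n) if m != j
        )
        neighbor_labels = [labels[m] for _, m in ranked[:k]]
        for i in range(num_classes):
            share = 0.49 * neighbor_labels.count(i) / k
            U[i][j] = 0.51 + share if labels[j] == i else share
    return U


# Toy data (hypothetical): two well-separated classes of three 2-D vectors.
vectors = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
U = fknn_memberships(vectors, labels, num_classes=2, k=2)
```

Because the neighbor counts $n_{ij}$ of one vector sum to $k$, every column of `U` sums to 0.51 + 0.49 = 1, which is property (1).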

Taking into account the membership degrees, the mean vector of each class $\tilde{m}_{i}$ is calculated as follows:

$\displaystyle\tilde{m}_{i}=\frac{\sum\nolimits_{j=1}^{N}\mu_{ij}x_{j}}{\sum\nolimits_{j=1}^{N}\mu_{ij}}\quad i=1,2,\cdots,c$ (4)
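As a quick illustration of Eq. (4), the following sketch computes membership-weighted class means with plain Python lists. The function name `fuzzy_class_means` and the example values are hypothetical; `U[i][j]` is assumed to hold $\mu_{ij}$.

```python
def fuzzy_class_means(U, vectors):
    """Return one mean vector per class, weighting vector j by U[i][j] (Eq. (4))."""
    dim = len(vectors[0])
    means = []
    for row in U:
        total = sum(row)  # denominator of Eq. (4)
        means.append(
            [sum(mu * x[d] for mu, x in zip(row, vectors)) / total
             for d in range(dim)]
        )
    return means


# Hypothetical example: the second vector is shared equally by both classes.
U = [[1.0, 0.5],
     [0.0, 0.5]]
vectors = [(0.0, 0.0), (2.0, 4.0)]
means = fuzzy_class_means(U, vectors)
```

In this example the mean of class 0 is pulled one third of the way toward the shared vector, while the mean of class 1 coincides with it, since that is the only vector with nonzero membership in class 1.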

The between-class fuzzy scatter matrix $S_{FB}$ and within-class fuzzy scatter matrix $S_{FW}$ are as follows:

$\displaystyle S_{FB}=\sum\limits_{i=1}^{c}N_{i}\left(\tilde{m}_{i}-\bar{m}\right)\left(\tilde{m}_{i}-\bar{m}\right)^{T}$ (5)

$\displaystyle S_{FW}=\sum\limits_{i=1}^{c}\sum\limits_{x_{k}\in X_{i}}\left(x_{k}-\tilde{m}_{i}\right)\left(x_{k}-\tilde{m}_{i}\right)^{T}$ (6)

where $\bar{m}$ is the mean vector of the training set $X$ and $N_{i}$ is the number of samples in class $X_{i}$ .

The optimal fuzzy projection $W_{\textit{FLDA}}$ and the final transformation are given by the following expressions:

$\displaystyle W_{\textit{FLDA}}=\arg\max\limits_{W}\frac{\left|W^{T}S_{FB}W\right|}{\left|W^{T}S_{FW}W\right|}$ (7)

$\displaystyle W^{T}=W_{\textit{FLDA}}^{T}W_{\textit{PCA}}^{T}$ (8)

The feature vectors $y=\left\{{y_{1},y_{2},\cdots,y_{N}}\right\}$ transformed by the FLDA method can be calculated as follows:

$\displaystyle y_{j}=W_{\textit{FLDA}}^{T}x_{j}=W_{\textit{FLDA}}^{T}W_{\textit{PCA}}^{T}\left(z_{j}-\bar{z}\right)$ (9)

where $\bar{z}$ describes a mean facial image in the training set $Z$ .

2.2 Analysis of shortcomings

Face recognition is a complex pattern recognition problem in which face images involve many variations such as facial expression, illumination and pose. These non-ideal conditions produce outliers in the training set. Moreover, only some of the image samples of each class are available for training, so the class sample average gives an inaccurate estimate of the class mean. As the above analysis of the FLDA model shows, the class mean enters the definitions of the fuzzy scatter matrices $S_{FB}$ and $S_{FW}$ . Hence, the class mean has a great impact on the projection directions of FLDA and ultimately affects the robustness of FLDA models.

As described in the previous section, the membership function of the training vectors used to calculate the class mean is given by Eq. (3), which was proposed by Keller et al. [11]. The membership degree is calculated by weighting the contribution of the k nearest neighbor vectors, and the dominant membership is assigned an offset of 0.51 only to ensure that it remains dominant. However, no reason is reported in [11, 13, 23, 24, 26] for assigning this particular offset value. The choice of offset is therefore an artificial factor that may influence the recognition rate. A genetic algorithm is employed in [1] to optimize these parameters of the membership functions. However, genetic algorithms are known to require considerable search time and to be prone to local optima. Based on this analysis, we incorporate the proposed supervised IT2FCM into traditional LDA to reduce these external effects and obtain the correct local distribution information, which ensures good performance. We call the resulting method IT2FLDA. The proposed IT2FLDA inherits the benefits of type-2 fuzzy theory: it is able to minimize the effects of uncertainties, find the optimal projective directions and make the feature subspace discriminative and robust. The next section provides details of the proposed IT2FLDA method.

3. The IT2FLDA method

In this section, the supervised IT2FCM is first proposed for calculating the fuzzy membership degrees and the mean vector of each class. Second, the supervised IT2FCM is incorporated into traditional LDA to build the IT2FLDA model. The design scheme of the IT2FLDA method is then described. Finally, the influence of the parameters of the IT2FLDA method is analyzed.

3.1 The fuzzy membership degree and the mean vector of each class

In the IT2FLDA method, the fuzzy membership degrees and the mean vector of each class are obtained by the proposed supervised IT2FCM. The supervised IT2FCM algorithm is an extension of the IT2FCM algorithm [4, 7, 17] that introduces class information into the IT2FCM algorithm. It uses known information about the facial features, which improves on the unsupervised IT2FCM. The supervised IT2FCM is therefore able to use the class information and handle the uncertainty found in a given set of feature vectors during feature clustering. This makes feature clustering less susceptible to noise, so that feature vectors can be clustered more appropriately and accurately, especially when the pattern distributions contain partitions of different volumes.

Consider a set of feature vectors $X=\left\{{x_{1},x_{2},\cdots,x_{N}}\right\}$ transformed by PCA. The number of feature vectors is $N$ and they belong to $c$ classes. To take the class information into account, the fuzzy cluster centers $V=\left\{{v_{1},v_{2},\cdots,v_{c}}\right\}$ are initialized using the following equation:

$\displaystyle v_{i}=\frac{1}{N_{i}}\sum\limits_{x_{j}\in X_{i}}x_{j}\quad j=1,2,\cdots,N_{i},\;i=1,2,\cdots,c$ (10)

where $N_{i}$ is the number of samples in class $X_{i}$ . This initialization makes it easier to reach the optimal fuzzy cluster centers and reduces the number of iterations.

The supervised IT2FCM algorithm uses two fuzzification weighting exponents $p_{1}$ and $p_{2}$ $(1<p_{1}<p_{2}<\infty)$ to describe the uncertainty in the choice of the fuzzification weighting exponent. $p_{1}$ and $p_{2}$ are important parameters, which reflect the fuzziness of the input data and influence the final clustering results. Figure 1 shows the effect of the fuzzification weighting exponent on fuzzy clustering for two clusters of different volumes, $C_{1}$ and $C_{2}$ . The shaded areas in the figure can be regarded as a decision boundary, which can be widened through the fuzzification weighting exponent. The width of the boundary indicates the range over which one wishes to assign maximally fuzzy memberships (based on relative distance) to a pattern. For two clusters of different volumes, a proper decision boundary cannot be obtained with a single $p$ , as shown in Fig. 1a. Depending on the distribution of the patterns, the desirable decision boundaries are obtained with two different values of $p$ . The ideal situation is therefore a maximally fuzzy region with a wide left part and a narrow right part, as shown in Fig. 1b.

Figure 1.

Effects on fuzzy clustering by fuzzification weighting exponent. (a) Decision boundary for two clusters of different volumes with a single specific $p$ . (b) Decision boundary for two clusters of different volumes with $p_{1},p_{2}$ .

Because $p_{1}$ and $p_{2}$ represent different degrees of fuzziness, the membership of an input vector should itself be treated as fuzzy rather than as a single crisp value as in FCM. For this reason $\mu_{ij}$ is given by the interval $[\underline{\mu}_{ij},\bar{\mu}_{ij}]$ , where $\underline{\mu}_{ij}$ and $\bar{\mu}_{ij}$ stand for the lower and upper fuzzy membership degrees of the $j$ th vector in the $i$ th class. The lower and upper fuzzy partition matrices $\underline{U}=[\underline{\mu}_{ij}]\in R^{c\times N}$ and $\bar{U}=[\bar{\mu}_{ij}]\in R^{c\times N}$ are updated as follows:

$\displaystyle\underline{\mu}_{ij}=\min\left\{\left[\sum\limits_{k=1}^{c}\left(\frac{d_{ij}}{d_{kj}}\right)^{\frac{2}{(p_{1}-1)}}\right]^{-1},\quad\left[\sum\limits_{k=1}^{c}\left(\frac{d_{ij}}{d_{kj}}\right)^{\frac{2}{(p_{2}-1)}}\right]^{-1}\right\}$ (11)

$\displaystyle\bar{\mu}_{ij}=\max\left\{\left[\sum\limits_{k=1}^{c}\left(\frac{d_{ij}}{d_{kj}}\right)^{\frac{2}{(p_{1}-1)}}\right]^{-1},\quad\left[\sum\limits_{k=1}^{c}\left(\frac{d_{ij}}{d_{kj}}\right)^{\frac{2}{(p_{2}-1)}}\right]^{-1}\right\}$ (12)

where $i=1,2,\cdots,c$ and $j=1,2,\cdots,N$ , $c$ is the number of clusters, and $d_{ij}$ is the Euclidean distance between feature vector $x_{j}$ and cluster center $v_{i}$ . The distance matrix is defined as $D=[d_{ij}]\in R^{c\times N}$ . Figure 2 illustrates an example of an interval type-2 fuzzy set according to Eqs (11) and (12) for a two-cluster case.

Figure 2.

Footprint of uncertainty of an interval type-2 fuzzy set for $p_{1}=1.5$ and $p_{2}=5.0$ .
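To make Eqs (11) and (12) concrete, the sketch below evaluates the interval memberships from a distance matrix using plain Python lists. The names `fcm_membership` and `interval_memberships` are hypothetical, and all distances are assumed to be strictly positive (a vector lying exactly on a cluster center would need separate handling).

```python
def fcm_membership(col, i, p):
    """Type-1 FCM membership in cluster i for one vector, given its distances col."""
    exponent = 2.0 / (p - 1.0)
    return 1.0 / sum((col[i] / d_k) ** exponent for d_k in col)


def interval_memberships(D, p1, p2):
    """Return (lower, upper) membership matrices of shape c x N from D = [d_ij]."""
    c, n = len(D), len(D[0])
    lower = [[0.0] * n for _ in range(c)]
    upper = [[0.0] * n for _ in range(c)]
    for j in range(n):
        col = [D[i][j] for i in range(c)]
        for i in range(c):
            a = fcm_membership(col, i, p1)
            b = fcm_membership(col, i, p2)
            lower[i][j] = min(a, b)  # Eq. (11)
            upper[i][j] = max(a, b)  # Eq. (12)
    return lower, upper


# Two clusters, two vectors: the first is closer to cluster 0,
# the second is equidistant from both centers.
D = [[1.0, 2.0],
     [3.0, 2.0]]
lower, upper = interval_memberships(D, p1=1.5, p2=5.0)
```

For the first vector the interval for cluster 0 is roughly [0.634, 0.988]. For the equidistant vector both exponents give 0.5, so the interval collapses to a single point.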

The lower and upper membership degrees $\underline{\mu}_{ij}$ and $\bar{\mu}_{ij}$ are calculated in one of two ways, depending on the number of classes $c$ and a threshold value $t$ . If $c<t$ , Eqs (11) and (12) are used directly. If $c>t$ , each vector is allowed to contribute only to the $t_{1}\left({t_{1}\leqslant t}\right)$ cluster centers to which it would contribute the largest shares. To do so, we keep the $t_{1}$ smallest distances in each column of the distance matrix $D$ and record the row indexes of the kept distances. The reserved distance matrix is denoted ${D}^{\prime}\in R^{t_{1}\times N}$ . The lower and upper membership degrees are then calculated using Eqs (11) and (12) with $k=1,2,\cdots,t_{1}$ . The resulting values are placed at the recorded indexes of each column of $\underline{U}\in R^{c\times N}$ and $\bar{U}\in R^{c\times N}$ , and the remaining elements of the membership matrices are set to zero.
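The truncation to the $t_{1}$ nearest cluster centers might look as follows. This is only a sketch of the $c>t$ case under the assumption of strictly positive distances; the function name `truncated_interval_memberships` is hypothetical, and ties between equal distances are broken by row index.

```python
def truncated_interval_memberships(D, p1, p2, t1):
    """Interval memberships where each vector only uses its t1 nearest centers."""
    c, n = len(D), len(D[0])
    lower = [[0.0] * n for _ in range(c)]
    upper = [[0.0] * n for _ in range(c)]
    for j in range(n):
        col = [D[i][j] for i in range(c)]
        # Row indexes of the t1 smallest distances in column j.
        kept = sorted(range(c), key=lambda i: col[i])[:t1]
        for i in kept:
            values = []
            for p in (p1, p2):
                exponent = 2.0 / (p - 1.0)
                # Eqs (11)/(12) with the sum restricted to the kept centers.
                values.append(
                    1.0 / sum((col[i] / col[k]) ** exponent for k in kept)
                )
            lower[i][j], upper[i][j] = min(values), max(values)
        # Rows not in `kept` stay at zero for this column.
    return lower, upper


# Three clusters, one vector; only the two nearest centers are kept.
D = [[1.0], [3.0], [10.0]]
lower, upper = truncated_interval_memberships(D, p1=1.5, p2=5.0, t1=2)
```

With `t1` equal to the number of clusters, the function reduces to the direct use of Eqs (11) and (12).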

To ensure that each sample has its maximum degree of membership in its own class, the lower and upper membership degrees $\underline{\mu}_{ij}$ and $\bar{\mu}_{ij}$ are adjusted according to the class labels. Let $i$ be the label of the $j$ th vector. If $\underline{\mu}_{ij}$ is already the maximum degree of membership of the $j$ th vector, its value is retained. Otherwise, we find the maximum degree of membership $\underline{\mu}_{kj}\left({k\neq i}\right)$ and exchange the values of $\underline{\mu}_{ij}$ and $\underline{\mu}_{kj}$ . The adjustment of $\bar{\mu}_{ij}$ follows the same procedure.
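A minimal sketch of this label-based adjustment is given here. The function name `enforce_own_class_max` is hypothetical, labels are assumed to be integer row indexes, and the same function would be applied once to the lower matrix and once to the upper matrix.

```python
def enforce_own_class_max(U, labels):
    """Return a copy of U in which the labelled class holds each column's maximum."""
    adjusted = [row[:] for row in U]
    for j, label in enumerate(labels):
        col = [adjusted[i][j] for i in range(len(adjusted))]
        k = max(range(len(col)), key=lambda i: col[i])
        if col[k] > col[label]:
            # Exchange the labelled class's value with the column maximum.
            adjusted[label][j], adjusted[k][j] = adjusted[k][j], adjusted[label][j]
    return adjusted


# Both vectors are labelled class 0, but the first one is closer to class 1.
U = [[0.2, 0.7],
     [0.8, 0.3]]
adjusted = enforce_own_class_max(U, labels=[0, 0])
```

Only the first column is changed: its two values are exchanged so that class 0 receives 0.8, while the second column already satisfies the condition.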

The procedure for updating the type-2 fuzzy cluster center matrix $\tilde{V}(\tilde{V}=[V_{L},V_{R}])$ in the supervised IT2FCM algorithm should take into account the type-2 fuzzy partition matrix $\tilde{U}=\left[{\underline{U},\bar{U}}\right]$ . The interval of fuzzy cluster centers $V_{L}=\left\{{v_{1L},\cdots,v_{cL}}\right\}$ and $V_{R}=\left\{{v_{1R},\cdots,v_{cR}}\right\}$ will be given by the following equations:

$\displaystyle v_{iL}=\frac{\sum\nolimits_{j=1}^{L_{i}}\bar{\mu}^{p}_{ij}x_{j}+\sum\nolimits_{j=L_{i}+1}^{N}\underline{\mu}^{p}_{ij}x_{j}}{\sum\nolimits_{j=1}^{L_{i}}\bar{\mu}^{p}_{ij}+\sum\nolimits_{j=L_{i}+1}^{N}\underline{\mu}^{p}_{ij}}$ (13)

$\displaystyle v_{iR}=\frac{\sum\nolimits_{j=1}^{R_{i}}\underline{\mu}^{p}_{ij}x_{j}+\sum\nolimits_{j=R_{i}+1}^{N}\bar{\mu}^{p}_{ij}x_{j}}{\sum\nolimits_{j=1}^{R_{i}}\underline{\mu}^{p}_{ij}+\sum\nolimits_{j=R_{i}+1}^{N}\bar{\mu}^{p}_{ij}}$ (14)

where $N$ is the number of feature vectors, and $p=(p_{1}+p_{2})/2$ . The detailed enhanced Karnik-Mendel (EKM) algorithm for computing $v_{iL}$ and $v_{iR}$ will be described later.
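For small examples, Eqs (13) and (14) can be checked without EKM by trying every switch point along one coordinate. The sketch below is such a brute-force substitute, not the EKM algorithm itself. The function name `interval_center_1d` is hypothetical, and the weights $\underline{\mu}^{p}_{ij}$ are assumed to be strictly positive so that every denominator is nonzero.

```python
def interval_center_1d(xs, lower, upper, p):
    """Return (v_L, v_R) for one cluster and one coordinate by exhaustive search."""
    # Eqs (13)-(14) assume the samples are sorted in ascending order.
    order = sorted(range(len(xs)), key=lambda j: xs[j])
    x = [xs[j] for j in order]
    lo = [lower[j] ** p for j in order]
    hi = [upper[j] ** p for j in order]

    def center(first, second, switch):
        # Use `first` weights up to the switch point and `second` after it.
        w = first[:switch] + second[switch:]
        return sum(wj * xj for wj, xj in zip(w, x)) / sum(w)

    n = len(x)
    # Eq. (13): upper weights on the small values, lower weights on the rest.
    v_left = min(center(hi, lo, s) for s in range(n + 1))
    # Eq. (14): lower weights on the small values, upper weights on the rest.
    v_right = max(center(lo, hi, s) for s in range(n + 1))
    return v_left, v_right


# Three samples whose membership intervals are all [0.5, 1.0], with p = 1.
v_left, v_right = interval_center_1d(
    [1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0], p=1.0
)
```

The exhaustive search evaluates $N+1$ weighted averages per bound and coordinate. An iterative procedure such as EKM is used in practice because it locates the switch points $L_{i}$ and $R_{i}$ in far fewer evaluations.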

Once the interval of the coordinates of the cluster centers is obtained, it is defuzzified by taking the average of $v_{iL}$ and $v_{iR}$ . Hence, the crisp cluster centers and the fuzzy membership degrees are obtained by defuzzification as shown in the following equations:

$\displaystyle v_{i}=\frac{v_{iL}+v_{iR}}{2}$ (15)

$\displaystyle\mu_{ij}=\frac{\underline{\mu}_{ij}+\bar{\mu}_{ij}}{2}$ (16)

Based on the above, the supervised IT2FCM algorithm consists of the steps shown in Fig. 3.

Figure 3.

The steps of the supervised IT2FCM algorithm.

The procedure of our proposed supervised IT2FCM is illustrated in Fig. 4.

Figure 4.

The procedure of the proposed supervised IT2FCM.

In step 3, the EKM algorithm [6] is used for updating the type-2 fuzzy cluster center matrix $\tilde{V}(\tilde{V}=[V_{L},V_{R}])$ . It consists of two parts: one for computing $V_{L}$ and the other for computing $V_{R}$ . The procedures of the EKM algorithm for finding the minimum and maximum of the fuzzy cluster centers are shown in Figs 5 and 6.

Figure 5.

EKM algorithm for finding minimum of fuzzy cluster center.

Figure 6.

EKM algorithm for finding maximum of fuzzy cluster center.

By applying the above steps, $V_{L}=\left\{{v_{1L},\cdots,v_{cL}}\right\}$ , $v_{iL}=\left({v_{i1L},\cdots,v_{iML}}\right)$ and $L_{i}=\left({L_{i1},\cdots,L_{iM}}\right)$ are obtained.

The maximum of the fuzzy cluster center $v_{iR}$ can be obtained using the previous procedure with the second “FOR statement” replaced.

By applying the above steps, $V_{R}=\left\{{v_{1R},\cdots,v_{cR}}\right\}$ , $v_{iR}=\left({v_{i1R},\cdots,v_{iMR}}\right)$ and $R_{i}=\left({R_{i1},\cdots,R_{iM}}\right)$ are obtained.

3.2 The IT2FLDA model

The fuzzy mean vector of each class $\tilde{m}_{i}$ and the fuzzy membership matrix $U$ are obtained from the result of the proposed supervised IT2FCM described in the previous subsection, using Eqs (15) and (16):

$\displaystyle U=[\mu_{ij}]\quad i=1,2,\cdots,c,\quad j=1,2,\cdots,N$ (17)

$\displaystyle\tilde{m}_{i}=v_{i}\quad i=1,2,\cdots,c$ (18)

The membership degree of each vector (contribution to each class) is considered and the corresponding fuzzy within-class scatter matrix and fuzzy between-class scatter matrix are redefined as follows:

$\displaystyle\tilde{S}_{FB}=\sum\limits_{i=1}^{c}\sum\limits_{j=1}^{N}\mu_{ij}^{p_{3}}\left(\tilde{m}_{i}-\bar{m}\right)\left(\tilde{m}_{i}-\bar{m}\right)^{T}$ (19)

$\displaystyle\tilde{S}_{FW}=\sum\limits_{i=1}^{c}\sum\limits_{x_{j}\in X_{i}}\mu_{ij}^{p_{3}}\left(x_{j}-\tilde{m}_{i}\right)\left(x_{j}-\tilde{m}_{i}\right)^{T}$ (20)

where $\bar{m}$ is the mean vector of the training set $X$ and $p_{3}$ is a constant that controls the influence of the fuzzy membership degree.
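The two scatter matrices of Eqs (19) and (20) can be accumulated as in the following sketch, which uses nested Python lists instead of a matrix library. The helper names `add_outer` and `fuzzy_scatter` and the toy data are assumptions for illustration. `U[i][j]` is taken to be the defuzzified membership of Eq. (16) and `means[i]` the class mean of Eq. (18).

```python
def add_outer(S, v, weight):
    """Accumulate weight * v v^T into the square matrix S in place."""
    for a in range(len(v)):
        for b in range(len(v)):
            S[a][b] += weight * v[a] * v[b]


def fuzzy_scatter(vectors, labels, U, means, p3):
    """Return (S_fb, S_fw), the fuzzy between- and within-class scatter matrices."""
    n, dim, c = len(vectors), len(vectors[0]), len(means)
    grand_mean = [sum(x[d] for x in vectors) / n for d in range(dim)]
    S_fb = [[0.0] * dim for _ in range(dim)]
    S_fw = [[0.0] * dim for _ in range(dim)]
    for i in range(c):
        # Eq. (19): the class-i term is weighted by the sum over j of mu_ij^p3.
        weight = sum(U[i][j] ** p3 for j in range(n))
        diff = [means[i][d] - grand_mean[d] for d in range(dim)]
        add_outer(S_fb, diff, weight)
    for j in range(n):
        # Eq. (20): each vector contributes only to the term of its own class.
        i = labels[j]
        diff = [vectors[j][d] - means[i][d] for d in range(dim)]
        add_outer(S_fw, diff, U[i][j] ** p3)
    return S_fb, S_fw


# Hypothetical check with crisp memberships, where Eq. (19) reduces to Eq. (5).
vectors = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]
U = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
means = [[1.0, 0.0], [11.0, 0.0]]
S_fb, S_fw = fuzzy_scatter(vectors, labels, U, means, p3=2)
```

With crisp memberships the inner sum in Eq. (19) equals $N_{i}$ , so the sketch reproduces the classical LDA scatter matrices. Intermediate membership values shrink each term in proportion to $\mu_{ij}^{p_{3}}$ .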

The optimal interval type-2 fuzzy projection $W_{\textit{IT2FLDA}}$ is given by the following expression:

$\displaystyle W_{\textit{IT2FLDA}}=\arg\max\limits_{W}\frac{\left|W^{T}\tilde{S}_{FB}W\right|}{\left|W^{T}\tilde{S}_{FW}W\right|}$ (21)

where $W_{\textit{IT2FLDA}}=\left[{\tilde{w}_{1}\;\tilde{w}_{2}\cdots\tilde{w}_{c-1}}\right]$ is the set of generalized eigenvectors of $\tilde{S}_{FB}$ and $\tilde{S}_{FW}$ corresponding to the $c-1$ largest eigenvalues $\left\{{\lambda_{k}\left|{k=1,2,\cdots,c-1}\right.}\right\}$ , that is

$\displaystyle\tilde{S}_{FB}\tilde{w}_{k}=\lambda_{k}\tilde{S}_{FW}\tilde{w}_{k}\quad k=1,2,\cdots,c-1$ (22)

However, the rank of $\tilde{S}_{FB}$ is $c-1$ because it is the sum of $c$ matrices of rank one. Similarly, the rank of $\tilde{S}_{FW}$ is at most $N-c$ . In general, the number of images $N$ is much smaller than the number of pixels $n$ in each image, so $\tilde{S}_{FW}\in R^{n\times n}$ is always singular. This problem can be solved by using PCA to reduce the dimension of the feature space to at most $N-c$ .

Therefore, the optimal projection $\tilde{W}_{\textit{opt}}$ is given by

$\displaystyle\tilde{W}_{\textit{opt}}^{T}=W_{\textit{IT2FLDA}}^{T}W_{\textit{PCA}}^{T}$ (23)

where

$\displaystyle W_{\textit{PCA}}=\arg\max\limits_{W}\left|W^{T}S_{T}W\right|$ (24)

$\displaystyle S_{T}=\frac{1}{N}\sum\nolimits_{i=1}^{N}(z_{i}-\bar{z})(z_{i}-\bar{z})^{T}$ (25)

$\displaystyle W_{\textit{IT2FLDA}}=\arg\max\limits_{W}\frac{\left|W^{T}W_{\textit{PCA}}^{T}\tilde{S}_{FB}W_{\textit{PCA}}W\right|}{\left|W^{T}W_{\textit{PCA}}^{T}\tilde{S}_{FW}W_{\textit{PCA}}W\right|}$ (26)

Note that the columns of the optimal $W_{\textit{PCA}}$ are the eigenvectors of $S_{T}$ corresponding to the $k(k\leqslant N-c)$ largest eigenvalues. The optimization for $W_{\textit{PCA}}$ is performed over $n\times k$ matrices, while the optimization for $W_{\textit{IT2FLDA}}$ is performed over $k\times(c-1)$ matrices.

Finally, the feature vectors $Y=\left\{{y_{1},y_{2},\cdots,y_{N}}\right\}$ transformed by the IT2FLDA method can be calculated as follows:

$\displaystyle y_{j}=W_{\textit{IT2FLDA}}^{T}x_{j}=W_{\textit{IT2FLDA}}^{T}W_{\textit{PCA}}^{T}\left(z_{j}-\bar{z}\right)$ (27)

where $\bar{z}$ describes a mean facial image in the training set $Z$ .

3.3 The design scheme of the proposed algorithm

The design scheme of the proposed algorithm is presented in detail in Fig. 7. The overall IT2FLDA method is summarized in Fig. 8.

Figure 7.

The steps of the IT2FLDA algorithm.

Figure 8.

A diagram of computing for the IT2FLDA method.

3.4 Parameters analysis

In this section, the influence of the parameters of the IT2FLDA method on the face recognition rate is analyzed. Experiments are performed on the ORL face database, which comprises ten different images of each of 40 distinct subjects. For each subject, we randomly selected 5 images for training and used the remaining images for testing.

Table 1
The influence of parameters $p_{1},p_{2}$ on the recognition rate (%)

$p_{2}$ \ $p_{1}$	1.5	2.5	3.5	4.5	5.5	6.5	7.5	8.5
2.5	96.5	–	–	–	–	–	–	–
3.5	97.0	96.5	–	–	–	–	–	–
4.5	95.0	97.0	96.0	–	–	–	–	–
5.5	94.5	96.5	95.5	94.5	–	–	–	–
6.5	95.0	96.0	95.5	95.5	94.5	–	–	–
7.5	95.0	96.0	96.0	95.0	95.5	94.5	–	–
8.5	95.0	95.5	96.0	95.5	95.5	95.0	94.5	–
9.5	94.5	95.0	96.0	95.5	95.0	95.5	95.0	94.5

Figure 9.

The influence of parameter $t_{1}$ on the recognition rate.

The influence of parameter $t_{1}$ on the recognition rate is shown in Fig. 9. The experiment was performed for the IT2FLDA method with $t_{1}$ taking values over the entire range from 2 to 40 (since there are 40 classes in total), for fixed $p_{1}=1.5,p_{2}=2.5,p_{3}=2$ . Fig. 9 shows that the recognition rate declines as $t_{1}$ increases. The main reason is as follows. Each cluster center is calculated using all training data. When there are many classes, the membership degree of each vector in each class becomes very small according to Eqs (11) and (12). This weakens the discrimination between samples of different classes, and the cluster centers are easily disturbed by samples of other classes. Therefore, each vector is allowed to contribute only to the $t_{1}$ cluster centers to which it would contribute the largest shares. In this way, samples can be clustered more appropriately and more accurately.

Table 1 shows the effect on the recognition rate of the values of $p_{1},p_{2}\left({p_{1}<p_{2}}\right)$ in the range from 1.5 to 9.5 with a step size of 1, where $t_{1}=5$ and $p_{3}=2$ . There is no obvious regularity in the parameter selection. However, as the values of $p_{1},p_{2}$ continue to increase, the best recognition rate becomes more difficult to achieve. The reason is as follows. If the values of $p_{1},p_{2}$ are increased, the decision boundary becomes wider. More samples then fall inside the decision boundary and receive similar membership degrees. This reduces the influence of uncertain factors when the cluster centers are updated, but it also weakens the discrimination between samples of different classes. Therefore, proper values of $p_{1},p_{2}$ should be chosen according to the specific situation.

Figure 10.

The influence of parameter $p_{3}$ on the recognition rate.

Figure 10 shows the effect on the recognition rate of the value of $p_{3}$ in the range from 2 to 9 with a step size of 0.5, where $p_{1}=2.5,p_{2}=4.5$ and $t_{1}=5$ . Fig. 10 shows that the recognition rate decreases greatly as the value of $p_{3}$ in Eqs (19) and (20) increases. The maximum recognition rates are achieved with $p_{3}=2$ and $p_{3}=2.5$ .

From the above analysis, it can be concluded that parameter selection influences the face recognition rate to some extent. Understanding the role of each parameter of the IT2FLDA method helps in choosing parameter values, for which there are still some rules to follow.

4. Experimental results

In this section, we evaluate the recognition accuracy and robustness of the IT2FLDA algorithm in face recognition and gender classification. We report experimental results on a number of well-known and commonly used face databases. In all scenarios, the results of the IT2FLDA algorithm are compared with those of state-of-the-art techniques. Experiments were performed on a personal computer with an Intel Core i3 CPU at 3.10 GHz and 2.00 GB RAM. All the algorithms were implemented in MATLAB.

4.1 Face recognition

The proposed algorithm for face recognition is tested on the ORL [18] and Yale [30] face databases. To clearly illustrate the advantage of the proposed method, we compare IT2FLDA with the LDA [21], FLDA [13] and F2DPCA [26] methods. In all face recognition experiments, we first reduce the dimension of the training set using PCA and determine $N-c$ eigenvectors representing the best performance; then, we extract discriminant vectors using IT2FLDA. The number of discriminant vectors is set to $c-1$ . The parameters of the supervised IT2FCM are chosen as $p_{1}=1.5$ , $p_{2}=2.5$ , $p_{3}=2$ and $t_{1}=4$ . Finally, we use the nearest neighbor algorithm as the classifier.

4.1.1 ORL face database
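The final classification step can be sketched as a plain nearest neighbor rule applied to the projected feature vectors $y_{j}$ . The function name `nearest_neighbor_label` and the toy features are hypothetical, and the Euclidean distance is an assumption, since the text does not state which distance the classifier uses.

```python
def nearest_neighbor_label(query, train_features, train_labels):
    """Return the label of the training feature vector closest to the query."""
    def squared_distance(a, b):
        # Squared Euclidean distance; the square root does not change the ranking.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(
        range(len(train_features)),
        key=lambda j: squared_distance(query, train_features[j]),
    )
    return train_labels[best]


# Hypothetical projected features for two subjects.
train_features = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
train_labels = [0, 0, 1]
predicted = nearest_neighbor_label((4.0, 4.5), train_features, train_labels)
```

A test image would first be mapped with the same PCA and IT2FLDA projections as the training images, and its projected vector would then be passed as `query`.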

The ORL database comprises ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). Each image was digitized and stored as a 112 $\times$ 92 pixel array whose gray levels ranged between 0 and 255. Some images of one subject in the ORL database are shown in Fig. 11. For each subject, we randomly selected 5 images for training with the remaining images for testing. This procedure was repeated 20 times by randomly choosing different training and testing sets.

Figure 11.

The ORL face database.

The ORL database is used to evaluate the performance of the IT2FLDA method under conditions where the pose and facial details are varied. The recognition rates for the various feature extraction methods are shown in Fig. 12. Table 2 contains a comparative analysis of the mean and standard deviation of the obtained recognition rates. IT2FLDA achieves the best result, with an average recognition rate of 93.45%. F2DPCA achieves an average recognition rate of 93.4%. FLDA and LDA achieve average recognition rates of 92.05% and 91.35%, respectively. The average recognition rate of F2DPCA is higher than those of FLDA and LDA on the ORL database. The main reason is that the images in the ORL database vary only slightly and F2DPCA does not need to transform the image matrix into a vector, which preserves useful structural information embedded in the original images. IT2FLDA also has the smallest standard deviation compared with FLDA, LDA and F2DPCA. Hence, the proposed IT2FLDA algorithm outperforms the other methods.

Table 2

Comparison of the mean and standard deviation of recognition rates (%) on the ORL database

Method	Average recognition rate	Min recognition rate	Max recognition rate
IT2FLDA	93.45 $\pm$ 1.54	91.50	96.50
FLDA [13]	92.05 $\pm$ 1.65	88.00	95.50
LDA [21]	91.35 $\pm$ 2.03	86.50	94.00
F2DPCA [26]	93.40 $\pm$ 1.94	90.00	96.00

Figure 12.

The recognition rates of various methods on ORL.

4.1.2 Yale face database

The Yale face database contains 165 face images of 15 individuals. There are 11 images per subject, one for each facial expression or configuration: center-light, glasses/no glasses, happy, normal, left-light, right-light, sad, sleepy, surprised and wink. Each image was digitized and represented by a 231 $\times$ 195 pixel array whose gray levels ranged between 0 and 255. Some face images from the Yale database are shown in Fig. 13. For each subject, we randomly selected 5 images for training and used the remaining images for testing. This procedure was repeated 20 times by randomly choosing different training and testing sets.
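The split sizes implied by this protocol can be checked with a few lines of arithmetic. The sketch below (the helper name `split_sizes` is hypothetical) also reports the recognition rate contributed by a single test image. As an observation rather than something stated in the paper, this step size may explain why the reported minimum and maximum rates fall on multiples of about 1.11% for Yale and 0.5% for ORL.

```python
def split_sizes(n_subjects, images_per_subject, n_train_per_subject):
    """Training-set size, test-set size, and the rate (%) contributed by one test image."""
    n_train = n_subjects * n_train_per_subject
    n_test = n_subjects * (images_per_subject - n_train_per_subject)
    return n_train, n_test, 100.0 / n_test

print(split_sizes(15, 11, 5))  # Yale: 75 training images, 90 test images
print(split_sizes(40, 10, 5))  # ORL: 200 training images, 200 test images
```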

Figure 13.

The Yale face database.

The Yale database is used to examine the performance of the IT2FLDA method when both facial expressions and illumination vary. The recognition rates of the feature extraction methods are shown in Fig. 14, and Table 3 compares the mean and standard deviation of the obtained recognition rates. IT2FLDA achieves the best average recognition rate, 97.33%. FLDA and LDA achieve average recognition rates of 96.88% and 95.33%, respectively. F2DPCA achieves an average recognition rate of 84.05%, the worst result, because it is sensitive to substantial variations in light direction and facial expression. Although IT2FLDA does not significantly increase the average recognition rate, it reduces the standard deviation (1.50, against 1.89 for FLDA and 2.23 for LDA). This is because type-2 fuzzy sets are more robust to uncertainty, as in this case, where variations in illumination and facial expression cause face images belonging to different classes to have similar features. Hence, IT2FLDA performs better than the other methods.

Table 3

Comparison of the mean and standard deviation of recognition rates (%) on the Yale database

Method	Average recognition rate	Min recognition rate	Max recognition rate
IT2FLDA	97.33 $\pm$ 1.50	95.55	100
FLDA [13]	96.88 $\pm$ 1.89	93.33	98.88
LDA [21]	95.33 $\pm$ 2.23	91.11	97.77
F2DPCA [26]	84.05 $\pm$ 2.42	80.00	88.88

Figure 14.

The recognition rates of various methods on Yale.

4.2 Gender classification

We apply the proposed algorithm to gender classification on the LFW [14] and AR [2] face databases. In all gender classification experiments, we first reduce the dimension of the training set using PCA, retaining the 100 eigenvectors that give the best performance; then, we extract discriminant vectors using IT2FLDA. Since this is a 2-class classification problem, one discriminant vector is sufficient for classification. The parameters of the supervised IT2FCM are chosen as $p_{1}=1.5$, $p_{2}=2.5$ and $p_{3}=2$. Considering the uneven distribution of samples, we use Twin Support Vector Machines (Twin SVM) [12] instead of the nearest neighbor algorithm as the classifier.

4.2.1 LFW face database

The LFW face database contains images of 5749 different individuals. Of these, 1680 people have two or more images in the database, and the remaining 4069 have just a single image. The images were collected from the web and exhibit complex poses, rich facial expressions, illumination variation and so on. In this experiment, 5749 images are chosen, one per individual, of which 4257 belong to the male class and 1492 to the female class. We randomly select 50% of the images (2129 males and 746 females) as the training data and use the remaining images (2128 males and 746 females) for testing. The original images were normalized (in scale and orientation) such that the two eyes were aligned at the same position. The size of each cropped image is 50 $\times$ 50. Some face images from the LFW face database are shown in Fig. 15. This procedure was repeated 20 times by randomly choosing different training and testing sets.
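The reported training and test counts are consistent with a 50% split taken within each gender class. The following sketch reproduces those counts under that assumption; the per-class splitting, the rounding rule and the helper names are inferred from the numbers above and are not stated explicitly in the paper.

```python
import random

def stratified_half_split(labels, rng):
    """Split sample indices 50/50 within each class; an odd sample goes to training."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = (len(idx) + 1) // 2  # round up, so 4257 males give 2129 for training
        train += idx[:half]
        test += idx[half:]
    return train, test

def count(indices, labels, cls):
    """Number of selected samples that belong to class `cls`."""
    return sum(labels[i] == cls for i in indices)

labels = ["male"] * 4257 + ["female"] * 1492
train, test = stratified_half_split(labels, random.Random(0))
print(count(train, labels, "male"), count(train, labels, "female"))  # 2129 746
print(count(test, labels, "male"), count(test, labels, "female"))    # 2128 746
```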

Table 4
Comparison of the mean and standard deviation of gender recognition rates (%) on LFW

Method	Average recognition rate	Male recognition rate	Female recognition rate
IT2FLDA	85.05 $\pm$ 0.43	85.31 $\pm$ 0.78	82.05 $\pm$ 1.36
FLDA [13]	84.15 $\pm$ 0.46	85.16 $\pm$ 0.78	81.29 $\pm$ 1.19
LDA [21]	84.43 $\pm$ 0.50	85.22 $\pm$ 0.76	82.19 $\pm$ 1.37
F2DPCA [26]	82.37 $\pm$ 0.75	84.73 $\pm$ 1.22	75.64 $\pm$ 1.52

Figure 15.

The LFW face database.

Figure 16.

The gender recognition rates of various methods on LFW.

Table 5

Comparison of the mean and standard deviation of gender recognition rates (%) on AR

Method	Average recognition rate	Male recognition rate	Female recognition rate
IT2FLDA	90.76 $\pm$ 1.62	90.79 $\pm$ 2.93	90.73 $\pm$ 3.75
FLDA [13]	89.81 $\pm$ 2.07	90.53 $\pm$ 3.20	88.96 $\pm$ 5.15
LDA [21]	88.68 $\pm$ 2.11	88.86 $\pm$ 3.12	88.46 $\pm$ 5.78
F2DPCA [26]	86.65 $\pm$ 2.54	92.20 $\pm$ 2.78	80.06 $\pm$ 5.83

Figure 17.

The non-occluded subset of AR database.

Figure 18.

The gender recognition rates of various methods on AR.

The gender recognition rates of the feature extraction methods are shown in Fig. 16, and Table 4 compares the mean and standard deviation of the obtained gender recognition rates. IT2FLDA achieves the best average gender recognition rate, 85.05%. FLDA, LDA and F2DPCA achieve average gender recognition rates of 84.15%, 84.43% and 82.37%, respectively. Table 4 also shows a large gap, about 9 percentage points, between the male and female recognition rates of the F2DPCA method. Because F2DPCA is an unsupervised algorithm, it does not use class information, so the vectors it extracts have weaker discriminating ability. Moreover, the average gender recognition rate of LDA is slightly better than that of FLDA. The reason is that FLDA uses the k-nearest neighbor rule, which is influenced by the uneven distribution of samples: in this experiment, the number of male samples is much greater than the number of female samples. FLDA and F2DPCA obtain the membership degree of each face pattern to a class from its k nearest neighbors, which are dominated by male samples, so the membership degrees become inaccurate.
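To make the per-class figures concrete, the sketch below shows one way to compute male, female and overall recognition rates from predicted labels, and then evaluates the male/female gap for F2DPCA from the values in Table 4. The function name and the toy predictions are hypothetical, and the paper does not state how its average rate is aggregated, so the "overall" entry here is only one plausible definition.

```python
def per_class_rates(true_labels, predicted_labels):
    """Recognition rate (%) for each class plus the overall rate."""
    rates = {}
    for cls in sorted(set(true_labels)):
        pairs = [(t, p) for t, p in zip(true_labels, predicted_labels) if t == cls]
        rates[cls] = 100.0 * sum(t == p for t, p in pairs) / len(pairs)
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    rates["overall"] = 100.0 * correct / len(true_labels)
    return rates

# Hypothetical predictions for an imbalanced test set (6 male, 2 female).
truth = ["male"] * 6 + ["female"] * 2
preds = ["male"] * 6 + ["male", "female"]
print(per_class_rates(truth, preds))
# {'female': 50.0, 'male': 100.0, 'overall': 87.5}

# Gap between the male and female rates reported for F2DPCA in Table 4.
print(round(84.73 - 75.64, 2))  # 9.09
```

In the toy example the overall rate looks acceptable even though half of the minority class is misclassified, which is why the per-class columns are worth reporting alongside the average.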

4.2.2 AR database

The AR database consists of over 4000 frontal images from 126 individuals. For each individual, 26 pictures were taken in two separate sessions. We chose a non-occluded subset (14 images per subject) of AR consisting of 65 males and 55 females for the gender classification experiments. We randomly selected 33 males and 28 females for training and used the remaining 32 males and 27 females for testing. The size of each image is 50 $\times$ 40. Some images of one subject in the AR database are shown in Fig. 17. This procedure was repeated 20 times by randomly choosing different training and testing sets.

The AR database is used to test the performance of IT2FLDA under variations over time, in facial expression and in lighting conditions. The gender recognition rates of the feature extraction methods are shown in Fig. 18, and Table 5 compares the mean and standard deviation of the obtained gender recognition rates. IT2FLDA achieves the best average gender recognition rate, 90.76%. FLDA, LDA and F2DPCA achieve average gender recognition rates of 89.81%, 88.68% and 86.65%, respectively. For F2DPCA, there is a large gap, about 12 percentage points, between the male and female recognition rates, because F2DPCA does not obtain good discriminating feature vectors. IT2FLDA has a smaller standard deviation than the other methods for the average recognition rate, which indicates that it makes the feature subspace discriminating and robust. Hence, the proposed IT2FLDA method outperforms the other methods.

Table 6
The mean and standard deviation of recognition rates (%) on Yale with “salt and pepper” noise

Method	Average recognition rate	Min recognition rate	Max recognition rate
IT2FLDA	93.05 $\pm$ 2.33	88.88	96.66
FLDA [13]	90.66 $\pm$ 2.91	85.55	96.66
LDA [21]	89.16 $\pm$ 3.09	82.22	94.44
F2DPCA [26]	81.88 $\pm$ 2.99	76.66	87.77

Figure 19.

The recognition rates of various methods on Yale with “salt and pepper” noise.

4.3 The robustness of the algorithm under noise environment

To verify the robustness of the proposed method, experiments are conducted on the Yale database under “salt and pepper” noise. All images are corrupted by “salt and pepper” noise with a density of 0.1. This procedure was repeated 20 times by randomly choosing different noisy training and testing sets.
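For readers unfamiliar with this noise model, a minimal pure-Python sketch is given below. It assumes the common convention that each pixel is replaced with probability equal to the density and that replaced pixels become black or white with equal chance; the paper does not describe its noise generator, and the function name and flat gray test image are illustrative only.

```python
import random

def add_salt_and_pepper(image, density, rng):
    """Return a noisy copy of a grayscale image (list of rows, values 0-255).

    Each pixel is replaced with probability `density`; a replaced pixel becomes
    0 (pepper) or 255 (salt) with equal chance, which is an assumed convention.
    """
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            if rng.random() < density:
                new_row.append(rng.choice((0, 255)))
            else:
                new_row.append(pixel)
        noisy.append(new_row)
    return noisy

rng = random.Random(0)
image = [[128] * 195 for _ in range(231)]  # flat gray image at the Yale resolution
noisy = add_salt_and_pepper(image, density=0.1, rng=rng)
changed = sum(p != 128 for row in noisy for p in row)
print(changed / (231 * 195))  # close to 0.1
```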

The recognition rates of the feature extraction methods are shown in Fig. 19, and Table 6 compares the mean and standard deviation of the recognition rates. From Table 6, we can see that under noise the recognition rates drop and the standard deviations become larger. In terms of the drop in recognition rate and the standard deviation, the three fuzzy methods perform better than LDA. In LDA, each sample must belong fully to one class. This means that the sample contributes fully to the calculation of the class mean, even if it lies far from the other samples of its class, which makes the class mean inaccurate. In the fuzzy methods, a sample belongs to a class according to its fuzzy membership degree. Hence, an isolated sample contributes less to the class mean, which reduces its influence and makes the class mean more reasonable.
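The effect of membership weighting on the class mean can be illustrated with a small numerical sketch. It uses a plain membership-weighted average, which is a simplification of the type-1 case and not the supervised IT2FCM update used by IT2FLDA; the sample coordinates and membership degrees are made-up values chosen so that one sample is isolated.

```python
def crisp_mean(samples):
    """Ordinary class mean: every sample counts equally, as in LDA."""
    dim = len(samples[0])
    return [sum(s[d] for s in samples) / len(samples) for d in range(dim)]

def fuzzy_mean(samples, memberships):
    """Membership-weighted class mean: sum(u_j * x_j) / sum(u_j)."""
    dim = len(samples[0])
    total = sum(memberships)
    return [sum(u * s[d] for u, s in zip(memberships, samples)) / total
            for d in range(dim)]

# Four samples of one class; the last one is isolated from the others.
samples = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0)]
memberships = [0.9, 0.9, 0.9, 0.3]  # hypothetical degrees; the outlier gets a low one

print(crisp_mean(samples))               # pulled toward the isolated sample
print(fuzzy_mean(samples, memberships))  # stays closer to the main cluster
```

With these values the crisp mean lies near (2, 2), while the weighted mean lies near (1.4, 1.4), closer to the three clustered samples centered at (1, 1).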

5. Conclusions

In this paper, the IT2FLDA method for face recognition is proposed. We first propose the supervised IT2FCM, which introduces class information into the IT2FCM algorithm. The supervised IT2FCM is then incorporated into traditional LDA to reduce the effects of uncertainties and to obtain correct local distribution information, which ensures good performance. Experimental results show that IT2FLDA achieves higher recognition rates than LDA, FLDA and F2DPCA on several well-known face databases. It is worth stressing that IT2FLDA, developed in the setting of type-2 fuzzy sets, shows more robust characteristics. The advantage is more obvious when uncertainty is present, such as large variations in illumination and facial expression or a noisy environment. Moreover, the better performance of IT2FLDA can be attributed to the fact that the supervised IT2FCM makes feature clustering less susceptible to noise. The supervised IT2FCM clusters feature vectors more appropriately and accurately, especially when the pattern distributions contain partitions of different sizes and volumes. The cluster centers have a great impact on the projection directions of IT2FLDA and ultimately affect its robustness. Therefore, IT2FLDA is able to find the optimal projective directions by maximizing the ratio of the fuzzy between-class scatter matrix to the fuzzy within-class scatter matrix. The resulting embedding subspace is more discriminative and robust.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61374194 and 61403081), the National Key Science and Technology Pillar Program of China (No. 2014BAG01B03), the Natural Science Foundation of Jiangsu Province (No. BK20140638) and a Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

References

1. Khoukhi and Ahmed S.F., A genetically modified fuzzy linear discriminant analysis for face recognition, Journal of the Franklin Institute 348 (2011), 2701–2717.
2. Martinez and Benavente, The AR face database, CVC Tech. Report No. 24, 1998.
3. et al., Singular value decomposition and local near neighbors for face recognition under varying illumination, Pattern Recognition 64 (2017), 60–83.
4. Hwang and Rhee C.H., Uncertain fuzzy clustering: interval type-2 fuzzy approach to C-means, IEEE Transactions on Fuzzy Systems 15 (2007), 107–120.
5. Qiu et al., A modified interval type-2 fuzzy C-means algorithm with application in MR image segmentation, Pattern Recognition Letters 34 (2013), 1329–1338.
6. and Mendel J.M., Enhanced Karnik-Mendel algorithms, IEEE Transactions on Fuzzy Systems 17 (2009), 923–934.
7. Rubio and Castillo, Interval type-2 fuzzy clustering for membership function generation, in: 2013 IEEE Workshop on Hybrid Intelligent Models and Applications, Singapore, 2013, pp. 13–18.
8. Song F.X. et al., Maximum scatter difference, large margin linear projection and support vector machines, Acta Automatica Sinica 30 (2004), 890–896.
9. Smith and Hancock, Facial gender classification using shape-from-shading, Journal of Image and Vision Computing 28 (2010), 1039–1048.
10. Yang and Yang, Why can LDA be performed in PCA transformed space? Pattern Recognition 36 (2003), 563–566.
11. Keller J.M., Gray M.R. and Givens J.A., A fuzzy k-nearest neighbor algorithm, IEEE Transactions on Systems, Man and Cybernetics SMC-15 (1985), 580–585.
12. Jayadeva, Khemchandani and Chandra, Twin Support Vector Machines for Pattern Classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007), 905–910.
13. Kwak K.C. and Pedrycz, Face recognition using a fuzzy fisherface classifier, Pattern Recognition 38 (2005), 1717–1732.
14. LFW database, http://vis-www.cs.umass.edu/lfw/.
15. Berbar M.A., Three robust features extraction approaches for facial gender classification, The Visual Computer 30 (2014), 19–31.
16. Turk and Pentland, Face recognition using eigenfaces, in: Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, Maui, USA, 1991, pp. 586–591.
17. Linda and Manic, General type-2 fuzzy C-means algorithm for uncertain fuzzy clustering, IEEE Transactions on Fuzzy Systems 20 (2012), 883–897.
18. ORL database, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
19. Melin, Mendoza and Castillo, An improved method for edge detection based on interval type-2 fuzzy logic, Expert Systems with Applications 37 (2010), 8527–8535.
20. Melin, Gonzalez C.I., Castro J.R. et al., Edge-detection method for image processing based on generalized type-2 fuzzy logic, IEEE Transactions on Fuzzy Systems 22 (2014), 1515–1525.
21. Belhumeur P.N., Hespanha J.P. and Kriegman D.J., Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997), 711–720.
22. Mozaffari, Behravan and Akbari, Gender classification using single frontal image per person: combination of appearance and geometric based features, in: 20th International Conference on Pattern Recognition, Istanbul, Turkey, 2010, pp. 1192–1195.
23. Yang, Yan, Wang and Yang, Face Recognition using Complete Fuzzy LDA, in: 19th International Conference on Pattern Recognition, Tampa, USA, 2008, pp. 1–4.
24. Yang, Yan, Zhang and Sun, Feature extraction based on fuzzy 2DLDA, Neurocomputing 73 (2010), 1556–1561.
25. Fei and Zhang, Median MSD-based method for face recognition, Neurocomputing 72 (2009), 3930–3934.
26. Face Recognition Method Based on Fuzzy 2DPCA, Journal of Electrical and Computer Engineering 2014, Article ID 919041.
27. Huang, Clustering multi-typed objects in extended star-structured heterogeneous data, Intelligent Data Analysis 21 (2017), 225–241.
28. and Du, Indirect adaptive fuzzy observer and controller design based on interval type-2 T-S fuzzy model, Applied Mathematical Modelling 36 (2012), 1558–1569.
29. et al., An interval type-2 T-S fuzzy classification system based on PSO and SVM for gender recognition, Multimedia Tools and Applications 75 (2016), 987–1007.
30. Yale face database, http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
