Non-negative matrix factorization with L₀ sparseness constraints and its applications to face recognition

Abstract

To address the unstable sparseness of non-negative matrix factorization (NMF), improved NMF algorithms with L₀ sparseness constraints are proposed. When the L₀ norm of the coefficient matrix is constrained, we apply the inverse matching principle to non-negative least squares, yielding inverse sparse non-negative least squares (ISNNLS), which enhances the reconstruction ability of the decomposition. In addition, L₀ sparseness constraints are imposed on the basis matrix. In the updating process, the proposed algorithm sets the smallest values to zero by projecting each basis vector onto the closest non-negative vector with the expected sparseness. Experimental results show that the proposed algorithms achieve higher reconstruction quality and effectiveness than the compared algorithms.

Keywords

Sparseness, NMF, sparse coding, feature extraction

1 Introduction

Non-negative matrix factorization (NMF) [1] was proposed by Lee and Seung in 1999. Given a non-negative data matrix X, NMF factorizes it into a non-negative basis matrix W and a non-negative coefficient matrix H. The algorithm is simple and converges quickly. Moreover, the purely additive representation makes the analysis of data more reasonable and the description of the raw data more intuitive, endowing the basis matrix with the ability of local representation, which has made NMF increasingly popular in many fields [2–4]. By combining NMF-based image reconstruction with an eye filter, Park et al. [5] proposed a new method for eye detection. Considering the internal geometry of the data set, Cai et al. [6] presented graph-regularised NMF (GNMF), which describes the relation between adjacent data points with a nearest neighbour graph. To make full use of discriminant information and take the geometric structure of the data into account, the manifold-respecting discriminant NMF K nearest neighbour (NMF-K-NN) was proposed in [7]. On this foundation, by introducing two novel fuzzy K nearest neighbour graphs, Jun et al. [8] presented NMF based on a fuzzy K nearest neighbour graph (NMF-FK-NN), which reduces the effect of external factors on recognition. In [9], topology preserving non-negative matrix factorization (TPNMF) was described, which maintains the local topological structure of the face space by minimizing the constraint gradient distance. Compared with other decomposition algorithms such as principal component analysis (PCA) [10] or K-means clustering [11], the sparsity of NMF decomposition results gives it a special status. However, this sparsity is only a by-product of the non-negativity constraints, and its degree is difficult to control. Thus, novel NMF algorithms that enhance sparsity have been extensively studied.
A common method is to add L₁ norm constraints on W or H, which enables the solution of the model to achieve the desired sparsity [12, 13]. However, the L₀ norm measures the sparsity of a matrix more intuitively, since it directly counts the number of non-zero elements. The non-convexity of the L₀ norm makes the resulting problem NP-hard. Nevertheless, Vavasis [14] has demonstrated that the decision problem of exact NMF (X = WH) is itself NP-hard. An optimization algorithm for approximate NMF (X ≈ WH) would also have to solve the exact NMF problem, so approximate NMF is NP-hard as well. Hence, almost all NMF algorithms are sub-optimal, and we can expect that NMF with L₀ sparseness constraints can be as efficient and suitable as NMF with L₁ sparseness constraints.

At present, only a few NMF methods based on L₀ norm constraints are available. In [15], Morup et al. proposed an NMF with approximate L₀ constraints that combines a non-negative version of the least angle regression and selection algorithm [16] with normalization-invariant updating rules [17]. By introducing smoothed L₀ norm constraints into the NMF algorithm, Yang et al. [18] proposed NMF based on smooth L₀ norm constraints (NMF_SL₀). In this paper, we present two tactics for imposing L₀ sparseness constraints, by constraining either the basis matrix W or the coefficient matrix H. When the L₀ constraints are imposed on H, it is difficult to find a good approximate solution for non-negative sparse coding, which is in general an NP-hard problem. By combining the active-set algorithm for non-negative least squares (NNLS) [21] with orthogonal matching pursuit (OMP) [19], the most common method for sparse coding without non-negative constraints, Peharz and Pernkopf [20] proposed sparse NNLS (sNNLS).

Here, we use the idea of inverse matching to improve the non-negative least squares algorithm. In the matrix update step, we can use the multiplicative update rules directly. When the L₀ constraints are imposed on W, we project each column of W, in each iteration, onto the closest non-negative vector that satisfies the expected sparsity.

2 Sparse NMF

NMF is a matrix decomposition method for dealing with large-scale and high-dimensional data. Its essence is to map high-dimensional data to low dimensions under the non-negative constraints. In order to make the result of decomposition more sparse and less redundant, a large number of sparse non-negative matrix factorization algorithms have been proposed.

NMF decomposes the data matrix X into low-rank non-negative factors W and H. The most widely used function for measuring the error between X and WH is the Euclidean distance [22], which is given by $\min f(W, H) = \| X - WH \|_{F}^{2} \quad \text{s.t. } W, H \geq 0$ (1)

Lee and Seung gave the multiplicative update rules [23] $H \leftarrow H \otimes \frac{W^{T} X}{W^{T} W H}$ (2) $W \leftarrow W \otimes \frac{X H^{T}}{W H H^{T}}$ (3) where $\otimes$ and the fraction bar denote element-wise multiplication and division, respectively.
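As an illustration of rules (2) and (3), the following is a minimal NumPy sketch, not the authors' implementation. The function names, the random initialization, the iteration count, and the small constant `eps` added to the denominators to avoid division by zero are all assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative X (m x n) by W (m x r) times H (r x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps   # strictly positive random start (assumption)
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Rule (2): element-wise multiply H by (W^T X) / (W^T W H).
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Rule (3): element-wise multiply W by (X H^T) / (W H H^T).
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def frobenius_error(X, W, H):
    """Objective (1): squared Frobenius norm of the residual X - WH."""
    return float(np.linalg.norm(X - W @ H, "fro") ** 2)
```

Because both rules only multiply by non-negative ratios, W and H stay non-negative throughout, which is why no explicit projection step appears in the loop.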

In order to enhance the sparsity of NMF, Hoyer presented sparse NMF by introducing sparseness constraints [24]. The objective function is defined as follows $\begin{matrix} \min f(W, H) = \| X - WH \|_{F}^{2} + \lambda \sum_{i,j} | H_{ij} | \\ \forall i, j: \; W_{ij} \geq 0, \; H_{ij} \geq 0, \; \lambda \geq 0 \end{matrix}$ (4)

He also gave a function to measure the sparsity of a vector [25] $sparse(x) = \frac{\sqrt{n} - \left( \sum | x_{i} | \right) / \sqrt{\sum x_{i}^{2}}}{\sqrt{n} - 1}$ (5) where n denotes the dimension of x. The range of this function is between zero and one, and a smaller value indicates a denser vector. If x contains only one non-zero entry, the function returns one. The function takes a value of zero if all the components are non-zero and equal.
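The two boundary cases of formula (5) can be checked with a few lines of Python. This is only a sketch; the function name `hoyer_sparseness` is chosen for this example, and the input is assumed to be a non-zero vector with at least two entries.

```python
import math

def hoyer_sparseness(x):
    """Formula (5): 1 for a single non-zero entry, 0 for equal non-zero entries."""
    n = len(x)
    l1 = sum(abs(v) for v in x)              # L1 norm
    l2 = math.sqrt(sum(v * v for v in x))    # L2 norm
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

print(hoyer_sparseness([0, 0, 3, 0]))   # one non-zero entry
print(hoyer_sparseness([1, 1, 1, 1]))   # all entries equal
```

For the first vector the ratio of the L1 to the L2 norm is 1, so the measure is 1; for the second the ratio equals the square root of n, so the measure is 0.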

3 Sparse NMF with L₀ constraints

Sparseness constraints in the NMF model are usually imposed on either the coefficient matrix H or the basis matrix W, and the two models are as follows $\begin{matrix} \min \| X - WH \|_{F} \quad \text{s.t. } W \geq 0, \; H \geq 0, \\ \| h_{i} \|_{0} \leq L, \; \forall i \end{matrix}$ (6) $\begin{matrix} \min \| X - WH \|_{F} \quad \text{s.t. } W \geq 0, \; H \geq 0, \\ \| w_{i} \|_{0} \leq L, \; \forall i \end{matrix}$ (7) which are denoted sNMFL₀_H and sNMFL₀_W, respectively, where h_i is the ith column of H, w_i is the ith column of W, and L denotes the maximum number of non-zero elements in the vector. Each column x_i of the raw data matrix can be represented as a linear combination of the columns of the basis matrix, weighted by the r-dimensional vector h_i. After introducing the sparseness constraints, the data vector x_i can be interpreted as a linear combination of at most L non-negative basis vectors. The basis matrix captures essential characteristics of the data vectors, and the smaller the reconstruction error, the more important the extracted features.

The NMF algorithm generally updates W and H alternately by fixing the basis matrix or coefficient matrix and solving the other one which transforms the non-convex optimization problem into the convex optimization problem [26]. In this paper, we apply the principle to sNMFL₀_H and sNMFL₀_W.

3.1 sNMFL₀_H

NMF with L₁ constraints is generally divided into two stages: sparse coding and matrix updating. sNMFL₀_H uses the same steps to solve model (6), as shown in Algorithm 1.

Algorithm 1. sNMFL₀_H

Step 1. Initialize the basis matrix randomly

Step 2. for i = 1 to num(number of iterations)

Step 3. Non-negative Sparse Coding: Fix the basis matrix W and carry out non-negative sparse coding of the data matrix X, producing a sparse non-negative coefficient matrix H.

Step 4. Matrix Updating: Maintaining the sparsity of H, update W and H.

Step 5. End for
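The alternation in Algorithm 1 can be sketched as follows. This is an illustrative skeleton under loudly stated assumptions, not the paper's implementation: the default coder `topk_clip_coder` is a deliberately simple stand-in (clipped least squares followed by keeping the L largest entries per column) and is not ISNNLS, and the matrix update uses the multiplicative rules (2) and (3), which leave zero entries of H at zero. All names, the iteration counts, and `eps` are hypothetical, and L is assumed to satisfy 1 ≤ L ≤ r.

```python
import numpy as np

def topk_clip_coder(W, X, L):
    """Stand-in coder (NOT ISNNLS): clipped least squares, top-L per column."""
    H = np.clip(np.linalg.lstsq(W, X, rcond=None)[0], 0.0, None)
    drop = np.argsort(H, axis=0)[:-L, :]       # all but the L largest per column
    np.put_along_axis(H, drop, 0.0, axis=0)
    return H

def snmf_l0_h(X, r, L, n_iter=20, inner=10, coder=topk_clip_coder,
              eps=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], r))            # Step 1: random basis matrix
    for _ in range(n_iter):                    # Step 2
        H = coder(W, X, L)                     # Step 3: non-negative sparse coding
        for _ in range(inner):                 # Step 4: matrix updating
            # Multiplicative rules keep every zero of H at zero, so each
            # column of H still has at most L non-zero entries afterwards.
            H *= (W.T @ X) / (W.T @ W @ H + eps)
            W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Passing a different callable as `coder` (for example, an ISNNLS routine with the same `(W, X, L)` signature) changes only Step 3 and leaves the rest of the loop untouched.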

In the coding phase, non-negative sparse coding is an NP-hard problem, so an approximate solution is needed. In the absence of non-negative constraints, orthogonal matching pursuit (OMP) is a common encoding method owing to its fast convergence and low computational complexity. Peharz et al. proposed a non-negative version of matching pursuit (NMP) in [27]. Peharz and Pernkopf [20] found the link between OMP and the active-set NNLS algorithm, which is shown in Algorithm 2, and presented sparse NNLS.

Algorithm 2. Active-Set NNLS

Step 1. Initialization: Z = {1, 2, 3, …, K}, P = ∅, h = 0

Step 2. a = W^T (x - Wh)

Step 3. main loop: while Z ≠ ∅ and ∃ i ∈ Z : a_i > 0

Step 4. Let i^* be the index of the largest entry of a. Move i^* from set Z to set P.

Step 5. $z_{P} = (W_{P}^{T} W_{P})^{-1} W_{P}^{T} x$, where W_P is the submatrix of W that contains only the columns indexed by set P.

Step 6. inner loop: while ∃ j ∈ P : z_j < 0

Step 7. $\alpha = \min_{k \in P} \frac{h_{k}}{h_{k} - z_{k}}$

Step 8. h ← h + α (z - h), update h.

Step 9. Move the indices of all zero entries of h to the active set Z, and the indices of all positive entries to the in-active set P.

Step 10. $z_{P} = (W_{P}^{T} W_{P})^{-1} W_{P}^{T} x$, update z_P and set z_Z to zero.

Step 11. h ← z

Step 12. a = W^T (x - Wh)

In the above algorithm, x is a column of the data matrix and h is the corresponding column of the coefficient matrix. The entries indexed by the active set Z remain zero, while the entries indexed by the in-active set P are positive. In the inner loop (Steps 6–10), a possibly negative solution z is corrected to a non-negative one.
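To make the control flow of Algorithm 2 concrete, here is a NumPy sketch of active-set NNLS. It is an illustration rather than the authors' code. Three choices are assumptions of this example: the minimum in Step 7 is taken only over entries with z_k < 0 (as in the Lawson and Hanson formulation), the normal-equation solve is replaced by `np.linalg.lstsq`, and a cap of 3K outer iterations and a tolerance `tol` are added as numerical safeguards.

```python
import numpy as np

def nnls_active_set(W, x, tol=1e-10):
    """Approximately solve min ||x - W h||_2 subject to h >= 0."""
    K = W.shape[1]
    h = np.zeros(K)
    Z, P = list(range(K)), []                  # Step 1: active / in-active sets
    a = W.T @ (x - W @ h)                      # Step 2
    for _ in range(3 * K):                     # iteration cap (added safeguard)
        if not Z or max(a[i] for i in Z) <= tol:
            break                              # Step 3: main-loop condition fails
        i_star = max(Z, key=lambda i: a[i])    # Step 4
        Z.remove(i_star)
        P.append(i_star)
        while True:
            z = np.zeros(K)                    # Steps 5 and 10: z_Z stays zero
            if P:
                z[P] = np.linalg.lstsq(W[:, P], x, rcond=None)[0]
            neg = [k for k in P if z[k] < 0]
            if not neg:                        # Step 6: inner-loop condition fails
                break
            k_min = min(neg, key=lambda k: h[k] / (h[k] - z[k]))
            alpha = h[k_min] / (h[k_min] - z[k_min])   # Step 7
            h = h + alpha * (z - h)            # Step 8
            h[k_min] = 0.0                     # exact zero despite rounding
            moved = [k for k in P if h[k] <= tol]      # Step 9
            P = [k for k in P if h[k] > tol]
            Z.extend(moved)
            h[moved] = 0.0
        h = z                                  # Step 11
        a = W.T @ (x - W @ h)                  # Step 12
    return h
```

With an identity basis and `x = [1, -1]`, for instance, the second coefficient can never enter P because its entry of a is negative, so the routine returns `[1, 0]`.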

Algorithm 2 adds basis vectors one at a time to an initially empty set. In contrast, we remove vectors from the optimized non-sparse solution. We call this method the inverse sparse non-negative least squares (ISNNLS) algorithm, which is shown in Algorithm 3. In the first step, we obtain the optimal but non-sparse solution from NNLS. While the number of non-zero entries in the coefficient vector is greater than L, the smallest non-zero entry is set to zero and its index is moved from the in-active set P to the active set Z. Then the inner loop (Steps 6–10) of Algorithm 2 is used to approximate the data vector with the remaining basis vectors indexed by P.

Algorithm 3. ISNNLS

Step 1. The optimal solution of the non-sparse coefficient vector is obtained by using NNLS. The active set Z and in-active set P are initialized to store the index of the zero and positive items in the vector respectively.

Step 2. Set the minimum non-zero entry in the vector to zero and move the index from P to Z.

Step 3. Approximate the data vector by performing Steps 6–10 of Algorithm 2.

Step 4. If the number of non-zero items in the vector is less than or equal to the threshold, the algorithm terminates. Otherwise, jump to Step 2.
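The pruning loop of Algorithm 3 can be sketched in NumPy as below. This is a hedged illustration, not the authors' code: the initial NNLS solution is assumed to be supplied by the caller as `h_nnls` (any NNLS solver could produce it), the helper `refit_on_support` is a hypothetical name for Steps 6–10 of Algorithm 2, the Step 7 minimum is restricted to entries with z_k < 0, and `tol` is an added numerical threshold.

```python
import numpy as np

def refit_on_support(W, x, h, P, tol=1e-12):
    """Steps 6-10 of Algorithm 2: least squares on P, kept non-negative."""
    h, P = h.copy(), list(P)
    while P:
        z = np.zeros_like(h)
        z[P] = np.linalg.lstsq(W[:, P], x, rcond=None)[0]
        neg = [k for k in P if z[k] < 0]
        if not neg:
            z[z <= tol] = 0.0
            return z, [k for k in P if z[k] > 0]
        k_min = min(neg, key=lambda k: h[k] / (h[k] - z[k]))
        alpha = h[k_min] / (h[k_min] - z[k_min])
        h = h + alpha * (z - h)
        h[k_min] = 0.0
        P = [k for k in P if h[k] > tol]
    return np.zeros_like(h), []

def isnnls(W, x, h_nnls, L, tol=1e-12):
    """Prune a non-sparse NNLS solution until at most L entries are non-zero."""
    h = np.asarray(h_nnls, dtype=float).copy()
    P = [k for k in range(h.size) if h[k] > tol]   # Step 1: in-active set
    while len(P) > L:                              # Step 4: stopping test
        k_small = min(P, key=lambda k: h[k])       # Step 2: smallest non-zero
        h[k_small] = 0.0
        P.remove(k_small)
        h, P = refit_on_support(W, x, h, P, tol)   # Step 3: refit on P
    return h
```

Each pass removes at least one index from P, so the loop runs at most K − L times for a K-dimensional coefficient vector.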

In the matrix update stage, only the non-zero coefficients are allowed to change, in order to maintain the sparse structure of the coefficient matrix H. From formulas (2) and (3), the zero entries of a matrix remain zero after the update because the update is an element-wise multiplication. Meanwhile, the objective function ∥X - WH∥_F decreases as the iteration proceeds. Thus, we can adopt an unconstrained NMF method to update the matrices. The improved NMF algorithm proposed by Lin [28], which uses projected gradients to optimize NMF, converges better than the multiplicative update method. We use this method with some modifications to update W and H. Let the active set Z′ index the zero entries of H after sparse coding. When the active-set algorithm is executed, the indices in Z′ must not be moved to the in-active set. The algorithm terminates when the objective function falls below the error threshold.
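The observation that rules (2) and (3) preserve zeros is easy to verify numerically. The sketch below applies one multiplicative pass to a coefficient matrix with a prescribed zero pattern; it illustrates the multiplicative rules only, not the modified projected gradient method, and the function name and `eps` are assumptions of this example.

```python
import numpy as np

def multiplicative_step(X, W, H, eps=1e-10):
    """One pass of rules (2) and (3); returns new W and H."""
    H = H * (W.T @ X) / (W.T @ W @ H + eps)   # a zero entry times anything is zero
    W = W * (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(0)
X = rng.random((5, 6))
W = rng.random((5, 3))
H = rng.random((3, 6))
H[rng.random(H.shape) < 0.5] = 0.0            # impose a sparse pattern on H
W_new, H_new = multiplicative_step(X, W, H)
print(bool((H_new[H == 0] == 0).all()))       # zero pattern of H is preserved
```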

3.2 sNMFL₀_W

Hoyer [25] proposed an NMF algorithm based on L₁ norm constraints that uses gradient descent. The algorithm projects each basis vector onto the closest non-negative vector with the expected L₁ sparseness. Inspired by this algorithm, we present sNMFL₀_W, which is shown in Algorithm 4. The first step is to calculate an optimal basis matrix W without sparseness constraints while the coefficient matrix is held fixed. Subsequently, each basis vector is projected onto the closest non-negative vector in Euclidean space that satisfies the expected L₀ constraint: the L largest entries of the vector are preserved, and the rest are set to zero. Then we update W and H, where only the non-zero entries of W are allowed to change so as to maintain the sparse structure of W. The update method is similar to that of Section 3.1.

Algorithm 4. sNMFL₀_W

Step 1. Initialize the coefficient matrix randomly

Step 2. for i = 1:num(number of iterations)

Step 3. Obtain the basis matrix W by performing NNLS.

Step 4. for j = 1: r(number of columns of W)

Step 5. Preserve the L largest items of w_j and set the remaining entries to zero.

Step 6. end

Step 7. Matrix Updating: Maintaining the sparsity of W, update W and H.

Step 8. end
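Steps 4–6 of Algorithm 4 amount to a per-column top-L selection. A minimal NumPy sketch is given below; the function name is hypothetical, negative entries are clipped to zero first as an assumption (an NNLS result would already be non-negative), and L is assumed to be at least 1.

```python
import numpy as np

def keep_L_largest(W, L):
    """Keep the L largest entries of each column of W and zero the rest."""
    W = np.clip(np.asarray(W, dtype=float), 0.0, None)   # returns a new array
    for j in range(W.shape[1]):                # Step 4: loop over basis vectors
        drop = np.argsort(W[:, j])[:-L]        # indices of all but the L largest
        W[drop, j] = 0.0                       # Step 5
    return W

W = np.array([[3.0, 1.0],
              [1.0, 5.0],
              [2.0, 4.0]])
print(keep_L_largest(W, 2))
```

For this input the first column keeps 3 and 2, and the second keeps 5 and 4, so each column ends with exactly two non-zero entries.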

4 Experimental results

This section comprises two experiments: non-negative sparse coding and face feature extraction. In Section 4.2, we construct sparse data and compare several sparse coding methods in terms of reconstruction quality, basis vector recognition rate, and running time. In Section 4.3, we run sNMFL₀_W and NMF with L₁ constraints (sNMF_L₁) in a face feature extraction experiment on the ORL and Yale face databases, in order to compare the sparsity, reconstruction quality, and speed of the two methods.

4.1 Experimental data and environment

The ORL [29] face database consists of 40 individuals, each providing 10 images. Each image has 256 gray levels with 112×92 pixels. The images of this database vary greatly in facial expressions, facial details, and shooting angles.

The Yale [30] face database consists of 15 individuals, each providing 11 different images. All the images were taken under different lighting conditions and shooting angles, with large changes in facial expression. Each image has 100×100 pixels. Facial images from the ORL and Yale databases are presented in Figs. 1 and 2.

Fig. 1

Ten images of the same person in the ORL face database.

Fig. 2

Face examples of one person in Yale database.

4.2 Non-negative sparse coding

In this section, we compare several non-negative sparse coding methods on constructed sparse data. We use 100-dimensional overcomplete basis matrices that contain r ∈ {200, 400, 600} basis vectors. ISNNLS and several encoding methods (NMP [27], NLARS [15], sNNLS [20]) are compared for the different numbers of basis vectors. We use isotropic Gaussian noise to generate ten random basis matrices and normalize each basis vector. Furthermore, for each basis matrix we generate a 'standard' r × 100 coefficient matrix H, against which the coefficient matrix obtained by each encoding method is compared, with the sparse factor L ∈ {5, 10, …, 50} ranging from 5 (very sparse) to 50 (sparse). Each non-zero term in H is the absolute value of a sample from a Gaussian distribution whose standard deviation is ten. We synthesize the sparse data matrices as X = WH and run the encoding methods mentioned above on each data set. In addition, to account for noise, we add evenly distributed noise to the synthetic sparse data at SNR = 10 dB.
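The data construction described above can be sketched as follows. This is one possible reading, not the authors' script: taking the absolute value of the Gaussian samples to keep W non-negative, choosing the support of each column of H uniformly at random, and the function and parameter names are all assumptions, and the additive noise step is left out.

```python
import numpy as np

def make_sparse_dataset(d=100, r=200, n=100, L=5, seed=0):
    """Return (W, H, X) with unit-norm basis vectors and L non-zeros per column of H."""
    rng = np.random.default_rng(seed)
    W = np.abs(rng.standard_normal((d, r)))          # assumption: abs keeps W >= 0
    W /= np.linalg.norm(W, axis=0, keepdims=True)    # normalize each basis vector
    H = np.zeros((r, n))
    for j in range(n):
        support = rng.choice(r, size=L, replace=False)           # L active basis vectors
        H[support, j] = np.abs(rng.normal(0.0, 10.0, size=L))    # |N(0, 10^2)| samples
    return W, H, W @ H                               # X = WH

W, H, X = make_sparse_dataset(r=200, L=5)
print(X.shape, int(np.count_nonzero(H[:, 0])))
```

Calling the function with r in {200, 400, 600} and L in {5, 10, …, 50}, with ten different seeds, would reproduce the grid of settings listed in the text.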

Figure 3 shows the reconstruction quality of the different non-negative sparse coding methods. The reconstruction quality is measured by formula (8). $SNR = 10 \log_{10} \frac{\| X \|_{F}^{2}}{\| X - WH \|_{F}^{2}} \ \text{dB}$ (8)
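Formula (8) translates directly into NumPy. The sketch below uses a hypothetical function name and assumes the reconstruction is not exact, so the denominator is non-zero.

```python
import numpy as np

def snr_db(X, W, H):
    """Formula (8): reconstruction SNR of WH with respect to X, in dB."""
    signal = np.linalg.norm(X, "fro") ** 2
    noise = np.linalg.norm(X - W @ H, "fro") ** 2
    return 10.0 * np.log10(signal / noise)

X = np.array([[3.0, 0.0], [0.0, 4.0]])
W = np.eye(2)
H = np.array([[3.0, 0.0], [0.0, 3.5]])
print(snr_db(X, W, H))
```

In this toy case the squared norm of X is 25 and the squared residual is 0.25, so the ratio is 100 and the SNR is approximately 20 dB.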

Fig. 3

Quality of data reconstructed by different encoding methods.

We can see from the graph that the SNR rises as the sparsity decreases. The SNR curve of NLARS is clearly lower than those of the other algorithms, indicating the worst reconstruction quality. The reconstruction quality of sNNLS is always better than that of NMP. ISNNLS consistently maintains the highest SNR and thus performs best.

Table 1 shows the percentage of basis vectors that are correctly identified by the different coding methods. The recognition rates of the algorithms generally decrease as the sparsity is reduced, but ISNNLS always achieves the highest recognition rate across the different sparsity levels and numbers of basis vectors.

Table 1

Correct recognition rates of different encoding methods (%)

Encoding method	Number of basis vectors	Sparse factor (L)
		5	10	15	20	25	30	35	40	45	50
NMP	200	74.6	55.5	46.5	42.8	40.5	39.2	38.6	37.9	37.2	36.1
NLARS		67.2	46.3	38.2	35.1	34.8	35.5	36.1	37.7	39.2	40.8
sNNLS		75.3	56.8	48.1	43.6	42.0	40.6	40.2	41.0	41.6	42.0
ISNNLS		81.8	66.3	55.2	49.5	45.2	43.5	41.9	41.9	42.3	42.3
NMP	400	70.5	46.7	37.0	31.7	28.8	27.4	26.4	26.0	25.1	24.8
NLARS		63.0	38.0	28.4	24.9	22.9	22.8	23.4	23.4	24.5	25.5
sNNLS		71.2	48.5	37.8	32.4	29.3	28.5	27.5	27.3	27.3	27.9
ISNNLS		79.5	61.6	48.1	40.2	35.0	32.3	30.4	29.1	28.6	28.6
NMP	600	68.7	42.8	31.8	26.4	23.8	22.4	21.2	20.6	19.9	19.2
NLARS		60.9	34.5	24.8	20.4	18.2	18.2	17.6	17.9	18.4	18.9
sNNLS		69.2	44.4	33.1	26.9	24.2	22.8	21.6	21.8	21.4	21.4
ISNNLS		79.0	59.3	44.2	34.7	29.8	26.9	24.8	23.9	23.1	22.4

From Table 2 we can see that the running time of the NLARS algorithm is significantly longer than that of the other three algorithms. sNNLS has the shortest running time for L ≤ 20, while NMP has the shortest for L ≥ 25. ISNNLS is faster than NLARS but slower than NMP and sNNLS.

Table 2

Average running time of four algorithms (s)

Method	Sparse factor (L)
	5	10	15	20	25	30	35	40	45	50
NMP	0.077	0.146	0.214	0.281	0.370	0.449	0.559	0.639	0.812	0.955
sNNLS	0.069	0.121	0.183	0.262	0.377	0.482	0.583	0.730	0.955	1.190
ISNNLS	1.229	1.569	1.925	2.235	2.449	2.683	2.629	2.718	2.778	2.801
NLARS	4.115	5.822	7.000	8.355	8.938	9.602	10.53	10.66	11.87	11.52

4.3 sNMFL₀_W method for face images

In this section, we use sNMFL₀_W and sNMF_L₁ for face feature extraction on the ORL and Yale face databases, and compare the extraction results and the speed of the two algorithms. We run sNMFL₀_W on ORL and Yale with the sparsity L of the basis vectors constrained to 40%, 30%, and 20%, respectively (the ratio of non-zero entries to the total number of pixels in the image). For each sparsity level, we train 25 basis images. In order to compare sNMFL₀_W and sNMF_L₁, we first compute the average L₁-sparsity (using formula (5)) of the basis images trained by sNMFL₀_W, and then run sNMF_L₁ on the same database, requiring it to reach the same L₁-sparsity. sNMFL₀_W was executed for 30 iterations, while sNMF_L₁ was executed for 3000 iterations in order to ensure convergence. We repeated the experiment ten times and recorded the L₀-sparsity, the SNR (using formula (8)), and the running time of the two algorithms.

Figs. 4 and 5 show that the results of the two methods are similar and that the extracted features become more localized as the basis images become sparser. The reported SNR is the mean value taken in the linear domain.
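Averaging SNR in the linear domain is understood here to mean converting each dB value to a power ratio, averaging the ratios, and converting the mean back to dB. The short sketch below follows that reading, which is an assumption about the authors' procedure; the function name is hypothetical.

```python
import math

def mean_snr_db(snr_values_db):
    """Average dB values in the linear (power-ratio) domain."""
    linear = [10.0 ** (v / 10.0) for v in snr_values_db]   # dB -> power ratio
    return 10.0 * math.log10(sum(linear) / len(linear))    # mean ratio -> dB

print(mean_snr_db([10.0, 20.0]))
```

For 10 dB and 20 dB the power ratios are 10 and 100, whose mean is 55, so the result is 10·log₁₀(55) dB rather than the arithmetic mean of 15 dB.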

Fig. 4

Facial feature extraction results on the ORL database (first row: basis images trained by sNMFL₀_W; second row: basis images trained by sNMF_L₁). The sparsity of the first row is, in order, 40% (A), 30% (B), 20% (C); that of the second row is 59% (D), 49.5% (E), 36% (F) (percentage of non-zero pixels in total pixels).

Fig. 5

Facial feature extraction results on the Yale database (first row: basis images trained by sNMFL₀_W; second row: basis images trained by sNMF_L₁). The sparsity of the first row is, in order, 40% (A), 30% (B), 20% (C); that of the second row is 55.2% (D), 46.3% (E), 36.3% (F).

In Tables 3 and 4, we see that the SNR values of the two algorithms are very close, indicating that they achieve nearly the same reconstruction quality, but sNMFL₀_W uses fewer non-zero pixels. One might object that, although the basis images of sNMF_L₁ contain many non-zero pixels, most of these pixel values are extremely small. Thus, we added a column SNR^* to the tables for sNMF_L₁. SNR^* is the SNR obtained when only the largest 40%, 30%, or 20% of pixel values are retained and the rest are set to zero. From the results, we can see that sNMFL₀_W achieves a better SNR than this thresholded sNMF_L₁. Moreover, it is clearly faster than sNMF_L₁.

Table 3

sNMFL₀_W and sNMF_L₁ compared in terms of reconstruction quality, running time, and different sparsity measures on the ORL database (L₀-sparsity: percentage of non-zero pixels in total pixels; L₁-sparsity: computed with formula (5))

Method	L₀ (%)	L₁	SNR (dB)	SNR* (dB)	Time (s)
sNMF_L₁	59	0.5	15.16	14.75	1302
sNMFL₀_W	40	0.5	15.15	–	610
sNMF_L₁	49.5	0.564	15.05	14.44	1330
sNMFL₀_W	30	0.564	15.02	–	474
sNMF_L₁	36	0.636	14.84	14.12	1374
sNMFL₀_W	20	0.636	14.82	–	318

Table 4

sNMFL₀_W and sNMF_L₁ compared in terms of reconstruction quality, running time, and different sparsity measures on the Yale database

Method	L₀ (%)	L₁	SNR (dB)	SNR* (dB)	Time (s)
sNMF_L₁	55.2	0.52	41.6	37.62	720
sNMFL₀_W	40	0.52	40.8	–	501
sNMF_L₁	46.3	0.58	39.2	33.25	752
sNMFL₀_W	30	0.58	39.0	–	420
sNMF_L₁	36.3	0.64	36.6	28.02	779
sNMFL₀_W	20	0.64	35.4	–	298

5 Conclusions

In this paper, we introduce L₀ sparseness constraints into NMF to solve the problem of unstable sparseness while extracting a part-based representation of the data. When the L₀ constraints are imposed on the coefficient matrix (sNMFL₀_H), the algorithm is divided into two stages: sparse coding and matrix updating. In the sparse coding stage, the constrained problem is NP-hard in general. Therefore, the ISNNLS algorithm is proposed to obtain an approximate solution. Compared with the existing sparse encoding methods, the proposed method performs better in data reconstruction and has potential applications in face recognition and image reconstruction. On the other hand, the experimental results of face feature extraction on the ORL and Yale databases show that sNMFL₀_W, which has potential applications in facial feature extraction, leads to a more part-based representation of face images with less running time. In the future, we will further improve the speed of the ISNNLS algorithm and extend this method to computer vision.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (grant no. 61563037), the Outstanding Youth Scheme of Jiangxi Province (grant no. 20171BCB23057), and the Natural Science Foundation of Jiangxi Province (grant no. 20171BAB202018).

References

1. Lee and Seung H.S., Learning the parts of objects by non-negative matrix factorization, Nature 401(6755) (1999), 788–791.

2. Tong, Zhang J.L., A video watermarking framework resistant to super strong cropping attacks based on NMF with sparseness constraints on parts of the basis matrix, Journal of Electronics and Information Technology 34(8) (2012), 1819–1825.

3. Xiang S.J. and Yang J.Q., NMF-based image hashing algorithm using restricted random blocking, Journal of Electronics and Information Technology 33(2) (2011), 337–341.

4. Zhao H.F., Li Q.C., Bin, Image segmentation of white matter based on local Walsh transform and non-negative matrix factorization, Guangdianzi Jiguang/Journal of Optoelectronics Laser 23(7) (2012), 1425–1430.

5. Park C.W., Park K.T. and Moon Y.S., Eye detection using eye filter and minimisation of NMF-based reconstruction error in facial image, Electronics Letters 46(2) (2010), 130–132.

6. Cai, He, Han, Graph regularized nonnegative matrix factorization for data representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 33(8) (2011), 1548–1560.

7. Shou N.A., Yoo and Choi, Manifold-respecting discriminant nonnegative matrix factorization, Pattern Recognition Letters 32(6) (2011), 832–837.

8. and Lin, Non-negative matrix factorisation based on fuzzy K nearest neighbour graph and its applications, Computer Vision 7(5) (2013), 346–353.

9. Zhang, Fang, Tang Y.Y., Topology preserving non-negative matrix factorization for face recognition, IEEE Transactions on Image Processing 17(4) (2013), 574–584.

10. Turk and Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3(1) (1991), 71–86.

11. MacQueen, Some methods for classification and analysis of multivariate observations, Proc 5th Berkeley Symp Math Stat (1) (1967), 281–297.

12. Donoho and Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization, Proc Natl Acad Sci 100(5) (2003), 2197–2202.

13. Tropp, Just relax: Convex programming methods for identifying sparse signals, IEEE Trans Inf Theory 55(2) (2006), 1030–1051.

14. Vavasis, On the complexity of nonnegative matrix factorization, SIAM Journal on Optimization 20(3) (2010), 1364–1377.

15. Morup, Madsen and Hansen, Approximate l0 constrained non-negative matrix and tensor factorization, Proceedings of ISCAS 2008, pp. 1328–1331.

16. Efron, Hastie, Johnstone and Tibshirani, Least angle regression, The Annals of Statistics 32(2) (2004), 407–451.

17. Eggert and Koerner, Sparse coding and NMF, International Joint Conference on Neural Networks (2004), 2529–2533.

18. Yang and Chen, Spectral unmixing using nonnegative matrix factorization with smoothed l0 norm constraint, Proceedings of SPIE (2009), p 7494.

19. Pati Y.C., Rezaiifar and Krishnaprasad P.S., Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, Systems and Computers (1993), 40–44.

20. Peharz and Pernkopf, Sparse nonnegative matrix factorization with l0-constraints, Neurocomputing 80 (2012), 38–46.

21. Lawson, Hanson, Solving Least Squares Problems, Prentice-Hall, 1974.

22. Guo Y.F., Shu T.T., Yang J.Y. and Li S.J., Feature extraction method based on the generalised fisher discriminant criterion and facial recognition, Pattern Analysis & Applications 4(1) (2001), 61–66.

23. Lee D.D. and Seung H.S., Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems (2001), 556–562.

24. Hoyer P.O., Non-negative sparse coding, Proceedings of Neural Networks for Signal Processing 2002, pp. 557–565.

25. Hoyer P.O., Non-negative matrix factorization with sparseness constraints, Journal of Machine Learning Research 5(11) (2004), 1457–1469.

26. Berry M.W., Browne, Langville A.N., Algorithms and applications for approximate nonnegative matrix factorization, Computational Statistics & Data Analysis 52(1) (2007), 155–173.

27. Peharz, Stark and Pernkopf, Sparse nonnegative matrix factorization using l0-constraints, Proceedings of MLSP (2010), pp. 83–88.

28. Lin, Projected gradient methods for nonnegative matrix factorization, Neural Computation 19(10) (2007), 2756–2779.

29. Samaria F.S. and Harter A.C., Parameterisation of a stochastic model for human face identification, Applications of Computer Vision (1994), 138–142.

30. Georghiades A.S., Belhumeur P.N. and Kriegman D.J., From few to many: Illumination cone models for face recognition under variable lighting and pose, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(6) (2001), 643–660.