Abstract
In order to solve the problem of unstable sparseness of non-negative matrix factorization (NMF), the improved NMF algorithms with L0 sparseness constraints are proposed. With the constraining the L0 norm of the coefficient matrix, we applied inverse matching principle into non-negative least square (ISNNLS) which enhances the reconstruction ability of the decomposition matrix. In addition, the L0 sparseness constraints are added to the basis matrix. In the updating process, the proposed algorithm set the smallest value to zero by projecting the basis vectors onto the closest non-negative vector with the expected sparseness. The experimental results have illustrated that the proposed algorithm can achieve higher reconstruction quality and effectiveness compared with the other algorithms.
Introduction
Non-negative matrix factorization (NMF) [1] was proposed by Lee and Seung in 1999. Given a non-negative data matrix
At present, a few NMF methods based on L0 norm constraints are available. In [15], Morup et al. proposed a novel NMF with approximate L0 constraints that incorporates both least angle regression and selection algorithm with non-negative constraints [16] and normalization invariant updating rules [17]. By introducing the smooth L0 norm constraints into the NMF algorithm,Yang et al. [18] proposed NMF based on smooth L0 norm constraints(NMF_S L0). In this paper, we present two tactics to impose L0 sparseness constraints by constraining on the basis matrix
Here, we use the idea of inverse matching to improve the non-negative least squares algorithm. In the step of matrix update, we can use the multiplicative update rules directly. When the L0 constraints imposed on
Sparse NMF
NMF is a matrix decomposition method for dealing with large-scale and high-dimensional data. Its essence is to map high-dimensional data to low dimensions under the non-negative constraints. In order to make the result of decomposition more sparse and less redundant, a large number of sparse non-negative matrix factorization algorithms have been proposed.
NMF decomposes the data matrix
Lee and Seung have given the multiplicative update rules [23]
In order to enhance the sparsity of NMF, Hoyer presented sparse NMF by introducing the sparseness constraints [24]. The objective function is defined as follows
He also gave the function to measure the sparsity of a vector [25]
Imposing sparse constraints to the NMF model is usually applied to the coefficient matrix
The NMF algorithm generally updates
The NMF with L1 constraints is generally divided into two stages:sparse coding and matrix updating. SNMFL0_H uses the same steps to solve model(6), as shown in Algorithm 1.
Algorithm 1. sNMFL0_H
In coding phase, non-negative sparse coding is a Np-hard problem, and an approximate solution is needed. In the absence of non-negative constraints, orthogonal matching pursuit (OMP) is a common encoding method for its fast convergence and low computational complexity. Peharz et al. proposed a non-negative version matching pursuit (NMP) algorithm in [27]. Robert et al. [28] found the link between the OMP and the NNLS active-set algorithm which has been shown in the algorithm 2, and presented sparse NNLS.
Algorithm 2. Active-Set NNLS
In the above algorithm,
The Algorithm 2 adds basis vectors to the initial empty set. On the contrary, we remove the vector from the optimized non-sparse vector set. We call this method inverse sparse non-negative least squares (ISNNLS) algorithm which is shown in Algorithm 3. In the first step,we get the optimized but non-sparse solution from NNLS. When the number of non-zero items in the coefficient vector is greater than L, the minimum non-zero item in the vector is set to zero. At the same time, we move the index of entry which is minimum in the vector from in-active set P to active set Z. Then, the inner loop (6–10) in algorithm 2 is used to approximate data vector with the remaining vectors in set P.
Algorithm 3. ISNNLS
In the matrix update stage, we allow only the non-zero values of the coefficients to be changed in order to maintain the sparse structure of the coefficient matrix
sNMFL0_W
Hoyer proposed the NMF algorithm based on L1 norm constraints, which introduces the gradient descent method. The algorithm projects the basis vector onto the closest non-negative vector under the constraints of expected L1 sparseness. Inspired by this algorithm, we present sNMFŁ0_W which is shown in algorithm 4. The first step is to calculate an optimal basis matrix
Algorithm 4. sNMFL0_W
Experimental results
This section is divided into two experiments: non negative sparse coding and face feature extraction. In Section 4.2, we constructed sparse data by several sparse coding methods, and compared the effect of several sparse coding, the base vector recognition rate and running time. In Section 4.3, we execute sNMFL0_W and NMF with L1 constraints (sNMFL1) for face feature extraction experiment on ORL and Yale face database, in order to compare the sparse ability, reconstruction quality and speed of the two methods.
Experimental data and environment
The ORL [29] face database consists of 40 individuals, each providing 10 images. Each image has 256 gray levels with 112×92 pixels. The images of this database vary greatly in facial expressions, facial details, and shooting angles.
The Yale [30] face database consists of 15 individuals, each providing 11 different images. All the images was taken down under different lighting conditions and shooting angles with the facial expression changed greatly. Each image has 100×100 pixels. The facial images from the ORL and Yale database are presented in Figs. 1 and 2.

Ten images of the same person in the ORL face database.

Face examples of one person in Yale database.
In this section, we compare several non-negative sparse coding methods by constructing sparse data.We use the 100 dimensions overcomplete basis matrix which contains r ∈ {200, 400, 600} basis vectors. ISNNLS and several encoding methods (NMP [27], NLARS [15], sNNLS [20]) are compared under different number of basis vectors. We use isotropic Gauss noise to generate ten random basis matrices and normalize each vector. Furthermore, the ‘standard’ r × 100 coefficient matrix
Figure 3 has shown the quality of reconstruction of different non-negative sparse coding. The reconstructed quality is measured by Formula (8).

Quality of data reconstructed by different encoding methods.
We can derive from the graph that SNR shows an upward trend with the decrease of sparsity. The SNR curve of NLARS is obviously lower than other algorithms, which shows the worst reconstruction quality. The reconstruction quality of sNNLS is always better than NMP. ISNNLS has always maintained the highest SNR, which perform best.
Table 1 shows the percentage of basis vectors that correctly identified with different coding patterns. It can be seen that the recognition rate of several algorithms decreases with the reduction of sparsity, but the recognition rate of ISNNLS is always the highest under the condition of different sparse degree and the number of basis vectors.
Correct recognition rates of different encoding methods(%)
From the Table 2 we can see that the running time of the NLARS algorithm is significantly longer than the other three algorithms while sNNLS takes the shortest running time. ISNNLS is faster than NLARS but slightly lower than NMP and sNNLS.
Average running time of four algorithms(s)
In this section, we use sNMFL0_W and sNMF_L1 for face feature extraction on ORL and Yale face database, and compare the extraction results and the speed of two algorithms. We perform sNMFL0_W on ORL and Yale with the sparsity L of basis vectors constrained to, respectively, 40%, 30%, 20% (the ratio of non-zero entries to the total pixels number in the image). For each sparsity, we have trained 25 basis images. In order to compare sNMFL0_W and sNMF_L1, we firstly compute averageL1-sparsity(using formula(5)) of the basis images that trained by sNMFL0_W, and perform sNMF_L1 on the same database. It is necessary for two algorithm to achieve the same L1-sparsity. The sNMFL0_W were executed for 30 iterations while sNMF_L1 executed for 3000 iterations in order to ensure convergence of algorithms. We repeated the experiment ten times and computed the L0-sparsity of the two algorithms, SNR (using formula(8)) and running time of two algorithms.
What can we derive from Figs. 4 and 5 is that the results of the two methods are similar and the extracted features are more localized as the basis images become more sparse. Where SNR took the mean value in the liner domain.

Facial feature extraction results on ORL database(first row:basis images trained by sNMFL0_W. Second row:basis images trained by sNMF_L1. The sparsity of first row is followed by 40% (A), 30% (B), 20% (C), second behavior 59% (D), 49.5% (E), 36% (F)(percentage of non-zero pixels in total pixels)).

Facial feature extraction results on Yale database(first row:basis images trained by sNMFL0_W. Second row:basis images trained by sNMF_L1. The sparsity of first row is followed by 40% (A), 30% (B), 20% (C), second behavior 55.2% (D), 46.3% (E), 36.3% (F)).
In Tables 3 and 4, we see that the SNR values of two algorithms are very close, indicating that the two algorithms can achieve the same reconstruction quality, but sNMFL0_W used fewer non-zero pixels. Maybe someone will question that the non-zero pixels of the basis images are many, but most pixels of these are extremely small. Thus, we added a column of SNR * to the table for sNMF_L1. SNR * indicates the SNR that 40%, 30%, 20% of largest pixels values are retained, and the rest of the values are set to zero. From the result, we can see that sNMFL0_W can get better SNR. Moreover, this method is obviously faster than the sNMF_L1.
sNMFL0 _W and sNMF_ L1 compared in terms of reconstruction quality, running time, and different sparse measure on ORL database. (L0-sparsity(percentage of non-zero pixels in total pixels), L1-sparsity(using formula 5))
sNMFL0_W and sNMF_L1 compared in terms of reconstruction quality, running time, and different sparse measure on Yale database
In this paper, we introduce L0 sparseness constraints into NMF to solve the problem of unstable sparseness while extracting part-based representation of data.While L0 constraints imposed on the coefficient matrix (sNMFL0_H), the algorithm is divided into two stages: sparse coding and matrix updating. In sparse coding stage, NMF with constraints is a NP-hard problem in general. Therefore, ISNNLS algorithm is proposed to obtain approximate solution. Compared with the existing sparse encoding methods, the proposed method has better performance in data reconstruction, and has potential applications in face recognition and image reconstruction. On the other hand, experimental results of face feature extraction performed in ORL and Yale database show that sNMFL0_W, having potential applications in the field of facial feature extraction, can lead to a more part-based representation of face images with less running time. In the future, we will further improve the speed of ISNNLS algorithm and extend this method to computer vision.
Footnotes
Acknowledgments
This work is partially supported by National Natural Science Foundation of China; the grant number is 61563037; Outstanding Youth Scheme of Jiangxi Province; the grant number is 20171BCB23057; Natural Science Foundation of Jiangxi Province; the grant number is 20171BAB202018.
