A strategy to estimate the optimal low-rank in incremental SVD-based algorithms for recommender systems

Abstract

Recommender systems apply machine learning and data mining techniques to filter unseen information, and they can predict whether a user would be interested in a given item. The main types of recommender systems are collaborative filtering (CF) and content-based filtering, both of which suffer from scalability and data sparsity problems, resulting in poor-quality recommendations and reduced coverage. Two incremental algorithms based on Singular Value Decomposition (SVD) offer high scalability for recommender systems: the incremental SVD algorithm and the incremental Approximating the Singular Value Decomposition (ApproSVD) algorithm. In the related literature, the rank used by both methods to approximate the recommender system’s data matrix is chosen experimentally. In this paper, we investigate the role of singular values in estimating a more reliable rank for these dimensionality reduction techniques, in order to improve the performance of recommender systems. In other words, we offer a strategy, based on singular values, for choosing the optimal rank that approximates the data matrix more accurately in incremental algorithms. The numerical results illustrate that the suggested strategy improves the accuracy of the recommendations and the run times of both algorithms when applied to the MovieLens, Netflix, and Jester datasets.

Keywords

Recommender systems, collaborative filtering, singular value decomposition (SVD), incremental algorithm, dimension reduction techniques, non-negative matrix factorization, optimal rank

1. Introduction

1.1 Related works

Recommender systems are software tools and techniques that predict users’ opinions in order to suggest the most appropriate items of interest. The aim is to support users in various decision-making processes, such as what items to buy, what news to read, or what music to listen to. Recommender systems rely on different types of input data, which are often placed in a matrix with one dimension representing users and the other representing items; this includes explicit input by users regarding their interest in products. We refer to explicit user feedback as ratings. Explicit feedback usually forms a sparse matrix, because users may have rated only a small range of items.

Recommender systems are a valuable means for on-line users to cope with information overload. During the last decade, various techniques for generating recommendations have been proposed, and many of them have been successfully deployed in commercial environments. That is why recommender systems have become one of the most powerful and popular tools in electronic commerce. There are two main types of recommender systems: collaborative filtering (CF) and content-based filtering. The principle of collaborative filtering is to recommend to the active user the items that other users with similar tastes liked in the past. The similarity in the taste of two users is calculated from the similarity of their rating histories. The common challenge of collaborative filtering and other types of recommender systems is dealing with massive and sparse data while still making accurate recommendations. Hence, the size of a dataset and the sufficiency of its information should be considered in order to make accurate predictions efficiently. In addition, data arrives in no particular order as new rows and columns. Existing and new data may be added, changed, or retracted in any order. The ultimate size of the data matrix is unknown. The collected data will typically be very sparse, but missing values cannot be presumed to be zeros. Several approaches have been proposed to remedy the sparsity and scalability problems associated with CF, such as supervised classification techniques [2], unsupervised clustering techniques [13], and dimensionality reduction approaches, which include singular value decomposition [18], low-rank approximation using matrix factorization techniques [17, 10], and principal component analysis [8].

The use of SVD as a tool to improve collaborative filtering has been known for some time. It is a powerful technique for dimensionality reduction. The reduced orthogonal dimensions resulting from SVD are less noisy than the original data and keep the latent associations among the terms and documents. Earlier work took advantage of this semantic property to reduce the dimensionality of the feature space [13]. The task is to compute the best running estimate of a rank-$k$ SVD of the actual data matrix, without any storage or caching of incoming data, and to make recommendations [3]. As Sarwar described in [15], one of the significant advantages of SVD is that there are incremental algorithms to compute an approximate decomposition. This allows new users or ratings to be accepted without recomputing the model that was built from previously existing data. The same idea was later extended and formalized by Brand [3] into an on-line SVD model. The use of incremental SVD methods has become a commonly accepted approach after its success in the Netflix Prize. In [14], Sarwar introduced incremental SVD algorithms for recommender systems, but the optimal value of the rank $k$ was determined by running various experiments to find the predictions with the smallest errors. Later, [19] introduced the ApproSVD algorithm, in which the matrix’s columns are rearranged according to their number of non-zero elements before the SVD technique is applied. In [18], the incremental ApproSVD algorithm was introduced, together with a comparison of the behavior of the two incremental SVD-based algorithms. Based on the results in [18], as the rank $k$ increased, the mean absolute error (MAE) and root mean squared error (RMSE) of the incremental ApproSVD algorithm decreased, while those of the incremental SVD algorithm rose. Moreover, the run time of both algorithms grows as the estimated rank $k$ increases.

1.2 Contribution

Previous studies [14, 18, 19] have illustrated that incremental SVD-based algorithms are successful in accuracy and scalability, but in all of these studies the optimal value of the reduced dimension $k$ was chosen experimentally. In this paper, we are inspired by a technique for selecting the optimal value of $k$ in the SVD-based initialization strategy of NMF, named SVD-NMF in [12]. By applying this strategy, we do not lose much information for prediction in recommender systems. It helps eliminate unnecessary information by means of the properties of singular values. In other words, for a large recommender matrix, we compute an appropriate approximate matrix based on the behavior of the singular values. Our numerical results on the MovieLens, Netflix, and Jester datasets demonstrate the proficiency of this strategy in approximating the data matrices in the incremental methods.

1.3 Organization

The rest of the paper is organized as follows. Some preliminaries on basic mathematical definitions are described in the next section. Section 3 explores the incremental SVD-based algorithms and the primary strategy for choosing the optimal rank of the original data matrix. Finally, the numerical results of our research are discussed in Section 4.

2. Preliminaries

2.1 Low-rank approximation

Low-rank matrix approximation estimates a matrix by one whose rank is less than that of the original matrix. The aim is to compute more compact representations of the data with limited loss of information [9]. Let $A$ be an $m\times n$ matrix; then the low-rank approximation (rank $k$) of $A$ is given by

$\displaystyle A_{m\times n}\thickapprox B_{m\times k}C_{k\times n}.$

It can be stored and manipulated more economically than the matrix itself. In the above approximation, only $k(m+n)$ elements have to be kept instead of the $mn$ elements of the original matrix $A$. A reduced system model can provide a close approximation of the original system. One standard approach to model reduction, feasibility reconstruction, noise removal, etc. is to replace the original data matrix with a lower-dimensional matrix computed via subspace approximation. That is why low-rank approximations play an essential role in a wide range of applications [1]. These applications include image processing, statistical data analysis, noise reduction, data mining, regularization for ill-posed problems, principal component analysis (PCA), machine learning, and so on.

Singular value decomposition (SVD) provides the true rank and gives the best low-rank approximation of a matrix, and it has vast applications in many areas [9]. The other successful method is Non-negative Matrix Factorization (NMF). In their basic form, matrix factorization methods such as NMF characterize both items and users by vectors of factors inferred from item rating patterns. These methods have become popular in recent years by combining good scalability with predictive accuracy. They offer much flexibility for modeling various real-life situations.

2.2 Matrix completion

One strength of matrix factorization is that it allows the incorporation of additional information. When explicit ratings are not available, recommender systems can infer user preferences from implicit ratings, which indirectly reflect users’ opinions through observed behavior, such as search patterns or purchase history, compared with that of other users. High-rank matrix completion is the problem of (approximately) recovering a data matrix from very few observed entries. It has wide applications in machine learning, especially in on-line recommendation systems [4]. There are some sensible approaches for filling the missing values before applying the SVD, leading to a significant performance increase. Usually, the missing values in the user-item rating matrix are filled with zeros. This approach is straightforward and computationally efficient, which makes it very common. However, it does not consider the underlying correlation structure of the data, and it affects the data variance, which is generally high. Consequently, if there is a large number of missing values, this imputation approach can result in inaccurate recommendations. Other imputation methods include filling the rating matrix with random numbers, values drawn from a normal or uniform distribution, item or user averages, SVM classifier predictions, and so on [7]. In this paper, we replaced the missing values in the data matrix with the user average before computing the SVD.
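The user-average imputation described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the implementation used for the experiments; the function name `fill_with_user_average` and the convention that a zero marks a missing rating are assumptions made for the example.

```python
import numpy as np

def fill_with_user_average(R):
    """Replace zeros (treated as missing ratings) with each user's mean rating.

    Rows are users, columns are items. Users with no ratings stay at zero.
    """
    R = np.asarray(R, dtype=float)
    rated = R != 0                               # mask of observed ratings
    counts = rated.sum(axis=1)                   # number of ratings per user
    sums = R.sum(axis=1)                         # zeros contribute nothing
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return np.where(rated, R, means[:, None])    # keep observed, fill the rest

R = np.array([[5, 0, 3],
              [0, 0, 4],
              [0, 0, 0]])
filled = fill_with_user_average(R)
# Row 0 has mean 4, so its missing entry becomes 4; row 2 has no ratings.
```

Note that this convention only works when zero is not itself a valid rating, which holds for the 1 to 5 scales but would need a different missing-value marker for scales that include zero.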

2.3 Singular value decomposition (SVD)

The core of the SVD algorithm lies in the following theorem.

(Singular Value Decomposition (SVD)).

Let $A\in\mathbb{R}^{m\times n}$ with $\textit{rank}(A)=r$ . Then the eigenvalues of $n\times n$ symmetric matrix $A^{T}A$ are real and non-negative. Let these eigenvalues be denoted by $\sigma_{i}^{2}$ , where $\sigma_{1}^{2}\geqslant\sigma_{2}^{2}\geqslant\cdots\geqslant\sigma_{n}^{2}$ then $\sigma_{1},\cdots,\sigma_{n}$ are called the singular values of $A$ . Every $m\times n$ matrix $A$ can be decomposed into

$\displaystyle A=U\Sigma V^{T},$ (1)

where $U_{m\times m}$ and $V_{n\times n}$ are orthogonal and $\Sigma$ is an $m\times n$ rectangular diagonal matrix with $r$ non-zero elements, which are the non-zero singular values of $A$ . This decomposition is called Singular Value Decomposition or SVD [16].

2.4 Truncated singular value decomposition (TSVD)

The truncated SVD algorithm computes only a small number of singular values instead of all the singular values of a matrix. By means of TSVD, we can calculate an approximation of a matrix using less data than the original matrix. In other words, if we discard all but the $k$ largest singular values and the corresponding singular vectors, the product of the resulting matrices, $A_{k}=U_{k}\Sigma_{k}{V_{k}}^{T}\approx A$, is the best rank-$k$ approximation of $A$. For this reason, the resulting matrix $A_{k}$ is an excellent reduced-dimension representation of matrix $A$.
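As an illustration, a rank-$k$ truncation can be sketched in Python with NumPy as below. The helper name `truncated_svd` is an assumption for the example, and for simplicity the sketch computes the full thin SVD and then discards the trailing triplets; for large sparse matrices an iterative solver such as `scipy.sparse.linalg.svds` would normally be preferred.

```python
import numpy as np

def truncated_svd(A, k):
    """Return U_k, s_k, Vt_k holding the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted descending
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
U_k, s_k, Vt_k = truncated_svd(A, k=2)
A_k = U_k @ np.diag(s_k) @ Vt_k        # rank-2 approximation of A

# Storage of the two factors B = U_k diag(s_k) and C = Vt_k versus A itself.
stored_factors = U_k.size + Vt_k.size  # k(m + n) = 2 * (8 + 6) = 28
stored_full = A.size                   # m * n = 48
```

The two counts at the end correspond to the $k(m+n)$ versus $mn$ comparison from Section 2.1.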

2.5 Eckart-Young theorem

(Eckart-Young [5]).

Let $A_{k}$ be the rank- $k$ approximation of $A$ achieved by SVD-truncation as above. Then $A_{k}$ is the closest rank- $k$ matrix to $A$ , i.e.

$\displaystyle\min_{\textit{rank}(B)=k}\|A-B\|_{F}=\|A-A_{k}\|_{F}=\sqrt{\sigma_{k+1}^{2}+\cdots+\sigma_{r}^{2}},$ (2)

where $B$ ranges over rank-$k$ matrices and $\|\cdot\|_{F}$ is the Frobenius norm. Hence, the minimal error is given by the Euclidean norm of the singular values that are zeroed out in the truncation [9].

SVD also gives the best low rank approximation in spectral norm:

$\displaystyle\|A-A_{k}\|_{2}=\min_{\textit{rank}(B)=k}\|A-B\|_{2}=\sigma_{k+1}.$ (3)
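Equations (2) and (3) can be checked numerically on a small example. The sketch below, in Python with NumPy, uses an arbitrary random matrix and $k=2$; both choices are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # SVD truncation of A

frob_error = np.linalg.norm(A - A_k, "fro")   # left-hand side of Eq. (2)
spec_error = np.linalg.norm(A - A_k, 2)       # left-hand side of Eq. (3)

frob_predicted = np.sqrt(np.sum(s[k:] ** 2))  # sqrt(sigma_{k+1}^2 + ... + sigma_r^2)
spec_predicted = s[k]                         # sigma_{k+1} (0-based index k)
```

Up to floating-point rounding, `frob_error` agrees with `frob_predicted` and `spec_error` agrees with `spec_predicted`.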

2.6 Non-negative matrix factorization (NMF)

Often the data being analyzed are non-negative, and the low-rank representation is further required to consist of non-negative values to avoid contradicting physical reality. Classical approaches cannot guarantee non-negativity. Finding reduced-rank non-negative factors that approximate a given non-negative matrix is the so-called non-negative matrix factorization (NMF) problem, which can be defined in general form as follows:

(NMF problem).

Let $A\in\mathbb{R}^{m\times n}_{+}$ be a non-negative matrix. Find two matrices $W\in\mathbb{R}^{m\times k}_{+}$ and $H\in\mathbb{R}^{k\times n}_{+}$ such that

$\displaystyle A\approx WH,$

where $W$ and $H$ are called basis matrix and coefficient matrix respectively, and $k$ is the rank of factorization.

Note that the dimension of these two low-rank matrices is vital in estimating the best approximation of $A$. On the one hand, a small $k$ is needed to reduce the dimension of $A$; on the other hand, the accuracy of the approximation improves with larger $k$. Almost all researchers try different values of $k$ at the beginning of their algorithms. Hence it is crucial to find a strategy for choosing an optimal $k$ that is sufficiently smaller than $\min\{m,n\}$ [12].
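To make the role of $k$ concrete, the following Python sketch computes a rank-$k$ NMF with the standard multiplicative update rules for the Frobenius objective. The update scheme is an assumption chosen for illustration, since no specific NMF solver is prescribed here; the function name, iteration count, and random initialization are likewise hypothetical.

```python
import numpy as np

def nmf_multiplicative(A, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative A by W @ H with W, H >= 0 (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps               # basis matrix, m x k
    H = rng.random((k, n)) + eps               # coefficient matrix, k x n
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # update coefficients
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # update basis
    return W, H

rng = np.random.default_rng(42)
A = rng.random((6, 2)) @ rng.random((2, 5))    # non-negative, rank 2 by construction
W, H = nmf_multiplicative(A, k=2)
relative_error = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

Because the updates only multiply by non-negative ratios, $W$ and $H$ stay non-negative throughout. A random initialization is used here; an SVD-based initialization such as the one in [12] would replace the two `rng.random` lines.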

2.7 Principal component analysis

In recent years, the amount of data that needs to be analyzed or stored has been constantly increasing. Fortunately, high-dimensional data can often be modeled as lying in a low-dimensional subspace of the original space. Under this assumption, an approximate representation can be found via principal component analysis (PCA). PCA is a way of identifying patterns in data and expressing them so as to highlight their similarities and differences. Since patterns can be hard to find in high-dimensional data, where graphical representation is not available, PCA is a powerful tool for analyzing data. The other main advantage of PCA is that once the data patterns have been found, the data can be compressed. In other words, the dimension of the data can be reduced without much loss of information. PCA is almost the same as the SVD; however, before computing singular vectors, we have to subtract the mean from each row of $A$ ($a_{i}\rightarrow a_{i}-\frac{1}{n}\sum_{j=1}^{n}a_{j}$). The right singular vectors of the resulting matrix are called the principal components of $A$.

3. Incremental algorithms

A recommender system consists of two primary entities: users $U$ and items $I$, where users provide their opinions (ratings) about items. The two-dimensional prediction mapping function of a recommender system can be written as

$\displaystyle R^{r}_{\textit{User}\times\textit{Item}}:U\times I\rightarrow\textit{rating},$ (4)

where $r$ is the opinion of user $U$ about item $I$. This function can be converted into a user-item matrix $R$ with rating entries. The recommender system’s task is to find the entries of $R$ that users did not rate and to predict their opinions about new items, in order to offer the items that best match their interests. Thus, dimension reduction techniques can be used for predicting unknown elements. As the prediction accuracy of a recommender system depends heavily on the number of available ratings, it suffers when the rating data is sparse.

There are two separate steps in collaborative filtering (CF) recommender systems. The first step, called the off-line or model-building step, is usually neighborhood formation, that is, investigating user-user or item-item similarity. The second step is the on-line or execution step, in which the actual predictions are produced. If the rating matrix does not change much over a short period, a recommender system may run the off-line step only once a day or once a week. Researchers have shown that SVD-based dimensionality reduction algorithms can better estimate the data matrix in the off-line step for finding similarities among the data [13, 15]. SVD-based models perform very well in the on-line step, with a run time of $\mathcal{O}(m^{1})$; however, the off-line part of these algorithms is time-consuming, with a run time of $\mathcal{O}(m^{3})$ [14, 7]. Scholars overcame the expensive computation of SVD by introducing incremental SVD-based algorithms with more accurate predictions and faster running times. The incremental algorithms ensure highly scalable overall performance, which matters because the user-item matrix is dynamic. In the off-line stage, which is computationally intensive and is performed only once, applying the SVD algorithm to the data matrix $R_{1}$ produces three matrices $U_{1}$, $\Sigma_{1}$, and $V_{1}$. When new data columns $R_{2}$ arrive, the data matrix $R_{1}$ is updated to $R^{\prime}=[R_{1},R_{2}]$. At this stage, applying the incremental algorithm to the updated matrix yields three matrices $U_{2}$, $\Sigma_{2}$, and $V_{2}$, which are computed from $U_{1}$, $\Sigma_{1}$, and $V_{1}$ [18]. The incremental SVD algorithm is described in full in Section 3.2.

Besides being time-consuming, SVD requires complete data. In many experimental settings, some parts of the measurement matrix may be missing, contaminated, or untrusted, so imputation techniques are applied to overcome these problems [6]. As mentioned in Section 2.2, before performing SVD we replaced the missing values, which had initially been filled with zeros, with the users’ average ratings.

3.1 Choosing optimal rank $k$

As mentioned before, one of the significant challenges with the data matrix in recommender systems is the time-consuming computation in the off-line procedure of incremental SVD-based algorithms. One solution is to reduce the dimension of the data matrix with factorization methods. Hence, we need an appropriate method for computing the rank whose approximation is nearest to the original matrix. Our strategy is inspired by the choosing rule for SVD-NMF in [12] and helps us find a suitable rank $k$ for approximating $R$ in the incremental algorithms in a shorter time, whereas in most recommender algorithms researchers choose the rank $k$ experimentally. According to Eq. (1), the diagonal entries of $\Sigma$ are arranged in descending order. In many cases, the first few singular values account for over 90% of the sum of all singular values. In other words, $k$ can be taken as the rank of the factorization that retains 90% of the information carried by all singular values. Moreover, $k$ should satisfy the basic rule $(m+n)k<mn$. On the one hand, to reduce the dimension of the rating matrix $R$, we need $k$ to be small; on the other hand, to obtain an accurate approximation, $k$ should be large enough.

Suppose $R\in\mathbb{R}^{m\times n}$ is a user-item matrix with $\textit{rank}(R)=r$. Applying SVD to $R$ yields three matrices $U\in\mathbb{R}^{m\times m}$, $\Sigma\in\mathbb{R}^{m\times n}$, and $V\in\mathbb{R}^{n\times n}$. To find the nearest matrix of rank $k$ to $R$, we use the fact that the singular values of $R$ obtained by SVD are sorted in descending order and that, as in principal component analysis (PCA), for high-dimensional data the sum of the first few singular values carries a large proportion of the information in all singular values. We set this proportion to 0.9, which retains enough information from the singular values to give an accurate approximation. In Eq. (1), the sum of all non-zero diagonal entries of $\Sigma$ is $\textit{sum}_{r}=\sigma_{1}+\sigma_{2}+\cdots+\sigma_{r}$, and the sum of the first $k$ singular values is $\textit{sum}_{k}=\sigma_{1}+\cdots+\sigma_{k}$. The rule for choosing the optimal rank $k$ is then:

$\displaystyle\frac{\textit{sum}_{k}}{\textit{sum}_{r}}<0.9,\text{ and }\frac{\textit{sum}_{k+1}}{\textit{sum}_{r}}\geqslant 0.9.$ (5)

This idea is meaningful because the non-zero elements of $\Sigma$ are the square roots of the positive eigenvalues of the matrix $RR^{T}$; if $m$ and $n$ are the numbers of rows and columns of $R$, respectively, then $r\leqslant\min\{m,n\}$. After the components accounting for 90% of the sum have been extracted by Eq. (5), we get $k\ll r$. Thus, we apply this strategy to find the nearest low-rank matrix to the original, higher-dimensional data matrix in the following incremental SVD-based algorithms.
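The selection rule in Eq. (5) translates directly into a few lines of code. The sketch below is written in Python with NumPy; the function name `choose_rank` and the clamp to $k\geqslant 1$ (for the case where the first singular value alone already reaches the threshold) are assumptions added for the example.

```python
import numpy as np

def choose_rank(singular_values, proportion=0.9):
    """Return k with sum_k/sum_r < proportion <= sum_{k+1}/sum_r, as in Eq. (5).

    `singular_values` must be sorted in descending order. The result is
    clamped to at least 1 (an assumption, see the lead-in).
    """
    s = np.asarray(singular_values, dtype=float)
    s = s[s > 0]                                  # keep the r non-zero values
    ratios = np.cumsum(s) / s.sum()               # sum_1/sum_r, ..., sum_r/sum_r
    k = int(np.count_nonzero(ratios < proportion))
    return max(k, 1)

sigma = [5.0, 3.0, 1.5, 0.5]      # cumulative ratios: 0.5, 0.8, 0.95, 1.0
k = choose_rank(sigma)            # → 2
```

In this example $\textit{sum}_{2}/\textit{sum}_{r}=0.8<0.9$ and $\textit{sum}_{3}/\textit{sum}_{r}=0.95\geqslant 0.9$, so the rule selects $k=2$.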

3.2 Optimal incremental SVD algorithm

Let the matrix $R\in\mathbb{R}^{m\times n_{1}}$ be the user-item matrix of a recommender system, and let the matrices $U$, $\Sigma$, and $V$ be computed from the SVD of $R$. Then, from Eq. (5), using the singular values in $\Sigma$, we can estimate the rank $k$ for approximating $R$. Also, suppose that the matrix $R_{1}\in\mathbb{R}^{m\times n_{2}}$ contains new items to be added to the matrix $R$. We then obtain a new matrix $R^{\prime}$ as

$\displaystyle R^{\prime}:=[R,R_{1}],$ (6)

where $R^{\prime}\in\mathbb{R}^{m\times(n_{1}+n_{2})}$ is the updated matrix, which contains the original matrix $R$ . Now we can write the Eq. (6) as

$\displaystyle R^{\prime}=[R,R_{1}]=[U\Sigma V^{T},R_{1}]=U\underbrace{[\Sigma,U^{T}R_{1}]}_{F}\left[\begin{array}{cc}V^{T}&0\\ 0&I\end{array}\right]=UF\left[\begin{array}{cc}V^{T}&0\\ 0&I\end{array}\right].$

Now by considering optimal rank $k$ , and using TSVD to decompose matrix $F$ , we have

$\displaystyle R^{\prime}=U(U_{F}{\Sigma}_{F}V_{F}^{T})\left[\begin{array}{cc}V^{T}&0\\ 0&I\end{array}\right]=(UU_{F}){\Sigma}_{F}\left(\left[\begin{array}{cc}V&0\\ 0&I\end{array}\right]V_{F}\right)^{T}.$

Therefore, the SVD of $R^{\prime}$ is calculated as

$\displaystyle R^{\prime}=U_{R^{\prime}}{\Sigma}_{R^{\prime}}V_{R^{\prime}}^{T}.$

where $U_{R^{\prime}}:=UU_{F}$, ${\Sigma}_{R^{\prime}}:={\Sigma}_{F}$, and $V_{R^{\prime}}:=\left[\begin{array}{cc}V&0\\ 0&I\end{array}\right]V_{F}$, with $U_{R^{\prime}}\in\mathbb{R}^{m\times k}$, ${\Sigma}_{R^{\prime}}\in\mathbb{R}^{k\times k}$, and $V_{R^{\prime}}\in\mathbb{R}^{(n_{1}+n_{2})\times k}$. Since the matrices $U_{R^{\prime}}$ and $V_{R^{\prime}}$ are orthogonal, the algorithm can continue updating the matrix $R^{\prime}$ by the same process. Let $R_{2}\in\mathbb{R}^{m\times n_{3}}$ be a new matrix that is folded into $R^{\prime}$. Then we can write

$\displaystyle R^{\prime\prime}:=[R,R_{1},R_{2}]=[R^{\prime},R_{2}],$

where $R^{\prime\prime}\in\mathbb{R}^{m\times(n_{1}+n_{2}+n_{3})}$ is the updated matrix including $R^{\prime}$ . By continuing the same process for $R^{\prime\prime}$ , we get

$\displaystyle R^{\prime\prime}=U_{R^{\prime\prime}}{\Sigma}_{R^{\prime\prime}}V_{R^{\prime\prime}}^{T}.$

Notice that the matrices $U_{R^{\prime\prime}}$ and $V_{R^{\prime\prime}}$ are orthogonal too [18]. The process of the optimal incremental SVD algorithm is shown in Algorithm 1.

Algorithm 1: Optimal incremental SVD algorithm
Input: matrices $R_{1}\in\mathbb{R}^{m\times n_{1}}$, $R_{2}\in\mathbb{R}^{m\times n_{2}}$;
Output: matrices $U_{k}\in\mathbb{R}^{m\times k}$, ${\Sigma}_{k}\in\mathbb{R}^{k\times k}$, $V_{k}\in\mathbb{R}^{(n_{1}+n_{2})\times k}$.
1: Use SVD on matrix $R_{1}$, and compute the three matrices $U_{1}$, ${\Sigma}_{1}$, $V_{1}$, and the rank $r$ of $R_{1}$;
2: Estimate rank $k$ by means of the optimal formula: $\frac{\textit{sum}_{k}}{\textit{sum}_{r}}<0.9$, and $\frac{\textit{sum}_{k+1}}{\textit{sum}_{r}}\geqslant 0.9$, while $k\in\mathbb{Z}^{+}$ satisfies $1\leqslant k\leqslant\min\{m,n_{1}+n_{2}\}$;
3: $F\leftarrow[{\Sigma}_{1},U_{1}^{T}R_{2}]$, use SVD to decompose $F$. While keeping the rank-$k$ approximation, we get three matrices $U_{F}$, ${\Sigma}_{F}$, $V_{F}$;
4: $U_{k}\leftarrow U_{1}U_{F}$;
5: ${\Sigma}_{k}\leftarrow{\Sigma}_{F}$;
6: $b\leftarrow\textit{size}({\Sigma}_{1},2)$;
7: if $b<n_{1}$ then
8: $V_{k}\leftarrow[V_{1},\textit{zeros}(n_{1},n_{2});\textit{zeros}(n_{2},b),\textit{eye}(n_{2})]\cdot V_{F}$
9: else
10: $V_{k}\leftarrow[V_{1},\textit{zeros}(n_{1},n_{2});\textit{zeros}(n_{2},n_{1}),\textit{eye}(n_{2})]\cdot V_{F}$
11: end if
12: Return the user eigenvector of $[R_{1},R_{2}]$, i.e. $U_{k}\sqrt{{\Sigma}_{k}}(i)$, and the item eigenvector of $[R_{1},R_{2}]$, i.e. $\sqrt{{\Sigma}_{k}}V_{k}^{T}(j)$.
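A single update step of the incremental SVD process can be sketched in Python with NumPy as follows. This is a minimal illustration rather than the MATLAB code used in the experiments; the function name `incremental_svd_update` and the toy matrix sizes are assumptions. The test matrix is chosen with $m\leqslant n_{1}$ so that $U$ is square and the update reproduces $[R,R_{1}]$ exactly when $k$ equals the full rank; when $U$ has fewer columns than rows, $UU^{T}R_{1}$ is only the projection of the new columns onto the span of $U$, so the update is an approximation.

```python
import numpy as np

def incremental_svd_update(U, s, V, R_new, k):
    """Fold new item columns R_new into an existing SVD R ~ U diag(s) V^T.

    U: (m, p), s: (p,), V: (n1, p), R_new: (m, n2). Returns rank-k factors
    U_k (m, k), s_k (k,), V_k (n1 + n2, k) of [R, R_new].
    """
    n1, p = V.shape
    n2 = R_new.shape[1]
    F = np.hstack([np.diag(s), U.T @ R_new])             # F = [Sigma, U^T R_new]
    U_F, s_F, Vt_F = np.linalg.svd(F, full_matrices=False)
    U_F, s_F, V_F = U_F[:, :k], s_F[:k], Vt_F[:k, :].T   # keep rank k
    block = np.block([[V, np.zeros((n1, n2))],
                      [np.zeros((n2, p)), np.eye(n2)]])  # [[V, 0], [0, I]]
    return U @ U_F, s_F, block @ V_F

rng = np.random.default_rng(0)
R = rng.standard_normal((6, 8))        # m <= n1, so U below is square
R_new = rng.standard_normal((6, 3))
U, s, Vt = np.linalg.svd(R, full_matrices=False)
U_k, s_k, V_k = incremental_svd_update(U, s, Vt.T, R_new, k=6)
R_updated = U_k @ np.diag(s_k) @ V_k.T  # should match [R, R_new]
```

The returned factors have the same form as the inputs, so the function can be called again when a further block of columns arrives.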

Algorithm 2: Optimal incremental ApproSVD algorithm
Input: matrices $R_{1}\in\mathbb{R}^{m\times n_{1}}$, $R_{2}\in\mathbb{R}^{m\times n_{2}}$; parameters $c_{1},c_{2}\in\mathbb{Z}^{+}$ satisfying $c_{1}\leqslant n_{1}$, $c_{2}\leqslant n_{2}$; for $R_{1}$, column sampling probabilities $\{p_{i}\}_{i=1}^{n_{1}}$ satisfying $p_{i}\geqslant 0$ and $\sum_{i=1}^{n_{1}}p_{i}=1$; for $R_{2}$, column sampling probabilities $\{p^{\prime}_{j}\}_{j=1}^{n_{2}}$ satisfying $p^{\prime}_{j}\geqslant 0$ and $\sum_{j=1}^{n_{2}}p^{\prime}_{j}=1$.
Output: $H_{k}\in\mathbb{R}^{m\times k}$.
1: for $t=1\rightarrow c_{1}$ do
2: Pick $i_{t}\in 1,\ldots,n_{1}$ under sampling probabilities $p_{\alpha},\alpha=1,\ldots,n_{1}$ ($i_{t}$ denotes the column index of $R_{1}$);
3: $C_{1}^{t}\leftarrow R_{1}^{(i_{t})}/\sqrt{c_{1}p_{i_{t}}}$ (a column vector $R_{1}^{(i)}$ denotes the $i$-th column of $R_{1}$);
4: end for
5: for $t=1\rightarrow c_{2}$ do
6: Pick $j_{t}\in 1,\ldots,n_{2}$ under sampling probabilities $p^{\prime}_{\alpha}$, $\alpha=1,\ldots,n_{2}$ ($j_{t}$ denotes the column index of $R_{2}$);
7: $C_{2}^{t}\leftarrow R_{2}^{(j_{t})}/\sqrt{c_{2}p^{\prime}_{j_{t}}}$ (a column vector $R_{2}^{(j)}$ denotes the $j$-th column of $R_{2}$);
8: end for
9: Replace the inputs $R_{1}$ and $R_{2}$ of Algorithm 1 by $C_{1}$ and $C_{2}$;
10: Apply the optimal strategy $\frac{\textit{sum}_{k}}{\textit{sum}_{r}}<0.9$, and $\frac{\textit{sum}_{k+1}}{\textit{sum}_{r}}\geqslant 0.9$, for choosing the parameter $k\in\mathbb{Z}^{+}$, which satisfies $1\leqslant k\leqslant\min\{m,c_{1}+c_{2}\}$;
11: Skip steps 6–11 in Algorithm 1, and use the obtained $U_{k}$;
12: $H_{k}\leftarrow U_{k}$;
13: Return $H_{k}\in\mathbb{R}^{m\times k}$.

3.3 Optimal incremental ApproSVD algorithm

The ApproSVD approach was suggested in [19], and incremental ApproSVD, which can address the scalability challenge in a recommender system, was introduced in [18]. In this method, the most important aspect is choosing the sampling probabilities for the columns of the data matrix and of the newly added matrix. Suppose that the chosen column sampling probabilities are $p_{i}=\textit{nnz}(R^{i}_{1})/\textit{nnz}(R_{1})$, for $i=1,\ldots,n_{1}$, and $p^{\prime}_{j}=\textit{nnz}(R^{j}_{2})/\textit{nnz}(R_{2})$, for $j=1,\ldots,n_{2}$, where $\textit{nnz}(R_{1}^{i})$ and $\textit{nnz}(R_{2}^{j})$ denote the number of non-zero elements in the $i$-th and $j$-th columns of matrices $R_{1}$ and $R_{2}$, respectively. Then we rearrange the columns of $R_{1}$ and $R_{2}$ according to their probabilities, from the largest value to the smallest. For instance, if the $i$-th column of $R_{1}$ has the largest sampling probability $p_{i}$, it comes first; the $j$-th column, which has the second largest probability, becomes the second column, and so on, until we have picked $c_{1}$ columns of $R_{1}$ to construct the matrix $C_{1}$. We apply the same process to the probabilities of $R_{2}$ to obtain the matrix $C_{2}$. Finally, after obtaining the matrices $C_{1}$ and $C_{2}$, the optimal incremental SVD process is performed on these two matrices. The process of the optimal incremental ApproSVD algorithm is described in Algorithm 2.
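The column selection step can be sketched as below, in Python with NumPy. The sketch follows the deterministic ordering described in this section (largest probability first) together with the $1/\sqrt{c\,p_{i}}$ scaling from the algorithm listing; the function name `select_columns` is an assumption, and $c$ is assumed not to exceed the number of columns that contain at least one rating.

```python
import numpy as np

def select_columns(R, c):
    """Pick the c columns of R with the most non-zero entries and rescale them.

    Sampling probabilities are p_i = nnz(column i) / nnz(R); each kept
    column i is divided by sqrt(c * p_i).
    """
    R = np.asarray(R, dtype=float)
    nnz_per_column = np.count_nonzero(R, axis=0)
    p = nnz_per_column / nnz_per_column.sum()
    order = np.argsort(-p, kind="stable")[:c]     # largest probability first
    C = R[:, order] / np.sqrt(c * p[order])
    return C, order, p

R1 = np.array([[5, 0, 3, 0],
               [4, 0, 0, 0],
               [1, 2, 4, 0],
               [2, 0, 5, 0]])
C1, kept, p = select_columns(R1, c=2)
# Columns 0 and 2 hold 4 and 3 of the 8 ratings, so they are kept in that order.
```

Applying the same function to the newly arrived block gives $C_{2}$, after which the pair $(C_{1},C_{2})$ takes the place of $(R_{1},R_{2})$ in the incremental SVD step.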

4. Experimental evaluation

This section describes the experimental validation of choosing the optimal $k$ by the proposed rule, Eq. (5), in the two incremental SVD-based algorithms. We discuss our experimental setup, namely the datasets, the evaluation metrics, and the computational environment. Then, we describe our numerical results and compare them with [18].

4.1 Datasets

The first dataset used in our experiments is the MovieLens dataset [20]. Each week, hundreds of users visit MovieLens to rate movies and receive recommendations. The dataset can be converted into a user-item matrix with 943 rows (users) and 1682 columns (movies), in which approximately 6.3% of entries are filled. The sparsity of a dataset is calculated by $(1-\frac{\textit{non zero entries}}{\textit{all possible entries}})$. The matrix has 100000 ratings, and all unrated items have a value of zero. The ratings range from 1 to 5, where 1 represents dislike and 5 represents a strong preference.
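As a check on the reported figure, the stated matrix size and number of ratings give a fill ratio of

$\displaystyle\frac{100000}{943\times 1682}=\frac{100000}{1586126}\approx 0.063,$

that is, about 6.3% of the entries are filled and the sparsity is about 93.7%.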

The second dataset is Netflix [21]; we performed our experiments on only a small part of it. It is a user-item matrix with 930 rows and 241 columns (items), in which about 1.5% of the entries are rated. The Netflix data matrix has 3250 ratings, and, as with MovieLens, unrated items are given the value zero. The ratings range from 1 to 5, where 1 indicates the lowest interest and 5 represents a strong interest.

The third dataset is the Jester dataset [22]. This data matrix differs from the other two datasets: the ratings range from $-10$ to $10$, where $-10$ indicates the least interest and $10$ the highest level of interest. Unrated items are assigned the value 99; therefore, the Jester data matrix is not sparse. We selected 1500 rows and 100 columns of the Jester dataset for running both algorithms.

4.2 Feature extraction and selection

First, we filled the sparse matrix using an imputation method for the MovieLens and Netflix datasets: the zeros in each row of the data matrix were replaced by the average rating of that row (the user-average imputation method). Then, the rating entries in the dataset were randomly divided into five partitions to carry out five-fold cross-validation. In other words, all ratings were evenly divided into five disjoint folds; four folds together were used to train the algorithm, and the remaining fold was used as a test set to evaluate the performance. This process was repeated five times, so that each fold was used as a test set once. The algorithms were run on the training data to make predictions for the test set. The data used to test the model is called the validation set, while the data used to create the model is called the training set. For each of these pairs, the model is trained on the training set and validated on the validation set.
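The five-fold partition of the rating entries can be sketched as follows, in Python with NumPy. The generator name `five_fold_splits`, the fixed seed, and the use of integer indices into a flat list of ratings are assumptions made for the example.

```python
import numpy as np

def five_fold_splits(n_ratings, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs over shuffled rating indices."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_ratings), n_folds)
    for i in range(n_folds):
        test_idx = folds[i]                      # one fold held out for testing
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != i])
        yield train_idx, test_idx

splits = list(five_fold_splits(n_ratings=100))
# Each of the 5 pairs holds 80 training indices and 20 test indices.
```

Each rating index appears in exactly one test fold, so averaging the error over the five runs uses every rating once for evaluation.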

The Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are used as evaluation measures of the closeness of predicted ratings to actual ratings. Let $r_{ui}$ and $\hat{r}_{ui}$ be the actual rating and predicted rating of item $i$ by user $u$, respectively, and let there be $n$ rating-prediction pairs; then MAE and RMSE are given by

$\displaystyle\textit{MAE}=\frac{\sum_{i=1}^{n}|r_{ui}-\hat{r}_{ui}|}{n},$ (7)

$\displaystyle\textit{RMSE}=\sqrt{\frac{\sum_{i=1}^{n}(r_{ui}-\hat{r}_{ui})^{2}}{n}}.$ (8)

The computed values of MAE and RMSE indicate the accuracy of the algorithm: the smaller they are, the closer the predictions are to the actual ratings.
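As a small worked instance of Eqs. (7) and (8), the following Python sketch computes both measures for four rating-prediction pairs. The function names `mae` and `rmse` and the sample values are illustrative assumptions and do not come from the experiments.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error over n rating-prediction pairs, as in Eq. (7)."""
    n = len(actual)
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root Mean Squared Error over n rating-prediction pairs, as in Eq. (8)."""
    n = len(actual)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(actual, predicted)) / n)


# Hypothetical actual and predicted ratings (absolute errors: 0.5, 0, 1, 1).
actual = [4, 3, 5, 1]
predicted = [3.5, 3, 4, 2]

print(mae(actual, predicted))   # 0.625
print(rmse(actual, predicted))  # 0.75
```

Here the absolute errors sum to 2.5 and the squared errors to 2.25, so MAE is 2.5/4 and RMSE is the square root of 2.25/4.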

4.3 Numerical results and comparison

All experiments were executed in MATLAB R2017a on a Windows 8.1 machine with an Intel Pentium Core i5 processor and 4GB of RAM. For the Movielens dataset, since we mainly compare our results with the numerical results in [18], we used the same numerical settings as [18] for both algorithms while applying the optimal rank $k$ strategy of Eq. (5). The column sampling probabilities in the incremental ApproSVD algorithm for the Movielens and Netflix datasets are chosen as $p_{i}=\textit{nnz}(R_{1}^{(i)})/\textit{nnz}(R_{1}),i=1,\ldots,n_{1}$ and $p^{\prime}_{j}=\textit{nnz}(R_{2}^{(j)})/\textit{nnz}(R_{2}),j=1,\ldots,n_{2}$ , while for the Jester dataset we computed the probability of each column based on $\frac{\textit{number of rated items}}{\textit{number of unrated items}}$ . Then, the sampled columns are scaled to form $C_{1}$ and $C_{2}$ . Finally, the matrix $H_{k}$ , whose columns are the left singular vectors of $[C_{1},C_{2}]$ , is obtained, so the rating of user $u$ on movie $i$ can be predicted by $H_{k}H_{k}^{T}[R_{1},R_{2}](u,i)$ . If a predicted rating falls outside the range of valid ratings of a dataset, it is replaced by the lowest or highest valid rating. For instance, in the Movielens dataset, values less than 1 are set to 1 and values greater than 5 are set to 5.
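The sampling probabilities, the projection-based prediction, and the clipping rule above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names (`column_nnz_probabilities`, `project_predict`, `clip_rating`) and the toy data are hypothetical, the matrix `H` is simply assumed to have orthonormal columns rather than being computed by an SVD, and the scaling of the sampled columns is not shown.

```python
def column_nnz_probabilities(matrix):
    """p_i = nnz(column i) / nnz(matrix), the Movielens/Netflix choice in the text."""
    n_cols = len(matrix[0])
    col_nnz = [sum(1 for row in matrix if row[j] != 0) for j in range(n_cols)]
    total = sum(col_nnz)
    return [c / total for c in col_nnz]


def project_predict(H, R, u, i):
    """Entry (u, i) of H H^T R, where the columns of H are assumed orthonormal."""
    k = len(H[0])
    # coeffs[j] is entry (j, i) of H^T R.
    coeffs = [sum(H[m][j] * R[m][i] for m in range(len(R))) for j in range(k)]
    return sum(H[u][j] * coeffs[j] for j in range(k))


def clip_rating(value, low=1.0, high=5.0):
    """Map an out-of-range prediction to the nearest valid rating."""
    return max(low, min(high, value))


# Toy 3-user x 3-item matrix; 0 means "unrated".
R1 = [[5, 0, 3],
      [0, 0, 4],
      [1, 2, 0]]
probs = column_nnz_probabilities(R1)

# Hypothetical H with a single orthonormal column (k = 1).
H = [[1.0], [0.0], [0.0]]
raw = project_predict(H, R1, 1, 2)   # 0.0, below the valid range 1..5
pred = clip_rating(raw)              # clipped to 1.0
```

For the Jester dataset the bounds would be `low=-10.0` and `high=10.0`, following the rating range given in Section 4.1.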

Tables 1 and 2 report the MAE, RMSE, and running time of the incremental SVD-based algorithms. Both tables are divided into two main parts for comparison. The first parts of Tables 1 and 2 contain the results of the optimal incremental SVD and the optimal incremental ApproSVD, respectively. The second parts contain the numerical results reported in [18]. Accordingly, we chose the same column sizes as [18] to demonstrate the efficiency of our strategy in choosing the most appropriate rank. In addition, RMSE is larger than MAE because the errors are squared before being summed in Eq. (8), which gives more weight to large errors.
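The ordering of RMSE and MAE can also be checked directly from Eqs. (7) and (8). The short derivation below is a standard Cauchy-Schwarz argument added for illustration; the symbol $e_{j}=|r_{j}-\hat{r}_{j}|$ for the $j$ -th absolute error is introduced only here and is not part of the paper's notation.

$\displaystyle\textit{MAE}^{2}=\Big(\frac{1}{n}\sum_{j=1}^{n}e_{j}\cdot 1\Big)^{2}\leq\frac{1}{n^{2}}\Big(\sum_{j=1}^{n}e_{j}^{2}\Big)\Big(\sum_{j=1}^{n}1\Big)=\frac{1}{n}\sum_{j=1}^{n}e_{j}^{2}=\textit{RMSE}^{2},$

so $\textit{MAE}\leq\textit{RMSE}$ , with equality only when all absolute errors are equal.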

Table 1
Comparing the performance of optimal incremental SVD and regular incremental SVD algorithms [18], by using Movielens dataset

$n_{1}$	$n_{2}$	$k$	RMSE	MAE	Time (sec)	$k$	RMSE	MAE	Time (sec)
		Optimal incremental SVD				Incremental SVD
500	50	286	0.8365	0.5588	2.126867	10	0.9511	0.7530	810.88
600	50	325	0.8703	0.6042	2.788316	10	0.9502	0.7522	1016.88
700	50	357	0.9126	0.6645	3.557209	10	0.9500	0.7530	1038.56
800	50	382	0.9376	0.6996	4.455154	10	0.9507	0.7537	1043.09
800	50	382	0.9376	0.6996	4.455154	100	0.9779	0.7537	1057.94
800	50	382	0.9376	0.6996	4.455154	400	0.9963	0.7949	1310.70
800	50	382	0.9376	0.6996	4.455154	600	0.9971	0.7537	1033.09

In Table 1, the first column is the column size of the original matrix $R_{1}$ , for which we selected four values $n_{1}=$ 500, 600, 700 and 800. The second column is the column size of the added matrix $R_{2}$ in incremental SVD, for which we selected $n_{2}=$ 50. The third column is the dimension of the eigenvectors, i.e. the optimal value of $k$ obtained by the method introduced in Eq. (5) instead of being chosen experimentally. The fourth, fifth, and sixth columns give the RMSE, MAE, and running time of our optimal incremental SVD algorithm, respectively, and the remaining columns give the corresponding results of [18]. From Table 1, we can see that the RMSE, MAE, and running time of optimal incremental SVD increase as the column size $n_{1}$ grows. Nevertheless, compared to the results of regular incremental SVD, the RMSE, MAE, and running time decrease significantly when Eq. (5) is used instead of choosing the approximation rank $k$ experimentally.

Table 2

Comparing the performance of optimal incremental ApproSVD and regular incremental ApproSVD algorithms [18], by using Movielens dataset

$n_{1}$	$n_{2}$	$c_{1}$	$c_{2}$	$k$	RMSE	MAE	Time (sec)	$k$	RMSE	MAE	Time (sec)
				Optimal incremental ApproSVD				Incremental ApproSVD
900	100	500	50	287	0.9648	0.7391	8.221602	10	0.9673	0.7686	748.37
900	100	600	50	323	0.9662	0.7406	8.962699	10	0.9643	0.7658	750.51
900	100	700	50	358	0.9678	0.7415	9.480302	10	0.9612	0.7630	752.11
900	100	800	50	382	0.9686	0.7427	10.441040	10	0.9580	0.7604	756.47
900	100	800	50	382	0.9686	0.7427	10.441040	100	0.9815	0.7799	765.95
900	100	800	50	382	0.9686	0.7427	10.441040	400	1.0013	0.7991	768.37
900	100	800	50	382	0.9686	0.7427	10.441040	600	1.0024	0.8003	772.22

In Table 2, the first column is $n_{1}$ , the column size of the original matrix $R_{1}$ , and the second column is the column size of the added matrix $R_{2}$ , for which we selected the same value $n_{2}=$ 100 in all experiments. The third column is the number of columns picked for matrix $C_{1}$ from $R_{1}$ , for which we selected four values $c_{1}=$ 500, 600, 700 and 800. The fourth column is the number of columns picked for matrix $C_{2}$ from $R_{2}$ , for which we selected the same value $c_{2}=$ 50 as [18]. The fifth column is the eigenvector dimension $k$ chosen by the introduced optimal strategy of Eq. (5). The sixth, seventh, and eighth columns give the RMSE, MAE, and running time of our optimal incremental ApproSVD algorithm, respectively, and the second part of the table gives the corresponding results of [18]. According to Table 2, as $c_{1}$ increases, the RMSE, MAE, and run time of the optimal incremental ApproSVD increase. Compared to the results of [18], our suggested method reduces the MAE and the running time relative to choosing $k$ experimentally; however, as the number of sampled columns grows, its RMSE increases and, for $c_{1}\geq$ 600, exceeds the RMSE of regular incremental ApproSVD with $k=$ 10. It should be pointed out that when $k$ is raised with the number of chosen columns fixed at $c_{1}=$ 800, the RMSE of regular incremental ApproSVD grows significantly, whereas the introduced optimal strategy controls the rise of RMSE by selecting the most appropriate $k$ . In other words, the optimal strategy can be regarded as an approach for finding an upper bound on the approximation rank $k$ .

Table 3 reports the results of running the incremental SVD algorithm on the Netflix dataset. The table is divided into three parts: optimal incremental SVD, regular incremental SVD with fixed rank $k=$ 2, and regular incremental SVD with a larger fixed rank $k=$ 50. In general, our experiments show that increasing the rank $k$ leads to a smaller prediction error, at the cost of a longer run time. Hence, the suggested strategy can provide the optimal rank within a reasonable run time. Table 4 reports the experiments with the incremental SVD algorithm on the Jester dataset. The table is partitioned into three main parts: optimal incremental SVD, regular incremental SVD with fixed rank $k=$ 10, and regular incremental SVD with a larger fixed rank $k=$ 50. Clearly, as with the other datasets, a higher rank improves the prediction accuracy, and the optimal strategy helps to find an appropriate rank $k$ . Furthermore, when the rank is 10, much relevant data is lost in the approximated data matrix, so the prediction error is considerably larger.

Table 3

Experimenting optimal incremental SVD and regular incremental SVD algorithms (with fixed amount of rank $k$ ) on Netflix dataset

		Optimal Incremental SVD				Incremental SVD (with fixed $k=2$ )				Incremental SVD (with fixed $k=50$ )
$n_{1}$	$n_{2}$	$k$	MAE	RMSE	Time (sec)	$k$	MAE	RMSE	Time (sec)	$k$	MAE	RMSE	Time (sec)
160	50	11	0.7691	1.0416	0.957561	2	0.7750	1.0470	0.966988	50	0.7664	1.0363	1.913158
170	50	12	0.7898	1.0555	0.748457	2	0.7910	1.0561	0.750140	50	0.7898	1.0572	1.973150
180	50	14	0.8260	1.0819	0.734361	2	0.8273	1.0901	0.728022	50	0.8180	1.0710	1.749486
190	50	16	0.8399	1.0996	0.716003	2	0.8465	1.1006	0.717597	50	0.8357	1.0905	1.780797

Table 4

Experimenting optimal incremental SVD and regular incremental SVD algorithms (with fixed amount of rank $k$ ) on Jester dataset

		Optimal Incremental SVD				Incremental SVD (with fixed $k=10$ )				Incremental SVD (with fixed $k=50$ )
$n_{1}$	$n_{2}$	$k$	MAE	RMSE	Time (sec)	$k$	MAE	RMSE	Time (sec)	$k$	MAE	RMSE	Time (sec)
50	20	43	0.9070	1.3583	2.622029	10	1.4134	1.9975	2.491162	50	0.7399	1.1397	2.498808
60	20	51	0.8473	1.2648	3.081897	10	1.4785	2.0395	2.711224	50	0.8683	1.2860	2.759050
70	20	59	0.7704	1.1501	3.298398	10	1.5773	2.1156	3.074714	50	0.9653	1.3838	3.157536
80	20	67	0.7162	1.0722	3.465819	10	1.6552	2.1663	3.287857	50	1.0651	1.4792	3.285023

Table 5 illustrates the performance of the incremental ApproSVD algorithms on the Netflix dataset. The table has three main parts: optimal incremental ApproSVD, regular incremental ApproSVD with fixed rank $k=$ 10, and regular incremental ApproSVD with fixed rank $k=$ 50. In implementing the ApproSVD algorithm, the column size of the fold-in matrix $R_{2}$ is set to 100 ( $n_{2}=$ 100). The influence of the rank in incremental ApproSVD differs from that in incremental SVD. The optimal strategy estimates a more appropriate $k$ , because the columns of the data matrix are rearranged according to their probabilities before the algorithm is run. Moreover, the error measures tend to be smaller with optimal ApproSVD than with regular ApproSVD.

Table 6 presents the results of applying ApproSVD to the Jester dataset with $n_{2}=$ 50. When the rank is chosen through the optimal strategy, the algorithm provides a more reliable prediction of the items of interest in a shorter run time.

Table 5

Performance of optimal incremental ApproSVD and regular incremental ApproSVD algorithms (with fixed amount of $k$ ) by using Netflix dataset

			Optimal Incremental ApproSVD				Incremental ApproSVD with $k=10$				Incremental ApproSVD with $k=50$
$n_{1}$	$c_{1}$	$c_{2}$	$k$	MAE	RMSE	Time (sec)	$k$	MAE	RMSE	Time (sec)	$k$	MAE	RMSE	Time (sec)
240	160	50	22	0.9056	1.1701	1.243752	10	0.9062	1.1646	1.149683	50	0.9159	1.1830	1.262382
240	170	50	23	0.8739	1.1062	0.949777	10	0.8595	1.0984	0.988568	50	0.8614	1.1077	0.966711
240	180	50	24	0.8632	1.1175	0.838261	10	0.8772	1.1196	0.838275	50	0.8756	1.1182	0.846852
240	190	50	25	0.8544	1.1146	0.918938	10	0.8633	1.1104	0.860873	50	0.8638	1.1166	0.919519

Table 6

Numerical results of optimal incremental ApproSVD and regular incremental ApproSVD algorithms (with fixed amount of $k$ ) by using Jester dataset

			Optimal Incremental ApproSVD				Incremental ApproSVD with $k=10$				Incremental ApproSVD with $k=50$
$n_{1}$	$c_{1}$	$c_{2}$	$k$	MAE	RMSE	Time (sec)	$k$	MAE	RMSE	Time (sec)	$k$	MAE	RMSE	Time (sec)
100	50	20	43	0.8756	1.3443	3.141894	10	1.4577	1.9353	3.104193	50	0.6067	1.1665	3.195677
100	60	20	51	0.7788	1.2007	3.557236	10	1.4829	1.9687	3.420721	50	0.8093	1.1273	3.363486
100	70	20	59	0.6741	0.9854	3.941319	10	1.4937	1.9833	3.746730	50	0.8988	1.2684	3.819553
100	80	20	67	0.6103	0.9632	4.148993	10	1.4968	1.9850	3.952896	50	0.9978	1.4119	3.957119

Figure 1.

MAE of optimal incremental SVD algorithm with the change of original column size on Movielens dataset.

Figure 2.

RMSE of optimal incremental SVD algorithm with the change of original column size on Movielens dataset.

Figure 3.

MAE values of optimal incremental SVD algorithm with the change of k on Movielens dataset.

Figure 4.

RMSE values of optimal incremental SVD algorithm with the change of k on Movielens dataset.

Figure 5.

Running time of optimal incremental SVD with the change of original column size on Movielens dataset.

Figure 6.

Running time of optimal incremental SVD algorithm with the change of k on Movielens dataset.

Figure 7.

MAE of optimal incremental ApproSVD algorithms with the change of original column size on Movielens dataset.

Figure 8.

RMSE of optimal incremental ApproSVD algorithms with the change of original column size on Movielens dataset.

Figure 9.

MAE values of optimal incremental ApproSVD algorithm with the change of k on Movielens dataset.

Figure 10.

RMSE values of optimal incremental ApproSVD algorithm with the change of k on Movielens dataset.

Figure 11.

Running time of optimal incremental ApproSVD algorithm with the change of k on Movielens dataset.

Figure 12.

Running time of optimal incremental ApproSVD algorithms with the change of original column size on Movielens dataset.

Figure 13.

MAE values of optimal incremental SVD algorithm with the change of k on Jester dataset.

Figure 14.

MAE values of optimal incremental SVD algorithm with the change of k on Netflix dataset.

Figure 15.

RMSE values of optimal incremental SVD algorithm with the change of k on Jester dataset.

Figure 16.

RMSE values of optimal incremental SVD algorithm with the change of k on Netflix dataset.

Figure 17.

RMSE values of optimal incremental ApproSVD algorithm with the change of k on Jester dataset.

Figure 18.

RMSE values of optimal incremental ApproSVD algorithm with the change of k on Netflix dataset.

Figure 19.

MAE values of optimal incremental ApproSVD algorithm with the change of k on Jester dataset.

Figure 20.

MAE values of optimal incremental ApproSVD algorithm with the change of k on Netflix dataset.

The behavior of optimal incremental SVD on the Movielens dataset is shown in Figs 1–6. According to Figs 1–4, increasing the original column size and $k$ leads to an increase in the MAE and RMSE of the algorithm. Moreover, Figs 5 and 6 show that the run time depends on the original column size and on $k$ , since it increases as the size of the matrix and the value of $k$ increase. In addition, the behavior of the optimal incremental ApproSVD algorithm on the Movielens data is shown in Figs 7–12. According to Figs 7–10, a rise in the original column size and $k$ causes an increase in MAE and RMSE. Figures 11 and 12 show that the run time is related to the original column size and to $k$ , since it grows as the size of the matrix and the value of $k$ increase.

According to Figs 14 and 16, the MAE and RMSE of the incremental SVD algorithm on the Netflix dataset increase as the rank $k$ increases, which in turn depends entirely on the growing number of columns in each experiment. In contrast, Figs 13 and 15 show that on the Jester dataset the MAE and RMSE decrease. The reason may be the high sparsity of the Netflix data matrix. In Figs 17, 19, and 20, the MAE and RMSE of the incremental ApproSVD algorithm on both the Jester and Netflix datasets decrease gradually. The reason may be the rearrangement of columns according to their probabilities in the incremental ApproSVD algorithm, which leads to a better approximation of the data matrix. Only in Fig. 18 does the RMSE of incremental ApproSVD on the Netflix data fluctuate, which may be due to the high sparsity level.

5. Conclusion

In this paper, we applied a mathematical method for choosing a reliable optimal rank $k$ for approximating the original data matrix in recommender systems when SVD is used as a dimension reduction technique. This strategy uses the properties of the singular values of the data matrix and finds a trustworthy approximation by eliminating irrelevant data. Running incremental SVD-based algorithms on the new approximated matrix results in smaller errors and faster running times compared to past numerical results, in which the rank $k$ had been selected experimentally. Moreover, the results showed that a larger rank $k$ leads to a more accurate estimate of the data matrix, since it helps to keep more relevant data in the new matrix. Therefore, when the dimension of the data matrix is enormous, the optimal strategy can be a very helpful approach for finding a trustworthy lower rank $k$ that keeps the valuable data in the approximated matrix.

References

1. Berry M.W., Browne, Langville A.N., Pauca V.P. and Plemmons R.J., Algorithms and applications for approximate nonnegative matrix factorization, Computational Statistics and Data Analysis, Elsevier 52(1) (2006), 155–173.

2. Billsus and Pazzani M.J., Learning collaborative information filters, in: Proceedings of the Fifteenth International Conference on Machine Learning, Madison, 26–30 July 1998, pp. 46–54.

3. Brand, Fast on-line SVD revisions for lightweight recommender systems, SDM, 2003.

4. S.S., Wang and Singh, On the power of truncated SVD for general high-rank matrix estimation problems, in: Advances in Neural Information Processing Systems, 2017, pp. 445–455.

5. Eckart and Young, The approximation of one matrix by another of lower rank, Psychometrika 1 (1936), 211–218.

6. Ghazanfar M.A. and Prügel-Bennett, The advantage of careful imputation sources in sparse data-environment of recommender systems: Generating improved SVD-based recommendations, Informatica (Slovenia) 37 (2013), 61–92.

7. Ghazanfar M.A. and Prügel-Bennett, The advantage of careful imputation sources in sparse data-environment of recommender systems: Generating improved SVD-based recommendations, Informatica 37(1) (2013).

8. Kim and Yum B.J., Collaborative filtering based on iterative principal component analysis, Expert Systems with Applications 28(4) (2005), 823–830.

9. Kishore Kumar and Schneider, Literature survey on low rank approximation of matrices, Linear and Multilinear Algebra 65(11) (2017), 2212–2244.

10. Mohammadi, Arabi Naree and Lati, User-item content awareness in matrix factorization based collaborative recommender systems, Intelligent Data Analysis 24(3) (2020), 723–739.

11. Park Y.J. and Tuzhilin, The long tail of recommender systems and how to leverage it, in: Proceedings of the 2008 ACM Conference on Recommender Systems, 2008, pp. 11–18.

12. Qiao, New SVD based initialization strategy for non-negative matrix factorization, Pattern Recogn. Lett 63(C) (2015), 71–77.

13. Sarwar B.M., Karypis, Konstan J.A. and Riedl J.T., Application of dimensionality reduction in recommender systems – A Case Study, in: Proceeding of ACM Web KDD Workshop on Web Mining for E-Commerce, ACM Press, New York, 2000, pp. 82–90.

14. Sarwar, Karypis, Konstan and Riedl, Incremental singular value decomposition algorithms for highly scalable recommender systems, in: Fifth International Conference on Computer and Information Technology, Citeseer, 2002, pp. 27–28.

15. Sarwar, Karypis, Konstan and Riedl, Analysis of recommendation algorithms for e-commerce, in: Proceedings of the 2nd ACM Conference on Electronic Commerce, ACM, 2000, pp. 158–167.

16. Strang, Introduction to Linear Algebra, Fourth Wellesley, MA: Wellesley-Cambridge Press, 2009.

17. Takàcs, Pilàszy, Nèmeth and Tikk, Investigation of various matrix factorization methods for large recommender systems, in: 2008 IEEE International Conference on Data Mining Workshops, IEEE, 2008, pp. 553–562.

18. Zhou, Huang and Zhang, SVD-based incremental approaches for recommender systems, Journal of Computer and System Sciences 81(4) (2015), 717–733. doi: 10.1016/j.jcss.2014.11.016.

19. Zhou, Huang and Zhang, A personalized recommendation algorithm based on approximating the singular value decomposition (ApproSVD), in: Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology, vol. 02, IEEE Computer Society, 2012, pp. 458–464.

20. MovieLens dataset, http://www.grouplens.org/node/73.

21. Netflix dataset, https://www.kaggle.com/shivamb/netflix-shows.

22. Jester dataset, http://eigentaste.berkeley.edu/dataset/.
