User-item content awareness in matrix factorization based collaborative recommender systems

Abstract

Recommender systems promote sales of products and services by helping users cope with information overload. Collaborative filtering is the most extensively used approach to designing recommender systems. Its main idea is that recommendations for each active user are obtained by comparison with the preferences of other users who have rated products in a similar way to the active user. Matrix factorization is one of the most widely employed collaborative filtering techniques because of its effectiveness and efficiency in dealing with very large user-item rating matrices. One of the principal disadvantages of collaborative filtering algorithms is their lack of content awareness: they use only people’s behavior to produce recommendations and are not aware of the metadata of the content being predicted. In this work, we study and compare two ways of incorporating this type of content information directly into the matrix factorization approach. We extend the baseline optimization problem with two techniques. The first penalizes item and user feature vectors by small amounts, pushing them towards each other in the latent space; the second makes two item-specific or user-specific latent feature vectors as similar as possible if the two items or users have similar tagging history. Experiments on benchmark data sets show that the proposed model performs better than some other methods.

Keywords

Recommender systems, collaborative filtering, matrix factorization, singular value decomposition, content awareness

1. Introduction

Modern consumers are inundated with choices. Electronic retailers and content providers offer a huge selection of products, with unprecedented opportunities to meet a variety of special needs and tastes. User satisfaction and loyalty can be increased by matching consumers with the most appropriate products. Recommender systems, which analyze patterns of user interest in products to provide personalized recommendations suiting a user’s taste, have therefore become very popular in recent years. Because of the increasing importance of recommendation, it has been an autonomous research field since the mid 1990s [1]. Recommender systems can be categorized into the following main types [31, 14]:

• Content-based filtering

This system recommends items based on the past activities of the user. The basic process matches the features of a user profile, in which interests and preferences are stored, against the features of an item in order to recommend new, interesting items to the user. The system tries to recommend items that are similar to the ones the user liked in the past. Similarity can be defined quantitatively according to the domain of the items. For instance, in the book domain, since Anna Karenina and War and Peace have the same author, a user who likes the first may also like the second, and vice versa [28]. Content-based filtering algorithms therefore recommend items based on a similarity measure [32].
• Collaborative filtering

Collaborative filtering, regarded as “people-to-people correlation”, recommends to a particular user those items that other users with similar tastes liked in the past. This approach tries to find peers of the user who share similar tastes in the specified domain [38]. Items that are mostly liked by these peers are then recommended. A recent article by Feuerverger et al. [5] gives an extensive review and discussion of different collaborative filtering algorithms as well as an up-to-date and comprehensive bibliography. Most collaborative filtering methods are based on either nearest neighbors or matrix factorization [16]. Although the nearest-neighbor approach is more intuitive, the matrix-factorization approach has gained popularity as a result of the Netflix contest [17]. In this paper, we focus on the matrix factorization approach only [5].
• Hybrid recommender systems

Hybrid filtering combines more than one filtering approach [32]. One motivation behind the hybrid approach is to overcome common problems associated with the above filtering approaches, such as cold start, overspecialization, and sparsity. The other is to improve the accuracy and efficiency of the recommendation process.

Unlike content-based methods, collaborative filtering does not need to capture the semantics of the products, because it makes predictions based on the past behaviour of other users who are similar to the active user. However, collaborative filtering suffers from the following limitations [1, 37, 36].

1) Data sparsity

A modern E-commerce recommender system may include millions of users and millions of items. Even a very active user may have rated only a relatively small proportion of the items in the system. Likewise, even very popular items are rated by only a tiny fraction of the users. Because the available user activity records are so sparse, it is difficult for collaborative filtering based recommender systems to discover similar users or similar items from their rating behavior, and personalized recommendations cannot be generated reliably. This problem, known as the data sparsity problem, is a major issue that degrades the recommendation quality of collaborative filtering based recommender systems.
2) Cold start problem

Since collaborative filtering approach does not require extra information on the users or the items, it is capable of recommending an item without understanding the item itself [37]. However, this very advantage leads to the so-called “cold start” problem, which refers to the general difficulty in performing collaborative filtering for users and items that are relatively new. By definition, newer users are those who have not rated many items, so it is difficult to find other users with similar preferences. Likewise, newer items are those which have not been rated by many users, so it is difficult to recommend them to anyone.
3) Scalability

To make recommendations, traditional collaborative filtering based recommender systems need to compute the pairwise similarities among users or among items, and the cost of computing these similarities grows quadratically with the number of users or items. With the rapidly growing number of users and items available in E-commerce systems, traditional collaborative filtering algorithms suffer seriously from scalability problems.

In this paper, we try to retain the advantages of the matrix factorization approach while coping with the cold start problem by combining matrix factorization with content awareness about the individual items and users. For example, for recipes [6] we may know their ingredient lists; for movies [17] we may know their genres; for users we may know their occupations and ages [20]. Moreover, we focus on ways to take advantage of such content information directly in the matrix factorization approach, not by using a hybrid or two-step algorithm. We extend the baseline optimization problem with two techniques. The first penalizes item and user feature vectors by small amounts, pushing them towards each other in the latent space; the second makes two item-specific or user-specific latent feature vectors as similar as possible if the two items or users have similar tagging history.

The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 describes the details of our proposed recommendation algorithms, which combine the matrix factorization framework with two different user-item content awareness techniques. Experiments are reported in Section 4. The paper ends with a brief conclusion.

2. Related works

Much work has been proposed to overcome the data sparsity, cold start and scalability issues in recommender systems research. For instance, to deal with data sparsity, Sarwar et al. [34] and Yongli Ren [30] adopted imputation techniques to fill in the missing ratings and make the user-item rating matrix dense. However, data imputation still has several unsolved issues, such as how to select the most important missing data to fill in. On the other hand, several recommendation algorithms based on clustering techniques have been proposed to cope with the scalability issue [2, 19, 39]. Although such algorithms can improve the scalability of recommender systems, they often provide less personalized recommendations and often lead to poor accuracy. Another line of work on scalability uses item-based techniques, which hold the promise of allowing CF-based algorithms to scale to large data sets while still producing high-quality recommendations [33]. Various ideas have been proposed to overcome the cold start problem. For example, Park et al. [27] suggested using "filterbots", which are artificial items or users with pre-defined characteristics inserted into the system. For instance, an action-movie filterbot can make recommendations to new users who have liked only one or two action movies. More recently, Zhao et al. [40] suggested shared collaborative filtering, an ensemble technique that aggregates predictions from several different systems. Since one recommender system may have data on user-item pairs that another does not, recommendations can be improved by sharing information across systems. Another common approach to the cold start problem is to fill in the missing ratings with pseudo ratings before applying collaborative filtering. For example, Goldberg et al. [7] did this with principal component analysis, Nguyen et al. [25] with rule-based induction, and Melville et al. [21] with a hybrid, two-step approach that creates pseudo ratings with a content-based classifier. Another remedy for cold start users is to generate user and item profiles by discovering frequent user-generated tag patterns and to enrich each individual profile through a two-phase profile enrichment procedure [22].

In recent years, matrix factorization methods [17, 11] have drawn a lot of attention because of their good scalability and predictive accuracy. The performance of matrix factorization methods depends on how the system is modeled to mitigate data sparsity and over-fitting. In [29], an algorithm based on the classic multiplicative update rules is proposed, which uses imputed ratings to overcome the sparsity problem. Active learning algorithms are also effective in reducing sparsity, by asking users to rate some items when they enter the system. In [9], a matrix factorization model is proposed that combines classic matrix factorization algorithms with rating completion inspired by active learning. In addition, matrix factorization offers a flexible framework for incorporating additional sources of information to improve recommendation quality. Koren [17] and Adomavicius [1] argued that additional information, such as social network information, user demographics and item descriptions, may help matrix factorization improve recommendation performance. Following these hints, and as richer additional sources of information have become available, several recommendation approaches have recently been introduced that extend matrix factorization with additional information. For example, Zhen et al. [41] proposed tag informed collaborative filtering to seamlessly integrate tagging history into the matrix factorization framework. Ma et al. [19] and Jamali and Ester [13] presented social recommendation algorithms based on matrix factorization that employ both users’ social network information and rating records. Their experimental results demonstrate that such additional information can be used to improve recommendation quality.
Various additional information has been exploited to improve the quality of recommendation under the matrix factorization framework. Kim et al. [15] incorporated item attributes into an item-based probabilistic model to solve the cold start item problem. Gu et al. proposed a graph regularized nonnegative matrix factorization model for collaborative filtering, by constructing two graphs on the item as well as user side, to utilize the internal and external information [8]. To the best of our knowledge, there exists only one recommendation algorithm [26] which attempts to combine matrix factorization approach and item attribute information to improve the recommendation quality.

3. The proposed recommender system methodology

Our algorithms extend the techniques developed in [26] with some extra penalties. In this section, we first give a brief review of the basic matrix factorization method [17, 35]. Then two improvements over the traditional matrix factorization method are provided. Finally, the pseudo-code of the algorithms is presented.

3.1 Matrix factorization

Let $r_{ui}$ denote the rating of user $u$ for item $i$, for a given set of users $U=\{u_{1},\ldots,u_{N}\}$ and set of items $I=\{i_{1},\ldots,i_{M}\}$. These ratings compose a user-item matrix, $R=\left[r_{ui}\right]_{N\times M}$. Since the rating matrix $R$ is highly sparse because of unknown entries, the set of known indices is denoted as follows:

$\displaystyle T=\{(u,i):r_{ui}~\text{is known}\}.$

The goal of the recommender system is to predict the rating, $\hat{r}_{ui}$ , that user $u$ would give to item $i$ , for an unknown user item pair $(u,i)\not\in T$ . Moreover let $T_{u\cdot}$ be the set of items that have been rated by user $u$ , and $T_{\cdot i}$ be the set of users who rated item $i$ . Matrix factorization uses all known ratings to decompose the rating matrix $R$ into the dot product of two low-rank latent feature matrices in order to predict unknown ratings in $R$ . User latent features are represented by $P_{N\times K}$ , and item latent features are represented by $Q_{M\times K}$ which when multiplied return the best approximation of the original matrix $R$ ,

$\displaystyle R\approx\widehat{R}=PQ^{T}=\left[\begin{array}{c}p_{1}^{T}\\ p_{2}^{T}\\ \vdots\\ p_{N}^{T}\end{array}\right]\left[\begin{array}{cccc}q_{1}&q_{2}&\cdots&q_{M}\end{array}\right],$

where $p_{u}$ $(u=1,2,\ldots,N)$ is a $K$-dimensional latent feature vector of user $u$, and $q_{i}$ $(i=1,2,\ldots,M)$ is a $K$-dimensional latent feature vector of item $i$. The $k$-th entry of $p_{u}$ $(k=1,\ldots,K)$ represents a preference of user $u$, and the $k$-th entry of $q_{i}$ represents the degree to which item $i$ supports this preference. In practice, these $K$-dimensional vectors in the latent space represent users’ preferences and items’ aspects. Preferences might be interests of users, while aspects are properties of items such as a movie’s genre, length, or quality. Vectors that are close to each other in the latent space represent similar preferences or aspects. Therefore, if two users or items have one or several attributes in common, their feature vectors should be close to each other in the latent space. The predicted rating of user $u$ for item $i$ is given by $\hat{r}_{ui}=p_{u}^{T}q_{i}$. To learn the latent feature vectors of users and items, we solve the approximation problem described above in the traditional way, through the following optimization problem:

$\displaystyle\min_{P,Q}~\left\|R-PQ^{T}\right\|^{2},$

where $\|\cdot\|$ is the Frobenius norm. Solving this problem amounts to finding two rank-$K$ matrices $P$ and $Q$ whose product $PQ^{T}$ is as close as possible to the original matrix $R$. Overfitting [10] is reduced by adding a regularization penalty, weighted by $\lambda$, on $P$ and $Q$:

$\displaystyle\min_{P,Q}~\left\|R-PQ^{T}\right\|^{2}+\lambda\left(\left\|P\right\|^{2}+\left\|Q\right\|^{2}\right).$ (1)

Furthermore, since most entries in $R$ are unknown, we can only evaluate the first term in Eq. (1) over known entries $(u,i)\in T$ , i.e.

$\displaystyle\min_{P,Q}~\sum_{(u,i)\in T}(r_{ui}-p_{u}^{T}q_{i})^{2}+\lambda\left(\sum_{u}\left\|p_{u}\right\|^{2}+\sum_{i}\left\|q_{i}\right\|^{2}\right).$ (2)

3.2 Relative scaling of penalty terms

The motivation for relative scaling of the penalty terms is that the number of users can differ significantly from the number of items. For instance, when there are many more users than items, the sum of $\|p_{u}\|^{2}$ in the second part of Eq. (2) can become much larger than the sum of $\|q_{i}\|^{2}$, so that the penalty is dominated by the user vectors $p_{u}$. It is then beneficial to scale the penalty on $q_{i}$ by a factor $\gamma$, so that it is on the same order of magnitude as the penalty on $p_{u}$, where $\gamma$ is defined as the ratio of the number of users to the number of items, that is, $\gamma=\frac{N}{M}$. This turns Eq. (2) into

$\displaystyle\min_{P,Q}~\sum_{(u,i)\in T}(r_{ui}-p_{u}^{T}q_{i})^{2}+\lambda\left(\sum_{u}\left\|p_{u}\right\|^{2}+\gamma\sum_{i}\left\|q_{i}\right\|^{2}\right).$ (3)

To sum up, the recommender system minimizes the regularized squared error on the set of available ratings in order to learn the user and item factor vectors $p_{u}$ and $q_{i}$.
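As an illustration, the objective in Eq. (3) can be evaluated as in the following minimal NumPy sketch. The function name `mf_objective`, the boolean `mask` encoding of the set $T$, and the toy numbers are our own assumptions for illustration; setting `gamma=1` recovers Eq. (2).

```python
import numpy as np

def mf_objective(R, mask, P, Q, lam, gamma=None):
    """Regularized squared error of Eq. (3), evaluated on known ratings only.

    R    : (N, M) rating matrix; entries outside `mask` are ignored
    mask : (N, M) boolean array, True where (u, i) belongs to T (assumed encoding)
    P, Q : (N, K) user factors and (M, K) item factors
    """
    N, M = R.shape
    if gamma is None:
        gamma = N / M  # relative scaling gamma = N / M
    residual = np.where(mask, R - P @ Q.T, 0.0)  # errors on known entries only
    fit = np.sum(residual ** 2)
    penalty = lam * (np.sum(P ** 2) + gamma * np.sum(Q ** 2))
    return fit + penalty

# Toy example with made-up numbers: 3 users, 2 items, K = 2.
R = np.array([[5.0, 0.0], [0.0, 3.0], [4.0, 1.0]])
mask = R > 0                      # zeros mark unknown ratings here
P = np.ones((3, 2))
Q = np.ones((2, 2))
value = mf_objective(R, mask, P, Q, lam=0.1)
```

Every predicted rating in this toy example is $p_{u}^{T}q_{i}=2$, so the fit term is $9+1+4+1=15$ and the penalty is $0.1\,(6+1.5\cdot 4)=1.2$.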

3.3 Content awareness

In this subsection, we focus on a particular type of supplemental information: content information about the individual items and users. For example, for recipes [6] we may know their ingredient lists; for movies [17] we may know their genres or release dates; and for users we may know metadata such as age and occupation. We focus on two ways to take advantage of such content information directly in the matrix factorization approach, in order to improve the recommendation accuracy of recommender systems. Both approaches use extra penalties with selective shrinkage effects. Suppose that, for each item $i$, there is a content vector $a_{i}=\left[a_{i1},\ldots,a_{iD}\right]$ of $D$ attributes, and for each user $u$, there is a content vector $b_{u}=\left[b_{u1},\ldots,b_{uE}\right]$ of $E$ attributes. Stacking these vectors together gives two attribute matrices, $A=\left[a_{id}\right]_{M\times D}$ and $B=\left[b_{ue}\right]_{N\times E}$, respectively. For simplicity, we may assume that all entries in $A$ and $B$ are binary, indicating whether item $i$ (user $u$) possesses attribute $d$ ($e$). $A$ and $B$ will be used to calculate similarities between items and between users, respectively.

3.3.1 Generalized alignment-biased technique

To incorporate $A$ and $B$ into the matrix factorization approach, one idea is as follows: if two items $i$ and $i^{\prime}$ and two users $u$ and $u^{\prime}$ share attributes in common, then it makes intuitive sense to require that their feature vectors, $q_{i}$ and $q_{i^{\prime}}$ ( $p_{u}$ and $p_{u^{\prime}}$ ), be “close” in the latent space. Then the optimization problem Eq. (3) is extended to

$\displaystyle\min_{P,Q}~L_{\textit{gAB}}=\min_{P,Q}\sum_{(u,i)\in T}(r_{ui}-p_{u}^{T}q_{i})^{2}+\lambda\left(\sum_{u}\left\|p_{u}\right\|^{2}+\gamma\sum_{i}\left\|q_{i}\right\|^{2}\right)-\lambda\sum_{u=1}^{N}\sum_{u^{\prime}=1}^{N}w(u,u^{\prime})p_{u}^{T}p_{u^{\prime}}-\lambda\gamma\sum_{i=1}^{M}\sum_{i^{\prime}=1}^{M}w(i,i^{\prime})q_{i}^{T}q_{i^{\prime}},$ (4)

where the subscript “gAB” stands for “generalized alignment-biased”, $w(i,i^{\prime})$ is a similarity coefficient between item $i$ and item $i^{\prime}$, and $w(u,u^{\prime})$ is a similarity coefficient between user $u$ and user $u^{\prime}$. The similarity coefficients are calculated for each item $i$ and user $u$ as:

$\displaystyle w(i,i^{\prime})=\frac{\exp[\theta_{1}(a_{i}^{T}a_{i^{\prime}}-c_{1})]}{1+\exp[\theta_{1}(a_{i}^{T}a_{i^{\prime}}-c_{1})]},$

$\displaystyle w(u,u^{\prime})=\frac{\exp[\theta_{2}(b_{u}^{T}b_{u^{\prime}}-c_{2})]}{1+\exp[\theta_{2}(b_{u}^{T}b_{u^{\prime}}-c_{2})]},$

where $\theta_{j}$, $j=1,2$, is a real number selected individually for items and users; suggested values are $0.5$, $1$, and $1.5$. The constants $c_{j}$, $j=1,2$, are offsets subtracted from the number of shared attributes, and they slightly shift the range of the similarity coefficients. The coefficients $w(i,i^{\prime})$ and $w(u,u^{\prime})$ are normalized to sum to $1$ for each item $i$ and each user $u$, respectively. The optimization problem in Eq. (4) uses a generalized alignment penalty, which penalizes item and user feature vectors by small amounts, pushing them towards each other in the latent space. This effect is called the Differential Shrinkage effect [26] and forms the basis for the generalized alignment-biased algorithm.
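The logistic similarity coefficients above can be computed for all pairs at once, as in the following NumPy sketch. The function name `sigmoid_similarity` and the default parameter values are illustrative assumptions; excluding self-similarity by zeroing the diagonal is also our assumption, since the text does not say how the case $i=i^{\prime}$ is handled.

```python
import numpy as np

def sigmoid_similarity(X, theta=1.0, c=1.0, normalize=True):
    """Pairwise weights exp(theta*(x^T x' - c)) / (1 + exp(theta*(x^T x' - c))).

    X : (n, d) binary attribute matrix (A for items, B for users).
    """
    shared = X @ X.T                                   # number of shared attributes
    W = 1.0 / (1.0 + np.exp(-theta * (shared - c)))    # algebraically the same logistic form
    np.fill_diagonal(W, 0.0)                           # assumption: drop self-similarity
    if normalize:
        row_sums = W.sum(axis=1, keepdims=True)
        W = W / np.where(row_sums > 0, row_sums, 1.0)  # each row sums to 1
    return W

# Made-up attribute matrix: items 0 and 1 share one attribute, item 2 shares none.
A = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
W_item = sigmoid_similarity(A, theta=1.0, c=1.0)
```

After normalization, item 0 places more weight on item 1 than on item 2, because they share an attribute.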

3.3.2 Tag informed technique

Since many commercial recommender engines allow users to create personalized tags, Zhen et al. [41] proposed a method to exploit information from these tags. Following the work of Li and Yeung [18], we make two user-specific and two item-specific latent feature vectors as similar as possible if the two users have similar tagging history and the two items have similar attributes by adding a tag-based penalty to the baseline optimization problem:

$\displaystyle\min_{P,Q}~L_{\textit{TG}}=\min_{P,Q}\sum_{(u,i)\in T}(r_{ui}-p_{u}^{T}q_{i})^{2}+\lambda\left(\sum_{u}\left\|p_{u}\right\|^{2}+\gamma\sum_{i}\left\|q_{i}\right\|^{2}\right)+\lambda\sum_{u=1}^{N}\sum_{u^{\prime}=1}^{N}\|p_{u}-p_{u^{\prime}}\|^{2}w(u,u^{\prime})+\lambda\gamma\sum_{i=1}^{M}\sum_{i^{\prime}=1}^{M}\|q_{i}-q_{i^{\prime}}\|^{2}w(i,i^{\prime}),$ (5)

where the subscript “TG” stands for “tag”, indicating where the original idea came from, $w(u,u^{\prime})$ is a measure of similarity between two users based on their tagging history, and $w(i,i^{\prime})$ is a measure of similarity between two items based on their attributes. We use the cosine similarity measures used by Zhen et al. [41]:

$\displaystyle w(i,i^{\prime})=\frac{a_{i}^{T}a_{i^{\prime}}}{\|a_{i}\|\|a_{i^{\prime}}\|},$

$\displaystyle w(u,u^{\prime})=\frac{b_{u}^{T}b_{u^{\prime}}}{\|b_{u}\|\|b_{u^{\prime}}\|}.$
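The cosine similarities can be computed for all pairs with a single matrix product, as sketched below. The function name `cosine_similarity` is hypothetical, and assigning similarity zero to rows with no attributes is our assumption (the formula is undefined for an all-zero vector).

```python
import numpy as np

def cosine_similarity(X):
    """Pairwise cosine similarity between the rows of an attribute matrix X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms = np.where(norms > 0, norms, 1.0)   # assumption: all-zero rows get similarity 0
    Xn = X / norms                            # rows scaled to unit length
    return Xn @ Xn.T

# Made-up attribute matrix; the last row has no attributes at all.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
W_cos = cosine_similarity(A)
```

For binary attributes all entries lie in $[0,1]$, and the matrix is symmetric by construction.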

3.3.3 Alternating gradient descent

With both $P$ and $Q$ unknown, the optimization problems in Eqs (4) and (5) are not convex. They can be solved using an alternating gradient descent algorithm [17], moving along the gradient with respect to $p_{u}$ while keeping $q_{i}$ fixed, and vice versa. Gradient descent is a local search method for minimizing a function; on the training dataset it reaches a local minimum, which is not guaranteed to be the global minimum but may still be a good solution. In our setting, finding a local minimum over the known ratings is what allows the model to predict the unknown ratings. Let $\nabla^{\textit{gAB}}_{u}$ denote the derivative of $L_{\textit{gAB}}$ with respect to $p_{u}$, and $\nabla^{\textit{gAB}}_{i}$ its derivative with respect to $q_{i}$, in both cases up to a constant factor that can be absorbed into the learning rate. Then

$\displaystyle\nabla^{\textit{gAB}}_{u}=\sum_{i\in T_{u\cdot}}-(r_{ui}-p_{u}^{T}q_{i})q_{i}+\lambda\left(p_{u}-\sum_{u^{\prime}=1}^{N}w(u,u^{\prime})p_{u^{\prime}}\right),$ (6)

$\displaystyle\nabla^{\textit{gAB}}_{i}=\sum_{u\in T_{\cdot i}}-(r_{ui}-p_{u}^{T}q_{i})p_{u}+\lambda\gamma\left(q_{i}-\sum_{i^{\prime}=1}^{M}w(i,i^{\prime})q_{i^{\prime}}\right),$ (7)

for every $u=1,2,\ldots,N$ and $i=1,2,\ldots,M$. Similarly, using the relation $\|p_{u}-p_{u^{\prime}}\|^{2}=\|p_{u}\|^{2}+\|p_{u^{\prime}}\|^{2}-2p_{u}^{T}p_{u^{\prime}}$, for the Tag informed technique we have

$\displaystyle\nabla^{\textit{TG}}_{u}=\sum_{i\in T_{u\cdot}}-(r_{ui}-p_{u}^{T}q_{i})q_{i}+\lambda\left((1+2w_{.u})p_{u}-2\sum_{u^{\prime}=1}^{N}w(u,u^{\prime})p_{u^{\prime}}\right),$ (8)

$\displaystyle\nabla^{\textit{TG}}_{i}=\sum_{u\in T_{\cdot i}}-(r_{ui}-p_{u}^{T}q_{i})p_{u}+\lambda\gamma\left((1+2w_{i.})q_{i}-2\sum_{i^{\prime}=1}^{M}w(i,i^{\prime})q_{i^{\prime}}\right),$ (9)

for every $u=1,2,\ldots,N$ and $i=1,2,\ldots,M$ , where

$\displaystyle w_{.u}=\sum_{u^{\prime}=1}^{N}w(u,u^{\prime}),$

$\displaystyle w_{i.}=\sum_{i^{\prime}=1}^{M}w(i,i^{\prime}).$
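As a sketch of where Eq. (8) comes from, the three parts of $L_{\textit{TG}}$ that involve $p_{u}$ can be differentiated separately. The derivation below assumes symmetric weights, $w(u,u^{\prime})=w(u^{\prime},u)$, which holds for the cosine similarity; the overall factor of 2 is absorbed into the learning rate. The item-side expression in Eq. (9) follows in the same way.

```latex
% Sketch: user-side gradient of L_TG, assuming w(u,u') = w(u',u).
\begin{align*}
\frac{\partial}{\partial p_u}\sum_{(v,i)\in T}(r_{vi}-p_v^{T}q_i)^2
  &= -2\sum_{i\in T_{u\cdot}}(r_{ui}-p_u^{T}q_i)\,q_i,\\
\frac{\partial}{\partial p_u}\,\lambda\sum_{v}\|p_v\|^2
  &= 2\lambda\,p_u,\\
\frac{\partial}{\partial p_u}\,\lambda\sum_{v}\sum_{v'}w(v,v')\,\|p_v-p_{v'}\|^2
  &= 4\lambda\sum_{u'}w(u,u')\,(p_u-p_{u'})
   = 4\lambda\Bigl(w_{.u}\,p_u-\sum_{u'}w(u,u')\,p_{u'}\Bigr).
\end{align*}
% p_u appears once as p_v and once as p_{v'}, hence the factor 4 in the last line.
% Adding the three lines and dividing by 2 gives Eq. (8):
\[
\tfrac{1}{2}\,\frac{\partial L_{\textit{TG}}}{\partial p_u}
  = \sum_{i\in T_{u\cdot}}-(r_{ui}-p_u^{T}q_i)\,q_i
  + \lambda\Bigl((1+2w_{.u})\,p_u-2\sum_{u'}w(u,u')\,p_{u'}\Bigr)
  = \nabla^{\textit{TG}}_{u}.
\]
```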

We can see that, when compared with Eqs (6)–(7), the selective shrinkage effect is somewhat attenuated in Eqs (8)–(9). This is most clearly seen if we normalize the weights to sum to one, i.e., $w_{i.}=w_{.u}=1$. Then, Eqs (8)–(9) simply become

$\displaystyle\nabla^{\textit{TG}}_{u}=\sum_{i\in T_{u\cdot}}-(r_{ui}-p_{u}^{T}q_{i})q_{i}+3\lambda\left(p_{u}-\frac{2}{3}\sum_{u^{\prime}=1}^{N}w(u,u^{\prime})p_{u^{\prime}}\right),$ (10)

$\displaystyle\nabla^{\textit{TG}}_{i}=\sum_{u\in T_{\cdot i}}-(r_{ui}-p_{u}^{T}q_{i})p_{u}+3\lambda\gamma\left(q_{i}-\frac{2}{3}\sum_{i^{\prime}=1}^{M}w(i,i^{\prime})q_{i^{\prime}}\right),$ (11)

Equations (10)–(11) reveal a curious factor of $\frac{2}{3}$ in front of the weighted centroid, which clearly dampens the Tag informed algorithm’s corresponding shrinkage effect.
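The user-side expressions in Eqs (6) and (8) can be written in vectorized form, which also makes it easy to check numerically that Eq. (8) reduces to Eq. (10) when the weights are normalized. The sketch below uses hypothetical names (`grad_users`, `W_user`) and random toy data; it is an illustration under these assumptions, not the implementation used in our experiments.

```python
import numpy as np

def grad_users(R, mask, P, Q, W_user, lam, method="gAB"):
    """Stack the vectors of Eq. (6) (method="gAB") or Eq. (8) (method="TG") as rows."""
    E = np.where(mask, R - P @ Q.T, 0.0)        # residuals on known ratings only
    fit = -E @ Q                                # row u: sum_i -(r_ui - p_u^T q_i) q_i
    centroid = W_user @ P                       # row u: sum_u' w(u,u') p_u'
    if method == "gAB":
        return fit + lam * (P - centroid)
    w_tot = W_user.sum(axis=1, keepdims=True)   # w_{.u} for every user
    return fit + lam * ((1 + 2 * w_tot) * P - 2 * centroid)

# Random toy problem (made-up sizes): 4 users, 3 items, K = 2.
rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(4, 3)).astype(float)
mask = rng.random((4, 3)) < 0.7
P = rng.standard_normal((4, 2))
Q = rng.standard_normal((3, 2))
W = rng.random((4, 4))
W = W / W.sum(axis=1, keepdims=True)            # normalized weights: w_{.u} = 1
lam = 0.1

g_tg = grad_users(R, mask, P, Q, W, lam, method="TG")
E = np.where(mask, R - P @ Q.T, 0.0)
eq10 = -E @ Q + 3 * lam * (P - (2.0 / 3.0) * (W @ P))
same = np.allclose(g_tg, eq10)                  # Eq. (8) equals Eq. (10) here
```

The item-side expressions in Eqs (7) and (9) follow by exchanging the roles of $P$ and $Q$, transposing the residual matrix, and multiplying the penalty by $\gamma$.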

3.4 Algorithms

The algorithms are initialized with two matrices $P^{(0)}$ and $Q^{(0)}$, usually populated with small random values, and the vectors $p_{u}$ and $q_{i}$ are then iteratively updated for all users and items in the representation matrix, namely, for all $u=1,\ldots,N$ and all $i=1,\ldots,M$. The updating equations are given by

$\displaystyle p_{u}^{(j+1)}=p_{u}^{(j)}-\eta\nabla^{\textit{TG}}_{u}(p_{u}^{(j)},q_{i}^{(j)}),$ (12)

$\displaystyle q_{i}^{(j+1)}=q_{i}^{(j)}-\eta\nabla^{\textit{TG}}_{i}(p_{u}^{(j)},q_{i}^{(j)}),$ (13)

for the Tag informed algorithm, and

$\displaystyle p_{u}^{(j+1)}=p_{u}^{(j)}-\eta\nabla^{\textit{gAB}}_{u}(p_{u}^{(j)},q_{i}^{(j)}),$ (14)

$\displaystyle q_{i}^{(j+1)}=q_{i}^{(j)}-\eta\nabla^{\textit{gAB}}_{i}(p_{u}^{(j)},q_{i}^{(j)}),$ (15)

for the generalized alignment-biased algorithm, where $j$ is the iteration index and $\eta$ is the learning rate, sometimes called the step size of the gradient. The algorithm stops when the relative improvement of the objective in iteration $j+1$ falls below some threshold $\varepsilon$. Its output is two lower-dimensional matrices of rank $K$, whose product is an approximation of the original matrix. The predicted ratings for all missing values are simply $\hat{r}_{ui}=p_{u}^{T}q_{i}$. The Tag informed collaborative filtering algorithm can be illustrated by the following pseudo-code:

Algorithm (Tag informed)
Input: $R=\left[r_{ui}\right]_{N\times M}$
Output: $P, Q$
1: initialize $j\longleftarrow 0$ and choose $P^{(0)}$, $Q^{(0)}$
2: repeat
3: for all $u=1,\ldots,N$ and $i=1,\ldots,M$ do
4: compute $\nabla^{\textit{TG}}_{u}$ and $\nabla^{\textit{TG}}_{i}$ with Eqs (8) and (9)
5: update $p_{u}^{(j+1)}$ and $q_{i}^{(j+1)}$ with Eqs (12) and (13)
6: end for
7: until $[L_{\textit{TG}}(P^{(j)},Q^{(j)})-L_{\textit{TG}}(P^{(j+1)},Q^{(j+1)})]/L_{\textit{TG}}(P^{(j)},Q^{(j)})<\varepsilon$
8: return P, Q

Similarly, the generalized alignment-biased algorithm can be written as follows:

Algorithm (Generalized alignment-biased)
Input: $R=\left[r_{ui}\right]_{N\times M}$
Output: $P, Q$
1: initialize $j\longleftarrow 0$ and choose $P^{(0)}$, $Q^{(0)}$
2: repeat
3: for all $u=1,\ldots,N$ and $i=1,\ldots,M$ do
4: compute $\nabla^{\textit{gAB}}_{u}$ and $\nabla^{\textit{gAB}}_{i}$ with Eqs (6) and (7)
5: update $p_{u}^{(j+1)}$ and $q_{i}^{(j+1)}$ with Eqs (14) and (15)
6: end for
7: until $[L_{\textit{gAB}}(P^{(j)},Q^{(j)})-L_{\textit{gAB}}(P^{(j+1)},Q^{(j+1)})]/L_{\textit{gAB}}(P^{(j)},Q^{(j)})<\varepsilon$
8: return P, Q
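The two pseudo-code listings can be sketched as one NumPy routine, shown below. This is a minimal illustration and not the MATLAB implementation used in our experiments: the function names, default hyperparameters, small random initialization and toy data are all assumptions, and the sketch rejects the final non-improving step so that the recorded objective never increases, which is a small deviation from the listings.

```python
import numpy as np

def penalty(F, W, method):
    """Content penalty on a factor matrix F (rows are p_u or q_i) with weights W."""
    if method == "gAB":
        return -np.sum(W * (F @ F.T))               # -sum w(x,x') f_x^T f_x'
    sq = np.sum(F ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2 * F @ F.T
    return np.sum(W * dist2)                         # sum w(x,x') ||f_x - f_x'||^2

def loss(R, mask, P, Q, Wu, Wi, lam, gamma, method):
    """Objective of Eq. (4) (method="gAB") or Eq. (5) (method="TG")."""
    E = np.where(mask, R - P @ Q.T, 0.0)
    return (np.sum(E ** 2)
            + lam * (np.sum(P ** 2) + gamma * np.sum(Q ** 2))
            + lam * penalty(P, Wu, method)
            + lam * gamma * penalty(Q, Wi, method))

def grads(R, mask, P, Q, Wu, Wi, lam, gamma, method):
    E = np.where(mask, R - P @ Q.T, 0.0)
    if method == "gAB":                              # Eqs (6)-(7)
        gP = -E @ Q + lam * (P - Wu @ P)
        gQ = -E.T @ P + lam * gamma * (Q - Wi @ Q)
    else:                                            # Eqs (8)-(9)
        wu = Wu.sum(axis=1, keepdims=True)
        wi = Wi.sum(axis=1, keepdims=True)
        gP = -E @ Q + lam * ((1 + 2 * wu) * P - 2 * Wu @ P)
        gQ = -E.T @ P + lam * gamma * ((1 + 2 * wi) * Q - 2 * Wi @ Q)
    return gP, gQ

def fit(R, mask, Wu, Wi, K=2, lam=0.1, eta=0.01, eps=1e-6,
        method="gAB", max_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    N, M = R.shape
    gamma = N / M
    P = 0.1 * rng.standard_normal((N, K))            # assumed small random start
    Q = 0.1 * rng.standard_normal((M, K))
    prev = loss(R, mask, P, Q, Wu, Wi, lam, gamma, method)
    history = [prev]
    for _ in range(max_iter):
        gP, gQ = grads(R, mask, P, Q, Wu, Wi, lam, gamma, method)
        P_new, Q_new = P - eta * gP, Q - eta * gQ    # Eqs (12)-(15)
        cur = loss(R, mask, P_new, Q_new, Wu, Wi, lam, gamma, method)
        if (prev - cur) / prev < eps:                # relative-improvement stopping rule
            break
        P, Q, prev = P_new, Q_new, cur
        history.append(cur)
    return P, Q, history

# Toy data (made up): zeros mark unknown ratings; users 0-1 and 2-3 are similar.
R = np.array([[5, 4, 0], [4, 0, 1], [1, 0, 5], [0, 1, 4]], dtype=float)
mask = R > 0
Wu = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
Wi = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
P_gab, Q_gab, hist_gab = fit(R, mask, Wu, Wi, method="gAB")
P_tg, Q_tg, hist_tg = fit(R, mask, Wu, Wi, method="TG")
```

Both factor matrices are updated from the same iterate $(P^{(j)},Q^{(j)})$, as in Eqs (12)–(15). The similarity matrices in the toy example are symmetric, in which case the update directions are exactly half the gradients of Eqs (4) and (5).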

3.5 Initialization strategy

We have already mentioned that, when both $P$ and $Q$ are unknown, the optimization problems Eqs (4) and (5) are not convex, which means the alternating gradient descent algorithm will give us local solutions at best. Hence, a good initialization strategy is useful. SVD is a matrix factorization technique commonly used for producing low-rank approximations [3, 4]. Given a matrix $R\in\mathbb{R}^{N\times M}$ with $\text{rank}(R)=r$ , the Singular Value Decomposition (SVD) of $R$ is defined as follows:

$R=PSQ^{T}$

where $P\in\mathbb{R}^{N\times N}$, $Q\in\mathbb{R}^{M\times M}$ and $S\in\mathbb{R}^{N\times M}$. The matrices $P$ and $Q$ are orthogonal, with their columns being the eigenvectors of $RR^{T}$ and $R^{T}R$, respectively. The middle matrix $S$ is a diagonal matrix with $r$ nonzero elements, which are the singular values of $R$. Therefore, the effective dimensions of the three matrices $P$, $S$ and $Q$ are $N\times r$, $r\times r$ and $M\times r$, respectively. The first $r$ diagonal elements $(\sigma_{1},\sigma_{2},\ldots,\sigma_{r})$ of $S$ have the property that $\sigma_{1}\geqslant\sigma_{2}\geqslant\ldots\geqslant\sigma_{r}>0$.

An important property of SVD, which is particularly useful in recommender systems, is that it provides the optimal low-rank approximation to the original matrix $R$ as a product of three smaller matrices. Keeping the $K$ largest singular values in $S$ and setting the remaining smaller ones to zero gives a reduced matrix, denoted by $S_{K}$. Deleting the corresponding columns of $P$ and $Q$, namely their last $r-K$ columns, gives two reduced matrices, denoted by $P_{K}\in\mathbb{R}^{N\times K}$ and $Q_{K}\in\mathbb{R}^{M\times K}$, respectively. The truncated SVD is represented as

$\displaystyle R_{K}=P_{K}S_{K}Q_{K}^{T}.$

The idea of the initialization is that, for a given $K$, one applies truncated SVD to the representation matrix $R$ with unknown entries set to zero [26]. As a result, the initial matrices $P^{(0)}$ and $Q^{(0)}$ become

$\displaystyle P^{(0)}_{\textit{SVD}}=P_{K}S_{K},$

$\displaystyle Q^{(0)}_{\textit{SVD}}=Q_{K}.$

One more step towards the final versions of $P^{(0)}$ and $Q^{(0)}$ is to add some degree of randomness to the SVD initialization. The matrix $P^{(0)}$ is initialized as

$\displaystyle P^{(0)}=kP^{(0)}_{\textit{SVD}}+(1-k)P^{(0)}_{\textit{RANDOM}},$ (16)

where the entries of $P_{\textit{RANDOM}}^{(0)}$ are drawn from the Gaussian distribution $N(0,\sigma^{2})$. After a series of experiments, we chose $\sigma=0.7$ as the most effective value. As can be seen from Eq. (16), $k=0$ means that the SVD initialization is not applied, while $k=1$ means that the SVD result is used as is. Values between 0 and 1 introduce a degree of randomness into the initial matrices $P^{(0)}$ and $Q^{(0)}$; for example, $k=0.5$ still introduces randomness while allowing the matrix factorization algorithms to learn from the training set. The same initialization strategy is applied to $Q^{(0)}$ by replacing $P$ with $Q$ in Eq. (16). The evaluation, the key parameters of the algorithms, and the results are discussed in the next section.
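The initialization strategy of Eq. (16) can be sketched with NumPy's SVD routine as follows. The function name `svd_init`, the argument names and the toy matrix are illustrative assumptions; the defaults $k=0.5$ and $\sigma=0.7$ are the values discussed above.

```python
import numpy as np

def svd_init(R_zero_filled, K, k=0.5, sigma=0.7, seed=0):
    """Blend a truncated-SVD start with Gaussian noise, as in Eq. (16).

    R_zero_filled : (N, M) rating matrix with unknown entries set to zero
    K             : number of latent factors to keep
    k             : mixing weight (0 = purely random, 1 = purely SVD)
    """
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(R_zero_filled, full_matrices=False)
    P_svd = U[:, :K] * s[:K]          # P_K S_K: scale each kept column by its singular value
    Q_svd = Vt[:K, :].T               # Q_K
    P0 = k * P_svd + (1 - k) * rng.normal(0.0, sigma, size=P_svd.shape)
    Q0 = k * Q_svd + (1 - k) * rng.normal(0.0, sigma, size=Q_svd.shape)
    return P0, Q0

# Rank-one toy matrix (made up): with k = 1 and K = 1 the product P0 Q0^T recovers it.
R_toy = np.outer([1.0, 2.0, 3.0], [2.0, 1.0])
P0, Q0 = svd_init(R_toy, K=1, k=1.0)
P0_mixed, Q0_mixed = svd_init(R_toy, K=1)   # default k = 0.5 adds randomness
```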

4. Evaluation

In this section, we analyze the accuracy of the predictions and recommendations of our proposed generalized alignment-biased (gAB) and Tag informed (TG) algorithms. Several experiments were performed to test the efficiency of the proposed method using two real data sources, MovieLens [23] and NetFlix [24]. We compare our algorithms with the classic SVD algorithm implemented in MATLAB, the Bayesian Non negative Matrix Factorization (BNMF) method [11], Imputation-based Multiplicative update rules (IMULT) [29], and the Enhanced SVD (ESVD) model [9]. Because the results of the three competing approaches [9, 11, 29] have been taken from their papers, it was not possible for us to compare our approach with them in some cases. The proposed method differs from state-of-the-art methods in that it includes an initialization step using SVD and uses user and item similarities in the latent space to guide the iterative solutions towards the true ratings. Our method outperforms ESVD, IMULT, and BNMF in all numerical experiments for which results are reported in their papers. Moreover, the accuracy of the Tag informed algorithm appears to be lower than that of the generalized alignment-biased algorithm. We think this is due to its much dampened shrinkage effect. All the algorithms were implemented in MATLAB 7.0 on a personal computer equipped with a 3.20 GHz Intel Core i5-4460 processor and 4 GB RAM.

4.1 Dataset description

We used three real datasets to conduct the experiments: “MovieLens 1M”, “MovieLens 10M” and “NetFlix”. MovieLens and NetFlix [20] are web-based research recommender systems. The MovieLens datasets, first released in 1998, describe people’s expressed preferences for movies. These preferences take the form of $\langle\textit{user, item, rating, timestamp}\rangle$ tuples, each the result of a person expressing a preference (a 1–5 rating) for a movie at a particular time. The preferences were entered through the MovieLens web site, a recommender system that asks its users to rate movies in order to receive personalized movie recommendations. Four MovieLens datasets have been released, each named after the approximate number of ratings it contains. We use “MovieLens 1M” and “MovieLens 10M” in our experiments. The Netflix Prize dataset was made available in 2006 as part of the Netflix Prize, a competition to improve the accuracy of predictions. Summary statistics for the datasets used in this paper are shown in Table 1.

The ratings are converted into a user-item matrix. The rating range is from 1 to 5, where 1 represents dislike and 5 represents a strong preference. All unrated items have a value of zero.

In order to compute a similarity measure between each pair of movies in the MovieLens datasets, we represent the movies in a matrix whose columns are: (unknown, Action, Adventure, Animation, Children’s, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War, Western). Each movie is represented by a row of the matrix; in each column, a 1 indicates that the movie belongs to that genre and a 0 indicates that it does not. A movie can belong to several genres at once. Users are represented in the same way. The columns of the information matrix for users are (Age in range 1–10, Age in range 11–20, Age in range 21–30, Age in range 31–40, Age in range 41–50, Age in range 51–60, male, female).

As the NetFlix dataset does not contain the genre information of movies, we extracted this information from IMDB [12]. To this end, we matched movie names from NetFlix to IMDB and then extracted the genres of the movies.
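The binary attribute matrix described above can be sketched as follows. The helper name `one_hot_rows` and the three sample movies are hypothetical, and counting shared genres with a matrix product is only one simple stand-in for a similarity measure; the measure actually used by the algorithms is defined in the methodology section.

```python
import numpy as np

GENRES = ["unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
          "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
          "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]

def one_hot_rows(labels_per_row, columns):
    """Build a 0/1 matrix: one row per movie (or user), one column per attribute."""
    index = {name: j for j, name in enumerate(columns)}
    A = np.zeros((len(labels_per_row), len(columns)), dtype=int)
    for i, labels in enumerate(labels_per_row):
        for name in labels:
            A[i, index[name]] = 1
    return A

# Hypothetical movies, each tagged with one or more genres.
movies = [["Action", "Sci-Fi"], ["Comedy", "Romance"], ["Action", "Thriller"]]
A = one_hot_rows(movies, GENRES)

# shared[i, j] = number of genres that movies i and j have in common.
shared = A @ A.T
```

The same helper applies to users if the column list is replaced by the age ranges and gender attributes listed above.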

Table 1
Quantitative summary of the ratings datasets used. The sole computed column, Density, represents the percentage of cells in the full user-item matrix that contain rating values

Name	Date range	Rating scale	Users	Movies	Ratings	Density
MovieLens 1M	2000–2003	1–5	6,040	3,706	1,000,209	4.47%
MovieLens 10M	1995–2009	1–5	69,878	10,681	10,000,054	1.34%
Netflix	1998–2005	1–5	480,000	17,000	100,000,000	1.178%
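As a quick check of the Density column, the following sketch recomputes it from the user, movie and rating counts of the two MovieLens rows. The function name `density` is illustrative. The Netflix row lists rounded counts, so the same formula applied to them gives roughly 1.23% rather than the tabulated 1.178%, which was presumably computed from the exact counts.

```python
def density(n_users, n_items, n_ratings):
    """Percentage of user-item cells that hold a rating."""
    return 100.0 * n_ratings / (n_users * n_items)

ml_1m = density(6040, 3706, 1_000_209)
ml_10m = density(69_878, 10_681, 10_000_054)
print(f"MovieLens 1M: {ml_1m:.2f}%  MovieLens 10M: {ml_10m:.2f}%")
```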

4.2 Parameters summary

All the required parameters are summarized in Table 2. We ran our algorithms with factorizations of dimension $K=5$ and convergence value $\epsilon=0.005$. After a series of experiments we chose $\sigma=0.7$ as the most efficient value. For the generalized alignment-biased algorithm we chose $c_{1}=c_{2}=1$, which activates the alignment penalty as long as two items (or two users) share any attribute at all. For the same algorithm, large smoothing parameters $\theta_{1}$ and $\theta_{2}$ cause it to behave very much like the alignment-biased algorithm, whereas small values of $\theta_{1}$ and $\theta_{2}$ essentially eliminate the effect of the alignment penalty. To focus on the main ideas rather than fine details, we only illustrate this algorithm with $\theta_{1}=\theta_{2}=1$. The parameter $k=0.5$ is chosen in order to introduce randomness and to allow the matrix factorization algorithms to learn from the training set. For gradient descent algorithms, it is well understood that the learning rate $\eta$ should be kept fairly small to ensure that each iteration moves in a descent direction. On the other hand, for practical reasons (e.g., so that the algorithm finishes in reasonable time) we would like to use the largest feasible $\eta$ that still ensures that we are moving downhill. The moderate value $\eta=0.006$ was therefore chosen experimentally. The purpose of the scaling factor $\gamma$ is to balance the two penalties, the one on $\sum\|p_{u}\|^{2}$ and the one on $\sum\|q_{i}\|^{2}$, so that the objective function is not dominated by either the user side or the item side. With this in mind, we used $\gamma=\frac{N}{M}$. We also used $\lambda=0.075$, chosen from empirical experiments.

Table 2
Parameters summary

$\epsilon$	$K$	$\lambda$	$\gamma$	$\eta$	$\theta_{1}$	$\theta_{2}$	$c_{1}$	$c_{2}$	$k$	$\sigma$
0.005	5	0.075	$\frac{N}{M}$	0.006	1	1	1	1	0.5	0.7
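To make the roles of $\eta$, $\lambda$ and $\gamma$ concrete, the sketch below runs plain stochastic gradient descent on a regularized baseline factorization. It assumes an objective of the form $\sum(r_{u,i}-p_{u}^{T}q_{i})^{2}+\lambda\left(\sum\|p_{u}\|^{2}+\gamma\sum\|q_{i}\|^{2}\right)$ and deliberately omits the alignment and tag penalties. The function names, the toy matrix and the choice `gamma=1.0` are illustrative assumptions, not the authors' MATLAB code.

```python
import numpy as np

def sgd_epoch(R, P, Q, eta=0.006, lam=0.075, gamma=1.0):
    """One pass of stochastic gradient descent over the observed (non-zero) ratings.

    Assumed objective:
        sum (r_ui - p_u.q_i)^2 + lam * (sum ||p_u||^2 + gamma * sum ||q_i||^2)
    The constant factor 2 of the gradient is absorbed into eta.
    """
    for u, i in zip(*np.nonzero(R)):
        p_u = P[u].copy()                       # keep the old user vector for the item update
        err = R[u, i] - p_u @ Q[i]
        P[u] += eta * (err * Q[i] - lam * p_u)
        Q[i] += eta * (err * p_u - lam * gamma * Q[i])
    return P, Q

def train_rmse(R, P, Q):
    """RMSE over the observed ratings only."""
    mask = R > 0
    return float(np.sqrt(np.mean((R[mask] - (P @ Q.T)[mask]) ** 2)))

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [0, 1, 5, 4]], dtype=float)
P = rng.normal(0, 0.7, size=(4, 2))             # random start, sigma = 0.7
Q = rng.normal(0, 0.7, size=(4, 2))

before = train_rmse(R, P, Q)
for _ in range(200):
    sgd_epoch(R, P, Q)
after = train_rmse(R, P, Q)
```

A small $\eta$ makes each update a cautious step downhill; a larger value would converge in fewer passes but risks overshooting.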

4.3 Results

This section presents the results of the proposed algorithms and compares them with some recent successful approaches [11, 29, 9]. Analyzing the errors of the proposed algorithms in different ways, we conclude that the generalized alignment-biased algorithm achieves considerable precision in the recommendation process. This high precision results from the use of the user and item similarity latent space described in Section 3.3.1.

4.3.1 Accuracy in predictions

The main purpose of a recommender system is to predict the ratings of new items for a test user, called the “active user”. In this section we review two measures used to analyze the accuracy of our algorithms, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). We use these metrics to measure the closeness of the predicted ratings to the actual ones.

$\displaystyle\text{MAE}=\frac{\displaystyle\sum_{(u,i)\in T}\left|r_{u,i}-\hat{r}_{u,i}\right|}{|T|},$ $\displaystyle\text{RMSE}=\sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}\left(r_{u,i}-\hat{r}_{u,i}\right)^{2}},$

Figure 1.

RMSE related to two recommendation algorithms on a subset of 700 users of MovieLens 1M dataset.

where $r_{u,i}$ represents the actual rating of user $u$ for item $i$, $\hat{r}_{u,i}$ represents the predicted rating of user $u$ for item $i$, and $T$ is the training set of all ratings of user $u$ for item $i$. For a given recommender, lower values of MAE and RMSE correspond to higher prediction accuracy.
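The two error measures can be computed directly from paired lists of actual and predicted ratings. The following is a minimal sketch; the function names and the four sample ratings are made up for illustration.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error over paired ratings."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Root mean square error over paired ratings."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

r = [4, 3, 5, 1]                 # actual ratings
r_hat = [3.5, 3.0, 4.0, 2.0]     # predicted ratings
print(mae(r, r_hat), rmse(r, r_hat))   # 0.625 0.75
```

Because RMSE squares the errors before averaging, it penalizes the two errors of size 1 more heavily than MAE does, which is why it is the larger of the two values here.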

Table 3

RMSE values of gAB, TG and Matlab SVD algorithms for MovieLens 10M dataset for different percentages of user-item matrix data ( $L$ ) and different number of latent features ( $k$ )

Number of latent features	Algorithm	$L=$ 100%	$L=$ 80%	$L=$ 60%	$L=$ 50%
$k=2$	gAB	0.9043	0.9108	1.0451	1.0976
	TG	0.9148	0.9157	0.9189	0.9218
	Matlab SVD	3.7313	3.7612	3.8104	3.8539
$k=$ 5	gAB	0.9149	0.9583	1.1047	1.1843
	TG	0.9531	0.9601	1.1193	1.2074
	Matlab SVD	3.8647	3.9054	3.9535	4.0177
$k=$ 7	gAB	0.9364	0.9984	1.1659	1.2051
	TG	0.9861	1.0037	1.1983	1.2739
	Matlab SVD	3.9638	3.9837	4.1905	4.2747

Table 4

MAE values of gAB, TG and Matlab SVD algorithms for MovieLens 10M dataset for different percentages of user-item matrix data ( $L$ ) and different number of latent features ( $k$ )

Number of latent features	Algorithm	$L=$ 100%	$L=$ 80%	$L=$ 60%	$L=$ 50%
$k=2$	gAB	0.6654	0.6767	0.8082	0.9139
	TG	0.6711	0.6771	0.8912	1.0993
	Matlab SVD	3.2422	3.2904	3.3862	3.4192
$k=$ 5	gAB	0.6930	0.7173	0.9830	1.0962
	TG	0.7018	0.7198	1.1354	1.1973
	Matlab SVD	3.2981	3.3638	3.3907	3.4249
$k=$ 7	gAB	0.7205	0.7539	1.0503	1.1985
	TG	0.7415	0.7562	1.1073	1.2198
	Matlab SVD	3.3073	3.4082	3.5692	3.8471

Figure 2.

MAE related to two recommendation algorithms on a subset of 700 users of MovieLens 1M dataset.

Detailed results for a subset of 700 users of the MovieLens 1M dataset are depicted in Figs 1 and 2. The two curves in Fig. 1 show the trends of the RMSE values of the two proposed algorithms on this subset, and the two types of points in Fig. 2 show the corresponding trends of the MAE values. As expected, the gAB algorithm produces more accurate recommendations than the TG algorithm in terms of both RMSE and MAE. We attribute this to the much dampened shrinkage effect of the TG algorithm.

To compare the proposed gAB and TG algorithms with the classic SVD algorithm implemented in Matlab for different percentages of the user-item matrix data, a fraction of the ratings was randomly selected each time. For example, $L=80\%$ means that 80% of the training data (i.e. the known rating scores) was randomly chosen and used to estimate the remaining part. Tables 3 and 4 report the RMSE and MAE of our proposed algorithms and of the Matlab SVD algorithm on the MovieLens 10M dataset for different numbers of latent features ($k=2, 5, 7$) and different percentages of the training set. The MAE and RMSE of both gAB and TG are lower than those obtained with the Matlab SVD algorithm. Moreover, the TG algorithm is less accurate than the gAB algorithm in most settings.

Because the three competing matrix factorization methods BNMF [11], IMULT [29], and ESVD [9] have only been evaluated with 80% of the data as training, we compare the RMSE and MAE values for several numbers of latent features ($k=2, 5, 7$) on the MovieLens 10M dataset with 80% of the training data chosen at random. Tables 5 and 6 show that the gAB algorithm nearly always achieves higher accuracy than the TG, ESVD, IMULT, and BNMF methods. For example, when $k=7$, the gAB algorithm attains an RMSE (MAE) of 0.9984 (0.7539), whereas the values reported for TG, ESVD, IMULT, and BNMF are, respectively, 1.0037 (0.7562), 1.1937 (0.9381), 1.0086 (0.7763), and 1.0193 (0.7307); here only the MAE of BNMF is lower than that of gAB. Overall, the gAB algorithm achieves better performance than the TG algorithm and the other recent methods, and the TG algorithm performs better than ESVD, IMULT, and BNMF in most cases. Table 7 shows the average RMSE values of 10 independent runs on the MovieLens and NetFlix datasets with $k=2$ latent features. The gAB algorithm performs slightly better than the other methods. For example, the average RMSE of the gAB algorithm on the NetFlix dataset is 0.9319, compared with 0.9513, 0.9452 and 0.9412 for the ESVD, BNMF and TG algorithms, respectively.
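The random selection of a fraction $L$ of the known ratings, as used for Tables 3 and 4, might look like the sketch below. The function name `sample_known_ratings`, the fixed seed and the toy matrix are illustrative assumptions; the exact sampling procedure used in the experiments is not specified in the text.

```python
import numpy as np

def sample_known_ratings(R, fraction, seed=0):
    """Keep a random `fraction` of the known (non-zero) ratings; zero out the rest."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(R)
    n_keep = int(round(fraction * rows.size))
    keep = rng.choice(rows.size, size=n_keep, replace=False)
    sampled = np.zeros_like(R)
    sampled[rows[keep], cols[keep]] = R[rows[keep], cols[keep]]
    return sampled

# Toy matrix with 10 known ratings (zeros mark unrated items).
R = np.array([[5, 3, 0, 1, 2], [4, 0, 0, 1, 3], [1, 0, 0, 5, 4]], dtype=float)
R80 = sample_known_ratings(R, 0.8)   # L = 80%: 8 of the 10 ratings are kept
```

The ratings that were zeroed out form the held-out part against which the predictions can then be compared.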

Table 5

RMSE values of ESVD, IMULT, BNMF, TG and gAB for 80% of MovieLens 10M dataset as training and different number of latent features ( $k$ )

$k$	ESVD	IMULT	BNMF	TG	gAB
2	0.9615	0.9160	0.9229	0.9157	0.9108
5	0.9938	0.9710	0.9598	0.9601	0.9583
7	1.1937	1.0086	1.0193	1.0037	0.9984

Table 6

MAE values of ESVD, IMULT, BNMF, TG and gAB for 80% of MovieLens 10M dataset as training and different number of latent features ( $k$ )

$k$	ESVD	IMULT	BNMF	TG	gAB
2	0.9201	0.7094	0.6760	0.6771	0.6767
5	0.9289	0.7514	0.7203	0.7198	0.7173
7	0.9381	0.7763	0.7307	0.7562	0.7539

Table 7

Average RMSE values for ESVD, BNMF, IMULT, TG and gAB for MovieLens and NetFlix datasets with latent feature ( $k=2$ )

Data set	ESVD	BNMF	IMULT	TG	gAB
MovieLens 10M	0.9318	0.9172	0.9096	0.9101	0.9091
NetFlix	0.9513	0.9452	–	0.9412	0.9319

4.3.2 Quality of recommendations

In this section we study the quality of the recommendations through two popular measures, Precision and Recall. The precision of a recommender system is measured by the number of highly rated items in the recommendation list that the user likes, as stated in Eq. (17). More precisely, precision is defined as the percentage of correctly recommended items among all recommended items.

$\displaystyle\text{precision}=\frac{\left|\text{interesting items}\cap\text{recommended items}\right|}{\left|\text{recommended items}\right|}$ (17)

The recall measure is defined as the percentage of items correctly recommended over the number of items that should be recommended.

$\displaystyle\text{recall}=\frac{\left|\text{interesting items}\cap\text{recommended items}\right|}{\left|\text{interesting items}\right|}$
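Both measures reduce to set operations on the recommended and the interesting items. The sketch below is a minimal illustration; the item identifiers, the ratings and the threshold of 4 (one of the two rating borders used in the tables of this section) are made-up example values.

```python
def precision_recall(recommended, interesting):
    """Return (precision, recall) for two collections of item ids."""
    recommended, interesting = set(recommended), set(interesting)
    hits = len(recommended & interesting)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(interesting) if interesting else 0.0
    return precision, recall

# Hypothetical example: an item is "interesting" if its actual rating is >= 4.
actual = {"a": 5, "b": 4, "c": 2, "d": 4.5, "e": 3}
interesting = [item for item, rating in actual.items() if rating >= 4]   # a, b, d
recommended = ["a", "c", "d", "e"]
p, r = precision_recall(recommended, interesting)   # p = 2/4, r = 2/3
```

Two of the four recommended items are interesting, giving a precision of 50%, and two of the three interesting items were recommended, giving a recall of about 67%.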

In Tables 8 and 9 we show the precision and recall for three algorithms gAB, TG, and Matlab SVD. As may be seen, the proposed algorithms gAB and TG provide much better results than Matlab SVD in terms of precision and recall. The results show that quality of gAB algorithm is higher than TG algorithm.

Table 8

Quality of recommendations measured by Precision (in percentage)

Data set	Algorithm	The border of rate value
		Rate value $\geqslant$ 3.5	Rate value $\geqslant$ 4
MovieLens 1M	Generalized alignment biased	71.58	89.06
	Tag informed	48.01	76.79
	Matlab SVD	18.24	19.32
MovieLens 10M	Generalized alignment biased	70.76	87.0
	Tag informed	47.95	51.05
	Matlab SVD	17.34	19.10

Table 9

Quality of recommendations measured by Recall (in percentage)

Data set	Algorithm	The border of rate value
		Rate value $\geqslant 3.5$	Rate value $\geqslant 4$
MovieLens 1M	Generalized alignment biased	36.09	47.25
	Tag informed	26.18	32.82
	Matlab SVD	7.85	11.00
MovieLens 10M	Generalized alignment biased	36.00	42.83
	Tag informed	25.96	32.49
	Matlab SVD	7.39	10.96

5. Conclusion and future work

In this paper, we propose two different matrix factorization based collaborative filtering algorithms which are aware of the predicted content’s metadata. Both approaches use extra penalties with selective shrinkage effects, which shrink items and users that share common attributes toward each other. The generalized alignment-biased algorithm pushes item and user feature vectors towards each other in the latent space by penalizing them with small amounts. The Tag informed algorithm, which has a much dampened shrinkage effect, adds a tag-based penalty to the baseline optimization problem; this penalty makes two item-specific or user-specific latent feature vectors as similar as possible if the two items or users have similar tagging histories. Experiments with two data sets have shown that these content-boosted algorithms can achieve better recommendation accuracy. The Tag informed algorithm appears to be less accurate than the generalized alignment-biased algorithm. We think this is due to its much dampened shrinkage effect.

There are mutual benefits between recommender systems and social platforms. On the one hand, social interactions can be extracted and used as input for the recommender system, as they help to better understand user interests and information needs. By considering information from social networks, including user preferences, the general acceptance of items, and the influence of social friends, recommendations become more trusted by users. Using such additional information about users, we can tackle the problems of cold start and sparsity in the rating matrix to which the matrix factorization is applied. On the other hand, recommender systems can help to improve user participation in social systems by recommending new friends or interesting content such as images, videos, and advertisements. As future work, we aim to integrate social relationships and user-generated tags into user-based collaborative filtering recommender systems based on the matrix factorization technique.

References

1. Adomavicius, Tuzhilin and Khatri, Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, Knowledge and Data Engineering, IEEE Transactions 17 (2005), 734–749.

2. Al Mamunur Rashid, S.K.L., Karypis and Riedl, Clustknn: a highly scalable hybrid model-& memory-based cf algorithm, In Proc. WebKDD, 2006.

3. Datta B.N., Numerical Linear Algebra and Applications, SIAM 2010, ISBN 978-0-89871-685-6.

4. Eldén, Matrix Methods in Data Mining and Pattern Recognition (Fundamentals of Algorithms), Society for Industrial and Applied Mathematics, Philadelphia (2007).

5. Feuerverger and Khatri, Statistical significance of the Netflix challenge, Statistical Science 27 (2012), 202–231.

6. Forbes and Zhu, Content-boosted matrix factorization for recommender systems: Experiments with recipe recommendation, In Proc. the 5th ACM Conference on Recommender Systems, 2011, pp. 261–264.

7. Goldberg, Roeder, Gupta and Perkins, Eigentaste: A constant time collaborative filtering algorithm, Information Retrieval 4 (2001), 133–151.

8. Zhou and Ding, Collaborative Filtering: Weighted Nonnegative Matrix Factorization Incorporating User and Item Graphs, In Proc. the 2010 SIAM International Conference on Data Mining, SIAM, 2010, pp. 199–210.

9. Guan and Guan, Matrix Factorization With Rating Completion: An Enhanced SVD Model for Collaborative Filtering Recommender Systems, In IEEE Access, 2017, pp. 27668–27678.

10. Hawkins D.M., The problem of overfitting, Journal of Chemical Inform and Comp Sc 44 (2004), 1–12.

11. Hernando, Bobadillal and Ortega, A nonnegative matrix factorization for collaborative filtering recommender systems based on a Bayesian probabilistic model, Knowledge-Based Systems 97 (2016), 188–202.

12. IMDB, the Internet Movie Database, http://www.imdb.com/.

13. Jamali and Ester, A matrix factorization technique with trust propagation for recommendation in social networks, In Proc. the fourth ACM conference on Recommender systems, 2010, pp. 135–142.

14. Jannach, Zanker, Felfernig and Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

15. Kim B.M. and Li, Probabilistic model estimation for collaborative filtering based on items attributes, In Proc. the 2004 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, 2004, pp. 185–191.

16. Koren, Factorization meets the neighborhood: A multifaceted collaborative filtering model, In Proc. the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 426–434.

17. Koren, Bell and Volinsky, Matrix factorization techniques for recommender systems, Computer 42 (2009), 30–37.

18. W.J. and Yeung D.Y., Relation regularized matrix factorization, In Proc. the 21st International Joint Conference on Artificial Intelligence, 2009, pp. 1126–1131.

19. Yang, Lyu M.R. and King, Sorec: social recommendation using probabilistic matrix factorization, In Proc. the 17th ACM conference on Information and knowledge management, 2008, pp. 931–940.

20. Maxwell Harper and Konstan J.A., The MovieLens Datasets: History and Context, ACM Transactions on Interactive Intelligent Systems 5 (2005), 1–19.

21. Melville, Mooney R.J. and Nagarajan, Content-boosted collaborative filtering for improved recommendation, In Proc. the 18th National Conference on Artificial Intelligence, 2002, pp. 187–192.

22. Movahedian and Khayyambashi M.R., A tag-based recommender system using rule-based collaborative profile enrichment, Intelligent Data Analysis 18 (2014), 953–972.

23. MovieLens, https://grouplens.org/datasets/movielens/.

24. Netflix dataset used for their competition, http://www.netflixprize.com/download.

25. Nguyen A.T., Denos and Berrut, Improving new user recommendations with rule-based induction on cold user data, In Proc. the 2007 ACM Conference on Recommender Systems, 2007, pp. 121–128.

26. Nguyen and Zhou, Content-boosted matrix factorization techniques for recommender systems, Statistical Analysis and Data Mining 6 (2013), 286–301.

27. Park S.T., Pennock, Madani, Good and DeCoste, Naïve filterbots for robust cold-start recommendations, In Proc. the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, pp. 699–705.

28. Pazzani M.J., A framework for collaborative, content-based and demographic filtering, Artificial Intelligence Review 13 (1999), 393–408.

29. Ranjbar, Moradi, Azami and Jalili, An imputation-based matrix factorization method for improving accuracy of collaborative filtering systems, Engineering Applications of Artificial Intelligence 46 (2015), 58–66.

30. Ren, Zhang and Zhou, Lazy collaborative filtering for data sets with missing values, IEEE Trans Cybern 43 (2013), 1822–1834.

31. Ricci, Rokach, Shapira and Kantor P.B., Recommender Systems Handbook, Springer-Verlag New York, Inc., New York, NY, USA, 1st edition, 2010.

32. Sánchez and Luis, Improving collaborative filtering based recommender systems using pareto dominance, Diss E_Informatica, 2013.

33. Sarwar, Karypis, Konstan and Riedl, Item-based collaborative filtering recommendation algorithms, In Proc. the 10th international conference on World Wide Web. ACM, 2001, pp. 285–291.

34. Sarwar, Konstan J.A., Borchers, Herlocker, Miller and Riedl, Using filtering agents to improve prediction quality in the grouplens research collaborative filtering system, In Proc. the 1998 ACM conference on Computer supported cooperative work, 1998, pp. 345–354.

35. Symeonidis and Zioupos, Matrix and Tensor Factorization Techniques for Recommender Systems, Springer, 2016.

36. Singh, Scalability and sparsity issues in recommender datasets: a survey, Knowl. Inf. Syst. (2018). doi: 10.1007/s10115-018-1254-2.

37. and Khoshgoftaar T.M., A survey of collaborative filtering techniques, Advances in Artificial Intelligence, vol. 2009, Article ID 421425, 19 pages, 2009. doi: 10.1155/2009/421425.

38. Sun H.F., Chen J-L., Liu C-C., Peng, Chen and Cheng, JacUOD: A New Similarity Measurement for Collaborative Filtering, Journal of Computer Science and Technology 27 (2012), 1252–1260.

39. Zahra, Ghazanfar M.A., Khalid, Azam M.A., Naeem and Prugel-Bennett, Novel centroid selection approaches for KMeans-clustering based recommender systems, Information Sciences 320 (2015), 156–189.

40. Zhao, Feng and Liu, Shared collaborative filtering, In Proc. the 5th ACM Conference on Recommender Systems, 2011, pp. 29–36.

41. Zhen, W.J. and Yeung D.Y., TagiCoFi: Tag informed collaborative filtering, In Proc. the 3rd ACM Conference on Recommender Systems, 2009, pp. 69–76.
