Tag-aware recommendation based on Bayesian personalized ranking and feature mapping

Abstract

Collaborative filtering recommendation with implicit feedback (e.g., clicks, views, check-ins) has been gaining increasing attention in real applications. Tagging information is a common resource for complementing implicit feedback in collaborative filtering recommendation. However, existing tag-aware recommendation methods still suffer from the high dimensionality and sparsity of tagging information. They also fail to recognize that recommendation is inherently a ranking-oriented optimization task. To this end, we propose a novel tag-aware recommendation framework that incorporates a tag mapping scheme into a ranking-based collaborative filtering model to boost ranking-oriented personalized recommendation performance. We first build a ranking-oriented optimization model based on the Bayesian personalized ranking optimization criterion with matrix factorization, leveraging implicit feedback to learn the latent feature vectors of users and items. Then, we propose an explicit-to-implicit feature mapping scheme that maps the high-dimensional and sparse explicit tags (i.e., the user-tag weighting matrix and the item-tag weighting matrix) to low-dimensional and compact implicit features of users and items. This mapping serves as a regularization constraint on the latent features derived from implicit feedback. To further enhance recommendation performance, we also introduce users’ neighbor relationships to regularize user latent features based on manifold learning. Experiments on real-world recommendation datasets show that the proposed method outperforms competing methods on ranking-oriented recommendation performance.

Keywords

Tag-aware recommendation, implicit feedback, Bayesian personalized ranking, feature mapping

1. Introduction

Recent years have witnessed the prevalence of personalized recommendation in various real applications, which makes it much easier for people to find what they need. Research indicates that recommendation with implicit feedback (e.g., clicks, browsing, shopping records) has been receiving more attention than recommendation with explicit feedback (e.g., ratings), because implicit feedback is more abundant in real applications and easier to collect. Both rating-based and ranking-based methods are popular, but some recent studies have demonstrated that ranking-based methods may be more suitable for top-$n$ personalized recommendation. The reason is that top-$n$ recommendation is inherently a ranking task, and there is no guarantee that higher rating-prediction accuracy yields better ranking results. Nevertheless, the sparsity of implicit feedback (i.e., user-item binary interactions) drives researchers to incorporate additional information to assist item recommendation. Indeed, in real recommendation scenarios, different users may care about different features of the same item, so we should know what kinds of content information attract users [1]. Fortunately, social tagging, a new type of information, has been gaining widespread popularity in a variety of Web 2.0 applications, ranging from social bookmarking sites (e.g., Delicious, http://www.delicious.com/), music sites (e.g., Last.fm, http://www.lastfm.com/), movie rating sites (e.g., MovieLens), and e-commerce sites (e.g., Amazon, http://amazon.com) to news websites (e.g., Weibo, http://www.weibo.com). Various personalized tags are contributed by users to annotate web resources (also called items). Studies in [2] made preliminary attempts to analyze the utility of these tags and found that user-generated tags are consistent with the web content and capture the underlying features and topics of users’ interests quite well. Tagging data thus constitute a novel source of content data that complements standard implicit feedback. Therefore, it is worthwhile to exploit tagging information to improve recommendation.

There have been numerous studies on tag-aware collaborative filtering (CF) recommendation, which fall into two mainstreams: neighborhood-based and model-based methods. 1) Neighborhood-based methods recommend items to the target user in two ways: items similar to those the user has selected, and items that similar users have selected. Tag-based features treat tags as explicit features describing user preferences and item properties, and can be used to compute item similarity and user similarity, which are readily incorporated into neighborhood-based CF methods [3, 4]. 2) Model-based methods, including matrix factorization [5, 6, 7], diffusion-based models [8], random-walk-based algorithms [9], tensor factorization [10, 11, 12], and deep learning [13], can absorb tagging information to assist item recommendation. For example, Zheng and Li [3] proposed two variants of the standard user-based and item-based methods by calculating user and item similarities based on TF-IDF weighted tag vectors. Ma et al. [7] designed a general framework that incorporates tags issued by users and tags associated with items into a probabilistic matrix factorization model, connecting user-item and user-tag (or item-tag) information through the shared user latent feature space (or item latent feature space). Wu et al. [6] proposed a neighborhood-aware probabilistic matrix factorization regularized by user similarity and item similarity in the tag space. Tensor factorization approaches investigate the ternary association among users, items, and tags simultaneously to generate recommendations [10, 11, 12]. Unfortunately, they cannot be applied to our scenario, where users annotate only a few of the items they have selected. They are unable to capture interactions without tag assignment, because tensor decomposition algorithms require users, items, and tags to co-occur. Besides, this kind of approach is generally computationally expensive.

In addition, some works consider other types of explicit content information (e.g., reviews, attributes) to assist item recommendation. Both tags and reviews are user-generated textual annotations on items. For instance, Zhang et al. proposed an explicit factor model to generate explainable recommendations, which first extracted explicit product features and user opinions, and then generated recommendations by matching the specific product features to the user’s interests and the hidden features learned [14]. Afterwards, Chen et al. introduced a tensor factorization algorithm to learn to rank user preferences on explicit features, and then combined it with a collaborative filtering method to boost the performance of item recommendation [15]. However, tensor factorization for feature ranking is easily affected by the sparsity of explicit features.

Despite the encouraging performance of previous methods, they still suffer from several issues. First, some mainly focus on explicit user-item interactions (i.e., real-valued ratings) rather than implicit feedback, which is impractical in real-world scenarios where most people are unwilling to give ratings. Second, rating-based methods may not be suitable for the top-$n$ recommendation task. As top-$n$ personalized recommendation is an inherently ranking-oriented task, we care more about the relative order of users’ preferences over items than about absolute ratings, so a ranking-based approach can be more suitable. Third, they still suffer from the high dimensionality and sparsity of tagging information, which affects recommendation accuracy.

Based on the above analysis, we propose a novel tag-aware recommendation framework that incorporates a tag mapping scheme into a ranking-based collaborative filtering model to boost ranking-oriented recommendation performance. We first build a Bayesian personalized ranking optimization model with matrix factorization, leveraging collaborative information to learn latent feature vectors of users and items. Then, we propose an explicit-to-implicit feature mapping scheme that maps the high-dimensional and sparse explicit tagging features (i.e., the user-tag weighting matrix and the item-tag weighting matrix) to low-dimensional and compact implicit features of users and items. This mapping serves as a regularization term on the latent user and item features learned from collaborative information. To further enhance recommendation performance, we introduce users’ neighbor information to regularize user latent features based on manifold learning. Together, the two regularization constraints help alleviate the overfitting problem suffered by the matrix factorization model. Experiments on real-world recommendation datasets show that the proposed method outperforms the baselines on several ranking-oriented evaluation metrics.

The remainder of this paper is organized as follows. Section 2 briefly reviews prior work on collaborative filtering recommendation with implicit feedback and on tag-aware recommendation. Section 3 describes the details of our proposed model. Section 4 experimentally compares our method with competitive baselines, followed by conclusions and future work in Section 5.

2. Related work

Two streams of research are related to this article: collaborative filtering recommendation with implicit feedback and tag-aware recommendation.

2.1 Collaborative filtering recommendation with implicit feedbacks

With the ability to leverage the wisdom of crowds, collaborative filtering techniques have achieved great success in personalized recommendation systems. For collaborative filtering recommendation with implicit feedback, both pointwise regression and pairwise ranking approaches are popular in the top-$n$ recommendation scenario. Matrix factorization (MF) has become one of the most popular base models for both approaches.

Pointwise regression approaches [16, 17] take implicit feedback as absolute preferences (ratings). They are rating-prediction methods that predict the ratings of unobserved items and perform personalized ranking only indirectly. This kind of approach reduces the collaborative filtering problem to pointwise regression (or recovery) by minimizing the pointwise squared error between observed ratings and predicted ratings [16, 18]. Typically, Hu et al. [16] treated the ratings of observed items as 1 and all others as 0, and exploited a weighted regularized matrix factorization strategy to fit the ratings with varying confidence levels. Likewise, Pan et al. [19] formulated collaborative filtering with implicit feedback as a one-class collaborative filtering (OCCF) problem, and proposed two frameworks to tackle OCCF problems based on weighted low-rank approximation and negative example sampling. Recently, He et al. [17] proposed to weight the missing data based on item popularity to enhance prediction effectiveness. Nevertheless, the pointwise regression method has a weakness for ranking-oriented tasks, since there is no guarantee that higher rating-prediction accuracy yields better ranking results [19]. For instance, suppose the true ratings of two items are {0.8, 0.4}, and two sets of predicted ratings are {0.4, 0.8} and {1.2, 0}. Both predictions have the same squared error (0.32), yet they lead to opposite rankings of the two items.
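The numerical example above can be checked directly. The following minimal Python sketch computes the squared error of both predictions and the item order each one induces; the helper names `squared_error` and `ranking` are illustrative and do not come from the cited works.

```python
def squared_error(truth, pred):
    """Sum of squared errors between two equally long score lists."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred))


def ranking(scores):
    """Item indices sorted from the highest to the lowest score."""
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


truth = [0.8, 0.4]    # true ratings of item 0 and item 1
pred_a = [0.4, 0.8]   # first prediction from the example
pred_b = [1.2, 0.0]   # second prediction from the example

err_a = squared_error(truth, pred_a)
err_b = squared_error(truth, pred_b)

print(err_a, err_b)                      # both are 0.32 up to rounding
print(ranking(truth), ranking(pred_a), ranking(pred_b))
```

The two error values agree up to floating-point rounding, while only the second prediction preserves the true order of the items.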

Pairwise ranking approaches regard implicit feedback as relative preferences rather than absolute ones, and are directly optimized for ranking. They assume that a user is likely to prefer an observed item to an unobserved item. In fact, the assumption of pairwise preferences over two items is a relaxation of the assumption of pointwise preferences. Empirically, pairwise ranking methods are more competitive than pointwise approaches in real ranking-oriented recommendation scenarios [20]. Bayesian personalized ranking (BPR), proposed by Rendle et al. [21], was the first method with the pairwise preference assumption to address ranking-oriented recommendation problems. Afterwards, various studies built on the BPR framework and attracted much attention due to their good recommendation quality. On the one hand, some works improve BPR by exploiting valuable information underlying the raw data, such as improving the negative sampling strategy [22], exploiting information from neighbors [6, 23], learning pairwise preferences over user groups [24] and item sets [20], and integrating pairwise ranking loss with pointwise regression loss [25]. On the other hand, some works leverage auxiliary information to improve recommendation quality, such as social connections [26], item contents [15], and heterogeneous implicit feedback [27].

2.2 Tag-aware collaborative filtering recommendation

There have been numerous studies on tag-aware collaborative filtering recommendation in recent years. Most of them take tags as explicit features describing user preferences and item properties. The tag-based features are then incorporated into traditional collaborative filtering methods, including neighborhood-based and model-based methods, to enhance recommendation performance.

Neighborhood-based CF methods recommend items to the target user in two ways: items similar to those the user has selected, and items that similar users have selected. Tag-based features can be used to compute item similarity and user similarity, which are readily incorporated into neighborhood-based CF methods [3, 13]. For example, Zheng and Li [3] proposed two variants of the standard user-based and item-based methods by calculating user and item similarities based on TF-IDF weighted tag vectors. Zuo et al. [13] proposed a tag-aware personalized recommendation method that combines deep learning with user-based CF. It leverages a deep neural network to extract users’ in-depth features from user-generated tags, and then uses these deep features to find user neighbors and perform neighborhood-based CF recommendation.

Model-based CF methods can absorb tagging information to assist recommendation, e.g., matrix factorization [5, 6, 7], diffusion-based models [8], and random-walk-based algorithms [9]. Generally, model-based CF methods are more popular than neighborhood-based ones. For instance, Zhen et al. [5] employed the tag histories of users to regularize the user latent feature vectors learned by a probabilistic matrix factorization model that otherwise relies only on user-item interactions. Ma et al. [7] designed a general framework that incorporates tags issued by users and tags associated with items into a probabilistic matrix factorization model, connecting user-item and user-tag (or item-tag) information through the shared user latent feature space (or item latent feature space). Wu et al. [6] proposed a neighborhood-aware probabilistic matrix factorization regularized by user similarity and item similarity in the tag space. Zhang [8] proposed a personalized recommendation algorithm via integrated diffusion on user-item-tag tripartite graphs, where information propagates only within the user-item and item-tag bipartite graphs. However, sparsity also limits the performance of recommender systems, both in the conventional user-item setting and in the context of social tagging systems. To address this issue, Zhang et al. [9] proposed a random-walk-based algorithm to deal with sparsity in social tagging data, which captures the potential transitive associations between users and items through their interactions with tags. There are also approaches that investigate the ternary association among users, items, and tags simultaneously by leveraging tensor factorization to generate recommendations [11, 12]. However, they are unable to capture interactions without tag assignment, since tensor decomposition algorithms require users, items, and tags to co-occur. Indeed, in real recommendation scenarios, users commonly select items without assigning tags, so there is more user-item interaction information than user-item-tag information. Thus, tensor factorization methods cannot be directly applied to our task, which aims to leverage tagging information to assist item recommendation.

In addition, some works consider other types of explicit content information, such as reviews, to assist item recommendation. Both tags and reviews are user-generated textual annotations on items, and they have much in common in assisting CF recommendation. For instance, Zhang et al. proposed an explicit factor model to generate explainable recommendations, which first extracted explicit product features and user opinions, and then generated recommendations by matching the specific product features to the user’s interests and the hidden features learned [14]. Afterwards, Chen et al. introduced a tensor matrix factorization algorithm to learn to rank user preferences on explicit features, and then combined it with a collaborative filtering method to boost the performance of item recommendation [15]. However, tensor factorization for feature ranking is easily affected by the sparsity of explicit features.

3. Proposed method

3.1 Problem definition

Let ${\cal U}=\{1,\ldots,u,\ldots,M\}$ be the set of $M$ users, ${\cal I}=\{1,\ldots,i,\ldots,N\}$ be the set of $N$ items, and ${\cal T}=\{1,\ldots,t,\ldots,P\}$ be the set of $P$ tags. Our goal is to recommend a personalized ranked list of items to each user $u$, leveraging both collaborative information (i.e., implicit feedback) and tagging information.

Following matrix factorization, we represent users and items by $K$-dimensional latent feature matrices $U\in\mathbb{R}^{M\times K}$ and $V\in\mathbb{R}^{N\times K}$ respectively, which share the same latent space. The symbols are described in Table 1.

3.2 Tag-aware recommendation model

3.2.1 Bayesian personalized ranking optimization criterion

Because selecting the preferred items for each user is an inherently ranking-oriented task, we care more about users’ relative preferences over items than about rating predictions, and a ranking-based criterion is more suitable than a rating-based one. We therefore adopt a ranking-based optimization criterion, Bayesian personalized ranking (BPR) with matrix factorization, to perform item ranking.

BPR is derived by a Bayesian analysis of the ranking-based recommendation problem, which tries to find the correct personalized ranking for all items $i\in{\cal I}$ and maximizes the following posterior probability

$\displaystyle p(\Theta|\succ_{u})\propto p(\succ_{u}|\Theta)p(\Theta)$ (1)

where $p(\succ_{u}|\Theta)$ represents the user-specific likelihood function, $p(\Theta)$ is the prior probability function, and $\Theta$ denotes the parameter vector of matrix factorization. $\succ_{u}$ is the desired preference structure of user $u$ over items, e.g., if user $u$ prefers the observed item $i$ over the unobserved item $j$, we have the preference pair $i\succ_{u}j$. In this model, all users are assumed to be independent of each other, and the ordering of each pair of items for a specific user is also assumed to be independent of the ordering of every other pair. Thus, the user-specific likelihood function $p(\succ_{u}|\Theta)$ can be reformulated as

$\displaystyle\prod\limits_{u\in{\cal U}}p(\succ_{u}|\Theta)=\prod\limits_{(u,i,j)\in D_{S}}p(i\succ_{u}j|\Theta)$ (2)

Here, $D_{S}$ represents the training data $D_{S}=\{(u,i,j)|(u,i)\in S\wedge j\in{\cal I}\backslash{\cal I}_{u}^{+}\}$, where $S$ denotes the user-item interaction pairs, i.e., collaborative information, and ${\cal I}_{u}^{+}$ denotes the set of items that user $u$ has interacted with. $p(i\succ_{u}j|\Theta)$ is the individual probability that user $u$ prefers item $i$ to item $j$, which is commonly defined by

$\displaystyle p(i\succ_{u}j|\Theta)=\sigma(\hat{F}_{u,i,j}(\Theta))$ (3)

where $\sigma$ is the logistic sigmoid function, $\sigma(x)=1/(1+\exp(-x))$ . $\hat{F}_{u,i,j}(\Theta)$ is the real-valued function of model parameter $\Theta$ which captures the specific relationship between user $u$ , item $i$ , and item $j$ . In addition, for matrix factorization, the prior probability $p(\Theta)$ for the model parameter $\Theta=\{U,V\}$ is defined by a general spherical Gaussian distribution with zero mean.

Thus, we can formulate the maximum a posteriori estimator for $p(\Theta|\succ_{u})$ to derive our generic optimization criterion for personalized ranking (Eq. (4))

$\displaystyle\mathop{\max}\limits_{\Theta}F=\ln p(\Theta|\succ_{u})=\ln p(\succ_{u}|\Theta)p(\Theta)=\ln\prod\limits_{(u,i,j)\in D_{S}}p(i\succ_{u}j|\Theta)p(\Theta)=\sum\limits_{(u,i,j)\in D_{S}}\ln\sigma(\hat{F}_{u,i,j}(\Theta))+\ln p(\Theta)=\sum\limits_{(u,i,j)\in D_{S}}\ln\sigma(\hat{F}_{u,i,j}(\Theta))-\lambda_{\Theta}||\Theta||^{2}$ (4)

We can also translate the above formulation into the minimization of the following loss function

$\displaystyle\mathop{\min}\limits_{\Theta=\{U,V\}}L=-\sum\limits_{(u,i,j)\in D_{S}}\ln\sigma(\hat{F}_{u,i,j})+\frac{\lambda}{2}\left(\sum\limits_{u\in{\cal U}}||U_{u}||_{2}^{2}+\sum\limits_{i\in{\cal I}}||V_{i}||_{2}^{2}\right)$ (5)

where $\lambda$ is the regularization parameter of $\Theta=\{U,V\}$ .

Besides, the preference value $\hat{F}_{u,i,j}$ can also be decomposed as

$\displaystyle\hat{F}_{u,i,j}=\hat{F}_{u,i}-\hat{F}_{u,j}$ (6)

In terms of $\hat{F}_{u,i}$ , we apply the generic matrix factorization model to predict it

$\displaystyle\hat{F}_{u,i}=U_{u}V_{i}^{T}$ (7)

where $U\in\mathbb{R}^{M\times K}$ and $V\in\mathbb{R}^{N\times K}$ represent the latent feature matrices of users and items respectively. $U_{u}$ denotes the latent feature vector of the specific user $u$ , and $V_{i}$ denotes that of item $i$ .

Thus, we rewrite the above formulation (Eq. (5)) as

$\displaystyle\mathop{\min}\limits_{\Theta=\{U,V\}}L=-\sum\limits_{(u,i)\in S}\sum\limits_{j\in{\cal I}\backslash{\cal I}_{u}^{+}}\ln\sigma(U_{u}V_{i}^{T}-U_{u}V_{j}^{T})+\frac{\lambda}{2}\left(\sum\limits_{u\in{\cal U}}||U_{u}||_{2}^{2}+\sum\limits_{i\in{\cal I}}||V_{i}||_{2}^{2}\right)$ (8)
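To make the loss in Eq. (8) concrete, the following minimal Python sketch evaluates it on a toy example. The function `bpr_loss` and the toy matrices are illustrative assumptions, not part of the original implementation; latent vectors are stored as plain lists and the triples $(u,i,j)$ are given explicitly rather than enumerated from $S$.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, as used in Eq. (3)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(U, V, triples, lam):
    """Ranking term over the given triples plus (lam / 2) * L2 regularization."""
    rank_term = 0.0
    for u, i, j in triples:
        f_uij = dot(U[u], V[i]) - dot(U[u], V[j])   # F_{u,i} - F_{u,j}
        rank_term -= math.log(sigmoid(f_uij))
    reg = sum(dot(row, row) for row in U) + sum(dot(row, row) for row in V)
    return rank_term + 0.5 * lam * reg


# Toy data (hypothetical): one user, two items, K = 2.
U = [[1.0, 0.0]]
V_good = [[2.0, 0.0], [0.0, 0.0]]   # observed item 0 scores higher
V_bad = [[0.0, 0.0], [2.0, 0.0]]    # unobserved item 1 scores higher
triples = [(0, 0, 1)]               # user 0 prefers item 0 over item 1

loss_good = bpr_loss(U, V_good, triples, lam=0.0)
loss_bad = bpr_loss(U, V_bad, triples, lam=0.0)
print(loss_good, loss_bad)
```

With the regularizer switched off, the loss is lower when the observed item is scored above the unobserved one, which is the behaviour the pairwise criterion is meant to reward.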

3.2.2 Regularization 1: Explicit-to-implicit feature mapping

From the tagging information, we derive two matrices: user-tag weighting matrix $A\in\mathbb{R}^{M\times P}$ and item-tag weighting matrix $B\in\mathbb{R}^{N\times P}$ . The non-zeros in $A$ and $B$ indicate the observed relations between users, items and explicit tags. However, the two matrices generally suffer from the problem of sparsity and high dimensionality. Therefore, how to integrate the information into BPR optimization model is a challenging task.

We assume that selecting an item is also influenced by the user’s underlying opinions on various item aspects (i.e., tags). Based on this assumption, we try to capture latent representations of users and items from explicit tagging information by building a feature embedding (mapping) model over the user-tag matrix $A$ and the item-tag matrix $B$. To this end, we propose an explicit-to-implicit feature mapping scheme that maps $A$ and $B$ from high-dimensional and sparse explicit features to low-dimensional and compact implicit features via a mapping matrix $W\in\mathbb{R}^{P\times K}$. This mapping serves as a regularization term on the latent user and item features learned from collaborative information by BPR, and it also helps alleviate the overfitting problem suffered by the matrix factorization model. The regularization term $R_{1}$ with the explicit-to-implicit feature mapping scheme is formulated as

$\displaystyle\mathop{\min}\limits_{\Theta=\{U,V,W\}}R_{1}=\sum\limits_{u\in{\cal U}}||A_{u}W-U_{u}||_{2}^{2}+\sum\limits_{i\in{\cal I}}||B_{i}W-V_{i}||_{2}^{2}$ (9)

In this way, by integrating the regularized feature mapping term (Eq. (9)) with the BPR optimization model (Eq. (8)), we obtain a more accurate recommendation model

$\displaystyle\mathop{\min}\limits_{\Theta=\{U,V,W\}}L=-\sum\limits_{(u,i)\in S}\sum\limits_{j\in{\cal I}\backslash{\cal I}_{u}^{+}}\ln\sigma(U_{u}V_{i}^{T}-U_{u}V_{j}^{T})+\frac{\lambda}{2}\left(\sum\limits_{u\in{\cal U}}||U_{u}||_{2}^{2}+\sum\limits_{i\in{\cal I}}||V_{i}||_{2}^{2}+\sum\limits_{t\in{\cal T}}||W_{t}||_{2}^{2}\right)+\frac{\beta}{2}\left(\sum\limits_{u\in{\cal U}}||A_{u}W-U_{u}||_{2}^{2}+\sum\limits_{i\in{\cal I}}||B_{i}W-V_{i}||_{2}^{2}\right)$ (10)

where $\beta$ is the regularization coefficient to control the importance of tagging information.
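As a rough illustration of the mapping regularizer, the sketch below computes $R_{1}$ from Eq. (9) for a tiny example. The helper names `map_tags` and `mapping_penalty` and all numbers are hypothetical; dense Python lists stand in for the sparse matrices $A$ and $B$.

```python
def map_tags(row, W):
    """Multiply a length-P tag row vector by the P x K mapping matrix W."""
    K = len(W[0])
    return [sum(row[t] * W[t][k] for t in range(len(row))) for k in range(K)]


def mapping_penalty(A, B, U, V, W):
    """R_1 of Eq. (9): sum_u ||A_u W - U_u||^2 + sum_i ||B_i W - V_i||^2."""
    total = 0.0
    for tag_rows, latent_rows in ((A, U), (B, V)):
        for tags, latent in zip(tag_rows, latent_rows):
            mapped = map_tags(tags, W)
            total += sum((m - l) ** 2 for m, l in zip(mapped, latent))
    return total


# Toy data (hypothetical): P = 3 tags, K = 2 latent dimensions.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
A = [[1.0, 0.0, 1.0]]   # one user who used tags 0 and 2
B = [[0.0, 1.0, 0.0]]   # one item annotated with tag 1
U = [[1.0, 1.0]]        # equals A_0 W, so the user term is zero
V = [[0.0, 0.5]]        # differs from B_0 W = [0, 1] by 0.5 in one coordinate

r1 = mapping_penalty(A, B, U, V, W)
print(r1)
```

The penalty grows as the latent vectors learned from interactions drift away from the vectors obtained by mapping the tag profiles, and $\beta$ controls how strongly that drift is discouraged.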

However, the recommendation model above assumes that users are independent of each other. It ignores the fact that a user’s selections are easily influenced by other users with similar interests. Integrating user relationship information into the above model can therefore further enhance recommendation performance.

3.2.3 Regularization 2: User neighbor relationship

To further alleviate the overfitting problem and enhance recommendation performance, we introduce user graph regularization into the original objective function. Graph regularization has been widely used in dimensionality reduction, clustering, and semi-supervised learning [28]. The key assumption of user graph regularization is that if two users $u$ and $v$ are similar, the latent feature vectors $U_{u}$ and $U_{v}$ discovered by the matrix factorization procedure in this paper are also close to each other.

Our design is inspired by [29], which proposed a similarity-based social regularization term under the assumption that every user’s preference is close to the average taste of this user’s friends. Meanwhile, the similarity-based social regularization term treats friends differently according to how similar they are to the user. However, we have no explicit social friends in our case, so we instead obtain the nearest neighbors of each target user from their tagging history using a similarity measure.

In our implementation, to reduce computational complexity, we only consider the top 10 neighbors of each target user $u$, i.e., $|N(u)|=10$. We try to make the latent feature vector $U_{u}$ of user $u$ close to the similarity-weighted average of its neighbors’ vectors, that is $\sum_{v\in N(u)}\textit{sim}(u,v)U_{v}/\sum_{v\in N(u)}\textit{sim}(u,v)$. Accordingly, we construct the regularization term as

$\displaystyle\mathop{\min}\limits_{\Theta=\{U,V,W\}}R_{2}=\sum\limits_{u\in{\cal U}}\left\|U_{u}-\frac{\sum_{v\in N(u)}\textit{sim}(u,v)U_{v}}{\sum_{v\in N(u)}\textit{sim}(u,v)}\right\|_{2}^{2}$ (11)

where $\textit{sim}(u,v)$ represents the similarity between users $u$ and $v$ .

The similarity $\textit{sim}(u,v)$ between the target user $u$ and user $v$ is measured by

$\displaystyle\textit{sim}(u,v)=\frac{A_{u}A_{v}^{T}}{||A_{u}||\cdot||A_{v}||}$ (12)

where $A_{u}$ , $A_{v}$ denote the user-tag weight vector of user $u$ and $v$ .
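The neighbor construction can be sketched as follows: compute the cosine similarity of Eq. (12) between tag vectors, keep the most similar users, and form the similarity-weighted average that appears inside Eq. (11). The helper names, the tie-breaking by user index, and the zero-vector fallbacks are assumptions made for this illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity of Eq. (12); returns 0.0 if either vector is all zeros."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def top_neighbors(A, u, n):
    """Return [(v, sim(u, v))] for the n users most similar to u (ties by index)."""
    sims = [(cosine(A[u], A[v]), v) for v in range(len(A)) if v != u]
    sims.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(v, s) for s, v in sims[:n]]


def neighbor_average(U, neighbors):
    """Similarity-weighted average of the neighbors' latent vectors."""
    total = sum(s for _, s in neighbors)
    K = len(U[0])
    if total == 0:
        return [0.0] * K   # assumption: no usable neighbors gives a zero vector
    return [sum(s * U[v][k] for v, s in neighbors) / total for k in range(K)]


# Toy data (hypothetical): 4 users, P = 2 tags, K = 2 latent dimensions.
A = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
U = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 1.0]]

nbrs = top_neighbors(A, u=0, n=2)
avg = neighbor_average(U, nbrs)
print(nbrs, avg)
```

In this toy case user 0 is closest to users 1 and 3, so user 2, whose tag vector is orthogonal, has no influence on the average that regularizes $U_{0}$.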

3.2.4 Unified recommendation model

Integrating the Bayesian personalized ranking optimization objective with two regularization constraints based on explicit-to-implicit feature mapping and the user neighbors respectively, we obtain a unified tag-aware recommendation model

$\displaystyle\mathop{\min}\limits_{\Theta=\{U,V,W\}}L=-\sum\limits_{(u,i)\in S}\sum\limits_{j\in{\cal I}\backslash{\cal I}_{u}^{+}}\ln\sigma(U_{u}V_{i}^{T}-U_{u}V_{j}^{T})+\frac{\lambda}{2}\left(\sum\limits_{u\in{\cal U}}||U_{u}||_{2}^{2}+\sum\limits_{i\in{\cal I}}||V_{i}||_{2}^{2}+\sum\limits_{t\in{\cal T}}||W_{t}||_{2}^{2}\right)+\frac{\beta}{2}\left(\sum\limits_{u\in{\cal U}}||A_{u}W-U_{u}||_{2}^{2}+\sum\limits_{i\in{\cal I}}||B_{i}W-V_{i}||_{2}^{2}\right)+\frac{\gamma}{2}\sum\limits_{u\in{\cal U}}\left\|U_{u}-\frac{\sum_{v\in N(u)}\textit{sim}(u,v)U_{v}}{\sum_{v\in N(u)}\textit{sim}(u,v)}\right\|_{2}^{2}$ (13)

where $\beta$ and $\gamma$ are the regularization coefficients to control the importance of tagging information and user neighbor relationship respectively.

We simply call our integrated model BPR-T. BPR-T helps us find the more compact and informative latent features of users and items.

3.3 Model optimization

To learn the parameters $\Theta=\{U,V,W\}$ in Eq. (13), we design an alternating optimization algorithm, which uses stochastic gradient descent (SGD) with uniformly drawn training triples $(u,i,j)$ to learn the latent vectors $U_{u}$, $V_{i}$, $V_{j}$, and uses a closed-form matrix computation to learn the mapping matrix $W$.

In each epoch, on the one hand, when updating the latent feature vectors ($U_{u}$, $V_{i}$, $V_{j}$), we hold the mapping matrix $W$ constant and update the latent feature vectors via SGD

$\displaystyle\delta=-\frac{\partial\ln\sigma(\hat{F}_{u,i,j})}{\partial\hat{F}_{u,i,j}}=-(1-\sigma(\hat{F}_{u,i,j}))$ (14)

$\displaystyle\frac{\partial L}{\partial U_{u}}=\delta(V_{i}-V_{j})+\lambda U_{u}+\beta(U_{u}-A_{u}W)+\gamma\left(U_{u}-\frac{\sum_{v\in N(u)}\textit{sim}(u,v)U_{v}}{\sum_{v\in N(u)}\textit{sim}(u,v)}\right)$ (15)

$\displaystyle U_{u}\leftarrow U_{u}-\eta\frac{\partial L}{\partial U_{u}}$ (16)

$\displaystyle\frac{\partial L}{\partial V_{i}}=\delta U_{u}+\lambda V_{i}+\beta(V_{i}-B_{i}W)$ (17)

$\displaystyle V_{i}\leftarrow V_{i}-\eta\frac{\partial L}{\partial V_{i}}$ (18)

$\displaystyle\frac{\partial L}{\partial V_{j}}=\delta(-U_{u})+\lambda V_{j}+\beta(V_{j}-B_{j}W)$ (19)

$\displaystyle V_{j}\leftarrow V_{j}-\eta\frac{\partial L}{\partial V_{j}}$ (20)
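A single SGD step following Eqs. (14)-(20) might look like the sketch below. It is an illustration under stated assumptions rather than the authors’ code: the function name `sgd_step` is hypothetical, all three gradients are computed from the values held before the step, and the neighbor average for user $u$ is passed in precomputed as `nbr_avg`.

```python
import math


def map_tags(row, W):
    """Multiply a length-P tag row vector by the P x K mapping matrix W."""
    return [sum(row[t] * W[t][k] for t in range(len(row))) for k in range(len(W[0]))]


def sgd_step(U, V, W, A, B, u, i, j, nbr_avg, lam, beta, gamma, eta):
    """Update U[u], V[i], V[j] in place with W held fixed; return F_{u,i,j} before the step."""
    Uu, Vi, Vj = U[u][:], V[i][:], V[j][:]          # copies of the old values
    K = len(Uu)
    f_uij = sum(Uu[k] * (Vi[k] - Vj[k]) for k in range(K))
    delta = -(1.0 - 1.0 / (1.0 + math.exp(-f_uij)))  # Eq. (14)
    AuW, BiW, BjW = map_tags(A[u], W), map_tags(B[i], W), map_tags(B[j], W)
    for k in range(K):
        g_u = (delta * (Vi[k] - Vj[k]) + lam * Uu[k]
               + beta * (Uu[k] - AuW[k]) + gamma * (Uu[k] - nbr_avg[k]))  # Eq. (15)
        g_i = delta * Uu[k] + lam * Vi[k] + beta * (Vi[k] - BiW[k])       # Eq. (17)
        g_j = -delta * Uu[k] + lam * Vj[k] + beta * (Vj[k] - BjW[k])      # Eq. (19)
        U[u][k] = Uu[k] - eta * g_u                                       # Eq. (16)
        V[i][k] = Vi[k] - eta * g_i                                       # Eq. (18)
        V[j][k] = Vj[k] - eta * g_j                                       # Eq. (20)
    return f_uij


# Toy data (hypothetical): one user, two items, P = 2 tags, K = 2.
U = [[0.1, 0.2]]
V = [[0.3, 0.1], [0.2, 0.4]]
W = [[0.0, 0.0], [0.0, 0.0]]
A = [[1.0, 0.0]]
B = [[1.0, 0.0], [0.0, 1.0]]

f_before = sgd_step(U, V, W, A, B, u=0, i=0, j=1, nbr_avg=[0.0, 0.0],
                    lam=0.0, beta=0.0, gamma=0.0, eta=0.1)
f_after = sum(U[0][k] * (V[0][k] - V[1][k]) for k in range(2))
print(f_before, f_after)
```

With all regularizers set to zero, one step raises the score difference between the observed and the sampled item, as expected from descending the ranking term alone.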

On the other hand, given the latent feature matrices, the part of the objective that depends on the mapping matrix $W$ is a regularized least-squares problem, that is

$\displaystyle\mathop{\min}\limits_{W}\frac{\lambda}{2}\sum\limits_{t\in{\cal T}}||W_{t}||_{2}^{2}+\frac{\beta}{2}\left(\sum\limits_{u\in{\cal U}}||A_{u}W-U_{u}||_{2}^{2}+\sum\limits_{i\in{\cal I}}||B_{i}W-V_{i}||_{2}^{2}\right)$ (21)

The updating rule for $W$ can be derived as

$\displaystyle W=(\beta A^{T}A+\beta B^{T}B+\lambda{\rm I})^{-1}(\beta A^{T}U+\beta B^{T}V)$ (22)
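As a brief check of Eq. (22), one can write the $W$-dependent terms of the objective in matrix form and set their gradient to zero. The sketch below assumes Frobenius-norm notation, where $||W||_{F}^{2}$ equals the sum of the squared row norms $\sum_{t}||W_{t}||_{2}^{2}$ and $||AW-U||_{F}^{2}=\sum_{u}||A_{u}W-U_{u}||_{2}^{2}$.

$\displaystyle\frac{\partial}{\partial W}\left[\frac{\lambda}{2}||W||_{F}^{2}+\frac{\beta}{2}\left(||AW-U||_{F}^{2}+||BW-V||_{F}^{2}\right)\right]=\lambda W+\beta A^{T}(AW-U)+\beta B^{T}(BW-V)=0$

$\displaystyle(\beta A^{T}A+\beta B^{T}B+\lambda{\rm I})W=\beta A^{T}U+\beta B^{T}V$

For $\lambda>0$ the matrix on the left is positive definite and hence invertible, so left-multiplying by its inverse gives the update in Eq. (22).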

In the practical training process, for each observed pair $(u,i)$ of the training set, we draw a negative item $j$ uniformly from ${\cal I}\backslash{\cal I}_{u}^{\textit{train}}$ and construct the preference triple $(u,i,j)$. Then, parameter learning is performed. It is worth noting that, according to [22], one epoch consists of $|S|$ single BPR updates (i.e., each update uses one triple $(u,i,j)$), where $|S|$ is the size of the training dataset. The training details are shown in Algorithm 1.

Algorithm 1: BPR-T
1. Input: Training dataset $S$ , user-tag matrix $A$ , item-tag matrix $B$ , regularization parameter $\lambda,\beta,\gamma$ , learning rate $\eta$ , maximum iterations $\kappa$
2. Output: Parameter $\Theta=\{U,V,W\}$
3. Initialize $\Theta$
4. repeat
5. Draw user-item pair $(u,i)$ from $S$ uniformly
6. Draw negative item $j$ from ${\cal I}\backslash{\cal I}_{u}^{\textit{train}}$ uniformly
7. Calculate gradient $\frac{\partial L}{\partial U_{u}}$ using Eq. (15)
8. Update $U_{u}$ using Eq. (16)
9. Calculate gradient $\frac{\partial L}{\partial V_{i}}$ using Eq. (17)
10. Update $V_{i}$ using Eq. (18)
11. Calculate gradient $\frac{\partial L}{\partial V_{j}}$ using Eq. (19)
12. Update $V_{j}$ using Eq. (20)
13. Update $W$ using Eq. (22)
14. until convergence
15. return $\Theta=\{U,V,W\}$
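Steps 5 and 6 of Algorithm 1 can be sketched as a small sampling routine. The version below uses rejection sampling for the negative item, which is one common way to draw uniformly from the unobserved items; the function name `sample_triple` and the data layout (a list of pairs plus a per-user set of training items) are assumptions for illustration.

```python
import random


def sample_triple(S, train_items, n_items, rng):
    """Draw (u, i) uniformly from S, then j uniformly from items u has not selected.

    Assumes every user has at least one unobserved item; otherwise the loop
    would never terminate.
    """
    u, i = rng.choice(S)
    while True:
        j = rng.randrange(n_items)
        if j not in train_items[u]:
            return u, i, j


# Toy data (hypothetical): 2 users, 4 items.
S = [(0, 0), (0, 1), (1, 2)]
train_items = {0: {0, 1}, 1: {2}}
rng = random.Random(0)   # fixed seed so the run is repeatable

triples = [sample_triple(S, train_items, 4, rng) for _ in range(100)]
print(triples[:5])
```

Rejection sampling stays cheap as long as each user has interacted with only a small fraction of the catalogue, which is the typical situation for sparse implicit feedback.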

After estimating the parameters $\Theta=\{U,V,W\}$ , we leverage matrix factorization (Eq. (7)) to compute the final preference score for a specific user $u$ on a specific item $i$ , and get a ranking list of items to perform recommendation.
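The final scoring and ranking step can be sketched as follows. Scores follow Eq. (7); excluding items the user already selected in training is a common convention that this sketch assumes, and the function name `recommend` and the toy numbers are hypothetical.

```python
def recommend(U, V, u, seen, n):
    """Return the n unseen items with the highest score U_u V_i^T for user u."""
    scored = [(sum(a * b for a, b in zip(U[u], V[i])), i)
              for i in range(len(V)) if i not in seen]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # ties broken by item index
    return [i for _, i in scored[:n]]


# Toy data (hypothetical): one user, four items, K = 2.
U = [[1.0, 0.0]]
V = [[0.9, 0.0], [0.2, 1.0], [0.5, 0.0], [0.7, 3.0]]

top = recommend(U, V, u=0, seen={0}, n=2)
print(top)
```

Item 0 has the highest raw score but is skipped because the user has already selected it, so the list starts with the best remaining candidates.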

4. Experimental results and analysis

4.1 Datasets

To evaluate our recommendation method with implicit feedback and tagging information, we perform experiments on two real-world datasets: Lastfm (https://grouplens.org/datasets/hetrec-2011) and Citeulike [30].

Lastfm. Last.fm is the world’s largest online music catalogue and allows users to tag music tracks and artists. In this dataset, we take artists as items. The dataset contains 92834 user-artist listening records from 1892 users on 17632 artists from the Last.fm online music system. For our experiments, we keep users with at least 10 observations, which gives 62376 observations from 1797 users and 1507 artists [19]. To limit the computational complexity, we select the top 2000 tags for training.

Citeulike. CiteUlike is a scientific article sharing service where users create personal libraries by posting the articles they like. Each article has information such as title, abstract, authors, publication and keywords. As mentioned before, the content information we use consists of the title and abstract. The subset we use contains 5551 users and 16980 articles with 204,986 observed user-item pairs, which is very sparse. For our experiments, we keep users with at least 15 observations, which gives 73788 observations from 2029 users and 3601 articles.

In the comparison, we randomly select 80% of the observed user-item pairs from the Lastfm and Citeulike datasets as training data and hold out the remaining 20% for testing. We repeat the random split 5 times and report the average result. The model parameters are tuned by cross-validation over the training set.
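
The preprocessing and splitting protocol can be sketched as below. This is only an illustration under stated assumptions: observations are represented as `(user, item)` pairs, only the minimum-observation filter on users is shown, and the helper names and the use of the split index as random seed are hypothetical choices.

```python
import random
from collections import defaultdict

def filter_users(pairs, min_obs):
    """Keep only the pairs of users that have at least min_obs observations."""
    counts = defaultdict(int)
    for u, _ in pairs:
        counts[u] += 1
    return [(u, i) for u, i in pairs if counts[u] >= min_obs]

def random_split(pairs, train_ratio, seed):
    """Shuffle the pairs and split them into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Toy data (illustrative only): users 0, 1, 2 with 12, 3 and 10 observations.
pairs = [(0, i) for i in range(12)] + [(1, i) for i in range(3)] + [(2, i) for i in range(10)]
kept = filter_users(pairs, 10)  # user 1 is dropped
splits = [random_split(kept, 0.8, seed) for seed in range(5)]  # five random 80/20 splits
```

Metrics would then be computed on each of the five test parts and averaged.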

4.2 Evaluation metrics

We study the recommendation performance on several commonly used evaluation metrics: Pre@ $n$ (Precision at the cut-off of $n$ ), MAP@ $n$ (Mean Average Precision at the cut-off of $n$ ), NDCG@ $n$ (Normalized Discounted Cumulative Gain at the cut-off of $n$ ), ARP (Average Relative Position) [20], and AUC (Area Under the ROC Curve) [20]. Pre@ $n$ only considers the precision of the top- $n$ recommendation results and ignores the order in which they are ranked. MAP@ $n$ , NDCG@ $n$ , ARP, and AUC pay more attention to the ranking positions of the recommended items.

We also define some notations. For an item ranking list $l$ , $l(k)$ represents the item located at position $k$ . For each item $i$ , $p_{ui}$ $(1\leqslant p_{ui}\leqslant N)$ denotes its position in the ranking list of user $u$ . ${\cal U}^{\textit{test}}$ denotes all the users in the test dataset. ${\cal I}_{u}^{\textit{train}}$ denotes the set of items observed for user $u$ in the training dataset. ${\cal I}_{u}^{\textit{test}}$ denotes the set of items observed for user $u$ in the test dataset.

1. Pre@ $n$ . It is the average of the precision over all users in the test set: $\text{Pre@}n=\frac{1}{|{\cal U}^{\textit{test}}|}\sum_{u\in{\cal U}^{\textit{test}}}\text{Pre}_{u}(n)$ , where $\text{Pre}_{u}(n)$ is the precision (the fraction of retrieved items that are preferred by a specific user $u$ ) of the rank list cut off at $n$ . It is defined as $\text{Pre}_{u}(n)=\frac{1}{n}\sum_{k=1}^{n}\delta(l(k)\in{\cal I}_{u}^{\textit{test}})$ , where $\delta(\cdot)$ is the indicator function, which returns 1 if the item at position $k$ is preferred and 0 otherwise.
2. MAP@ $n$ . It is the mean of the average precision (AP) with a cut-off $n$ over all users in the test set: $\textit{MAP}\text{@}n=\frac{1}{|{\cal U}^{\textit{test}}|}\sum_{u\in{\cal U}^{\textit{test}}}AP_{u}\text{@}n$ , where $AP_{u}\text{@}n$ is the average of the precisions computed at all positions that hold a preferred item for a specific user $u$ . It is defined as $AP_{u}\text{@}n=\frac{1}{\sum_{k=1}^{n}\delta(l(k)\in{\cal I}_{u}^{\textit{test}})}\sum_{k=1}^{n}\delta(l(k)\in{\cal I}_{u}^{\textit{test}})\cdot\text{Pre}_{u}(k)$ , where $k$ is the position in the rank list $l$ and $\text{Pre}_{u}(k)$ is the precision (the fraction of retrieved items that are preferred by the user) of the rank list from position 1 to $k$ . Specifically, $\text{Pre}_{u}(k)=\frac{1}{k}\sum_{s=1}^{k}\delta(l(s)\in{\cal I}_{u}^{\textit{test}})$ .
3. NDCG@ $n$ . It penalizes relevant items that appear at lower positions in the result list. It is the average of $\textit{NDCG}_{u}\text{@}n$ over all users in the test set: $\textit{NDCG}\text{@}n=\frac{1}{|{\cal U}^{\textit{test}}|}\sum_{u\in{\cal U}^{\textit{test}}}\textit{NDCG}_{u}\text{@}n$ , where $\textit{NDCG}_{u}\text{@}n=\frac{1}{Z_{u}}\textit{DCG}_{u}\text{@}n$ , with $\textit{DCG}_{u}\text{@}n=\sum\nolimits_{k=1}^{n}\frac{2^{\delta(l(k)\in{\cal I}_{u}^{\textit{test}})}-1}{\log(k+1)}$ , and $Z_{u}$ is the best achievable value of $\textit{DCG}_{u}\text{@}n$ .
4. ARP. ARP is the average relative position (RP) over all users in the test dataset: $\textit{ARP}=\frac{1}{|{\cal U}^{\textit{test}}|}\sum_{u\in{\cal U}^{\textit{test}}}RP_{u}$ . $RP_{u}$ is the relative position for user $u$ , defined as $RP_{u}=\frac{1}{|{\cal I}_{u}^{\textit{test}}|}\sum_{i\in{\cal I}_{u}^{\textit{test}}}\frac{p_{ui}}{|{\cal I}|-|{\cal I}_{u}^{\textit{train}}|}$ , where $\frac{p_{ui}}{|{\cal I}|-|{\cal I}_{u}^{\textit{train}}|}$ is the relative position of item $i$ . A lower value of ARP indicates better quality.
5. AUC. AUC is the average of $\textit{AUC}_{u}$ over all the users: $\textit{AUC}=\frac{1}{M}\sum_{u=1}^{M}\textit{AUC}_{u}$ , with $\textit{AUC}_{u}=\frac{1}{|T_{u}|}\sum_{(i,j)\in T_{u}}\delta(\hat{R}_{ui}>\hat{R}_{uj})$ , where the set of evaluation item pairs for user $u$ is defined as $T_{u}=\{(i,j)|i\in{\cal I}_{u}^{\textit{test}}\wedge j\notin({\cal I}_{u}^{\textit{train}}\cup{\cal I}_{u}^{\textit{test}})\}$ . A higher value of AUC indicates better quality. The trivial AUC of a random guess is 0.5 and the best achievable value is 1.
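
The per-user quantities defined above can be sketched in a few lines. The sketch rests on assumptions that are labeled in the comments: `ranked` is the full ranking for one user over all items outside ${\cal I}_{u}^{\textit{train}}$ with no ties, $AP_{u}$ is taken as 0 when no preferred item appears in the top $n$, and $Z_{u}$ is computed by placing the preferred items at the top of the list. The function names are hypothetical. Averaging each function over the test users gives the reported metric.

```python
import math

def precision_at(ranked, test_items, n):
    """Pre_u(n): fraction of the top-n items that are in I_u^test."""
    return sum(1 for item in ranked[:n] if item in test_items) / n

def average_precision_at(ranked, test_items, n):
    """AP_u@n: mean of Pre_u(k) over the top-n positions k that hold a preferred item."""
    hit_positions = [k for k, item in enumerate(ranked[:n], start=1) if item in test_items]
    if not hit_positions:
        return 0.0  # assumption: AP is taken as 0 when the top n holds no preferred item
    return sum(precision_at(ranked, test_items, k) for k in hit_positions) / len(hit_positions)

def ndcg_at(ranked, test_items, n):
    """NDCG_u@n with binary gains (2^delta - 1 is 1 for a hit and 0 otherwise)."""
    dcg = sum(1.0 / math.log(k + 1)
              for k, item in enumerate(ranked[:n], start=1) if item in test_items)
    ideal_hits = min(n, len(test_items))  # assumption: Z_u puts all preferred items first
    z = sum(1.0 / math.log(k + 1) for k in range(1, ideal_hits + 1))
    return dcg / z if z > 0 else 0.0

def relative_position(ranked, test_items):
    """RP_u: mean of p_ui / (|I| - |I_u^train|); len(ranked) equals that denominator."""
    positions = [k for k, item in enumerate(ranked, start=1) if item in test_items]
    return sum(p / len(ranked) for p in positions) / len(test_items)

def auc(ranked, test_items):
    """AUC_u: fraction of (test item, unobserved item) pairs that are ordered correctly."""
    n_neg = len(ranked) - len(test_items)
    correct, negatives_above = 0, 0
    for item in ranked:
        if item in test_items:
            correct += n_neg - negatives_above  # negatives ranked below this test item
        else:
            negatives_above += 1
    return correct / (len(test_items) * n_neg)

# Toy ranking for one user (illustrative only): preferred items sit at positions 2 and 4.
ranked = [3, 7, 1, 9, 4]
test_items = {7, 9}
pre3 = precision_at(ranked, test_items, 3)
ap5 = average_precision_at(ranked, test_items, 5)
ndcg5 = ndcg_at(ranked, test_items, 5)
rp = relative_position(ranked, test_items)
auc_u = auc(ranked, test_items)
```

The base of the logarithm in $\textit{DCG}_{u}$ cancels in the ratio $\textit{DCG}_{u}/Z_{u}$, so the natural logarithm used here does not change the NDCG value.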

4.3 Baseline methods

We compare the proposed BPR-T (Algorithm 1) with five baselines, which are described below.

1. Pop [31]. Pop ranks items by their popularity: the more users have visited an item, the higher its position in the recommendation list. Note that it is a non-personalized recommendation approach: for any target user, the recommendations are always the same.
2. WRMF [19]. WRMF (weighted regularized matrix factorization) employs a pointwise rating-based assumption for solving recommendation problems with implicit feedbacks. It applies a flexible weighting scheme to unobserved data to tackle the one-class collaborative filtering problem.
3. BPR [21]. BPR (Bayesian personalized ranking) employs a pairwise ranking-based assumption for solving recommendation problems with implicit feedbacks. BPR represents the state-of-the-art optimization framework of collaborative filtering for binary relevance data [21]. Matrix factorization is the common choice for representing users' preferences on items. In addition, we employ uniform sampling in model learning.
4. NHPMF [6]. NHPMF (neighborhood-aware probabilistic matrix factorization) uses auxiliary tagging data to regularize the latent user and item features in order to achieve more accurate recommendations. NHPMF leverages the tagging data to select neighbors of each user and each item, and incorporates the neighborhood information into the probabilistic matrix factorization model of the explicit ratings, so that similar users (items) have similar latent features.
5. LRPPM-CF [15]. LRPPM-CF introduces a tensor matrix factorization algorithm to learn to rank user preferences on items’ explicit features, and then combines it with a collaborative filtering method (a rating-based optimization objective via matrix factorization) to boost recommendation performance.
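
As a point of reference, the simplest baseline in the list, Pop, can be sketched as follows. This is an illustrative reading of the description above (popularity measured as the number of distinct users who interacted with an item in the training data), not the implementation of [31]; the function name and the tie-breaking by item identifier are assumptions.

```python
from collections import Counter

def pop_ranking(train_pairs):
    """Rank items by the number of distinct users observed with them in training."""
    counts = Counter(item for _, item in set(train_pairs))
    # Sort by decreasing popularity; ties are broken by item identifier (an assumption).
    return [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

# Toy data (illustrative only): item "a" has three users, "b" two, "c" one.
train_pairs = [(0, "a"), (1, "a"), (2, "a"), (0, "b"), (1, "b"), (2, "c")]
ranking = pop_ranking(train_pairs)
```

Because the ranking does not depend on the target user, every user receives the same list, which is why Pop serves as the non-personalized reference point.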

For a fair comparison, we use the same initialization for the model variables of all methods, $U\sim N(0,0.01)$ and $V\sim N(0,0.01)$ . As noted in Section 3, following [22], one epoch consists of $|\text{S}|$ single BPR updates, where $|\text{S}|$ is the size of the training dataset. For the dimension $d$ of the latent feature matrices $U$ and $V$ , we fix $d=$ 20 for all methods as a trade-off between model accuracy and complexity. The learning rate is fixed at $\eta=$ 0.01 for all methods. For the regularization parameters $\lambda,\beta,\gamma$ , we fix $\lambda$ and select the best values of $\beta,\gamma$ by grid search. Section 4.4 details how these parameters influence the recommendation results; the best-performing values are used in the final comparisons.

4.4 Influence of regularization parameters

We analyze how the two regularization parameters $\beta$ and $\gamma$ affect the final recommendation performance when the other parameters are fixed. Specifically, parameter $\beta$ determines the importance of the auxiliary tagging information, and parameter $\gamma$ controls the importance of the neighbor-based user relationship. The influences of different $\beta$ and $\gamma$ on the Lastfm and Citeulike datasets are shown in Figs 1 and 2, respectively.

Figure 1. Effect of parameters $\beta$ and $\gamma$ on (a) Pre@5, (b) MAP@5, (c) NDCG@5, (d) MRR for Lastfm dataset.

Figure 2. Effect of parameters $\beta$ and $\gamma$ on (a) Pre@5, (b) MAP@5, (c) NDCG@5 for Citeulike dataset.

For the Lastfm dataset, we vary $\beta$ over {0.00001, 0.0001, 0.001, 0.01, 0.1} and $\gamma$ over {0.0001, 0.001, 0.01, 0.1}. Fig. 1 shows that the values of Pre@5, MAP@5 and NDCG@5 change with $\beta$ and $\gamma$ , and that the optimal values are located at ( $\beta$ , $\gamma$ ) $=$ (0.01, 0.01). Specifically, with $\gamma$ fixed, the values of Pre@5, MAP@5 and NDCG@5 first increase and then gradually decrease as $\beta$ increases. Similarly, with $\beta$ fixed, recommendation performance first increases and then gradually decreases as $\gamma$ increases. The variation caused by changing $\beta$ is somewhat smaller than that caused by changing $\gamma$ , which indicates that on this dataset BPR-T is slightly more sensitive to the user neighbor relationship than to the tag mapping. These results show that both parameters play important roles in determining recommendation performance, and that the tagging information and the neighbor information contribute comparably to the recommendation results. When $\beta$ and $\gamma$ are both reduced to zero, BPR-T reduces to BPR, whose evaluation values are much lower than those of BPR-T. This illustrates the advantage of our model, which exploits the useful information in both the tagging data and the user relationships.

For the Citeulike dataset, we vary $\beta$ over {0.0001, 0.001, 0.01, 0.1, 1} and $\gamma$ over {0.0001, 0.001, 0.01, 0.1}. As Fig. 2 shows, the values of Pre@5, MAP@5 and NDCG@5 change with $\beta$ and $\gamma$ , and reach their optimum at ( $\beta$ , $\gamma$ ) $=$ (0.01, 0.01). The results again indicate that both the tagging information and the user relationship information are important, and that choosing suitable values of $\beta$ and $\gamma$ is critical for the recommendation performance of our model. Specifically, with $\gamma$ fixed, the values of Pre@5, MAP@5 and NDCG@5 first increase and then gradually decrease as $\beta$ increases. Similarly, with $\beta$ fixed, recommendation performance follows the same trend as $\gamma$ increases. The variation caused by changing $\beta$ is larger than that caused by changing $\gamma$ , which indicates that on this dataset BPR-T is slightly more sensitive to the tag mapping than to the user neighbor relationship.

Therefore, in our experiments, we set $\beta=$ 0.01 and $\gamma=$ 0.01 for both the Lastfm and Citeulike datasets.
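
The selection procedure amounts to an exhaustive search over the $(\beta,\gamma)$ grid. The sketch below shows the search loop only. The function `toy_validation_score` is a hypothetical stand-in for "train BPR-T with the given parameters and measure a validation metric", and its values are not results from the experiments.

```python
import itertools
import math

def grid_search(betas, gammas, evaluate):
    """Return (best_score, beta, gamma) over the Cartesian grid; higher is better."""
    best = None
    for beta, gamma in itertools.product(betas, gammas):
        current = evaluate(beta, gamma)
        if best is None or current > best[0]:
            best = (current, beta, gamma)
    return best

def toy_validation_score(beta, gamma):
    # Hypothetical surrogate that simply peaks at (0.01, 0.01). In a real run this
    # would train the model and return, for example, NDCG@5 on validation data.
    return -(abs(math.log10(beta) + 2) + abs(math.log10(gamma) + 2))

betas = [0.00001, 0.0001, 0.001, 0.01, 0.1]  # grid used for beta on Lastfm
gammas = [0.0001, 0.001, 0.01, 0.1]          # grid used for gamma on Lastfm
best_score, best_beta, best_gamma = grid_search(betas, gammas, toy_validation_score)
```

With 5 values of $\beta$ and 4 values of $\gamma$ the search trains 20 models per dataset, which is why $\lambda$ is kept fixed rather than added as a third grid dimension.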

4.5 Personalized recommendation results and analysis

We compare our model BPR-T with the mentioned baselines on the Lastfm and Citeulike datasets on several evaluation metrics, including Pre@ $n$ , MAP@ $n$ , NDCG@ $n$ , AUC, and ARP.

We first study how Pre@ $n$ , MAP@ $n$ and NDCG@ $n$ vary with the cutoff size (i.e., top $n$ ) of the recommendation lists. Since in real applications users mainly pay attention to the first few recommended items, we set $n$ to 3, 5, 7 and 10. The top- $n$ recommendation performance is shown in Fig. 3 for the Lastfm dataset and in Fig. 4 for the Citeulike dataset. For these metrics, a larger value means better recommendation performance.

Figure 3. Top- $n$ recommendation performance comparisons on Lastfm dataset in terms of several metrics: (a) Pre, (b) MAP, (c) NDCG.

Figure 4. Top- $n$ recommendation performance comparisons on Citeulike dataset in terms of several metrics: (a) Pre, (b) MAP, (c) NDCG.

We also report the results on the other two ranking-oriented evaluation metrics, AUC and ARP, which are independent of the cutoff $n$ of the ranking list (see Table 1). For AUC, a larger value means better recommendation ability. For ARP, a smaller value indicates better recommendation performance. The best result for each dataset and metric is obtained by BPR-T.

Table 1

Performance comparisons in terms of ARP and AUC on Lastfm and Citeulike datasets

Datasets	Methods	AUC $\uparrow$	ARP $\downarrow$
Lastfm	Pop	0.8845	0.1023
	WRMF	0.9172	0.0835
	BPR	0.9213	0.0793
	NHPMF	0.9235	0.0800
	LRPPM-CF	0.9255	0.0782
	BPR-T	0.9300	0.0709
Citeulike	Pop	0.8985	0.1022
	WRMF	0.9147	0.0856
	BPR	0.9219	0.0784
	NHPMF	0.9256	0.0812
	LRPPM-CF	0.9288	0.0723
	BPR-T	0.9352	0.0654

From Figs 3 and 4 and Table 1, we make the following observations.

Our method BPR-T outperforms all baselines in terms of the top- $n$ evaluation metric (Pre@ $n$ ) and the ranking-oriented evaluation metrics (MAP@ $n$ , NDCG@ $n$ , AUC, ARP) on both datasets. The results demonstrate the advantage of our method in the personalized recommendation task.

BPR-T beats LRPPM-CF and NHPMF. On the one hand, this suggests that, on the datasets used in this paper, ranking-based recommendation methods are better suited than rating-based methods to top- $n$ recommendation with implicit feedbacks. Because top- $n$ personalized recommendation is inherently a ranking-oriented task, what matters is users’ relative preference among items rather than the rating predictions on them, so a ranking-based approach can be more suitable for item recommendation with implicit feedbacks. On the other hand, BPR-T takes into account both tagging information and user relationship information, both of which contribute to the performance gain. In addition, we argue that LRPPM-CF is easily affected by noise in tag recommendation and ranking, and that NHPMF suffers from the sparsity and high dimensionality of tagging information when it computes the similarity between users (items) from tags.

BPR-T, LRPPM-CF and NHPMF generally perform better than BPR, WRMF and Pop; one exception in Table 1 is ARP, where NHPMF is slightly worse than BPR on both datasets. This is because the latter three methods use only the implicit feedbacks to make recommendations, whereas the former three leverage both tagging information and implicit feedbacks. The comparison indicates the value of integrating tagging information with traditional collaborative information, since tagging information provides an explicit content-based representation of users and items.

Across all the compared methods, every personalized method beats Pop, which is a non-personalized recommendation strategy. This result demonstrates the effectiveness of personalized recommendation methods, including WRMF, BPR, NHPMF, LRPPM-CF, and our BPR-T.

Among the personalized recommendation methods, all the others beat WRMF, which shows the effectiveness of the pairwise preference assumption of BPR and BPR-T, and of leveraging extra tagging information in NHPMF and LRPPM-CF in our case. Nevertheless, all the baselines still perform worse than our proposed method BPR-T.

From the comparative results, we also observe that the evaluation values obtained by all the methods on the Lastfm dataset are much higher than those on the Citeulike dataset, because the Citeulike dataset is sparser than the Lastfm dataset.

5. Conclusions

This paper presents a new tag-aware recommendation algorithm that integrates tagging information with collaborative information to enhance ranking-oriented recommendation performance. Building on the popular ranking-based recommendation model (i.e., BPR) with matrix factorization, we add two regularization constraints, derived from explicit-to-implicit feature mapping and from the user neighbor relationship, to help alleviate overfitting and enhance ranking-oriented recommendation performance. Evaluation on two real-world datasets shows that our proposed method outperforms several competitive baselines on several recommendation metrics.

For future work, we are mainly interested in extending our method in two directions: (1) studying how to leverage tag recommendation strategies to further alleviate the sparsity of tagging information, and (2) investigating how to explain the recommendation results more clearly.

Acknowledgments

This work is supported by the Natural Science Foundation of China under Grant 61371196. The authors would like to thank the anonymous reviewers for their constructive comments.

References

1. Chen, Kan M.-Y. and Chen, TriRank: review-aware explainable recommendation by modeling aspects, in: Proceedings of the 24th ACM International Conference on Information and Knowledge Management, Melbourne, Australia, 2015.
2. Guo and Zhao, Tag-based social interest discovery, in: Proceedings of the 17th International Conference on World Wide Web, 2008, pp. 675–684.
3. Zheng and Li Q.D., A recommender system based on tag and time information for social tagging systems, Expert Systems with Applications 38(4) (2011), 4575–4587.
4. Zeng and Li, How useful are tags? An empirical analysis of collaborative tagging for web page recommendation, in: Proceedings of the International Workshops on Intelligence and Security Informatics, 2008, pp. 320–330.
5. Zhen and Yeung D.-Y., Tagicofi: tag informed collaborative filtering, in: Proceedings of the 3rd ACM International Conference on Recommender Systems, 2009, pp. 69–76.
6. Chen, Liu, Bao and Zhang, Leveraging tagging for neighborhood-aware probabilistic matrix factorization, in: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, October 29–November 2, 2012, pp. 1854–1858.
7. Zhou T.C., Lyu M.R. and King, Improving recommender systems by incorporating social contextual information, ACM Transactions on Information Systems 29(2) (2011), 9:1–9:23.
8. Zhang Z.-K., Zhou and Zhang Y.-C., Personalized recommendation via integrated diffusion on user-item-tag tripartite graphs, Physica A: Statistical Mechanics and its Applications 389(1) (2010), 179–186.
9. Zhang, Zeng D.D., Abbasi, Peng and Zheng, A random walk model for item recommendation in social tagging systems, ACM Transactions on Management Information Systems 4(2) (2013), 1–24.
10. Peng, Zeng, Zhao and Wang F.-y., Collaborative filtering in social tagging systems based on joint item-tag recommendations, in: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Toronto, Ontario, Canada, 2010, pp. 809–818.
11. Ifada and Nayak, Tensor-based Item Recommendation using Probabilistic Ranking in Social Tagging Systems, in: Proceedings of the 23rd International Conference on World Wide Web, Seoul, Korea, 2014, pp. 805–810.
12. Rafailidis and Daras, The TFC Model: tensor factorization and tag clustering for item recommendation in social tagging systems, IEEE Transactions on Systems Man and Cybernetics – Part A Systems and Humans (2013), 1–25.
13. Zuo, Zeng, Gong and Jiao, Tag-aware recommender systems based on deep neural networks, Neurocomputing 204 (2016), 51–60.
14. Zhang, Lai, Zhang, Liu and Ma, Explicit factor models for explainable recommendation based on phrase-level sentiment analysis, ACM, 2014, pp. 83–92.
15. Chen, Zhang and Qin, Learning to rank features for recommendation over multiple categories, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016.
16. Koren and Volinsky, Collaborative filtering for implicit feedback datasets, in: Proceedings of the 8th International Conference on Data Mining, 2008.
17. Zhang, Kan M.-Y. and Chua T.-S., Fast matrix factorization for online recommendation with implicit feedback, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016, pp. 549–558.
18. [...] and Li, Robust ranking algorithms for one-class collaborative filtering, Acta Automatica Sinica 41(2) (2015), 405–418.
19. Pan, Zhou, Cao, Liu N.N., Lukose, Scholz et al., One-class collaborative filtering, in: Proceeding of International Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2008.
20. Pan and Chen, CoFiSet: collaborative filtering via learning pairwise preferences over item-sets, in: Proceedings of the 6th ACM International Conference on Web Search and Data Mining, 2013.
21. Rendle, Freudenthaler, Gantner and Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009, pp. 452–461.
22. Rendle and Freudenthaler, Improving pairwise learning for item recommendation from implicit feedback, in: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, 2014, pp. 273–282.
23. Hong and Zhu, Point-of-interest recommendations: learning potential check-ins from friends, in: Proceedings of the 22nd International ACM SIGKDD Conference on Knowledge Discovery and Data Mining August, 2016, pp. 13–17.
24. Pan and Chen, GBPR: group preference based Bayesian personalized ranking for one-class collaborative filtering, in: Proceedings of the 23rd International Joint Conference on Artificial Intelligence, 2013, pp. 2691–2697.
25. Zhao and Guo, Improving Top-N Recommendation with Heterogeneous Loss, in: Proceedings of the 25th International Joint Conference on Artificial Intelligence, 2016.
26. Zhao, McAuley and King, Leveraging social connections to improve personalized ranking for collaborative filtering, in: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, 2014, pp. 261–270.
27. Tang, Long, Chen B.-C. and Agarwal, An empirical study on recommendation with multiple types of feedback, in: Proceedings of the 22nd International ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2016.
28. [...] and Shen Y.D., User graph regularized pairwise matrix factorization for item recommendation, in: Proceedings of the 7th International Conference on Advanced Data Mining and Applications, 2011.
29. Zhou, Liu, Lyu M.R. and King, Recommender systems with social regularization, in: Fourth International Conference on Web Search & Web Data Mining, 2011, pp. 287–296.
30. Wang and Blei D.M., Collaborative topic modeling for recommending scientific articles, in: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, USA, ACM, 2011, pp. 448–456.
31. Shi, Karatzoglou, Baltrunas, Larson, Oliver and Hanjalic, Climf: learning to maximize reciprocal rank with collaborative less-is-more filtering, in: Proceedings of the 6th ACM International Conference on Recommender Systems, 2012, pp. 139–146.
