Pairwise deep learning to rank for top-N recommendation

Abstract

Recent research indicates that pairwise learning to rank methods can cope well with data sparsity and long-tail distributions in item recommendation, although they suffer from problems such as high computational complexity and insufficient samples, which may cause slow convergence and inaccuracy. To improve both computational efficiency and recommendation accuracy, this article proposes a novel deep neural network based recommender architecture, referred to as PDLR. For each user, the item corpus is partitioned into a collection of positive instances and a collection of negative items, and pairwise comparison is performed between the positive instances and negative samples to learn the user’s preference degree. With the representational power of neural networks, PDLR can capture rich interactions between each user and the items as well as the intricate relations between items. As a result, PDLR can minimize the ranking loss and improve ranking accuracy. Experimental results over four real-world datasets demonstrate the superiority of PDLR over state-of-the-art recommender approaches in terms of Rec@N, Prec@N, AUC and NDCG@N.

Keywords

Pairwise comparison; neural network; learning to rank; item recommendation

1 Introduction

Recommender Systems (RS) are now applied in a wide range of applications, such as online business and social networks, and have achieved tremendous success [1]. According to statistics, RS could affect as much as 35% of sales on Amazon and 80% of movie watching on Netflix [2]. In general, RS are effective and efficient tools for information retrieval and filtering, which aim to help users discover the most valuable information among a large number of choices. Moreover, it is promising to introduce deep learning methods into recommender systems, although such methods still suffer from limited samples and computational resources, which can lead to low performance.

The well-known Probabilistic Matrix Factorization (PMF) only involves the observed numerical ratings assigned by users to items when learning low-rank feature vectors [3]. More precisely, PMF focuses on rating prediction with the products of the learned vectors, and then returns a top-N recommendation list to the user according to the ranked predicted ratings. However, traditional PMF does not consider additional auxiliary information, and in practice it is often infeasible to collect explicit feedback such as numerical ratings. It has also been demonstrated that accurate rating prediction does not guarantee high performance in top-N recommendation. Fortunately, the Learning to Rank (LtR) method provides an alternative solution to these problems: it can cope with data sparsity, learn each user’s interests and preferences, and provide high-performance recommendation with implicit feedback.

In general, there are three types of learning to rank algorithms: pointwise, pairwise [4, 5] and listwise [6]. Rating-based collaborative filtering belongs to the pointwise category, in which the predicted ratings are regarded as the user’s preference degree towards the items [3, 8]. Listwise methods can outperform pointwise and pairwise methods, but at a heavy computational cost. Bayesian personalized ranking (BPR) is a well-known pairwise learning to rank method [9], which splits the item corpus into positive instances with explicit feedback and negative items with implicit feedback, and then performs pairwise comparison between positive instances and negative samples. In contrast to previous research, BPR directly optimizes for ranking with implicit feedback, which is available in most applications. Moreover, extended versions of pairwise learning to rank can achieve significant improvement in item recommendation. However, in real life, insufficient negative samples cannot handle the long-tail distribution over items, and may even lead to low-performance recommendation. In addition, a serious problem that learning to rank methods need to address is their high computational complexity, especially on large-scale datasets [10, 11].

In recent years, deep neural networks (DNN) [12] have developed remarkably owing to their powerful capability in information processing [13], and have been applied in various domains, such as computer vision, audio recognition, recommender systems and natural language processing. In contrast to traditional recommender approaches, it is promising to introduce deep neural networks into RS, and DNN based RS built on multilayer perceptrons (MLP) [1, 16], convolutional neural networks (CNN) [17], autoencoders (AE) [18, 19] and so on have achieved tremendous success [14, 15]. With their computational power, DNN based recommender systems can learn much more expressive feature representations automatically. Furthermore, it has been verified that DNN based RS work well on unbalanced datasets and yield significant improvement over conventional recommender approaches. However, DNN based RS still need improvement in aspects such as model explainability and the exploitation of interactions between users and items, which can be bottlenecks for recommendation accuracy.

In this article, a novel Pairwise Deep Learning to Rank recommendation architecture, referred to as PDLR, is proposed. It integrates the pairwise learning to rank method with a deep neural network, and aims to overcome the drawbacks in computational capability and recommendation accuracy. The graphical illustration of PDLR is presented in Fig. 1, which shows that PDLR contains an embedding layer, a pairwise interaction layer, a ranking-aware attention layer and an output layer. More specifically, the embedding layer learns expressive feature representations for users and items through an average pooling operation, and the ranking-aware attention layer specifies the importance of each item to the user. In contrast to previous research, PDLR performs a full pairwise comparison of the positive instances with each negative sample, which is reasonable and interpretable, and allows each user’s preference towards a negative item to be learned accurately. Furthermore, the experimental analysis over real-world datasets shows that it is promising to introduce deep neural networks into recommender algorithms. The contributions of this article are as follows:

A generic deep neural network based pairwise comparison learning to rank framework referred to as PDLR is proposed, which could provide high performance top-N recommendation with the powerful computational capability of neural network.

PDLR deeply exploits the interactions among items as well as the relationship between the user and items through the pairwise interaction layer, and accurately calculates the user’s preference degree towards each item; accordingly, PDLR can achieve significant improvement in personalized recommendation.

Empirical experiments over four real world datasets indicate the superiority of PDLR, which could outperform state-of-the-art recommender algorithms significantly, especially in top-N personalized recommendation.

Fig. 1

Deep neural architecture for pairwise learning to rank for top-N recommendation. In general, PDLR includes the embedding layer, pairwise interaction layer, hidden layer, ranking-aware attention layer and the output layer.

2 Related work

In this section, we will present some related work about PDLR in detail, including learning to rank recommendation and deep neural network based recommendation.

2.1 Learning to rank recommendation

Learning to rank (LtR) recommendation approaches utilize implicit feedback, such as clicks, purchases and view history, which is available in most information systems. Moreover, LtR methods provide top-N recommendation by minimizing the ranking loss instead of the rating prediction loss. Rendle et al. propose Bayesian personalized ranking from implicit feedback (BPR) [9], which provides a generic optimization solution for personalized ranking. To accelerate model training for BPR, several sampling strategies have been proposed for negative items, such as random sampling, static sampling and adaptive sampling [5, 20], but in practice the performance of each sampling solution may not be consistent across real-life datasets. To further improve the performance of BPR, Liu et al. propose a collaborative filtering method called CPLR, which partitions the negative items into a collaborative set and a negative set, and obtains much better recommendation results [4]. By contrast, Qiu et al. partition the negative items according to auxiliary actions, such as view and like, which also achieves high performance in top-N recommendation [21].

2.2 Deep neural network based recommendation

Previous research has confirmed that Deep Neural Network (DNN) based Recommender Systems (RS) have powerful capabilities in computation and information processing, and can deal with various kinds of user feedback and other auxiliary information. Zhang et al. provide a comprehensive survey of DNN based RS [13], covering Restricted Boltzmann Machines (RBM) [22], Multilayer Perceptron (MLP) [1], Convolutional Neural Network (CNN) [17, 23], Autoencoder [24, 25] and so on. In practice, DNN based RS can effectively and efficiently capture the intricate relationships within a user’s social network as well as the interactions between users and items; therefore, DNN based RS can capture the user’s preference and achieve significant improvements in contrast to traditional recommender approaches. Autoencoders and denoising autoencoders [24, 25] try to reconstruct the user’s explicit ratings. RecNet minimizes the pairwise ranking loss with implicit feedback through a neural network [26], and also performs preference exploitation and representation learning for each user. MIND [27] tries to deeply exploit the user’s diverse interests, and achieves high performance in practice. DeepICF [28] combines a nonlinear neural network with item-based collaborative filtering, and captures the intricate interactions within the item corpus.

3 Pairwise deep learning to rank

In this section, we present the neural architecture for pairwise deep learning to rank, which consists of an embedding layer, a pairwise interaction layer, hidden layers, a ranking-aware attention layer and an output layer. In this article, $𝕌 = \{u_{1}, \dots, u_{n}\}$ denotes the set of users, $𝕀 = \{v_{1}, \dots, v_{m}\}$ denotes the collection of items, and d denotes the embedding size. Note that, for the sake of simplicity, users and items share the same embedding size.

3.1 Neural architecture

This subsection describes the proposed neural architecture layer by layer. In PDLR, for user $u \in 𝕌$ , the item corpus $𝕀$ is split into two collections, $𝕀_{u}^{+}$ and $𝕀_{u}^{-}$ , where $𝕀_{u}^{+}$ denotes the collection of positive items with explicit feedback, such as numeric ratings, and $𝕀_{u}^{-}$ denotes the collection of negative items with no feedback, so that $𝕀 = 𝕀_{u}^{+} \cup 𝕀_{u}^{-}$ . The output of the framework is the predicted distribution over negative items given user u’s historical behavior records. The corresponding graphical illustration of PDLR is presented in Fig. 1. Specifically, it is assumed that user $u \in 𝕌$ prefers items in $𝕀_{u}^{+}$ to items in $𝕀_{u}^{-}$ , and items in $𝕀_{u}^{-}$ are regarded as the potential candidates to be recommended to user u. Essentially, the goal of this architecture is to learn the probability distribution over items for each user, and to return the N items with the highest probability.

Embedding layer. The input consists of one-hot encoding vectors for the items. Suppose s items are selected from $𝕀_{u}^{+}$ and t items are selected from $𝕀_{u}^{-}$ . Let $x_{+}^{i}$ denote the sparse one-hot encoding vector for the i-th item v_i selected from $𝕀_{u}^{+}$ , and $x_{-}^{j}$ denote the sparse one-hot encoding vector for the j-th item v_j selected from $𝕀_{u}^{-}$ .

Let $z_{+} = \{z_{+}^{1}, z_{+}^{2}, \dots, z_{+}^{s}\}$ and $z_{-} = \{z_{-}^{1}, z_{-}^{2}, \dots, z_{-}^{t}\}$ denote the representation vectors for the items selected from $𝕀_{u}^{+}$ and $𝕀_{u}^{-}$ respectively, with $z_{+}^{i} := (f_{i}^{1}, f_{i}^{2}, \dots, f_{i}^{K}), \quad f_{i}^{k} = W_{0 +; i}^{k} x_{+}^{i}, \qquad z_{-}^{j} := (f_{j}^{1}, f_{j}^{2}, \dots, f_{j}^{K}), \quad f_{j}^{k} = W_{0 -; j}^{k} x_{-}^{j},$ (1) where $1 \leq k \leq K$ , and $W_{0 +} \in ℝ^{s \times K \times d}$ and $W_{0 -} \in ℝ^{t \times K \times d}$ are the parameters of the embedding layer. Here, d denotes the embedding length for all items in $𝕀$ .

Next, the dense representation vector of each item is obtained through an average pooling operation, which retains useful information and reduces computational complexity. Let p_i denote the d-dimensional representation vector of the i-th of the s items with explicit feedback, and q_j the d-dimensional representation vector of the j-th of the t items with no feedback. More precisely, $p_{i} = (p_{i}^{1}, p_{i}^{2}, \dots, p_{i}^{d}), \quad p_{i}^{l} = f_{ave} \{z_{+}^{i}\}, \qquad q_{j} = (q_{j}^{1}, q_{j}^{2}, \dots, q_{j}^{d}), \quad q_{j}^{l} = f_{ave} \{z_{-}^{j}\},$ (2) where 0 < i ≤ s, 0 < j ≤ t, and f_ave denotes the average function.

Identically, with the one-hot encoded vector for user $u \in 𝕌$ , we can obtain the dense representation vector $u_{u} \in ℝ^{d}$ , through the embedding layer and average pooling layer. The generated d-dimensional feature vectors p_i, q_j and u_u will be utilized for pairwise interaction exploitation.
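The embedding and average pooling steps can be illustrated with a minimal Python sketch. It rests on one reading of Eqs. (1) and (2), which is an assumption: each item owns K component vectors of length d, and the pooled vector is their component-wise mean. The names `init_embedding`, `average_pool` and `embed` are hypothetical. Because the input is one-hot, multiplying by the weight tensor reduces to a row lookup.

```python
import random


def init_embedding(num_items, K, d, seed=0):
    """Random table of shape (num_items, K, d); the initialisation is an assumption."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(K)]
            for _ in range(num_items)]


def average_pool(fields):
    """Component-wise mean of K vectors of length d (one reading of f_ave in Eq. (2))."""
    K, d = len(fields), len(fields[0])
    return [sum(f[l] for f in fields) / K for l in range(d)]


def embed(table, index):
    """One-hot input times the weight tensor is a row lookup, followed by pooling."""
    return average_pool(table[index])


table = init_embedding(num_items=5, K=3, d=4)
p_0 = embed(table, 0)  # dense d-dimensional vector for item 0
```

The same lookup-and-pool routine would serve for the user vector u_u, with a separate table indexed by user.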

Pairwise interaction layer. As stated above, the embedding layer automatically learns expressive feature representations for each item and user, so at this point the dense representation vectors of the items are available. It has been demonstrated that the pairwise comparison method can speed up learning to rank and improve recommendation performance [9]. Inspired by that, for user u, we perform pairwise comparison between items in $𝕀_{u}^{+}$ and $𝕀_{u}^{-}$ through the neural network. More precisely, for each pair (v_i, v_j), s.t. 0 < i ≤ s, 0 < j ≤ t, the pairwise interaction is learned in the form of f_int (u_u, p_i, q_j), which can be decomposed as follows:

$f_{int} (u_{u}, p_{i}, q_{j}) = f_{int} (u_{u}, p_{i}) - f_{int} (u_{u}, q_{j}) = \langle u_{u}, p_{i} \rangle - \langle u_{u}, q_{j} \rangle,$ (3) where ⟨·, ·⟩ denotes the element-wise product, which captures the second-order interactions between the user and the target item. Note that pairwise comparison is performed for each item in $𝕀_{u}^{-}$ with the items in $𝕀_{u}^{+}$ ; the motivation is to learn the item similarity and minimize the ranking loss by conducting the element-wise product over embedding vectors. Furthermore, the comparison result of each item v_j in $𝕀_{u}^{-}$ is regarded as the preference degree of user u towards item v_j. In other words, user u is very likely to prefer item $v_{j} \in 𝕀_{u}^{-}$ if v_j is highly similar to the items in $𝕀_{u}^{+}$ . This method deeply exploits the interaction between each user and the items as well as the relations between items, and improves the ranking accuracy significantly in practice, which is also verified in Section 4.

To this end, for the pairwise interaction layer, let c_u denote the comparison results for user u: $c_{u} = \{c_{u}^{1}, c_{u}^{2}, \dots, c_{u}^{t}\}, \quad \text{with} \quad c_{u}^{j} = σ \left( \sum_{i = 1}^{s} f_{int} (u_{u}, p_{i}, q_{j}) \right),$ (4) where σ (x) = 1/(1 + exp (x)). Moreover, $c_{u}^{j} \in (0, 1)$ indicates user u’s preference degree towards the potential candidate $v_{j} \in 𝕀_{u}^{-}$ ; in other words, it can be read as the probability that user u interacts with v_j.
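A small sketch of Eqs. (3) and (4) may help to fix ideas. It makes two assumptions. First, ⟨·, ·⟩ is read as an inner product, so that the argument of σ is a scalar. Second, σ is implemented exactly as printed, 1/(1 + exp(x)), which differs in sign from the standard logistic function; with this form, a candidate whose vector scores higher against the user vector receives a larger c_u^j. The function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigma(x):
    """Squashing function as printed in the text: 1 / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(x))


def f_int(u, p, q):
    """Eq. (3): user-positive interaction minus user-negative interaction."""
    return dot(u, p) - dot(u, q)


def comparison_vector(u, positives, negatives):
    """Eq. (4): one value in (0, 1) per negative candidate."""
    return [sigma(sum(f_int(u, p, q) for p in positives)) for q in negatives]


u = [1.0, 0.0]
positives = [[1.0, 0.0]]
negatives = [[1.0, 0.0], [0.0, 1.0]]
c_u = comparison_vector(u, positives, negatives)
```

In this toy case the first candidate coincides with the positive item, so the summed difference is zero and its value is 0.5; the second candidate is orthogonal to the user vector and receives a smaller value.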

Output layer. The generated vector c_u captures the pairwise interaction between the items in $𝕀_{u}^{+}$ and $𝕀_{u}^{-}$ . We further exploit the intricate interactions within the negative items through a three-layer neural network, which is shown in Fig. 1. The hidden layer is fully connected to the pairwise interaction layer and uses rectified linear (ReLU) activation functions. With the learned preference vector c_u, the output of the framework is obtained via: $e_{2} = ReLU (W_{2} \cdot ReLU (W_{1} c_{u} + b_{1}) + b_{2}),$ (5) where $W_{1} \in ℝ^{1 \times t}$ and $W_{2} \in ℝ^{1 \times t}$ are the weights of hidden layers L₁ and L₂ respectively, and b₁ and b₂ are bias vectors.

The generated $e_{2} \in ℝ^{t}$ is a dense layer which gives the predicted preference degree for each corresponding negative item in $𝕀_{u}^{-}$ ; more precisely, e₂ = {l₁, l₂, ⋯ , l_t}. For $v_{j} \in 𝕀_{u}^{-}$ , Pr (v_j|u) denotes the probability that user u interacts with v_j, which is taken to be proportional to the corresponding l_j ∈ e₂. In other words, l_j is employed to indicate the probability that user u interacts with item $v_{j} \in 𝕀_{u}^{-}$ , which means $\Pr (v_{j} | u) \propto l_{j}, \quad v_{j} \in 𝕀_{u}^{-}, \quad 0 < j \leq t .$ (6)

Specifically, we design a ranking-aware attention layer Att (l_j), based on the softmax function, to learn the weight of each item, which specifies the importance of each item to the user. Accordingly, the probability that user u interacts with item v_j is obtained as follows: $\hat{\Pr} (v_{j} | u) = l_{j} Att (l_{j}) = l_{j} \frac{\exp (l_{j})}{\sum_{k = 1}^{t} \exp (l_{k})} .$ (7)

With the ranked $\hat{\Pr} (v_{j} | u), s . t . 0 < j \leq t$ , we could easily obtain the top-N recommendation list for user u.
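Eqs. (5) to (7) and the final ranking step can be sketched as follows. The sketch assumes weight matrices whose shapes make e₂ a vector with one entry per candidate (square t × t matrices in the toy example); this shape choice is an assumption made for illustration, and all function names are hypothetical.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def affine(W, x, b):
    """Matrix-vector product plus bias, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def score_candidates(c_u, W1, b1, W2, b2):
    """Eq. (5) followed by the attention weighting of Eq. (7)."""
    h = relu(affine(W1, c_u, b1))
    e2 = relu(affine(W2, h, b2))
    att = softmax(e2)
    return [l * a for l, a in zip(e2, att)]


def top_n(scores, n):
    """Indices of the n highest-scoring candidates, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [0.0, 0.0, 0.0]
scores = score_candidates([0.2, 0.9, 0.5], identity, zeros, identity, zeros)
recommended = top_n(scores, 2)
```

Since e₂ is non-negative after the ReLU, the product l_j · Att(l_j) is increasing in l_j, so the attention step rescales the scores without changing the order induced by e₂ for a single user.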

3.2 Model training

As stated before, PDLR directly optimizes for top-N recommendation with implicit feedback. With the learned probability distribution over items, cross-entropy is employed in model training to minimize the ranking loss: $L = - \sum_{j = 1}^{t} \left[ \Pr (v_{j} | u) \log \hat{\Pr} (v_{j} | u) + (1 - \Pr (v_{j} | u)) \log (1 - \hat{\Pr} (v_{j} | u)) \right] + λ {∥ Θ ∥}_{F}^{2},$ (8) where Pr (·) is the ground truth, Θ denotes the parameters of the neural network, and the regularization term is employed to avoid over-fitting.
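The loss in Eq. (8) can be written down directly. In the sketch below, the parameters Θ are passed as a flat list of numbers, and the predicted probabilities are clipped away from 0 and 1 before taking logarithms; both choices are assumptions made for illustration, and `ranking_loss` is a hypothetical name.

```python
import math


def ranking_loss(truth, predicted, params, lam, eps=1e-12):
    """Cross-entropy over the t candidates plus a squared-norm penalty on params."""
    ce = 0.0
    for y, p in zip(truth, predicted):
        p = min(max(p, eps), 1.0 - eps)  # clipping is an assumption, not part of Eq. (8)
        ce -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    reg = lam * sum(w * w for w in params)
    return ce + reg


loss_uninformed = ranking_loss([1.0, 0.0], [0.5, 0.5], params=[], lam=0.0)
loss_confident = ranking_loss([1.0, 0.0], [0.9, 0.1], params=[], lam=0.0)
```

Predictions of 0.5 for both candidates cost 2 log 2, while predictions that agree with the ground truth cost less.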

In practice, pairwise comparison between a large number of items in $𝕀_{u}^{+}$ and $𝕀_{u}^{-}$ leads to heavy computational cost, since the cost is $O (| 𝕀_{u}^{+} | | 𝕀_{u}^{-} |)$ . However, it has been verified that the model can be learned well with only part of the items in $𝕀_{u}^{+}$ and $𝕀_{u}^{-}$ , from which the representation vectors for the user and items can still be obtained. To this end, suppose $N_{+}$ items are randomly selected from $𝕀_{u}^{+}$ and $N_{-}$ items are randomly selected from $𝕀_{u}^{-}$ . The computational cost is then reduced to $O (N_{+} N_{-})$ , where $N_{+} \ll | 𝕀_{u}^{+} |, N_{-} \ll | 𝕀_{u}^{-} |$ .

The well-known stochastic gradient descent (SGD) is employed for parameter learning, with the back-propagation algorithm used to compute the error gradients. The deep neural network is capable of exploiting the pairwise relations between items, and its output indicates the probability distribution over the candidate items. In practice, the network can converge with only part of the explicit and implicit items, namely random subsets of the items with explicit feedback and of the items with implicit feedback respectively. The generated vectors propagate forward from the embedding layer to the output layer, whereas the errors are back-propagated through the network. The details of the network learning are presented in Algorithm 1.

Algorithm 1 Model training for PDLR
Input: the user-item pairs with explicit feedback; the derived $𝕀_{u}^{+}$ and $𝕀_{u}^{-}$ ;
Output: model parameter Θ;
1: for all user $u \in 𝕌$ do
2: repeat
3: randomly select $N_{+}$ items from $𝕀_{u}^{+}$ ;
4: randomly select $N_{-}$ items from $𝕀_{u}^{-}$ ;
5: propagate binary vectors from the embedding layer to output;
6: back-propagate ranking error throughout the network;
7: until convergence
8: end for
9: return: model parameter Θ
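Steps 3 and 4 of Algorithm 1 can be sketched in a few lines of Python. The sketch only covers the split of the corpus and the random selection of N₊ and N₋ items; the forward and backward passes are omitted. The helper names `split_corpus` and `sample_instances` are hypothetical, and uniform sampling without replacement is an assumption.

```python
import random


def split_corpus(all_items, observed):
    """Split the corpus into items with feedback and items without, for one user."""
    observed = set(observed)
    positives = [v for v in all_items if v in observed]
    negatives = [v for v in all_items if v not in observed]
    return positives, negatives


def sample_instances(positives, negatives, n_pos, n_neg, rng):
    """Draw N+ positives and N- negatives uniformly, without replacement."""
    pos = rng.sample(positives, min(n_pos, len(positives)))
    neg = rng.sample(negatives, min(n_neg, len(negatives)))
    return pos, neg


rng = random.Random(0)
positives, negatives = split_corpus(list(range(100)), observed=[1, 5, 7, 9, 20])
pos, neg = sample_instances(positives, negatives, n_pos=3, n_neg=10, rng=rng)

full_pairs = len(positives) * len(negatives)  # all pairwise comparisons
sampled_pairs = len(pos) * len(neg)           # comparisons after sampling
```

In this toy setting the number of pairwise comparisons for the user drops from 5 × 95 = 475 to 3 × 10 = 30.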

4 Experimental analysis

In this section, to evaluate the performance of PDLR, a series of experiments are performed over Movielens, Amazon Movie (Amazon-m), Douban Movie (Douban-m) and Lastfm respectively, and we will compare the performance of PDLR with other benchmark recommendation methods. In addition, we will also investigate the impacts of parameters over the recommendation performance.

4.1 Datasets

The Movielens dataset contains 881,563 observed ratings assigned by 5,748 users to 3,811 movies. The subset of the original Amazon-m dataset contains 116,342 observed ratings assigned by 6,562 users to 4,569 movies. The Douban-m dataset contains 122,508 observed ratings assigned by 2,872 users to 12,416 movies. The Lastfm dataset contains 88,104 explicit user-listened artist relations between 1,753 users and 16,428 artists. After removing outliers as well as users and items with fewer than 10 records, the statistics of each dataset are reported in Table 1, which indicates that the datasets are rather sparse. Moreover, 5-fold cross validation is employed in each experiment, and each dataset is split randomly into a training set (80%) and a testing set (20%).

Table 1
Statistics of each dataset

Dataset	#users	#items	#feedback	density
Movielens	5,748	3,811	881,563	4.02%
Amazon-m	6,562	4,569	116,342	0.388%
Douban-m	2,872	12,416	122,508	0.344%
Lastfm	1,753	16,428	88,104	0.306%
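The density column of Table 1 can be reproduced from the other three columns, assuming density is defined as the number of feedback records divided by the number of user-item pairs. The following sketch, with a hypothetical `density` helper, recomputes it from the counts in the table.

```python
def density(num_users, num_items, num_feedback):
    """Fraction of user-item pairs that carry an observed feedback record."""
    return num_feedback / (num_users * num_items)


# Counts copied from Table 1: (#users, #items, #feedback).
stats = {
    "Movielens": (5748, 3811, 881563),
    "Amazon-m": (6562, 4569, 116342),
    "Douban-m": (2872, 12416, 122508),
    "Lastfm": (1753, 16428, 88104),
}

density_percent = {name: 100.0 * density(*counts) for name, counts in stats.items()}
```

The recomputed values agree with the reported densities after rounding, which confirms that over 95% of the user-item pairs are unobserved in every dataset.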

In addition, the observed ratings in Movielens, Amazon-m and Douban-m will be regarded as the explicit feedback for items in experimental settings, and the rest will be regarded as implicit feedback.

4.2 Evaluation metrics

Since the essential goal of the proposed PDLR is to provide accurate top-N recommendation to each user, the popular Rec@N, Prec@N, AUC and NDCG@N are employed as the evaluation metrics. Rec@N and Prec@N are defined as follows: $Rec @ N = \frac{1}{| 𝕌_{test} |} \sum_{u \in 𝕌_{test}} \frac{| π (u) \cap 𝕀_{u}^{+} |}{| 𝕀_{u}^{+} |}, \quad Prec @ N = \frac{1}{| 𝕌_{test} |} \sum_{u \in 𝕌_{test}} \frac{| π (u) \cap 𝕀_{u}^{+} |}{N},$ (9) where $𝕌_{test}$ denotes the set of users in the testing set, and π (u) denotes the recommendation list generated for user u by PDLR. The definition of AUC is: $AUC = \sum_{u \in 𝕌_{test}} \frac{\sum_{(v_{i}, v_{j}) \in S_{u}} r_{v_{j}} - | S_{u} | \cdot (| S_{u} | + 1) / 2}{| 𝕌_{test} | \cdot | S_{u} | \cdot (N \cdot M - | S_{u} |)},$ (10) where $S_{u}$ denotes the collection of training pairs with positive comparison results, and $r_{v_{j}}$ is the ranking label for v_j. Generally, AUC ∈ (0.5, 1), and larger values of AUC indicate higher performance. The definition of NDCG@N is as follows: $NDCG @ N = \frac{1}{| 𝕌_{test} |} \sum_{u \in 𝕌_{test}} \frac{1}{{IDCG}_{u}} \sum_{k = 1}^{N} \frac{r_{k}}{\log_{2} (k + 1)},$ (11) where IDCG_u is the ideal value of DCG_u, N is the number of recommendation results, r_k is the label of the item ranked at position k, and ${DCG}_{u} = \sum_{k = 1}^{N} \frac{r_{k}}{\log_{2} (k + 1)}$ .
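The per-user terms of Rec@N, Prec@N and NDCG@N can be computed as in the sketch below; averaging them over the test users gives Eqs. (9) and (11). The sketch assumes binary labels r_k and takes `relevant` to be the set of held-out positive items of the user, which is an assumption about how $𝕀_{u}^{+}$ is instantiated at test time. The function names are hypothetical.

```python
import math


def recall_at_n(ranked, relevant, n):
    """Share of the relevant items that appear in the top-n list."""
    hits = len(set(ranked[:n]) & relevant)
    return hits / len(relevant) if relevant else 0.0


def precision_at_n(ranked, relevant, n):
    """Share of the top-n list that is relevant."""
    return len(set(ranked[:n]) & relevant) / n


def ndcg_at_n(ranked, relevant, n):
    """DCG of the list divided by the DCG of an ideal ordering (binary labels)."""
    dcg = sum(1.0 / math.log2(k + 1)
              for k, v in enumerate(ranked[:n], start=1) if v in relevant)
    ideal_hits = min(n, len(relevant))
    idcg = sum(1.0 / math.log2(k + 1) for k in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
rec = recall_at_n(ranked, relevant, 3)
prec = precision_at_n(ranked, relevant, 3)
ndcg = ndcg_at_n(ranked, relevant, 3)
```

In this example two of the three relevant items appear in the top 3, at positions 1 and 3, so recall and precision are both 2/3, while NDCG@3 is lower than 1 because the second position is occupied by a non-relevant item.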

4.3 Benchmark methods

We will compare the performance of PDLR with the following recommender approaches:

BPR: Bayesian personalized ranking provides a generic optimization criterion for optimal personalized ranking, which could leverage the implicit feedback for item recommendation in most information systems [9].

RankNet: RankNet is a two-layer neural network for pairwise ranking, which takes the features of each sample as input, and then performs pairwise comparison for the samples, and back propagates the errors throughout the network [29].

DeepICF: DeepICF could capture the nonlinear and high-order interactions between items through a nonlinear neural network, which could be greatly helpful to decision-making for each user [28].

RecNet: RecNet is a neural network for collaborative filtering, which jointly learns the representations of users and items as well as the preference relations via pairwise comparisons. In practice, RecNet performs well with implicit feedback, and provides high performance in item recommendation [26].

NCR: Neural network based collaborative ranking could learn user’s pairwise preference between items and capture the relations of latent factors. Actually, NCR aims to resort to the powerful computational capacity of neural network for pairwise learning to rank [30].

For PDLR, grid search is employed for parameter tuning on each real-world dataset. The learning rate is searched over η ∈ {0.01, 0.001, 0.0001}, the regularization parameter over λ ∈ {0.05, 0.01, 0.005, 0.001}, and the embedding size for users and items over d ∈ {8, 16, 32, 64}.
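The search space described above is small enough to enumerate exhaustively. The sketch below builds the grid with the candidate values listed in the text; the `evaluate` callback stands in for training and validating a model, and the toy scoring function used at the end is purely illustrative, not a result from the experiments.

```python
from itertools import product

learning_rates = [0.01, 0.001, 0.0001]
regularization = [0.05, 0.01, 0.005, 0.001]
embedding_sizes = [8, 16, 32, 64]


def build_grid(etas, lams, dims):
    """All combinations of learning rate, regularization weight and embedding size."""
    return [{"eta": eta, "lambda": lam, "d": d}
            for eta, lam, d in product(etas, lams, dims)]


def best_config(configs, evaluate):
    """Configuration with the highest validation score returned by evaluate."""
    return max(configs, key=evaluate)


configs = build_grid(learning_rates, regularization, embedding_sizes)

# Toy stand-in for a validation metric (hypothetical, for illustration only).
toy_score = lambda c: -abs(c["d"] - 32) - c["eta"]
chosen = best_config(configs, toy_score)
```

The grid contains 3 × 4 × 4 = 48 configurations per dataset.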

4.4 Performance comparison

In this section, a series of experiments are carried out over each real life dataset to evaluate the effectiveness and practicability of PDLR, moreover, we will further compare the performance of PDLR with the benchmark recommender approaches, in terms of Rec@N and Prec@N, NDCG@N and AUC.

The corresponding experimental results of the performance comparison for BPR, RankNet, DeepICF, RecNet, NCR and PDLR are reported in Table 2, in terms of Rec@N, Prec@N, NDCG@N and AUC. Here, note that the embedding size is set to d = 32 over each dataset for comparison. From Table 2 we can see that:

Overall, the proposed PDLR achieves high performance on Movielens, Amazon-m, Douban-m and Lastfm in contrast to BPR, RankNet, DeepICF, RecNet and NCR. Taking Movielens as an example, the values of Rec@5, Rec@10, Prec@5, Prec@10, NDCG@5, NDCG@10 and AUC are 0.1104, 0.1373, 0.2932, 0.3208, 0.3901, 0.4233 and 0.8208 respectively, which are better than those of the other compared approaches.

The performance of BPR is somewhat inferior to that of the other methods. RankNet, DeepICF, RecNet, NCR and PDLR can be regarded as extended methods of BPR, and in practice these methods overcome some drawbacks of BPR and achieve better performance.

In most cases, the values of Rec@10, Prec@10 and NDCG@10 are slightly better than those of Rec@5, Prec@5 and NDCG@5 respectively.

Table 2
Performance comparison for each algorithm (d = 32)

Dataset	Methods	Rec@5	Rec@10	Prec@5	Prec@10	NDCG@5	NDCG@10	AUC
Movielens	BPR	0.0423	0.0532	0.2316	0.2351	0.2863	0.3201	0.7158
	RankNet	0.0521	0.0741	0.2544	0.2109	0.3288	0.3743	0.7213
	DeepICF	0.0919	0.0955	0.2186	0.2252	0.3411	0.4108	0.7545
	RecNet	0.0724	0.0921	0.2311	0.2615	0.3372	0.3927	0.8121
	NCR	0.0924	0.1109	0.2516	0.2733	0.3894	0.4013	0.7644
	PDLR	0.1104	0.1373	0.2932	0.3208	0.3901	0.4233	0.8208
Amazon-m	BPR	0.0594	0.0477	0.1565	0.1679	0.2656	0.2946	0.7211
	RankNet	0.0416	0.0959	0.1628	0.1611	0.2898	0.3856	0.7201
	DeepICF	0.0542	0.0638	0.1541	0.1509	0.3522	0.4019	0.7524
	RecNet	0.0579	0.0912	0.1682	0.1837	0.3509	0.4138	0.7243
	NCR	0.0552	0.0810	0.1574	0.1794	0.3492	0.3994	0.7299
	PDLR	0.0753	0.0951	0.1790	0.1911	0.3768	0.4517	0.7916
Douban-m	BPR	0.0486	0.0572	0.1665	0.1952	0.2661	0.2806	0.6812
	RankNet	0.0599	0.0806	0.1922	0.2311	0.2859	0.3056	0.6952
	DeepICF	0.0636	0.087	0.1965	0.2408	0.3255	0.3104	0.8468
	RecNet	0.0724	0.1013	0.2135	0.2399	0.3378	0.3606	0.8093
	NCR	0.0711	0.0955	0.2092	0.2407	0.3501	0.3424	0.7308
	PDLR	0.0972	0.1229	0.2481	0.2674	0.3644	0.3873	0.8327
Lastfm	BPR	0.0681	0.0617	0.1544	0.1879	0.2986	0.3160	0.7105
	RankNet	0.0722	0.0835	0.1825	0.2032	0.3788	0.3523	0.7124
	DeepICF	0.0908	0.0898	0.1799	0.2504	0.3945	0.3597	0.7506
	RecNet	0.0816	0.0992	0.1877	0.2511	0.3824	0.3841	0.7296
	NCR	0.0861	0.0955	0.1706	0.2668	0.3871	0.3808	0.7648
	PDLR	0.1128	0.1337	0.2016	0.2746	0.4092	0.4115	0.8327

The neural architecture of PDLR performs pairwise comparison in the pairwise interaction layer. More precisely, items with implicit feedback are all regarded as potential candidates for each user, and PDLR compares each item with implicit feedback against the items with explicit feedback. The corresponding result indicates the user’s preference degree towards the item, which can also be regarded as the probability that the user will interact with the item in the future. In this regard, PDLR deeply captures the intricate relations between items, and performs pairwise comparison for personalized ranking.

Essentially, PDLR has the merits of both computational capability and recommendation accuracy, since it integrates a deep neural network with pairwise comparison. With the deep neural network, PDLR can deal with tasks of high computational cost, and it learns representation vectors for each user and the items via interaction exploitation. Accordingly, PDLR overcomes drawbacks of other methods, such as BPR, RankNet, DeepICF, RecNet and NCR, and achieves significant improvement in item recommendation tasks, which is also verified by the experimental results.

In summary, from the experimental analysis over Movielens, Amazon-m, Douban-m and Lastfm, we could conclude that the performance of PDLR is stable and effective over real life datasets, which could provide much more effective and accurate top-N recommendation in contrast to state-of-the-art recommender approaches.

4.5 Discussion

In this section, we will investigate the impacts of embedding size over the performance, and perform convergence analysis of model training. Below, we will do research on top-N performance for PDLR.

4.5.1 Impact of embedding size

The embedding size of the users and items can affect the performance of PDLR. Note that, for the sake of simplicity, the user and items share the same embedding size d. In this section, we study the impact of the embedding size on the performance of PDLR. The corresponding results are reported in Fig. 2, from which we can see that: (1) the values of Rec@5, Rec@10 and Rec@20 increase with the embedding size on each dataset; (2) Rec@N is close to its optimal value when the embedding size is around d = 32 for Movielens, Amazon-m, Douban-m and Lastfm. When d > 32, the values of Rec@N vary only slightly as d increases.

Fig. 2

Investigation of embedding size for PDLR in terms of Rec@N over Movielens, Amazon-m, Douban-m and Lastfm.

In practice, a large embedding size leads to heavy computational cost, so recommendation performance has to be balanced against computational cost. Figure 2 indicates that the embedding size can be set to d = 32 for PDLR in real applications.
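
One simple way to make this trade-off explicit is to pick the smallest embedding size whose Rec@N is within a small relative tolerance of the best value observed. The sketch below is only an illustration of that rule: the function name `smallest_sufficient_size`, the `tolerance` parameter and the sample numbers are assumptions invented to exercise the code, and are not values read from Fig. 2.

```python
def smallest_sufficient_size(recall_by_size, tolerance=0.01):
    """Return the smallest embedding size whose recall is within
    `tolerance` (relative) of the best recall in `recall_by_size`.

    recall_by_size maps an embedding size d to a measured Rec@N value.
    """
    if not recall_by_size:
        raise ValueError("recall_by_size must not be empty")
    if not 0.0 <= tolerance < 1.0:
        raise ValueError("tolerance must be in [0, 1)")
    best = max(recall_by_size.values())
    threshold = best * (1.0 - tolerance)
    # Sizes are scanned in increasing order, so the first hit is the cheapest.
    for size in sorted(recall_by_size):
        if recall_by_size[size] >= threshold:
            return size


# Placeholder measurements, invented for illustration only.
placeholder_recall = {8: 0.10, 16: 0.14, 32: 0.170, 64: 0.171}
chosen = smallest_sufficient_size(placeholder_recall, tolerance=0.01)
```

A larger tolerance favours cheaper models, while a tolerance of zero always returns the size with the best recall.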

4.5.2 Convergence analysis

In this section, we focus on the convergence of model training. Furthermore, to investigate the contribution of the ranking-aware attention layer, we compare the network with the ranking-aware attention layer (referred to here as PDLR_ATT) against the network without it (referred to as PDLR_HI) over each real life dataset. The corresponding results are reported in Fig. 3.

Fig. 3

Convergence analysis for PDLR in terms of AUC over Movielens, Amazon-m, Douban-m and Lastfm.

From Fig. 3 we can see that: (1) after several (around 20) training iterations, both PDLR_HI and PDLR_ATT quickly converge over each dataset; (2) at convergence, PDLR_ATT obtains slightly larger AUC values than PDLR_HI over each dataset; (3) the AUC values of PDLR_HI and PDLR_ATT are stable and reliable over each dataset. Taking Lastfm as an example, the AUC of PDLR_ATT is about 0.832, an improvement of up to 5.65% over PDLR_HI, which indicates that PDLR_ATT slightly outperforms PDLR_HI.
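
AUC, the metric plotted in Fig. 3, is commonly computed as the fraction of (positive, negative) item pairs that the model orders correctly. The sketch below shows that computation together with a simple plateau test for reading a convergence iteration off an AUC curve. The function names, the `window` and `min_gain` parameters, and the plateau rule are illustrative assumptions and are not taken from the PDLR training procedure.

```python
def auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a
    correctly ordered pair.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))


def converged_at(auc_history, window=3, min_gain=1e-3):
    """Return the first 1-based iteration after which the AUC does not
    improve by at least `min_gain` during the next `window` iterations,
    or None if no such iteration exists in the history.
    """
    for t in range(len(auc_history) - window):
        later = auc_history[t + 1 : t + 1 + window]
        if max(later) - auc_history[t] < min_gain:
            return t + 1
    return None


# Hypothetical scores and AUC curve, for illustration only.
example_auc = auc([0.9, 0.8], [0.1, 0.85])
example_iteration = converged_at([0.60, 0.70, 0.78, 0.80, 0.8002, 0.8003, 0.8001])
```

The pairwise form makes the link to pairwise ranking explicit: a model that orders every positive item above every negative item reaches an AUC of 1, and random scoring gives about 0.5.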

Note that during the model training phase, we find that the proposed recommender engine can also converge when only part of the items with explicit and implicit feedback are used as input. Although this setting decreases the computational cost significantly, the insufficient samples may lead to low performance in real world applications.

In summary, the experimental results in Fig. 3 demonstrate the advantage of PDLR_ATT, which includes the ranking-aware attention layer, over PDLR_HI, which does not. In practice, the neural architecture allows PDLR_ATT to handle large datasets effectively and efficiently, and its ranking-aware attention layer indicates the user’s preference for each item, which further improves the recommendation performance.

4.5.3 Top-N performance

Since the final goal of PDLR is to provide top-N recommendation, in this section we compare the top-N performance (N = {10, 20, 30, 40, 50}) of PDLR and the other methods in terms of NDCG@N over each dataset. The corresponding experimental results are shown in Fig. 4, from which we can see that the results share similar trends over Movielens, Amazon-m, Douban-m and Lastfm: the top-N performance of PDLR is stable w.r.t. N, and PDLR outperforms BPR, RankNet, DeepICF and RecNet significantly. Taking Lastfm as an example, the NDCG@20 values are 0.321, 0.358, 0.362, 0.383 and 0.418 for BPR, RankNet, DeepICF, RecNet and PDLR respectively, and the improvement of PDLR over the other methods reaches 8.1% on average. In practice, a small number of pairwise comparisons between the positive instance and the negative samples may lead to inaccurate parameter estimates, as in DeepICF and NCR. To overcome this drawback, PDLR learns a more accurate preference degree for each negative sample via the pairwise interaction layer, which improves the recommendation performance significantly, and the experimental results in Fig. 4 confirm the superiority of PDLR.

Fig. 4

Top-N performance comparison for BPR, RankNet, DeepICF, RecNet and PDLR in terms of NDCG@N over Movielens, Amazon-m, Douban-m and Lastfm.
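
For readers who want to reproduce this kind of comparison, NDCG@N for a single user can be sketched as below. The sketch assumes binary relevance (an item is either relevant or not) and the common logarithmic discount 1 / log2(rank + 1); the exact variant used in the experiments may differ, and the function names are hypothetical.

```python
import math


def dcg_at_n(ranked_items, relevant_items, n):
    """Discounted cumulative gain of the first n ranked items.

    ranked_items is assumed to contain no duplicates. Positions are
    0-based here, so the discount for position i is 1 / log2(i + 2).
    """
    return sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked_items[:n])
        if item in relevant_items
    )


def ndcg_at_n(ranked_items, relevant_items, n):
    """DCG@n divided by the DCG@n of an ideal ranking (all hits first)."""
    ideal_hits = min(n, len(relevant_items))
    if ideal_hits == 0:
        return 0.0
    ideal_dcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg_at_n(ranked_items, relevant_items, n) / ideal_dcg


# Hypothetical ranking for one user, for illustration only.
example_ndcg = ndcg_at_n(["a", "b", "c", "d"], {"a", "c"}, 3)
```

Averaging `ndcg_at_n` over all test users for each N in {10, 20, 30, 40, 50} gives one curve per method of the kind compared in Fig. 4.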

5 Conclusion

Recommender systems are effective and efficient in dealing with problems such as data sparsity and cold start, and have become crucial for addressing information overload in various information systems. To overcome data sparsity and provide accurate recommendation for users, in this article a novel neural network based pairwise comparison method, referred to as PDLR, is proposed. PDLR overcomes the drawbacks of previous recommender approaches and performs pairwise learning to rank through comparison between positive instances and negative samples. In practice, PDLR learns expressive feature representations for each user and item through the embedding layer, and further enhances the recommendation performance with the ranking-aware attention layer. With the powerful computational capability of neural networks, PDLR is capable of deeply exploiting the interactions among items, as well as the relations between each user and items, which in theory are greatly helpful for accurate top-N recommendation. Experimental analysis over four real world datasets confirms the superiority of PDLR, which outperforms state-of-the-art recommender approaches significantly, especially in recommendation accuracy.

As future work, we intend to investigate the performance of neural network based recommender systems over large-scale datasets, including some online applications. We will also try to leverage available auxiliary information, such as item attributes and social networks, to further boost the recommendation performance of PDLR.

Acknowledgments

This work is supported by the National Natural Science Foundation (Grant Nos. 61872298, 61532009, 61802316 and 61602389).

References

1. Covington, Adams and Sargin, Deep neural networks for youtube recommendations, in Proceedings of the 10th ACM Conference on Recommender Systems, 2016, pp. 191–198.

2. Gomez-Uribe C.A. and Hunt, The netflix recommender system: Algorithms, business value, and innovation, ACM Trans Manage Inf Syst 6(4) (2016), 1–13.

3. Salakhutdinov and Mnih, Probabilistic matrix factorization, in Advances in neural information processing systems, 2008, pp. 1257–1264.

4. Liu, Wu and Zhang, Cplr: collaborative pairwise learning to rank for personalized recommendation, Knowledge-Based Systems 148 (2018), 31–40.

5. Rendle and Freudenthaler, Improving pairwise learning for item recommendation from implicit feedback, in Proceedings of the 7th ACM international conference on Web search and data mining, 2014, pp. 273–282.

6. Wang, Huang, Liu T.Y., Ma, Chen and Veijalainen, Ranking-oriented collaborative filtering: A listwise approach, ACM Trans Inf Syst 35(2) (2016), 1–28.

7. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model, in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2008, pp. 426–434.

8. Wu, Ge, Liu, Chen, Long and Huang, Modeling users’ preferences and social links in social networking services: a joint-evolving perspective, in Thirtieth AAAI Conference on Artificial Intelligence, 2016, pp. 279–286.

9. Rendle, Freudenthaler, Gantner and Schmidt-Thieme, Bpr: Bayesian personalized ranking from implicit feedback, in Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence, 2009, pp. 452–461.

10. Guo, Wu, Wang and Tan, Adaptive pairwise learning for personalized ranking with content and implicit feedback, in 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2016, pp. 369–376.

11. Zhou, Zhou Y.L., Li J.P. and Memon H.M., Lsrec: Large-scale social recommendation with online update, Expert Systems with Applications 162 (2020), 1–13.

12. Hinton G.E., Osindero and Teh Y.W., A fast learning algorithm for deep belief nets, Neural Comput 18(7) (2006), 1527–1554.

13. Zhang, Yao, Sun and Tay, Deep learning based recommender system: A survey and new perspectives, ACM Comput Surv 52(1) (2019), 5.

14. He and Chua T.-S., Neural factorization machines for sparse predictive analytics, in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, pp. 355–364.

15. Ebesu and Fang, Neural citation network for context-aware citation recommendation, in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, pp. 1093–1096.

16. Gao, Pantel, Gamon, He and Deng, Modeling interestingness with deep neural networks, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 2–13.

17. Kim, Park, Oh, Lee and Yu, Convolutional matrix factorization for document context-aware recommendation, in Proceedings of the 10th ACM Conference on Recommender Systems, 2016, pp. 233–240.

18. Dong, Yu, Wu, Sun, Yuan and Zhang, A hybrid collaborative filtering model with deep structure for recommender systems, in AAAI, 2017, pp. 1309–1315.

19. Strub, Gaudel and Mary, Hybrid recommender system based on autoencoders, in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 2016, pp. 11–16.

20. Guo, Wu, Wang and Tan, Personalized ranking with pairwise factorization machines, Neurocomputing 214 (2016), 191–200.

21. Qiu, Liu, Guo, Sun, Zhang and Hai T.N., Bprh: Bayesian personalized ranking for heterogeneous implicit feedback, Information Sciences 453 (2018), 80–98.

22. Salakhutdinov, Mnih and Hinton, Restricted boltzmann machines for collaborative filtering, in Proceedings of the 24th international conference on Machine learning, 2007, pp. 791–798.

23. Zhou, Li, Zhang, Wang and Shah, Deep learning modeling for top-n recommendation with interests exploring, IEEE Access 6 (2018), 51440–51455.

24. Sedhain, Menon, Sanner and Xie, Autorec: Autoencoders meet collaborative filtering, in Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 111–112.

25. Li and She, Collaborative variational autoencoder for recommender systems, in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 305–314.

26. Sidana, Trofimov, Horodnitskii, Laclau, Maximov and Amini M.-R., Representation learning and pairwise ranking for implicit feedback in recommendation systems, arXiv preprint arXiv:1705.00105, 2017.

27. Li, Liu, Wu, Xu, Huang, Zhao, Kang, Chen, Li and Lee D.L., Multi-interest network with dynamic routing for recommendation at tmall, arXiv preprint arXiv:1904.08030, 2019.

28. Xue, He, Wang, Xu, Liu and Hong, Deep item-based collaborative filtering for top-n recommendation, ACM Trans Inf Syst 37(3) (2019), 33.

29. Burges, Shaked, Renshaw, Lazier, Deeds, Hamilton and Hullender G.N., Learning to rank using gradient descent, in Proceedings of the 22nd International Conference on Machine learning, 2005, pp. 89–96.

30. Song, Yang, Cao and Xu, Neural collaborative ranking, arXiv preprint arXiv:1808.04957, 2018.

Table 1
Statistics of each dataset

Dataset     #users   #items   #feedback   density
Movielens   5,748    3,811    881,563     4.02%
Amazon-m    6,562    4,569    116,342     0.388%
Douban-m    2,872    12,416   122,508     0.344%
Lastfm      1,753    16,428   88,104      0.306%