Counteracting the filter bubble in recommender systems: Novelty-aware matrix factorization

Abstract

The search for unfamiliar experiences and novelty is one of the main drivers behind all human activities, as important as harm avoidance and reward dependence. A recommender system personalizes suggestions to individuals to support and guide them in their exploration tasks. However, personalization mechanisms and recommender systems limit serendipitous encounters by selectively guessing the next item to show to users, potentially leading them into so-called filter bubbles. In the ideal case, recommendations should be novel as well as accurate. Up to now, however, most platforms fail to provide recommendations that are both novel and accurate. For example, a well-known recommendation algorithm such as matrix factorization (MF) optimizes only the accuracy criterion and disregards the novelty of the recommended items. In order to counteract the filter bubble, we propose two models, denoted as popularity-based and distance-based NMF, that allow trading off MF performance against the novelty criterion while only minimally compromising on accuracy. Our experimental results demonstrate that we attain high accuracy while also recommending novel items.

Keywords

Recommendation algorithms, Evaluation, Novelty, Collaborative Filtering, Matrix Factorization

1 Introduction

The filter bubble is the unique universe of information created around users by prediction engines or recommendation algorithms [26]. Based on the principle that users consume media of their interest, the filter bubble creates a tinted view of the world around the users by recommending items from the catalogues. A concrete example of the filter bubble is when a user buys a book on an e-commerce platform and then repeatedly receives recommendations only for books discussing the same topic. This kind of recommendation engine is based on incomplete evidence of interest and neglects the inherent need of users for non-obvious recommendations [31].

Recommender systems research aims primarily at providing accurate item recommendations [19], often ignoring additional quality criteria such as the novelty of a recommended item [3]. There are many definitions of item novelty [3]. For example, popularity-based novelty focuses on discovering non-popular products that match the crowd’s interest. In terms of MF for providing novel item recommendations, related work [9, 32] observed that by raising the dimensionality of the MF model (i.e., by increasing the number of latent factors), we can recommend items coming from the long tail (i.e., more novel items), but with big losses in terms of accuracy. In addition, an increased number of latent factors directly affects the efficiency of MF models. Generating novel recommendations leads to the following benefits:

It provides non-trivial recommendations.

It helps users discover items that they could not have found by themselves and increases the array of choices (avoiding the filter bubble).

It lets the business leverage revenues from market niches (sales diversity).

In this paper, we extend our previous work on generating novel item recommendations based on matrix factorization [7]. We propose an MF method that simultaneously recommends accurate and novel items. Our proposed method, denoted as NMF, has the advantage of controlling, through a regularization term, how novel the recommended items will be, without increasing the number of latent factors of MF [9, 32]. Moreover, we introduce an integrated way to evaluate novelty, denoted as Novelty-nDCG (N-nDCG), which is based on the well-known nDCG [4] but is adjusted to our case scenario in recommender systems and distinguishes a more novel item from a less novel one. N-nDCG can also be used with different definitions [3, 30] of item novelty, as will be described later. However, item novelty should not be considered equal to the diversity of a recommendation list.

In the remainder of this paper, Section 2 discusses the related work. In Section 3, we define item novelty and the evaluation of a recommendation list of items, and then propose a framework for novel MF. Section 4 presents our experimental results on two well-known datasets. Finally, Section 5 concludes and describes future work.

2 Related work

Recommender systems’ effectiveness cannot be measured by considering only the accuracy of recommendations. Jannach et al. [18] outline that the research community is becoming increasingly aware of this problem, and that aspects related to the users’ experience, such as explanations, novelty and serendipity, are starting to receive more attention.

Furnas et al. [13] proposed Singular Value Decomposition (SVD) to factor a matrix into three matrices. An instance of SVD, known as classic matrix factorization (MF), searches for two matrices (U and V) whose product gives an approximation of the original matrix A. That is, if we have a matrix A with n rows and m columns, we can find two matrices, one U with n rows and k columns and one V with m rows and k columns, such that UV^⊤ reproduces A with the blank entries filled and a small deviation from the initial values. Another MF method is known as CUR Matrix Decomposition [23], because the initial matrix is factorized into three matrices (C, U and R). One quick observation about CUR decomposition is that the rows and columns used to construct matrices C and R are randomly selected from matrix A. It is obvious that this selection will affect the CUR approximation.

Several methods have been proposed to compute matrices U and V. For example, Lee and Seung [21] proposed the definition of a cost function (i.e., ∥A - UV^⊤∥²), which can be minimized either by using multiplicative update rules or by using the additive update rules of the well-known gradient descent method. In addition, Dhillon and Sra [11] proposed multiplicative update rules that incorporate weights for the importance of each element of the predicted approximation matrix $\hat{A}$. Please notice that the objective function ∥A - UV^⊤∥² is convex in U alone or in V alone. However, since it is not convex in both variables together, we can only guarantee finding a local minimum, rather than a global minimum of the cost function. Thus, since the problem in general has no exact solution, U and V are commonly approximated numerically with methods such as gradient descent or alternating least squares (ALS). Recently, Lin [22] proposed an algorithm to resolve the convergence issues of the optimization procedure. His algorithm guarantees convergence to a stationary point. However, Lin’s algorithm requires even more execution time per iteration than the already slow MF algorithm of Lee and Seung [21].

As far as item novelty is concerned, Jannach et al. [17] mention that recommender systems aim at boosting recommendations from the long tail of the item popularity distribution, as this increases sales of novel items. There are several works that try to provide both accurate and novel [3, 30] or diversified item recommendations [4, 5], where a diversified item recommendation list tries to capture more aspects of the user’s interest. In terms of MF, related work [9, 32] has claimed that by increasing the number of latent factors of the basic MF model [20], we can more accurately recommend novel items. A different research direction in MF formulates the item recommendation problem not as a classification problem, but as a ranking problem that uses pairs of positive items (in the train set) and negative items (not in the train set) as pairwise input. For example, Bayesian Personalized Ranking (BPR) [27] optimizes a simple ranking loss such as AUC (the area under the ROC curve) and uses matrix factorization as the ranking function, which can be directly optimized using a stochastic gradient algorithm. Similarly to BPR, Ning and Karypis [24, 25] proposed a set of Sparse LInear Methods (SSLIM), which involve an optimization process to learn a sparse aggregation coefficient matrix based on both a user-item purchase matrix and side information on items.

In contrast to the aforementioned work of Cremonesi et al. [9] and Yin et al. [32], our proposed method incorporates an additional constraint term for novelty into the basic MF formula. Note that this is analogous to [6], where an additional constraint term models the utility that users perceive in the different parameters of the rating summary statistics, such as the average rating value or the total number of ratings. In contrast, here the information on the novelty of an item is taken from an external resource, a user-item novelty matrix, which will be defined in the next section. While novelty and accuracy of recommended items are seen as key features of the recommendation utility in real scenarios, to our knowledge there is not much work relating them and systematically measuring their trade-offs.

It is useful to make a clear distinction between novelty, diversity and serendipity. Vargas et al. [30] explain that the novelty of an item refers to how different the item is from what has already been experienced by a user or the community. Diversity, in contrast, refers to a set of items and is related to how different the items are from each other, while serendipity [10] refers to how surprising and interesting an item is for a user. Tomeo et al. [28] extended the regression tree to generate diversified recommendation lists in a multi-attribute setting. In the same direction, Di Noia et al. [12] proposed an adaptive multi-attribute diversification method, based on the hypothesis that a user who selected many diverse items in the past could be more willing to receive diverse recommendations. Wasilewski and Hurley [16] proposed a matrix factorization framework to trade off the accuracy of item recommendations against the diversity of the items in the recommendation list. In the following, we argue why there is very little overlap between their work and ours by identifying two important differences. The first is that, similar to the previous approaches, their MF model computes a pair-wise ranking loss in the objective function (not the element-wise square loss of our methodology). In other words, our MF model is element-wise and predicts the missing values of the user-item rating matrix, whereas their model tries to optimize the pairwise ranking of items. The second difference is that we explore the trade-off between item recommendation accuracy and item novelty, whereas they explored the trade-off between item recommendation accuracy and item diversity. This difference is discussed further in the discussion section. De Gemmis et al. [10] proposed a methodology to recommend non-obvious items and to measure their serendipity by capturing the facial expressions of users via webcam.

3 Novelty

In this section, we will define the novelty of an item for a target user. We want to be able to measure if an algorithm will recommend more novel items to the users. Table 1 summarizes the symbols used in the following sections.

Table 1
Symbols and definitions

Symbol	Definition
k	number of nearest neighbors
L_u	recommendation list for user u
Top-N	size of recommendation list
NN(u)	nearest neighbors of user u
P_τ	threshold for positive ratings
I	domain of all items
U	domain of all users
R	domain of the rating scale
u, v	some users
i, j	some items
I_u	set of items rated by user u
U_i	set of users who rated item i
r_u,i	the rating of user u on item i
|T|	size of the test set
N_i	novelty of item i

The premise of recommender systems is to suggest to users non-trivial items that match their interest, i.e., to make novel item recommendations. By doing this, businesses can increase their profits, since these novel items may have higher profit margins. Moreover, users will not get bored and disappointed by receiving only trivial recommendations of popular items. In the following, we define the novelty of a recommended item and how to measure the novelty of a recommendation list.

3.1 Popularity-based item novelty

Figure 1 depicts the item popularity distribution of a well-known dataset, MovieLens 1M (ML1M) [14], where items are ranked by how frequently they have been rated by users. As shown in Fig. 1, the ratings of items follow a long-tailed distribution. The novel items correspond to the long-tail items of this distribution, which few users have rated or interacted with, whereas items of low novelty correspond to popular items.

Fig.1

Popularity distribution of items.

Related work [3] in recommender systems has proposed many definitions of item novelty. However, for a recommender system that consists only of a user-item rating matrix (without any other information about categories of items, domains of users’ interests, etc.), the simple popularity-based novelty definition [3], also known as global long-tail novelty, is more suitable. It focuses on discovering relatively unknown items (coming from the long tail of the item popularity distribution). Based on the aforementioned arguments, novelty can be defined as the opposite of popularity, which means that an item is more novel if fewer people are aware of it. Thus, we adopt the notion of user inverse frequency [3, 30] to measure the novelty N_i of a recommended item i by negating its popularity, as shown by Equations 1 and 2:

$Novelty(i) = - Popularity(i)$ (1)

where Popularity(i) corresponds to the probability that an item is rated, observed, or has had any other type of interaction with a user.

$Novelty(i) = N_{i} = - \frac{|U_{i}|}{|U|}$ (2)

where U_i is the set of users that rated item i, and U is the set of all users.

Based on Equation 2, an item can be considered more novel if users have interacted less with it (i.e., it received fewer ratings, was purchased less often, or was observed/viewed less). In order to highlight the existence of highly novel items (favoring few very novel items and penalizing many less novel items), we can consider a logarithmic form of the novelty, as shown in Equation 3 [3]:

$N_{i} = - \log_{2} \frac{| U_{i} |}{| U |}$ (3)

The maximal novelty that an item can achieve is:

$N_{\max} = - \log_{2} \frac{1}{|U|},$ (4)

where U is the set of users. Here |U_i| from Equation 3, the number of times item i was rated, is set to its smallest value for which the logarithm is defined, i.e., the item is considered as practically unrated.
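As an illustration of Equations 3 and 4, the following minimal sketch computes the popularity-based novelty from a list of (user, item) interactions. The function names, the tuple layout and the toy data are assumptions made for this example; they are not taken from the shared implementation.

```python
import math


def popularity_novelty(ratings, n_users):
    """Equation 3: N_i = -log2(|U_i| / |U|) for every item that was rated."""
    users_per_item = {}
    for user, item in ratings:
        users_per_item.setdefault(item, set()).add(user)
    return {item: -math.log2(len(users) / n_users)
            for item, users in users_per_item.items()}


def max_novelty(n_users):
    """Equation 4: novelty of an item rated by a single user."""
    return -math.log2(1 / n_users)


# Hypothetical toy data: (user, item) pairs for 4 users.
ratings = [(0, "a"), (1, "a"), (2, "a"), (3, "a"),
           (0, "b"), (1, "b"),
           (2, "c")]
novelty = popularity_novelty(ratings, n_users=4)
```

In this toy example the item rated by every user receives novelty 0, while the item rated by a single user reaches the maximum of Equation 4.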

3.2 Distance-based item novelty

There are recommender systems that possess information, beyond the user-item rating matrix, about the categories that items belong to or the focus of users’ interests. For example, in news article recommendation, we know the category that each article belongs to (i.e., politics, sports, etc.). Thus, when we recommend a news article about sports to a user who has seen a lot of articles about sports, this recommendation does not have the same degree of novelty as when we provide the same article to a person who has never seen an article about sports before. That is, the same item may be differently novel for every user. In the following, we define how we can capture this notion of the novelty of an item for a user.

In contrast to the popularity-based item novelty, in the distance-based model [3], also known as unexpectedness, we capture an item’s novelty by defining a distance function between the target item i from the set of items I and the set of items I_u that a user has already interacted with (the user’s past experience). We can formulate this novelty as shown in Equation 5:

$N_{u, i} = \frac{\sum_{j \in I_{u}} d(i, j)}{|I_{u}|}$ (5)

Please notice that the distance between two items can also be taken as the complement (i.e., d(i, j) = 1 - sim(i, j)) of any similarity measure (cosine-based, Jaccard coefficient, etc.) in terms of the item features (i.e., the category that an item belongs to, the features of an item, etc.) or the user’s item categories profile^1 (i.e., the item categories that a user prefers). To capture how novel a topic category is for a user, we can use Equation 6, which is based on the well-known subtopic recall metric (S-recall) [3], but adjusted to our case scenario:

$N_{u, C} = \frac{1}{|\{i \in I_{u} : i \text{ belongs to category } C\}|}$ (6)

where i is an item and C is a topic category. Thus, when a user u has interacted with many items that belong to the same category C, this category will not be very novel for her. In the results, we show measurements based on the topic coverage notion of novelty.

An item is maximally novel if it belongs to a category that the user has never seen before; thus, N_max = 1.^2
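The distance-based novelty of Equation 5 can be sketched as follows, taking d(i, j) as the complement of the Jaccard similarity between the category sets of two items. The helper names, the dictionary layout and the toy news categories are assumptions for illustration only.

```python
def jaccard_distance(a, b):
    """d(i, j) = 1 - |a ∩ b| / |a ∪ b| for two sets of categories."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_novelty(item, user_items, categories):
    """Equation 5: mean distance between `item` and the user's past items."""
    if not user_items:
        raise ValueError("the user has no past interactions")
    total = sum(jaccard_distance(categories[item], categories[j])
                for j in user_items)
    return total / len(user_items)


# Hypothetical item-category profile and user history.
categories = {
    "n1": {"sports"},
    "n2": {"sports", "politics"},
    "n3": {"politics"},
    "n4": {"culture"},
}
history = ["n1", "n2"]
politics_novelty = distance_novelty("n3", history, categories)
culture_novelty = distance_novelty("n4", history, categories)
```

For this user, the culture article is more novel than the politics article, because politics already overlaps with one item in the history.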

3.3 Novelty of recommendation list

For a user u who is recommended N different items, we define the novelty of the recommendation list L_u as follows:

$N_{L_{u}} = - \frac{1}{N} \sum_{i \in L_{u}} N_{u, i},$ (7)

where N_u,i is the novelty as explained in Section 3.1 if using the popularity-based notion, or in Section 3.2 if using the distance-based notion. Note that this definition cannot penalize the fact that a less novel item is ranked in the recommendation list L_u above a more novel item. To do this, we define N-nDCG in the following.

Thus, to obtain a finer level of granularity, we adopt the notion of Novelty-normalized Discounted Cumulative Gain (N-nDCG_u) [4], which also takes into consideration the relative position of the recommended items inside L_u.

The first step in the computation of N-nDCG_u is the creation of the gain vector. In our case, the gain for each item l in L_u is its novelty N_l (i.e., as defined in Equation 3 or Equation 6). The second step applies the Discounted Cumulative Gain to the aforementioned gain vector, as shown in Equation 8:

$N\text{-}DCG_{u} = N_{l_{1}} + \sum_{i = 2}^{N} \frac{N_{l_{i}}}{\log_{2} i}$ (8)

Based on Equation 8, we discount the gain at each rank inside L_u to penalize items that are recommended lower in the ranking, reflecting the additional user effort required to reach them [30].

The third step is to normalize N-DCG_u against the “ideal” gain vector. In our case, the “ideal” gain vector considers all recommended items in L_u as having maximum novelty N_max (i.e., as defined in Equation 4 for the popularity-based notion, or N_max = 1 for the distance-based notion of Section 3.2). That is, all recommended items in L_u are considered as never seen by any user. Thus, the ideal N-IDCG is calculated as:

$N\text{-}IDCG = N_{\max} + \sum_{i = 2}^{N} \frac{N_{\max}}{\log_{2} i}$ (9)

Finally, N-nDCG_u is the ratio of N-DCG_u to N-IDCG:

$N\text{-}nDCG_{u} = \frac{N\text{-}DCG_{u}}{N\text{-}IDCG}$ (10)
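The three steps of Equations 8 to 10 can be sketched in a few lines. The function names are assumptions for this example; the input is the list of novelty values of the recommended items in ranked order.

```python
import math


def n_dcg(novelties):
    """Equation 8: the first gain is undiscounted, the gain at rank i >= 2
    is divided by log2(i)."""
    return sum(n if rank == 1 else n / math.log2(rank)
               for rank, n in enumerate(novelties, start=1))


def n_ndcg(novelties, n_max):
    """Equation 10: N-DCG_u divided by the ideal N-IDCG of Equation 9,
    where every position holds an item of maximal novelty n_max."""
    if not novelties or n_max <= 0:
        raise ValueError("need a non-empty list and a positive n_max")
    return n_dcg(novelties) / n_dcg([n_max] * len(novelties))


# The same three items, ranked with the most novel first or last.
novel_first = n_ndcg([2.0, 1.0, 0.0], n_max=2.0)
novel_last = n_ndcg([0.0, 1.0, 2.0], n_max=2.0)
```

Both lists contain the same items, yet the list that places the most novel item first obtains the higher score, which is the ranking sensitivity that Equation 7 lacks.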

3.4 Other novelty metrics

In this section, we adopt two additional metrics for evaluating novelty. From the work of Vargas et al. [31], we use the Expected Popularity Complement (EPC) to measure popularity-based novelty and the Expected Profile Distance (EPD) to measure distance-based novelty, as follows:

$EPC = C \sum_{i_{k} \in L} disc(k) \, p(rel | i_{k}, u) \, N_{i_{k}}$ (11)

$EPD = C^{'} \sum_{i_{k} \in L, \, j \in I_{u}} disc(k) \, p(rel | i_{k}, u) \, p(rel | j, u) \, d(i_{k}, j)$ (12)

where C and C' are constants, k is the position of an item in the recommendation list L, p(rel | i_k, u) = 1 if the item is in the test set and 0 otherwise, and $disc(k) = \frac{1}{\log_{2}(k + 2)}$.

3.5 Matrix factorization

Matrix factorization methods are used in recommender systems to derive a set of latent factors from the user × item rating matrix, and to characterize both users and items by vectors of these factors. The user-item interactions are modeled as inner products in the latent factor space [20]. Accordingly, each item j is associated with a vector of factors v_j, and each user i is associated with a vector of factors u_i. An approximation of the rating of a user i on an item j can be derived as the inner product of their factor vectors:

${\hat{r}}_{ij} = u_{i} v_{j}^{T}$ (13)

The u (user) and v (item) factor matrices are cropped to k features and initialized at small values. Each feature is trained until convergence (where convergence specifies the number of updates to be computed on a feature before considering it converged; it can be either chosen by the user or calculated automatically by the package). In each loop, the algorithm predicts ${\hat{r}}_{ij}$, calculates the error, and updates the factors as follows:

$v_{jk} \leftarrow v_{jk} + λ * ((r_{ij} - u_{i} v_{j}^{T}) * u_{ik} - γ * v_{jk})$ (14)

$u_{ik} \leftarrow u_{ik} + λ * ((r_{ij} - u_{i} v_{j}^{T}) * v_{jk} - γ * u_{ik})$ (15)

The attribute λ represents the learning rate, while γ corresponds to the regularization term.
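The updates of Equations 14 and 15 can be illustrated with the following minimal sketch in plain Python. It is a simplification under stated assumptions: all k features are updated together for every observed rating, instead of training feature by feature until convergence, and the function names, default values and toy ratings are hypothetical.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
             epochs=500, seed=0):
    """Stochastic gradient descent with the updates of Equations 14 and 15.
    `lr` plays the role of λ (learning rate) and `reg` the role of γ."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - sum(U[i][f] * V[j][f] for f in range(k))
            for f in range(k):
                u_if, v_jf = U[i][f], V[j][f]  # keep the old values
                V[j][f] += lr * (err * u_if - reg * v_jf)  # Equation 14
                U[i][f] += lr * (err * v_jf - reg * u_if)  # Equation 15
    return U, V


def sse(ratings, U, V):
    """Sum of squared errors over the observed ratings."""
    return sum((r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
               for i, j, r in ratings)


# Hypothetical toy data: (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V = train_mf(ratings, n_users=3, n_items=3)
```

Calling `train_mf` with `epochs=0` returns the untrained initial factors, which makes it easy to compare the squared error before and after training.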

3.6 Novel matrix factorization

In this section, we propose an algorithmic framework to trade-off between accuracy and novelty in matrix factorization.

For popularity-based novelty, to provide more novel item recommendations, we add a soft constraint for novelty to the classic regularized matrix factorization formula, as shown in Equation 16:

$G_{Novel} = \sum_{(i, j) \in R} {(r_{ij} - u_{i} v_{j}^{T})}^{2} + \frac{β}{2} (‖ u_{i} ‖^{2} + ‖ v_{j} ‖^{2}) + δ ‖ u_{i} - v_{j} ‖_{1} N_{ij},$ (16)

where δ controls the weight of the novelty term, N_ij holds the information of how novel item j is for user i, and β weights the effect of the L2 regularization term. Please notice that ∥u_i - v_j∥_1 constrains the representations of the user and item vectors in the latent space to be close to each other (i.e., their difference is close to zero) in order to minimize the objective function. In other words, we want to bring the user closer to the novel items in the latent space. To do this, we use the Manhattan distance, which mitigates the problems of the Euclidean distance in high-dimensional spaces, since it does not place more emphasis on outliers, which may dominate the smaller weights computed for other normal data points [1]. Then, to minimize the objective function G_Novel, we compute the error between the real and the predicted rating values of items and apply a numerical method, such as gradient descent, with the following update rules:

$u_{i}^{'} \leftarrow u_{i} + η \cdot (2 \cdot (r_{ij} - u_{i} \cdot v_{j}^{T}) \cdot v_{j} - β \cdot u_{i} - δ \cdot sgn(u_{i} - v_{j}) \cdot N_{ij})$

$v_{j}^{'} \leftarrow v_{j} + η \cdot (2 \cdot (r_{ij} - u_{i} \cdot v_{j}^{T}) \cdot u_{i} - β \cdot v_{j} + δ \cdot sgn(u_{i} - v_{j}) \cdot N_{ij})$

Here η is the learning rate, and sgn is applied element-wise.

Henceforth, we call this method Novel Matrix Factorization (NMF). Please notice that MF is just a simplified special case of NMF (obtained for δ = 0) and can be easily derived from it.
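To make the effect of the novelty term concrete, the following sketch performs one gradient step for a single observed rating. It is not the shared implementation: the update is derived here by differentiating the objective of Equation 16 directly, so the novelty term enters the item update with the opposite sign to the user update, and the function names and default values are assumptions.

```python
def sign(x):
    """Element-wise sgn: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def nmf_step(u, v, r, novelty, eta=0.01, beta=0.02, delta=0.1):
    """One gradient step on the Equation 16 term of one rating r_ij.
    `u` and `v` are the latent vectors of user i and item j, and
    `novelty` is N_ij."""
    err = r - sum(a * b for a, b in zip(u, v))
    new_u, new_v = [], []
    for a, b in zip(u, v):
        s = sign(a - b)
        new_u.append(a + eta * (2 * err * b - beta * a - delta * s * novelty))
        new_v.append(b + eta * (2 * err * a - beta * b + delta * s * novelty))
    return new_u, new_v


def l1_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


# Isolate the novelty term: zero prediction error and no L2 regularization.
u, v = [1.0, 0.0], [0.0, 1.0]
u_new, v_new = nmf_step(u, v, r=0.0, novelty=1.0,
                        eta=0.1, beta=0.0, delta=1.0)
```

In this isolated setting the step only pulls the user and the item towards each other in the latent space, and with `delta=0` the function reduces to a plain regularized MF step.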

4 Experimental results

In this section, we experimentally compare our approach, NMF, with the Matrix Factorization algorithm (MF) [20]. Moreover, we use the Maximal Marginal Relevance re-ranker (MMR) [2] combined with the MF algorithm, so that our experiments also include a variation of MF that focuses on providing novel item recommendations. In particular, for re-ranking the item recommendation list provided by the classic MF algorithm with MMR, we adapt the following greedy objective function of Saúl Vargas [29], as shown in Equation 17:

$\arg\max \left[ (1 - λ) * {\hat{r}}_{norm}(u, i) + λ * avg_{j \in L_{u}} (1 - sim(i, j)) \right],$ (17)

where ${\hat{r}}_{norm}(u, i)$ is the normalised predicted rating of user u for item i. It is normalised to the [0,1] scale so that it can be combined with the second term of Equation 17, which measures the dissimilarity of items based on the Jaccard similarity.^3 In particular, the second term measures the average dissimilarity of an item to all other items that have already taken a position inside the recommendation list L_u, which is being re-constructed for user u. As shown by the λ parameter of Equation 17, there is a trade-off between how relevant an item is for a user and how much this item differs from the items that have already been included in the currently constructed recommendation list. In our experiments, we kept the trade-off at λ = 0.5.
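The greedy re-ranking driven by Equation 17 can be sketched as follows. This is an illustrative assumption of how such a re-ranker may be written, not the code used in the experiments: the function names and toy data are hypothetical, the scores are expected to be already normalised to [0,1], and the dissimilarity term is taken as 0 while the list is still empty.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of categories."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_rerank(scores, categories, n, lam=0.5):
    """Greedily build a top-n list by maximizing the Equation 17 objective."""
    selected = []
    candidates = sorted(scores)  # sorted for a deterministic tie-break

    def objective(i):
        if selected:
            avg_dis = sum(1 - jaccard(categories[i], categories[j])
                          for j in selected) / len(selected)
        else:
            avg_dis = 0.0  # assumption: no dissimilarity for an empty list
        return (1 - lam) * scores[i] + lam * avg_dis

    while candidates and len(selected) < n:
        best = max(candidates, key=objective)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical normalised MF scores and item categories.
scores = {"a": 1.0, "b": 0.9, "c": 0.5}
categories = {"a": {"drama"}, "b": {"drama"}, "c": {"comedy"}}
reranked = mmr_rerank(scores, categories, n=2, lam=0.5)
by_score = mmr_rerank(scores, categories, n=2, lam=0.0)
```

With λ = 0.5 the second position goes to the item from a different category, whereas λ = 0 reproduces the pure relevance ranking.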

We implemented the experiments using the functionalities of rrecsys [8] and proxy.^4 To ensure reproducibility of the experimental results, we share our implementation.^5

4.1 Data sets

Our experiments are performed with four datasets: MovieLens 100K (ML100K), MovieLens 1M (ML1M), MovieLens 20M (ML20M) [15], and Yelp^6 (see Table 2).

Table 2
Dataset

Characteristic	ML-100K	ML-1M	ML-20M	Yelp-L.V.
# of ratings	100,000	1,000,209	20,000,263	215,318
# of users	943	6,040	138,493	5,180
# of items	1,682	3,952	27,278	4,111
# of genres	19	18	19	130
Average # of genres per item	1.67	1.99	1.99	2.02
Rating domain	[1,5]	[1,5]	[1,5]	[1,5]

ML100K consists of 100,000 ratings assigned by 943 users to 1,682 movies. ML1M contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 users. ML20M consists of more than 20 million ratings of 27,278 items by 138,493 users. All the MovieLens datasets have at least 20 ratings per user (∀u ∈ U : |I_u| ≥ 20). The Yelp dataset consists of a large collection of ratings on businesses. We took a subset of the dataset consisting only of restaurants, where the novelty hypothesis is more plausible (e.g., if a user eats pizza one day, the next time she might want to try something else). In addition, we limited the recommendations to Las Vegas, since the dataset contains more businesses from this city. Still, the dataset displays large sparsity, thus we kept only users with more than 40 ratings (∀u ∈ U : |I_u| > 40).
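The per-user filtering used to reduce sparsity can be sketched as below. The `(user, item, rating)` tuple layout and the function name `filter_users` are assumptions for illustration, and the threshold is read as strictly more than `min_ratings`, following the condition |I_u| > 40.

```python
from collections import Counter


def filter_users(ratings, min_ratings):
    """Keep only the ratings of users with more than `min_ratings` ratings."""
    counts = Counter(user for user, _, _ in ratings)
    return [row for row in ratings if counts[row[0]] > min_ratings]


# Hypothetical toy data: user 1 has three ratings, user 2 has one.
toy = [(1, "a", 5), (1, "b", 4), (1, "c", 3), (2, "a", 2)]
dense = filter_users(toy, min_ratings=2)
```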

4.2 Experimental protocol and evaluation

Our evaluation considers the division of the items of each target user into two sets: (i) the training set E^T, which is treated as known information, and (ii) the test set E^P, which is used for testing; no information from the test set is allowed for learning or for computing predictions. It is obvious that the two sets are disjoint and that together they contain all items of the target user. Therefore, for a target user we generate the recommendations based only on the items in E^T.

In addition to the N-nDCG metric introduced in Section 3.3, we use the classic precision and nDCG metrics for measuring the accuracy of recommendations. We perform all experiments with 4-fold double-cross validation, with a training-test split of 75%-25%. The default size of the recommendation list N is set to 10, unless stated otherwise. All algorithms predict the items of the target users in the test set. For ML100K, ML1M and ML20M, the number of latent factors, the number of update cycles, the regularization term β, and the learning rate η are set to 80, 100, 0.001 and 0.001, respectively. For Yelp-L.V., they are set to 80, 50, 0.0001 and 0.001, respectively. For MMR we keep the trade-off at 0.5, while for NMF we vary the novelty regularization term δ.
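For reference, the two accuracy metrics can be sketched as follows. The exact formulation used in the experiments is not spelled out in the text, so this sketch assumes binary relevance (an item is relevant if it appears in the user's test set) and the common 1/log2(rank + 1) discount for nDCG; the function names are hypothetical.

```python
import math


def precision_at_n(recommended, relevant, n=10):
    """Fraction of the top-n recommended items that are in the test set."""
    hits = sum(1 for item in recommended[:n] if item in relevant)
    return hits / n


def ndcg_at_n(recommended, relevant, n=10):
    """Binary-relevance nDCG with a 1 / log2(rank + 1) discount."""
    dcg = sum(1 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:n], start=1)
              if item in relevant)
    ideal_hits = min(len(relevant), n)
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical recommendation list and test set of one user.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c"}
prec = precision_at_n(recommended, relevant, n=4)
ndcg = ndcg_at_n(recommended, relevant, n=4)
```

Precision ignores the order of the hits, while nDCG rewards lists that place the relevant items near the top.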

4.3 Sensitivity analysis of NMF

In this section, we explore how the performance of both popularity-based and distance-based NMF, in terms of providing novel and accurate recommendations, is affected as we increase the impact of the regularization term δ, which controls novelty in Equation 16. Figures 2a, 2b, 2c and 2d show the popularity-based NMF as we increase δ. As shown, N-nDCG and nDCG are negatively correlated: as N-nDCG increases, nDCG drops. In Figures 3a, 3b, 3c and 3d we depict the distance-based NMF as we increase δ. For the first three datasets, the trend that we noticed with the popularity-based NMF can also be noticed with the distance-based NMF. That is, as N-nDCG increases, accuracy drops and vice versa. On the Yelp dataset (Fig. 3d), in contrast, we notice that novelty (both in terms of N-nDCG and EPD) and precision increase together. Given the scenario, this could relate to the users’ inherent need for novelty and the notion of discovery [31]. However, this first experimental result requires further and intensive investigation. Please notice that the difference in terms of N-nDCG between distance-based and popularity-based NMF is related to the different notions of novelty (see Sections 3.1 and 3.2). In summary, for all MovieLens datasets, as we increase δ, NMF recommends more novel items but the recommendation accuracy drops drastically, while for the Yelp dataset novelty and precision seem positively correlated under the distance-based NMF.

Fig.2

Sensitivity Analysis of Popularity-based NMF for (a) the ML100K, (b) the ML1M, (c) the ML20M and (d) Yelp - Las Vegas data sets.

Fig.3

Sensitivity Analysis of Distance-based NMF for (a) the ML100K, (b) the ML1M, (c) the ML20M and (d) Yelp - Las Vegas data sets.

4.4 Comparison with other algorithms

Table 3 shows the performance results for popularity-based NMF (i.e., Pop-NMF) and MF on the four datasets when we provide top-10 item recommendations. As shown in Table 3, our Pop-NMF achieves a more balanced performance between accuracy and novelty than MF in all four datasets. The reason is that we add to the objective function of classic matrix factorization an additional soft constraint, which pushes the more novel items to be recommended to the target user. These recommended items are those which have not been seen by the users in the database (i.e., not the popular ones).

Table 3
Algorithms’ comparison performance with top-10 recommended items on 4 data sets

Dataset	Algorithm	RegTerm (δ)	Prec.	nDCG	N-nDCG	EPC
ML100K	MF	–	12.3%	13.6%	28.7%	19.2%
	MF + MMR^*	–	3.7%	6.6%	62.8%	18.5%
	Pop-NMF	0.2	7.3%	7.7%	30.9%	25.8%
	Pop-NMF	0.5	4.6%	4.7%	32.3%	30.8%
ML1M	MF	–	11.6%	12.6%	17.6%	11.3%
	MF + MMR^*	–	3.2%	6.0%	56.8%	10.0%
	Pop-NMF	0.2	5.1%	5.0%	20.6%	14.4%
	Pop-NMF	1	0.8%	0.8%	38.3%	35.4%
ML20M	MF	–	6.5%	7.0%	18.2%	10.4%
	MF + MMR^†	–	–	–	–	–
	Pop-NMF	0.02	3.7%	3.7%	19.4%	14.8%
	Pop-NMF	0.08	2.8%	3.3%	32.8%	9.8%
Yelp-L.V.	MF	–	4.7%	4.9%	13.5%	12.6%
	MF + MMR^*	–	3.2%	3.7%	17.2%	16.6%
	Pop-NMF	0.50	4.5%	4.8%	13.9%	13.1%
	Pop-NMF	1	4.3%	4.5%	14.6%	14.8%

^*Trade-off set at 0.5. ^†Server terminated the process due to resource starvation.

Compared to MF + MMR, for ML100K, ML1M and Yelp, our proposed approach again displays a balanced trade-off between novelty (N-nDCG and EPC) and accuracy (precision and nDCG).

Lastly, for the distance-based NMF, as shown in Table 4, our Cat-NMF method provides more novel item recommendations on both the MovieLens and the Yelp datasets when compared with MF + MMR, with minimal losses in terms of precision/nDCG. We were not able to get results on ML20M with MMR due to the size of the dataset: it either required too much time (>5 days) or ran into memory starvation.

Table 4

Algorithms’ comparison performance with top-10 recommended items on 4 data sets

Dataset	Algorithm	RegTerm (δ)	Prec.	nDCG	N-nDCG	EPD
ML100K	MF	–	12.0%	12.7%	6.9%	3.0%
	MF + MMR^*	–	10.4%	12.1%	10.6%	4.2%
	Cat-NMF	0.5	11.2%	11.5%	16.1%	4.5%
	Cat-NMF	0.9	10.7%	11.0%	21.8%	6.2%
ML1M	MF	–	11.5%	12.4%	9.9%	4.0%
	MF + MMR^*	–	11.0%	12.1%	10.4%	6.8%
	Cat-NMF	0.4	9.7%	9.5%	40.6%	7.5%
	Cat-NMF	0.7	7.8%	7.4%	52.1%	9.7%
ML20M	MF	–	6.5%	7.1%	6.1%	2.6%
	MF + MMR^†	–	–	–	–	–
	Cat-NMF	0.06	5.9%	5.6%	37.9%	6.1%
	Cat-NMF	0.1	5.2%	4.8%	50.3%	7.7%
Yelp-L.V.	MF	–	4.5%	4.5%	26.1%	27.2%
	MF + MMR^*	–	4.3%	4.7%	21.8%	18.7%
	Cat-NMF	0.5	4.9%	4.9%	30.8%	32.7%
	Cat-NMF	1	5.1%	5.2%	34.2%	33.7%

^*Trade-off set at 0.5. ^†Server terminated the process due to resource starvation.

5 Conclusions and future work

In this paper, we proposed a new framework for novelty-aware matrix factorization, denoted as NMF, that provides both novel and accurate item recommendations. In particular, this article introduced the distance-based item novelty model, which extends the simple popularity-based item novelty model. Our empirical results reveal the trade-off between item accuracy and novelty, and show that our proposed distance-based NMF deals effectively with both aspects. In future work, we also want to consider the diversity of recommendation lists. Finally, we want to perform more offline experiments with additional datasets, as well as online evaluations of our NMF algorithm with real users, to assess if and how users notice the increased novelty according to our proposed measure.

Footnotes

To capture the interaction between users and the item categories they have interacted with, we can build a user-category profile, composed of the user-item rating profile and the item-category profile (e.g., their dot product).

Please note that every item in our datasets belongs to at least one category.

Jaccard similarity is particularly adequate for binary data. In this case we are considering the similarity in the context of the topic coverage.
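
The two notions used in these footnotes can be illustrated with a short sketch. It assumes that the user-category profile is obtained by summing a user's ratings over the categories of the rated items (i.e., the product of the user's rating row with a binary item-category matrix), and that the category distance of a candidate item is taken as one minus the Jaccard similarity between the item's categories and the categories the user has already covered. This is one plausible reading for illustration, with hypothetical function names, not a definitive specification of Cat-NMF.

```python
def user_category_profile(user_ratings, item_categories):
    """Sum a user's ratings per category.

    user_ratings: dict mapping item -> rating given by one user.
    item_categories: dict mapping item -> set of category labels.
    """
    profile = {}
    for item, rating in user_ratings.items():
        for category in item_categories.get(item, ()):
            profile[category] = profile.get(category, 0.0) + rating
    return profile


def jaccard_similarity(a, b):
    """Jaccard similarity between two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def category_distance(item, user_ratings, item_categories):
    """Assumed novelty signal: 1 - Jaccard(item categories, covered categories)."""
    covered = set(user_category_profile(user_ratings, item_categories))
    return 1.0 - jaccard_similarity(item_categories[item], covered)
```

For a user who has only rated Action and Sci-Fi movies, a pure Drama movie obtains the maximum distance of 1 under this reading, while an Action/Drama movie obtains a smaller distance because it overlaps with the categories already covered.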
