Real-time incremental recommendation for streaming data based on Apache Flink

Abstract

Collaborative filtering (CF), one of the best-known methods for building recommendation systems, recommends relevant items to users or predicts users’ ratings of items they have not yet rated. Matrix factorization (MF) models are a well-known approach to the rating prediction problem. However, a recommendation system based on matrix factorization struggles to keep up with rapidly changing real-world data. When ratings from new users or on new items arrive, the static model cannot fit the new data well, and prediction accuracy degrades if the model is not updated. In addition, rebuilding the model on the whole data set incurs a significant computation cost. To capture these changes, in this paper we construct an online-and-offline collaborative filtering method with a multi-method model to improve the traditional CF method, called Online SGD with Offline Knowledge (OSGDO for short). We also propose a real-time incremental recommendation framework on Apache Flink, a scalable stream and batch data processing platform, and implement our method on this framework. Our method trains well online when new observations arrive. The experimental results show that the proposed dynamic training process is more efficient than rebuilding the model on all the data. Our algorithm also performs well in practice and quickly achieves high accuracy when tested on the well-known MovieLens and Netflix data sets.

Keywords

Collaborative filtering, online learning, incremental learning, recommendation system, low-rank matrix factorization, Apache Flink

1. Introduction

Recommendation systems play an important role in large-scale e-commerce platforms such as Amazon, Google, and Netflix in addressing the information overload problem [3, 19, 22]. Traditional CF models for recommendation systems are built on static training data sets and only make predictions when new data arrive, so they are unable to adapt to real-time or incremental scenarios. However, users’ behaviours in large-scale e-commerce systems usually change fast, so the user-item rating matrix changes rapidly in real time [21]. A large body of prior work focuses on improving traditional models by using online learning algorithms. In [1], an online stochastic gradient descent (SGD) method for MF with (or without) features, which minimizes the square loss, is proposed to convert a batch-trained algorithm into an online version. Rendle and Schmidt-Thieme [26] propose regularized kernel MF models that can be updated online. Agarwal et al. [2] present an online learning model named FOBFM, which combines feature-based regression and user-item specific learning in a single framework. Yu et al. [36] design one-sided least squares (One-sided LS) to reduce the computation and storage cost of incremental learning of MF models. In this line of work, online learning algorithms, which update the recommendation model incrementally, achieve good performance and reduce the response time (the time from a user providing input to receiving recommendations from the system [34]).

Batch-processing platforms and stream-processing platforms are widely used in large-scale systems. Among batch-processing distributed frameworks, Apache Hadoop [8] and Apache Spark [28] are well-known. Both address offline processing of big data. Hadoop has a rich ecosystem, so it is more adaptable than Spark. However, Spark processes data faster than Hadoop because it stores intermediate results in memory, while Hadoop stores them on disk. Among stream-processing distributed frameworks, popular choices include Apache Spark, Apache Storm [13] and Apache Flink [6]. Apache Storm is an open-source stream-processing framework. It enables developers to build real-time distributed processing systems that process unbounded streams of data very fast, and it is sometimes called the Hadoop of real-time data. Apache Spark’s stream processing component is based on a micro-batching approach: the incoming data stream is received, split into micro-batches, and then processed like other Spark tasks. In Apache Flink, stream processing executes continuously as long as data are being produced. Besides, Apache Flink supports different notions of time (event time, ingestion time, processing time), which gives programmers high flexibility in defining how events should be correlated [7].

Some work applies recommendation algorithms on batch-processing frameworks (e.g. Apache Hadoop and Apache Spark) to solve the “big data” problem in recommendation systems [37]. Meng et al. [23] propose a keyword-aware service recommendation application implemented on Apache Hadoop. Verma et al. [30] present a recommendation system built on Hadoop for the large amount of data available on the web. Wang et al. [33] use a weighted method that combines CF and content-based recommendation algorithms to implement a fast recommendation system on Apache Spark. Furthermore, stream-processing frameworks are also frequently used by large-scale e-commerce recommendation platforms to address “real-time” challenges [11]. Correspondingly, some work applies recommendation algorithms on stream-processing frameworks (e.g. Apache Storm and Apache Flink) to complete recommendation tasks. Huang et al. [11] build a real-time stream recommender system on Apache Storm using item-based CF, content-based, and demographic-based algorithms. In addition, Ciobanu and Lommatzsch [7] develop a news recommender system on Apache Flink. Most of this work uses traditional recommendation algorithms because they are classic and practical. However, little work in this branch exploits the advantages of incremental recommendation algorithms.

Inspired by the above observations, in this paper we draw on the advantages of online learning to propose an online learning algorithm, OSGDO, and construct a Real-time Incremental Recommendation Framework for streaming data systems (RIRF for short) based on Apache Flink. Our framework treats new information generated by users as streaming data and uses OSGDO to update the recommendation model incrementally with that information. We also use a large amount of historical data to initialise the model, so that the framework both captures the short-term memory provided by new streaming data and retains the long-term memory contained in the historical data. In a nutshell, our framework combines an incremental recommender system with a streaming data-processing platform.

The main contributions of our work are as follows:

•
We propose a novel recommendation framework, RIRF, which applies incremental learning in large-scale recommendation systems. To the best of our knowledge, our framework is the first to combine online learning and offline training on Apache Flink. We introduce the general framework of RIRF in Section 3.2.
•
We propose a novel online learning algorithm named OSGDO, which uses historical data and online learning to update the recommendation model incrementally. We introduce our algorithm in Section 3.3.
•
We conduct extensive experiments on widely used benchmark data sets (MovieLens and Netflix). The experimental results in Section 4.2 show that our algorithm and framework perform well in both accuracy and efficiency.

2. Related work

This section briefly reviews the background of some major groups of related work, including traditional CF and online CF.

CF recommender systems are generally classified into two kinds: memory-based methods and model-based methods. Memory-based recommendation methods operate directly on the ratings that users have given. User-based CF and item-based CF belong to the memory-based recommendation algorithms. Because memory-based methods are easy to implement and understand, they are widely used in the real world [27, 5, 17]. However, they have some limitations. First, they are easily affected by the data sparsity problem, because their prediction accuracy depends on how much raw rating data they have. Furthermore, because they manipulate the ratings directly in memory, the time complexity of calculation and recommendation is high. Besides, the memory consumption can be very expensive.

In general, model-based approaches train a predefined model in the training step to explain the observed ratings, and the model is then used to make recommendations. Usually, model-based CF methods achieve better performance [15]. Model-based approaches include SVD, NMF [16], biased SVD [25], PMF [24] and SVD $++$ [14]. Model-based methods have several advantages. First, the recommendations can be explained easily by the model. Second, the recommendation model only needs to be trained once, after which it can make recommendations more efficiently than memory-based methods. However, most model-based methods only apply to the batch-training setting. If new data arrive, the existing model cannot fit them well, and the model must be retrained on the full data set, which costs time and resources. Online learning algorithms address this problem well.

Online learning algorithms have been extensively studied recently to cope with the incremental recommendation problem. Some work converts an offline MF algorithm into an online MF algorithm [1, 18], and some work combines multi-task processing with collaborative filtering algorithms [32]. Other work integrates the results of an offline training algorithm into online regression algorithms [2]. There is also work that combines offline MF and online MF [21, 31, 36].

The following studies are similar to our work. An early work is [1], which proposes a novel online version of MF with (or without) features using SGD to deal with the incremental recommendation problem. However, it does not take regularization effects into consideration. In [31], an incremental update of the MF model by SGD, named ISGD, is studied. However, it initialises the latent factor vectors randomly, which may affect the accuracy.

In [20], the authors propose an incremental regularized MF model with linear biases, which incrementally updates specified latent features by constructing an expression over new data based on the model trained on historical data. In other words, it learns incrementally from new data rather than retraining the entire recommendation model. Besides, it incorporates linear biases, which can increase the recommendation accuracy. However, it needs to store intermediate training results in external files after each epoch, which requires a large amount of storage space. In addition, its linear biases are not trained along with the latent factors but are estimated through an unbiased estimator using fixed parameters, which may not be able to reflect the real information.

Other incremental learning methods have their own characteristics. Wang et al. [32] focus on online multi-task CF, which trades off between efficacy and efficiency. It updates not only the weight vectors of the user related to the currently observed data but also the weight vectors of some other users according to the user interaction matrix. However, it is slightly less efficient due to the cost of multi-task learning.

Luo et al. [21] design a general incremental-and-static combined scheme for MF-based CF recommenders, whose main idea is to divide the rating matrix $R$ into independent sub-matrices. Yet this loses prediction accuracy, and the rating-variation effect may also exist within the sub-matrices. In contrast, we focus on combining offline knowledge with online learning algorithms. Besides, in the online updating phase, we only update the latent factor vectors related to the incoming data to address the rating-variation effect.

In [2], the authors propose an incremental recommender model named FOBFM, which combines feature-based regression and user-item specific learning, yet their model requires extra information about users or items. This information consists of attributes such as age or gender, rather than factors trained by MF, and it is not available under many circumstances, such as in the Netflix Prize.

Another related work [36] proposes an online CF algorithm using ALS, which turns traditional ALS into an online version (One-sided ALS) and uses the result of traditional ALS as the initial condition for One-sided ALS. However, it only updates one side’s latent factor vector at each update when new data arrive. For instance, if an old user rates a new item, One-sided ALS only updates the latent factor vector of the item instead of updating the latent factor vectors of the user and the item together. In contrast, we think that the vectors of the user and the item change together after a user rates an item. Therefore, we update the related latent factor vectors of the user and the item at the same time.

Inspired by related work, we propose a novel incremental learning algorithm, OSGDO, which combines offline knowledge and online MF. We update both the user’s and the item’s latent factor vectors and biases when new data arrive. Compared with the models mentioned earlier, our algorithm is more concise and effective. Moreover, we implement our algorithm in Apache Flink and propose an incremental recommendation framework named RIRF. To the best of our knowledge, no existing work has attempted to combine offline training and online learning in Apache Flink.

3. Real-time incremental recommendation framework

In this section, we first briefly review the background of the CF and MF algorithms that are widely used in recommendation systems. Then we introduce the general structure of RIRF and the details of our proposed online learning algorithm for incremental recommendation: online SGD with offline knowledge.

3.1 Recommendation algorithms

CF methods are one of the major approaches to building recommendation systems [18]. MF is one of the most popular CF methods due to its outstanding performance in rating prediction [15]. It decomposes a large sparse rating matrix into two or more small latent factor matrices, in which each row or column holds the features of a user or an item. In MF algorithms, the user-item matrix is denoted by $R_{m\times n}$ , where $m$ represents the number of users and $n$ represents the number of items. Generally, we decompose the user-item matrix into two matrices, $U_{m\times k}$ and $V_{k\times n}$ , which are the user latent factor matrix and the item latent factor matrix. An approximate score matrix $\hat{R}$ can therefore be computed by multiplying $U_{m\times k}$ and $V_{k\times n}$ . Each missing rating $r_{ij}$ in matrix $R$ can be approximated by the inner product of the latent factor vectors (also called feature vectors) $U_{i}$ and $V_{j}^{t}$ . $U_{i}$ is the $i_{th}$ row of $U$ , which holds the feature factors of the $i_{th}$ user. Likewise, $V_{j}$ holds the feature factors of the $j_{th}$ item, that is, the $j_{th}$ column of $V_{k\times n}$ written as a row vector.

To learn latent factor vectors that make $\hat{R}$ approximate $R$ well, the system minimizes the squared error on the set of known ratings. Usually, it uses a regularized model to avoid over-fitting. In general, the objective function of MF methods is defined as:

$\displaystyle\min_{U_{*},V_{*}}\sum_{(i,j)\in D}(R_{ij}-U_{i}V_{j}^{t})^{2}+\lambda(\|U_{i}\|^{2}+\|V_{j}\|^{2})$ (1)

where $D$ is the set of $(i,j)$ pairs with known ratings (the training set), $R_{ij}$ is the real rating that user $i$ gives to item $j$ , and $\lambda$ is the L2-regularization strength parameter. The regularization term $\lambda(||U_{i}||^{2}+||V_{j}||^{2})$ is used to avoid overfitting. Matrices $U$ and $V$ are usually randomly initialised in the MF initialisation process.
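As a minimal sketch of Eq. (1), the following pure-Python function evaluates the regularized squared-error objective over a set of known ratings. The function name `mf_objective`, the dictionary layout of the ratings and the toy numbers are illustrative assumptions. The item factors are stored as a list of item vectors (one per item) for convenience, and the regularization term is accumulated once per observed rating, which is one common reading of Eq. (1).

```python
def mf_objective(ratings, U, V, lam):
    """Regularized squared error over the known ratings.

    ratings: dict mapping (i, j) -> R_ij for the observed pairs (the set D)
    U: list of user vectors, U[i] is U_i
    V: list of item vectors, V[j] is V_j (assumed layout, one vector per item)
    lam: L2-regularization strength (lambda)
    """
    total = 0.0
    for (i, j), r in ratings.items():
        pred = sum(a * b for a, b in zip(U[i], V[j]))  # inner product U_i V_j^t
        total += (r - pred) ** 2
        total += lam * (sum(a * a for a in U[i]) + sum(b * b for b in V[j]))
    return total


# Toy example with two users, one item and k = 2 (made-up numbers).
ratings = {(0, 0): 5.0, (1, 0): 3.0}
U = [[1.0, 2.0], [1.0, 0.0]]
V = [[1.0, 2.0]]
loss = mf_objective(ratings, U, V, lam=0.1)
```

Batch MF training searches for the `U` and `V` that make this value small, which is why every new rating would in principle require revisiting the whole set $D$ .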

Although MF achieves high accuracy on static data sets, it has a potential problem: when new observations arrive, the model must be rebuilt to incorporate the incoming data, which costs considerable time and resources [29].

Recently, online learning algorithms have achieved attractive performance in incremental recommendation. Instead of rebuilding the whole model, some online learning algorithms only need to update the part of the model related to the incoming data. Therefore, online learning algorithms adapt better than traditional recommendation algorithms to settings in which information changes rapidly. In [1], an online MF algorithm is proposed. In [36], One-sided LS for incremental learning is proposed, together with a method that combines One-sided LS and offline ALS.

3.2 General framework of RIRF

We construct a framework that combines incremental recommendation with Apache Flink. Apache Flink is currently one of the most popular stream processing frameworks. It uses in-memory computation to reduce disk I/O. In addition, it has its own memory management within the JVM, so it can spill data to disk when memory is insufficient. Moreover, Apache Flink performs true stream processing: it processes each record immediately and passes it to the next operator without waiting for a whole batch of data, as Apache Spark Streaming does. Our framework is designed to process an incoming stream of new observations and update the recommendation model continuously. Besides, our proposed online update algorithm uses the Propagation Cut-Off mechanism to ensure that it only updates the information related to new observations rather than the whole model.

Figure 1.

Overview of RIRF.

The framework is mainly composed of six parts: DataReceiver Component, Offline Component, Online Component, Recommender Model, Storage Module and Result Return Module. All of them are interrelated and interdependent. The general framework of RIRF is shown in Fig. 1 and details of these components are as follows:

•

DataReceiver Component: It delivers data to the Recommender Model or the Online Component. We divide the data into two types: recommended input data and incremental information data. Recommended input data have the form $(\textit{user},\textit{item})$ . Such a record represents a recommendation task: we need to predict a rating that measures the extent to which the user likes the item. Incremental information data have the form $(\textit{user},\textit{item},\textit{rating})$ . Such a record represents an updating task: we need to use this information to update the recommendation model.

•

Offline Component: It is mainly used to train on historical data and obtain the latent factor vectors and biases for users and items. Most traditional MF algorithms, such as SVD and NMF, can serve as the Offline Component. The result of the Offline Component is the initial condition of the Online Component.

•

Online Component: When incremental information data arrive, the Online Component learns the new information incrementally and updates the latent factor vectors and biases of users and items online. Note that these latent factor vectors and biases are initialised with the results of the Offline Component.

•

Recommender Model: The main function of this component is to use the latest recommendation model to predict users’ ratings of items.

•

Storage Module: The main function of the Storage Module is to save the offline training results, which contain the latent factor vectors and biases of users and items. In addition, it stores the predicted ratings of related users and items.

•

Result Return Module: The Result Return Module obtains ratings about related users and items from the Storage Module and displays the corresponding information.

In this part, we describe our framework’s workflow. First, we use historical data to build an offline recommendation model in the Offline Component and persist the model into the Storage Module. Then we determine the corresponding operations based on the type of data received by the DataReceiver Component: recommended input data or incremental information data. As shown in Fig. 1, when incremental information data arrive, we (1) deliver the data to the Online Component, and (2) learn the new information and update the recommendation model using the online learning algorithm; when recommended input data arrive, we (1) deliver the data to the Recommender Model, (2) predict the ratings using the latest latent vectors contained in the Recommender Model, and (3) store the results in the Storage Module. Finally, we display these results through the Result Return Module.
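The dispatch logic of this workflow can be sketched in a few lines of plain Python. This is only an illustration of the routing between the two record types; it does not use the Apache Flink API, and the names `handle_record`, `predict_fn`, `update_fn` and the dictionary used as storage are hypothetical stand-ins for the components in Fig. 1.

```python
def handle_record(record, predict_fn, update_fn, storage):
    """Route one incoming record according to its type.

    record: (user, item) for recommended input data, or
            (user, item, rating) for incremental information data
    predict_fn: stand-in for the Recommender Model
    update_fn: stand-in for the Online Component
    storage: dict standing in for the Storage Module
    """
    if len(record) == 3:
        user, item, rating = record
        update_fn(user, item, rating)       # learn from the new observation
        return "update"
    if len(record) == 2:
        user, item = record
        prediction = predict_fn(user, item)  # use the latest model
        storage[(user, item)] = prediction   # persist for the Result Return Module
        return "predict"
    raise ValueError("unexpected record shape: %r" % (record,))


# Toy usage with trivial stand-ins for the two components.
seen_updates = []
storage = {}
kind_a = handle_record(("u1", "i7", 4.0),
                       predict_fn=lambda u, i: 3.5,
                       update_fn=lambda u, i, r: seen_updates.append((u, i, r)),
                       storage=storage)
kind_b = handle_record(("u1", "i9"),
                       predict_fn=lambda u, i: 3.5,
                       update_fn=lambda u, i, r: seen_updates.append((u, i, r)),
                       storage=storage)
```

In a streaming deployment the same branching would sit inside an operator that receives records one at a time, so that updates and predictions interleave in arrival order.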

3.3 Online SGD with offline knowledge in Online Component

In our proposed RIRF, the Online Component uses online learning algorithms that update the latent factor vectors and the biases of users and items online. Although most existing online learning algorithms based on MF could be applied to our Online Component, they are still not clear and efficient enough. Thus, we propose a new online learning algorithm named OSGDO that is adapted to Apache Flink. In this section, we first explain our Propagation Cut-Off mechanism and then introduce the details of our proposed algorithm, OSGDO.

Figure 2.

The difference between the traditional way and the Cut-Off mechanism when updating the latent factor vectors $p_{u}$ and $q_{i}$ .

3.3.1 Propagation Cut-Off mechanism

When new data arrive, the model needs to update the latent factor vectors of the users and items involved. Since the ratings that the user has already given to other items are fixed, when the user’s vector changes, the vectors of those items would change as well. The same holds when an item’s latent factor vector changes. For instance, in Fig. 2a, $p_{u}$ and $q_{i}$ stand for the latent factor vectors of user $u$ and item $i$ respectively. Meanwhile, $q_{a}$ and $q_{n}$ are the vectors of items rated by user $u$ , and $p_{c}$ and $p_{x}$ are the vectors of users $c$ and $x$ , who rated item $a$ . When a new observation $\left<u,i,r\right>$ arrives, $p_{u}$ is updated in the online learning phase. However, the ratings of item $a$ , item $n$ and other items are fixed. As mentioned before, a rating is the inner product of the related user’s vector and the related item’s vector. Because these scores are fixed, the item latent vectors, including $q_{a}$ and $q_{n}$ , would have to change on account of the update of $p_{u}$ . In turn, $p_{c}$ and $p_{x}$ would change for a similar reason. These changes on users and items would propagate widely and make the updating procedure more complicated, as depicted in Fig. 2a. To avoid large-scale propagation of updates and to keep recommendation processing fast, we propose the Propagation Cut-Off mechanism, which restricts the updating operations to the user and item that directly correspond to the incoming data, and we build it into our proposed algorithm. That is, OSGDO only updates the latent factor vectors and biases corresponding to the user’s id and item’s id included in Step 1, as depicted in Fig. 2b.
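To make the contrast concrete, the sketch below counts which vectors would be touched in each case for the small example described above. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and unrestricted propagation is modelled simply as a breadth-first traversal of the user-item rating graph.

```python
from collections import deque


def affected_with_propagation(ratings, u, i):
    """Vectors reached if an update spreads along every existing rating edge."""
    by_user, by_item = {}, {}
    for user, item in list(ratings) + [(u, i)]:
        by_user.setdefault(user, set()).add(item)
        by_item.setdefault(item, set()).add(user)
    seen = {("user", u), ("item", i)}
    queue = deque(seen)
    while queue:
        kind, key = queue.popleft()
        if kind == "user":
            neighbours = [("item", x) for x in by_user.get(key, ())]
        else:
            neighbours = [("user", x) for x in by_item.get(key, ())]
        for node in neighbours:
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return seen


def affected_with_cut_off(u, i):
    """With the cut-off, only the user and item of the new observation change."""
    return {("user", u), ("item", i)}


# Existing ratings from the example: u rated a and n; c and x rated a.
history = [("u", "a"), ("u", "n"), ("c", "a"), ("x", "a")]
full = affected_with_propagation(history, "u", "i")
cut = affected_with_cut_off("u", "i")
```

Even in this toy graph the unrestricted case reaches six vectors while the cut-off touches two, and the gap grows with the size of the connected part of the rating graph.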

3.3.2 Online SGD with offline knowledge

As mentioned before, our Online Component is built on the results of the Offline Component. We therefore use the Offline Component to build an offline recommendation model and obtain offline knowledge, including latent factor matrices and bias sets, as depicted in Fig. 3. $O P$ and $O Q$ are the user latent factor matrix and the item latent factor matrix respectively, and all the biases of users and items are included in the User bias set and the Item bias set.

Figure 3.

Offline MF’s results: Latent matrices and bias sets.

Given an observation $(u,i,r)$ in the incremental data set, we obtain the user latent factor vector of $u$ and the item latent factor vector of $i$ from $O P$ and $O Q$ respectively. Let $\textit{offuser}_{u}$ be the offline latent factor vector of $u$ and $\textit{offitem}_{i}$ be the offline latent factor vector of $i$ . If both user $u$ and item $i$ exist in $O P$ and $O Q$ , we initialise the latent factor vectors $\textit{onuser}_{u}$ and $\textit{onitem}_{i}$ of OSGDO with $\textit{offuser}_{u}$ and $\textit{offitem}_{i}$ respectively. If user $u$ or item $i$ does not exist in $O P$ or $O Q$ , we randomly initialise the corresponding vector from a Gaussian distribution with mean 0 and variance 0.1.
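The initialisation rule can be sketched as follows. The function name `init_vector` and the dictionary that stands in for $O P$ or $O Q$ are assumptions made for illustration; the only values taken from the text are the mean of 0 and the variance of 0.1, which corresponds to a standard deviation of $\sqrt{0.1}$ .

```python
import random


def init_vector(key, offline, k, rng):
    """Return the online starting vector for a user or an item.

    key: user id or item id
    offline: dict mapping id -> offline latent factor vector (stand-in for OP or OQ)
    k: dimension of the latent factor vectors
    rng: random.Random instance, passed in so that runs are reproducible
    """
    if key in offline:
        return list(offline[key])          # copy, so offline knowledge stays intact
    std = 0.1 ** 0.5                       # variance 0.1 -> standard deviation sqrt(0.1)
    return [rng.gauss(0.0, std) for _ in range(k)]


rng = random.Random(0)
OP = {"u1": [0.1, 0.2]}                    # toy offline user factors, k = 2
known = init_vector("u1", OP, 2, rng)      # taken from the offline model
fresh = init_vector("u_new", OP, 2, rng)   # cold start: Gaussian draw
```

Copying the offline vector rather than aliasing it keeps the stored offline model unchanged while the online copy is being updated.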

Finally, we use $\textit{onuser}_{u}$ and $\textit{onitem}_{i}$ as latent factor vectors to fit the predicted rating to the real rating, and update the recommendation model after several iterations. However, it would be unwise to explain the full rating value by the inner product $\left<\textit{onuser}_{u},\textit{onitem}_{i}^{T}\right>$ alone. For example, in a data set, some users tend to give higher ratings than others, and some items tend to receive higher ratings than others, which would make the recommendation result less accurate. An effective way to further improve the prediction accuracy of MF is to incorporate biases [15]. Therefore, we define $\textit{onuserbias}_{u}$ as the user’s bias and $\textit{onitembias}_{i}$ as the item’s bias. We initialise the biases in a similar way. If the user has rated other items or the item has been rated before, we initialise $\textit{onuserbias}_{u}$ and $\textit{onitembias}_{i}$ with the corresponding $\textit{offuserbias}_{u}$ and $\textit{offitembias}_{i}$ respectively, which are results of the Offline Component. Otherwise, we initialise them to 0. The model only updates the latent factor vectors and biases related to $u$ and $i$ rather than all latent factor vectors and biases. The loss on this observation is $L(R,(u,i,r))$ , where $R$ is the rating matrix and $r$ is the real rating given by user $u$ to item $i$ . We use the square loss as the loss function, although other convex loss functions can be used too. Thus, the loss function with regularization is defined as:

$\displaystyle L(R,(u,i,r))=\frac{1}{2}(r-\hat{r}_{ui})^{2}+\frac{\lambda}{2}\|\textit{onuser}_{u}\|^{2}+\frac{\lambda}{2}\textit{onuserbias}_{u}^{2}+\frac{\lambda}{2}\|\textit{onitem}_{i}\|^{2}+\frac{\lambda}{2}\textit{onitembias}_{i}^{2}$ (2)

$\displaystyle\hat{r}_{ui}=\mu+\textit{onuserbias}_{u}+\textit{onitembias}_{i}+\sum_{l=1}^{k}\textit{onuser}_{ul}\textit{onitem}_{il}$ (3)

$\textit{onuser}_{u}$ and $\textit{onitem}_{i}$ are the latent factor vectors of user $u$ and item $i$ in the online learning phase respectively. $\lambda$ is the L2-regularization strength parameter for avoiding over-fitting, $k$ is the dimension of the latent factor vectors and $\mu$ is the average rating of the training set. The elements of $\textit{onitem}_{i}$ are $\textit{onitem}_{il}$ , which measure the extent to which the item possesses those factors, positive or negative. Besides, the elements of $\textit{onuser}_{u}$ are $\textit{onuser}_{ul}$ , which measure the extent of interest that the user has in items that are high on the corresponding factors, positive or negative [15]. To a certain extent, updating only these vectors reduces the computational complexity, since each update depends only on $k$ and not on the full matrix, which can be quite large.

The online update formulas for latent factor vectors and biases are as follows:

$\displaystyle\textit{onuser}_{u}\leftarrow\textit{onuser}_{u}+\gamma((r_{ui}-\hat{r}_{ui})\cdot\textit{onitem}_{i}-\lambda\textit{onuser}_{u})$ (4)

$\displaystyle\textit{onitem}_{i}\leftarrow\textit{onitem}_{i}+\gamma((r_{ui}-\hat{r}_{ui})\cdot\textit{onuser}_{u}-\lambda\textit{onitem}_{i})$ (5)

$\displaystyle\textit{onuserbias}_{u}\leftarrow\textit{onuserbias}_{u}+\gamma(r_{ui}-\hat{r}_{ui}-\lambda\textit{onuserbias}_{u})$ (6)

$\displaystyle\textit{onitembias}_{i}\leftarrow\textit{onitembias}_{i}+\gamma(r_{ui}-\hat{r}_{ui}-\lambda\textit{onitembias}_{i})$ (7)

where $\gamma$ is the learning rate of OSGDO, $r_{ui}$ is the real rating $r$ of the observation, and $\hat{r}_{ui}$ is the predicted rating given by Eq. (3).
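A minimal sketch of one online step, following Eq. (3) for the prediction and Eqs (4) to (7) for the updates, is given below. The function names and the toy numbers are illustrative assumptions. The equations do not state whether Eq. (5) uses the user vector before or after Eq. (4) is applied; this sketch assumes both updates use the values from before the step.

```python
def predict(mu, bu, bi, pu, qi):
    """Eq. (3): global mean + user bias + item bias + inner product."""
    return mu + bu + bi + sum(a * b for a, b in zip(pu, qi))


def osgdo_step(r, mu, bu, bi, pu, qi, gamma, lam):
    """One plain SGD step on a single observation (u, i, r), Eqs (4)-(7).

    Only the vectors and biases of this user and this item are returned,
    because nothing else in the model is modified.
    """
    err = r - predict(mu, bu, bi, pu, qi)
    new_pu = [p + gamma * (err * q - lam * p) for p, q in zip(pu, qi)]   # Eq. (4)
    new_qi = [q + gamma * (err * p - lam * q) for p, q in zip(pu, qi)]   # Eq. (5)
    new_bu = bu + gamma * (err - lam * bu)                               # Eq. (6)
    new_bi = bi + gamma * (err - lam * bi)                               # Eq. (7)
    return new_bu, new_bi, new_pu, new_qi


# Toy observation: real rating 5, current prediction 3 + 0 + 0 + 1 = 4.
mu, bu, bi, pu, qi = 3.0, 0.0, 0.0, [1.0, 0.0], [1.0, 0.0]
before = predict(mu, bu, bi, pu, qi)
bu2, bi2, pu2, qi2 = osgdo_step(5.0, mu, bu, bi, pu, qi, gamma=0.1, lam=0.02)
after = predict(mu, bu2, bi2, pu2, qi2)
```

In this toy case the prediction moves from 4.0 towards the real rating of 5.0 after a single step, at a cost that depends only on the vector dimension $k$ .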

Our algorithm uses SGD with momentum as its optimization method. Compared with batch gradient descent, SGD uses the gradient at a single rating as an estimate of the global gradient. One disadvantage of SGD is that its update direction depends entirely on the current sample, so its updates are unstable. A suitable remedy is the momentum mechanism, which can speed up convergence. Therefore, Eqs (4) and (5) for updating the latent factor vectors are replaced with Eqs (9) and (11), where the momentum terms are defined in Eqs (8) and (10).

$\displaystyle\textit{user}_{\textit{mom}}\leftarrow\theta\,\textit{user}_{\textit{mom}}+\gamma((\hat{r}_{ui}-r_{ui})\cdot\textit{onitem}_{i}+\lambda\textit{onuser}_{u})$ (8)

$\displaystyle\textit{onuser}_{u}\leftarrow\textit{onuser}_{u}-\textit{user}_{\textit{mom}}$ (9)

$\displaystyle\textit{item}_{\textit{mom}}\leftarrow\theta\,\textit{item}_{\textit{mom}}+\gamma((\hat{r}_{ui}-r_{ui})\cdot\textit{onuser}_{u}+\lambda\textit{onitem}_{i})$ (10)

$\displaystyle\textit{onitem}_{i}\leftarrow\textit{onitem}_{i}-\textit{item}_{\textit{mom}}$ (11)

We define the symbols $\textit{user}_{\textit{mom}}$ and $\textit{item}_{\textit{mom}}$ as the user’s momentum and the item’s momentum respectively, and $\theta$ is the attenuation parameter for momentum, usually 0.9.
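The momentum variant in Eqs (8) to (11) can be sketched in the same style. The function name and the decision to pass the momentum vectors in and out explicitly are assumptions for illustration; as in the plain SGD case, both updates are assumed to use the vectors from before the step.

```python
def momentum_step(r, r_hat, pu, qi, user_mom, item_mom, gamma, lam, theta=0.9):
    """One momentum update of the latent factor vectors, Eqs (8)-(11).

    r, r_hat: real and predicted rating of the observation
    user_mom, item_mom: momentum vectors carried over from the previous step
    theta: attenuation parameter for momentum
    """
    err = r_hat - r   # note the sign: Eqs (8) and (10) use (predicted - real)
    new_user_mom = [theta * m + gamma * (err * q + lam * p)
                    for m, p, q in zip(user_mom, pu, qi)]                 # Eq. (8)
    new_item_mom = [theta * m + gamma * (err * p + lam * q)
                    for m, p, q in zip(item_mom, pu, qi)]                 # Eq. (10)
    new_pu = [p - m for p, m in zip(pu, new_user_mom)]                    # Eq. (9)
    new_qi = [q - m for q, m in zip(qi, new_item_mom)]                    # Eq. (11)
    return new_pu, new_qi, new_user_mom, new_item_mom


# First step with zero momentum: the result matches the plain update of Eq. (4).
pu, qi = [1.0, 0.0], [1.0, 0.0]
pu1, qi1, um1, im1 = momentum_step(5.0, 4.0, pu, qi, [0.0, 0.0], [0.0, 0.0],
                                   gamma=0.1, lam=0.02)
```

With zero initial momentum the first step coincides with Eqs (4) and (5); on later steps the term $\theta\,\textit{user}_{\textit{mom}}$ carries part of the previous direction forward, which smooths the updates.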

In conclusion, we use the latent factor vectors and biases obtained from Offline Component in RIRF as the initialisation parameters of OSGDO and update the recommendation model after multiple rounds of iterative online learning.

4. Experiments

In this section, we perform extensive experiments to study the efficiency of our OSGDO method in terms of prediction accuracy and learning time. In our experiments, we do not use any extra characteristics such as gender and age of users or information about movies.

We compare the proposed algorithm with other existing online CF algorithms. Besides, we conduct multiple experiments on each data set to ensure the stability and accuracy of the results of these experiments.

4.1 Experimental setting

4.1.1 Data sets

In our experiments, we use five classical data sets: MovieLens 100K, MovieLens 1M, MovieLens 10M, MovieLens 20M [9] and Netflix [4]. The MovieLens data sets were collected by the GroupLens Research Project at the University of Minnesota through the MovieLens website (https://grouplens.org/datasets/movielens/). The Netflix data set is provided by Netflix (http://www.netflixprize.com).

MovieLens 100K is composed of 100,000 ratings of 1,683 movies by 943 users. In addition, it provides user information such as gender and age. Its rating density is 6.30%.

MovieLens 1M contains 1,000,209 ratings from 6,040 users on 3,883 movies, with ratings in the interval [0, 5]. The density of its rating matrix is 4.25%. The data set also includes some user and item features, such as users’ age, occupation and gender, and movies’ genres.

MovieLens 10M contains 10,000,054 ratings and 95,580 tags applied to 10,681 movies by 71,567 users of the online movie recommender service MovieLens. Its rating density is 1.31%.

MovieLens 20M contains 20,000,263 ratings and 465,564 tag applications across 27,278 movies, created by 138,493 users. In our experiments, we do not use the tag information.

The Netflix data set contains 100,480,507 ratings by 480,189 users on 17,770 items; its rating density is 1.18%. In our experiments, we use the subset containing the ratings on the first 1,000 items (ranked by item ID) of the Netflix data set. We choose this subset because it represents the whole Netflix data set well. It contains 5,010,199 ratings by 404,555 users on these 1,000 items, and its rating density is 1.24%.
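The rating densities quoted in this subsection follow from dividing the number of ratings by the number of user-item pairs. As a small worked check, assuming density is defined as ratings / (users × items), the helper below (a hypothetical name) reproduces the two Netflix figures from the counts given above.

```python
def rating_density(n_ratings, n_users, n_items):
    """Share of the user-item matrix that holds a rating, in percent."""
    return 100.0 * n_ratings / (n_users * n_items)


netflix_full = rating_density(100_480_507, 480_189, 17_770)
netflix_subset = rating_density(5_010_199, 404_555, 1_000)
print(round(netflix_full, 2), round(netflix_subset, 2))  # prints 1.18 1.24
```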

4.1.2 Evaluation method

The recommender is evaluated by prediction accuracy and by the execution time spent on online learning, measured as elapsed time. To evaluate our RIRF framework and the incremental learning algorithm OSGDO, we use Root Mean Squared Error (RMSE) [10], a widely used metric for the statistical accuracy of recommendation algorithms. It is defined as follows:

$\displaystyle\textit{RMSE}=\sqrt{\left.\sum_{(u,i)\in D}\left(r_{ui}-\hat{r}_{ui}\right)^{2}\middle/\left|D\right|\right.}$ (12)

In Eq. (12), $D$ is the validation data set, $r_{ui}$ is the real rating and $\hat{r}_{ui}$ is the predicted rating. In rating-oriented recommendation systems, a lower RMSE means higher prediction accuracy.
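Eq. (12) can be computed directly from pairs of real and predicted ratings. The snippet below is a minimal sketch; the function name `rmse` and the three example pairs are assumptions made for illustration and are not taken from the experiments.

```python
import math


def rmse(pairs):
    """RMSE over (real rating, predicted rating) pairs from the validation set D."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("validation set D must not be empty")
    squared_error = sum((r - r_hat) ** 2 for r, r_hat in pairs)
    return math.sqrt(squared_error / len(pairs))


# Hypothetical (r_ui, r_hat_ui) pairs, for illustration only.
example = [(4.0, 3.5), (2.0, 2.5), (5.0, 4.0)]
example_rmse = rmse(example)
```

For these three pairs the squared errors sum to 1.5, so the result is the square root of 0.5.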

4.1.3 Experiment settings

Our experiments run on a server with 8 processors and 8 GB of memory. We use another server with the same configuration as a Redis server to store the latent factor vectors and biases. We implement our RIRF framework and the OSGDO algorithm in Apache Flink to evaluate the execution time on this platform. We use Surprise's [12] implementation of SVD to generate the offline knowledge (latent matrices and bias sets) before the online learning of OSGDO. Other offline training methods could be used instead.
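To illustrate how offline knowledge can seed online updates, the sketch below applies one SGD step of a generic biased matrix factorization model to a single incoming rating. It is not claimed to be the exact OSGDO update rule defined earlier in the paper. The function names, the global mean `mu` and the toy values are assumptions made for this example, and plain Python lists stand in for the latent vectors and biases kept in the Redis store.

```python
def predict(mu, b_u, b_i, p_u, q_i):
    """Biased MF prediction: global mean + user bias + item bias + dot product."""
    return mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))


def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, lr=0.005, reg=0.02):
    """One L2-regularized SGD update for a single observed rating r_ui."""
    err = r_ui - predict(mu, b_u, b_i, p_u, q_i)
    new_b_u = b_u + lr * (err - reg * b_u)
    new_b_i = b_i + lr * (err - reg * b_i)
    # Both vector updates use the old values of p_u and q_i.
    new_p_u = [p + lr * (err * q - reg * p) for p, q in zip(p_u, q_i)]
    new_q_i = [q + lr * (err * p - reg * q) for p, q in zip(p_u, q_i)]
    return new_b_u, new_b_i, new_p_u, new_q_i


# Hypothetical offline knowledge for one user and one item (toy values).
mu, b_u, b_i = 3.5, 0.1, -0.2
p_u, q_i = [0.1, 0.2], [0.3, -0.1]

r_new = 5.0  # a newly arrived rating
old_pred = predict(mu, b_u, b_i, p_u, q_i)
b_u, b_i, p_u, q_i = sgd_step(r_new, mu, b_u, b_i, p_u, q_i)
new_pred = predict(mu, b_u, b_i, p_u, q_i)
```

After the step, the prediction for the new rating moves towards the observed value, which is the behaviour expected from an incremental update that starts from offline-trained parameters.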

4.2 Experimental result

To show that the proposed framework and algorithm can handle the problems that traditional recommendation systems encounter in online scenarios, we split each data set as follows. First, 80% of the data set is used as offline training data $D_{\textit{off}}$ and the remaining 20% as test data $D_{\textit{test}}$ . We use $D_{\textit{off}}$ to build an offline recommendation model, which simulates an offline recommendation scenario. We then partition $D_{\textit{test}}$ into $D_{\textit{inc}}$ and $D_{\textit{validation}}$ : for each user or item involved in incremental learning, $N_{\textit{inc}}$ ratings are randomly selected from $D_{\textit{test}}$ to form the incremental data $D_{\textit{inc}}$ , which simulates the online scenario, and the remaining ratings form $D_{\textit{validation}}$ , which is used to verify the prediction accuracy of our algorithm.
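The division plan above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: the 80/20 split is random, ratings are grouped by user only, a user with fewer than $N_{\textit{inc}}$ test ratings contributes all of them to $D_{\textit{inc}}$ , and ratings are `(user, item, rating)` tuples. The function name `split_ratings` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def split_ratings(ratings, n_inc=10, offline_fraction=0.8, seed=0):
    """Split ratings into D_off, D_inc and D_validation."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)

    # 80% offline training data, 20% test data.
    cut = int(len(shuffled) * offline_fraction)
    d_off, d_test = shuffled[:cut], shuffled[cut:]

    # Group the test ratings by user (first tuple field).
    by_user = defaultdict(list)
    for rating in d_test:
        by_user[rating[0]].append(rating)

    # Per user: N_inc random ratings go to D_inc, the rest to D_validation.
    d_inc, d_validation = [], []
    for user_ratings in by_user.values():
        rng.shuffle(user_ratings)
        d_inc.extend(user_ratings[:n_inc])
        d_validation.extend(user_ratings[n_inc:])
    return d_off, d_inc, d_validation


# Toy data: 5 users, 100 items each, constant rating (illustration only).
ratings = [(u, i, 3.0) for u in range(5) for i in range(100)]
d_off, d_inc, d_validation = split_ratings(ratings, n_inc=3)
```

Every rating ends up in exactly one of the three sets, and no user contributes more than `n_inc` ratings to the incremental set.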

4.2.1 Accuracy comparison

In this section, we use OSGDO as the algorithm in the Online Component of RIRF and SVD (based on Surprise) as the algorithm in the Offline Component of RIRF. We compare our method OSGDO with other emerging online CF algorithms. Because some users or items have only a few ratings in these data sets, we set $N_{\textit{inc}}$ to 10 in the following comparisons. We fix the L2-regularization strength parameter to 0.02. The compared algorithms in our experiments are:

• "OCF": An online matrix factorization algorithm that learns a low-rank matrix online by the online gradient descent method. It is described in [1];
• "DA-OCF": The online MF algorithm with the Dual-Averaging method, proposed in [18];
• "OMTCF-VI": An online Multi-Task Collaborative Filtering (OMTCF) algorithm [32];
• "One-sided LS": An incremental learning approach derived from Alternating Least Squares (ALS), proposed in [36];
• "IRMFB": An incremental Regularized MF method with linear biases, proposed in [20];
• "SICF": A scalable item-based collaborative filtering method using incremental update and local link prediction, described in [35];
• "FOBFM": A model that learns item-specific factors quickly through online regression, after offline training on historical data by the expectation maximization (EM) algorithm [2].

First, on MovieLens 100 K, we compare OSGDO with OCF, DA-OCF, and OMTCF-VI. Then, on MovieLens 1 M, we compare OSGDO with OCF, DA-OCF, OMTCF-VI, IRMFB, SICF, and FOBFM. We also make comparisons on large-scale data sets. On MovieLens 10 M, we compare OSGDO with OMTCF-VI, One-sided LS, OCF, and DA-OCF. On MovieLens 20 M, we compare our method with One-sided LS. On the Netflix data set, we compare OSGDO with IRMFB.

Several observations can be drawn from the results in Table 1. Note that the results of OCF, DA-OCF, and OMTCF-VI (one of the most efficient of the OMTCF algorithms) are collected from Wang et al. [32], and the results of One-sided LS are collected from Yu et al. [36]. The results of the remaining compared algorithms are collected from the corresponding work.

Table 1
Overall results in different data sets

Dataset	Method	RMSE
MovieLens 100 K	OSGDO	0.9415
	OCF	1.0781
	DA-OCF	1.2722
	OMTCF-VI	1.0521
MovieLens 1 M	OCF	1.0441
	DA-OCF	1.0636
	OMTCF-VI	0.9766
	IRMFB	0.9018
	SICF	0.8671
	OSGDO	0.8612
	FOBFM	0.8429
MovieLens 10 M	OCF	0.9700
	DA-OCF	1.0262
	OMTCF-VI	0.9515
	OSGDO	0.9429
	One-sided LS	1.173
MovieLens 20 M	OSGDO	0.7973
	One-sided LS	1.156
Netflix	OSGDO	0.9465
	IRMFB	0.9589

For MovieLens 100 K, we set the number of iterations to 4, the learning rate to 0.003 and the dimension of the latent factor vectors to 30. With these parameters, we compare OSGDO with OCF, DA-OCF, and OMTCF-VI. OSGDO achieves the lowest RMSE among the compared methods. This indicates that our method is more effective than the other methods at online prediction for incremental recommendation on MovieLens 100 K.

For MovieLens 1 M, we set the number of iterations to 2, the learning rate to 0.004 and the dimension of the latent factor vectors to 20. OSGDO has better accuracy than OCF, DA-OCF, OMTCF-VI, IRMFB, and SICF. We attribute this to the initialisation strategy. Among these methods, OMTCF-VI and DA-OCF initialise the latent factor matrices randomly, which affects prediction accuracy. IRMFB initialises the latent factor matrices with offline training results, but it uses an unbiased estimator to update the biases rather than updating them together with the latent factor vectors. Our algorithm is slightly worse than FOBFM in terms of RMSE (0.8612 versus 0.8429). Note, however, that FOBFM keeps multiple models for each item and picks the best one for prediction, which occupies a large amount of memory. It also needs extra information such as user age or gender, which is often unavailable. In contrast, our approach does not need to store extra models, and it does not require an explicit dimensionality reduction step in the offline learning phase.

On MovieLens 10 M, we set the number of iterations to 5, the dimension of the latent factor vectors to 10 and the learning rate to 0.005. The result shows that our method performs better than One-sided LS, OCF, DA-OCF, and OMTCF-VI. On MovieLens 20 M, we set the number of iterations to 1, the learning rate to 0.004 and the dimension of the latent factor vectors to 50. We again achieve higher accuracy than One-sided LS.

On the Netflix data set, we set the number of iterations to 3, the learning rate to 0.005, and the dimension of the latent factor vectors to 30. We compare our algorithm with IRMFB. The result shows that our method performs better than IRMFB.

We calculate the mean and the standard deviation of the RMSE of our algorithm on all the data sets and present them in Table 2. The results in Tables 1 and 2 show that OSGDO achieves competitive accuracy on these data sets, with a standard deviation of at most $5.2380\times 10^{-4}$ across experiments. Therefore, OSGDO is a practical online learning method for incremental recommendation in an online dynamic environment, and our framework RIRF is an effective incremental recommendation framework.

Table 2
Mean and standard deviation of RMSE on different data sets

Dataset	Mean of RMSE	Standard deviation of RMSE
MovieLens 100 K	0.9415	1.0209 $\times 10^{-5}$
MovieLens 1 M	0.8612	4.2808 $\times 10^{-6}$
MovieLens 10 M	0.9429	5.2380 $\times 10^{-4}$
MovieLens 20 M	0.7973	2.7165 $\times 10^{-5}$
Netflix	0.9465	2.4200 $\times 10^{-5}$
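The two statistics reported per data set can be obtained from the RMSE values of repeated runs. The following minimal sketch uses made-up RMSE values that are not results from the paper, and it assumes the population standard deviation; the text does not state whether the population or the sample form was used. The name `summarize_rmse` is hypothetical.

```python
import statistics


def summarize_rmse(rmse_per_run):
    """Return (mean, population standard deviation) of RMSE over repeated runs."""
    return statistics.mean(rmse_per_run), statistics.pstdev(rmse_per_run)


# Made-up RMSE values from three hypothetical runs (not the paper's results).
hypothetical_runs = [0.90, 0.92, 0.88]
mean_rmse, std_rmse = summarize_rmse(hypothetical_runs)
```

Replacing `statistics.pstdev` with `statistics.stdev` would give the sample standard deviation instead.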

4.2.2 Execution time comparison

Figure 4 shows the time cost of OSGDO and SVD (based on Surprise) when training on 1,000 incremental ratings. We treat these ratings as the incoming stream of user behaviour data. The time cost of SVD is the time needed to rebuild the recommendation model on the whole data, including the incremental data, whereas the time cost of OSGDO is the time needed to learn incrementally from the incoming data and update the model at the same time. Across these data sets, the time cost of our method is stable and clearly lower than that of retraining on all the data with SVD. Therefore, our framework RIRF and online learning algorithm OSGDO are more efficient than traditional batch-training methods.

Figure 4.

A time cost comparison between OSGDO and SVD.

4.2.3 Impact of parameters

In this section, we study the performance of OSGDO and Surprise's SVD on the MovieLens 1 M data set. We tune the parameters on the other data sets in a similar way. We conduct multiple experiments on this data set to determine the values of four variables: the learning rate, the dimension of the latent factor vectors, the number of incremental training iterations, and the value of $N_{\textit{inc}}$ in the incremental learning phase. The following four figures describe the comparative experiments for these four variables. Figure 5a shows the performance of OSGDO and traditional SVD for different numbers of iterations. Figure 5b shows the trend of OSGDO and traditional SVD under different learning rates. Figure 5c shows the prediction accuracy of OSGDO and traditional SVD under different dimensions of the latent factor vectors. Figure 5d shows how the RMSE of OSGDO and traditional SVD changes as $N_{\textit{inc}}$ changes.

Figure 5.

Comparing OSGDO, All training SVD and Normal SVD on MovieLens 1 M. The RMSE is shown on the y-axis.

In these figures, All training SVD means training with all data, including $D_{\textit{off}}$ and $D_{\textit{inc}}$ . In contrast, Normal SVD means training only with $D_{\textit{off}}$ . Initially, we set the dimension of the latent factor vectors to 40 and the learning rate to 0.005, and we set the L2-regularization strength parameter to 0.02.

Figure 5a shows the effect of the number of iterations on the three methods, where the abscissa of OSGDO corresponds to the lower X axis and the abscissa of the other two methods corresponds to the upper X axis. We use two axes because OSGDO is an incremental learning algorithm built on an offline SVD model that has already been trained for 20 iterations (the offline SVD performs best when the number of iterations is about 20). In this figure, the accuracy of OSGDO is close to that of the two offline training algorithms. OSGDO performs better than Normal SVD and works best when the number of online iterations is 2. It is slightly worse than All training SVD, because All training SVD rebuilds the model using both $D_{\textit{off}}$ and $D_{\textit{inc}}$ and can therefore achieve higher accuracy.

In Fig. 5b, we fix the number of iterations and gradually increase the learning rate from 0.001 to 0.005. All training SVD and Normal SVD are very sensitive to the learning rate, and OSGDO performs better than these two offline training methods in most cases.

In Fig. 5c, we fix the number of iterations and the learning rate. The RMSE improves when we increase the dimension of the latent factor vectors, and all three methods are affected by this dimension. In most cases, OSGDO is better than the other two methods as the dimension changes.

In Fig. 5d, we can see that the RMSE improves when $N_{\textit{inc}}$ increases. However, some users or items have few ratings at the beginning, so we set $N_{\textit{inc}}$ to 10 in the experiments of Fig. 5a–c.

From Fig. 5, we can see that our method performs similarly to batch-training methods, and in some cases it even slightly outperforms them. One reason is that our model uses the offline knowledge from the offline training phase. Another possible reason is that each rating in the incremental data set is trained on only a few times in OSGDO, rather than dozens of times together with the whole data in SVD; the latter may cause users' or items' latent factor vectors to over-fit and make them more likely to be trapped in local optima. These results indicate that our framework RIRF and online learning algorithm OSGDO are effective.

5. Conclusions and future work

Online learning, as one of the most effective ways to solve the incremental recommendation problem, is becoming increasingly popular in recommendation systems. Unlike existing work, which is concerned with high accuracy on static data, we focus on recommendation over incremental/streaming data. In this paper, we propose a novel framework named RIRF for stream incremental recommendation and a novel online learning algorithm named OSGDO for incremental recommendation. Both theoretical analysis and experimental results show that our method achieves an RMSE very close to that of a traditional MF model trained with SVD, but with much lower learning time. Moreover, our framework is flexible: different offline and online algorithms can be combined according to the situation. It can also be parallelised through its implementation on Apache Flink.

For future work, we may explore the online recommendation framework with different types of offline batch-training methods and online learning algorithms. We will also parallelise our online learning method OSGDO in Apache Flink in order to achieve better execution efficiency and handle more complex situations. Furthermore, we will investigate possible solutions to the "cold start" problem in online incremental recommendation systems.

Acknowledgments

The work is supported by the National Natural Science Foundation of China (Grant Nos. 61572176, L1624040, 61873090) and the National Key Research and Development Program of China (2017YFB0202201).

References

1. Abernethy, Canini, Langford and Simma, Online collaborative filtering, University of California at Berkeley, Tech. Rep, 2007.
2. Agarwal, Chen B.-C. and Elango, Fast online learning through offline initialization for time-sensitive recommendation, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2010, pp. 703–712.
3. Amatriain and Basilico, Netflix recommendations: Beyond the 5 stars (part 1), Netflix Tech Blog, 2012, 6.
4. Bennett, Lanning et al., The netflix prize, in: Proceedings of KDD Cup and Workshop, New York, NY, USA, 2007, 2007, p. 35.
5. Breese J.S., Heckerman and Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1998, pp. 43–52.
6. Carbone, Katsifodimos, Ewen, Markl, Haridi and Tzoumas, Apache flink: stream and batch processing in a single engine, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36(4) (2015).
7. Ciobanu and Lommatzsch, Development of a news recommender system based on apache flink, in: CLEF (Working Notes), 2016, pp. 606–617.
8. Hadoop, Apache hadoop, URL http://hadoop.apache.org, 2011.
9. Harper F.M. and Konstan J.A., The movielens datasets: history and context, ACM Transactions on Interactive Intelligent Systems (TiiS) 5(4) (2016), 19.
10. Herlocker J.L., Konstan J.A., Terveen L.G. and Riedl J.T., Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems (TOIS) 22(1) (2004), 5–53.
11. Huang, Cui, Zhang, Jiang and Xu, Tencentrec: Real-time stream recommendation in practice, in: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ACM, 2015, pp. 227–238.
12. Hug, Surprise, a python library for recommender systems, 2017.
13. Iqbal M.H. and Soomro T.R., Big data analysis: apache storm perspective, International Journal of Computer Trends and Technology 19(1) (2015), 9–14.
14. Koren and Bell, Advances in collaborative filtering, in: Recommender Systems Handbook, Springer, 2015, pp. 77–118.
15. Koren, Bell and Volinsky, Matrix factorization techniques for recommender systems, Computer 42(8) (2009).
16. Lee D.D. and Seung H.S., Algorithms for non-negative matrix factorization, in: Advances in Neural Information Processing Systems, 2001, pp. 556–562.
17. Linden, Smith and York, Amazon.com recommendations: item-to-item collaborative filtering, IEEE Internet Computing 7(1) (2003), 76–80.
18. Ling, Yang, King and Lyu M.R., Online learning for collaborative filtering, in: Neural Networks (IJCNN), The 2012 International Joint Conference on, IEEE, 2012, pp. 1–8.
19. Mao, Wang and Zhang, Recommender system application developments: a survey, Decision Support Systems 74 (2015), 12–32.
20. Luo, Xia and Zhu, Incremental collaborative filtering recommender based on regularized matrix factorization, Knowledge-Based Systems 27 (2012), 271–280.
21. Luo, Zhou, Leung, Xia, Zhu, You and Li, An incremental-and-static-combined scheme for matrix-factorization-based collaborative filtering, IEEE Transactions on Automation Science and Engineering 13(1) (2016), 333–343.
22. Mangalindan, Amazon's recommendation secret, CNN Money http://tech.fortune.cnn.com/2012/07/30/amazon-5, 2012.
23. Meng, Dou, Zhang and Chen, Kasr: a keyword-aware service recommendation method on mapreduce for big data applications, IEEE Transactions on Parallel and Distributed Systems 25(12) (2014), 3221–3231.
24. Mnih and Salakhutdinov R.R., Probabilistic matrix factorization, in: Advances in Neural Information Processing Systems, 2008, pp. 1257–1264.
25. Paterek, Improving regularized singular value decomposition for collaborative filtering, in: Proceedings of KDD Cup and Workshop, 2007, 2007, pp. 5–8.
26. Rendle and Schmidt-Thieme, Online-updating regularized kernel matrix factorization models for large-scale recommender systems, in: Proceedings of the 2008 ACM Conference on Recommender Systems, ACM, 2008, pp. 251–258.
27. Sarwar, Karypis, Konstan and Riedl, Item-based collaborative filtering recommendation algorithms, in: Proceedings of the 10th International Conference on World Wide Web, ACM, 2001, pp. 285–295.
28. Spark, Apache spark: Lightning-fast cluster computing, URL http://spark.apache.org, 2016.
29. and Khoshgoftaar T.M., A survey of collaborative filtering techniques, Advances in Artificial Intelligence 2009 (2009), 4.
30. Verma J.P., Patel and Patel, Big data analysis: recommendation system with hadoop framework, in: Computational Intelligence & Communication Technology (CICT), 2015 IEEE International Conference on, IEEE, 2015, pp. 92–97.
31. Vinagre, Jorge A.M. and Gama, Fast incremental matrix factorization for recommendation with positive-only feedback, in: International Conference on User Modeling, Adaptation, and Personalization, Springer, 2014, pp. 459–470.
32. Wang, Hoi S.C., Zhao and Liu Z.-Y., Online multi-task collaborative filtering for on-the-fly recommender systems, in: Proceedings of the 7th ACM Conference on Recommender Systems, ACM, 2013, pp. 237–244.
33. Wang, Zhuang, Chen and Zhou, A fast and better hybrid recommender system based on spark, in: IFIP International Conference on Network and Parallel Computing, Springer, 2016, pp. 147–159.
34. Xiao and Benbasat, E-commerce product recommendation agents: use, characteristics, and impact, MIS Quarterly 31(1) (2007), 137–209.
35. Yang, Zhang and Wang, Scalable collaborative filtering using incremental update and local link prediction, in: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, ACM, 2012, pp. 2371–2374.
36. Yu, Mengshoel O.J., Jude, Feller, Forgeat and Radia, Incremental learning for matrix factorization in recommender systems, in: Big Data (Big Data), 2016 IEEE International Conference on, IEEE, 2016, pp. 1056–1063.
37. Zhao Z.-D. and Shang M.-S., User-based collaborative-filtering recommendation algorithms on hadoop, in: Knowledge Discovery and Data Mining, 2010. WKDD'10. Third International Conference on, IEEE, 2010, pp. 478–481.