Abstract
To address the problem that existing time-aware methods tend to recommend out-of-fashion items to users, this paper proposes a novel method named Item Life Cycle based Collaborative Filtering (ItemLC-CF), which takes both the user’s preference and the item’s popularity into consideration. The method introduces a life cycle function to simulate the variation of an item’s popularity and integrates this variation with CF. It learns the relative preference for items by re-ranking the candidate set of CF according to each item’s vitality. Because constructing a life cycle function for every item is extremely complicated, items are clustered into categories according to their popularity rather than their original product categories. Meanwhile, an SVM ensemble classifier based on AdaBoost is employed to assign newly arrived items to an appropriate category. Evaluation on the MovieLens-1M dataset demonstrates that the proposed method improves the quality of the recommendation lists and provides more reasonable and popular recommendations for users.
Introduction
Recommender systems (RS) have become an important feature of modern websites, for example movie recommendation (MovieLens, Netflix, Youku), product recommendation (Amazon), and music recommendation (Last.fm, NetEase Music). Real-world recommender systems can be roughly grouped into eight main categories: e-government, e-business, e-commerce/e-shopping, e-library, e-learning, e-tourism, e-resource services and e-group activities [1]. The main recommendation techniques applied in these systems [1] include traditional methods such as collaborative filtering-based, content-based, knowledge-based, and hybrid methods, as well as advanced methods such as fuzzy set-based [2], social network-based, trust-based, context awareness-based, and group recommendation approaches. Among these methods, collaborative filtering (CF) [3] has been widely applied because of its high accuracy, good scalability, and ability to operate without content analysis. Although there are various recommendation and display methods for CF-based systems, the general idea is the same: users who shared common interests in the past will still prefer similar items in the future [4].
However, CF-based recommender systems often suffer from several shortcomings [5], such as cold start and interest shift. Traditional CF methods assume that a user’s preference for a specific item, and an item’s popularity (the public interest in the item), do not change with time. In reality, a user’s preference for an item fluctuates over time, and an item’s popularity constantly changes as new selections emerge, which raises unique challenges. Existing time-aware models have largely focused on predicting how a user’s preference for items changes over time. Xiang et al. [6] modeled users’ long-term and short-term preferences on a session-based temporal graph (STG) by using user nodes and session nodes, respectively. To integrate long- and short-term preferences adequately, Zhang et al. [7] constructed a Long- and Short-Term Graph, a graph model modified from the session-based temporal graph. Koren [8] took both overall and segmented time into account and put forward the timeSVD++ model to learn users’ preferences for items. These methods consider how the preference of a single user shifts over time and improve the precision of classic recommenders, but they do not take the popularity of the recommended items into account. Consequently, they may recommend outdated items to users [9]. For instance, a newly released movie attracts attention and gains high popularity, but because movie popularity is transitory, most movies are soon forgotten after release. The popularity of a movie may also be affected by periodic events. For example, classic Christmas movies spike in popularity during the holiday season. Traditional recommendation algorithms do not consider this effect, so they may recommend Christmas movies to users at any time of year [10].
Particularly when the recommendation scores that traditional algorithms compute for several items are close, users are likely to be attracted by the more fashionable item [11]. In fact, item popularity is an important factor affecting recommendation results [12]. Recently, several works have been proposed to provide recommendations based on item popularity. Liu et al. [13] proposed the concept of IDOI, which accounts for the decline in a torrent’s importance after it has been downloaded, to evaluate the popularity of torrents. Pera and Ng [12] assumed that the likelihood of users’ bookmarks is positively correlated with the popularity of the bookmarked movies, so they introduced the GroupReM recommendation model, which takes the total number of users who have bookmarked a movie as its popularity. Zou [14] proposed a popularity-based recommendation algorithm for Twitter, suggesting that the popularity of tweets can be measured by the authority of users, but ignoring the fact that the popularity of tweets changes over time. Javari and Jalili [15] put forward an algorithm that combines the standard definition of accuracy with an effective self-information-based measure of the popularity of the recommendation list. These approaches model item popularity from different standpoints, but none of them considers how an item’s popularity fluctuates over time.
Since traditional recommendation methods sometimes recommend outdated items to users, this paper introduces the concept of item life cycle, derived from product life cycle theory. Developed by Raymond Vernon in 1966, product life cycle theory describes the popularity of a product [16]. The product life cycle is defined as the period starting with the initial product design (research and development) and ending with the withdrawal of the product from the marketplace. It is characterized by specific stages: introduction, growth, maturity and decline. The typical life cycle process is shown in Fig. 1(a). There are also special categories of product life cycles, such as styles, fashions, fads and scallop (Fig. 1(c)). Similarly, the public interest in items shows different patterns (Fig. 1(b)).

(a) A typical process of product life cycle; (b) Item vitality over time in MovieLens; (c) Four special categories of product life cycle: styles, fashions, fads and scallop.
As a tactical and empirical model, the product life cycle is often used to guide product planning and market analysis [17]. Building on product life cycle theory, more and more scholars have applied life cycle characteristics to popularity forecasting. Althoff et al. [18] proposed a life cycle-based forecasting model to predict the popularity of online topics. Castillo et al. [19] used the product life cycle to model the future visitation patterns of online news stories with good results. Seol et al. [20] put forward an approach based on product life cycle theory for forecasting the popularity of new services such as Internet Protocol TV (IPTV). Traditional product life cycle theory is usually used to model the popularity of a single product. Basallotriana et al. [21] proposed a demand forecasting model for short life cycle products in which the life cycle function is constructed by weighted linear regression. Kim et al. [22] proposed a set of installed base concepts to describe the life cycle of consumer electronics and linked them with demand forecasting methodologies. Because the number of items is huge, constructing a life cycle function for each item is time-consuming, and such a function is difficult to construct for each new item. Therefore, we establish a unified life cycle function for each class of items. In real-world recommender systems, items are classified according to their utilitarian functions; as a result, items in the same category may show different life cycle patterns, while items in different categories may share the same pattern. This happens because the life cycles of items in the same functional category are affected by a combination of factors such as price, promotion and quality. Therefore, to account for the impact of multiple factors, our method first clusters the items and then constructs a life cycle function for each resulting cluster.
Based on this, an Item Life Cycle based Collaborative Filtering (ItemLC-CF) method is proposed. It is built on K-Nearest Neighbor Collaborative Filtering (KNNCF) [23], specifically on its two popular variants, User-based CF [23] and Item-based CF [24]. The recommendation model ranks the candidate items by taking both the item life cycle and CF into consideration. The main contributions of this paper are as follows. (1) To address the interest shift problem from the item’s perspective, the item life cycle, inspired by product life cycle theory in economics, is proposed to quantitatively describe the complex changes of an item’s popularity over time. (2) Items are clustered into categories according to their vitality, and a life cycle function is constructed for each category, which makes it feasible to obtain life cycle functions for a large number of items. (3) To describe the life cycle of newly arrived items, an item classifier is constructed to assign new items to the corresponding categories. (4) By extending the KNNCF algorithms, ItemLC-CF is proposed to re-rank the CF candidate set by the vitality values of the candidate items. Experiments on a real-world dataset show that the proposed life cycle method is well suited to describing item popularity and considerably improves the performance of CF.
The rest of this article is organized as follows. Sect. 2 describes the main idea and key techniques of the proposed recommendation method in detail. In Sect. 3, experiments are conducted to compare our method with other methods. Finally, conclusions and future research directions are presented in Sect. 4.
To better describe an item’s popularity, we redefine the item life cycle on the basis of the traditional product life cycle [16], as stated in Definition 1.

Flow chart of ItemLC-CF.
The general item life cycle can be described as four development stages of an item from birth to demise: introduction, growth, maturity and decline. In the introduction stage, an item typically achieves only a low volume of ratings and slow rating growth, so the slope of the rating curve remains relatively flat. If the item is attractive, the introduction stage gives way to the growth stage, in which the upward slope of the rating curve increases. Once the item has become popular and has been rated by most of the users interested in it, the cycle moves into the maturity stage, during which the rating curve flattens again. Finally, the decline stage arrives when the item is no longer attractive and the rating curve slopes downward. As with product life cycles, item life cycles follow many different patterns. To capture these patterns, support vector regression (SVR) is applied to construct life cycle functions that simulate how item vitality varies over time.
In this section, we describe the proposed recommendation algorithm based on item life cycle in detail. The algorithm is illustrated in Fig. 2 and consists of two procedures. (1) Item life cycle evaluation stage. This stage has two parts: construction of the item life cycle functions and construction of the item classifier. The former clusters items according to their vitality and constructs a life cycle function by SVR for each resulting category. The latter obtains the life cycle function of newly arrived items: it first trains a classifier for new items using an SVM ensemble based on AdaBoost, and the corresponding life cycle function is then retrieved from the collection of life cycle functions. (2) Recommendation stage. This stage produces the final recommended items with the proposed ItemLC-CF. First, KNNCF is used to obtain the top-n candidate recommendation set; then the score of each candidate item is multiplied by its vitality to obtain the final recommendation results. More details are presented in the following subsections.
The item life cycle is affected not only by the item’s function but also by many other factors, such as its release time and production information. It is unreasonable to construct item life cycle functions from an existing classification based only on utilitarian functions, because items in different categories may then have similar life cycle patterns while items in the same category show different ones. Therefore, our method constructs item life cycle functions based on item vitality rather than on the items’ specific categories and contents.
There are many ways to calculate an item’s vitality from different perspectives. In this paper, the number of people who rate an item during a given time unit is used as the item’s vitality. A clustering algorithm is then used to group items into categories according to their vitality. However, the length of the life cycle differs from item to item because items have various characteristics, which makes it difficult to construct item vectors of the same dimension for clustering. Since the purpose of our research is to recommend fashionable items to users, we are concerned with an item’s vitality during the introduction, growth and maturity stages and the early part of the decline stage. Fortunately, the items studied in this paper (e.g., movies) share the same practical function, so, with a few exceptions, we can fix a common length $T$ that covers the introduction, growth, maturity and early decline stages as the life cycle length of every item. To measure an item’s vitality in different time units of the life span $T$, we first divide $T$ into $n$ equal segments, $T = (t_1, t_2, \ldots, t_n)$, in chronological order. The vitality of item $i$ is then represented as a vector $P_i = (p_{i1}, p_{i2}, \ldots, p_{in})$, where $p_{ij}$ is the vitality of item $i$ in the $j$-th time unit $t_j$, calculated by Equation (1).
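Equation (1) is not reproduced in this text. As a minimal sketch, assuming the vitality $p_{ij}$ is simply the number of ratings item $i$ receives in time unit $t_j$ (in MovieLens each user rates a given movie at most once, so this equals the number of raters), the vectors $P_i$ can be built from (item, day) pairs as follows. The function name `vitality_vectors` and its input format are hypothetical; the defaults mirror the 196-day life span and 7-day unit reported in the experiments.

```python
import numpy as np

def vitality_vectors(ratings, life_span_days=196, unit_days=7):
    """Return {item: P_i}, where P_i[j] counts the ratings item i got in time unit t_j.

    ratings: iterable of (item_id, day) pairs, `day` being the rating time in days.
    The life span T starts at the item's first rating and is split into
    n = life_span_days // unit_days units; ratings after those n units are ignored.
    """
    n_units = life_span_days // unit_days
    days_by_item = {}
    for item, day in ratings:
        days_by_item.setdefault(item, []).append(day)
    vectors = {}
    for item, days in days_by_item.items():
        offset = np.asarray(days, dtype=float) - min(days)   # days since first rating
        offset = offset[offset < n_units * unit_days]        # keep ratings inside T
        units = (offset // unit_days).astype(int)             # index j of time unit t_j
        vectors[item] = np.bincount(units, minlength=n_units)
    return vectors
```

With the defaults, every item gets a 28-dimensional vector, so all items can be clustered in the same space regardless of how long they actually stayed active.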
As it is extremely complicated to construct a life cycle function for each item, K-means clustering is used in this paper to group the items into categories. Given the set of item vitality vectors $A = \{P_1, P_2, \ldots, P_m\}$, K-means finds $K$ cluster centers $\{c_1, c_2, \ldots, c_K\}$ that minimize the sum of squared distances between each data point and its nearest cluster center, where $c_i$ is the vitality vector of cluster center $i$. This sum of squared distances is given in Equation (2).
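Equation (2) is not reproduced in this text. Written in the notation above, the standard K-means objective that the description matches is the following, where $C_i$ denotes the set of vitality vectors assigned to center $c_i$ and each center is the mean of its members; this is the usual textbook form, offered as a reconstruction rather than the paper’s exact equation.

$$
E = \sum_{j=1}^{m} \min_{1 \le i \le K} \lVert P_j - c_i \rVert^2,
\qquad
c_i = \frac{1}{|C_i|} \sum_{P_j \in C_i} P_j
$$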
Since K-means requires the value of $K$ to be set manually, the Dunn index [25], which favors compact and well-separated clusters, is used to evaluate the clustering results and select the best one. The Dunn index is calculated by Equation (3).
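Equation (3) is likewise missing here. The sketch below assumes the classic Dunn index (the minimum distance between points of different clusters divided by the maximum cluster diameter, both Euclidean) and uses scikit-learn’s `KMeans` to try several values of $K$ and keep the one with the highest index. The helper names are hypothetical, and the paper’s exact variant of the index may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def dunn_index(X, labels):
    """Min distance between points of different clusters / max cluster diameter (Euclidean)."""
    clusters = [X[labels == k] for k in np.unique(labels)]

    def dists(A, B):  # all pairwise Euclidean distances between rows of A and B
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

    max_diameter = max(dists(C, C).max() for C in clusters)
    min_separation = min(dists(clusters[a], clusters[b]).min()
                         for a in range(len(clusters))
                         for b in range(a + 1, len(clusters)))
    return min_separation / max_diameter

def choose_k(X, k_values, seed=0):
    """Run K-means for each K (>= 2) and keep the K with the highest Dunn index."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = dunn_index(X, labels)
    return max(scores, key=scores.get), scores
```

Here `X` would be the stacked vitality vectors $P_i$; the pairwise-distance computation is quadratic in the cluster sizes, which is acceptable for a few thousand items.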
After the best clustering $C$ is obtained, support vector regression (SVR) is used to construct the life cycle function for each category. First, each cluster center is selected as the representative of its category, and the vitality series of cluster center $i$, $P_{c_i} = \{(t_1, c_{i1}), (t_2, c_{i2}), \ldots, (t_n, c_{in})\}$, is calculated from the vitalities of the items belonging to that class, as given in Equation (4).
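A hedged sketch of this step follows. It assumes that Equation (4) takes the cluster center as the element-wise mean of its members’ vitality vectors (which is what K-means uses as a centroid) and that scikit-learn’s `SVR` with an RBF kernel stands in for the paper’s SVR. The kernel hyperparameters and the rescaling of the targets to [0, 1] are illustrative choices, not values from the paper, and `fit_life_cycle` is a hypothetical name.

```python
import numpy as np
from sklearn.svm import SVR

def fit_life_cycle(member_vitalities, C=10.0, gamma=0.1, epsilon=0.05):
    """Fit one category's life cycle function f(j), the vitality in time unit t_j.

    member_vitalities: (items_in_cluster, n) array of the members' P_i vectors.
    """
    center = np.asarray(member_vitalities, dtype=float).mean(axis=0)  # cluster center c_i
    scale = center.max() if center.max() > 0 else 1.0                # fit on [0, 1] targets
    t = np.arange(1, len(center) + 1, dtype=float).reshape(-1, 1)     # time-unit index j
    model = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=epsilon).fit(t, center / scale)

    def life_cycle(j):
        """Predicted vitality at time-unit index j (scalar or array-like)."""
        return model.predict(np.asarray(j, dtype=float).reshape(-1, 1)) * scale

    return life_cycle
```

Rescaling keeps `epsilon` and `C` meaningful whether a cluster’s raw vitality peaks at 5 ratings per unit or at 500.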
When a new item arrives, its life cycle function cannot be constructed because of the lack of data (e.g., the new item’s vitality cannot be calculated because the number of people who have rated it is not yet available). To overcome this obstacle, we match the new item to a category with similar popularity and adopt the corresponding item life cycle function. For this classification task, we construct an SVM ensemble classifier based on AdaBoost [26], which takes the item’s basic information as input (the input is detailed in subsection 3.1). The item classification based on AdaBoost-SVM is illustrated in Fig. 3.

Schematic illustration of item classification based on Adaboost-SVM.
Let $D = \{\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle, \ldots, \langle x_n, y_n \rangle\}$ be a set of class-labeled tuples, where $x_t = (x_{t1}, x_{t2}, \ldots, x_{tm})$ is the input feature vector of the $t$-th instance and $y_t \in \{1, 2, \ldots, K\}$ is its class label. Initially, AdaBoost assigns an equal weight of $1/n$ to each training tuple. Because the main idea of AdaBoost is to build an accurate classifier by combining many weak classifiers [26], $T$ Support Vector Machine (SVM) classifiers are generated for the ensemble, so the algorithm runs for $T$ rounds. In the $i$-th round, tuples are sampled from $D$ with replacement to form a training set $D_i$, where each tuple’s chance of being selected depends on its weight. After an SVM classifier $C_i$ is constructed from $D_i$, its error $e_i$ is calculated using $D_i$ as a test set, as given in Equation (5). The weights of the training tuples are then adjusted by Equation (6).
where $w_{ij}$ denotes the weight of tuple $j$ in the $i$-th round.
where $Z_i$ is a normalization factor. Once boosting is complete, each SVM classifier $C_i$ is assigned a weight $a_i$, and the final classifier $F$ is obtained by Equation (7).
The above process of constructing the classifier is described in Algorithm 2.
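Algorithm 2 is not reproduced in this text. The following sketch implements the boosting loop described above from scratch in the AdaBoost.M1 style: weighted resampling to form $D_i$, the weighted error $e_i$ measured on $D_i$, shrinking the weights of correctly classified tuples by $e_i/(1-e_i)$ before normalization by $Z_i$, and the classifier weight $a_i = \log((1-e_i)/e_i)$. These concrete forms of Equations (5) to (7) are assumed from that textbook variant. The SVM parameters reuse the values given in the experimental setup, and the function names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def adaboost_svm_fit(X, y, rounds=10, C=2.0, gamma=0.25, seed=0):
    """Return a list of (a_i, C_i) pairs built by weighted resampling."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)                               # equal initial weights 1/n
    ensemble = []
    for _ in range(rounds):
        idx = rng.choice(n, size=n, replace=True, p=w)    # draw D_i according to the weights
        if len(np.unique(y[idx])) < 2:                    # SVC needs two or more classes
            continue
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X[idx], y[idx])
        miss = clf.predict(X[idx]) != y[idx]
        e = w[idx][miss].sum() / w[idx].sum()             # weighted error e_i on D_i
        if e >= 0.5:                                      # too weak: reset weights, skip round
            w = np.full(n, 1.0 / n)
            continue
        if e == 0:                                        # perfect fit on D_i: keep it, stop boosting
            ensemble.append((1.0, clf))
            break
        correct = np.unique(idx[~miss])                   # correctly classified tuples of D_i
        w[correct] *= e / (1.0 - e)
        w /= w.sum()                                      # divide by the normalization factor Z_i
        ensemble.append((np.log((1.0 - e) / e), clf))     # classifier weight a_i
    return ensemble

def adaboost_svm_predict(ensemble, X, classes):
    """F(x): the class with the largest sum of a_i over classifiers voting for it."""
    classes = np.asarray(classes)
    votes = np.zeros((len(X), len(classes)))
    for a, clf in ensemble:
        pred = clf.predict(X)
        for k, c in enumerate(classes):
            votes[pred == c, k] += a
    return classes[votes.argmax(axis=1)]
```

The early stop on a zero-error round avoids an infinite classifier weight; other implementations handle that case differently.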
To recommend popular items that also match the user’s preference, the item life cycle function is integrated with the CF algorithm: the candidate set of CF is re-ranked according to item vitality to learn the relative preference for items. The proposed ItemLC-CF method builds on two popular CF algorithms, User-based CF and Item-based CF, so ItemLC-CF has two implementations: User-based ItemLC-CF and Item-based ItemLC-CF.
User-based ItemLC-CF assumes that a user prefers items liked by other users with similar preferences. In this method, the similarities between users are calculated from their rating records, and the K nearest neighbors of a user are found by sorting these similarities. Finally, items that have been rated by the user’s neighbors but not by the user are voted on, and the recommendation list is generated from the votes. The similarity is calculated with the cosine similarity in Equation (8), and the degree to which user $u$ is interested in item $I_j$ is calculated by Equation (9).
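Since Equations (8) and (9) are not reproduced here, the sketch below assumes the common formulation: cosine similarity between users’ rating vectors, and an interest score $p(u, j) = \sum_{v \in N_K(u)} \mathrm{sim}(u, v)\, r_{vj}$ over the $K$ nearest neighbors, computed only for items $u$ has not rated. The dense matrix `R` (0 meaning “not rated”) and the function name are illustrative assumptions.

```python
import numpy as np

def user_based_scores(R, u, k=2):
    """Interest of user u in every item; items u already rated get -inf.

    R: users x items rating matrix with 0 for "not rated"; requires k < number of users.
    """
    R = np.asarray(R, dtype=float)
    norms = np.linalg.norm(R, axis=1)
    sims = (R @ R[u]) / (norms * norms[u] + 1e-12)   # cosine similarity sim(u, v)
    sims[u] = -np.inf                                # a user is not their own neighbor
    neighbors = np.argsort(sims)[::-1][:k]           # N_k(u): the k most similar users
    scores = sims[neighbors] @ R[neighbors]          # sum_v sim(u, v) * r_vj
    scores[R[u] > 0] = -np.inf                       # recommend only unrated items
    return scores
```

The top-n entries of the returned vector form the candidate set that ItemLC-CF later re-ranks by vitality.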
Item-based ItemLC-CF recommends to a user the items similar to those the user liked before. Accordingly, Item-based CF consists of two steps: (1) calculating the similarity between items; (2) generating the recommendation list for the user according to the item similarities and the user’s historical behavior. The similarity between items is calculated by Equation (11). Once the similarities are obtained, ItemLC-CF calculates the degree to which user $u$ is interested in item $I_j$, as given in Equation (12).
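Equations (11) and (12) are also not reproduced here. A sketch under the common assumptions: cosine similarity between item columns, and $p(u, j) = \sum_{i \in N(u) \cap S(j, K)} \mathrm{sim}(j, i)\, r_{ui}$, where $N(u)$ is the set of items $u$ rated and $S(j, K)$ holds the $K$ items most similar to $j$. As before, the dense rating matrix and the function name are illustrative.

```python
import numpy as np

def item_based_scores(R, u, k=2):
    """Interest of user u in every item via item-based CF; items u already rated get -inf.

    R: users x items rating matrix with 0 for "not rated".
    """
    R = np.asarray(R, dtype=float)
    norms = np.linalg.norm(R, axis=0)
    S = (R.T @ R) / (np.outer(norms, norms) + 1e-12)   # item-item cosine similarity
    np.fill_diagonal(S, 0.0)                           # an item is not its own neighbor
    scores = np.empty(R.shape[1])
    for j in range(R.shape[1]):
        neighbors = np.argsort(S[j])[::-1][:k]         # S(j, k): the k items most similar to j
        scores[j] = S[j, neighbors] @ R[u, neighbors]  # only items u rated (r_ui > 0) contribute
    scores[R[u] > 0] = -np.inf                         # recommend only unrated items
    return scores
```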
In summary, ItemLC-CF first clusters items into categories according to their popularity and constructs a life cycle function for each category. Then, an SVM ensemble classifier based on AdaBoost is trained to classify newly arrived items. Next, User-based CF or Item-based CF is used to obtain the candidate recommendation set. Finally, the vitality of each candidate item is calculated by its life cycle function to re-rank the candidate set. The whole process of ItemLC-CF is described in Algorithm 3.
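Algorithm 3 is not reproduced in this text. The final re-ranking step can be sketched as follows, assuming that an item’s current vitality is its category’s life cycle function evaluated at the time unit the item has currently reached, and that each function returns a plain number (an array-returning model such as a fitted SVR would need to be wrapped). The final score is the CF score multiplied by this vitality, as described in the overview of the method. All names are hypothetical.

```python
def rerank_by_vitality(candidates, life_cycle, age_unit, top_n=10):
    """Re-rank a CF candidate set by CF score x vitality and keep the top_n items.

    candidates: {item: CF score} for the top-n candidate set from KNNCF
    life_cycle: {item: f}, f(j) -> vitality of the item's category at time-unit index j
    age_unit:   {item: j}, the time unit of its life span the item is currently in
    """
    final = {item: score * life_cycle[item](age_unit[item])
             for item, score in candidates.items()}
    return sorted(final, key=final.get, reverse=True)[:top_n]
```

Because vitality multiplies the CF score, it changes the order mainly among candidates whose CF scores are close, which matches the motivation given in the introduction.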
Experimental dataset and setup
Our experiments are conducted on the MovieLens-1M dataset, which contains 1,000,209 rating records provided by 6,040 users for 3,952 movies, with a sparsity of 99.22%. The basic information in MovieLens-1M includes each film’s name, release time (e.g., 01-Jan-1995) and genres (19 kinds). During data preprocessing, the film name is discarded, and the release time of a film is converted into the number of days elapsed since the earliest release time. Film genres are encoded as a binary vector with one position per genre (a film can belong to several genres), e.g., 0001110000000000000.
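The preprocessing above can be sketched as follows. This is an illustration under stated assumptions: release dates are strings in the `01-Jan-1995` format shown in the text, and the caller supplies the ordered genre list (the sketch does not hard-code the dataset’s genres). The function name is hypothetical.

```python
from datetime import datetime

def movie_features(release_dates, genre_lists, all_genres):
    """One feature row per movie: [days since the earliest release date] + genre bits.

    release_dates: strings such as "01-Jan-1995"; genre_lists: genre names per movie.
    The movie title is not used.
    """
    dates = [datetime.strptime(d, "%d-%b-%Y") for d in release_dates]
    earliest = min(dates)
    position = {g: k for k, g in enumerate(all_genres)}
    rows = []
    for released, genres in zip(dates, genre_lists):
        bits = [0] * len(all_genres)
        for g in genres:
            bits[position[g]] = 1              # several genres may be set at once
        rows.append([(released - earliest).days] + bits)
    return rows
```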
To measure the quality of recommendations, the rating data are partitioned into training and testing sets (70% and 30%) in chronological order. First, the items in the training set are clustered into categories. As for the life span length T, statistics on MovieLens-1M show that, for most items, the number of users rating the item fluctuates strongly during the first 196 days after its first rating; after 196 days this number is almost zero, with only sporadic ratings occurring intermittently. Therefore, we set T = 196 days from an item’s first rating as its life cycle length and apply SVR to simulate the item life cycle of each category over this period. Then, the basic information of the movies in the training set, including their genre tags, is used to train the classifiers for new items.
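A minimal sketch of the chronological 70/30 split, assuming each rating record is a `(user, item, rating, timestamp)` tuple as in MovieLens `ratings.dat`; the function name is hypothetical. Integer arithmetic on the cut point avoids floating-point rounding surprises.

```python
def chronological_split(ratings, train_percent=70):
    """Split (user, item, rating, timestamp) records by time: earliest train_percent% for training."""
    ordered = sorted(ratings, key=lambda r: r[3])   # oldest ratings first
    cut = len(ordered) * train_percent // 100
    return ordered[:cut], ordered[cut:]
```

Splitting by time rather than at random keeps every test rating later than every training rating, which is what a life-cycle-aware recommender has to face in practice.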
To obtain the most accurate classifier for new items, this paper compares five classifiers: SVM, Naive Bayes, Decision Tree, AdaBoost-SVM and AdaBoost-Decision Tree. The experimental setups are as follows. (1) LibSVM, developed by Chang and Lin [27], is used to implement the SVM algorithm with a Radial Basis Function (RBF) kernel; the penalty parameter C of the error term and the kernel coefficient are 2.0 and 0.25, respectively. (2) GaussianNB and DecisionTreeClassifier in the scikit-learn toolkit [28] are used to implement Naive Bayes and Decision Tree, respectively, with their default parameters. (3) AdaBoost-SVM and AdaBoost-Decision Tree are both constructed with the AdaBoostClassifier method in scikit-learn, using LibSVM and Decision Tree as the base algorithms, with the maximum number of estimators and the learning rate set to 200 and 0.01, respectively. All of the above methods are evaluated with 3-fold cross-validation. In addition, User-based CF and Item-based CF serve as baselines to demonstrate the effectiveness of User-based ItemLC-CF and Item-based ItemLC-CF.
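The single-classifier part of this comparison can be sketched with scikit-learn as below; `SVC` is scikit-learn’s wrapper around LIBSVM. The paper reports “precision” without stating how it is averaged over classes, so this sketch defaults to accuracy and exposes `scoring` as a parameter. The two AdaBoost variants are left out of the sketch, and the function name is hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def compare_classifiers(X, y, folds=3, scoring="accuracy", seed=0):
    """Mean k-fold cross-validated score for the three single classifiers."""
    models = {
        "SVM": SVC(kernel="rbf", C=2.0, gamma=0.25),   # RBF kernel, C = 2.0, gamma = 0.25
        "Naive Bayes": GaussianNB(),                   # default parameters
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
    }
    return {name: float(np.mean(cross_val_score(model, X, y, cv=folds, scoring=scoring)))
            for name, model in models.items()}
```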
Evaluation metrics
Several widely used evaluation metrics (precision, recall and coverage) are adopted to measure the quality of our recommendations. They are briefly described below:
Precision is the proportion of relevant items among all recommended items; higher precision indicates better performance. Precision is calculated by Equation (13).
Recall is the proportion of the relevant items in the testing set that are recommended; higher recall also indicates better performance. Recall is calculated by Equation (14).

The performance of K-means and User-based ItemLC-CF methods with different unit times and number of the clusters.

The life cycle function of two clusters.
Coverage is the proportion of all items in the dataset that appear in at least one recommendation list. Coverage is calculated by Equation (15).
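Equations (13) to (15) are not reproduced here. Assuming the usual aggregate forms that match the definitions above, $\mathrm{Precision} = \sum_u |R(u) \cap T(u)| / \sum_u |R(u)|$, $\mathrm{Recall} = \sum_u |R(u) \cap T(u)| / \sum_u |T(u)|$ and $\mathrm{Coverage} = |\bigcup_u R(u)| / |I|$, where $R(u)$ is the recommendation list of user $u$, $T(u)$ the user’s items in the testing set and $I$ the item set, a sketch is given below. The function name and input layout are hypothetical.

```python
def evaluate(recommendations, test_items, n_items):
    """Return (precision, recall, coverage) for top-N recommendation lists.

    recommendations: {user: list of recommended items R(u)}
    test_items:      {user: set of items T(u) the user rated in the testing set}
    n_items:         number of items in the dataset
    """
    hits = sum(len(set(recs) & test_items.get(u, set())) for u, recs in recommendations.items())
    n_recommended = sum(len(recs) for recs in recommendations.values())
    n_relevant = sum(len(test_items.get(u, set())) for u in recommendations)
    covered = set().union(*recommendations.values())
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    return precision, recall, len(covered) / n_items
```

For example, two users each shown two items, with one hit each, three relevant test items in total and six items in the catalogue, give precision 0.5, recall 2/3 and coverage 0.5.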
Several experiments are conducted on the MovieLens-1M dataset. The unit time, the number of clusters, the number of nearest neighbors (K) and the number of recommendations are four parameters that affect recommendation performance, and we compare the performance under different values of these parameters.
Figure 4(a) shows the Dunn index of the K-means algorithm with different unit times and numbers of clusters. When the unit time is 7, 14, 21 and 28 days, the average Dunn indices are 2.48%, 2.34%, 0.76% and 0.91%, respectively. With a unit time of 7 days and 2 clusters, the Dunn index reaches its highest value, which is 50% higher than that obtained with a unit time of 28 days. Figure 4(b) shows the precision of User-based ItemLC-CF with varying unit times and numbers of clusters. When the unit time is 7, 14, 21 and 28 days, the average precisions of User-based ItemLC-CF are 15.35%, 14.76%, 12.86% and 12.92%, respectively. With a unit time of 7 days and 2 clusters, the precision of User-based ItemLC-CF reaches its highest value, which is on average 4.34% higher than with a unit time of 14 days. As Fig. 4(a) and 4(b) show, both the Dunn index of K-means and the precision of User-based ItemLC-CF are maximized when the unit time is 7 days and there are 2 clusters.

The performance of different classifiers.

The performances of different algorithms with different K-neighbors.

The performances of different algorithms with different Top-N.
When the number of clusters is 2, the life cycle functions of the two clusters fitted by SVR are shown in Fig. 5(a) and 5(b), respectively. The vitality of the cluster in Fig. 5(a) is high at the beginning and then decreases gradually over time. In contrast, the vitality of the cluster in Fig. 5(b) is low at the beginning: it rises slightly at first, drops sharply after the peak, and finally stabilizes.
Figure 6 shows the performance of different classifiers. Among the single classifiers, the precisions of SVM, Naive Bayes and Decision Tree are 38.34%, 27.05% and 30.60%, respectively, so the SVM-based classifier is more precise than Naive Bayes and Decision Tree. With AdaBoost, the precisions of the AdaBoost-SVM and AdaBoost-Decision Tree classifiers are, in relative terms, 10.57% and 32.20% higher than those of the single SVM and the single Decision Tree, respectively. The precision of the AdaBoost-SVM classifier is 4.77% higher than that of the AdaBoost-Decision Tree classifier.
Figure 7 shows the performance of different algorithms with different numbers of neighbors K. The unit time, the number of clusters and the number of recommendations are set to 7 days, 2 and 10, respectively. In Fig. 7(a) and 7(b), when K is less than 70, Item-based ItemLC-CF obtains better precision and recall than Item-based CF: after considering the item life cycle, the precision and recall of Item-based CF increase by an average of 1.34% and 1.34%, respectively. Likewise, User-based ItemLC-CF obtains better precision and recall than User-based CF when K is less than 100; after considering the item life cycle, the precision and recall of User-based CF increase by an average of 5.87% and 5.50%, respectively. In Fig. 7(a), the precision of User-based ItemLC-CF peaks (16.67%) when K is 100. In Fig. 7(c), the coverage of all methods decreases as the number of neighbors increases. The coverage of Item-based ItemLC-CF is higher than that of Item-based CF, but the coverage of User-based ItemLC-CF is lower than that of User-based CF.
Figure 7 shows that the proposed ItemLC-CF methods (User-based ItemLC-CF and Item-based ItemLC-CF) achieve higher precision and recall than the traditional recommendation methods (User-based CF and Item-based CF). The reason is that the number of items with high vitality is limited, so the more neighbors are considered, the less impact vitality has on the recommendation results. When the number of nearest neighbors reaches a certain value, items that both meet the user’s preference and are popular are more likely to appear in the final recommendation list after the candidate set is re-ranked by vitality. This is the essential difference between our methods and other popularity-based recommendation methods.
Figure 8 shows the performance of different algorithms with different Top-N values on the MovieLens-1M dataset. The unit time and the number of clusters are set to 7 days and 2, respectively, and, following the above analysis, the number of nearest neighbors is set to 70 and 100 for the Item-based and User-based methods, respectively. Figure 8(a) and 8(b) show that precision decreases and recall increases as the number of recommendations grows. The precisions of User-based ItemLC-CF and Item-based ItemLC-CF are on average 3.17% and 1.91% higher than those of User-based CF and Item-based CF, respectively, and their recalls are on average 1.14% and 1.91% higher. Fig. 8(c) shows that the coverage of Item-based ItemLC-CF is higher than that of the traditional methods, exceeding that of Item-based CF by 3.35% on average.
Therefore, the experimental results demonstrate that the proposed methods provide high-quality recommendations and perform better than traditional methods.
Regarding the interest shift problem, existing time-aware methods usually recommend items from the user’s point of view (the user’s preference for a specific item) rather than from the item’s (the item’s vitality), and may therefore recommend out-of-fashion items to users. To solve the interest shift problem by taking both user preference and item popularity into account, this paper proposes ItemLC-CF, which builds on product life cycle theory and introduces the item life cycle to model item popularity. First, items are clustered into categories according to their vitality, and a life cycle function is constructed for each category using SVR. Second, an SVM ensemble classifier based on AdaBoost is constructed to classify newly arrived items into these categories, so that their vitality can be calculated with the corresponding life cycle functions. Finally, the vitality of each item, calculated by its life cycle function, is used to re-rank the candidate recommendations obtained by traditional KNN-based CF (User-based CF and Item-based CF). Comparisons between traditional KNN-based CF and the proposed ItemLC-CF (User-based ItemLC-CF and Item-based ItemLC-CF) show that ItemLC-CF outperforms the traditional CF methods in precision and recall, and that Item-based ItemLC-CF also achieves higher coverage than Item-based CF. The improvement in precision and recall indicates that a recommendation method that considers the item life cycle can select a more reasonable recommendation list: the rankings of some items that users are genuinely interested in rise once the item life cycle is taken into account. The improvement in coverage indicates that long-tail items are better covered after considering the life cycle.
There are several interesting avenues for improving the proposed methods in future research. One direction is to overcome a limitation of the MovieLens-1M dataset: the number of people rating an item does not accurately represent its vitality. More item information, such as online comments posted by users on websites, could be gathered to construct the item life cycle function more accurately. In addition, as shown in Figs. 7(c) and 8(c), coverage depends on the number of nearest neighbors and the number of recommended items, so these two parameters should be tuned appropriately in future work to achieve better coverage.
Acknowledgments
This work is supported by the National Key Research and Development Plan of China (Grant No.2016YFB1000600 and 2016YFB1000601).
