Abstract
Recommender Systems (RS) are expected to suggest accurate items to consumers. Cold start is one of the most important challenges for RSs. Recent hybrid RSs combine content-based filtering (ConF) and collaborative filtering (ColF). We introduce an ontological hybrid RS in which an ontology is employed in its ConF part, while its ColF part improves the ontology structure. In this paper, a new hybrid approach is proposed that combines demographic similarity and cosine similarity between users in order to solve the cold start problem of the new user type. In addition, a new approach is proposed that combines ontological similarity and cosine similarity between items in order to solve the cold start problem of the new item type. The main idea of the proposed method is to expand user and item profiles using different strategies, producing more informative profiles for users and items. The proposed method has been evaluated on a real dataset, and the experiments indicate that it outperforms state-of-the-art RS methods, especially under cold start conditions.
Introduction
Recommender Systems (RS), which predict user ratings for a set of items (books, movies, news, songs, etc.) [1, 2], are a subset of Information Filtering Systems (IFS). RSs help users find their favorite items among thousands of existing items. In fact, the success of recommender systems on commercial websites stems from personalizing recommendations to users [3]. Collaborative Filtering (ColF) is one of the most commonly used methods for implementing an RS [4]. The rating values a user assigns to the different items are called the user's Items' Rating Vector (IRV). The IRV is a vector whose size equals the number of items; any item not rated by the user has a Null value in its corresponding element. The users whose IRVs are most similar to the IRV of the target user, i.e. the user for whom recommendations are being generated, are known as the target user's neighbors. A ColF RS then offers the target user a list of the top N items among those rated by the neighbors. ColF RSs are divided into memory-based and model-based approaches [5]. Memory-based ColF RSs directly use the user-item matrix to calculate predictions and generate recommendations for the target user [6]. These methods use statistical measures such as Pearson's similarity [7] and cosine similarity [8] to calculate the similarity between users and to find the target user's neighbor set [9]. In a model-based ColF, the strategy is to use machine learning techniques, such as clustering [10], Bayesian classifiers [11] and genetic algorithms [12], to build a model from the training data in the user-item matrix. In the test phase, this trained model is used to predict user ratings.
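As an illustration of the memory-based approach, the following minimal Python sketch stores each user's IRV sparsely (unrated items are simply absent, playing the role of the Null entries) and computes Pearson's similarity over co-rated items. The dictionary layout and the function name are illustrative assumptions rather than part of the original method, and centering on the co-rated means is only one of several conventions in use.

```python
from math import sqrt
from typing import Dict, Optional

# A user's Items' Rating Vector (IRV), stored sparsely: item id -> rating.
# Unrated items are absent, which plays the role of the Null entries.
IRV = Dict[str, float]


def pearson_similarity(u: IRV, v: IRV) -> Optional[float]:
    """Pearson correlation between two users over their co-rated items."""
    common = u.keys() & v.keys()
    if len(common) < 2:
        return None  # too few co-rated items for a meaningful correlation
    # Means are taken over the co-rated items only (one convention of several).
    mean_u = sum(u[j] for j in common) / len(common)
    mean_v = sum(v[j] for j in common) / len(common)
    num = sum((u[j] - mean_u) * (v[j] - mean_v) for j in common)
    den_u = sqrt(sum((u[j] - mean_u) ** 2 for j in common))
    den_v = sqrt(sum((v[j] - mean_v) ** 2 for j in common))
    if den_u == 0 or den_v == 0:
        return None  # a constant rating pattern makes the correlation undefined
    return num / (den_u * den_v)


alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 1}
print(pearson_similarity(alice, bob))  # close to 1.0: same deviations from the mean
```

Item m4 is ignored because only bob rated it; this restriction to co-rated items is exactly what makes neighbor finding fragile when IRVs are nearly empty.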
One of the major challenges that RSs face is data sparsity: because the data are sparse, the system cannot identify popular items reliably and accurately [13]. This arises especially when there are many items and users in the system but the ratio of items rated by users is low (in other words, the user-item matrix is sparse). In such cases, finding meaningful nearest neighbors for a user can be impossible. Another challenge for RSs is the possible presence of users with abnormal interests, who can mislead an RS into recommending unsuitable items to other users.
The user-item matrix is an n × m matrix, where n is the number of users and m is the number of items. Its (i, j)th entry is the rating that the ith user assigns to the jth item. A user typically rates less than 1 percent of all items in a system; consequently, the user-item matrix is sparse, with less than 1 percent of its entries observed. Hybrid filtering RSs can alleviate this problem to some extent. Gone et al. propose a model in which the null entries of the user-item matrix are filled in a preprocessing step using a multi-layer perceptron neural network; they use an item-wise similarity method [14]. Another solution to the sparsity problem is dimension reduction [15]. For example, SVD (Singular Value Decomposition) combined with PSO (Particle Swarm Optimization) has been applied for this purpose. SVD, however, has a high computational overhead; despite its effective results, it is therefore hard to use in online RSs, which is why PSO is used there to speed up the SVD run time.
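The sparsity figure above is simply one minus the fraction of observed cells in the n × m user-item matrix. A minimal sketch, assuming ratings are stored as a nested dictionary (user -> item -> rating); the function name is illustrative.

```python
from typing import Dict

Ratings = Dict[str, Dict[str, float]]


def density(ratings: Ratings, n_items: int) -> float:
    """Fraction of the n x m user-item matrix that holds an observed rating."""
    n_users = len(ratings)
    observed = sum(len(items) for items in ratings.values())
    return observed / (n_users * n_items)


toy = {
    "u1": {"m1": 5, "m2": 3},
    "u2": {"m3": 4},
    "u3": {},  # a user with no ratings at all (a new user)
    "u4": {"m1": 2},
}
print(density(toy, n_items=10))  # 4 observed cells out of 40 -> 0.1
print(f"sparsity: {1 - density(toy, n_items=10):.0%}")  # sparsity: 90%
```

Applied to the MovieLens 1M counts quoted in the benchmark section (about 1,000,000 ratings, 6,040 users, 3,950 movies), the same formula gives a density of roughly 4.2 percent; note that every user in that dataset has rated at least 20 items.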
Scalability is another challenge for RSs. User-based ColF RSs are efficient and scalable when there are up to several hundred thousand users, but real systems often have millions of users, at which point user-based ColF RSs are no longer scalable or useful. Item-based ColF RSs can be considered as an alternative.
One of the most important problems in RSs is the Cold Start (CS) problem [16]. The CS problem occurs because of the low number of items rated by users, i.e. the sparsity of the users' IRVs. The CS problem is divided into two categories: New User (NU) and New Item (NI) [17]. Our main focus in this article is the CS problem of the NU type. The NU problem occurs when a new user has recently logged in and has not rated any items so far, or when a user has been present in the system but has been less active and has therefore rated only a few items.
A great deal of work has been done to solve the CS problem in RSs. The solutions are generally divided into two groups. The first group improves traditional RSs so that they can cope with the CS problem [18]. The second group uses hybrid RSs [19]. New users are those who have rated only a few items or none at all, so they have an almost empty (or very small) profile. Therefore, RSs are unable to recognize their preferences and cannot offer them appropriate items of interest. Hence, most researchers have used hybrid RSs to solve the NU problem [20]. Hybrid RSs usually combine ColF with additional data sources.
Much research has been done on using additional data sources, such as users' demographic information [21], to solve the CS problem. For example, it has been proposed to incorporate collaborative tagging into a collaborative filtering RS to learn users' interests and classify items based on those interests [22]. Another technique commonly used to address these problems, and the CS problem in particular, is clustering [41–46]. Clustering has been used successfully in many applications such as bioinformatics [47, 48], healthcare systems [49], optimization [50, 51], and domain adaptation [52, 53]. Items, users, or both simultaneously can be clustered, and such techniques can speed up the execution of RSs.
A three-phase method for solving the CS problem has been investigated in [23]. In the first phase, classification algorithms place the new user in a specific group. In the second phase, the similarity between the new user and the other users in the target group is calculated from the users' demographic data. In the third phase, the rating for the new user is predicted using the different groups created for users. A hybrid method for solving the CS problem that uses user-based content and social information is presented in [24]. Its main idea is to build content information from keyword profiles associated with the different items and to use this information to generate recommendations for new users. Al-Shamri [25] investigated different approaches to user profiling based on demographic information; these approaches combine various types of features, feature representations, and user-profiling mechanisms. Safoury and Salah [26] present another method based on demographic information to solve the CS problem for new users. In this method, instead of the ratings that cold users give to various items, their demographic information is used to generate recommendations; to this end, a framework is proposed to evaluate demographic characteristics such as age, gender and occupation. In another demographic user-driven RS [27], users are categorized based on their demographic data, and recommendations are then produced based on their demographic categories. Demographic information of users can therefore substantially alleviate the CS problem.
In contrast to the methods above, other methods do not use additional data sources to solve the CS problem of the NU type; they try to mitigate it using only the current state of the users' rating profiles. A framework based on a local and a global similarity criterion of users tries to mitigate the CS problem [28]. In addition, various sparsity measures, such as the Overall Sparsity Measure, User Specific Sparsity Measure, User-Item Specific Sparsity Measure, and Unified Measure of Sparsity, have been used to address the data sparsity problem and the CS issue. In this method, to predict the rating of a specific item for a given user, the ratings generated from the local and global similarities of users are combined, taking into account the uniform distribution of the user rating matrix. Ahn [29] focused on the limitations of traditional similarity criteria such as the cosine and Pearson similarities and proposed a new heuristic similarity criterion called Proximity–Impact–Popularity (PIP) to deal with the CS problem. This criterion includes three factors: (I) a proximity factor, indicating the distance between two ratings; (II) an impact factor, indicating how strongly the users like or dislike the given item; and (III) a popularity factor, indicating how far the average of the two users' ratings for the given item lies from the average rating of all users for that item.
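To make the three PIP factors concrete, here is a minimal Python sketch of the PIP measure as we understand its definition in [29], assuming a 1-5 rating scale. The constant and function names are ours, and the exact formulas should be checked against [29] before use. The user-user similarity is the sum of PIP scores over co-rated items, where the item mean is the average rating of all users for that item.

```python
from typing import Dict

R_MIN, R_MAX = 1.0, 5.0        # rating scale (assumed 1-5, as in MovieLens)
R_MED = (R_MAX + R_MIN) / 2.0  # midpoint of the scale (3.0)


def pip(r1: float, r2: float, item_mean: float) -> float:
    """PIP score of two ratings of the same item (Proximity x Impact x Popularity)."""
    # Ratings disagree when they lie on opposite sides of the scale midpoint.
    agree = not ((r1 > R_MED and r2 < R_MED) or (r1 < R_MED and r2 > R_MED))
    # Proximity: closer ratings score higher; disagreement doubles the distance.
    distance = abs(r1 - r2) if agree else 2.0 * abs(r1 - r2)
    proximity = (2.0 * (R_MAX - R_MIN) + 1.0 - distance) ** 2
    # Impact: strong (far-from-neutral) ratings matter more; inverted on disagreement.
    impact = (abs(r1 - R_MED) + 1.0) * (abs(r2 - R_MED) + 1.0)
    if not agree:
        impact = 1.0 / impact
    # Popularity: bonus when both ratings deviate from the item mean in the same direction.
    same_side = (r1 > item_mean and r2 > item_mean) or (r1 < item_mean and r2 < item_mean)
    popularity = 1.0 + ((r1 + r2) / 2.0 - item_mean) ** 2 if same_side else 1.0
    return proximity * impact * popularity


def pip_similarity(u: Dict[str, float], v: Dict[str, float],
                   item_means: Dict[str, float]) -> float:
    """Sum of PIP scores over the items both users have rated."""
    return sum(pip(u[j], v[j], item_means[j]) for j in u.keys() & v.keys())


print(pip(5, 5, 3.0))  # 81 * 9 * 5 = 3645.0
print(pip(1, 5, 3.0))  # strong disagreement: 1 * (1/9) * 1, about 0.111
```

Unlike cosine or Pearson similarity, a single co-rated item already yields an informative score, which is why PIP was proposed for users with very few ratings.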
To cope with the problem of choosing a weak neighbor set for a new user in a ColF RS, Formoso et al. propose expanding the profiles of new users based on the Item-Global, Item-Local, and User-Local techniques [30]. Inspired by [35], we consider the ontology as an additional guide for expanding profiles, but we expand the profiles of both users and items. The Item-Global technique finds a collection of the items most similar to those already in the user profile and adds them to the profile. The Item-Local technique consists of two steps. In the first step, the system recommends some items according to the user's current profile, and the items with the highest scores are added to the user profile. In the second step, the system generates item recommendations for the new user. The User-Local technique expands the profile based on the current neighbor set of the target user: among the items rated by these neighbors, a subset is selected using strategies such as the Local Most-Rated Strategy (LMRS) and the Local User-Local Clustering Strategy (LULCS), the latter proposed in [31]. To our knowledge, no existing RS addresses both types of CS problem, and none uses both additional information and a new, appropriate similarity measure to handle the CS problem; we propose a method that deals with these challenges. Therefore, in this paper, a new hybrid method based on the profile expansion technique is presented to solve the CS problem of both the NU and NI types. In contrast to previous methods [25], which use only the matrix of ratings given to items by users, the proposed method also uses the users' demographic information.
In fact, the proposed method uses a combination of the cosine similarity, based on the rating matrix, and the demographic similarity, based on the users' demographic information, as the final similarity when expanding user profiles.
Algorithm 1. The general framework of the proposed method
Input:
$U_i.DI$ denotes the demographic information of the ith user.
$R_{ij}$ denotes the rating the ith user gives to the jth movie.
$R_{i:}$ denotes the ratings the ith user gives to the different movies; its transpose (i.e. …)
$I_i.L$ denotes the language of the ith movie.
$I_i.D$ denotes the director of the ith movie.
$I_i.W$ denotes the writer of the ith movie.
$I_i.R$ denotes the average rating the ith item has received.
$I_i.Rt$ denotes the runtime of the ith item.
$I_i.Rd$ denotes the release date of the ith item.
$I_i.C$ denotes the country of the ith item.
$I_i.N$ denotes the number of ratings the ith item has received.
$I_i.P$ denotes the producer of the ith item.
$I_i.G_j$ is an asymmetric Boolean variable indicating whether the ith item has the jth genre.
$I_i.A_k$ is an asymmetric Boolean variable indicating whether the kth famous actor appears in the actors' list of the ith movie; a famous actor is one who appears in at least five movies.
$l$ is the expansion size of a user profile or an item profile.
$R$ is the rating matrix; $Z = Q = 25$; $N_{LARS} = N_{GMRS} = l$.
Output: …
Body:
Let …
Define …
Define …
Use LARS, GMRS, or LARS+GMRS to determine …
Use ILMRS to determine …
If …
    Use ILMRS to determine …
Else if …
    Estimate …
Else
    Estimate …
Related works
According to the literature, in collaborative filtering an RS scores items based on the ratings of other users and then recommends the items with the highest scores to the current user. Therefore, at the start of the system, when there are no item ratings yet, the RS has difficulty predicting high-quality scores for items. This problem is called cold start. More generally, when a new item is introduced to the system, the RS faces item cold start and that item is an NI; when a new user is introduced, the RS faces user cold start and that user is an NU [32]. NUs are generally those who have rated fewer than 5 items [33]. Researchers use two approaches to address the CS problem: (a) using supplementary information about users or items to assist the RS, and (b) using more appropriate similarity measures to make maximal use of the prior information in the system. Shaw et al. propose using association rules as a source of information to expand a user's profile and handle the cold start problem of the NU type [34]. Liu et al. propose a model in which the underlying user behavioral models are derived from the users' invisible interests, together with an RS that learns to use this information [35]. To our knowledge, no related study uses both user profile expansion and item profile expansion, possibly because this requires auxiliary information about both users and items. This paper addresses this gap and thereby tackles the cold start problem of both the NU and NI types.
Proposed RS
In this section, a new method is proposed to solve the CS problem in RSs. In the proposed system, the recommendation process consists of two phases. In the first phase, the target user profile and the target item profile are expanded. For this purpose, a combination of cosine similarity and demographic similarity is used as the final similarity for selecting the nearest neighbors of the target user to expand its profile. Therefore, in addition to the ratings users give to various items, the proposed method uses their demographic information as additional information to solve the CS problem. After the user and item profiles are expanded, in the second phase, if the target user is an NU, the target items are rated and ranked for the target user based on its expanded profile; otherwise, if the target item is an NI, the target item is rated based on its expanded profile. Each phase is described in detail in the following subsections. Figure 1 shows the scheme of the proposed method, and Algorithm 1 shows its pseudocode.

Figure 1. The scheme of the proposed method.
First, a hybrid approach based on cosine and demographic similarities between users is developed to extend user profiles. The information in the user rating matrix is used to calculate cosine similarity, and the users' demographic information (age, gender, and occupation) is used to calculate demographic similarity. For each evaluated pair of users, all items rated by both users are selected, and the cosine similarity between the users is then calculated over these items as presented in Equation 1.
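$$\mathrm{sim}_{\cos}(u,v) = \frac{\sum_{j \in I_{uv}} R_{uj} R_{vj}}{\sqrt{\sum_{j \in I_{uv}} R_{uj}^{2}}\,\sqrt{\sum_{j \in I_{uv}} R_{vj}^{2}}} \tag{1}$$
In Equation 1, $I_{uv}$ is the set of items rated by both users $u$ and $v$, and $R_{uj}$ is the rating that user $u$ gives to item $j$.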
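A minimal Python sketch of the user-user cosine similarity over co-rated items, assuming the same nested-dictionary layout (user -> item -> rating); the function names are illustrative. The `transpose` helper is included because the item-item cosine similarity of Equation 7 is the same computation applied to the item -> user view of the rating matrix.

```python
from math import sqrt
from typing import Dict

Ratings = Dict[str, Dict[str, float]]


def cosine_corated(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity restricted to the keys both vectors share."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0  # no co-rated items: no evidence of similarity
    dot = sum(u[j] * v[j] for j in common)
    norm_u = sqrt(sum(u[j] ** 2 for j in common))
    norm_v = sqrt(sum(v[j] ** 2 for j in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def transpose(ratings: Ratings) -> Ratings:
    """Turn user -> item -> rating into item -> user -> rating."""
    by_item: Ratings = {}
    for user, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[user] = r
    return by_item


ratings = {
    "u1": {"m1": 4, "m2": 2},
    "u2": {"m1": 2, "m2": 1, "m3": 5},
    "u3": {"m2": 3, "m3": 4},
}
print(cosine_corated(ratings["u1"], ratings["u2"]))  # co-rated m1, m2: about 1.0
items = transpose(ratings)
print(cosine_corated(items["m2"], items["m3"]))  # co-raters u2, u3
```

Because ratings are on a positive scale, cosine similarity over co-rated items is always non-negative and reaches 1.0 whenever two users' ratings are proportional, which is one reason the method complements it with demographic information.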
The similarity of users is also calculated from demographic data (i.e. age, gender, and occupation) according to Equation 2. If the value of a particular nominal attribute is the same for two users, the similarity for that attribute is one; otherwise, it is zero. For example, if both users have the same gender, the gender similarity is one; otherwise, it is zero.
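Since the exact form of Equation 2 is not given in the text, the following is only a hedged sketch of one plausible reading: per-attribute 0/1 agreement on the nominal attributes age group, gender and occupation, combined as a weighted average. The weights (tuned on a validation set in the paper) are hypothetical placeholders here, as are the field names; the age and occupation values follow the MovieLens 1M coding.

```python
from typing import Dict, Hashable, Optional

Profile = Dict[str, Hashable]

# Hypothetical attribute weights; the paper tunes these on a validation set.
DEFAULT_WEIGHTS = {"age": 1.0, "gender": 1.0, "occupation": 1.0}


def demographic_similarity(a: Profile, b: Profile,
                           weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted share of nominal attributes on which two users agree (0/1 per attribute)."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(weights.values())
    matched = sum(
        w for attr, w in weights.items()
        # A missing value never counts as a match.
        if a.get(attr) is not None and a.get(attr) == b.get(attr)
    )
    return matched / total


u1 = {"age": 25, "gender": "F", "occupation": 4}  # age group 25-34, occupation code 4
u2 = {"age": 25, "gender": "M", "occupation": 4}
print(demographic_similarity(u1, u2))  # 2 of 3 attributes agree -> 0.666...
```

Unlike rating-based similarity, this score is available even for a user with no ratings at all, which is what makes it useful for the NU case.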
After the cosine and demographic similarities among users are calculated, Equation 4 is used to calculate the final similarity between users.
After the final similarity between users is calculated, the Z users most similar to the target user are selected as its nearest neighbors. The rating that the ith user (the target user) would give to the jth item is then predicted according to Equation 6.
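The text does not spell out Equations 4 and 6, so the sketch below fills them with common choices that are assumptions, not the paper's formulas: a convex combination with a hypothetical weight `alpha` for the final similarity, and a similarity-weighted average of the neighbors' ratings for the prediction. It illustrates how the Z nearest neighbors (Z = 25 in the experiments) are selected and then used.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Ratings = Dict[str, Dict[str, float]]


def combined_similarity(cos_sim: float, demo_sim: float, alpha: float = 0.5) -> float:
    """Hypothetical stand-in for Equation 4: a convex mix of the two similarities."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * cos_sim + (1.0 - alpha) * demo_sim


def top_z_neighbors(target: str, sims: Dict[str, float], z: int = 25) -> List[Tuple[str, float]]:
    """The z users most similar to the target user (the target itself is excluded)."""
    candidates = [(u, s) for u, s in sims.items() if u != target]
    return heapq.nlargest(z, candidates, key=lambda pair: pair[1])


def predict_rating(item: str, neighbors: List[Tuple[str, float]],
                   ratings: Ratings) -> Optional[float]:
    """Similarity-weighted mean of the neighbors' ratings (assumed form of Equation 6)."""
    num = den = 0.0
    for user, sim in neighbors:
        r = ratings[user].get(item)
        if r is not None and sim > 0:
            num += sim * r
            den += sim
    return num / den if den > 0 else None  # None: no neighbor rated the item


ratings = {
    "t": {"m1": 4},
    "a": {"m1": 5, "m2": 4},
    "b": {"m1": 3, "m2": 2},
    "c": {"m2": 5},
}
sims = {
    "a": combined_similarity(0.9, 1.0),  # 0.95
    "b": combined_similarity(0.5, 0.0),  # 0.25
    "c": combined_similarity(0.1, 0.3),  # 0.2
}
neighbors = top_z_neighbors("t", sims, z=2)
print([u for u, _ in neighbors])                 # ['a', 'b']
print(predict_rating("m2", neighbors, ratings))  # (0.95*4 + 0.25*2) / 1.2, about 3.583
```

Setting `alpha` to 1.0 reduces the sketch to plain cosine-based neighbor selection, which is a convenient way to measure how much the demographic term contributes.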
Global Most Rated Strategy (GMRS) is a strategy in which $N_{GMRS}$ different items are selected to expand the target user profile [25]. For this purpose, the items rated most often by the Z nearest neighbors of the target user are selected: the existing items are sorted in descending order of the average number of ratings they received from the Z nearest neighbors, and the first $N_{GMRS}$ items in the list are selected to expand the target user profile.
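The GMRS selection step can be sketched in Python as follows. Sorting by the raw count of neighbor ratings gives the same order as sorting by the average over the Z neighbors, since every count is divided by the same Z. Skipping items already in the target profile and breaking ties by item id are our assumptions, and how ratings are assigned to the added items is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List

Ratings = Dict[str, Dict[str, float]]


def gmrs_expand(target_profile: Dict[str, float], neighbors: Iterable[str],
                ratings: Ratings, n_gmrs: int) -> List[str]:
    """Pick the n_gmrs items rated most often by the target user's neighbors."""
    counts: Counter = Counter()
    for user in neighbors:
        counts.update(ratings[user].keys())  # one count per neighbor who rated the item
    # Most-rated first; ties broken by item id so the result is deterministic.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    # Assumption: items the target user has already rated are not added again.
    candidates = [item for item in ranked if item not in target_profile]
    return candidates[:n_gmrs]


ratings = {
    "a": {"m1": 5, "m2": 3, "m3": 4},
    "b": {"m2": 2, "m3": 5},
    "c": {"m3": 1, "m4": 2},
}
print(gmrs_expand({"m1": 4}, ["a", "b", "c"], ratings, n_gmrs=2))  # ['m3', 'm2']
```

In the experiments, `n_gmrs` corresponds to the profile expansion size l.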
Item-wise local most rated strategy
In this strategy, the similarities between the target item and the other items in the system are calculated, and some of the most similar items are then selected to predict the target item's rating. A hybrid approach based on cosine and ontological similarities between items is developed for this purpose. The information in the user rating matrix is used to calculate the cosine similarity between items, and the auxiliary (ontological) information of items is used to calculate a new (ontological) similarity between items. For each evaluated pair of items, all users who rated both items are selected, and the cosine similarity between the items is then calculated over these users as presented in Equation 7.
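$$\mathrm{sim}_{\cos}(i,j) = \frac{\sum_{u \in U_{ij}} R_{ui} R_{uj}}{\sqrt{\sum_{u \in U_{ij}} R_{ui}^{2}}\,\sqrt{\sum_{u \in U_{ij}} R_{uj}^{2}}} \tag{7}$$
In Equation 7, $U_{ij}$ is the set of users who rated both items $i$ and $j$, and $R_{ui}$ is the rating that user $u$ gives to item $i$.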
The similarity of items is also calculated from auxiliary data, using the following attributes:
$I_i.L$ denotes the language of the ith movie.
$I_i.D$ denotes the director of the ith movie.
$I_i.W$ denotes the writer of the ith movie.
$I_i.R$ denotes the average rating the ith item has received.
$I_i.Rt$ denotes the runtime of the ith item.
$I_i.Rd$ denotes the release date of the ith item.
$I_i.C$ denotes the country of the ith item.
$I_i.N$ denotes the number of ratings the ith item has received.
$I_i.P$ denotes the producer of the ith item.
$I_i.G_j$ is an asymmetric Boolean variable indicating whether the ith item has the jth genre.
$I_i.A_k$ is an asymmetric Boolean variable indicating whether the kth famous actor appears in the actors' list of the ith movie; a famous actor is one who appears in at least five movies.
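The ontological similarity equations are not spelled out in the text, so the following is a hedged sketch of one way to compute an ontological similarity from the attributes listed above; all field names, the weighting scheme and the per-attribute comparisons are assumptions (for instance, the release date is reduced to a release year). Nominal attributes (language, director, writer, country, producer) score 1 on an exact match, numeric ones (average rating, runtime, release year, number of ratings) score by closeness relative to their range, and the asymmetric Boolean genre and famous-actor indicators are compared with the Jaccard coefficient, which ignores shared absences as is usual for asymmetric binary variables.

```python
from typing import Any, Dict, Optional, Set

Movie = Dict[str, Any]

NOMINAL = ("language", "director", "writer", "country", "producer")
NUMERIC = ("avg_rating", "runtime", "year", "n_ratings")
BOOLEAN_SETS = ("genres", "famous_actors")  # the entries whose G_j / A_k flag is 1


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient for asymmetric Boolean data: shared absences are ignored."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ontological_similarity(x: Movie, y: Movie, ranges: Dict[str, float],
                           weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-attribute similarities, each in [0, 1]."""
    if weights is None:
        weights = {k: 1.0 for k in NOMINAL + NUMERIC + BOOLEAN_SETS}
    score = 0.0
    for k in NOMINAL:  # exact match on nominal attributes
        score += weights[k] * (1.0 if x[k] == y[k] else 0.0)
    for k in NUMERIC:  # closeness relative to the attribute's range in the catalogue
        span = ranges[k]
        closeness = 1.0 - abs(x[k] - y[k]) / span if span else 1.0
        score += weights[k] * max(0.0, closeness)
    for k in BOOLEAN_SETS:
        score += weights[k] * jaccard(x[k], y[k])
    return score / sum(weights.values())


m1 = {"language": "English", "director": "D1", "writer": "W1", "country": "USA",
      "producer": "P1", "avg_rating": 4.0, "runtime": 120, "year": 1995,
      "n_ratings": 300, "genres": {"Drama", "Romance"}, "famous_actors": {"A1"}}
m2 = {"language": "English", "director": "D2", "writer": "W1", "country": "USA",
      "producer": "P2", "avg_rating": 3.0, "runtime": 100, "year": 1995,
      "n_ratings": 100, "genres": {"Drama"}, "famous_actors": {"A1", "A2"}}
ranges = {"avg_rating": 4.0, "runtime": 200.0, "year": 80.0, "n_ratings": 400.0}
print(ontological_similarity(m1, m2, ranges))  # (3 + 3.15 + 1.0) / 11, about 0.65
```

Because this score needs no ratings, it remains defined for a new item with zero ratings, whereas the cosine similarity of Equation 7 does not.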
After the cosine and new (ontological) similarities among items are calculated, Equation 10 is used to calculate the final similarity between items.
After the final similarity between items is calculated, the Q items most similar to the target item are selected as its nearest neighbors and used for rating prediction. The rating that the ith user would give to the jth item (the target item) is predicted according to Equation 12.
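Equation 12 is not spelled out in the text, so this sketch assumes the usual item-based form: a similarity-weighted average of the user's own ratings of the Q items most similar to the target item (Q = 25 in the experiments). Here `item_sims` is a hypothetical input assumed to hold the final (Equation 10) similarity of every other item to the target item; neighbors the user has not rated are skipped.

```python
import heapq
from typing import Dict, Optional


def predict_item_based(user_ratings: Dict[str, float], item_sims: Dict[str, float],
                       q: int = 25) -> Optional[float]:
    """Predict a user's rating of the target item from its q most similar items."""
    # item_sims maps each other item to its similarity with the target item.
    neighbors = heapq.nlargest(q, item_sims.items(), key=lambda pair: pair[1])
    num = den = 0.0
    for item, sim in neighbors:
        r = user_ratings.get(item)
        if r is not None and sim > 0:  # only neighbors the user actually rated
            num += sim * r
            den += sim
    return num / den if den > 0 else None


user = {"m1": 5, "m2": 2}
sims_to_target = {"m1": 0.8, "m2": 0.2, "m3": 0.9}
print(predict_item_based(user, sims_to_target, q=2))  # only m1 is rated among {m3, m1}, about 5.0
print(predict_item_based(user, sims_to_target, q=3))  # (0.8*5 + 0.2*2) / 1.0, about 4.4
```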
Benchmark datasets
To evaluate the proposed method, the MovieLens 1M dataset, which can be downloaded from the GroupLens research group website, has been used. The collection consists of about 1,000,000 ratings given by 6,040 users to 3,950 movies. Ratings are on a 5-point numerical scale: 1 indicates very low interest, 2 low interest, 3 average interest, 4 high interest, and 5 very high interest. In this dataset, each user has rated at least 20 items. A tenth of the movies, i.e. 395 movies, are used as cold items, where a cold item is one that has at most 20 ratings. For each of these cold items, we randomly keep between 0 and 20 of its real ratings in the dataset. We call the result the ICS (Item Cold Start) dataset.
A tenth of the users, i.e. 604 users, are used as cold users; we call the result the UCS (User Cold Start) dataset. Since every user in the dataset has rated at least 20 items, cold start conditions must be simulated: for each cold user, a number of items smaller than the total number of items the user has rated is randomly selected to form the user's initial profile. The number of initial items in the profile is varied up to 10.
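The cold-user protocol can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it samples a tenth of the users, keeps `profile_size` randomly chosen ratings of each as the visible profile, and holds out the rest for testing; the function name, the seed handling and the use of Python's `random` module are ours. Repeating it with 30 different seeds mirrors the 30 independent runs described for the experiments.

```python
import random
from typing import Dict, Tuple

Ratings = Dict[str, Dict[str, float]]


def make_cold_users(ratings: Ratings, profile_size: int, fraction: float = 0.1,
                    seed: int = 0) -> Tuple[Ratings, Ratings]:
    """Hide all but `profile_size` ratings for a random `fraction` of users."""
    rng = random.Random(seed)
    users = sorted(ratings)
    cold = rng.sample(users, int(len(users) * fraction))
    train = {u: dict(items) for u, items in ratings.items()}  # copy; others unchanged
    test: Ratings = {}
    for u in cold:
        items = sorted(ratings[u])
        if profile_size >= len(items):
            raise ValueError(f"user {u} has too few ratings for profile_size={profile_size}")
        kept = set(rng.sample(items, profile_size))
        train[u] = {j: ratings[u][j] for j in kept}                   # visible profile
        test[u] = {j: ratings[u][j] for j in items if j not in kept}  # held out
    return train, test


# Toy data: 20 users, each with 25 ratings (MovieLens 1M guarantees at least 20).
toy = {f"u{i}": {f"m{j}": float(1 + (i + j) % 5) for j in range(25)} for i in range(20)}
train, test = make_cold_users(toy, profile_size=5, seed=1)
print(len(test), [len(train[u]) for u in test])  # 2 cold users, 5 visible ratings each
```

Calling it with `profile_size` from 0 to 10 reproduces the range of initial profile sizes; 0 corresponds to a user who has rated nothing yet.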
In a separate testbed, a tenth of the ratings are used as a general test set; we call this the TN (Traditional Non-cold start) dataset. Note that ICS, UCS and TN are each randomly generated 30 times; consequently, we conduct 30 independent experiments on each dataset for each method, and the average result over these 30 experiments is reported as the performance of the method.
Evaluation criteria
To evaluate the proposed method, the Mean Absolute Error (MAE) criterion is calculated using Equation 13.
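$$\mathrm{MAE} = \frac{1}{|T|} \sum_{(i,j) \in T} \left| R_{ij} - \hat{R}_{ij} \right| \tag{13}$$
In Equation 13, $R_{ij}$ is the actual rating of the ith user to the jth item, $\hat{R}_{ij}$ is the corresponding predicted rating, and $T$ is the set of test (user, item) pairs.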
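A direct Python transcription of the MAE computation, assuming actual and predicted ratings are stored in dictionaries keyed by (user, item) pairs; skipping pairs for which a method produced no prediction is our assumption about how unpredictable ratings are handled.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (user, item)


def mean_absolute_error(actual: Dict[Key, float],
                        predicted: Dict[Key, Optional[float]]) -> float:
    """Average |actual - predicted| over the test pairs that have a prediction."""
    errors = [abs(r - predicted[k]) for k, r in actual.items()
              if predicted.get(k) is not None]
    if not errors:
        raise ValueError("no test pair has a prediction")
    return sum(errors) / len(errors)


actual = {("u1", "m1"): 4, ("u1", "m2"): 2, ("u2", "m1"): 5}
predicted = {("u1", "m1"): 3.5, ("u1", "m2"): 3.0, ("u2", "m1"): None}
print(mean_absolute_error(actual, predicted))  # (0.5 + 1.0) / 2 = 0.75
```

Lower MAE is better; on a 1-5 scale it is bounded by 4.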
A significance test [36] is a statistical method for validating that the difference between the performances of two or more competing methods is statistically significant at a (1-p) level of confidence, i.e. that the probability of the observed difference arising by chance is at most p. A significance test can be carried out on different evaluation criteria.
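The text does not state which significance test [36] refers to. As one hedged example, the sketch below runs a two-sided paired randomization (sign-flip) test on the per-run scores of two methods, such as the MAE values of the 30 independent runs; the choice of test, the function name and the number of resamples are our assumptions.

```python
import random
from typing import Sequence


def paired_permutation_test(a: Sequence[float], b: Sequence[float],
                            n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for H0: the paired differences a[i] - b[i] are centered on zero."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty samples of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs)) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under H0 each difference is equally likely to have either sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) / len(diffs) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-run MAE values of two methods over 30 runs.
proposed = [0.70 + 0.001 * (i % 5) for i in range(30)]
baseline = [m + 0.02 for m in proposed]
print(paired_permutation_test(proposed, baseline))  # tiny p-value: every run favors "proposed"
```

Pairing by run matters here: each run shares the same random split for both methods, so comparing per-run differences removes the split-to-split variance.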
To evaluate the proposed method, we compare it with several basic methods, namely LMRS, LULCS, GMRS and LMRS+ (LMRS using demographic information), as well as classical user-based collaborative filtering without profile expansion (NoPE). We also use several other methods as state-of-the-art baselines: Non-Normalized ConF RS [37], a Singular Value Decomposition based RS (SVD) [38], a Popularity based RS (Pop) [39], and Ontology-based Top-N RS using Matrix Factorization (OTopN) [40]. All these methods use the default parameters given in their papers. The experiments were performed separately on the three datasets: ICS, UCS and TN.
In addition, the weights of the demographic attributes age, sex, and occupation are tuned on a validation set, as are the weights of the ontological attributes of movies. The Z and Q parameters are both set to 25, which was experimentally found to be the best choice [36].
Experimental results
The methods used in the experiments are compared according to two criteria: (a) MAE and (b) RC. The proposed method is named after the strategies used to expand the profiles; here it is denoted ILMRS & LARS+GMRS. Figure 2 shows the MAE results of the different methods on the three benchmarks. As the results in Fig. 2 show, the proposed method performs best in almost all cases. Figure 3 shows the corresponding results in terms of RC; again, the proposed method performs best in almost all cases. Taken together, Figs. 2 and 3 show that the proposed method achieves the best RC and MAE across the benchmarks, and its advantage is most pronounced on the ICS benchmark. As shown in Fig. 2, the best MAE values for the different RS methods are obtained with l = 10, whereas the best RC values for the different RS methods are obtained with l = 50; for the proposed method, the best RC is obtained for any l ≥ 10. Therefore, according to Figs. 2 and 3, the best profile expansion size is 10, and l = 10 is used from here on.

Figure 2. The performance comparison of different RSs in terms of MAE for different profile expansion sizes on a) (top-left) TN dataset, b) (top-right) ICS dataset and c) (bottom) UCS dataset.

Figure 3. The performance comparison of different RSs in terms of RC for different profile expansion sizes on a) (top-left) TN dataset, b) (top-right) ICS dataset and c) (bottom) UCS dataset.
Figures 4 and 5 show the MAE and RC values, respectively, of the different RS methods. A significance test on the results in Figure 5 yields a p-value of 0.0375. Figure 6 depicts the recall of Top-N recommendations produced by the different RS methods on the ICS benchmark.
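Recall of a Top-N list is the fraction of a user's relevant test items that appear among the first N recommendations. A minimal sketch follows; treating held-out items rated 4 or higher as relevant is our assumption, since the text does not define relevance, and per-user values would then be averaged over the test users.

```python
from typing import Dict, List, Optional, Set


def relevant_items(test_ratings: Dict[str, float], threshold: float = 4.0) -> Set[str]:
    """Held-out items the user liked (assumed relevance threshold)."""
    return {item for item, r in test_ratings.items() if r >= threshold}


def recall_at_n(recommended: List[str], relevant: Set[str], n: int) -> Optional[float]:
    """Share of relevant items that appear in the top-n recommendations."""
    if not relevant:
        return None  # recall is undefined for a user with no relevant items
    hits = len(set(recommended[:n]) & relevant)
    return hits / len(relevant)


held_out = {"m1": 5, "m2": 4, "m9": 4, "m5": 2}
ranked = ["m3", "m1", "m7", "m2"]
rel = relevant_items(held_out)        # m1, m2 and m9
print(recall_at_n(ranked, rel, n=3))  # only m1 is in the top 3 -> 1/3
print(recall_at_n(ranked, rel, n=4))  # m1 and m2 -> 2/3
```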

Figure 4. The performance comparison of different RSs in terms of MAE for profile expansion size of 10 on a) (top-left) TN dataset, b) (top-right) ICS dataset and c) (bottom) UCS dataset.

Figure 5. The performance comparison of different RSs in terms of RC for profile expansion size of 10 on a) (top-left) TN dataset, b) (top-right) ICS dataset and c) (bottom) UCS dataset.

Figure 6. The performance comparison of different RSs in terms of recall of Top-N recommendations for profile expansion size of 10 on ICS dataset.
In this paper, a new hybrid approach is proposed that combines demographic similarity and cosine similarity between users in order to solve the cold start problem of the new user type. In addition, a new approach is proposed that combines ontological similarity and cosine similarity between items in order to solve the cold start problem of the new item type. The main idea of the proposed method is to expand user and item profiles using different strategies to build more informative profiles. The experimental results show that the proposed method performs better than the compared methods. One direction for future work is to use other information about the content of items and users; such additional information could further improve the effectiveness of recommender systems, especially under cold start conditions.
We have discussed several solutions to the cold start problem in recommender systems. Item ontologies, user ontologies, and semantic similarity improved by WordNet are possible directions for future research.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
