Hybrid recommender systems: A systematic literature review

Abstract

Recommender systems are software tools used to generate and provide suggestions for items and other entities to users by exploiting various strategies. Hybrid recommender systems combine two or more recommendation strategies in different ways to benefit from their complementary advantages. This systematic literature review presents the state of the art in hybrid recommender systems of the last decade. It is the first quantitative review work completely focused on hybrid recommenders. We address the most relevant problems considered and present the associated data mining and recommendation techniques used to overcome them. We also explore the hybridization classes each hybrid recommender belongs to, the application domains, the evaluation process and proposed future research directions. Based on our findings, most of the studies combine collaborative filtering with another technique, often in a weighted way. Also, cold-start and data sparsity are the two traditional and top problems, being addressed in 23 and 22 studies respectively, while movies and movie datasets are still widely used by most of the authors. As most of the studies are evaluated by comparisons with similar methods using accuracy metrics, providing more credible and user-oriented evaluations remains a typical challenge. Besides this, newer challenges were also identified, such as responding to the variation of user context, evolving user tastes or providing cross-domain recommendations. Being a hot topic, hybrid recommenders represent a good basis with which to respond accordingly by exploring newer opportunities such as contextualizing recommendations, involving parallel hybrid algorithms, processing larger datasets, etc.

Keywords

Hybrid recommendations, recommender systems, systematic review, recommendation strategies

1. Introduction

Historically people have relied on their peers or on experts’ suggestions for decision support and recommendations about commodities, news, entertainment, etc. The exponential growth of digital information in the last 25 years, especially on the web, has created the problem of information overload. Information overload is defined as “stress induced by reception of more information than is necessary to make a decision and by attempts to deal with it with outdated time management practices”.1 This problem limits our capacity to review the specifications and choose between numerous alternatives of items in the online market. In response, information science and technology developed information filtering tools to alleviate the problem. Recommender Systems (RSs) are one such tool that emerged in the mid 90s. They are commonly defined as software tools and techniques used to provide suggestions for items and other recommendable entities to users [1]. In the early days (beginning of the 90s) RSs were the study subject of other closely related research disciplines such as Human Computer Interaction (HCI) or Information Retrieval (IR) [2]. Today, RSs are found everywhere, helping users search for various types of items and services. They also serve as sales assistants for businesses, increasing their profits.

Technically all RSs employ one or more recommendation strategies such as Content-Based Filtering (CBF), Collaborative Filtering (CF), Demographic Filtering (DF), Knowledge-Based Filtering (KBF), etc. described below:

Collaborative filtering: The basic assumption of CF is that people who had similar tastes in the past will also have similar tastes in the future. One of its earliest definitions is “collaboration between people to help one another perform filtering by recording their reactions to documents they read” [3]. This approach uses ratings or other forms of user generated feedback to spot taste commonalities between groups of users and then generates recommendations based on inter-user similarities [2]. CF recommenders suffer from problems like cold-start (new user or new item), “gray sheep” (users that do not fit in any taste cluster), etc.

Content-based filtering: CBF is based on the assumption that people who liked items with certain attributes in the past will like the same kind of items in the future as well. It makes use of item features to compare the item with user profiles and provide recommendations. Recommendation quality is limited by the selected features of the recommended items. Like CF, CBF suffers from the cold-start problem.

Demographic filtering: DF uses demographic data such as age, gender, education, etc. for identifying categories of users. It does not suffer from the new user problem as it doesn’t use ratings to provide recommendations. However, online privacy concerns make it difficult today to collect enough demographic information, which limits the utilization of DF. It is still combined with other recommenders as a reinforcing technique for better quality.

Knowledge-based filtering: KBF uses knowledge about users and items to reason about what items meet the users’ requirements, and generates recommendations accordingly [4]. A special type of KBF is the constraint-based RS, which is capable of recommending complex items that are rarely bought (e.g., cars or houses) and carry important constraints for the user (e.g., price) [5]. It is not possible to successfully use CF or CBF in this domain of items as little user-system interaction data is available (people rarely buy houses).
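
To make the CF assumption described above more concrete, the following minimal sketch predicts a missing rating as a similarity-weighted average of the ratings given by other users. The toy ratings, the use of cosine similarity over co-rated items and all function names are assumptions made for this illustration; they are not taken from any of the reviewed systems.

```python
from math import sqrt

# Toy user-item ratings (hypothetical data); unrated items are simply absent.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_rating(user, item, ratings):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    # With no usable neighbor ratings there is nothing to predict from,
    # which is the new-item side of the cold-start problem.
    return numerator / denominator if denominator else None


prediction = predict_rating("alice", "m4", ratings)
```

In this toy data alice's tastes are closer to bob's than to carol's, so the prediction for "m4" is pulled towards bob's rating. An item nobody has rated yields `None`, which illustrates why pure CF cannot handle new items.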

One of the earliest recommender systems was Tapestry, a manual CF mail system [3]. The first computerized RS prototypes also applied a collaborative filtering approach and emerged in mid 90s [6, 7]. GroupLens was a CF recommendation engine for finding news articles. In [7] the authors present a detailed analysis and evaluation of the Bellcore video recommender algorithm and its implementation embedded in the Mosaic browser interface. Ringo used taste similarities to provide personalized music recommendations. Other prototypes like NewsWeeder and InfoFinder recommended news and documents using CBF, based on item attributes [8, 9]. In late 90s important commercial RS prototypes also came out with Amazon.com recommender being the most popular. Many researchers started to combine the recommendation strategies in different ways building hybrid RSs which we consider in this review. Hybrid RSs put together two or more of the other strategies with the goal of reinforcing their advantages and reducing their disadvantages or limitations. One of the first was Fab, a meta-level recommender (see Section 3.4.6) which was used to suggest websites [10]. It incorporated a combination of CF to find users having similar website preferences, with CBF to find websites with similar content. Other works such as [11] followed shortly and hybrid RSs became a well established recommendation approach.

The continuously growing industrial interest in the recent and promising domains of mobile and social web has been followed by a similar increase of academic interest in RSs. The ACM RecSys annual conference2 is now the most significant event for presenting and discussing RS research. The work of Burke in [12] is one of the first qualitative surveys addressing hybrid RSs. The author analyzes advantages and disadvantages of the different recommendation strategies and provides a comprehensive taxonomy for classifying the ways they combine with each other to form hybrid RSs. He also presents several hybrid RS prototypes falling into the 7 hybridization classes of the taxonomy. Another early exploratory work is [13] where several experiments combining personalized agents with opinions of community members in a CF framework are conducted. They conclude that this combination produces high-quality recommendations and that the best results of CF are achieved using large data of user communities. Other review works are more generic and address RSs in general, not focusing on any RS type. They reflect the increasing interest in the field in quantitative terms. In [14] the authors review 249 journal and conference RS publications from 1995 to 2013. The peak publication period of the works they consider is between 2007 and 2013 (last one-third of the analyzed period). They emphasize the fact that the current hybrid RSs are incorporating location information into existing recommendation algorithms. They also highlight the proper combination of existing methods using different forms of data, and evaluating other characteristics (e.g., diversity and novelty) besides accuracy as future trends. In [15] the authors review 210 recommender system articles published in 46 journals from 2001 to 2010. They similarly report a rapid increase of publications between 2007 and 2010 and predict an increased interest in mixing existing recommendation methods or using social network analysis to provide recommendations.

In this review paper we summarize the state of the art of hybrid RSs in the last 10 years. We follow a systematic methodology to analyze and interpret the available facts related to the 7 research questions we defined. This methodology, defined in [16, 17], provides an unbiased and reproducible way for undertaking a review work. Unlike the other review works not focused on any RS type [14, 15], this systematic literature review is the first quantitative work that is entirely focused on recent hybrid RS publications. For this reason it was not possible for us to have a direct basis with which to compare our results. Nevertheless we provide some comparisons of results for certain aspects in which hybrid RSs do not differ from other types of RSs. To have a general idea about what percentage of total RS publications address hybrid RSs we examined [18], a survey work about RSs in general. Here the authors review 330 papers published in computer science and information systems conference proceedings and journals from 2006 to 2011. Their results show that the hybrid recommendation paradigm is the study object of about 14.5% of their reviewed literature.

We considered the most relevant problems hybrid RSs attempt to solve, the data mining and machine learning methods involved, RS technique combinations the studies utilize and the hybridization classes the proposed systems fall into. We also observed the domains in which the contributions were applied and the evaluation strategies, characteristics and metrics that were used. Based on the suggestions of the authors and the identified challenges we also present some future work directions which seem promising and in concordance with the RS trends. Many primary studies were retrieved from digital libraries and the most relevant papers were selected for more detailed processing (we use the terms paper and study interchangeably to refer to the same object/concept). We hope this work will help anyone working in the field of (hybrid) RSs, especially by providing insights about future trends or opportunities. The remainder of the paper is structured as follows. Section 2 briefly summarizes the methodology we followed, the objectives and research questions defined, the selection of papers and the quality assessment process. Section 3 introduces the results of the review organized in accordance with each research question. Section 4 discusses and summarizes each result whereas Section 5 concludes. Finally we list the selected papers in Appendix.

2. Methodology

The review work of this paper follows the guidelines that were defined by Kitchenham and Charters [17] for systematic literature reviews in Software Engineering. The purpose of a systematic literature review is to present a verifiable and unbiased treatment of a research topic utilizing a rigorous and reproducible methodology. The guidelines that were followed are high level and do not consider the influence of research questions type on the review procedures. In Fig. 1 we present the protocol of the review. It represents a clear set of steps which assist the management of the review process. The protocol was defined by the first author and verified by the second author. In the following sections we describe each step we summarized in Fig. 1.

Figure 1.

Systematic literature review protocol.

2.1 Research questions, search string and digital sources

The primary goal of this systematic literature review is to understand what challenges hybrid RSs could successfully address, how they are developed and evaluated and in what ways or aspects they could be experimented with. To this end, we defined the following research questions:

RQ1 What are the most relevant studies addressing hybrid recommender systems?
RQ2 What problems and challenges are faced by the researchers in this field?
RQ3a Which data mining and machine learning techniques are used in hybrid RSs?
RQ3b What recommendation techniques are combined and which problems do they solve?
RQ4 What hybridization classes are used, based on the taxonomy of Burke?
RQ5 In what domains are hybrid recommenders applied?
RQ6a What methodologies are used for the evaluation and what metrics do they utilize?
RQ6b Which RS characteristics are evaluated and what metrics do they use?
RQ6c What datasets are used for training and testing hybrid RSs?
RQ7 Which directions are most promising for future research?

Table 1
Selected sources to search for primary studies

Source	URL
SpringerLink	https://link.springer.com
Science direct	https://www.sciencedirect.com
IEEExplore	http://ieeexplore.ieee.org
ACM digital library	http://dl.acm.org
Scopus	https://www.scopus.com

Table 2

Keywords and synonyms

Keyword	Synonyms
Hybrid	Hybridization, Mixed
Recommender	Recommendation
System	Software, Technique, Technology, Approach, Engine

Furthermore we picked five scientific digital libraries that represent our primary sources for computer science research publications. They are listed in Table 1. Other similar sources were not considered as they mainly index data from the primary sources. We defined (“Hybrid”, “Recommender”, “Systems”) as the basic set of keywords. Then we added synonyms to extend it and obtain the final set of keywords. The set of keywords and synonyms is listed in Table 2. The search string we defined is: (“Hybrid” OR “Hybridization” OR “Mixed”) AND (“Recommender” OR “Recommendation”) AND (“System” OR “Software” OR “Technique” OR “Technology” OR “Approach” OR “Engine”).
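
As an illustration of how the search string follows from Table 2, the sketch below joins each keyword with its synonyms using OR and then joins the resulting groups using AND. The dictionary layout and the function name are assumptions made for this example; straight quotes are used instead of typographic ones.

```python
# Keywords and their synonyms as listed in Table 2.
keywords = {
    "Hybrid": ["Hybridization", "Mixed"],
    "Recommender": ["Recommendation"],
    "System": ["Software", "Technique", "Technology", "Approach", "Engine"],
}


def build_search_string(keywords):
    """OR together each keyword and its synonyms, then AND the groups."""
    groups = []
    for keyword, synonyms in keywords.items():
        terms = " OR ".join(f'"{term}"' for term in [keyword, *synonyms])
        groups.append(f"({terms})")
    return " AND ".join(groups)


search_string = build_search_string(keywords)
print(search_string)
```

Individual digital libraries may require adapting this generic boolean form to their own query syntax.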

Table 3

Inclusion and exclusion criteria

Inclusion criteria
Papers presenting hybrid recommender systems, algorithms, approaches, etc.
Papers that, even though they do not specifically present hybrid RSs, provide recommendations combining different data mining techniques.
Papers from conferences and journals
Papers published from 2005 to 2015
Papers written in English language only
Exclusion criteria
Papers not addressing recommender systems at all
Papers addressing RSs but not implying any hybridization or combination of different approaches or data mining techniques.
Papers that report only abstracts or slides of presentation, lacking detailed information
Grey literature

2.2 Selection of papers

Following Step 4 of the protocol, we applied the search string in the search engines of the five digital libraries and found 9673 preliminary primary studies (see Table 4). The digital libraries return different numbers of papers because of the dissimilar filtering settings they use in their search engines. This retrieval process was conducted during May 2015. To objectively decide whether to select each preliminary primary study for further processing or not, we defined a set of inclusion/exclusion criteria listed in Table 3. The inclusion/exclusion criteria serve as a basis for concentrating on the most relevant studies with which to achieve the objectives of the review. Duplicate papers were removed and a coarse selection phase followed. Processing all of them strictly was not practical. Therefore we decided to include journal and conference papers only, leaving out gray literature, workshop presentations or papers that report abstracts or presentation slides. We initially analyzed title, publication year and publication type (journal, conference, workshop, etc.). In many cases the abstract or even further parts of a paper were examined to decide whether to keep it or not. Our focus in this review work is on hybrid recommender systems. Thus we selected papers presenting mixed or combined RSs, dropping any paper addressing single recommendation strategies or papers not addressing RSs at all. Hybrid RSs represent a somewhat newer family of recommender systems compared to other well known and widely used families such as CF or CBF. Therefore the last decade (2005–2015) was considered an appropriate publication period. Using the inclusion/exclusion criteria and this coarse selection step we arrived at a list of 240 papers. In the next step we performed a more detailed analysis and selection of the papers, reviewing the abstract and other parts of every paper. Besides relevance based on the inclusion/exclusion criteria, completeness (in terms of problem definition, description of the proposed method/technique/algorithm and evaluation of results) of each study was also taken into account. Finally we arrived at our set of 76 included papers. The full list is presented in the Appendix together with the publication details.

Table 4
Number of papers after each selection step

	Number of papers at the end of step:
Digital source	Search and retrieval	Coarse selection	Detailed selection
SpringerLink	4152	50	13
Scopus	3582	27	9
ACM digital library	1012	53	13
Science direct	484	35	12
IEEExplore	443	75	29
Total	9673	240	76

Table 5

Quality assessment questions

Quality question	Score	Weight
QQ1. Did the study clearly describe the problems that it is addressing?	yes/partly/no (1/0.5/0)	1
QQ2. Did the study review the related work for the problems?	yes/partly/no (1/0.5/0)	0.5
QQ3. Did the study recommend any further research?	yes/partly/no (1/0.5/0)	0.5
QQ4. Did the study describe the components or architecture of the proposed system?	yes/partly/no (1/0.5/0)	1.5
QQ5. Did the study provide an empirical evaluation?	yes/partly/no (1/0.5/0)	1.5
QQ6. Did the study present a clear statement of findings?	yes/partly/no (1/0.5/0)	1

2.3 Quality assessment

We also defined 6 questions, listed in Table 5, for the quality estimation of the selected studies. Each question receives a score of 0, 0.5 or 1, representing the answers “no”, “partly” and “yes” respectively. The questions we defined do not reflect an equal level of importance in the overall quality of the studies. For this reason we decided to weight them with coefficients of 0.5 (low importance), 1 (medium importance) and 1.5 (high importance). We set a higher weight to the quality questions that address the components/architecture of the system/solution (QQ4) and the empirical evaluation (QQ5). Quality questions that address problem description (QQ1) and statement of results (QQ6) got medium importance. We set a low importance weight to the two questions that address the related studies (QQ2) and future work (QQ3). The papers were split into two disjoint subsets. Each subset of papers was evaluated by one of the authors. In cases of indecision the quality score was set after a discussion between the authors. At the end, the final weighted quality score of each study was computed using the following formula:

$\displaystyle\textit{score}=\frac{1}{6}\sum_{i=1}^{6}w_{i}\,v_{i}$

$w_{i}$ is the weight of question $i$ (0.5, 1, 1.5), $v_{i}$ is the vote for question $i$ (0, 0.5, 1).
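
The scoring rule can be sketched in a few lines. The weights below are those of Table 5; since they sum to 6, a study answering “yes” to every question obtains the maximum score of 1. The function and variable names are assumptions made for this illustration.

```python
# Weights of QQ1..QQ6 as given in Table 5; they sum to 6.
WEIGHTS = [1.0, 0.5, 0.5, 1.5, 1.5, 1.0]

# Vote values for the three possible answers.
VOTES = {"yes": 1.0, "partly": 0.5, "no": 0.0}


def quality_score(answers):
    """Weighted quality score: sum of w_i * v_i over the 6 questions, divided by 6."""
    if len(answers) != len(WEIGHTS):
        raise ValueError("expected one answer per quality question")
    return sum(w * VOTES[a] for w, a in zip(WEIGHTS, answers)) / 6


# Hypothetical study: everything answered "yes" except QQ3 (future work).
example = quality_score(["yes", "yes", "no", "yes", "yes", "yes"])
```

For the hypothetical study above the score is 5.5/6, roughly 0.92, because the missing future-work discussion only costs the low weight of 0.5.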

After this evaluation, cross-checking of the assessment was done on arbitrary studies (about 40% of included papers) by the second author. At the end, an agreement on differences was reached by discussion.

Table 6
Data extraction form

Extracted data	Explanation	RQ
ID	A unique identifier of the form Pxx we set to each paper	–
Title	–	RQ1
Authors	–	–
Publication year	–	RQ1
Conference year	–	–
Volume	Volume of the journal	–
Location	Location of the conference	–
Source	Digital library from which was retrieved	–
Publisher	–	–
Examiner	Name of person who performed data extraction	–
Participants	Study participants like students, academics, etc.	–
Goals	Work objectives	–
Application domain	Domain in which the study is applied	RQ5
Approach	Hybrid recommendation approach applied	RQ3b
Contribution	Contribution of the research work	–
Dataset	Public dataset used to train and evaluate the algorithm	RQ6c
DM techniques	Data mining techniques used	RQ3a
Evaluation methodology	Methodology used to evaluate the RS	RQ6a
Evaluated characteristic	RS characteristics evaluated	RQ6b
Future work	Suggested future works	RQ7
Hybrid class	Class of hybrid RS	RQ4
Research problem	–	RQ2
Score	Overall weighted quality score	–
Other information	–	–

2.4 Data extraction

Data extraction was carried out on the final set of selected primary studies. We collected both paper meta-data (e.g., author, title, year) and content data important to answer our research questions, like problems, application domains, etc. Table 6 presents our data extraction form. In the first column we list the extracted data, in the second column we provide an explanation for some of the extracted data which may seem unclear, and in the third column the research question to which the data is related. All the extracted information was stored in NVivo3, which was used to manage the data extraction and synthesis process. NVivo is a data analysis software tool that helps in automating the identification and the labeling of the initial segments of text from the selected studies.

2.5 Synthesis

For the synthesis step we followed the Cruzes and Dyba methodology for thematic synthesis [19]. Their methodology uses the concept of codes, which are labeled segments of text, to organize and aggregate the extracted information. Following the methodology we defined some initial codes which reflected the research questions. Some examples include the first research problems found, hybrid recommendation classes, first application domains, data mining techniques, recommendation approaches and evaluation methodologies. After completing the reading we refined each of the initial codes into more precise sub-codes (leaf nodes in NVivo) which were even closer to the content of the selected papers, covering all the problems found, all the datasets used, and similar detailed data. We finished assigning codes to all the highlighted text segments of the papers and then the codes were aggregated in themes (of different levels if necessary) by which the papers were grouped. Afterwards a model of higher-order themes was created to have an overall picture. The research questions were mapped to the corresponding themes. Finally, the extracted data were summarized in categories which are reported in the results section (in pictures or tables) associated with the research questions they belong to.

3. Results

In this section we present the results we found from the selected studies to answer each research question. We illustrate the different categories of problems, techniques, hybridization classes, evaluation methodologies, etc. with examples from the included studies. The results are further discussed in the next section.

3.1 RQ1: Included studies

Figure 2.

Distribution of studies per publication year.

RQ1 addresses the most relevant studies that present Hybrid RSs. We selected 76 papers as the final ones for further processing. They were published in conference proceedings and journals from 2005 to 2015. The publication year distribution of the papers is presented in Fig. 2. It shows that most of the hybrid RS papers we selected were published in the last 5 years.

For the quality assessment process we used the quality questions listed in Table 5. In Fig. 3, the box plots of quality score distributions per study type (conference or journal) are shown. We see that about 75% of journal studies have a quality score higher than 0.9. The same is true for about 35% of conference studies. In Fig. 4 we present the average quality score for each quality question. QQ4 (Did the study describe the components or architecture of the proposed system?) has the highest average score (0.947) whereas QQ3 (Did the study recommend any further research?) has the lowest (0.651). The weighted quality score is higher than 0.81 for any included paper. Only one journal study got a weighted average score of 1.0 (highest possible).

Figure 3.

Boxplot of quality score per publication type.

Figure 4.

Average score of each quality question.

Figure 5.

Addressed problems.

3.2 RQ2: Research problems

To answer RQ2 we summarize the most important RS problems the studies try to solve. A total of 12 problems were found. The most frequent are presented in Fig. 5 with the corresponding number of studies where they appear. Studies may (and often do) address more than one problem. The same applies to the other results (data mining techniques, domains, evaluation metrics, etc.) reported in this section. Below we describe each of the problems:

Cold-start: This problem is heavily addressed in the literature [20, 21] and has to do with recommendations for new users or items. In the case of new users the system has no information about their preferences and thus fails to recommend anything to them. In the case of new items the system has no ratings for these items and doesn’t know to whom to recommend them. To alleviate cold-start, the authors of one study use a probabilistic model to extract latent features from the item’s representation. Using the latent features they generate accurate pseudo ratings, even in cold-start situations when few or no ratings are provided. In another example the authors try to solve the new user cold-start in the e-learning domain by combining CF with a CBF representation of learning contents. The cold-start problem is also treated in a study where the authors merge the weighted outputs of different recommendation strategies using Ordered Weighted Averaging (OWA), a mathematical technique first introduced in [22]. In total, cold-start was found in 23 studies.

Data sparsity: This problem arises from the fact that users usually rate a very limited number of the available items, especially when the catalog is very large. The result is a sparse user-item rating matrix with insufficient data for identifying similar users or items, negatively impacting the quality of the recommendations. Data sparsity is prevalent in CF RSs which rely on peer feedback to provide recommendations. In one study, data sparsity of cross-domain recommendations is solved using a factorization model of the triadic relation user-item-domain. In another we find an attempt to solve data sparsity by treating each user-item rating as a predictor of other missing ratings. The authors estimate the final ratings by merging ratings of the same item by other users, different item ratings made by the same user and ratings of other similar users on other similar items. A further example combines CF with Naive Bayes in a switching way. Data sparsity was a research problem of 22 studies.

Accuracy: Recommendation accuracy is the ability of a RS to correctly predict the item preferences of each user. Much attention has been paid to improving recommendation accuracy since the dawn of RSs, and there is still room for improvement. This is especially true in data sparsity situations, as accuracy and data sparsity are two problems that appear together in 6 studies. In one study a Bayesian network model with user nodes, item nodes, and feature nodes is used to combine CF with CBF and attain better recommendation quality. In another example a web content RS is constructed. The authors construct the user’s long term interest based on his/her navigation history. Then the similarity of the user’s profile with website content is computed to decide whether to suggest the website or not. Experiments conducted with news websites show improved accuracy results. Improving accuracy was a research objective of 16 studies.

Scalability: This is a difficult-to-attain characteristic which is related to the number of users and items the system is designed to work for. A system designed to recommend few items to some hundreds of users will probably fail to recommend hundreds of items to millions of people, unless it is designed to be highly scalable. Hyred is an example of a system designed to be scalable and to overcome the data sparsity problem as well. The authors combine a modified Pearson correlation CF with distance-to-boundary CBF. They find the nearest and furthest neighbors of each user to reduce the dataset. The use of this compressed dataset improves scalability, alleviates sparsity, and also slightly reduces the computational time of the system. In another study the authors propose a hybrid RS designed to recommend images in social networks. They involve CF and CBF in a weighted way and also consider aesthetic characteristics of images for better filtering, which overcomes the problems of scalability and cold-start as well. In a further study a system with better scalability is conceived by combining Naive Bayes and SVM with CF. Improving scalability was addressed in 11 studies.

Diversity: This is a desired characteristic that has been getting attention recently [23]. Having diverse recommendations is important as it helps to avoid the popularity bias. The latter means having a recommendation list with items very similar to each other (e.g., showing all the episodes of a very popular saga). A user that is not interested in one of them is probably not interested in any of them and gets no value from that recommendation list. K-Furthest Neighbors, the inverted neighborhood model of K-NN, is used in one study for the purpose of creating more diverse recommendations. The authors report an increased diversity. However, the user study they conduct shows that its perceived usefulness does not differ from that of traditional CF. In another study the concept of experts is utilized to find novel and relevant items to recommend. The ratings of users are analyzed and some of the users are promoted as “experts” of a certain taste. The experts then generate recommendations for the rest of the “normal” users in that item taste. In total, diversity was addressed in 3 studies.

Other: These are other problems appearing in few studies. They include Lack of Personalization, Privacy Preserving, Noise Reduction, Data source Integration, Lack of Novelty and User preference Adaptiveness.
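
The OWA operator mentioned in the cold-start discussion can be sketched as follows. In its standard definition the input values are first sorted in descending order and then combined with a weight vector that sums to 1, so the weights attach to rank positions rather than to particular sources. How the reviewed study sets the weights or maps recommender outputs to inputs is not described here, so the scores, the weights and the function name below are purely hypothetical.

```python
def owa(values, weights):
    """Ordered Weighted Averaging: weights are applied to the values
    after sorting them in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical normalized scores that three strategies assign to one item.
scores = {"cf": 0.9, "cbf": 0.4, "df": 0.6}

# Hypothetical weights: the highest score counts most, the lowest least.
weights = [0.5, 0.3, 0.2]

combined = owa(list(scores.values()), weights)
```

With the weight vector (1, 0, 0) the operator reduces to the maximum, with (0, 0, 1) to the minimum and with equal weights to the arithmetic mean, which is why it is convenient for tuning how optimistic the merged score should be.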

Table 7
Distribution of studies by DM/ML techniques

DM/ML technique	Studies
K-NN	59
Clustering	34
Association rules	17
Fuzzy logic	14
Matrix manipulation	9
Other	19

3.3 RQ3a: Data mining and machine learning techniques

In this section we address the distribution of the studies according to the basic Data Mining (DM) and Machine Learning (ML) techniques they use to build their hybrid RSs. The variety of DM and ML techniques or algorithms used is high. Authors typically use different techniques to build the diverse components of their solutions or prototypes. In Table 7 we present the most frequent ones found in the included studies. Below we describe some of them. More details about the characteristics of DM/ML techniques and how they are utilized to build RSs can be found in [24].

K-NN. K-Nearest Neighbors is a well-known classification algorithm with several versions and implementations, widely utilized in data mining and other applications. This technique is popular among collaborative filtering RSs, which represent the most common family of recommenders. It is mostly utilized to analyze the neighborhood and find users with similar profiles, or to analyze the item catalog and find items with similar characteristics. K-NN was found in a total of 59 studies.

Clustering. There are various clustering algorithms used in RSs and other data mining applications. They typically try to establish a set of categories with which data can be identified. The most popular is K-means, which partitions the entire data into K clusters. In RSs clustering is mostly applied to preprocess the data. In one study the authors experiment with K-way (similar to K-means) clustering and Bisecting K-means for grouping different types of learning items. They also use CBF to create learners’ profiles and build an e-learning recommender with improved accuracy. In another example, websites are clustered using the co-occurrence of pages and the content data of pages. The results are aggregated to get the final recommendations and overcome data sparsity. In total clustering algorithms were used in 34 studies.

Association rules. Association rule mining tries to discover valuable relations (association rules) in large databases. These associations are of the form X => Y, where X and Y are sets of items. The associations that are above a minimum level of support with an acceptable level of confidence can be used to derive certain conclusions. In recommender systems these conclusions are of the form “X likes Y”, where X is a user to whom the system can recommend item Y. In one study, information collected from a discussion group is mined and association rules are used to form the user similarity neighborhood. Word Sense Disambiguation is also used to select the appropriate semantically related concepts from posts, which are then recommended to the appropriate users of the forum. This hybrid alleviates different problems such as cold-start, data sparsity and scalability. In another study, classification based on association methods is applied to build a RS in the domain of tourism. The system is more resistant to cold-start and sparsity problems. To overcome cold-start, the authors of a further study propose a procedure for finding similar items by association rules. Their algorithm considers the user-item matrix as a transaction database where the user ID is the transaction ID. They find the support of each item and keep items with support greater than a threshold. Afterwards, they calculate the confidence of the remaining rules and the rule scores, by which they find the most similar item to any of the items. Association rules were found in 17 studies.

Fuzzy logic. Also called fuzzy set theory, it is a set of mathematical methods that can be used to build hybrid RSs. Those methods are also called reclusive in the literature. Contrary to CF, which relies on neighborhood preferences without considering item characteristics, they require some representation of the recommended items [25]. Reclusive methods are complementary to collaborative methods and are often combined with them to form hybrid RSs. In one example of using fuzzy logic, better accuracy is achieved by combining two CFs with a fuzzy inference system in a weighted way to recommend learning web resources. In another, fuzzy clustering is used to integrate user profiles retrieved by a CF with Point Of Interest (POI) data retrieved from a context-aware recommender. The system is used in the domain of tourism and provides improved accuracy. In total fuzzy logic was found in 14 studies.

Matrix manipulation. Here we put together the different methods and algorithms that are based on matrix operations. The methods we identified are Singular Value Decomposition (SVD), Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), Dimensionality Reduction and similar matrix factorization techniques. Matrix manipulation methods are often used to build low-error collaborative RSs and were especially promoted after the Netflix challenge was launched in 2006. In one study a topic model based on LDA is used to learn the probability that a user rates an item. In another example, Dimensionality Reduction is used to solve sparsity and scalability in a multi-criteria CF. These methods were found in 9 studies.

Other. Other less frequent techniques such as Genetic Algorithms, Naive Bayes, Neural Networks, Notion of Experts, Statistical Modeling, etc. were found in 19 papers.
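The association-rule procedure for finding similar items described above (items filtered by support, then rules scored by confidence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited study: the toy `transactions` data, the function name `most_similar_items` and the `min_support` value are hypothetical, and confidence alone stands in for the rule score.

```python
from collections import defaultdict
from itertools import combinations

def most_similar_items(transactions, min_support=0.4):
    """For each frequent item, return the item it most confidently implies.

    transactions: dict mapping user id -> set of items (one transaction per user).
    min_support: hypothetical threshold on the fraction of users holding an item.
    """
    n = len(transactions)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in transactions.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    # Keep only items whose support exceeds the threshold.
    frequent = {i for i, c in item_count.items() if c / n > min_support}

    best = {}
    for (a, b), c in pair_count.items():
        if a not in frequent or b not in frequent:
            continue
        # confidence(x => y) = count(x and y) / count(x)
        for x, y in ((a, b), (b, a)):
            conf = c / item_count[x]
            if x not in best or conf > best[x][1]:
                best[x] = (y, conf)
    return best

# Toy user-item data: the user id plays the role of the transaction id.
transactions = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"A", "B"},
    "u4": {"C", "D"},
}
similar = most_similar_items(transactions, min_support=0.4)
```

In this toy data, item D is held by only one of four users and is therefore dropped before any rule is scored, while A and B imply each other with confidence 1.0.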

3.4 RQ3b: Recommendation technique combinations

In this section we present a list of the most common technique combinations that form hybrid RSs. We also present the problems each of these combinations is most frequently associated with. In the following subsections the construction and technical details of some of the prototypes implementing each combination are described. Table 8 presents the summarized results.

Table 8
Hybrid recommendation approaches distributed per problem

	Hybrid recommenders and studies
Problem	CF-X	CF-CBF	CF-CBF-X	IICF-UUCF	CBF-X	Other
Cold-start	2	3	2	1	1	5
Data sparsity	0	5	3	3	4	6
Accuracy	2	3	0	2	2	4
Scalability	0	2	2	0	2	2
Diversity	2	0	0	0	0	1
Other	0	2	1	1	1	2
Total	6	15	8	7	10	20

3.4.1 CF-X

Here we report studies that combine CF with one other technique which is not CBF (those are counted as CF-CBF). In one example of this combination the authors go hybrid to improve the performance of a multi-criteria recommender. They base their solution on the assumption that usually only a few selection criteria impact user preferences about items and their corresponding ratings. Clustering is used first to group users based on the item criteria they prefer. CF is then used within each cluster of similar users to predict the ratings. They illustrate their method by recommending hotels from TripAdvisor4 and report performance improvements over traditional CF. Another attempt to improve the predictive accuracy of traditional CF integrates into CF discrete demographic data about the users such as gender, age, occupation, etc. Fuzzy logic is used to compute similarities between users from this extra demographic data and to integrate them with the user-based similarities calculated from the ratings history. After calculating the final user similarities, the algorithm predicts the rating values. The extra performance gained from the better user similarities comes at the cost of a slightly larger, but still acceptable, computational time. In total the CF-X combination was found in 6 studies, with X being KBF, DF or a DM/ML technique from those listed in Table 6.
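The idea of integrating demographic similarity with rating-based similarity can be illustrated with a short sketch. It is not the method of the cited study: a crisp attribute-matching ratio is swapped in for the fuzzy similarity, and the function names, the sample users and the blending weight `alpha` are all hypothetical.

```python
def demographic_similarity(a, b):
    """Fraction of shared demographic attributes on which two users agree.

    A crisp stand-in for the fuzzy similarity used in the study.
    """
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)

def blended_similarity(rating_sim, demo_sim, alpha=0.7):
    """Linear blend; alpha is a hypothetical weight on the rating-based part."""
    return alpha * rating_sim + (1 - alpha) * demo_sim

alice = {"gender": "F", "age_group": "25-34", "occupation": "engineer"}
bob = {"gender": "M", "age_group": "25-34", "occupation": "engineer"}

demo = demographic_similarity(alice, bob)  # two of three attributes match
final = blended_similarity(rating_sim=0.4, demo_sim=demo, alpha=0.7)
```

The blended value would then replace the plain rating-based similarity when neighbors are selected and ratings are predicted.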

3.4.2 CF-CBF

This is a very popular hybrid RS utilizing the two most successful recommendation strategies. In many cases the recommendations of both systems are weighted to produce the final list of predictions. In other cases the hybrid RS switches from CF to CBF or is made up of a more complex type of combination (see Section 3.5). In one example the authors develop a hybrid RS suitable for working with high volumes of data, which solves scalability problems in e-commerce systems. Their solution first applies CF (Pearson’s product moment coefficients) to reduce the dataset by finding the nearest neighbors of each user and discarding the rest. Afterwards distance-to-boundary CBF is used to define the decision boundary of items purchased by the target user. The final step combines the CF score (correlation coefficient between two customers) with the distance-to-boundary score (distance between the decision boundary and each item) in a weighted linear form. The authors report an improved accuracy of their hybrid RS working on the reduced dataset, compared to other existing algorithms that use full datasets.

In another study the authors propose a CF-CBF hybrid recommender based on Bayesian networks. The model they build uses probabilistic reasoning to compute the probability distribution over the expected rating. The weight of each recommending strategy (CF and CBF) is automatically selected, adapting the model to the specific conditions of the problem (it can be applied to various domains). The authors demonstrate that their combination of CF and CBF improves the recommendation accuracy. Other studies involve similar mathematical models or constructs (e.g., fuzzy logic) to put together CF and CBF and gain performance or other benefits. In total CF-CBF contributions were found in 15 studies.

3.4.3 CF-CBF-X

Those are cases in which CF and CBF are combined together with a third approach. In one example CF and CBF are combined with DF to generate recommendations for groups of similar profiles (users). This kind of recommendation is particularly useful in online social networks (e.g., for advertising). The goal of the authors is to provide good recommendations in data sparsity situations. First CBF is used to analyse ratings and items’ attributes. CF is then invoked as the second stage of the cascade to generate the group recommendations. DF is used to reinforce CF in the cases of sparse profiles (users with few ratings). In total CF-CBF-X was found in 8 studies. X is mostly a clustering technique or DF.

3.4.4 IICF-UUCF

Item-Item CF and User-User CF are two forms of CF recommenders, differing in the way the neighborhoods are formed. Some studies combine both of them to improve overall CF performance. In one example the authors present a hybrid recommendation framework they call Collaborative Filtering Topic Model (CFTM), which considers both users’ reviews and ratings about items of a certain topic (or domain) in e-commerce. The first stage, which is offline, performs sentiment analysis on the reviews to calculate the user or item similarity. The second stage of the cascade uses IICF or UUCF (switching) to predict the ratings. The authors evaluate using 6 datasets of different domains from Amazon and report that their hybrid approach performs better than traditional CF, especially in sparsity situations. IICF-UUCF combinations were found in 7 studies.

3.4.5 CBF-X

There were also 10 studies in which CBF is combined with another technique X which is not CF (those are counted as CF-CBF). X represents different approaches like KBF and DF, or DM/ML techniques like clustering. In one example the authors describe and use the interesting notion of user lifestyle. They select demographic information, consumer credit data and TV program preferences as lifestyle indicators, and confirm their significance by performing statistical analysis on 502 users. The most significant lifestyle attributes are binary encoded and used to form the neighborhoods and ratings of each user by means of Pearson correlation. The authors call the resulting complete (in terms of ratings) matrix the pseudoUser-item matrix. It is then used for a Pearson-based (classical CF) prediction of the original user-item ratings. Considerable performance improvements are reported.
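The final step mentioned above, a Pearson-based classical CF prediction, can be sketched as follows. The sketch covers only that step, using the textbook mean-centered weighted sum; the lifestyle encoding and the construction of the pseudo matrix are not reproduced, and the function names and toy ratings are hypothetical.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation over the items both users rated."""
    common = sorted(u.keys() & v.keys())
    if len(common) < 2:
        return 0.0
    mu_u = sum(u[i] for i in common) / len(common)
    mu_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu_u) * (v[i] - mu_v) for i in common)
    den = sqrt(sum((u[i] - mu_u) ** 2 for i in common)) * sqrt(
        sum((v[i] - mu_v) ** 2 for i in common)
    )
    return num / den if den else 0.0

def predict(target, neighbors, item):
    """Mean-centered weighted sum of neighbor ratings (classical user-based CF)."""
    mean_t = sum(target.values()) / len(target)
    num = den = 0.0
    for other in neighbors:
        if item not in other:
            continue
        w = pearson(target, other)
        mean_o = sum(other.values()) / len(other)
        num += w * (other[item] - mean_o)
        den += abs(w)
    return mean_t + num / den if den else mean_t

target = {"i1": 5, "i2": 3, "i3": 4}
neighbor = {"i1": 4, "i2": 2, "i3": 3, "i4": 5}
estimate = predict(target, [neighbor], "i4")
```

Here the neighbor rates every shared item exactly one point lower than the target, so the correlation is 1 and the prediction adds the neighbor's deviation on `i4` to the target's own mean rating.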

3.4.6 Other

Other implementations include combinations of the same recommendation strategy (e.g., CF1-CF2 with different similarity measures or tuning parameters each), trust-aware recommenders that are being used in social communities, prototypes using association rules mining, neural networks, genetic algorithms, dimensionality reduction, social tagging, semantic ontologies, pattern mining or different machine learning classifiers.

3.5 RQ4: Classes of hybridization

To answer RQ4 we classified the examined hybrid RSs according to the taxonomy proposed by Burke [12]. This taxonomy categorizes hybrid RSs in 7 classes based on the way the different recommendation techniques are aggregated with each other. Each class is explained in the subsections below, where we discuss in more detail a few examples from the included papers. The results are summarized in Fig. 6.

Figure 6.

Distribution of studies per hybridization class.

3.5.1 Weighted

Weighted hybrids were the most frequent. They compute the scores of the items they recommend by aggregating the output scores of each recommendation technique using weighted linear functions. One of the first weighted recommenders was P-Tango [26], which combined CF and CBF rating scores in a linear weighted way to recommend online newspapers. In P-Tango, aggregation was made by giving equal initial weights to each score and then possibly adapting them according to user feedback. The weights of CF and CBF are set on a per-user basis, enabling the system to determine the optimal mix for each user and alleviating the “gray sheep” problem. In another study the authors propose a weighting method for combining user-user, user-tag and user-item CF relations in social media. The method computes the final rating score of an item for a user as the linear combination of the above three CF relations. Unlike traditional CF, this weighted hybrid CF recommender is completely based on tags and does not require that users provide explicit rating scores for the items that are recommended (e.g., photos). In another example the authors combine a content-based model with a rule-based model to recommend e-learning materials. They build their CBF using an education domain ontology and compute the scores of each learning material using the Vector Space Model and TF-IDF. The rule-based recommender utilizes the ontology and the user’s previously visited concepts to realize a semantic mapping between the user’s query and his/her semantic profile, resulting in adequate term recommendations about learning materials. The two RS modules set different weights to each recommended item based on the user’s preferences, and higher accuracy is achieved. The benefit of a weighted hybrid is that it uses a straightforward way to combine the results of each involved technique. It is also easy to adjust the priority of each involved strategy by changing the weights. This class of hybrid RS was used in 22 (28.9%) of the included studies.
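A weighted hybrid of the kind described above can be sketched in a few lines. This is an illustrative sketch rather than any reviewed system: the function name `weighted_hybrid`, the recommender labels, the toy scores and the weights are hypothetical, and all scores are assumed to lie on a common scale.

```python
def weighted_hybrid(score_lists, weights):
    """Combine per-item scores from several recommenders linearly.

    score_lists: dict recommender name -> dict item -> score (common scale assumed).
    weights: dict recommender name -> weight.
    Returns (item, combined score) pairs, best first.
    """
    combined = {}
    for name, scores in score_lists.items():
        for item, s in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

scores = {
    "cf": {"a": 0.9, "b": 0.2, "c": 0.6},
    "cbf": {"a": 0.3, "b": 0.8, "c": 0.4},
}
cf_heavy = weighted_hybrid(scores, {"cf": 0.75, "cbf": 0.25})
cbf_heavy = weighted_hybrid(scores, {"cf": 0.25, "cbf": 0.75})
```

Changing only the weights moves item `b` from last place under the CF-heavy setting to first place under the CBF-heavy one, which is the kind of priority adjustment (for example per user) that this class makes easy.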

3.5.2 Feature combination

This type of hybrid RS treats one recommender’s output as additional feature data, and uses the other recommender (usually content-based, which makes extensive use of item features) over the new extended data. In the case of a CF-CBF hybrid, the system does not exclusively rely on the collaborative data output of CF. That output is considered as additional data for the CBF, which generates the final list. This reduces the sensitivity to possible sparsity of the initial data. For example, in one study the authors present a CF-CBF book recommender which implements an extended feature combination strategy. In the first phase new features (preferred books) are generated by applying CF among the readers. In the second phase they apply fuzzy c-means clustering and type-2 fuzzy logic to the obtained data to create book categories for each user type (teacher, researcher, student). In the third and final phase CBF is involved to recommend the most relevant books to each user. The authors report performance improvements both in MAE and F1 accuracy scores. In another study the authors build an information system about courses and study materials for scholars. The system invokes a web crawler to collect related web pages and classifies the obtained results in different item categories (websites, courses, academic activities) using a web page classifier supported by a school ontology. An information extractor is later invoked to get significant web page features. Finally the system operates on the extra features of each item category to produce integrated recommendations based on the order of the keyword weight of each item. System verification reports higher recommendation quality and reliability. Feature combination hybrids were found in 12 (15.8%) studies.

3.5.3 Cascade

Cascade hybrids are examples of a staged recommendation process. First one technique is employed to generate a coarse ranking of candidate items and then a second technique refines the list from the preliminary candidate set. Cascades are order-sensitive; a CF-CBF would certainly produce different results from a CBF-CF. One example is a mobile music cascade recommender combining SVM genre classification with collaborative user personality diagnosis. The first level of the recommendation process consists of a multi-class SVM classifier of songs based on their genre. The second level is a personality diagnosis which assumes that user preferences for songs constitute a characterization of their underlying personality. The personality type of each user is assumed to be the vector of ratings of the items the user has seen. The personality diagnosis approach estimates the probability that each active user is of the same personality type as other users. As a result the probability that an active user will like new songs is computed in a more personalized way.

In another study the authors combine two CF systems with different properties. The first module is responsible for retrieving the data and generating the list of neighbors for each user. This module uses two distance measures, Pearson’s coefficient and Euclidean distance, in a switching way, depending on the user’s deviation from his/her average rating. The authors report that Euclidean distance performs better than Pearson’s coefficient in most of the cases. In the second module of the cascade, they experiment with switching between three predictors to generate the final recommendations: Bayesian estimator, Pearson’s weighted sum and adjusted weighted sum. They also report that the Bayesian prediction gives the best results. Another example of a cascade hybrid implements a cascade of item-based CF and Sequential Pattern Mining (SPM) to recommend items in an e-learning environment. To adapt the CF to the e-learning domain the authors introduce a damping function which decreases the importance of “old” ratings. The SPM module takes in a list of the k most similar items for each item and determines its support. At the end it prunes the items with support less than the threshold and generates the recommended items. The authors also apply this recommender in P2P learning environments for resource pre-fetching. Cascade hybrids were found in 8 (10.5%) studies.
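The staged, order-sensitive nature of a cascade can be made concrete with a generic sketch. It does not reproduce any of the systems above: the function name `cascade`, the two toy scoring tables and the shortlist size `k` are hypothetical.

```python
def cascade(items, coarse_score, refine_score, k):
    """Two-stage cascade: stage 1 shortlists k candidates, stage 2 re-ranks only those."""
    shortlist = sorted(items, key=coarse_score, reverse=True)[:k]
    return sorted(shortlist, key=refine_score, reverse=True)

# Hypothetical scores from two different techniques for the same four items.
first = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
second = {"a": 0.2, "b": 0.6, "c": 0.4, "d": 0.99}

forward = cascade(first, first.get, second.get, k=3)
reverse = cascade(first, second.get, first.get, k=3)
```

Item `d` has the best score under the second technique but never reaches it in the forward order because the first stage already discarded it. Swapping the stages yields a different final list, which is the order sensitivity noted above.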

3.5.4 Switching

In a switching hybrid the system switches between different recommendation techniques according to some criteria. For example, a CF-CBF approach can switch to the content-based recommender only when the collaborative strategy doesn’t provide enough credible recommendations. Even different versions of the same basic strategy (e.g., CBF1-CBF2 or CF1-CF2) can be integrated in a switching form. An example is DailyLearner, an online news recommender presented in [27]. It first employs a short-term CBF recommender which considers the recently rated news stories utilizing Nearest Neighbor text classification and Vector Space Model with TF-IDF weights. If a new story has no near neighbors the system switches to the long-term model which is based on data collected over a longer time period, presenting user’s general preferences. It uses a Naive Bayes classifier to estimate the probability of news being important or not.

In another study the authors build a switching hybrid RS that is based on a Naive Bayes classifier and Item-Item CF. The classifier is trained in an offline phase and used to generate the recommendations. If these recommendations have poor confidence, the Item-Item CF recommendations are used instead. First, they compute the posterior probability of each class generated by the Naive Bayes classifier. Then they assume that the classifier’s confidence is high if the posterior probability of the predicted class is sufficiently higher than those of the other classes. Movielens and Filmtrust are employed to evaluate the approach and performance improvements are reported, both in accuracy and in coverage. In another example of a switching hybrid the authors describe the design and implementation of a mobile location-aware CF-KBF recommender of touristic sites (e.g., restaurants). Their system involves both CF and KBF modules in generating recommendations. Then 3D-GIS location data are used to compute the physical distance of the mobile user from the recommended sites. The system switches from one recommendation strategy to the other and performs a distance-based re-ranking of the recommendations, choosing the sites that are physically closer to the user with higher accuracy. In most of the cases we see that the complexity of switching RSs lies in the switching criteria, which are mostly based on distance or similarity measures. However, these systems are sensitive to the strengths and weaknesses of the composing techniques. This hybrid RS category was found in 7 (9.2%) studies.
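The confidence-based switching criterion described above (use the classifier when its top posterior is sufficiently higher than the rest, otherwise fall back to Item-Item CF) can be sketched as follows. The function name, the `margin` value and the labels are hypothetical, and the posteriors are supplied directly instead of being produced by a trained Naive Bayes model.

```python
def switching_recommend(posteriors, fallback, margin=0.2):
    """Use the classifier's prediction when it is confident, else fall back.

    posteriors: dict class label -> posterior probability (at least two classes).
    fallback: zero-argument callable returning the alternative recommendation.
    margin: hypothetical minimum gap between the two highest posteriors.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_p), (_, second_p) = ranked[0], ranked[1]
    if top_p - second_p >= margin:
        return top_label, "classifier"
    return fallback(), "item-item CF"

confident = switching_recommend({"like": 0.9, "dislike": 0.1}, lambda: "dislike")
uncertain = switching_recommend({"like": 0.55, "dislike": 0.45}, lambda: "dislike")
```

The second call falls back because the gap between the two posteriors is below the margin; in a full system the `fallback` callable would wrap the Item-Item CF predictor.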

3.5.5 Feature augmentation

In this class of hybrids, one of the combined techniques is used to produce an item prediction or classification which is then incorporated into the operation of the other recommendation technique. Feature augmentation hybrids are order-sensitive as the second technique is based on the output of the first. For example an association rules engine can generate, for any item, similar items which can be used as augmented item attributes inside a second recommender to improve its recommendations. Libra, presented in [28], is a content-based book recommender. It augments the textual features of the books with “related authors” and “related titles” data obtained from the Amazon CF recommender to obtain a better recommendation quality. Libra uses an inductive learner to create user profiles. This inductive learner is based on a vectorized bag-of-words naive Bayes text classifier. The authors report that the integrated collaborative content has a significant positive effect on recommendation performance.

Another study presents a hybrid method which combines multidimensional clustering and CF to increase recommendation diversity. The authors first invoke multidimensional clustering to collect and cluster user and item data. Clusters with similar features are deleted and the remaining feature clusters are fed into the CF module. Item-item similarity is computed using an adjusted cosine similarity which works for $m$ cluster features of each item. Finally the rating predictions are computed based on item-item similarity and the rating deviations from neighbors. The authors report an increase in recommendation diversity with only minimal loss in accuracy. Feature augmentation offers a means of improving the performance of a system (in the above examples the second recommender) without the need to modify it. The extra functionality is added by augmenting the processed data. This hybrid RS class was used in 7 (9.2%) studies.
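The principle of feature augmentation, where the output of one technique becomes extra input features for an unmodified second technique, can be sketched as follows. This is loosely inspired by the "related titles" idea and is not Libra's implementation: the item names, the `related:` tag format and the use of Jaccard similarity as the content-based measure are all assumptions.

```python
def augment(features, related):
    """Add collaborative 'related item' tags to each item's content features."""
    return {
        item: feats | {f"related:{r}" for r in related.get(item, ())}
        for item, feats in features.items()
    }

def jaccard(a, b):
    """Set overlap used here as a simple content-based similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

content = {"bookA": {"fantasy"}, "bookB": {"history"}}
# Hypothetical output of a collaborative component: both books relate to bookC.
related = {"bookA": ["bookC"], "bookB": ["bookC"]}

augmented = augment(content, related)
before = jaccard(content["bookA"], content["bookB"])
after = jaccard(augmented["bookA"], augmented["bookB"])
```

The similarity function is untouched; only its input data changes, so two books with no shared content features become comparable through the collaborative tag they have in common.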

3.5.6 Meta level

Meta levels are also an example of order-sensitive hybrid RSs that use an entire model produced by the first technique as input for the second technique. It is typical to use content-based recommenders to build item representation models, and then employ these models in collaborative recommenders to match the items with user profiles. A meta level recommendation strategy was implemented by Fab [10], one of the first website recommenders. Fab uses a selection agent which, based on the term vector model, accumulates user-specific feedback about areas of interest for each user. There are also two collection agents: search agents which perform a search for websites, and index agents which construct queries for already found websites to avoid duplicate work. Collection agents utilize the models of the users (collaborative component) to collect the most relevant websites, which are then recommended to the users.

Another study presents a meta level recommender used in the domain of music which integrates CF with CBF. Here each user is stochastically matched with a music genre based on the collaborative output. Then the system generates a musical piece for the user based on the acoustic features. For the integration the authors adopt a probabilistic generative model called the three-way aspect model. As this model is only used for textual analysis and indexing (bag-of-words representation), they propose the bag-of-timbres model, an interesting approach to content-based music recommendations which represents each musical piece as a set of polyphonic timbres. The advantage of this hybridization class is that the learned model of the first technique is compressed and thus better used by the second. However, the integration effort is considerable and the use of advanced constructs is often required. This hybrid RS class was found in 7 (9.2%) studies.

3.5.7 Mixed

Mixed hybrids represent the simplest form of hybridization and are reasonable when it is possible to put together a high number of different recommenders simultaneously. Here the generated item lists of each technique are added to produce a final list of recommended items. One of the first examples of mixed hybrids was the PTV system [29], which used CBF to relate similar programs to the user profile and CF to relate similar user profiles together. The CBF module converts each user profile into a feature-based representation called a profile schema, which is basically a TV program content summary represented in features. The CF module computes the similarity of two users utilizing a graded difference metric of the ranked TV programs in each user’s profile. At the end, a selection of programs recommended by the two modules is suggested.

Yet another example of recommending TV programs is a CF-CBF mixed hybrid named queveo.tv. Here the authors use demographic information such as age, gender and profession together with the user’s history to build his/her profile, which is used by the CBF module. This module makes use of the Vector Space Model and cosine correlation to provide the recommended TV programs. The CF module uses both user-based CF to generate the top neighbors of the active user, and item-based CF to predict the level of interest of the user for a certain item. At the end the system takes recommendations from the two modules to generate the final list of TV programs. Those TV programs that were part of both listings (CBF and CF) are highlighted as Star Recommendations, as they are probably the most interesting for the user. Mixed hybrid RSs are simple and can eliminate acute problems like cold-start (new user or new item). They were found in 3 (3.9%) studies only.
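The merging step of a mixed hybrid, including the highlighting of items proposed by both modules, can be sketched as follows. This is not the queveo.tv implementation: the function name and program names are hypothetical, and placing the doubly recommended items first is an assumption made here for illustration (the source only states that they are highlighted).

```python
def mixed_hybrid(cbf_list, cf_list):
    """Merge two recommendation lists and flag items both modules proposed.

    Returns (item, is_star) pairs, where is_star marks items found in both lists.
    """
    stars = set(cbf_list) & set(cf_list)
    merged = []
    for item in cbf_list + cf_list:
        if item not in merged:
            merged.append(item)
    # Stable sort: starred items first (an assumption), original order otherwise.
    merged.sort(key=lambda item: item not in stars)
    return [(item, item in stars) for item in merged]

final_list = mixed_hybrid(["news", "film", "quiz"], ["film", "sport"])
```

Because either module alone can fill the list, a new user with no ratings still receives the content-based items, which is how this class sidesteps cold-start.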

3.6 RQ5: Application domains

A rich collection of 18 application domains was identified. Figure 7 presents the percentage of studies for each application domain. We see that most of the studies (21 or 27.6%) are domain independent, i.e., they have not been applied to a particular domain. The movie domain was considered by 17 (22.4%) studies. Next comes education or e-learning, considered by 9 (11.8%) studies. Six (7.9%) studies were applied in the domain of music. There were also web service RSs implemented in 5 (6.6%) studies. Other domains are images, touristic sites, TV programs, web pages and microposts, which appeared in 2 (2.6%) studies each. Domains like business, food, news, bibliography, etc., categorized as “Other”, account for less than 10.5% of the total number of studies.

3.7 RQ6: Evaluation

Another important aspect of hybrid RSs that we examined is the evaluation process. In this section we present results about the evaluation methodologies and the corresponding involved metrics (answering RQ6a), evaluated RS characteristics and the utilized metrics for each (answering RQ6b) and finally the public datasets used to train and test the algorithms (answering RQ6c).

3.7.1 RQ6a: Evaluation methodologies

Table 9
Evaluation methodology

Methodology	Studies
Comparison with similar method	58
User survey	14
Comparison and user survey	3
No evaluation	1

Figure 7.

Distribution of studies according to the application domains.

Here we try to explain how (with what methodologies) the evaluation process is performed and what metrics are involved in each methodology. Table 9 lists the distribution of studies according to the methodology they use to perform the evaluation. There are 58 studies (more than three-quarters) comparing the proposed system (or solution) with a similar well-known method or technique. Usually CF-X or CF-CBF hybrid RSs are compared with pure CF or CBF. In some cases the proposed system is compared with different parameter configurations of itself. Accuracy or error measures like MAE (Mean Absolute Error) or RMSE (Root Mean Square Error) are very common. They estimate the divergence of the RS predictions from the actual ratings. Decision support metrics like Precision, Recall and F1 are also very frequent. Precision is the percentage of selected items that are relevant. Recall is the percentage of relevant items that are recommended. F1 is the harmonic mean of the two. User surveys are the other evaluation methodology, utilized in 14 studies. They mainly perform subjective quality assessment of the RS and require the involvement of users who provide feedback on their perception of the system. Surveys are usually question based and reflect the opinion of users about different aspects of the hybrid recommender. In one example of a user survey the participants were 30 high school students. In another, the surveyed users are customers of a web retail store who rated products they purchased. In a third, a mix of real and simulated users is used to rate movies, books, etc.
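The error and decision support metrics named above follow their standard definitions and can be computed as in the sketch below. The function names and the sample ratings and item lists are hypothetical.

```python
from math import sqrt

def mae(predicted, actual):
    """Mean Absolute Error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Square Error; penalizes large deviations more than MAE."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_recall_f1(recommended, relevant):
    """Precision, recall and their harmonic mean for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

predicted, actual = [4, 3, 5], [5, 3, 3]
error_mae = mae(predicted, actual)
error_rmse = rmse(predicted, actual)
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "e"])
```

In this toy case two of the four recommended items are relevant (precision 0.5) and two of the three relevant items are recommended (recall about 0.67).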

Both comparisons and surveys are used in 3 studies: one where the participants were 17 males and 15 females and different versions of the system were compared with each other, one where the system was compared with CF using Movielens and the survey involved 132 participants, and one where online user profiles were utilized for the survey and the proposed fuzzy hybrid book RS was compared with traditional CF. Only one study included no evaluation at all. There the authors present a personalized hybrid recommendation framework which integrates trust-based filtering with multi-criteria CF. This framework is specifically designed for various Government-to-Business e-service recommendations. The authors leave the evaluation of their framework as future work.

Table 10

Evaluated characteristics

Recommendation characteristic	Studies
Accuracy	62
User satisfaction	10
Diversity	7
Computational complexity	6
Novelty-Serendipity	4

3.7.2 RQ6b: Characteristics and metrics

In order to address RQ6b we analyzed the recommendation characteristics the authors evaluate, and what metrics they utilize. Five characteristics were identified, listed in Table 10. The top characteristic is accuracy, measured in 62 studies. It is followed by user satisfaction, a subjective characteristic assessed in 10 studies. Diversity is about having a different list of recommended items each time the user interacts with the system. In total it was measured in 7 studies. Computational complexity of the RS is measured in 6 studies. Novelty and serendipity express the capability of the hybrid RS to recommend new or even unexpected but still relevant items to the user. They were measured in 4 studies.

We also observed the metrics that authors use for each evaluated characteristic, summarized in Table 11. Accuracy is mostly measured by means of precision (31 studies), recall (23) and F1 (14). MAE and RMSE were found in 27 and 6 studies respectively. Other less frequent metrics used to evaluate accuracy include MSE (Mean Squared Error), nDCG (normalized Discounted Cumulative Gain), AUC, etc. They were found in 15 studies. As previously mentioned, user satisfaction is measured by means of user surveys, which were found in 10 studies. They usually consist of polls which aim to get the opinion of the users about different recommendation aspects of the system. Diversity is measured mostly by coverage, which was found in 4 studies. In the other cases it is measured using ranking distances (3 studies). Execution time is the time it takes for the system to provide the recommendations and is a measure of the computational complexity. It was found in 6 studies. Novelty and serendipity are measured by lesser-known metrics such as Surprisal, Coverage in Long-Tail or Expected Popularity Complement.

Table 11
Evaluated characteristics and involved metrics

Characteristic	Metrics	Studies
Accuracy	Precision	31
	MAE	27
	Recall	23
	F1	14
	RMSE	6
	Other	15
User satisfaction	Qualitative Subjective Assessment	10
Diversity	Coverage	4
	Ranking distances	3
Complexity	Execution time	6
Novelty-Serendipity	Surprisal	2
	Coverage in Long-Tail	1
	Expected Popularity Complement	1
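Two of the less frequent metrics listed above, coverage and nDCG, can be sketched as follows. The reviewed studies may define them differently: the sketch assumes catalog coverage (the share of catalog items that appear in at least one recommendation list) and the common nDCG form with a log2 position discount, and all names and sample data are hypothetical.

```python
from math import log2

def catalog_coverage(recommendation_lists, catalog):
    """Share of catalog items that appear in at least one recommendation list."""
    recommended = set().union(*recommendation_lists)
    return len(recommended & set(catalog)) / len(catalog)

def ndcg(relevances):
    """nDCG of a ranked list, given the graded relevance of each position."""
    def dcg(rels):
        # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), etc.
        return sum(r / log2(pos + 2) for pos, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

coverage = catalog_coverage([["a", "b"], ["b", "c"]], ["a", "b", "c", "d"])
best_order = ndcg([3, 2, 1])
worst_order = ndcg([1, 2, 3])
```

A ranking that already lists items in decreasing relevance reaches the maximum nDCG of 1, while the reversed ranking scores lower.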

3.7.3 RQ6c: Datasets

We also kept track of the public datasets used by the authors to evaluate their hybrid RSs. These datasets are used by the scientific community to replicate experiments and validate or improve their techniques. There are 55 studies that use at least one public dataset, and some of them use more than one. On the other hand, 21 studies do not use any public dataset; they use synthetic data or rely on user surveys or other techniques. In Fig. 8 we present the datasets that were used and the number of studies in which they appear.

Figure 8.

Distribution of studies according to the datasets they use for evaluation.

MovieLens5, used in 26 studies, is one of the most popular public datasets in the field of RSs. It was collected and made available by GroupLens6, which is still maintaining it. EachMovie is also a movie dataset, used in 6 studies. Even though it is now retired, it was the original basis for MovieLens and has been extensively used by the RS community. FilmTrust is a movie dataset and a recommendation website that uses the concept of trust to recommend movies. It is smaller in size compared to the other movie datasets but it has the advantage of being more recent in content. FilmTrust was used in 5 studies. Yahoo-Movie is a dataset containing a subset of Yahoo Movie community preferences for movies. It also contains descriptive information about many movies released prior to November 2003. Yahoo-Movie was used in 3 studies. Last.fm7 is a music dataset crawled from the last.fm website. It contains information about some of the users’ attributes, their track preferences and the artists. Last.fm was used in 3 studies. Tripadvisor is a dataset consisting of hotel and site reviews crawled from the tripadvisor website. It is especially used to provide touristic recommendations to mobile users. Tripadvisor was used in 2 studies. Delicious8 is a dataset containing website bookmarks and tags of the form (user, tag, bookmark) shared by many users within the network. The Delicious dataset was used in 2 studies. Other less popular datasets containing different types of recommendable items were found in 16 studies.

Table 12

Future work suggestions

Future work	Studies
Extend the proposed solution	14
Perform better evaluation	11
Other	9
Add context to recommendations	8
Consider other application domains	7
Use more data or item features	7
Experiment with more or different algorithms	6
Try other hybrid recommendation class	5

3.8 RQ7: Future work

The last research question concerns future work opportunities and directions. Our findings are summarized in Table 12 and briefly explained below:

Extend the proposed solution. This is a common suggestion stated by many authors. They often identify several additional parts or components which could be added to the system to improve its performance, extend its functionality, etc. It is suggested in 14 (18.4%) studies.

Perform better evaluation. It is difficult to evaluate recommender systems. The hard part is to find the most appropriate techniques or algorithms that can be used as a benchmark. Performing a good evaluation of the proposed system increases its value and credibility. This suggestion appears in 11 (14.4%) studies.

Add context to recommendations. The authors suggest making more use of contextual data (location, time of day, etc.) revealed by mobile users. It appears in 8 (10.5%) studies.

Consider other application domains. Some of the studies apply their contributions in a certain domain. Different authors target alternative domains or propose domain-independent contributions. Considering other domains was suggested in 7 (9.2%) studies.

Use more data or item features. Some authors plan to use more data for training their algorithms or to extract and use more features of the recommended items. This has been stated in 7 (9.2%) studies.

Experiment with more or different algorithms. Some authors suggest combining different recommendation or data mining algorithms and observing the results they obtain. Sometimes they also suggest using alternative similarity measures. This has been suggested in 6 (7.9%) studies.

Try other hybridization class. Although it is not always possible, combining the applied techniques in another way could bring better results. Trying another hybridization class appeared in 5 (6.5%) studies.

Other. Other future work suggestions include applying hybrid RSs in less frequent domains or contexts, making more personalized recommendations, reducing the computational cost of the solution, improving other recommendation quality criteria (besides accuracy) like diversity or serendipity, etc.

4. Discussion

The main issues covered in this work are presented in the schematic model of Fig. 9. The issues are associated with the research question they belong to. In this section we discuss the obtained results for each research question.

Figure 9.

RQs and higher-order themes.

4.1 Selected studies

The quality evaluation results of the selected studies are presented in Figs 3 and 4. These results indicate that journal studies have lower spread and a slightly higher quality score than conference studies. The authors of [30], a systematic review of linked data-based recommender systems, report similar results. Regarding the publication year of the selected studies, we see in Fig. 2 a steady increase in hybrid RS publications. More than 76% of the included papers were published in the second half (from 2010 onwards) of the 10-year period. This high number of recent publications suggests that hybrid RSs are still a hot topic. As mentioned in the introduction, a similar increase in academic interest in RSs is also reported by other surveys like [14] or [15]. Some factors that have probably boosted the publication and development of RSs are the Netflix Prize9 (2006–2009) and the boom of social networks.

4.2 Problems and challenges

Cold-start was the most acute problem that was found. CF RSs are the most affected by cold-start as they generate recommendations relying on ratings only. Hybrid RSs try to overcome the lack of ratings by combining CF or other recommendation techniques with association rule mining or other mathematical constructs which extract and use features from items. Data sparsity is also a very frequent problem in the field of RSs. It represents a recommendation quality degradation due to the insufficient number of ratings. Hybrid approaches try to solve it by combining several matrix manipulation techniques with the basic recommendation strategies. They also try to make more use of item features, item reviews, user demographic data or other known user characteristics.
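To make the notion of data sparsity concrete, the following minimal sketch computes it as the fraction of empty cells in the user-item rating matrix. This formula is a commonly used convention and an assumption here, not a definition taken from the reviewed studies; the function name and the toy numbers are hypothetical.

```python
def rating_sparsity(num_ratings: int, num_users: int, num_items: int) -> float:
    """Return the fraction of user-item pairs that have no rating."""
    total_cells = num_users * num_items
    if total_cells == 0:
        raise ValueError("the matrix needs at least one user and one item")
    return 1.0 - num_ratings / total_cells


# Hypothetical toy matrix: 10 users, 20 items, 50 known ratings.
sparsity = rating_sparsity(num_ratings=50, num_users=10, num_items=20)
```

Under this convention, a value close to 1 means that almost all ratings are missing, which is the situation in which pure CF tends to degrade.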

Accuracy has been the most desired characteristic of RSs since their dawn, as it directly influences user satisfaction. Improving recommendation accuracy is a problem that is mostly addressed by using recommendation techniques in parallel (i.e. in the weighted or switching hybrid classes). Scalability is also an important problem which is frequently found in association with data sparsity (they appear together in 9 studies). Lack of diversity is a problem that has been addressed in few studies. As explained in [31], diversity is frequently in contradiction with accuracy. Authors usually attain higher diversity by tolerable relaxations in accuracy. In general we see that hybrid RSs try to solve the most acute problems that RSs face. In Table 13 we summarize some typical solutions for each problem, with examples from papers discussed in Sections 3.2–3.5.

Table 13
Problems and possible solutions

Problems	Possible solutions	References
Cold-Start	Use association rule mining on item or user data to find relations which can compensate the lack of ratings. Mathematical constructs for feature extraction and combination of different strategies can also be used.
Sparsity	Use the few existing ratings or certain item features to generate extra pseudo ratings. Experiment with Matrix Factorization or Dimensionality Reduction.
Accuracy	Use Fuzzy Logic or Fuzzy Clustering in association with CF. Try putting together CF with CBF using Probabilistic Models, Bayesian Networks or other mathematical constructs.
Scalability	Try to compress or reduce the datasets with Clustering or different measures of similarity.
Diversity	Try modifying neighborhood creation by relaxing similarity (possible loss in accuracy) or use the concept of experts for certain item tastes.

4.3 Techniques and combinations

As shown in Table 7, K-NN is the most popular DM technique among hybrid RSs. This result highlights the fact that K-NN CF is one of the most successful and widespread RSs. Clustering techniques are also commonly used. There are different types of clustering algorithms, with K-means being the most popular. Clustering as a process is mostly involved in preliminary phases to identify similar users, similar items, similar item features, etc. Association rules are also used to identify frequent relations between users and items. Fuzzy logic and matrix manipulation methods are also incorporated in hybrid RSs. In most cases authors combine 2 recommendation strategies. In a few cases even 3 are involved. CF-CBF is the most popular combination, commonly associated with recurrent problems like data sparsity, cold-start and accuracy. CF-CBF-X is also common. Here CF and CBF are combined and reinforced by a third technique.
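As an illustration of the K-NN CF idea mentioned above, the sketch below predicts a missing rating as the similarity-weighted mean of the ratings given by the k most similar users. It is a minimal example under stated assumptions, not the method of any particular reviewed study: cosine similarity, the function names and the toy ratings are all hypothetical choices.

```python
from math import sqrt


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def knn_predict(ratings: dict, user: str, item: str, k: int = 2):
    """Similarity-weighted mean rating over the k nearest users who rated the item."""
    candidates = [
        (cosine_similarity(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbours = sorted(candidates, reverse=True)[:k]
    weight_sum = sum(sim for sim, _ in neighbours)
    if weight_sum == 0:
        return None  # no usable neighbour: a cold-start situation
    return sum(sim * r for sim, r in neighbours) / weight_sum


# Hypothetical toy data: alice has not rated movie "m3".
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
prediction = knn_predict(ratings, "alice", "m3", k=2)
```

Because bob's ratings match alice's more closely than carol's do, the prediction lands nearer to bob's rating of "m3". When no neighbour has rated the item the function returns None, which is where a hybrid would fall back on another strategy such as CBF.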

In CF-X combinations, X is usually integrated in CF to improve its performance and usually represents fuzzy logic (reclusive methods are complementary to collaborative methods) or clustering. IICF-UUCF is also popular as it represents the combination of two basic versions of CF. In conclusion, as can be inferred from Table 8, the most common recommendation techniques (with CF being the most popular) are combined to solve the typical problems, which are cold-start, data sparsity and accuracy. It is not a surprise that CF combines with almost any other recommendation technique. Other surveys report similar results. In [32] the authors present a broad survey of CF techniques. They also conclude that most hybrid CF recommenders use CF methods in combination with content-based methods (CF-CBF is also the most frequent combination we found) or other methods to fix problems of either recommendation technique and to improve recommendation performance. CBF-X addresses problems like data sparsity, accuracy and scalability.

Other combinations put together techniques like Bayesian methods, demographic filtering, neural networks, regression, association rules mining or genetic algorithms. It is important to note that in some cases hybrid RSs are not built by combining different recommendation techniques. In those cases they represent combinations of different data sources, item or user representations, etc. embedded in a single RS. For this reason the number of the reported combinations is smaller than the number of total primary studies we analyzed.

4.4 Hybridization classes

Regarding the hybrid classes, weighted hybrid is the most popular. It often combines CF and CBF recommendations in a dynamic way (weights change over time). Feature combination is the second, putting together data from two or more sources. Cascade, switching, feature augmentation and meta-level have almost equal frequency of appearance, whereas mixed hybrid is the least common class. There is also a last category we denoted as “Other” which includes 13.2% of the studies. It was not possible for us to identify a hybridization class for these recommenders based on Burke’s taxonomy (which might also need to be extended). In some studies hybrid RSs are not combinations of two or more recommendation strategies in a certain way. They put together different data sources and item or user representations in a single strategy. In this sense, the “Other” category means “we don’t know”.

Various mathematical constructs are used as “gluing” methods between the different components of the systems, depending on the hybridization class. Weighted, Mixed, Switching and Feature Combination are order-insensitive; there is no difference between a switching CF-CBF and a switching CBF-CF. In this sense these 4 classes are easier to concatenate compared to Cascade, Feature Augmentation and Meta-level, which are inherently ordered. The few mixed systems do not need the “glue” at all as their components generate recommendations independently from each other. Our results indicate that Weighted hybrids usually rely on weighted linear functions with static or dynamic weights which are updated based on user feedback. Switching hybrids usually rely on distance/similarity measures such as Euclidean distance, Pearson correlation, Cosine similarity, etc. to decide which of the components to activate at a given time. Feature combinations usually involve fuzzy logic to match the features obtained by one module with those of the other module. Feature augmentation, Cascade and Meta-level hybrids rely on even more complex and advanced mathematical frameworks such as probabilistic modeling, Bayesian networks, etc.
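The weighted and switching “glue” described above can be sketched as follows. This is only an illustrative sketch under our own assumptions: the function names, the fixed update step, and the similarity threshold used for switching are hypothetical and are not taken from any of the reviewed studies.

```python
def weighted_hybrid(cf_score: float, cbf_score: float, w_cf: float) -> float:
    """Weighted linear combination of two component scores, with w_cf in [0, 1]."""
    return w_cf * cf_score + (1.0 - w_cf) * cbf_score


def update_weight(w_cf: float, cf_score: float, cbf_score: float,
                  actual: float, step: float = 0.1) -> float:
    """Dynamic weighting: shift weight toward the component that was closer
    to the rating the user actually gave (assumed feedback rule)."""
    cf_error = abs(actual - cf_score)
    cbf_error = abs(actual - cbf_score)
    if cf_error < cbf_error:
        w_cf += step
    elif cbf_error < cf_error:
        w_cf -= step
    return min(1.0, max(0.0, w_cf))  # keep the weight inside [0, 1]


def switching_hybrid(cf_score: float, cbf_score: float,
                     best_neighbour_similarity: float,
                     threshold: float = 0.3) -> float:
    """Switching: activate CF only when a sufficiently similar neighbour
    exists; otherwise fall back on the content-based score."""
    if best_neighbour_similarity >= threshold:
        return cf_score
    return cbf_score


# Hypothetical scores on a 1-5 rating scale.
blended = weighted_hybrid(cf_score=4.0, cbf_score=2.0, w_cf=0.5)
new_weight = update_weight(0.5, cf_score=4.0, cbf_score=2.0, actual=4.5)
switched = switching_hybrid(4.0, 2.0, best_neighbour_similarity=0.1)
```

Both functions are symmetric in their components, which reflects the order-insensitivity of these classes: swapping the roles of CF and CBF (and the weight accordingly) yields the same hybrid.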

4.5 Application domains

A rich set of application domains was found, as shown in Fig. 7. Many of the studies (more than a quarter) are domain independent. They are not limited to any particular domain, and the methods or algorithms they present can be applied in different domains with minor or no changes at all. Movies are clearly the most recommended items. This is partly because of the large amount of public and freely accessible user feedback about movie preferences (i.e. many public movie datasets on the web10), which is highly helpful. There is also a rich set of algorithms and solutions (the Netflix $1M challenge was a big motivation to improve movie recommenders). This allows researchers to train and test their recommendation algorithms easily. Education or e-learning is another domain in which hybrid RSs are gaining popularity. The amount of educational material on the web has been increasing dramatically in recent years and MOOCs (Massive Open Online Courses) are becoming very popular. Other fairly popular domains are music and web services. More detailed information about the application domains of recommender systems can be found in [33], where the authors illustrate each application domain category with real RS applications found on the web.

4.6 Evaluation

Evaluation of Recommender Systems is an essential phase which helps in choosing the right algorithm in a certain context and for a certain problem. However, as explained in [34], evaluating recommender systems is not an easy task. Certain algorithms may perform better or worse on different datasets and it is not easy to decide what metrics to combine when performing comparative evaluations. With the three research questions about evaluation, we addressed different aspects of this delicate process. Based on our results, most of the studies evaluate hybrid RSs by comparing them with similar methods. The experiments, which are usually offline, utilize accuracy or error metrics like MAE or RMSE and information retrieval metrics like precision, recall and F1. Similar results are reported in [35], where offline evaluations that typically measure accuracy are dominant. User surveys are less popular, using subjective quality assessments and occasionally precision or recall. These kinds of experiments are mostly online (i.e. users interacting with the system and answering questions) and offer more direct and credible evaluation conclusions. From the results, we see that researchers find it easier to compare their system with other systems using public data than to perform massive user surveys for a more subjective and qualitative evaluation.
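For reference, the offline metrics named above can be computed as in the following minimal sketch. The definitions are the standard textbook ones; the function names and the toy inputs are hypothetical and only serve as an illustration.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error between predicted and observed ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error; penalizes large errors more than MAE."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall and F1 of a recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical predictions and held-out ratings for three items.
error_mae = mae([4, 3, 5], [5, 3, 3])
error_rmse = rmse([4, 3, 5], [5, 3, 3])

# Hypothetical top-4 list and the items the user actually liked.
precision, recall, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
```

MAE and RMSE apply to rating prediction, whereas precision, recall and F1 apply to the recommended list itself, which is why studies often report one metric from each group.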

Regarding RS characteristics, accuracy turns out to be the most commonly evaluated characteristic of hybrid RSs. This is partly because it is easy to represent and compute by means of the various measures that exist. The most frequent metrics used to evaluate accuracy are Precision, Recall and MAE. User satisfaction (subjective recommendation quality) comes second. It is evaluated by means of user surveys. There is a lot of discussion in the literature about recommendation diversity. In [36] the authors conclude that the user’s overall liking of recommendations goes beyond accuracy and involves other factors like diversity. On the other hand, in [31] the authors agree that increasing diversity in recommendations comes at a cost in accuracy. Our results show that diversity is still less frequently evaluated. Most of the studies that try to provide diversity do so by conceding accuracy. In [23] the authors explore the use of serendipity and coverage as both characteristics and quality measures of RSs. They suggest that serendipity and coverage are designed to account for the quality and usefulness of the recommendations better than accuracy does. In our results serendipity is rarely evaluated.

It is important to note that the difference between recommendation characteristics and evaluation metrics is sometimes subtle. This is the case for coverage. Is coverage a recommendation characteristic, a recommendation metric or both? In some works like [34, 23] coverage is considered as both a characteristic and a metric. As a characteristic it reflects the usefulness of the system. The higher the coverage (more items predicted and recommended), the more useful the recommender system is for the users. In other works like [37] it is only considered as a metric with which the authors evaluate diversity, another recommendation characteristic. In the studies we considered for this review, coverage is considered both as a metric for estimating diversity and as a recommendation characteristic of the systems. Few of the studies we analyzed evaluate the computational complexity of the systems they propose by measuring the execution time. Despite the new trends, the results indicate that accuracy is still the most frequently evaluated characteristic.
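When coverage is used as a metric, one simple way to compute it is as the share of catalog items that appear in at least one recommendation list. The sketch below assumes this catalog-coverage definition, which is one of several variants found in the literature; the function name and the toy data are hypothetical.

```python
def catalog_coverage(recommendation_lists, catalog) -> float:
    """Fraction of catalog items recommended to at least one user."""
    catalog = set(catalog)
    if not catalog:
        raise ValueError("the catalog must contain at least one item")
    recommended = set()
    for items in recommendation_lists:
        recommended.update(items)
    # Ignore recommended items that are not part of the catalog.
    return len(recommended & catalog) / len(catalog)


# Hypothetical lists for two users over a five-item catalog.
coverage = catalog_coverage([["a", "b"], ["b", "c"]], ["a", "b", "c", "d", "e"])
```

A recommender that keeps suggesting the same few popular items obtains low coverage under this definition even if its accuracy is high, which is why coverage is often read as a proxy for diversity.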

We also considered the public datasets used to perform the evaluation. With the exponential growth of web content there are more and more public data and datasets which can be used to train and test new algorithms. These datasets usually come from highly visited web portals or services and represent user preferences about things like movies, music, news, books, etc. In [38] we present the characteristics of some of the most popular public datasets and the types of RSs they can be used for. It is convenient to exploit them for evaluating novel algorithms or recommendation techniques in offline experiments. The evaluation process steps are clearly explained in [39]. The results of this review indicate that movie datasets, led by MovieLens, are very popular, being used in more than 72% of the studies. This is related to the fact that the movie domain is also highly preferred. Many authors chose to experiment in the domain of movies to easily evaluate their prototypes. Music, web services, tourism, image datasets, etc. make up the rest of the datasets the studies use.

4.7 Future work

With RQ7 we tried to uncover the most important future work directions in hybrid recommender systems. Extending or improving the proposed solution is the most common future work the authors intend to undertake. Extension of the proposed solutions comes in diverse forms, like (i) extending by applying more algorithms, (ii) extending the personalization level by adapting more to the user context and profile, (iii) extending by using more datasets or item features, etc. Performing a comprehensive evaluation is something in which many studies fail. This is why some authors present it as future work. It usually happens in cases where the authors implement their algorithm or method in a prototype. In these cases comparison with similar methods using accuracy metrics does not provide clear insights about recommendation or system quality. Reinforcing it with subjective user feedback may be the best way to improve the evaluation of the system, making it more user oriented.

A highly desired characteristic of RSs is adapting to user interests that shift or evolve over time, especially as a result of rapid context changes. Consequently, different authors suggest adding context to their systems or analyzing different criteria of items or users as ways to improve the recommendation quality. Context-Aware Recommender Systems (CARS) and Multi-Criteria Recommender Systems (MCRS) are relatively new approaches which are gaining popularity in the field of RSs [40]. They are promoted by the increased use of mobile devices, which reveal user details (e.g. the location) that can be used as important contextual inputs. Combining context and multiple criteria with other hybrid recommendation techniques could be a good direction in which to experiment.

Considering other application domains in which hybrid RSs could be applied is also stated by some authors. Many of the works were domain independent and can be easily adapted to different recommendation domains. One step further could be to have hybrid RSs recommend items from different (changing) domains and implement the so-called cross-domain recommender systems. Having found the best movie for the weekend, the user may also want to find the corresponding soundtrack or the book on which the movie is based. Cross-domain RSs are an emerging research topic [41, 42]. Different recommendation strategies like CF and CBF could be specialized in different domains of interest and then joined together in a weighted, switching, mixed or other hybrid cross-domain RS which would recommend different items to its users.

Combining more data from different sources or with various item features was a way to create hybrid RSs. Using more data is a common trend not only in recommender systems but in similar disciplines as well. However, having and using big volumes of data requires scaling the computations. One way to achieve this high scalability is by parallelizing the algorithms following the MapReduce model, which could be a future direction as suggested in [43]. Experimenting with other hybrid recommendation classes is also possible in many cases. The results indicate that some hybrid classes are rarely explored (e.g. mixed hybrid appears in 3 studies only). It could be a good idea to experiment with building CF-CBF, CF-KBF or other types of mixed hybrids and observe what characteristics these systems could provide. Other future work suggestions include increasing personalization and reducing the computational cost of the system.

5. Conclusions

In this review work we analyzed 76 primary studies from journals and conference proceedings which address hybrid RSs. We tried to identify the most acute problems they solve to provide better recommendations. We also analyzed the data mining and machine learning techniques they use, the recommendation strategies they combine, the hybridization classes they belong to, their application domains and datasets, the evaluation process, and possible future work directions.

With regard to the research problems, cold-start, data sparsity and accuracy are the most recurrent problems for which hybrid approaches are explored. The authors typically use association rules mining in combination with traditional recommendation strategies to find user-item relations and compensate for the lack of ratings in cold-start situations. We also found that matrix factorization techniques help to compress the existing sparse ratings and attain acceptable accuracy. It was also typical to find studies in which collaborative filtering was combined with other techniques such as fuzzy logic, attempting to alleviate cold-start or data sparsity and at the same time provide good recommendation accuracy.

We also presented a classification of the included studies based on the different DM/ML techniques they utilize to build the systems and their recommendation technique combinations. The K-NN classifier, which is commonly used to construct the neighborhood in collaborative RSs, was the most popular among the data mining techniques. On the other hand, CF was the most commonly used recommendation strategy, frequently combined with each of the other strategies in an attempt to solve any kind of problem.

We identified and classified the different hybridization approaches relying on the taxonomy proposed by Burke and found that the weighted hybrid is the most recurrent, most likely because of the simplicity and dynamicity it offers. Other hybridization classes such as meta-level or feature augmentation are rare as they need complicated mathematical constructs to aggregate the results of the different recommenders they combine.

Concerning evaluation, accuracy is still considered the most important characteristic. The authors predominantly use comparisons with similar methods and involve error or prediction metrics in the evaluation process. This evaluation methodology is “hermetic” and often not credible. User satisfaction is commonly evaluated with subjective feedback from surveys, which are user oriented, more credible and thus highly recommended. Additionally, computational complexity was evaluated in few cases. We also investigated which public datasets are typically used to evaluate the hybrid systems. Based on our findings, movie datasets led by MovieLens are the most popular, facilitating the evaluation process. Moreover, the movie domain was the most preferred for prototyping among the numerous domains that were identified.

More than three-quarters of our included studies were published in the last five years. This high and growing number of recent publications in the field leads us to believe that hybrid RSs are a hot and interesting topic. Our findings indicate that future work could focus on context awareness of recommendations and on models with which to formalize and aggregate several contextual factors inside a hybrid recommender. Such RSs could respond to quick shifts of user interest with high accuracy.

We also found that there are many combinations of recommendation techniques or hybridization classes which are not explored. Thus they represent a good basis for future experimentation in the field. Using more data was another possible work direction we found. In the epoch of big data, processing more or larger datasets (as even more become available) with hybrid parallel algorithms could be a good way to alleviate the problem of scalability and also provide better recommendation quality. Other future work directions could be using hybrid RSs to build cross-domain recommenders or reducing the computational complexity of the existing techniques.

Footnotes

1. http://www.businessdictionary.com/definition/information-overload.html.

2. https://recsys.acm.org/.

3. http://www.qsrinternational.com/products.aspx.

4. http://www.tripadvisor.co.uk.

5. http://grouplens.org/node/73.

6. http://grouplens.org.

7. http://ocelma.net/MusicRecommendationDataset/lastfm-360K.html.

8. http://disi.unitn.it/~knowdive/dataset/delicious/.

9. http://www.netflixprize.com/.

10. https://gist.github.com/entaroadun/1653794.

Acknowledgments

This work was supported by a fellowship from TIM.

Appendix

References

Ricci

Rokach

and Shapira

, Recommender Systems Handbook, Springer US, Boston, MA, 2011, Ch. Introduction to Recommender Systems Handbook, pp. 1–35. doi: 10.1007/978-0-387-85820-3_1.

Ekstrand

M.D.

Riedl

J.T.

and Konstan

J.A.

, Collaborative filtering recommender systems, Found, Trends Hum.-Comput. Interact. 4(2) (2011), 81–173. doi: 10.1561/1100000009.

Goldberg

Nichols

Oki

B.M.

and Terry

, Using collaborative filtering to weave an information tapestry, Commun, ACM 35(12) (1992), 61–70. doi: 10.1145/138859.138867.

Burke

, Knowledge-based recommender systems, in: Kent

, Ed., Encyclopedia of Library and Information Science, Vol. 69, CRC Press, 2000, pp. 181–201.

Felfernig

and Burke

, Constraint-based recommender systems: Technologies and research issues, in: Proceedings of the 10th International Conference on Electronic Commerce, ICEC ’08, ACM, New York, NY, USA, 2008, pp. 3:1–3:10. doi: 10.1145/1409540.1409544.

Resnick

Iacovou

Suchak

Bergstrom

and Riedl

, Grouplens: An open architecture for collaborative filtering of netnews, in: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, CSCW ’94, ACM, New York, NY, USA, 1994, pp. 175–186. doi: 10.1145/192844.192905.

Hill

Stead

Rosenstein

and Furnas

, Recommending and evaluating choices in a virtual community of use, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’95, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1995, pp. 194–201. doi: 10.1145/223904.223929.

Lang

, NewsWeeder: learning to filter netnews, in: Proceedings of the 12th International Conference on Machine Learning, Morgan Kaufmann Publishers Inc.: San Mateo, CA, USA, 1995, pp. 331–339. URL http://citeseer.ist.psu.edu/lang95newsweeder.html.

Krulwich

and Burkey

, Learning user information interests through extraction of semantically significant phrases, in: Proceedings of the AAAI Spring Symposium on Machine Learning in Information Access, AAAI Press Menlo Park, 1996, pp. 100–112.

10.

Balabanović

and Shoham

, Fab: Content-based, collaborative recommendation, Commun, ACM 40(3) (1997), 66–72. doi: 10.1145/245108.245124.

11.

Sarwar

B.M.

Konstan

J.A.

Borchers

Herlocker

Miller

and Riedl

, Using filtering agents to improve prediction quality in the grouplens research collaborative filtering system, in: Proceedings of the 1998 ACM Conference on Computer Supported Cooperative Work, CSCW ’98, ACM, New York, NY, USA, 1998, pp. 345–354. doi: 10.1145/289444.289509.

12.

Burke

, Hybrid recommender systems: Survey and experiments, User Modeling and User-Adapted Interaction 12(4) (2002), 331–370. doi: 10.1023/A:1021240730564.

13.

Good

Schafer

J.B.

Konstan

J.A.

Borchers

Sarwar

Herlocker

and Riedl

, Combining collaborative filtering with personal agents for better recommendations, in: Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence, AAAI ’99/IAAI ’99, American Association for Artificial Intelligence, Menlo Park, CA, USA, 1999, pp. 439–446. URL http://dl.acm.org/citation.cfm?id=315149.315352.

14.

Bobadilla

Ortega

Hernando

and GutiéRrez

, Recommender systems survey, Know.-Based Syst. 46 (2013), 109–132. doi: 10.1016/j.knosys.2013.03.012.

15.

Park

D.H.

Kim

H.K.

Choi

I.Y.

and Kim

J.K.

, A literature review and classification of recommender systems research, Expert Syst. Appl. 39(11) (2012), 10059–10072. doi: 10.1016/j.eswa.2012.02.038.

16.

Kitchenham

, Procedures for performing systematic reviews, Keele, UK, Keele University 33(2004) (2004), 1–26.

17.

Kitchenham

and Charters

, Guidelines for performing systematic literature reviews in software engineering, EBSE Technical Report, EBSE 2007-001, Keele University and Durham University Joint Report, 2007.

18.

Jannach

Zanker

and Gröning

, E-Commerce and Web Technologies: 13th International Conference, EC-Web 2012, Vienna, Austria, September 4–5, 2012. Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, Ch. Recommender Systems in Computer Science and Information Systems – A Landscape of Research, pp. 76–87. doi: 10.1007/978-3-642-32273-0_7.

19.

Cruzes

and Dyba

, Recommended steps for thematic synthesis in software engineering, in: Empirical Software Engineering and Measurement (ESEM), 2011 International Symposium on, 2011, pp. 275–284. doi: 10.1109/ESEM.2011.36.

20.

Lika

Kolomvatsos

and Hadjiefthymiades

, Facing the cold start problem in recommender systems, Expert Systems with Applications 41(4, Part 2) (2014), 2065–2073. doi: https://dx-doi-org.web.bisu.edu.cn/10.1016/j.eswa.2013.09.005.

21.

Zhang

Z.-K.

Liu

Zhang

Y.-C.

and Zhou

, Solving the cold-start problem in recommender systems with social tags, EPL (Europhysics Letters) 92(2) (2010), 28002. doi: 10.1016/j.eswa.2012.03.025.

22.

Yager

R.R.

, On ordered weighted averaging aggregation operators in multicriteria decisionmaking, IEEE Trans. Syst. Man Cybern. 18(1) (1988), 183–190. doi: 10.1109/21.87068.

23.

Delgado-Battenfeld

and Jannach

, Beyond accuracy: Evaluating recommender systems by coverage and serendipity, in: Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, ACM, New York, NY, USA, 2010, pp. 257–260. doi: 10.1145/1864708.1864761.

24.

Amatriain

Jaimes

Oliver

and Pujol

J.M.

, Recommender Systems Handbook, Springer US, Boston, MA, 2011, Ch. Data Mining Methods for Recommender Systems, pp. 39–71. doi: 10.1007/978-0-387-85820-3_2.

25.

Yager

R.R.

, Fuzzy logic methods in recommender systems, Fuzzy Sets Syst. 136(2) (2003), 133–149. doi: 10.1016/S0165-0114(02)00223-3.

26.

de Campos

L.M.

Fernndez-Luna

J.M.

Huete

J.F.

and Rueda-Morales

M.A.

, Combining content-based and collaborative recommendations: A hybrid approach based on bayesian networks, International Journal of Approximate Reasoning 51(7) (2010), 785–799. doi: https://dx-doi-org.web.bisu.edu.cn/10.1016/j.ijar.2010.04.001.

27.

Billsus

Pazzani

M.J.

and Chen

, A learning agent for wireless news access, in: Proceedings of the 5th International Conference on Intelligent User Interfaces, IUI ’00, ACM, New York, NY, USA, 2000, pp. 33–36. doi: 10.1145/325737.325768.

28.

Mooney

R.J.

and Roy

, Content-based book recommending using learning for text categorization, in: Proceedings of the Fifth ACM Conference on Digital Libraries, DL ’00, ACM, New York, NY, USA, 2000, pp. 195–204. doi: 10.1145/336597.336662.

29.

Smyth

and Cotter

, A personalised {TV} listings service for the digital {TV} age, Knowledge-Based Systems 13(2–3) (2000), 53–59. doi: https://dx-doi-org.web.bisu.edu.cn/10.1016/S0950-7051(00)00046-0.

30. Figueroa, Vagliano, Rocha O.R. and Morisio, A systematic literature review of linked data-based recommender systems, Concurrency and Computation: Practice and Experience 27(17) (2015), 4659–4684.

31. Zhou, Kuscsik, Liu J.-G., Medo, Wakeling J.R. and Zhang Y.-C., Solving the apparent diversity-accuracy dilemma of recommender systems, Proceedings of the National Academy of Sciences 107 (2010), 4511–4515. doi: 10.1073/pnas.1000488107.

32. Su and Khoshgoftaar T.M., A survey of collaborative filtering techniques, Adv. in Artif. Intell. 2009 (2009), 4:2–4:2. doi: 10.1155/2009/421425.

33. Rao K.N., Application domain and functional classification of recommender systems-a survey, DESIDOC Journal of Library & Information Technology 28(3). doi: 10.14429/djlit.28.3.174.

34. Herlocker J.L., Konstan J.A., Terveen L.G. and Riedl J.T., Evaluating collaborative filtering recommender systems, ACM Trans. Inf. Syst. 22(1) (2004), 5–53. doi: 10.1145/963770.963772.

35. Beel, Gipp, Langer and Breitinger, Research paper recommender systems: A literature survey, International Journal on Digital Libraries (2015), 1–34. doi: 10.1007/s00799-015-0156-0.

36. Ziegler C.-N., McNee S.M., Konstan J.A. and Lausen, Improving recommendation lists through topic diversification, in: Proceedings of the 14th International Conference on World Wide Web, WWW ’05, ACM, New York, NY, USA, 2005, pp. 22–32. doi: 10.1145/1060745.1060754.

37. Adomavicius and Kwon, Improving aggregate recommendation diversity using ranking-based techniques, IEEE Transactions on Knowledge and Data Engineering 24(5) (2012), 896–911. doi: 10.1109/TKDE.2011.15.

38. Çano and Morisio, Characterization of public datasets for recommender systems, in: Research and Technologies for Society and Industry Leveraging a Better Tomorrow (RTSI), 2015 IEEE 1st International Forum on, 2015, pp. 249–257. doi: 10.1109/RTSI.2015.7325106.

39. Shani and Gunawardana, Evaluating recommender systems, Tech. Rep. MSR-TR-2009-159, Microsoft Research (November 2009). doi: 10.1007/978-0-387-85820-3_8.

40. Adomavicius, Tuzhilin, Manouselis and Kwon, Recommender Systems Handbook, Springer US, Boston, MA, 2011, Ch. Context-Aware Recommender Systems and Multi-Criteria Recommender Systems, pp. 217–253 and 769–803. doi: 10.1007/978-0-387-85820-3_7.

41. Cremonesi, Tripodi and Turrin, Cross-domain recommender systems, in: Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops, ICDMW ’11, IEEE Computer Society, Washington, DC, USA, 2011, pp. 496–503. doi: 10.1109/ICDMW.2011.57.

42. Fernández-Tobías, Cantador, Kaminskas and Ricci, Cross-domain recommender systems: A survey of the state of the art, Spanish Conference on Information Retrieval.

43. Barragáns-Martínez A.B., Costa-Montenegro, Burguillo J.C., Rey-López, Mikic-Fonte F.A. and Peleteiro, A hybrid content-based and item-based collaborative filtering approach to recommend TV programs enhanced with singular value decomposition, Information Sciences 180(22) (2010), 4290–4311. doi: 10.1016/j.ins.2010.07.024.
