Abstract
Recommender systems are pivotal components of modern Internet platforms and constitute a well-established research field. By now, research has resulted in highly sophisticated recommender algorithms whose further optimization often yields only marginal improvements. This paper goes beyond the commonly dominating focus on optimizing algorithms and instead follows the idea of enhancing recommender systems with reputation data. Since the concept of reputation-enhanced recommender systems has attracted considerable attention in recent years, the main aim of the paper is to provide a comprehensive survey of the approaches proposed so far. To this end, existing work is identified by means of a systematic literature review and classified according to seven carefully considered dimensions. In addition, the resulting structured analysis of the state of the art serves as a basis for the deduction and discussion of several future research directions.
Introduction
The rise of the World Wide Web has made sharing and accessing various kinds of information easier and faster than ever before, resulting in considerable benefits for its users. However, this trend has also led to the phenomenon of information overload, which may overwhelm users in the course of their decision-making processes [20]. To give a recent example, in June 2017 Spotify offered more than 30 million songs on their music streaming platform.
This paper concentrates on relevant concepts from the research field of trust and reputation systems, which show substantial connections to recommender systems (especially to collaborative filtering) [23]. Thus, there are several proposals on trust-enhanced recommender systems [50], which consider trust in the form of explicitly declared trust or friendship relationships (e.g. web of trust on Epinions2
Because of these drawbacks of explicit trust links, this paper specifically focuses on the enhancement of recommender systems with reputation data. Reputation is another kind of construct relevant when taking advice from others [7]. It is closely linked to trust [23] or even used to establish trust (“reputation-based trust” [8]). However, it fits the aforementioned peculiarities of modern online platforms better. Reputation values are calculated on a global scale instead of being limited to the trust links of one single user. On the one hand, this mitigates the problem of sparsely available personal trust links. On the other hand, reputation values are computationally less expensive because they are computed once for the entire community whereas trust values have to be determined from the perspective of every individual user [34]. Since the concept of reputation-enhanced recommender systems has attracted considerable attention in recent years, several combination approaches have been proposed. In this paper, we comprehensively identify the existing methods by means of a systematic literature review based on well-established guidelines and classify them according to seven carefully considered dimensions. Thus, the state of the art of reputation-enhanced recommender systems is revealed in an exhaustive manner. Moreover, we are able to point out several possible directions for future work in this research stream. In general, our results also provide an important basis for the further exchange of ideas between recommender and reputation systems researchers.
Note that this article is an extended version of a paper presented at the 2017 IFIPTM conference [41]. It features an additional taxonomy dimension, the corresponding background information, a considerably extended discussion of results and future research directions, and several new clarifying figures and remarks.
The remainder of the paper is organized as follows. Section 2 introduces the main principles of recommender and reputation systems and relates them to each other according to their similarities and differences. Based on this, Section 3 discusses the process and the outcomes of a systematic literature review on reputation-enhanced recommender systems. This, in turn, leads to the formulation of future research directions in Section 4. Section 5 concludes the paper.
Modern Internet platforms, such as e-commerce marketplaces and social media websites, are omnipresent in today’s society. Recommender and reputation systems are pivotal decision support components of these platforms.
Recommender systems principles
As already mentioned, the main motivation for the use of recommender systems is the information overload problem [37]. To tackle this issue, recommender systems are supposed to provide users with only the most relevant information and only those items that are worth considering. This is done by predicting the ratings of the items a particular user has not rated yet and recommending those which receive the highest predicted ratings. This task has come to be known as the recommendation problem [3,21] and can be described in a more formal manner as follows. First, the users’ preference values are predicted according to the function
Over the years, many different ways of estimating the users’ preference values have been proposed. They are usually classified into different categories according to their general approach to recommendation generation, with collaborative, content-based, demographic-based, knowledge-based, and context-based being the basic and generally distinguished ones [4,13]. Figure 1 depicts the entities and relationships considered in the two original and still most important ones of these recommender system types: collaborative filtering and content-based filtering [3,9].

User-item relations with
Collaborative filtering [18,47] is based on the idea that people tend to agree with people they agreed with in the past and thus captures the typical human behavior of relying on the opinions of acquaintances with similar tastes. When employing the user-based nearest neighbor algorithm, as one particular form of collaborative filtering, the predicted ratings for each item are calculated by aggregating the ratings of the other users weighted by their similarities (in rating behaviors) to the user in focus. Ratings can take different forms such as
Content-based filtering [33] assumes that people will like items similar to the ones they liked in the past. As opposed to collaborative filtering, other users and their ratings do not have any influence on content-based filtering. It is solely based on the user’s own ratings and the similarities of items determined according to their features. In the example depicted in Fig. 1,
Reputation systems [23] are needed because users usually have no or only few direct experiences with other users on digital platforms. Thus, a user does not know whether to trust another user or not. Reputation systems can alleviate this issue by assisting the user in determining the trustworthiness of other users. Figure 2 depicts the entities and relationships involved in the calculation of users’ reputation values indicating their trustworthiness.

User-user relations with
After each encounter, users are able to rate the behavior of their counterpart. In e-commerce, for example, a customer can judge a seller’s behavior according to factors like on-time delivery, adequate product quality, and reasonable e-mail response times. Similar to recommender systems, common rating scales are

Component classes of reputation systems.

Common feedback model.
In the example depicted in Fig. 2, which uses
As can be inferred from the remarks in the preceding subsections, the main similarity of recommender and reputation systems is that both kinds of decision support systems are based on user experiences and feedback [23]. Moreover, the two kinds of systems are frequently applied in similar contexts. Besides e-commerce as the most important of the common application areas, other examples include online communities, service selection, and peer-to-peer networks. These fundamental similarities make combined considerations feasible and allow creating a common feedback model as depicted in Fig. 4. The model includes two sets of entities: users
Moreover, recommender and reputation systems differ in certain facets and assumptions, which makes combined considerations potentially meaningful [23]. Recommender systems emphasize the similarity of users regarding their subjective tastes whereas reputation systems are ideally applied to taste-independent aspects [24]. Therefore, the calculations of (collaborative filtering) recommender systems are typically based on the opinions of local communities consisting of the most similar users [3]. As opposed to this, the calculations of reputation systems are mostly done on a global basis because reputation is considered as a collective measure of trustworthiness [23]. For this purpose, reputation systems aggregate the individual trust values expressed by several entities to reflect the opinion of an entire group. Thus, the resulting reputation values are supposed to be more objective and the same from the perspectives of all entities whereas recommendation values are subjective and determined from the perspective of one particular entity.
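The contrast between local and global calculations can be illustrated with a minimal sketch. It assumes a hypothetical toy rating matrix, cosine similarity as the similarity measure, and a plain mean as the reputation aggregation; none of these choices is prescribed by the surveyed literature. The names `local_prediction` and `global_reputation` are illustrative only.

```python
from math import sqrt

# Hypothetical toy data: ratings[user][item] on a 1-5 scale.
ratings = {
    "alice": {"i1": 5, "i2": 1, "i3": 4},
    "bob":   {"i1": 4, "i2": 2, "i3": 5, "i4": 5},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated (assumed measure)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def local_prediction(user, item):
    """Recommender view: similarity-weighted mean of other users' ratings.

    The result depends on the user in focus because the weights do.
    """
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    return num / den if den else None

def global_reputation(item):
    """Reputation view: one value for the whole community (plain mean here)."""
    values = [r[item] for r in ratings.values() if item in r]
    return sum(values) / len(values) if values else None

print(local_prediction("alice", "i4"))   # personalized estimate for alice
print(global_reputation("i4"))           # identical for every user
```

In this toy example, the personalized estimate for alice lies above the global mean because bob, whose past ratings resemble hers, rated the item highly.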
State of the art
Based on the background information introduced in the previous section, we survey the state of the art of reputation-enhanced recommender systems. To this end, we conduct a systematic literature review following the well-recognized guidelines by Webster and Watson [54] and Levy and Ellis [28]. In particular, we act on the eight-step process by Okoli and Schabram [36], which specifies these guidelines in detail.
Literature review protocol
In order to fulfill the demand of vom Brocke et al. [51] that not only the findings of a literature review but also the process of searching and filtering the literature should be comprehensively described, the implementation of each of Okoli and Schabram’s eight steps [36] is discussed in the following.
(1) Purpose of the literature review. By systematically examining the existing ways to enhance recommender systems with reputation data and relating them to one another, the state of the art of this research stream is revealed.
(2) Protocol and training. When conducting a systematic literature review, it is crucial to act according to a detailed protocol. The most important aspects are pointed out for each step within this subsection. Training is not applicable to this paper because the literature review has essentially been conducted by the first author only. Nevertheless, conceptual feedback by the co-authors has been taken into consideration.
(3) Searching for the literature. The main issue to consider regarding the literature search is systematics. In this literature review, the following five digital libraries are used: ACM Digital Library, AIS Electronic Library, IEEE Xplore Digital Library, ScienceDirect, and Scopus. As demanded by vom Brocke et al. [51], they are chosen because they provide access to the journals and conference proceedings that are most relevant to the topic of this paper. In order to discover as many potentially relevant publications as possible, we use the very general search phrase “recommend* AND reputation”. We also use the search phrase “collaborative AND reputation” because there are several publications in the recommender systems field mentioning only collaborative filtering instead of recommender systems in general. Since recommender systems are relevant in multiple research disciplines (e.g. computer science, engineering, mathematics), we do not exclude any of them from the initial search. We also do not exclude any work based on the year of publication. Moreover, we search for both journal articles and conference papers. The initial search carried out in November 2016 resulted in 420 hits at ACM, 19 hits at AIS, 341 hits at IEEE Xplore, 241 hits at ScienceDirect, and 1,367 hits at Scopus.
(4) Practical screen. Since we use very general search phrases and do not exclude any disciplines from our search, we receive a high number of initial search results (especially considering the narrow focus of this paper). All these publications enter the screening process by title, in which many of the clearly irrelevant ones can be removed. The relevance of the remaining papers is then judged based on their abstracts. Again, they are removed only if they are clearly not applicable to the scope of the literature review. If there are any doubts about their relevance, they are kept for the time-consuming full text review. In order to be relevant, a proposal first of all has to contain both an actual recommender and an actual reputation component. On the one hand, this excludes papers using the term “recommendation” to describe a rating or second-hand information in the reputation systems domain. On the other hand, this also excludes work creating recommendations by simply ranking items according to their reputation values. In addition, publications are regarded as relevant only if the considered recommendation and reputation values as well as the combinations of recommender and reputation components are sufficiently described. Figure 5 provides an overview of how the number of potentially relevant publications evolved during the practical screen. With the title screening, the number of potentially relevant publications was reduced to 243. After the abstract screening, 81 publications remained for the detailed full text review. 24 out of these 81 publications were finally judged as relevant to the focus of this paper. A backward search (i.e. reviewing the references given in the identified publications) brought forth one additional publication, whereas a forward search (i.e. reviewing the papers citing the identified publications) yielded no relevant papers that had not been discovered before.

Practical screen in numbers.
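The figures reported in steps (3) and (4) can be recomputed with a few lines of code. The sketch below only uses numbers stated in the text. Whether the initial hit counts overlap across libraries is not stated, so their sum is treated as an upper bound on the number of distinct publications.

```python
# Hit counts per digital library as reported for the initial search.
initial_hits = {
    "ACM": 420,
    "AIS": 19,
    "IEEE Xplore": 341,
    "ScienceDirect": 241,
    "Scopus": 1367,
}
total_hits = sum(initial_hits.values())  # upper bound, overlaps not removed

# Number of publications remaining after each screening stage.
stages = [
    ("initial search", total_hits),
    ("title screening", 243),
    ("abstract screening", 81),
    ("full text review", 24),
]

# Share of publications kept from one stage to the next.
for (_, before), (name, after) in zip(stages, stages[1:]):
    print(f"{name}: {after} of {before} kept ({after / before:.1%})")

# The backward search contributed one additional relevant publication.
relevant_total = 24 + 1
print("relevant publications:", relevant_total)
```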
(5) Quality appraisal. Publications may be judged based on the ranking of their outlets. However, since we examine an emerging research stream for which the number of publications in top journals and at top conferences is still low, we do not limit our focus to highly recognized and popular work.
(6) Data extraction. In this step, the information from those publications that the full text review identifies as relevant is collected. In order to be able to compare the publications in a structured manner, we develop a dedicated taxonomy as a basis for the data extraction step (cf. Section 3.2). Particular attention is paid to the hybridization approaches, the underlying recommender and reputation systems, and the evaluations described in the papers.
(7) Synthesis of studies. Based on the notes of the data extraction step, the relevant publications are analyzed in detail. With the help of our taxonomy, we provide a structured overview of existing work (cf. Section 3.3) and are able to identify directions for future research efforts (cf. Section 4).
(8) Writing the review. Presenting the insights gained in the synthesis step concludes the eight-step process of conducting a systematic literature review.
As previously described, the data extraction step requires the excerption of the publications judged as relevant in the full text review. In the following, a taxonomy providing a clear structure for this activity is developed.
First and foremost, reputation-enhanced recommender systems can be analyzed according to their hybridization approaches. Following Burke’s [13] overview of methods for the hybridization of two or more recommendation techniques, we define the first dimension for distinguishing different approaches to enhance recommender systems with reputation data: the hybridization method dimension. We adapt the methods listed by Burke [13] to the hybridization scenario of this paper, resulting in the following six categories:
Weighted: The respective outputs of a recommender and a reputation system are combined based on a weighting factor.
Switching: If a recommender system is not able to generate enough suggestions, a reputation system is used instead or in addition.
Mixed: The outputs of both systems may be presented at the same time. In particular, the final recommendation value is high only if both individual values are high.
Rec-rep-cascade: A reputation system refines the output of a recommender system.
Rep-rec-cascade: A reputation system pre-filters the input for a recommender system.
Augmentation: Reputation data is considered directly within the calculations of the recommender system.
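The six categories can be sketched as small functions. This is a minimal illustration under simplifying assumptions rather than the method of any surveyed publication: recommendation and reputation values are hypothetical scores in [0, 1] stored in dictionaries, the mixed method is approximated by a product, and the augmentation example is a reputation-weighted mean of ratings. All function names and toy values are illustrative.

```python
def weighted(rec, rep, alpha=0.5):
    """Weighted: combine both outputs with a weighting factor alpha."""
    return {i: alpha * rec[i] + (1 - alpha) * rep.get(i, 0.0) for i in rec}

def switching(rec, rep, n):
    """Switching: fill up with reputation-ranked items if fewer than n exist."""
    ranked = sorted(rec, key=rec.get, reverse=True)[:n]
    if len(ranked) < n:
        by_rep = sorted(rep, key=rep.get, reverse=True)
        extra = [i for i in by_rep if i not in ranked]
        ranked += extra[: n - len(ranked)]
    return ranked

def mixed(rec, rep):
    """Mixed: the product is high only if both individual values are high."""
    return {i: rec[i] * rep.get(i, 0.0) for i in rec}

def rec_rep_cascade(rec, rep, m, n):
    """Rec-rep-cascade: re-sort the recommender's top-m by reputation, keep n."""
    top_m = sorted(rec, key=rec.get, reverse=True)[:m]
    return sorted(top_m, key=lambda i: rep.get(i, 0.0), reverse=True)[:n]

def rep_rec_cascade(candidates, rep, threshold, recommender):
    """Rep-rec-cascade: only sufficiently reputable input reaches the recommender."""
    trusted = [c for c in candidates if rep.get(c, 0.0) >= threshold]
    return recommender(trusted)

def augmentation(item_ratings, user_rep):
    """Augmentation: reputation weights each rating inside the prediction itself."""
    total = sum(user_rep[u] for u in item_ratings)
    return sum(user_rep[u] * r for u, r in item_ratings.items()) / total

# Hypothetical toy scores for three recommended items and four reputed items.
rec = {"a": 0.9, "b": 0.6, "c": 0.4}
rep = {"a": 0.2, "b": 0.8, "c": 0.9, "d": 0.7}

def rank_by_rec(items):
    """Stand-in recommender that ranks the given items by their rec score."""
    return sorted(items, key=rec.get, reverse=True)

print(weighted(rec, rep))
print(switching(rec, rep, n=4))
print(mixed(rec, rep))
print(rec_rep_cascade(rec, rep, m=2, n=1))
print(rep_rec_cascade(list(rec), rep, 0.5, rank_by_rec))
print(augmentation({"u1": 5, "u2": 1}, {"u1": 0.9, "u2": 0.1}))
```

The two cascades differ only in the order of the components: in the rec-rep-cascade the reputation values reorder a list that is already personalized, whereas in the rep-rec-cascade low-reputation input never reaches the recommender.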
Furthermore, Fig. 4 (cf. Section 2.3) shows that there are two kinds of data bases in connection with recommender and reputation systems:
Combinations of recommender and reputation system data bases
Comparison of publications according to the developed taxonomy
In addition to the previously introduced hybridization approach dimensions, reputation-enhanced recommender systems can be compared according to the underlying recommender and reputation systems. Therefore, the third dimension and the fourth dimension focus on the recommendation approach and the reputation components, respectively. Regarding the recommendation approach categories, we distinguish between the three commonly accepted approaches [3]: content-based filtering (CbF), collaborative filtering (CF), and hybrid (CbF/CF). Regarding the reputation components categories, we rely on the component repository by Sänger et al. [43] introduced in Section 2.2. In particular, we use the secondary component classes as categories. They are listed along with the abbreviations used later on in this paper in Fig. 3 in Section 2.2. Although the ideas behind the recommendation algorithms of reputation-enhanced recommender systems are generally applicable to different contexts, the respective publications typically focus on a specific domain (at least for demonstration and evaluation purposes). This constitutes the fifth dimension of the taxonomy: the application area dimension. Possible values include movies, products, and hotels. However, we do not define a fixed list of categories for this dimension at this point because there is no comprehensive list in the literature we could rely on.
Apart from the characteristics of the developed systems, it is crucial to judge publications according to their evaluations because not all kinds of evaluation may prove the value of a proposal equally well. For example, real-world case studies are far more meaningful than fictional scenarios. Here, we rely on the “how” of evaluation as described by Prat et al. [38] and adapt the dimensions and categories that are most relevant to our analysis. First, there is the evaluation technique dimension with its categories: case study, field study, action research, static analysis, dynamic analysis, controlled experiment, simulation, testing, informed argument, scenario, survey, and focus group. Second, there is the relativeness dimension with its categories: absolute and relative.
In total, our full text review consists of 82 papers published between 2004 and 2017. In the following, the ideas of the work finally judged as relevant to the scope of this paper are comprehensively described. The remarks are structured according to the hybridization method dimension. In addition, Table 2 compares the publications according to the complete taxonomy developed in Section 3.2. Please note that Abdel-Hafez et al. [1] describe two distinct hybridization approaches in their paper.
Weighted
McNally et al. [35] introduce a weighted hybridization method for the HeyStaks social search platform [45] in which recommender and reputation values are based on different data bases. The collaborative filtering recommender component determines the relevance scores of the search results with respect to a given search query whereas the reputation component aggregates the reputation scores of those HeyStaks members that are responsible for the existence of the search results. The particular reputation aggregation components belong to the simple arithmetic and statistic classes. However, it is not specified how the single reputation scores of the relevant HeyStaks members are determined in the first place. Alotaibi and Vassileva [5] pursue a similar approach for their recommender system for scientific papers. The recommender component is based on the content similarity between a candidate paper and the user’s current interests as well as on the ratings other users have assigned to the paper. Thus, it constitutes a hybrid variant combining content-based filtering and collaborative filtering. The reputation component relies on the reputation of the author of the candidate paper, which is represented by a simple arithmetic measure such as the h-index. In the crowdsourcing recommender of Wang et al. [52], the collaborative filtering recommender component identifies appropriate tasks based on user similarities whereas the reputation component relies on the reputations of the task requesters. However, it is not specified how the reputations of the task requesters are determined. The product recommender system proposed by Cui et al. [16] combines the reputation value of an item (determined according to a simple arithmetic aggregation component termed “favorable rating ratio”) with the recommendation value of the user providing the respective item.
Abdel-Hafez et al. [1] describe a weighted hybridization method in which the recommender and the reputation system use the same data base. The first step is to perform the Borda count method separately for the ranked output lists of the recommender system and the reputation system. In general, this step is independent of the specific components of the reputation system. Nevertheless, the authors relied on a reputation model based on the Dirichlet probability distribution as a form of statistic aggregation component. In the second step, the weighted sum of the Borda count scores is determined for each item by assigning weights to the two Borda count lists. The item with the highest total score is recommended to the user. Abdel-Hafez et al. [2] introduce a recursive variant of this approach. In another proposal belonging to this category, Wang et al. [53] suggest the weighted enhancement of a product’s recommendation value with its reputation and its purchase frequency. However, it is not specified how the reputation of a product is determined.
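The two-step procedure based on the Borda count can be sketched as follows. The point scheme (the first of n ranked items receives n points, the last one receives 1 point, unranked items receive 0) and the parameter name `w_rec` are assumptions made for illustration; the exact scoring and weights used by Abdel-Hafez et al. [1] are not given in the text above.

```python
def borda_scores(ranked_items):
    """Borda count: position p (0-based) in a list of n items earns n - p points."""
    n = len(ranked_items)
    return {item: n - pos for pos, item in enumerate(ranked_items)}

def weighted_borda(rec_ranking, rep_ranking, w_rec=0.5):
    """Weighted sum of the Borda scores of the two ranked output lists."""
    rec_pts = borda_scores(rec_ranking)
    rep_pts = borda_scores(rep_ranking)
    items = set(rec_pts) | set(rep_pts)
    return {
        i: w_rec * rec_pts.get(i, 0) + (1 - w_rec) * rep_pts.get(i, 0)
        for i in items
    }

# Hypothetical ranked outputs of the recommender and the reputation system.
rec_ranking = ["a", "b", "c"]
rep_ranking = ["c", "b", "a"]

scores = weighted_borda(rec_ranking, rep_ranking, w_rec=0.75)
best = max(scores, key=scores.get)
print(scores, best)
```

Because only rank positions enter the calculation, the step is independent of the scales on which the two systems express their values, which matches the remark that it does not depend on the specific reputation components.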
Switching
The switching method is used by Bedi et al. [10] in their restaurant recommender termed SRPRS. The system produces a list of collaborative filtering recommendations based on the degrees of importance of the items retrieved from similar users. Only if the recommendation list does not contain as many items as requested is it extended based on the degrees of importance of all items whose reputation values are greater than some threshold. However, there are no details on the determination of the reputation values of the items.
The ideas of SRPRS can also be found in two other proposals identified in the literature review: MARST [11] and SAPRS [12]. Although the exact items considered for these systems may slightly differ (MARST considers not only restaurants but also hotels and points of interest), they all focus on scenarios in which the recommender and the reputation component rely on the same data base.
Mixed
The service recommender developed by Yazidi et al. [58] is divided into several subsystems. Among others, there is a content-based recommender component identifying relevant services based on the user’s context and profile as well as a reputation component managing the reputation value of the services. A service is recommended only if it is positively evaluated by all subsystems. In this work, particular attention is paid to the unfair rating difficulty in reputation systems. Therefore, a clustering filter based on stochastic learning automata [57] is included. Yan et al. [56] describe a system to recommend the usage of mobile applications based on the applications’ local collaborative filtering recommendation values as well as their public reputation values. The applications are recommended only if they possess both a high personalized recommendation value and a high public reputation value. As opposed to the publications introduced so far, the determination of reputation scores is discussed in more detail here. In particular, a weighted summation is used, with the weights coming from time discounting as well as the users’ reliability values regarding application reputation assessment. Jøsang et al. [22] introduce an operator which returns a high total value only if both the recommendation and the reputation score are high. This is supposed to “amplify the discriminating power” [22]. The recommendation score is based on collaborative filtering while the reputation score is determined according to subjective logic, which falls into the statistic aggregation component class. Even though not applied in the paper, it is at least mentioned that reputation scores could furthermore be adjusted through various weighting components. Similarly to the approaches employing the switching method, the systems based on the mixed method all combine recommender and reputation systems relying on the same data base.
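A reputation score built from a weighted summation with time discounting and rater reliability, followed by a mixed decision rule, might look as follows. The exponential decay with a half-life, the normalization, and the two thresholds are assumptions made for this sketch; the text above does not give the exact formulas of Yan et al. [56].

```python
def reputation(feedback, now, half_life=30.0):
    """Normalized weighted sum of ratings.

    feedback: iterable of (rating in [0, 1], timestamp in days, reliability in [0, 1]).
    Each weight is the rater's reliability times an exponential time discount
    (assumed form: the weight halves every half_life days).
    """
    num = den = 0.0
    for rating, timestamp, reliability in feedback:
        weight = reliability * 0.5 ** ((now - timestamp) / half_life)
        num += weight * rating
        den += weight
    return num / den if den else None

def recommend(rec_value, rep_value, rec_min=0.7, rep_min=0.7):
    """Mixed rule: recommend only if both values reach their (assumed) thresholds."""
    return rep_value is not None and rec_value >= rec_min and rep_value >= rep_min

# Hypothetical feedback: a fresh positive rating and a 30-day-old negative one.
feedback = [(1.0, 100, 1.0), (0.0, 70, 1.0)]
rep_value = reputation(feedback, now=100)
print(rep_value)                  # the old negative rating counts only half
print(recommend(0.9, rep_value))  # high personal value, reputation too low
```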
Rec-rep-cascade
Constantinov et al. [15] propose a rec-rep-cascade hybridization using different data bases. First, a hybrid recommender system determines a product the customer is supposed to be particularly interested in. Then, a reputation system depicts information relevant for the assessment of the trustworthiness of the sellers offering the product (including the average value of the received ratings). Because of the limited size of the platform, the reputation information is limited to only one seller. On a larger platform, however, there would be many providers offering the same item. Then, the reputation system helps determine the most trustworthy one.
In contrast, Abdel-Hafez et al. [1] consider a cascade hybridization of a recommender and a reputation system relying on the same data base. They enhance a recommender system’s output by re-sorting the top-M recommendations based on their reputation values. Thus, only the top-M items according to the recommender system enter the second step of the cascade. Finally, the top-N (
Rep-rec-cascade
Tserpes et al. suggest that “providers that systematically fail to comply with their obligations against the consumers will be isolated” [49] and therefore propose using reputation data as a pre-filtering mechanism prior to the recommendation process in their service recommender. Guo et al. [19] realize this by extending their document recommendation system with a reputation component keeping track of the reputation values of the users according to their activities and the acceptance rates of the documents shared by them. If the reputation value of a user, which is updated according to a statistic aggregation component and referred to as authority, drops below a particular group’s threshold, he can no longer access this group and his sharing activities are no longer considered in any recommendations. The recommender system introduced by Yu et al. [59] also excludes users with negative reputation values from the item recommendation process. However, the determination of reputation is different because it is not based on a statistic model but simply averages all received trust values of a particular user. If a user has not been assigned any personal trust value, trust propagation is applied relying on the credibility and reliability weighting component.
Augmentation
In contrast to the proposals discussed so far, the following approaches integrate the reputation data directly into the computation process of the recommender system. In all of them, the recommender component is concerned with items whereas the reputation values belong to users (e.g. sellers, providers). Qian et al. [34] employ a particular user’s reputation value to control the importance of his ratings in the matrix factorization process of the proposed product recommender. The reputation value is determined according to a statistic-based filter relying on the correlation between the user’s ratings and the average ratings of the items rated by the user. Tang et al. [39] propose a similar augmentation approach but with a different reputation model relying on graph-based aggregation in the form of a PageRank-based algorithm. Cimini et al. [14] use the reputations of news item creators to replace or at least supplement the consideration of similarity values in the collaborative filtering calculations of their news recommender system. The news item creators’ reputation values are based on the fraction of users that have liked the respective news items. Similarly, Su et al. [46] use the reputations of web service users to enhance the similarity calculations within the collaborative filtering process of their quality of service prediction approach. The reputation value of a particular user depends on the deviations between the user’s ratings and the opinions of the honest clusters of users on the web services rated by the user. It is calculated according to the beta-family of probability density functions [55]. Liu et al. [31] suggest overcoming the limitation of content-based filtering systems of recommending only items similar to the ones a user has previously liked by augmenting the user’s rating matrix with his group’s preference scores. The group’s preference score for an item in their blog article recommender is derived according to the reputation of the users who have pushed the particular item. A user’s reputation value, in turn, is based on the number of articles pushed by him as well as the number of users following these articles. Since different reputation values are calculated for different article categories, a context comparability component is applied. Liu et al. [29,30] also use this idea for a document recommender based on the similarity between the topic interests of a community and the target documents. The topic interests are determined according to the topics collected by the community and the reputation of the users who have collected them. The users’ reputation values, in turn, are based on the number of push interactions indicating that other users found a document helpful. In particular, the reputation values are calculated according to a graph-based aggregation component in the form of a PageRank-based algorithm.
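The idea of a correlation-based user reputation that controls the importance of ratings can be sketched as follows. The sketch is only loosely inspired by the description of Qian et al. [34]: it assumes Pearson correlation, clips negative correlations to zero, includes the user's own rating in the item averages, and replaces the matrix factorization model by an arbitrary prediction function. The toy data and all names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if undefined)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def user_reputation(user, ratings):
    """Correlation between a user's ratings and the items' average ratings."""
    items = list(ratings[user])
    item_means = []
    for item in items:
        values = [r[item] for r in ratings.values() if item in r]
        item_means.append(sum(values) / len(values))
    own = [ratings[user][i] for i in items]
    return max(0.0, pearson(own, item_means))  # assumption: clip at zero

def weighted_squared_error(ratings, predict, reputations):
    """Training loss in which each rating error is scaled by the rater's reputation."""
    return sum(
        reputations[u] * (r - predict(u, i)) ** 2
        for u, user_ratings in ratings.items()
        for i, r in user_ratings.items()
    )

# Hypothetical ratings: u3 systematically contradicts the community.
ratings = {
    "u1": {"a": 5, "b": 1, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5},
    "u3": {"a": 1, "b": 5, "c": 1},
}
reputations = {u: user_reputation(u, ratings) for u in ratings}
print(reputations)
```

With such weights, ratings from users whose behavior deviates from the community contribute little or nothing to the model fit, which is the effect the augmentation approaches aim for.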
Limitations of the literature review
Overall, our review serves as a comprehensive summary of the state of the art of reputation-enhanced recommender systems and can, as such, be used for understanding or new research. Even though we ensured a high quality of the review by relying on well-recognized guidelines, there are some limitations to discuss.
Analyzing the literature according to a newly developed taxonomy carries the risk that the insights gained might be of little value if the dimensions are poorly defined. To mitigate this potential shortcoming, we derived the data base dimension from commonly accepted principles regarding recommender and reputation systems and kept its values generalized. The hybridization method dimension is based on published research as it adapts the values of Burke’s [13] work on hybrid recommender systems. The same applies to the recommendation approach, the reputation components, and the evaluation dimensions, which rely on the remarks of Adomavicius and Tuzhilin [3], Sänger et al. [43], and Prat et al. [38], respectively.
Another possible limitation is that relevant literature might not be included in our search results. However, since we chose five of the most relevant databases, used them with very general search phrases, and conducted forward as well as backward searches, it is unlikely that we missed many relevant publications.
Discussion and future research directions
The analysis of the literature yields several observations. First of all, the publication years of the papers (cf. Fig. 6) suggest a growing interest in reputation-enhanced recommender systems, especially since 2011. This trend was briefly interrupted in 2015 but resumed in 2016. Note that our search results already contain one relevant article that is scheduled for publication in a 2017 journal issue.

Fig. 6. Distribution of the publications finally deemed relevant, by year.
Coverage of the hybridization approach dimensions
Turning to the contents of the existing work, important insights on the state of the art of the research stream can be gained by assigning the publications to the different hybridization approaches, whose dimensions and categories are introduced as the most important ones of our taxonomy in Section 3.2. As Table 3 shows, each hybridization method is covered by at least three proposals. Each category of the data base dimension is covered by multiple publications as well. However, not all combinations of data base and hybridization method categories have been addressed so far. Our search results do not contain any proposals for the switching and the mixed hybridization with different data bases, nor for the rep-rec-cascade and the augmentation hybridization with the same data base. Therefore, the first future research direction is to investigate whether the missing combinations are applicable to meaningful use cases and whether corresponding systems lead to performance improvements. Abdel-Hafez et al. [1], for example, justify their decision to focus on the rec-rep-cascade hybridization instead of the rep-rec-cascade hybridization with the assumption that personalized recommender-generated lists would be more accurate than non-personalized reputation-generated lists and therefore should be used as the primary candidate recommendation list. Although this assumption is intuitively understandable, its validity is still worth investigating. Going beyond the mere applicability to meaningful use cases, it may also be worthwhile to investigate whether some hybridization approaches (including both hybridization method and data base) can be expected to have advantages over others and why, and whether this superiority depends on a specific application area or is generally valid.
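The two cascade orders mentioned above differ only in which component produces the candidate list and which one refines it. The following minimal sketch illustrates this difference. It assumes that predicted ratings and reputation scores are available as dictionaries keyed by item; the function names, the candidate list size k, and the toy values are hypothetical and do not reproduce the procedure of Abdel-Hafez et al. [1].

```python
def cascade(primary_scores, secondary_scores, k):
    """Keep the top-k items by the primary score, then re-rank them by the secondary score."""
    candidates = sorted(primary_scores, key=primary_scores.get, reverse=True)[:k]
    return sorted(candidates, key=lambda item: secondary_scores.get(item, 0.0), reverse=True)


def rec_rep_cascade(predicted, reputation, k):
    # Personalized recommender output forms the candidate list; reputation refines it.
    return cascade(predicted, reputation, k)


def rep_rec_cascade(predicted, reputation, k):
    # Non-personalized reputation forms the candidate list; the recommender refines it.
    return cascade(reputation, predicted, k)


# Toy values (illustrative only, not taken from any publication).
predicted = {"i1": 4.8, "i2": 4.5, "i3": 3.0, "i4": 2.0}
reputation = {"i1": 0.2, "i2": 0.9, "i3": 0.95, "i4": 0.99}

rec_first = rec_rep_cascade(predicted, reputation, k=2)
rep_first = rep_rec_cascade(predicted, reputation, k=2)
```

With these toy values the two orders return disjoint lists (i2 and i1 versus i3 and i4), which shows why the choice of the primary component is an empirical question rather than a mere implementation detail.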
Coverage of the reputation components dimension
Similar to the coverage of the hybridization approaches, some further insights on the state of the art of reputation-enhanced recommender systems can be gained by analyzing the identified publications regarding the usage of the different reputation component classes. To begin with, it is important to note that, in general, it does not matter for reputation-enhanced recommender systems where the reputation data come from or how they have been determined. What is crucial is that reputation data are integrated in the recommendation process in the first place. Nevertheless, Table 4 reveals that quite a few of the publications identified in the literature review indeed comment on the reputation components underlying the considered reputation data. Thus, the recommender systems community can be regarded as not entirely unfamiliar with computation methods used in the research field of reputation systems. Most of the literature addresses aggregation components. This is not surprising since, as opposed to the optional filtering and weighting components, every reputation system has to include an aggregation component in order to arrive at the final reputation values. Thus, one would expect each of the identified publications to be listed in Table 4 under exactly one aggregation component. The observation that some publications do not appear at all can be explained by the fact that they just use the reputation data but do not elaborate on their provenance. Some form of aggregation has certainly been performed beforehand, probably by relying on the simple arithmetic component class. Two publications are listed under more than one aggregation component because one of them describes different aggregation alternatives [35] and the other combines two aggregation components [48].
The latter actually conflicts with the intuitive assumption that every reputation system comprises exactly one aggregation component, an assumption that is also built into the software framework for the composition of reputation systems proposed by Sänger et al. [42]. In this respect, we propose follow-up investigations on the basic classification of reputation system components as the second future research direction. This may imply reconsidering some aspects of the comprehensive component repository by Sänger et al. [43], such as the term “aggregation” or the classification of ranking techniques and/or PageRank-based algorithms.
As already indicated, both filtering and weighting components are mentioned less frequently in the literature on reputation-enhanced recommender systems. This was to be expected because these components are optional, but it still brings us to the third future research direction, which is to not only use some basic concepts from the research field of reputation systems but to draw on more in-depth knowledge. Following the preceding remarks, this means not only using the reputation data but actually understanding their provenance. In addition, it means making more extensive use of sophisticated filtering and weighting components in particular. In this way, reputation data can be tailored to the peculiarities of the intended recommender system, which is highly desirable considering the personalized nature of recommender systems.
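How the three component classes interact can be illustrated with a minimal sketch. It assumes a context-based filter, an exponential time-decay weighting, and a weighted arithmetic mean as the aggregation. These concrete choices, the Feedback structure, and the half-life parameter are hypothetical examples and are not taken from the component repository of Sänger et al. [43].

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    value: float     # rating on an arbitrary scale, e.g. 1 to 5
    context: str     # e.g. the item category in which the feedback was given
    age_days: float  # time elapsed since the feedback was submitted


def reputation(feedback, context, half_life_days=30.0):
    """Return the context-specific reputation, or None if no feedback matches."""
    # Filtering component: keep only feedback from the requested context.
    relevant = [f for f in feedback if f.context == context]
    if not relevant:
        return None
    # Weighting component: a rating's influence halves every half_life_days.
    weights = [0.5 ** (f.age_days / half_life_days) for f in relevant]
    # Aggregation component: weighted arithmetic mean.
    return sum(w * f.value for w, f in zip(weights, relevant)) / sum(weights)


# Toy feedback (illustrative only).
toy_feedback = [
    Feedback(value=5.0, context="books", age_days=0.0),
    Feedback(value=1.0, context="books", age_days=30.0),
    Feedback(value=1.0, context="music", age_days=0.0),
]
books_reputation = reputation(toy_feedback, "books")
```

Because the filter and the weighting are parameterized, a recommender could request reputation values for exactly the context and time horizon that matter for the target user, instead of consuming a single precomputed score.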
Focusing on the evaluation dimensions, Table 2 (cf. Section 3.3) reveals that some of the publications are either not evaluated thoroughly, i.e. not compared to related work, or not evaluated at all. Those publications that have actually been evaluated all show improvements in terms of the employed metrics, which supports the implicit claim of this paper that enhancing recommender systems with reputation data leads to better recommendation performance. Nevertheless, some of the evaluations are based on fictional and overly simplistic scenarios. Such light-weight forms of evaluation, which are more accurately called demonstrations, can show the feasibility and meaningfulness of the proposals. Still, the fourth future research direction is to investigate how the systems that have been evaluated insufficiently or not at all actually compare to related baseline recommendation techniques using real-world data. Note that this direction consists of two requirements. The former, i.e. the comparison to related baseline recommendation techniques, particularly calls for a move from an absolute evaluation to a relative evaluation (cf. relativeness dimension). This is to ensure that a proposal not only makes sense per se but is actually superior to at least the most basic of the approaches comparable to it. The latter, i.e. the usage of real-world data, ensures that improvements cannot be claimed by simply creating unrealistic scenarios and data sets that unfairly favor one’s own proposal.
The ultimate goal regarding the evaluation dimensions, and thus the fifth future research direction, is to compare the developed systems not only to baseline recommendation techniques but also to one another. To determine the best proposal for a specific use case, it is necessary to make the respective evaluations comparable by always using the same metrics and data sets. This is far from being an easy task because not all of the existing approaches are described in sufficient detail to be re-implemented and compared to one another. Moreover, determining (let alone creating) the most meaningful metric and the most appropriate data set is not trivial either. Thus, for scenarios for which there are already numerous applicable proposals, the research efforts required for a comprehensive comparison with pre-defined metrics and data sets may actually be worth a stand-alone publication.
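The requirement of using the same metrics and data sets for all systems can be sketched as a small evaluation harness. The sketch assumes that every system under comparison exposes a prediction function taking a user and an item, and it uses the mean absolute error as the single shared metric. The system names, the toy test set, and the toy predictors are hypothetical and do not represent results from any of the surveyed publications.

```python
def mean_absolute_error(predict, test_ratings):
    """Average absolute deviation between predicted and actual ratings."""
    errors = [abs(predict(user, item) - rating) for user, item, rating in test_ratings]
    return sum(errors) / len(errors)


def compare(systems, test_ratings):
    """Evaluate all systems on the same test set; return (name, MAE) pairs, best first."""
    results = {
        name: mean_absolute_error(predict, test_ratings)
        for name, predict in systems.items()
    }
    return sorted(results.items(), key=lambda pair: pair[1])


# Toy test set of (user, item, actual rating) triples, illustrative only.
toy_test_ratings = [("u1", "i1", 4.0), ("u1", "i2", 2.0)]

# Toy predictors standing in for a baseline and a proposal under evaluation.
toy_systems = {
    "baseline_constant": lambda user, item: 3.0,
    "toy_proposal": lambda user, item: 4.0 if item == "i1" else 2.5,
}

ranking = compare(toy_systems, toy_test_ratings)
```

Since every system is scored by the same function on the same triples, the resulting ranking is a relative evaluation in the sense described above. Replacing the toy test set with a real-world data set would address the second requirement.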
The marginal improvements that may be achieved from further optimizing highly sophisticated recommender algorithms have motivated scholars to broaden the horizon of recommender systems research and integrate relevant concepts from related fields. Since trust and reputation systems show substantial connections to recommender systems, there have been attempts to consider trust relationships in the recommendation process. However, personal trust links are only available in small numbers on modern online platforms because these are typically characterized by short-term interactions. As the concept of reputation is closely linked to trust but fits the peculiarities of modern online platforms better, this paper focused on the integration of reputation data instead of trust relationships. In fact, the corresponding research stream of reputation-enhanced recommender systems has attracted considerable attention in recent years. Therefore, our main goal was to provide a comprehensive survey of the approaches proposed so far. First, we identified existing work in a systematic and exhaustive search process. Then, in order to relate the publications to one another, we developed a dedicated taxonomy based on commonly accepted principles and published research. Comparing the proposals according to the taxonomy resulted in a structured overview of the state of the art of the research stream.
On the one hand, our results help stimulate further innovation in reputation-enhanced recommender systems. Future research is not only needed to close or explain the identified gaps but also to improve the existing proposals. After all, there still is constant innovation in the respective research fields of recommender and reputation systems, which is why new hybridization approaches are needed and expected as well. On the other hand, this paper also serves as an important basis for the further exchange of ideas between both communities. For example, future research efforts could investigate the opposite of our approach: how recommender systems may be used to enhance reputation systems.
Acknowledgements
The authors would like to thank the anonymous reviewers for their helpful comments. The research leading to these results was supported by the “Bavarian State Ministry of Education, Science and the Arts” as part of the FORSEC research association.
