Abstract
A Recommender System (RS) is an information filtering approach that helps users overburdened with information in their decision-making process by suggesting items that might interest them. When presenting recommendations, the accuracy of the recommended list has always been a concern for researchers. In recent years, however, the focus has shifted to include unexpected and novel items in the list alongside accuracy. To increase user acceptance, it is important to provide potentially interesting items that are not obvious and that differ from the items the end user has already rated. In this work, we propose a model that generates serendipitous item recommendations while also addressing accuracy and sparsity issues. The literature suggests that several components contribute to serendipitous recommendations. In this paper, a fuzzy inference based approach is used for serendipity computation because the definitions of these components overlap. Moreover, to improve accuracy and mitigate sparsity in the recommendation process, cross domain and trust based approaches are incorporated. A prototype of the system is developed for the tourism domain, and its performance is measured using mean absolute error (MAE), root mean square error (RMSE), unexpectedness, precision, recall and F-measure.
Introduction
A recommender system, an information filtering approach, provides suggestions of interest to users by considering their past preferences ([1–5]). The quality of a recommendation depends on a number of factors, and accuracy alone is not a sufficient criterion for evaluating the presented recommendations ([6, 7]). One criterion of quality recommendation is to reduce the overspecialization problem, in which the generated list of recommendations repeatedly falls within the range of the user’s past behavior and so creates a loop that returns the same kind of list [8]. One way to improve the quality of recommendations is to recommend serendipitous items. In this paper, we consider three factors for generating serendipitous item recommendations: novelty, relevancy and unexpectedness.
Motivation
The motivation behind this work is to answer the following research questions:
RQ1- How can we assess serendipitous recommendations for a target user, and what are the factors that affect serendipity?
RQ2- How can other recommendation problems, i.e. the cold start and sparsity problems, be handled while presenting serendipitous recommendations to the user?
RQ3- How is the balance maintained between the various components that affect serendipitous recommendations?
To answer RQ1, we need to understand the factors that work together to produce serendipity. The novelty of an item is related to both relevant and non-relevant items in the recommendation generation process. Several researchers have found that novelty and serendipity are closely related [9]. Irrelevant items include dissimilar items which may or may not be known to a user. Similarly, relevant items may be unknown to a user and may belong to the long tail [10] of the catalogue. Selecting items from the long tail increases the likelihood of a result that is both novel and relevant to the user. In this paper, novelty is considered an important component for obtaining the desired serendipitous result.
Unexpectedness is the backbone of the concept of serendipity. It requires that recommended items be significantly different from the usual ones, which gives them a greater chance of resulting in a serendipitous recommendation [11]. Combining novelty with unexpectedness yields a list that is both unknown to and different for the user. Another important concept is relevancy, which ensures that even if the items do not match the user’s profile, they are not completely opposite to the user’s interests [12].
To answer RQ2, we focus on handling the cold start and sparsity problems. Cross domain and trust aware recommender systems have been identified as potential solutions in this area ([13–15]). A Cross Domain Recommender System (CDRS) uses multiple domains to enrich the data in the target domain, which helps to address the sparsity and cold start problems [16]. Sparse data in the target domain compromises accuracy. Incorporating the cross domain approach into serendipitous recommendation helps to achieve the desired accuracy in the presented list.
Focusing on RQ3, the reason for maintaining a balance between novelty, relevancy and unexpectedness is that no single concept should dominate; each should be covered according to its significance for serendipitous item generation. Giving importance to only one of them reduces the strength of serendipity. The decision on whether a serendipitous item should be recommended to the user depends on the intensity of its components. To balance the serendipity score, the proposed system uses fuzzy logic to handle the uncertainty in assessing the various components. Fuzzy logic is derived from fuzzy set theory and provides an approximate assessment [17]. The serendipity score is determined by its factors: as mentioned earlier, novelty, relevancy and unexpectedness are used as linguistic input variables.
The rest of the paper is structured as follows. Section 2 reviews the literature, whereas Section 3 discusses the architecture, working and algorithm of the proposed approach. The evaluation of the proposed approach is given in the subsequent section, followed by the conclusion.
Literature survey
A recommender system provides a way to handle the information overload problem and helps the end user in the decision-making process [18]. Providing quality recommendations improves user satisfaction and increases trust in the system. Collaborative filtering (CF) is one of the most widely used approaches in recommendation [19, 20]. However, the CF approach suffers from the sparsity and cold start problems.
Researchers have therefore looked at aspects beyond the accuracy of the presented list in order to improve the system. An attempt in the direction of generating serendipitous recommendations is made in [21], where the degree of curiosity of a user is predicted in an online tourism recommender system. This helps to elicit a positive response to the presented serendipitous and novel item recommendations. The overspecialization problem, or serendipity problem, motivates solutions that surprise the user with the presented recommendations. Novelty and diversity are two key features for discussing the quality of recommendation beyond accuracy [22]; that work discusses three concepts underlying the definitions of novelty and diversity: choice, discovery and relevance. The novelty of an item helps to handle bias in a system [23]. The novelty of an item is also a key feature for user satisfaction, as it helps to recognize effective items while ensuring accuracy [24].
The long tail phenomenon relates to items that are not in high demand because they lack sufficient ratings, which is tied to the sparse data issue in collaborative filtering. Drawing recommendations from the long tail adds novelty to the presented list by including less popular items [10]. The focus of recommendation has thus shifted towards positively surprising the user rather than only delivering accuracy in the presented list. Along the same lines, a survey is presented in [25] that lists the work of various researchers.
An attempt to improve both serendipity and accuracy in the target domain is made in [9], which uses a cross domain approach to achieve serendipitous recommendation in the target domain. A cross domain approach is able to handle the cold start and sparsity issues [13]. It utilizes knowledge from multiple domains, i.e. source domains, to generate or enhance predictions for the target user in the target domain ([16, 26]). Various attempts have been made to utilize the capabilities of the cross domain approach in other fields as well [27]. Trust has also gained popularity in recommendation as a way to reduce the sparsity issue and present personalized recommendations to users [15]. An effort is made in [28] to combine the trust and cross domain approaches so that the system can handle the cold start and sparsity issues and generate a personalized list. A fuzzy trust based system is proposed in [29] to overcome the sparsity problem; it uses linguistic terms rather than numerical values to represent fuzzy trust. A survey is presented in [30], which also discusses various objectives beyond the accuracy of the system: novelty, diversity, serendipity and coverage. Continuing with the objective of serendipitous recommendation, a fuzzy based approach is provided in [17], which models similarity relationships using a fuzzy graph among the fuzzy sets describing the metadata.
In this paper, a fuzzy based approach for serendipitous item generation is provided that helps to explore surprisingly interesting items. A single component is not sufficient to make a decision on serendipitous items. The components found effective in exhibiting serendipitous behavior are expressed in linguistic terms using fuzzy sets so that they can be represented in fuzzy logic. To deal with the sparsity issue, this work incorporates the cross domain approach along with a trust aware recommender system.
Proposed Trust and Fuzzy Inference based Cross Domain Serendipitous Item Recommendation (TFCDSRS)
Serendipity can be considered as the combined outcome of novelty, relevancy and unexpectedness. This paper presents an approach that combines these factors to generate a serendipitous list of recommendations while also handling the sparsity and cold start problems. As these factors are subjective in nature, they are represented as fuzzy sets. The next sub-section discusses the architecture of the proposed system.
Architecture of proposed system
The proposed system is a multi-agent system in which multiple agents work periodically and simultaneously to accomplish the goal.
The system commences the recommendation computation once the user enters the system. It collects basic information about the target user and stores it in the repository. This repository stores the results of the various computations performed to generate serendipitous recommendations, both for the current user and for future reference. The collected ratings and contextual information of the user help to find the neighborhood of the target user. Once similar users are found, the system computes the prediction score for the target user. If the system does not have enough ratings, it turns to the source domain. The remote domain, i.e. the source domain, helps to find users similar to the target user in that domain. It is assumed that the identity of a user is shared across domains, which makes it possible to identify the target user for whom similar users are computed in the source domain. The system then proceeds to the trust computation before the final neighborhood computation.
The trust computation in the source domain helps to identify trustworthy users, which increases user acceptance of the presented recommendations. The trust computation also helps to reduce the problem of sparsity [15]. The results of the similarity and trust computations in the source domains are combined and sent as a response for computing the final neighborhood of the target user. The overall similarity computation combines the local and remote neighborhoods after receiving the response from the source domain. This neighborhood computation provides similar users who might share the same interests as the target user. Using the neighborhood, the not-yet-seen items are identified for the target user in the target domain.
The items that exceed a threshold are selected for the further computation of the serendipity score. The serendipity assessment phase determines the score of the items using fuzzy logic because each component has its own importance in determining the serendipity of an item. These components work as inputs to the fuzzy logic and are expressed as linguistic values.
The output of the fuzzy computation is a crisp value that determines whether or not the item qualifies to be recommended as a surprise recommendation.
The algorithm is outlined as follows:
To find similar users, collaborative filtering uses the ratings given by the users. A problem arises when a user has not given enough ratings or is new to the system. In both cases, it becomes difficult to compute the neighborhood, because too few ratings lead to poor results.
Where,
rx,i and ry,i denote the ratings of users x and y for the ith item, respectively.
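The similarity equation itself is not reproduced in the text above, so the following is only a minimal sketch. It assumes that user similarity is the Pearson correlation over co-rated items, which is a common choice in collaborative filtering but is not confirmed by the source. The function name and the dictionary layout (item id mapped to rating) are hypothetical.

```python
from math import sqrt


def pearson_similarity(ratings_x, ratings_y):
    """Pearson correlation over the items rated by both users.

    ratings_x, ratings_y: dicts mapping item id -> rating (assumed layout).
    Returns 0.0 when fewer than two items are co-rated or when either
    user's co-rated ratings have no variance.
    """
    common = sorted(set(ratings_x) & set(ratings_y))
    if len(common) < 2:
        return 0.0
    mean_x = sum(ratings_x[i] for i in common) / len(common)
    mean_y = sum(ratings_y[i] for i in common) / len(common)
    num = sum((ratings_x[i] - mean_x) * (ratings_y[i] - mean_y) for i in common)
    den_x = sqrt(sum((ratings_x[i] - mean_x) ** 2 for i in common))
    den_y = sqrt(sum((ratings_y[i] - mean_y) ** 2 for i in common))
    if den_x == 0 or den_y == 0:
        return 0.0
    return num / (den_x * den_y)
```

With very few co-rated items the value is unreliable, which is the situation in which the approach described here falls back on the source domain.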
Where,
Sim (x, y) denotes the local similarity value between users x and y in domain t.
cor (t, s) denotes the correlation between the target domain t and the remote domain s.
Where,
sim (i, j) is the similarity between two items i and j.
I_t is the set of items in domain t.
Rating based correlation has been used in the proposed approach.
Where,
ru,i is the rating given by user u to item i.
It is clear from the above equation that the more closely the rating behavior of two users matches, the higher the confidence between them.
Where,
τi,j denotes the trust of user i in user j.
n denotes the total number of users.
The dynamic trust of user x in user y, τx,y (∀ x ≠ y), in the UT matrix is computed by combining the similarity between users x and y, the confidence of user x in user y and the reputation of user y as follows:
Where,
x1: Sim (x, y) represents the similarity between users x and y.
x2: conf (y|x) represents the confidence of user x in user y.
x3: ROU_y represents the reputation of user y.
k1 and k2 are very small positive constants.
The diagonal values of the user trust matrix represent the trust of users in themselves, so they are set to 1. Initially, at time t = 0, τx,y may be negligible, but as time passes, user y becomes trustworthy to user x. As the similarity between the users increases, the reputation of user y increases correspondingly. Recommendations are continuously generated by user y for user x, and user x gives positive feedback on the generated recommendations.
This step generates a trustworthy neighborhood using the cross domain and trust based system. The final neighborhood set is selected for the next phase of computation.
The previous phase narrows down the final neighborhood of the target user. This yields the items for which the prediction score needs to be computed, namely the not-yet-seen items of the target user. This phase is followed by the serendipity assessment phase, which produces the list by computing the ‘surprise me’ component.
Where,
sim (x, y) denotes the similarity between the user x and y.
ry,i denotes the rating of user y for an item i.
The serendipity assessment phase computes the serendipity of items from its various components using fuzzification. Fuzzy logic allows intermediate values between yes/no, true/false, etc. [32]. The serendipity level is estimated using novelty, relevancy and unexpectedness. These components are used as linguistic input variables, which the system maps to fuzzy members using fuzzy sets. The linguistic terms make it easier to express the product features.
These input variables are represented using suitable fuzzy sets and mapped to fuzzy numbers as listed below:
Novelty = {Popular, Less Popular, Unseen}
Relevance = {Valuable, Acceptable, Deprecate}
Unexpectedness = {Low, Mid, High}
The output variable, i.e. the serendipity score, is defined as the fuzzy set:
Serendipity = {Low, Mid, High}
The standard triangular membership function (as shown in Fig. 2) is used by the system to represent the regions of each input and output variable. The other input and output parameters (novelty, relevance and serendipity) are defined similarly within the system.

Fig. 2. Triangular membership function for the input parameter Unexpectedness.
[Figure: MAE comparison for TFCDSRS and CF approach.]
[Figure: RMSE comparison for TFCDSRS and CF approach.]
[Figure: Distribution of unexpected items from top-5 to top-30.]
[Figure: Precision, recall and F-measure comparison between TFCDSRS and collaborative filtering (CF) for varying top-N items.]
[Figure: Precision, recall and F-measure comparison for various top-N items for TFCDSRS.]
The relationships between the input and output variables are defined by rules within the system in the following format:
If Novelty is Popular AND Relevancy is Deprecate AND Unexpectedness is Low then Serendipity is Low
If Novelty is Unseen AND Relevancy is Deprecate AND Unexpectedness is Low then Serendipity is Low
If Novelty is Unseen AND Relevancy is Acceptable AND Unexpectedness is Low then Serendipity is Mid
If Novelty is Less Popular AND Relevancy is Valuable AND Unexpectedness is High then Serendipity is High
If Novelty is Unseen AND Relevancy is Valuable AND Unexpectedness is High then Serendipity is High
[Table: Linguistic definition for the importance]
Further rules are formulated in a similar way. After the serendipity value is obtained using these rules, the next step is defuzzification, which is done using the centroid of area defined below.
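$$z^{*} = \frac{\int_{Z} \mu_{A}(z)\, z\, dz}{\int_{Z} \mu_{A}(z)\, dz}$$
Where,
z* is the crisp (defuzzified) output and μA(z) is the aggregated output membership function.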
Once the estimated serendipity score exceeds a pre-defined threshold value, the item is passed on to the next step, which generates the recommendation list.
The serendipity score is computed using fuzzy logic, where the input values are novelty, relevancy and unexpectedness. The corresponding weight is extracted by the agent for fuzzification. The computation proceeds as follows:
Various components are involved in the serendipity computation. This step discusses these components in detail.
A novel item is defined as an item that the user might not have found on his or her own. An unseen item is identified as novel on the basis of its novelty score. A higher novelty score means that the item is less popular than items with a lower novelty score. Using this relation between novelty and popularity, the novelty score (NS) of an item ‘i’ is computed as follows [23]:
Where,
IF = item frequency, which is given as:
IUF is the inverse user frequency for the unseen item ‘i’, given by:
The item frequency measures how frequently an item has been rated, indicating the popularity of the item, whereas the inverse user frequency is a measure that balances the popularity of the item.
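The exact expressions for IF and IUF are not reproduced above, so the sketch below only illustrates the two ingredients under stated assumptions: item frequency is taken as the share of users who rated the item, and inverse user frequency as a smoothed log(N / n_i), in the spirit of the usual inverse-frequency weighting. How the two are combined into NS is left open because the source formula is not available. All names and the data layout (user id mapped to a set of rated items) are hypothetical.

```python
from math import log


def item_frequency(item, rated_items_by_user):
    """Share of users who rated `item`; higher means more popular."""
    n_users = len(rated_items_by_user)
    raters = sum(1 for items in rated_items_by_user.values() if item in items)
    return raters / n_users if n_users else 0.0


def inverse_user_frequency(item, rated_items_by_user):
    """Smoothed log(N / n_i); grows as fewer users have rated `item`.

    The +1 smoothing is an assumption that keeps the value defined for
    items nobody has rated yet.
    """
    n_users = len(rated_items_by_user)
    raters = sum(1 for items in rated_items_by_user.values() if item in items)
    return log((1 + n_users) / (1 + raters))
```

Under these assumptions a long-tail item has a low item frequency and a high inverse user frequency, matching the statement that less popular items receive a higher novelty score.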
An item can be serendipitous only when it is relevant to the user’s choices. The relevance of an item is related to the user’s interests as depicted in the user’s profile. The most common way to find interesting items for the target user is to consider the choices of similar users, which are obtained from the neighborhood computation. The relevance score for the target user is computed using the following formula:
Where,
C is the normalizing constant.
sim (u, v) is the similarity score between the user ‘u’ and ‘v’.
r (v, i) is the rating given by user ‘v’ to item ‘i’.
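Based on the symbols defined above (C, sim (u, v) and r (v, i)), one plausible reading is a similarity-weighted sum of the neighbors’ ratings, scaled by C. The sketch below follows that reading and assumes C = 1 / Σ|sim (u, v)| over the neighbors who rated the item; the source does not state the value of C, and the function name and data layout are hypothetical.

```python
def relevance_score(item, similarities, ratings):
    """Similarity-weighted average of the neighbors' ratings for `item`.

    similarities: dict neighbor id -> sim(u, v) for the target user u.
    ratings: dict user id -> {item id: rating}.
    Neighbors who have not rated the item are skipped.
    """
    weighted_sum = 0.0
    norm = 0.0
    for neighbor, sim in similarities.items():
        rating = ratings.get(neighbor, {}).get(item)
        if rating is None:
            continue
        weighted_sum += sim * rating
        norm += abs(sim)
    if norm == 0:
        return 0.0  # no neighbor evidence for this item
    return weighted_sum / norm  # C = 1 / norm (assumed normalization)
```

Normalizing by the sum of absolute similarities keeps the score on the same scale as the ratings, which makes it easier to map onto the linguistic terms Valuable, Acceptable and Deprecate.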
The relevancy score
An unexpected item could be either relevant or irrelevant, but in the serendipity scenario an item should be unexpected, interesting and relevant at the same time. This measure is computed with respect to the items in the user’s profile. To measure unexpectedness, [33] gives a method based on the co-occurrence of items. Unexpectedness is measured by the probability that the item is seen (i.e. rated) together with the items in the user’s profile. The underlying idea is that if items are not seen together, they are likely to be different.
Where,
p(i) and p(j) represent the probabilities that items i and j are rated by any user, and p(i,j) is the probability that the same user has rated both items.
p(i) = number of users who have rated item i / total number of items a user has rated.
p(i,j) = number of users who have rated both i and j / total number of items a user has rated.
For both of the above values, lower values signify better results.
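The co-occurrence formula is not reproduced above. A common way to turn p(i), p(j) and p(i,j) into a single score is normalized pointwise mutual information (PMI), and the sketch below uses it as an assumption rather than as the method of [33]. It also assumes that the probabilities are estimated as fractions of all users, in line with the description of p(i) as the probability of being rated by any user. Names and data layout are hypothetical.

```python
from math import log2


def normalized_pmi(p_i, p_j, p_ij):
    """Normalized PMI in [-1, 1]; -1 means the items never co-occur."""
    if p_ij == 0:
        return -1.0
    if p_ij == 1:
        return 1.0
    return log2(p_ij / (p_i * p_j)) / -log2(p_ij)


def cooccurrence_score(candidate, profile, rated_items_by_user):
    """Mean normalized PMI between `candidate` and the profile items.

    rated_items_by_user: dict user id -> set of rated item ids.
    Lower values mean the candidate is rarely rated together with the
    profile items, i.e. it is more unexpected.
    Returns None when no probability can be estimated.
    """
    n_users = len(rated_items_by_user)
    if n_users == 0:
        return None

    def prob(*items):
        hits = sum(1 for rated in rated_items_by_user.values()
                   if all(i in rated for i in items))
        return hits / n_users

    p_c = prob(candidate)
    scores = []
    for j in profile:
        p_j = prob(j)
        if p_c == 0 or p_j == 0:
            continue  # no rating evidence for this pair
        scores.append(normalized_pmi(p_c, p_j, prob(candidate, j)))
    return sum(scores) / len(scores) if scores else None
```

With this convention a lower score signals a more unexpected item, which agrees with the remark that lower values signify better results.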
To design the fuzzy system, we require input parameters for the fuzzy sets. A fuzzy set is a collection of pairs consisting of members and degrees of “support” and “confidence” for those members. A linguistic variable is a variable whose values are words, phrases or sentences that serve as labels of fuzzy sets. For fuzzification, all input values of the input parameters discussed above are defined as fuzzy numbers instead of crisp numbers by using suitable fuzzy sets.
The fuzzy sets shown above are in turn represented by membership functions. A membership function represents the magnitude of participation of each input. Here we have used the triangular membership function to represent three regions for each input variable.
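As an illustration of fuzzification with triangular membership functions, the sketch below maps a crisp unexpectedness value in [0, 1] to degrees of membership in Low, Mid and High. The breakpoints of the three triangles and the [0, 1] input range are assumptions chosen for the example; the source does not list the actual values.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangle with feet a, c and peak b.

    Requires a <= b <= c. Shoulder sets are expressed by a == b or b == c.
    """
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical breakpoints for the three regions of one input variable.
UNEXPECTEDNESS_SETS = {
    "Low": (0.0, 0.0, 0.5),
    "Mid": (0.0, 0.5, 1.0),
    "High": (0.5, 1.0, 1.0),
}


def fuzzify(x, sets=UNEXPECTEDNESS_SETS):
    """Return the membership degree of x in every linguistic term."""
    return {label: triangular(x, a, b, c) for label, (a, b, c) in sets.items()}
```

For example, `fuzzify(0.25)` gives equal membership of 0.5 in Low and Mid and 0.0 in High, showing how one crisp value can belong partly to two overlapping regions.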
The analysis is done for the serendipity component of the items. The result is output as the serendipity score of the item, which identifies the best serendipitous items for the user. This includes the following modules: (i) weights of the various components, i.e. novelty score, relevancy score and unexpectedness; (ii) serendipity score; (iii) selection of items on the basis of the threshold value of serendipity; (iv) recommendation of the items.
Once the input and output fuzzy sets and membership functions are constructed, the fuzzy if-then rules are framed to reflect the relationships between every possible combination of input variables and the output variable.
The levels of the input parameters defined in the above step are used in the antecedents of the rules and the serendipity level as the consequent. Usually, T-norm and T-conorm operators are used in the evaluation of the antecedents and consequents, respectively, as given below:
T-norm operator is T min (a, b) = min (a, b) and
T-conorm operator is T max (a, b) = max (a, b)
The rules are formulated using Mamdani’s inference method (Mamdani, 1993). The rules in the rule base of the fuzzy inference system (serendipity assessment module) are represented in the format given in the previous section.
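The rule evaluation can be sketched as follows: the min operator (T-norm) combines the AND-ed antecedents of a rule into a firing strength, and the max operator (T-conorm) aggregates the strengths of rules that share the same consequent. The five rules are the ones quoted in this paper; the remaining rules of the rule base are not listed in the text and are therefore omitted. The function name and input layout are hypothetical.

```python
# (Novelty, Relevancy, Unexpectedness) -> Serendipity, as quoted in the text.
RULES = [
    (("Popular", "Deprecate", "Low"), "Low"),
    (("Unseen", "Deprecate", "Low"), "Low"),
    (("Unseen", "Acceptable", "Low"), "Mid"),
    (("Less Popular", "Valuable", "High"), "High"),
    (("Unseen", "Valuable", "High"), "High"),
]


def fire_rules(novelty, relevancy, unexpectedness, rules=RULES):
    """Return the aggregated firing strength of each serendipity term.

    Each argument is a dict mapping a linguistic term to its membership
    degree; missing terms count as 0.0.
    """
    strengths = {"Low": 0.0, "Mid": 0.0, "High": 0.0}
    for (nov, rel, unx), consequent in rules:
        # T-norm (min) over the AND-ed antecedents.
        strength = min(
            novelty.get(nov, 0.0),
            relevancy.get(rel, 0.0),
            unexpectedness.get(unx, 0.0),
        )
        # T-conorm (max) over rules with the same consequent.
        strengths[consequent] = max(strengths[consequent], strength)
    return strengths
```

An item that is mostly Unseen, Valuable and highly unexpected therefore ends up with its largest strength on the High serendipity term.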
After the if-then rules are applied, the crisp output is calculated through a process called defuzzification. Defuzzification refers to the way a crisp value is extracted from a fuzzy set as a representative value. The most widely adopted method for defuzzifying a fuzzy set A of a universe of discourse Z is the centroid of area. This method is used to calculate the crisp value of the output parameter, i.e. the serendipity level.
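A discretized version of centroid-of-area defuzzification can be sketched as below. Each output set is clipped at the firing strength of its rules, the clipped sets are aggregated with max, and the centroid is approximated by a sum over an evenly spaced grid on [0, 1]. The output breakpoints, the [0, 1] output range and the grid resolution are assumptions made for illustration.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a <= b <= c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical breakpoints for the output variable Serendipity.
SERENDIPITY_SETS = {
    "Low": (0.0, 0.0, 0.5),
    "Mid": (0.0, 0.5, 1.0),
    "High": (0.5, 1.0, 1.0),
}


def centroid_defuzzify(strengths, output_sets=SERENDIPITY_SETS, steps=1000):
    """Approximate the centroid of the aggregated output membership.

    strengths: dict mapping an output term to its firing strength.
    Returns 0.0 when no rule fired.
    """
    num = 0.0
    den = 0.0
    for k in range(steps + 1):
        z = k / steps
        # Clip each output set at its strength, then aggregate with max.
        mu = max(
            min(strengths.get(label, 0.0), triangular(z, *abc))
            for label, abc in output_sets.items()
        )
        num += mu * z
        den += mu
    return num / den if den > 0 else 0.0
```

The returned crisp value is what would be compared against the pre-defined serendipity threshold.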
The final list of recommended items is selected using all the components of the serendipity computation and is sorted in decreasing order of serendipity score.
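The selection and ordering step can be sketched in a few lines, assuming the crisp serendipity scores are already available as a mapping from item to score. The function name and the example threshold are hypothetical.

```python
def rank_serendipitous(scores, threshold):
    """Keep items whose score exceeds the threshold, best first.

    scores: dict item id -> crisp serendipity score.
    Returns a list of (item, score) pairs in decreasing order of score.
    """
    selected = [(item, s) for item, s in scores.items() if s > threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)
```

For instance, `rank_serendipitous({"a": 0.4, "b": 0.9, "c": 0.7}, 0.5)` keeps only items b and c and lists b first.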
Evaluating the accuracy of the list presented to the user is in itself a challenging task, which requires a tradeoff between two major evaluation metrics: coverage and serendipity score [6]. That work focuses on the quality impression perceived by the user, which measures user satisfaction. In serendipity, one of the key features is unexpectedness, which boosts user satisfaction. [11] discusses that providing unexpected items, beyond what users expect from the system, increases user satisfaction. Evaluation metrics for the prediction accuracy of a system have been addressed by many researchers. However, for evaluating serendipitous recommendation, [34] provides unexpectedness, measured as the distance between the result produced by the method under evaluation and the result produced by a primitive prediction method. They also take into account the ranking in the presented list when evaluating unexpectedness. [35] discusses various studies on serendipity, e.g. the concepts needed to generate serendipitous recommendations and the evaluation that justifies serendipity in the presented list.
Different metrics are used for the evaluation depending on the parameter being measured. To evaluate the accuracy of the presented list, mean absolute error (MAE) and root mean square error (RMSE) are used. Precision and recall are used to evaluate the effectiveness of the recommender system, and the two are combined to obtain the F-measure. The unexpectedness result indicates the serendipitous behavior of the items. To evaluate the system’s performance, a prototype is developed using Java for the framework, JADE (Java Agent DEvelopment Framework) for the multi-agent environment and MySQL as the repository of the system.
Dataset
To evaluate the system, we have used a tourism dataset of Delhi, India. This domain includes four sub-domains: restaurants, hotels, travel places and shopping places. Each sub-domain holds information relevant to the system. The information about restaurants includes the restaurant’s name, address, opening and closing times, average cost per person, etc. For hotels, it includes the hotel name, location, charges, etc. For shopping places and travel places, it includes the name, location, opening and closing times, etc. Various websites were used to collect these details, e.g. Zomato, MakeMyTrip, TripAdvisor, Delhi Tourism and ShopKhoj. The dataset contains 8857 restaurants, 1023 hotels, 139 places to shop and 115 places to visit for entertainment or as tourist spots.
Evaluation metric
To measure the accuracy of the system, various metrics have been used in the literature, including statistical accuracy metrics such as MAE and RMSE. These metrics indicate how well a system can predict the rating of a specific item.
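For reference, the standard definitions of the two error metrics can be sketched as follows; both compare predicted ratings with the actual ratings held out for testing. The function names and the list-based input layout are illustrative choices.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean absolute error between two equally long rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error between two equally long rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of prediction quality; lower is better for both.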
The MAE comparison is between the proposed approach, i.e. TFCDSRS, and the traditional approach, i.e. collaborative filtering (CF). We compared the MAE for a varying number of items and found that the proposed system outperforms CF.
The RMSE comparison likewise shows the difference between the two methods, and again the proposed system outperforms CF.
Where,
The unexpectedness is computed for varying top-N items. The results show that as the number of top-N items increases, the proposed system exhibits a better ‘surprise me’ component.
Where U is the set of users and RSu(K) is the set of top-K suggestions for user u. The records from the test set for user u are represented by RELu.
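Using the symbols defined above (U, the top-K suggestions for user u and the test-set records for user u), precision, recall and F-measure at K can be sketched as below. The sketch assumes the usual per-user definitions averaged over all users and an F-measure computed as the harmonic mean of the averaged precision and recall; the exact averaging used in the evaluation is not stated in the text. Names and data layout are hypothetical.

```python
def precision_recall_f_at_k(recommended, relevant, k):
    """Average precision@K, recall@K and their harmonic mean.

    recommended: dict user id -> ranked list of recommended item ids.
    relevant: dict user id -> set of relevant test-set item ids.
    """
    if not recommended or k <= 0:
        return 0.0, 0.0, 0.0
    precisions = []
    recalls = []
    for user, ranked in recommended.items():
        top_k = set(ranked[:k])
        rel = relevant.get(user, set())
        hits = len(top_k & rel)
        precisions.append(hits / k)
        recalls.append(hits / len(rel) if rel else 0.0)
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Calling the function for several values of K yields the kind of top-N curves used to compare two recommenders.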
The comparison between CF and TFCDSRS has been carried out for precision, recall and F-measure with varying top-N items, where N takes the values 5, 12, 17, 22, 27 and 32. The comparison shows that the proposed work outperforms the traditional approach.
In this paper, a Trust and Fuzzy Inference based Cross Domain Serendipitous Item Recommendation system (TFCDSRS) is presented, which incorporates trust and cross domain approaches. The main emphasis of the paper is the use of a fuzzy based approach to generate serendipitous items. The fuzzy approach is used to combine and balance the factors that are necessary for serendipity computation. The system also deals with the sparsity and cold start problems through the trust and cross domain approaches. Serendipitous recommendation is considered as the combination of several components, i.e. novelty, relevancy and unexpectedness, each of which has its own effect when generating recommendations. The presented approach therefore uses fuzzy based computation to decide on the serendipity score of each item in the list of recommendations. While narrowing down the neighbourhood of the target user, if the system encounters the sparsity problem, it resorts to cross domain computation. Trust computation further helps in dealing with the cold start and sparsity issues. The proposed system is implemented using a multi-agent approach, and its performance is measured using various standard metrics. The evaluation shows that the proposed system outperforms the traditional recommendation approach. Overall, the proposed system is able to present serendipitous recommendations while maintaining accuracy.
