Trust and fuzzy inference based cross domain serendipitous item recommendations (TFCDSRS)

Abstract

A recommender system (RS) is an information filtering approach that helps a user overburdened with information in the decision-making process by suggesting items that might be interesting to them. When presenting recommendations to the user, the accuracy of the presented list is always a concern for researchers. In recent years, however, the focus has shifted to include unexpected and novel items in the list alongside accurate ones. To increase user acceptance, it is important to provide potentially interesting items that are not obvious and that differ from the items the end user has already rated. In this work, we propose a model that generates serendipitous item recommendations while also addressing accuracy and sparsity issues. The literature suggests that several components contribute to serendipitous recommendations. In this paper, a fuzzy inference based approach is used for the serendipity computation because the definitions of these components overlap. Moreover, to improve accuracy and mitigate sparsity in the recommendation process, cross domain and trust based approaches are incorporated. A prototype of the system is developed for the tourism domain, and its performance is measured using mean absolute error (MAE), root mean square error (RMSE), unexpectedness, precision, recall and F-measure.

Keywords

Recommender system, cross domain, serendipity, trust, fuzzy sets

1 Introduction

A recommender system, an information filtering approach, provides suggestions that are interesting to the user by considering their past preferences ([1–5]). The quality of recommendations depends on a number of factors, and accuracy alone is not a sufficient criterion for evaluating the presented recommendations ([6, 7]). One criterion of quality recommendation is to reduce the overspecialization problem. Overspecialization occurs when the generated recommendation list stays repeatedly within the range of the user’s past behavior, so that the system falls into a loop of presenting the same kind of list [8]. One way to improve the quality of recommendations is to recommend serendipitous items. In this paper, we consider three factors for generating serendipitous item recommendations: novelty, relevancy and unexpectedness.

1.1 Motivation

The motivation behind this work is to answer the following research questions:

RQ1- How can we assess serendipitous recommendations for a target user, and which factors affect serendipity?

RQ2- How can other recommendation problems, i.e. the cold start problem and sparsity, be dealt with when presenting serendipitous recommendations to the user?

RQ3- How are the various components that affect serendipitous recommendations balanced?

To answer RQ1, we need to understand the factors that together produce serendipity. The novelty of an item is related to both relevant and non-relevant items in the recommendation generation process. Several researchers have found that novelty and serendipity are closely related [9]. Irrelevant items include dissimilar items which may or may not be known to a user. Similarly, relevant items that belong to the long tail [10] of the catalogue could be unknown to a user. Selecting items from the long tail increases the possibility of obtaining a result that is both novel and relevant to the user. In this paper, novelty is considered an important component for obtaining the desired serendipitous result.

Unexpectedness is the backbone of the concept of serendipity. Recommended items that are significantly different from the usual ones have a greater chance of resulting in serendipitous recommendations [11]. By combining novelty with unexpectedness, the list will be unknown as well as different for the user. Another important concept is relevancy, which ensures that even if the items do not match the user’s profile, they are not completely opposite to the user’s interests [12].

To answer RQ2, we focus on the cold start and sparsity problems. Cross domain and trust aware recommender systems have been found to be potential solutions in this area ([13–15]). A Cross Domain Recommender System (CDRS) includes multiple domains to enrich the data in the target domain, which helps to address the sparsity and cold start problems [16]. Sparse data in the target domain compromises accuracy. Incorporating the cross domain approach into serendipitous recommendation helps to achieve the desired accuracy in the presented list.

Focusing on RQ3, the reason for maintaining a balance between novelty, relevancy and unexpectedness is to avoid giving importance to a single concept and instead to cover each of them according to its significance for serendipitous item generation. Giving importance to only one of them reduces the strength of serendipity. The decision on whether a serendipitous item should be recommended to the user depends on the intensity of its components. To balance the serendipity score, the proposed system uses fuzzy logic to handle the uncertainty in assessing the various components. Fuzzy logic is derived from fuzzy set theory and provides an approximate assessment [17]. The serendipity score is determined by its factors: as mentioned earlier, novelty, relevancy and unexpectedness are used as linguistic input variables.

The rest of the paper is structured as follows. Section 2 reviews the literature, whereas Section 3 discusses the architecture, working and algorithm of the proposed approach. The evaluation of the proposed approach is given in Section 4, followed by the conclusion.

2 Literature survey

A recommender system provides a way to handle the information overload problem and helps the end user in the decision-making process [18]. Providing quality recommendations improves user satisfaction and increases trust in the system. Collaborative filtering (CF) is one of the most widely used approaches in recommendation [19, 20]. The CF approach suffers from the sparsity and cold start problems.

However, researchers also focus on aspects beyond the accuracy of the presented list in order to improve the system. An attempt in the direction of generating serendipitous recommendations is made in [21], where the authors predict the degree of curiosity of a user in an online tourism recommender system. This helps to obtain a positive response to the presented serendipitous and novel item recommendations. The overspecialization problem, or serendipity problem, motivates solutions that surprise the user with the presented recommendations. Novelty and diversity are two key features for discussing the quality of recommendation beyond accuracy [22]; the authors discuss three concepts underlying the definitions of novelty and diversity: choice, discovery and relevance. The novelty of an item helps to handle bias in a system [23]. It is also a key feature for user satisfaction, since it helps to recognize effective items while ensuring accuracy [24].

The long tail phenomenon relates to items which are not in high demand because they have not received enough ratings, which is tied to the sparse data issue in collaborative filtering. Recommending from the long tail adds novelty to the presented list by including less popular items [10]. The focus of recommendation has thus moved towards positively surprising the user rather than only presenting an accurate list. On the same ground, a survey of the work presented by various researchers is given in [25].

In order to improve serendipity and accuracy in the target domain, an attempt is made by [9], who use a cross domain approach to accomplish serendipitous recommendation in the target domain. A cross domain approach is able to handle the cold start and sparsity issues [13]. It utilizes knowledge from other domains, i.e. source domains, in order to generate or enhance predictions for the target user in the target domain ([16, 26]). Various attempts have been made to utilize the capabilities of the cross domain approach in other fields as well [27]. Trust has also gained popularity in recommendation as a means of reducing the sparsity issue and presenting personalized recommendations to users [15]. An effort is made by [28] to combine trust and the cross domain approach to handle the cold start and sparsity issues and to generate a personalized list. A fuzzy trust based system is proposed by [29] to overcome the sparsity problem; it uses linguistic terms rather than numerical values to represent fuzzy trust. [30] presented a survey that also discusses various objectives beyond the accuracy of the system: novelty, diversity, serendipity and coverage. Continuing with the objective of serendipitous recommendation, a fuzzy based approach is provided by [17], in which similarity relationships among the fuzzy sets describing the metadata are modeled using a fuzzy graph.

In this paper, a fuzzy based approach for serendipitous item generation is provided that helps to explore surprisingly interesting items. A single component is not sufficient to decide whether an item is serendipitous. The components found effective in exhibiting serendipitous behavior are expressed as linguistic terms using fuzzy sets so that they can be processed with fuzzy logic. To deal with the sparsity issue, this work incorporates the cross domain approach along with a trust aware recommender system.

3 Proposed Trust and Fuzzy Inference based Cross Domain Serendipitous Item Recommendation (TFCDSRS)

Serendipity can be considered the combined outcome of novelty, relevancy and unexpectedness. This paper presents an approach that combines these factors to generate a serendipitous list of recommendations while also handling the sparsity and cold start problems. As these factors are subjective in nature, they are represented as fuzzy sets. The next sub-section discusses the architecture of the proposed system.

3.1 Architecture of proposed system

The proposed system is a multi-agent system in which multiple agents work periodically and simultaneously to accomplish the goal.

The system commences the recommendation computation once the user enters the system. It collects basic information about the target user and stores it in the repository. This repository stores the results of the various computations performed to generate serendipitous recommendations, both for the current user and for future reference. The collected ratings and contextual information of the user help to find the neighborhood of the target user. Once similar users are found, the system computes the prediction score for the target user. If the system does not have enough ratings, it approaches the source domain. The remote domain, i.e. the source domain, helps to find users similar to the target user in that domain. It is assumed that the identity of a user is shared across domains, which makes it possible to identify the target user for whom similar users are computed in the source domain. The system then proceeds to the trust computation before the final neighborhood computation.

The trust computation in the source domain helps to find trustworthy users, which increases user acceptance of the presented recommendations. The trust computation also helps to reduce the sparsity problem [15]. The results of the similarity and trust computations in the source domains are combined and sent as a response for the final neighborhood computation of the target user. The overall similarity computation combines the local and remote neighborhoods after receiving the response from the source domain. This neighborhood computation provides similar users who might share the same interests as the target user. Using the neighborhood, the not yet seen items are identified for the target user in the target domain.

Items that score above some threshold are selected for the further computation of the serendipity score. The serendipity assessment phase determines the score of these items using fuzzy logic because each component has its own importance in determining the serendipity of an item. These components work as inputs to the fuzzy logic and are expressed as linguistic values.

The output of the fuzzy computation is a crisp value that determines whether or not the item qualifies to be recommended as a surprise recommendation.

The algorithm is outlined as follows:

Phase I: User Assessment Phase

Step 1: Formation of input data (preprocessing phase)

Step 2: Similarity Computation

Step 3: Cross Domain Computation

3.1- Overall similarity computation

3.2- Inter-domain Correlation Computation

3.3- Similarity Computation for items

Step 4: Trust computation

4.1- Similarity Computation

4.2- Confidence between users

4.3- Reputation of users

Step 5: Generating Trustworthy users using Cross domain and Trust computation

Phase II: Item Assessment Phase

Step 1: Selection of similar users

Step 2: Prediction Computation

Phase III: Serendipity Assessment Phase

Step 1: Computation of various components of serendipity

1.1- Novelty Score

1.2- Relevance Score

1.3- Unexpectedness

Step 2: Fuzzification operation

Step 3: Rule Evaluation operation

Step 4: Defuzzification operation on input parameters

Step 5: Final List of item selection

Phase I: User Assessment Phase

Collaborative filtering uses the ratings given by users to find similar users. A problem arises when a user has not given enough ratings or is new to the system. In both cases, it becomes difficult to compute the neighborhood, because too few ratings lead to poor results.

Step 1.1: Formation of Input data: The system considers multiple domains in which the input data is stored as a user-item rating matrix. Each domain consists of a two-dimensional rating matrix, and all domains share the same structure. The rows of the matrix represent the users and the columns represent the items involved. Ratings are on a scale of 1–5.

Step 1.2: Similarity Computation: The similarity between users is computed using the Pearson correlation coefficient (PCC). In the system, the similarity agent (SA) periodically computes the similarity between users and updates it in the repository. PCC is formulated as:

$Sim (x, y) = \frac{\sum_{i = 1}^{n} (r_{xi} - {\bar{r}}_{x}) * (r_{yi} - {\bar{r}}_{y})}{\sqrt{\sum_{i = 1}^{n} {(r_{xi} - {\bar{r}}_{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(r_{yi} - {\bar{r}}_{y})}^{2}}}$ (2)

Where,

r_xi and r_yi denote the ratings of users x and y for the i^th item, respectively.

${\bar{r}}_{x}$ and ${\bar{r}}_{y}$ denote the average ratings of users x and y, respectively.
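As an illustration, Equation (2) can be sketched in Python as follows. This is a minimal sketch, not the prototype's implementation: the function name `pearson_similarity` and the dictionary layout are hypothetical, and it assumes that the sums run over the items rated by both users while each mean is taken over all ratings of that user.

```python
from math import sqrt


def pearson_similarity(ratings_x, ratings_y):
    """PCC between two users (Eq. 2).

    ratings_x, ratings_y: dicts mapping item id -> rating (1-5).
    Assumption: sums run over co-rated items only; each user's mean
    is computed over all of that user's ratings.
    """
    common = set(ratings_x) & set(ratings_y)
    if not common:
        return 0.0  # no co-rated items: similarity treated as 0
    mean_x = sum(ratings_x.values()) / len(ratings_x)
    mean_y = sum(ratings_y.values()) / len(ratings_y)
    num = sum((ratings_x[i] - mean_x) * (ratings_y[i] - mean_y) for i in common)
    den_x = sqrt(sum((ratings_x[i] - mean_x) ** 2 for i in common))
    den_y = sqrt(sum((ratings_y[i] - mean_y) ** 2 for i in common))
    if den_x == 0 or den_y == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return num / (den_x * den_y)


alice = {"r1": 5, "r2": 3, "r3": 1}
bob = {"r1": 4, "r2": 3, "r3": 2}
print(pearson_similarity(alice, bob))
```

Returning 0 when the correlation is undefined is a design choice of this sketch; a system could equally exclude such pairs from the neighborhood.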

Step 1.3: Cross Domain Computation: The cross domain computation involves the similarity computation between the items in the remote domain. Further, the target domain computes the overall similarity using the neighborhood provided by the source domain as follows:

1.3.1 Overall similarity computation: In CDRS, a distributed neighborhood approach is used to find similar users in the target domain. To aggregate the sets of neighborhoods, the target domain computes the overall similarity. The overall similarity score is obtained by averaging the similarity scores of the neighborhoods, weighted by the inter-domain correlation [26]:

$Sim (x, y) = \frac{\sum_{t \in T} cor (t, s) {sim}_{t} (x, y)}{\sum_{t \in T} cor (t, s)}$ (3)

Where,

sim_t (x, y) denotes the local similarity value between users x and y in the t^th domain.

cor (t, s) denotes the correlation between the target domain t and the remote domain s.
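The weighted average of Equation (3) can be sketched as below. The function name `overall_similarity`, the domain labels and the numeric values are hypothetical and serve only as an example; the sketch assumes that the local similarity and the correlation are available per domain as plain dictionaries.

```python
def overall_similarity(local_sims, correlations):
    """Correlation-weighted average of per-domain similarities (Eq. 3).

    local_sims:   dict domain -> sim_t(x, y) for one pair of users
    correlations: dict domain -> cor(t, s) for the same domains
    """
    domains = [d for d in local_sims if d in correlations]
    total_weight = sum(correlations[d] for d in domains)
    if total_weight == 0:
        return 0.0  # no correlated domain contributes
    return sum(correlations[d] * local_sims[d] for d in domains) / total_weight


sims = {"restaurants": 0.8, "hotels": 0.4}
cors = {"restaurants": 1.0, "hotels": 0.5}
print(overall_similarity(sims, cors))  # (1.0*0.8 + 0.5*0.4) / 1.5
```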

1.3.2 Inter-domain Correlation Computation: This computation indicates how closely two domains, the target domain and a remote domain, are related. The correlation between domains is computed as follows:

${cor}_{rating} (t_{1}, t_{2}) = AVG \{ sim (i, j) : i \neq j, i \in I_{t_{1}}, j \in I_{t_{2}} \}$ (4)

Where,

sim (i, j) is the similarity between two items i and j.

I_t is the set of items in the domain t.

Rating based correlation has been used in the proposed approach.

1.3.3 Similarity Computation between items for correlation computation: The rating based correlation between two items i and j that belong to two different domains is computed using PCC. An item-item matrix is generated by iterating over all the items and computing the similarity score for each pair. The first step of the correlation computation is to isolate the users who have rated both items i and j. The similarity between the two items is then computed over these co-rating users as follows:

$Sim (i, j) = \frac{\sum_{u \in U} (r_{u, i} - {\bar{r}}_{i}) * (r_{u, j} - {\bar{r}}_{j})}{\sqrt{\sum_{u \in U} {(r_{u, i} - {\bar{r}}_{i})}^{2}} \sqrt{\sum_{u \in U} {(r_{u, j} - {\bar{r}}_{j})}^{2}}}$ (5)

Where,

${\bar{r}}_{i}$ is the average of the ratings of the i^th item.

r_u,i is the rating given to item i by user u.

U is the set of users who have rated both items i and j.
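Equations (4) and (5) can be combined into the following sketch. The names `item_similarity` and `rating_correlation`, the nested-dictionary rating store and the sample items are hypothetical. The sketch assumes that the item means are taken over the co-rating users and that a pair with fewer than two co-rating users gets similarity 0.

```python
from math import sqrt


def item_similarity(ratings, i, j):
    """PCC between items i and j over the users who rated both (Eq. 5).

    ratings: dict user -> dict item -> rating.
    Assumption: item means are taken over the co-rating users.
    """
    users = [u for u, r in ratings.items() if i in r and j in r]
    if len(users) < 2:
        return 0.0
    mean_i = sum(ratings[u][i] for u in users) / len(users)
    mean_j = sum(ratings[u][j] for u in users) / len(users)
    num = sum((ratings[u][i] - mean_i) * (ratings[u][j] - mean_j) for u in users)
    den_i = sqrt(sum((ratings[u][i] - mean_i) ** 2 for u in users))
    den_j = sqrt(sum((ratings[u][j] - mean_j) ** 2 for u in users))
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (den_i * den_j)


def rating_correlation(ratings, items_t1, items_t2):
    """Average cross-domain item similarity, cor_rating(t1, t2) (Eq. 4)."""
    sims = [item_similarity(ratings, i, j)
            for i in items_t1 for j in items_t2 if i != j]
    return sum(sims) / len(sims) if sims else 0.0


ratings = {
    "u1": {"hotel_a": 5, "cafe_a": 4},
    "u2": {"hotel_a": 3, "cafe_a": 2},
    "u3": {"hotel_a": 1, "cafe_a": 1},
}
print(rating_correlation(ratings, ["hotel_a"], ["cafe_a"]))
```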

Step 1.4: Trust and reputation computation: The proposed approach considers trust as a combination of three factors: similarity, confidence and users’ reputation [31]. Trust values are stored in a user trust (UT) matrix, in which the entry in the i^th row and j^th column represents the trust of user i in user j and lies between 0 and 1. The major steps in computing trust are the similarity between users, the confidence between users and the reputation score of each user:

1.4.1 Similarity Computation: The Pearson correlation coefficient is used for the similarity computation, as shown in Equation (2).

1.4.2 Confidence between users: The confidence between two users (user x and user y) is computed as follows:

$conf (y | x) = \frac{\text{no. of common items rated by both x and y}}{\text{no. of items rated by x}}$ (6)

It is clear from the above equation that the more the rating behavior of two users overlaps, the higher the confidence between them.
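A small sketch of Equation (6) is given below; the function name `confidence` and the item identifiers are hypothetical. It also shows that the measure is not symmetric, since the denominator depends on whose rated items are counted.

```python
def confidence(items_x, items_y):
    """conf(y | x): share of the items rated by x that y has also rated (Eq. 6)."""
    items_x, items_y = set(items_x), set(items_y)
    if not items_x:
        return 0.0  # x has rated nothing, so no confidence can be derived
    return len(items_x & items_y) / len(items_x)


x_items = {"r1", "r2", "r3", "r4"}
y_items = {"r3", "r4", "r5"}
print(confidence(x_items, y_items))  # 2 common items / 4 items rated by x
print(confidence(y_items, x_items))  # 2 common items / 3 items rated by y
```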

1.4.3 Reputation of users: Initially, at time t = 0, the reputation scores are set to zero. Once the user trust (UT) matrix is initialized, the reputation score of each user is computed to form a user reputation (UR) vector. The reputation of the j^th user, ROU_j, in the UR vector is computed as:

${ROU}_{j} = \frac{\sum_{i = 1, i \neq j}^{n} τ_{i, j}}{n - 1}, \quad 1 ⩽ j ⩽ n$ (7)

Where,

τ_i,j denotes the trust of user i in user j.

n denotes the total number of users.

The dynamic trust of user x in user y, τ_x,y (∀ x ≠ y), in the UT matrix is computed by combining the similarity between users x and y, the confidence of user x in user y and the reputation of user y as follows:

$τ_{x, y} = \begin{cases} H (x_{1}, x_{2}, x_{3}) & \text{if } x_{1} \neq 0, x_{2} \neq 0, x_{3} \neq 0 \\ H (x_{1}, x_{2}, k_{1}) & \text{if } x_{1} \neq 0, x_{2} \neq 0, x_{3} = 0 \\ H (k_{1}, x_{2}, x_{3}) & \text{if } x_{1} = 0, x_{2} \neq 0, x_{3} \neq 0 \\ k_{2} \times x_{2} & \text{if } x_{1} = 0, x_{2} \neq 0, x_{3} = 0 \\ k_{2} \times x_{3} & \text{if } x_{1} = 0, x_{2} = 0, x_{3} \neq 0 \\ 0 & \text{if } x_{1} = 0, x_{2} = 0, x_{3} = 0 \end{cases}$ (8)

Where, H denotes the harmonic mean, $H (x_{1}, x_{2}, \dots, x_{n}) = {(\frac{1}{n} \times \sum_{i = 1}^{n} x_{i}^{- 1})}^{- 1}$ (9)

x₁: Sim (x, y) represents similarity between users x and y.

x₂: conf (y|x) represents confidence of user x on user y

x₃: ROU_y represents reputation of user y

k₁ and k₂ are very small positive constants
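Equations (7) to (9) can be sketched together as follows. All names are hypothetical, and the values chosen for k₁ and k₂ are illustrative only, since the text states merely that they are very small positive constants. The sketch assumes non-negative inputs and returns 0 for input combinations that Equation (8) does not list.

```python
def harmonic_mean(*values):
    """H(x_1, ..., x_n) of Eq. 9; every value must be non-zero."""
    return len(values) / sum(1.0 / v for v in values)


def dynamic_trust(sim, conf, rep, k1=0.01, k2=0.01):
    """Piecewise trust of Eq. 8 with x1 = sim, x2 = conf, x3 = rep.

    k1 and k2 are illustrative values (assumption of this sketch).
    """
    if sim != 0 and conf != 0 and rep != 0:
        return harmonic_mean(sim, conf, rep)
    if sim != 0 and conf != 0 and rep == 0:
        return harmonic_mean(sim, conf, k1)
    if sim == 0 and conf != 0 and rep != 0:
        return harmonic_mean(k1, conf, rep)
    if sim == 0 and conf != 0 and rep == 0:
        return k2 * conf
    if sim == 0 and conf == 0 and rep != 0:
        return k2 * rep
    # Eq. 8 lists only the all-zero case here; combinations it does not
    # cover (sim != 0 with conf == 0) also fall through to 0 in this sketch.
    return 0.0


def reputation(trust, j):
    """ROU_j: mean trust placed in user j by the other n - 1 users (Eq. 7)."""
    n = len(trust)
    return sum(trust[i][j] for i in range(n) if i != j) / (n - 1)


trust_matrix = [
    [1.0, 0.6, 0.2],
    [0.4, 1.0, 0.8],
    [0.5, 0.3, 1.0],
]
print(reputation(trust_matrix, 1))    # (0.6 + 0.3) / 2
print(dynamic_trust(0.5, 0.5, 0.5))   # harmonic mean of equal values
```

Because the harmonic mean is dominated by its smallest argument, a single weak factor keeps the resulting trust low even when the other two factors are high.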

The diagonal values of the user trust matrix represent the trust of users in themselves, so they are set to 1. Initially, at time t = 0, τ_x,y may be negligible, but as time passes, user y becomes more trustworthy to user x: as recommendations are continuously generated from user y for user x and user x gives positive feedback on them, the similarity between the users increases and, correspondingly, the reputation of user y increases.

Step 1.5: Generating Trustworthy users using Cross domain and Trust computation

This step generates the trustworthy neighborhood using the cross domain and trust based computations. The final neighborhood set is selected and passed to the next phase of computation.

Phase II: Item Assessment Phase

The previous phase narrows down the final neighborhood of the target user. This neighborhood provides the items for which the prediction score needs to be computed, namely the items not yet seen by the target user. This phase is followed by the serendipity assessment phase, which produces the final list by computing the ‘surprise me’ component of the presented list.

Step 2.1: Final Neighborhood Selection: The last step of the previous phase yields the neighborhood of the target user. Neighbors above some pre-defined threshold are selected to provide the list of not yet seen items for the target user. The next step computes the prediction score for these items.

Step 2.2: Prediction Computation: The prediction computation gives the prediction score of an item for the target user. This score decides whether or not an item will be recommended to the user. The formula for the prediction computation is given below:

$Pred (x, i) = {\bar{r}}_{x} + \frac{\sum_{y \in U} sim (x, y) * (r_{y, i} - {\bar{r}}_{y})}{\sum_{y \in U} sim (x, y)}$ (10)

Where,

sim (x, y) denotes the similarity between the user x and y.

${\bar{r}}_{x}$ and ${\bar{r}}_{y}$ denote the average rating of user x and y.

r_y,i denotes the rating of user y for an item i.
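Equation (10) can be sketched as follows. The function name `predict_rating` and the tuple layout are hypothetical. The sketch assumes that U contains the selected neighbors who have rated item i, and it falls back to the target user's mean rating when the similarity weights sum to zero, which is a choice of this sketch rather than something stated above.

```python
def predict_rating(mean_x, neighbours):
    """Pred(x, i) of Eq. 10.

    mean_x:     average rating of the target user x
    neighbours: list of (sim(x, y), r_{y,i}, mean rating of y) tuples,
                one per neighbour y who rated item i
    """
    weight = sum(sim for sim, _, _ in neighbours)
    if weight == 0:
        return mean_x  # assumption: fall back to the user's own mean
    offset = sum(sim * (r_yi - mean_y) for sim, r_yi, mean_y in neighbours)
    return mean_x + offset / weight


neighbours = [(0.9, 5, 4.0), (0.6, 3, 3.5)]
print(predict_rating(3.0, neighbours))  # 3.0 + (0.9*1.0 + 0.6*(-0.5)) / 1.5
```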

Phase III: Serendipity Assessment Phase

The serendipity assessment phase computes the serendipity of items from its various components using fuzzification. Fuzzy logic allows intermediate values between yes/no, true/false etc. [32]. The serendipity level is estimated using novelty, relevancy and unexpectedness. These components are used as linguistic input variables, which the system maps to fuzzy numbers using fuzzy sets. The linguistic terms make it easier to express the product features.

These input variables are represented using suitable fuzzy sets and mapped to fuzzy numbers as listed below:

Novelty = {Popular, Less Popular, Unseen}

Relevance = {Valuable, Acceptable, Deprecate}

Unexpectedness = {Low, Mid, High}

The output variable i.e. serendipity score as fuzzy set is defined as:

Serendipity = {Low, Mid, High}

The standard triangular membership function (as shown in Fig. 1 for unexpectedness) is used by the system to represent the regions of each input and output variable. The other input and output parameters (novelty, relevance and serendipity) are defined similarly within the system.

Fig. 1: Triangular membership function for input parameter Unexpectedness.

Fig. 2: MAE comparison for TFCDSRS and CF approach.

Fig. 3: RMSE comparison for TFCDSRS and CF approach.

Fig. 4: Distribution of unexpected items variation from top-5 to top-30.

Fig. 5: Precision, Recall and F-measure comparison between TFCDSRS and Collaborative Filtering (CF) for the varying Top-N items.

Fig. 6: Precision, Recall and F-measure comparison for various Top-N items for TFCDSRS.

Linguistic definition for the importance

Component	Linguistic term	Triangular fuzzy number vertices (x, μ)
Novelty	Unseen	(0, 0) (0.25, 1) (0.5, 0)
	Less Popular	(0.25, 0) (0.5, 1) (0.75, 0)
	Popular	(0.5, 0) (0.75, 1) (1, 0)
Relevancy	Deprecate	(0, 0) (0.25, 1) (0.5, 0)
	Acceptable	(0.25, 0) (0.5, 1) (0.75, 0)
	Valuable	(0.5, 0) (0.75, 1) (1, 0)
Unexpectedness	Low	(0, 0) (0.25, 1) (0.5, 0)
	Mid	(0.25, 0) (0.5, 1) (0.75, 0)
	High	(0.5, 0) (0.75, 1) (1, 0)

The relationships between the input and output variables are defined by rules within the system in the following format:

If Novelty is Popular AND Relevancy is Deprecate AND Unexpectedness is Low then Serendipity is Low

If Novelty is Unseen AND Relevancy is Deprecate AND Unexpectedness is Low then Serendipity is Low

If Novelty is Unseen AND Relevancy is Acceptable AND Unexpectedness is Low then Serendipity is Mid

If Novelty is Less Popular AND Relevancy is Valuable AND Unexpectedness is High then Serendipity is High

If Novelty is Unseen AND Relevancy is Valuable AND Unexpectedness is High then Serendipity is High

There are more rules, which are formulated in a similar way. After obtaining the fuzzy serendipity value using these rules, the next step is defuzzification, which is done using the centroid of area defined below.

$\text{Centroid of area} = \frac{\int_{Z} μ_{A} (z) \, z \, dz}{\int_{Z} μ_{A} (z) \, dz}$ (1)

Where,

μ_A (z) is the aggregated output membership function.
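The triangular membership function and the centroid of area can be approximated numerically as in the sketch below. The function names, the integration range [0, 1] and the number of integration steps are assumptions of this sketch; the triangle vertices are those listed for the linguistic terms (for example (0.5, 0.75, 1) for High).

```python
def triangular(x, a, b, c):
    """Membership of x in the triangular fuzzy number with vertices (a, 0), (b, 1), (c, 0)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def centroid(mu, lo=0.0, hi=1.0, steps=1000):
    """Approximate the centroid of area of mu over [lo, hi] with a midpoint sum."""
    dz = (hi - lo) / steps
    zs = [lo + (k + 0.5) * dz for k in range(steps)]
    area = sum(mu(z) for z in zs) * dz
    if area == 0:
        return None  # empty output set: no crisp value can be extracted
    return sum(mu(z) * z for z in zs) * dz / area


def clipped_high(z):
    """Output term High clipped at a rule firing strength of 0.8 (illustrative)."""
    return min(0.8, triangular(z, 0.5, 0.75, 1.0))


print(centroid(clipped_high))  # symmetric set, so close to its peak at 0.75
```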

If the estimated serendipity score exceeds a pre-defined threshold value, the item is passed on to the generation of the recommendation list.

The serendipity score is computed using fuzzy logic, where the input values are novelty, relevancy and unexpectedness. The corresponding weights are extracted by the agent for fuzzification. The computation proceeds as follows:

Step 3.1: Computation of various components

Various components are involved in the serendipity computation. This step discusses them in detail.

3.1.1 Novelty score

A novel item is defined as an item that the user might not have found by himself/herself. An unseen item is identified as novel on the basis of its novelty score. A higher novelty score means the item is less popular than items with a lower novelty score. Using this relation between novelty and popularity, the novelty score (NS) of an item ‘i’ is computed as follows [23]:

$NS (i) = IF (i) * IUF (i)$ (11)

Where,

IF is the item frequency, given as:

$IF (i) = \frac{\text{no. of ratings received by item i}}{\text{maximum number of ratings received by any item}}$

IUF is the inverse user frequency for the unseen item ‘i’, given by:

$IUF (i) = \log \frac{\text{total no. of users}}{\text{no. of users who have rated item i}}$

The item frequency measures how frequently an item has been rated, indicating the popularity of the item, whereas the inverse user frequency balances the popularity of the item.
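Equation (11) can be sketched as below. The function name `novelty_score` and the sample counts are hypothetical. The sketch assumes that each user rates an item at most once, so that the number of ratings of an item equals the number of users who rated it, and it uses the natural logarithm because the base is not specified.

```python
from math import log


def novelty_score(item, rating_counts, total_users):
    """NS(i) = IF(i) * IUF(i) of Eq. 11.

    rating_counts: dict item -> number of ratings the item received.
    Assumptions: one rating per user and item; natural logarithm.
    """
    n_i = rating_counts[item]
    if n_i == 0:
        return 0.0  # unrated item: IF is 0 and IUF is undefined
    item_freq = n_i / max(rating_counts.values())
    inv_user_freq = log(total_users / n_i)
    return item_freq * inv_user_freq


counts = {"popular_cafe": 80, "hidden_gem": 10}
for name in counts:
    print(name, novelty_score(name, counts, total_users=100))
```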

3.1.2 Relevance score

An item is considered serendipitous only when it is relevant to the user’s choice. The relevance of an item is related to the user’s interests as depicted in the user’s profile. The most common way to find interesting items for the target user is to consider the choices of similar users, which are obtained from the neighborhood computation. The relevance score for the target user is computed using the following formula:

$\hat{r} (u, i) = \bar{r} (u) + C \sum_{v \in N_{k}} sim (u, v) (r (v, i) - \bar{r} (v))$ (12)

Where,

$\hat{r} (u, i)$ denotes the relevance of the unseen item ‘i’ for the user ‘u’.

$\bar{r} (u)$ is the average of the ratings provided by user u.

C is the normalizing constant.

sim (u, v) is the similarity score between the user ‘u’ and ‘v’.

r (v, i) is the rating given by user ‘v’ to item ‘i’.

The relevancy score $\hat{r} (u, i)$ is computed for each item in the target user’s unseen item list and is then used as an input to the fuzzy based computation. A serendipitous item should be novel as well as relevant, because a novel item can be either relevant or irrelevant to a user.

3.1.3 Unexpectedness

An unexpected item can be either relevant or irrelevant, but in the serendipity scenario an item should be unexpected, interesting and relevant at the same time. This measure is based on comparing a candidate item with the items in the user’s profile. To measure unexpectedness, [33] proposes a method based on item co-occurrence. It measures the probability of an item being seen (i.e. rated) together with the items in the user’s profile, using pointwise mutual information (PMI). The underlying idea is that items which are rarely seen together are likely to be different.

$PMI (i, j) = - \log_{2} \frac{p (i, j)}{p (i) p (j)} / \log_{2} p (i, j)$ (13)

${Unexpectedness}^{co - occ 1} (i) = \max_{j \in P} PMI (i, j)$ (14)

Where,

p(i) and p(j) represent the probabilities of items i and j being rated by any user, and p(i, j) is the probability of both items being rated by the same user.

P is the set of items in the user’s profile.

p(i) = number of users who have rated item i / total number of items a user has rated.

p(i, j) = number of users who have rated both i and j / total number of items a user has rated.

For both of the above values, lower values signify better results.
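Equations (13) and (14) can be sketched as follows. The names `normalized_pmi` and `unexpectedness` and the sample data are hypothetical. The sketch assumes that the probabilities are estimated as fractions of the total number of users, in line with the description of p(i) as the probability of an item being rated by any user; this denominator is an assumption of the sketch. Equation (13) has the form of a normalized PMI, which ranges from -1 (the items never co-occur) to 1 (the items always co-occur).

```python
from math import log2


def normalized_pmi(i, j, rated_by, total_users):
    """PMI(i, j) as written in Eq. 13.

    rated_by: dict item -> set of users who rated it.
    Assumption: probabilities are fractions of all users.
    """
    p_i = len(rated_by[i]) / total_users
    p_j = len(rated_by[j]) / total_users
    p_ij = len(rated_by[i] & rated_by[j]) / total_users
    if p_ij == 0:
        return -1.0  # limiting value when the items never co-occur
    if p_ij == 1:
        return 0.0   # both rated by every user; the ratio is undefined
    return -log2(p_ij / (p_i * p_j)) / log2(p_ij)


def unexpectedness(item, profile, rated_by, total_users):
    """Eq. 14: maximum PMI between the item and the items in the profile P."""
    return max(normalized_pmi(item, j, rated_by, total_users) for j in profile)


rated_by = {"a": {1, 2}, "b": {1, 2}, "c": {3, 4}}
print(normalized_pmi("a", "b", rated_by, total_users=4))   # always rated together
print(normalized_pmi("a", "c", rated_by, total_users=4))   # never rated together
print(unexpectedness("a", ["b", "c"], rated_by, total_users=4))
```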

Step 3.2: Fuzzification operation:

To design the fuzzy system, input parameters are required for the fuzzy sets. A fuzzy set is a collection of pairs consisting of members and degrees of “support” and “confidence” for those members. A linguistic variable is a variable whose values are words, phrases or sentences that serve as labels of fuzzy sets. For fuzzification, all input values of the input parameters discussed above are defined as fuzzy numbers instead of crisp numbers by using suitable fuzzy sets.

The fuzzy sets shown above are in turn represented by membership functions. A membership function represents the magnitude of participation of each input. Here, a triangular membership function is used to represent three regions for each input variable.

The analysis is performed on the serendipity components of the items. The result is the serendipity score of each item, which identifies the best serendipitous items for the user. This includes the following modules:

Weights of various components i.e. novelty score, relevancy score and unexpectedness.

Serendipity score

Selection of items on the basis of threshold value of serendipity

Recommendation of the items

Step 3.3: Rule Evaluation operation

Once the input and output fuzzy sets and membership functions are constructed, the fuzzy if-then rules are framed to reflect the relationships between any possible combination of input variables and the output variable.

The levels of the input parameters defined in the above step are used in the antecedents of the rules and the serendipity level in the consequents. Usually, T-norm and T-conorm operators are used in the evaluation of antecedents and consequents, respectively, as given below:

T-norm operator is T_min (a, b) = min (a, b) and

T-conorm operator is T_max (a, b) = max (a, b)

The rules are formulated using Mamdani’s inference method (Mamdani, 1993). The rules in the rule base of the fuzzy inference system (serendipity assessment module) are represented in the format given in the previous section.
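The rule evaluation can be sketched as below, using only the five rules quoted earlier (the complete rule base is not listed) and the triangle vertices from the linguistic definition table. The names `RULES` and `rule_strengths` are hypothetical. The sketch assumes that the antecedents of a rule are combined with min and that rules sharing an output term are aggregated with max.

```python
def triangular(x, a, b, c):
    """Membership of x in the triangular fuzzy number with vertices (a, 0), (b, 1), (c, 0)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Vertices (a, b, c) from the linguistic definition table.
LOW = UNSEEN = DEPRECATE = (0.0, 0.25, 0.5)
MID = LESS_POPULAR = ACCEPTABLE = (0.25, 0.5, 0.75)
HIGH = POPULAR = VALUABLE = (0.5, 0.75, 1.0)

# (novelty term, relevancy term, unexpectedness term, serendipity term)
RULES = [
    (POPULAR, DEPRECATE, LOW, "Low"),
    (UNSEEN, DEPRECATE, LOW, "Low"),
    (UNSEEN, ACCEPTABLE, LOW, "Mid"),
    (LESS_POPULAR, VALUABLE, HIGH, "High"),
    (UNSEEN, VALUABLE, HIGH, "High"),
]


def rule_strengths(novelty, relevancy, unexpectedness):
    """Firing strength per output term: min over antecedents, max over rules."""
    strengths = {"Low": 0.0, "Mid": 0.0, "High": 0.0}
    for nov, rel, unx, out in RULES:
        fire = min(triangular(novelty, *nov),
                   triangular(relevancy, *rel),
                   triangular(unexpectedness, *unx))
        strengths[out] = max(strengths[out], fire)
    return strengths


print(rule_strengths(novelty=0.5, relevancy=0.75, unexpectedness=0.7))
```

In a Mamdani system, each strength would then clip the corresponding output membership function before the clipped sets are aggregated and defuzzified.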

Step 3.4: Defuzzification operations on input parameters

After the if-then rules are applied, the crisp output is calculated through a process called defuzzification. Defuzzification refers to the way a crisp value is extracted from a fuzzy set as a representative value. The most widely adopted method for defuzzifying a fuzzy set A of a universe of discourse Z is the centroid of area. This method is used to calculate the crisp value of the output parameter (here, the serendipity score).

Step 3.5: Final List of item selection

The final list of recommended items is selected using all the components of the serendipity computation and is sorted in decreasing order of serendipity score.

4 Experimental evaluation

Evaluating the list presented to the user is itself a challenging task that requires a trade-off between two major evaluation metrics: coverage and serendipity score [6]. The authors of [6] focus on the quality perceived by the user, which measures user satisfaction. One of the key features of serendipity is unexpectedness, which boosts user satisfaction. It is argued in [11] that providing unexpected items, apart from what users expect from the system, increases user satisfaction. Metrics for measuring the prediction accuracy of a system have been studied by many researchers. For evaluating serendipitous recommendations, however, [34] proposes unexpectedness, measured as the distance between the results produced by the method being evaluated and the results produced by a primitive prediction method; the ranking in the presented list is also taken into account. Research around serendipity, e.g. the concepts needed to generate serendipitous recommendations and the evaluations that justify serendipity in the presented list, is discussed in [35].

Different metrics are used depending on the parameter being measured. Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used to evaluate the accuracy of the presented list. Precision and recall are used to evaluate the effectiveness of the recommender system, and the two are combined to obtain the F-measure. The unexpectedness result indicates the serendipitous behavior of the items. To evaluate the system’s performance, a prototype of the system was developed using Java for the framework, JADE (Java Agent Development Environment) for the multi-agent environment, and MySQL as the repository of the system.

4.1 Dataset

To evaluate the system, we have used a tourism dataset for Delhi, India. This domain includes four sub-domains: restaurants, hotels, travel places and shopping places. Each sub-domain holds the information used by the system. The information about restaurants includes the restaurant name, address, opening and closing times, average cost per person, etc. For hotels, it includes the hotel name, location, charges, etc. For shopping places and travel places, it includes the name, location, opening and closing times, etc. These details were collected from various websites, e.g. Zomato, MakeMyTrip, TripAdvisor, Delhi Tourism and ShopKhoj. The dataset contains 8857 restaurants, 1023 hotels, 139 shopping places and 115 travel places (places to visit for entertainment or tourist spots).

4.2 Evaluation metric

Various metrics have been used in the literature to measure the accuracy of a system, including statistical accuracy metrics such as MAE and RMSE. These metrics indicate how well a system can predict the rating for a specific item.

Mean Absolute Error (MAE): MAE is the average absolute deviation between the predicted rating and the actual rating. It is commonly used in evaluation and is formulated as: $MAE = \frac{\sum_{i = 1}^{n} | p_{i} - a_{i} |}{n}$ (15)

Where, $p_{i}$ is the predicted rating, $a_{i}$ is the actual rating and n is the number of predictions.

The MAE is compared between the proposed approach, i.e. TFCDSRS, and the traditional approach, i.e. collaborative filtering (CF). We have compared the MAE for a varying number of items and found that the proposed system outperforms CF.

Root Mean Square Error (RMSE): RMSE compares the predicted and actual values by taking the square root of the average squared error, which gives a high weight to large errors. $RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (p_{i} - a_{i})^{2}}{n}}$ (16)

The RMSE comparison between the two methods likewise shows that the proposed system outperforms CF.
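Equations (15) and (16) can be computed as in the following sketch. The rating values are made up for illustration and are not results from the evaluation.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings, Eq. (15)."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings, Eq. (16)."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty rating lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Hypothetical ratings on a 1-5 scale.
predicted = [4, 3, 5]
actual = [5, 3, 3]

print(mae(predicted, actual))   # 1.0
print(rmse(predicted, actual))  # sqrt(5/3), larger than the MAE
```

Here the single error of 2 pulls the RMSE above the MAE, which shows how RMSE penalises large errors more heavily.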

User Specific Unexpectedness: Unexpectedness is the measure that captures the ‘surprise me’ component of the presented recommendations. It measures how different the recommendations are from the items rated by the user. ${USU}_{u} (\tilde{I}) = \frac{1}{| \tilde{I} | | R_{u} |} \sum_{i_{j} \in \tilde{I}} \sum_{i_{k} \in R_{u}} sim (i_{j}, i_{k})$ (17)

Where, $\tilde{I}$ is the set of recommended items and $R_{u}$ is the set of items rated by the user u.

The unexpectedness is computed for varying top-N items. The results show that as the number of top-N items increases, the proposed system exhibits a better ‘surprise me’ component.
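The computation in Eq. (17) can be sketched as below, assuming that the inner sum ranges over the items rated by the user, as the normalisation by the size of that set suggests. The pairwise function `sim` is not specified in this section, so a Jaccard measure over hypothetical item feature sets is used purely as a stand-in.

```python
def jaccard(a, b):
    """Stand-in pairwise measure over two feature sets (an assumption of this sketch)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def usu(recommended, rated, sim):
    """Average of sim over all (recommended item, rated item) pairs, Eq. (17)."""
    if not recommended or not rated:
        raise ValueError("both item lists must be non-empty")
    total = sum(sim(i_j, i_k) for i_j in recommended for i_k in rated)
    return total / (len(recommended) * len(rated))


# Hypothetical items described by feature sets.
recommended = [{"museum", "history"}, {"mall"}]
rated = [{"museum", "art"}]

print(usu(recommended, rated, jaccard))  # (1/3 + 0) / 2 = 1/6
```

Passing the pairwise function as a parameter keeps the averaging logic independent of how items are compared.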

Precision: Precision measures the proportion of recommended items that the user liked and therefore consumed. Precision@K reflects the fraction of relevant items retrieved by a recommender system in the first K results. $Precision@K = \frac{1}{| U |} \sum_{u \in U} \frac{| {RS}_{u} (K) \cap {REL}_{u} |}{K}$ (18)

Where, U is the set of users, ${RS}_{u} (K)$ is the set of top-K suggestions for user u, and ${REL}_{u}$ is the set of relevant items recorded in the test set for user u.

Recall: Recall is the number of consumed items in the recommendation list out of the total number of items the user consumed. $Recall@K = \frac{1}{| U |} \sum_{u \in U} \frac{| {RS}_{u} (K) \cap {REL}_{u} |}{{TotalItems}_{u}}$ (19)

F-Measure: F-measure provides a balance between precision and recall. It is formulated as: $F\text{-}Measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$ (20)
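Equations (18) to (20) can be sketched as follows. The sketch assumes that the total number of items consumed by a user equals the size of that user's relevant set in the test data, and that every user has at least one relevant item. The users, items and K are hypothetical.

```python
def precision_at_k(recs, rel, k):
    """Mean over users of |top-K recommendations ∩ relevant items| / K, Eq. (18)."""
    return sum(len(set(recs[u][:k]) & rel[u]) / k for u in recs) / len(recs)


def recall_at_k(recs, rel, k):
    """Mean over users of |top-K recommendations ∩ relevant items| / |relevant items|, Eq. (19)."""
    return sum(len(set(recs[u][:k]) & rel[u]) / len(rel[u]) for u in recs) / len(recs)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall, Eq. (20); 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranked recommendations and test-set relevant items per user.
recs = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z", "w"]}
rel = {"u1": {"a", "c", "e", "f"}, "u2": {"y"}}

p = precision_at_k(recs, rel, k=2)  # (1/2 + 1/2) / 2 = 0.5
r = recall_at_k(recs, rel, k=2)     # (1/4 + 1/1) / 2 = 0.625
print(p, r, f_measure(p, r))
```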

The comparison between CF and TFCDSRS has been carried out for precision, recall and F-measure while varying the number of top-N items over 5, 12, 17, 22, 27 and 32. The results show that the proposed work outperforms the traditional approach.

5 Conclusion

In this paper, a Trust and Fuzzy Inference based Cross Domain Serendipitous Item Recommendation system (TFCDSRS) is presented that incorporates trust and cross domain approaches. The main emphasis of the paper is on using a fuzzy approach to generate serendipitous items. The fuzzy approach is incorporated to combine and balance the factors that are necessary for serendipity computation. The system also deals with the sparsity and cold start problems through the use of trust and cross domain approaches. Serendipitous recommendation is considered as a combination of various components, i.e. novelty, relevancy and unexpectedness, and each component has its own effect on the generated recommendations. The presented approach therefore uses fuzzy computation to decide the serendipity score of each item in the list of recommendations. While narrowing down the neighbourhood for the target user, if the system encounters the sparsity problem, it resorts to cross domain computation. Trust computation helps in dealing with the cold start and sparsity issues. The proposed system is implemented using a multi-agent approach, and its performance is measured using various standard metrics. The evaluation shows that the proposed system outperforms the traditional approach to recommendation. Overall, the proposed system is able to present serendipitous recommendations while maintaining accuracy.

References

1. Schafer J.B., Frankowski, Herlocker and Sen, Collaborative filtering recommender systems, in Springer Berlin Heidelberg (2007), 291–324.

2. Adomavicius and Kwon Y.O., Improving aggregate recommendation diversity using ranking-based techniques, IEEE Transactions on Knowledge and Data Engineering 24(5) (2011), 896–911.

3. Burke, Hybrid Recommender System: survey and experiments, User Modeling and User-Adapted Interaction Journal 12(4) (2002), 331–370.

4. Burke, Knowledge-based recommender systems, in Encyclopedia of Library and Information Systems (2000), 69.

5. Bridge, Göker M.H., McGinty and Smyth, Case-based recommender systems, The Knowledge Engineering Review 20(3) (2005), 315–320.

6. Delgado-Battenfeld and Dietmar, Beyond accuracy: evaluating recommender systems by coverage and serendipity, in fourth ACM conference on Recommender systems (2010), 257–260.

7. Szpektor, Maarek and Pelleg, When relevance is not enough: promoting diversity and freshness in personalized question recommendation, in International conference on World Wide Web (2013), 1249–1260.

8. De Gemmis, Lops, Semeraro and Musto, An investigation on the serendipity problem in recommender systems, Information Processing & Management 51(5) (2015), 695–717.

9. Kotkov, Wang and Veijalainen, A survey of serendipity in recommender systems, Knowledge-Based Systems 111 (2016), 180–192.

10. Yin, Cui, Li, Yao and Chen, Challenging the long tail recommendation, arXiv preprint (2012).

11. Panagiotis and Alexander, On unexpectedness in recommender systems: Or how to expect the unexpected, in Workshop on Novelty and Diversity in Recommender Systems (2011).

12. Anelli V.W., Di Noia, Di Sciascio, Lops and Joseph, Moving from Item Rating to Features Relevance in Top-N Recommendation, IIR (2018).

13. Cantador, Fernández-Tobías, Berkovsky and Cremonesi, Cross-domain recommender systems, in Recommender Systems Handbook, Springer US (2015), 919–959.

14. Khan M.M. and Ibrahim, Cross Domain Recommender Systems: A Systematic Literature Review, ACM Computing Surveys (CSUR) 50(3) (2017), 1–34.

15. O’Donovan and Smyth, Trust in recommender systems, in 10th international conference on Intelligent user interfaces (2005), 167–174.

16. Berkovsky, Kuflik and Ricci, Mediation of user models for enhanced personalization in recommender systems, User Modeling and User-Adapted Interaction 18(3) (2008), 245–286.

17. Dell’Agnello, Fanelli A.M., Mencar and Minervini, Serendipitous fuzzy item recommendation with profilematcher, in International Workshop on Fuzzy Logic and Applications (2011), 220–227.

18. Ricci, Rokach, Shapira and Kantor, Recommender Systems Handbook, New York, USA: Springer (2011).

19. Badrul, Karypis, Konstan and Riedl, Item-Based Collaborative Filtering Recommendation Algorithms, in Proceedings of the 10th international conference on World Wide Web (2001), 285–295.

20. Linden, Smith and York, Amazon.com recommendations: Item-to-item collaborative filtering, IEEE Internet Computing 7(1) (2003), 76–80.

21. Menk, Sebastia and Ferreira, Curumim: A serendipitous recommender system for tourism based on human curiosity, in IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI) (2017), 788–795.

22. Vargas and Castells, Rank and relevance in novelty and diversity metrics for recommender systems, in fifth ACM conference on Recommender systems (2011), 109–116.

23. Bedi, Anjali Gautam and Sharma, Using novelty score of unseen items to handle popularity bias in recommender systems, in International Conference on Contemporary Computing and Informatics (IC3I) (2014), 934–939.

24. Zhang, The Definition of Novelty in Recommendation System, Journal of Engineering Science & Technology Review 6(3) (2013).

25. Kotkov, Wang and Veijalainen, Improving Serendipity and Accuracy in Cross-Domain Recommender Systems, in International Conference on Web Information Systems and Technologies (2016), 105–119.

26. Berkovsky, Kuflik and Ricci, Distributed collaborative filtering with domain specialization, in ACM conference on Recommender systems (2007), 33–40.

27. Richa and Punam, Parallel proactive cross domain context aware recommender system, Journal of Intelligent & Fuzzy Systems 34(3) (2018), 1521–1533.

28. Richa and Punam, Combining trust and reputation as user influence in cross domain group recommender system (CDGRS), Journal of Intelligent & Fuzzy Systems Preprint: 1–12.

29. Kant and Bharadwaj K.K., Incorporating fuzzy trust in collaborative filtering based recommender systems, in International Conference on Swarm, Evolutionary, and Memetic Computing (2011), 433–440.

30. Kaminskas and Bridge, Diversity, serendipity, novelty, and coverage: a survey and empirical analysis of beyond-accuracy objectives in recommender systems, ACM Transactions on Interactive Intelligent Systems (TiiS) 7(1) (2016), 1–42.

31. Bedi and Agarwal, SRPRS: Situation-Aware Reputation Based Proactive Recommender System, Journal of Information Assurance & Security 8(4) (2013), 220–229.

32. Dubois and Prade, Fundamentals of fuzzy sets, Springer Science & Business Media 7 (2012).

33. Bouma, Normalized (pointwise) mutual information in collocation extraction, in Proceedings of GSCL (2009), 31–40.

34. Murakami, Mori and Orihara, Metrics for evaluating the serendipity of recommendation lists, in Annual conference of the Japanese society for artificial intelligence (2007), 40–46.

35. Silveira, Zhang, Lin, Liu and Ma, How good your recommender system is? A survey on evaluations in recommendation, International Journal of Machine Learning and Cybernetics 10(5) (2019), 813–831.