An improved similarity measure for collaborative filtering-based recommendation system

Abstract

Collaborative filtering (CF), a representative algorithm of recommendation systems, uses information from the neighbors of an active user. The main idea of CF is that users who agreed in their ratings of certain items are likely to agree again on new items. The degree to which two users’ tendencies in the ratings of co-rated items are consistent is measured using a similarity measure. Therefore, the similarity measure plays a key role in CF in the extraction of representative neighbors. Studies on improving similarity measures for selecting representative neighbors are still ongoing. Recently, a new similarity measure, named OS, was proposed to enhance recommendation performance by utilizing mathematical equations, such as an integral equation, a system of linear differential equations, and non-linear systems. This study aims to identify the limitations of OS and to overcome them with a proposed method. In the proposed method, a sigmoid function is used to reflect preferences, such as the positive or negative sentiment of user ratings. In addition, some of the formulas are modified to consider the absolute score difference, and the resulting improvement in recommendation performance is demonstrated through experiments.

Keywords

Recommendation systems, collaborative filtering, similarity measure, neighborhood-based CF

1. Introduction

Recommendation systems are becoming increasingly important for e-commerce and content platform companies that are overflowing with products, news, and information [1]. Many companies have introduced a recommendation system that efficiently provides customers with a list of recommended products or contents, chosen from a wide range of products or contents, to increase user satisfaction and the purchase or service subscription rate [2].

Most recommendation systems analyze historical users’ purchase or rating data to generate a list of recommended items that a user has a high probability of purchasing, or is expected to give a high rating score among items that the user has not yet purchased or evaluated [3]. Collaborative filtering and content-based filtering are very popular for building recommendation systems.

Collaborative filtering recommends items based on the interests of users similar to a specific user, or based on the similarity between items computed from user ratings [4]. In general, collaborative filtering utilizes data that contain a set of items and a set of users who have evaluated some of the items, and it shows good recommendation performance without complicated calculations. In contrast, content-based filtering uses item features to recommend items similar to what the user likes based on user ratings [5]. Since it uses the characteristics of items for recommendation, it is possible to recommend a new item to a user, which is impossible in collaborative filtering.

Collaborative filtering includes memory-based, model-based, and hybrid memory- and model-based approaches. Among them, memory-based methods utilize the similarity between users or items based on user ratings for recommendations. The recommendation performance of memory-based methods is highly affected by the similarity measure.

To measure the similarity between users or items in memory-based collaborative filtering algorithms, traditional similarity measures such as cosine (COS), Pearson correlation coefficient (PCC), mean squared difference (MSD), and Jaccard (JAC) measures have been widely used. However, they have certain limitations. COS and PCC may fail to obtain reliable similarity values between two users when the number of common items evaluated by the two users is small and they mainly focus on the directions of the rating vectors but ignore their lengths. MSD ignores the number of co-rated items and JAC ignores the rating values in the similarity calculation. Hence, many studies have proposed new similarity measures for collaborative filtering to improve traditional ones.

Recently, [6] proposed a new similarity measure, named OS, by transforming some intuitive and qualitative conditions that should be satisfied by the similarity measure into relevant mathematical equations, such as the integral equation, system of linear differential equations, and non-linear systems. OS consists of two parts: (1) percentage of non-common ratings (PNCR) which takes into account the number of co-rated items, and (2) absolute difference of ratings (ADF) which uses elementary similarity expression for all items rated by two users. PNCR uses an exponential function to reduce the range of similarity values that change sensitively according to the number of common evaluation items, which is one of the limitations of JAC. In addition, ADF also uses an exponential function and calculates the relative difference according to the rating value instead of the absolute difference in ratings between two users, unlike MSD. As a result, OS showed better performance than the other similarity measures.

Although OS addresses the limitations of traditional similarity measures, it also has some limitations. First, for the same absolute difference, the ADF value when both ratings are small is smaller than that when both ratings are high, because the relative difference of the two ratings in ADF is obtained by dividing the absolute difference between the two ratings by the larger of the two ratings. Second, OS does not consider whether the sentiments of the two ratings (positive or negative) coincide. Even if the absolute difference of ratings is the same, it might be better to have a larger difference when the sentiments of the two ratings do not match than when they match.

In this study, we propose a new similarity measure to overcome the limitations of OS. The proposed similarity measure reflects whether the sentiments of the two ratings coincide to calculate the relative difference between them. The idea behind this approach is that it should be considered as having a larger difference if the sentiment of the ratings by two users is different than if the sentiment is the same, even if the difference between the ratings is the same. To reflect this idea in the new similarity measure, a function using a sigmoid function to transform the ratings of items was proposed in this study. In addition, instead of the maximum value of the ratings of two users for the same item, the user-defined constant is used in ADF to address the issue that ADF does not treat the difference between low ratings and the difference between high ratings equally.

In the subsequent sections, first, a literature review is presented that covers the limitations of traditional similarity measures and the advanced similarity measures developed to enhance recommendation performance. Second, after a brief explanation of OS, the proposed similarity measure, which addresses the limitations of OS and improves the recommendation performance, is described. Third, the experimental procedure is illustrated and the results are presented. Finally, this paper concludes with a discussion of the study’s limitations and future research directions.

2. Related work

In memory-based collaborative filtering, the set of recommended items for a specific user is determined based on the similarity between the users or items. User-based collaborative filtering (UBCF) identifies users that are similar to the queried user based on the similarity between users, whereas item-based collaborative filtering is based on the similarity between items calculated using people’s ratings of those items. In UBCF, the $k$ -nearest neighbors from the candidate neighbors comprised from all users are first selected, and then the ratings of unrated items are predicted as follows:

$\displaystyle\hat{r}_{ui}=\bar{r}_{u}+\frac{\sum_{v\in\textit{NN}(u)}\textit{sim}(u,v)(r_{vi}-\bar{r}_{v})}{\sum_{v\in\textit{NN}(u)}|\textit{sim}(u,v)|}$ (1)

where $\textit{sim}(u,v)$ is the similarity between users, $u$ and $v$ , $r_{ui}$ is the rating of item $i$ rated by user $u$ , $\bar{r}_{u}$ is the mean rating of user $u$ , and $\textit{NN}(u)$ represents the $k$ -nearest neighbors of user $u$ . In general, $\textit{NN}(u)$ is determined by selecting the top $k$ $v$ after sorting the $\textit{sim}(u,v)$ values in the descending order for all $v$ , except for $u$ . In CF, the unknown rating of item $i$ by user $u$ is determined by the ratings of item $i$ by the $k$ -nearest users of user $u$ . The similarity between users $\textit{sim}(u,v)$ is used to find the $k$ -nearest users $(\textit{NN}(u))$ and compute the predicted rating as the weighting factor.
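The prediction rule in Eq. (1) can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the function name `predict_rating`, the toy `ratings` dictionary, and the constant placeholder similarity are hypothetical, and the sketch assumes that neighbors who have not rated item $i$ are simply skipped.

```python
def predict_rating(u, i, ratings, sim, k=2):
    """Predict r_ui from the k most similar users, following Eq. (1)."""
    def mean(w):
        return sum(ratings[w].values()) / len(ratings[w])

    # NN(u): the k users (other than u) with the highest similarity to u.
    others = [v for v in ratings if v != u]
    neighbors = sorted(others, key=lambda v: sim(u, v), reverse=True)[:k]
    # Assumption of this sketch: neighbors who did not rate item i are skipped.
    raters = [v for v in neighbors if i in ratings[v]]
    denom = sum(abs(sim(u, v)) for v in raters)
    if denom == 0:
        return mean(u)  # no usable neighbor: fall back to the user's mean
    num = sum(sim(u, v) * (ratings[v][i] - mean(v)) for v in raters)
    return mean(u) + num / denom


# Toy data (hypothetical): user -> {item: rating}
ratings = {
    "u": {"a": 4, "b": 2},
    "v": {"a": 5, "b": 3, "c": 5},
    "w": {"a": 1, "c": 2},
}
constant_sim = lambda x, y: 1.0  # placeholder similarity, for illustration only
prediction = predict_rating("u", "c", ratings, constant_sim, k=2)
print(round(prediction, 4))
```

Any of the similarity measures discussed below could be passed as `sim`, provided it accepts two user identifiers.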

Several traditional similarity measures have been used to calculate $\textit{sim}(u,v)$ . COS measures the angle between two rated vectors where a smaller angle indicates greater similarity as follows [7]:

$\displaystyle\textit{sim}(u,v)^{\textit{COS}}=\frac{\sum_{i\in I_{u}\cap I_{v}}r_{ui}\cdot r_{vi}}{\sqrt{\sum_{i\in I_{u}\cap I_{v}}r_{ui}^{2}}\sqrt{\sum_{i\in I_{u}\cap I_{v}}r_{vi}^{2}}}$ (2)

where $I_{u}$ denotes the set of items rated by user $u$ . The range of cosine similarity is 0 to 1. COS cannot take into account the difference in rating scales of different users, and it has the problem that when the number of co-rated items is 1, it is always 1.
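A short sketch of Eq. (2) makes the single-item problem concrete. The function name `cos_sim` and the dictionary representation of a user's ratings are assumptions of this example, as is the choice to return 0 when two users share no items.

```python
import math


def cos_sim(ru, rv):
    """Cosine similarity over co-rated items, as in Eq. (2)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0  # assumption: no co-rated items means zero similarity
    dot = sum(ru[i] * rv[i] for i in common)
    norm_u = math.sqrt(sum(ru[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(rv[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


# With a single co-rated item, opposite opinions (5 vs. 1) still give 1.0.
print(cos_sim({"a": 5}, {"a": 1}))
```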

PCC estimates the similarity between users as the sum of the products of the two users’ deviations from their mean ratings, divided by the product of the square roots of the sums of their squared deviations, as follows [8]:

$\displaystyle\textit{sim}(u,v)^{\textit{PCC}}=\frac{\sum_{i\in I_{u}\cap I_{v}}(r_{ui}-\bar{r}_{u})(r_{vi}-\bar{r}_{v})}{\sqrt{\sum_{i\in I_{u}\cap I_{v}}(r_{ui}-\bar{r}_{u})^{2}}\sqrt{\sum_{i\in I_{u}\cap I_{v}}(r_{vi}-\bar{r}_{v})^{2}}}$ (3)

Unlike COS, the range of PCC is $-1$ to $1$, where $-1$ indicates a negative correlation between users, 0 indicates no correlation between users, and $+1$ indicates a positive correlation between users. Similar to COS, PCC is always 1 when the number of co-rated items is 1 and cannot consider the difference in rating scales of different users.
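Eq. (3) can be sketched in the same style. The name `pcc_sim` is hypothetical; the sketch assumes that $\bar{r}_{u}$ is the mean over all items rated by user $u$ (as defined for Eq. (1)) and returns 0 when either denominator term is zero.

```python
import math


def pcc_sim(ru, rv):
    """Pearson correlation over co-rated items, as in Eq. (3)."""
    common = ru.keys() & rv.keys()
    mean_u = sum(ru.values()) / len(ru)  # mean over all items rated by u
    mean_v = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mean_u) * (rv[i] - mean_v) for i in common)
    dev_u = math.sqrt(sum((ru[i] - mean_u) ** 2 for i in common))
    dev_v = math.sqrt(sum((rv[i] - mean_v) ** 2 for i in common))
    if dev_u == 0 or dev_v == 0:
        return 0.0  # assumption: undefined correlation is treated as 0
    return num / (dev_u * dev_v)


same_trend = pcc_sim({"a": 5, "b": 1, "c": 3}, {"a": 4, "b": 2, "c": 3})
opposite_trend = pcc_sim({"a": 5, "b": 1, "c": 3}, {"a": 2, "b": 4, "c": 3})
print(same_trend, opposite_trend)
```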

JAC mainly focuses on the number of co-rated items of two users and is defined as follows [6]:

$\displaystyle\textit{sim}(u,v)^{\textit{JAC}}=\frac{|I_{u}\cap I_{v}|}{|I_{u}% \cup I_{v}|}$ (4)

where $|\cdot|$ represents the size of a set. JAC only considers the ratio of the number of co-rated items to the number of items rated by either user; therefore, it cannot reflect the rating values.

MSD is calculated by the ratio of the sum of squares of the difference of ratings on co-rated items to the cardinality of co-rated items as follows [9]:

$\displaystyle\textit{sim}(u,v)^{\textit{MSD}}=1-\frac{\sum_{i\in I_{u}\cap I_{v}}(r_{ui}-r_{vi})^{2}}{|I_{u}\cap I_{v}|}$ (5)

Unlike JAC, MSD does not consider the number of co-rated items.
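The complementary blind spots of JAC and MSD can be seen in a small sketch of Eqs. (4) and (5). The names `jac_sim` and `msd_sim` are hypothetical. Note that Eq. (5) as written stays within 0 to 1 only if the rating differences are at most 1; the `scale` argument below, which divides each difference, is an assumption added for this illustration and is not part of Eq. (5).

```python
def jac_sim(ru, rv):
    """Jaccard similarity, Eq. (4): depends only on which items were rated."""
    union = ru.keys() | rv.keys()
    if not union:
        return 0.0
    return len(ru.keys() & rv.keys()) / len(union)


def msd_sim(ru, rv, scale=1.0):
    """Mean squared difference similarity, Eq. (5).

    `scale` is a hypothetical normalizer for the rating differences
    (for example, 4 on a 1-to-5 scale); scale=1.0 reproduces Eq. (5) literally.
    """
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0  # assumption: no co-rated items means zero similarity
    msd = sum(((ru[i] - rv[i]) / scale) ** 2 for i in common) / len(common)
    return 1.0 - msd


# JAC ignores the rating values: both pairs share 1 of 3 rated items.
print(jac_sim({"a": 1, "b": 2}, {"b": 3, "c": 4}))
print(jac_sim({"a": 5, "b": 5}, {"b": 1, "c": 1}))
# MSD ignores how many items are co-rated.
print(msd_sim({"a": 5, "b": 3}, {"a": 4, "b": 3}, scale=4))
```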

Several studies have been steadily conducted to overcome the limitations of the traditional similarity measures.

[10] proposed new similarity measures that combine a balance factor with traditional similarity measures such as the adjusted cosine (ACOS) [11] and PCC measures. The balance factor takes the differences in users’ rating scales into account in the user similarity calculation to compensate for this shortcoming of the traditional similarity calculation methods, and is defined as follows:

$\displaystyle w(u,v)=\lambda^{\sqrt{\frac{\sum_{i\in I_{u}\cap I_{v}}(r_{ui}-r_{vi})^{2}}{|I_{u}\cap I_{v}|}}},\quad 0<\lambda<1$ (6)

where $\lambda$ is the weight index of the balance factor. If the difference in the ratings of the co-rated items between two users is small, $w(u,v)$ increases, which implies that a large weight is applied to the similarity between them. The new similarity measures in [10] were calculated by multiplying the balance factor by ACOS or PCC.
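As a rough illustration of Eq. (6), the balance factor is $\lambda$ raised to the root mean squared rating difference over co-rated items. The function name `balance_factor` and the default $\lambda=0.5$ are assumptions of this sketch, not values taken from [10].

```python
import math


def balance_factor(ru, rv, lam=0.5):
    """Balance factor w(u, v) of Eq. (6), with 0 < lam < 1."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0  # assumption: no co-rated items means zero weight
    rmsd = math.sqrt(sum((ru[i] - rv[i]) ** 2 for i in common) / len(common))
    return lam ** rmsd


# Identical ratings give the maximum weight; larger gaps shrink it.
print(balance_factor({"a": 4, "b": 2}, {"a": 4, "b": 2}))
print(balance_factor({"a": 5}, {"a": 3}))
```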

[12] proposed the modified version of PCC by replacing the average rating with an absolute reference as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{CPCC}}=\frac{\sum_{i\in I_{u}\cap I_{v}}(r_{ui}-r_{\textit{med}})(r_{vi}-r_{\textit{med}})}{\sqrt{\sum_{i\in I_{u}\cap I_{v}}(r_{ui}-r_{\textit{med}})^{2}}\sqrt{\sum_{i\in I_{u}\cap I_{v}}(r_{vi}-r_{\textit{med}})^{2}}}$ (7)

where $r_{\textit{med}}$ is the absolute reference, which is usually set as the median value in the rating scale (e.g., 3 on a rating scale of 5). The main drawback of CPCC is its poor performance for sparse data.

[13] proposed the sigmoid function-based Pearson correlation coefficient (SPCC) to enhance the performance for sparse data by applying a weight calculated from the number of co-rated items as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{SPCC}}=\textit{sim}(u,v)^{\textit{PCC}}\cdot\frac{1}{1+\exp\left(-\frac{|I_{u}\cap I_{v}|}{2}\right)}$ (8)

When the number of co-rated items between two users is small, the similarity value decreases as the weight factor has a small value.
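The behavior of the weight factor in Eq. (8) can be inspected directly. The helper name `spcc_weight` is hypothetical. For a single co-rated item the weight is roughly 0.62, and it approaches 1 as the number of co-rated items grows.

```python
import math


def spcc_weight(n_common):
    """Sigmoid weight of Eq. (8) as a function of |I_u ∩ I_v|."""
    return 1.0 / (1.0 + math.exp(-n_common / 2))


# Print the weight for a few numbers of co-rated items.
for n in (1, 2, 5, 10, 20):
    print(n, round(spcc_weight(n), 4))
```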

[14] developed a modified version of JAC, named relevant Jaccard (RJAC), to solve the problem that traditional similarity measures sometimes select users who have not evaluated the target item as $k$-nearest neighbors. RJAC is defined as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{RJAC}}=\frac{1}{1+\frac{1}{|I_{u}\cap I_{v}|}+\frac{|\bar{I}_{u}|}{1+|\bar{I}_{u}|}+\frac{1}{1+|\bar{I}_{v}|}}$ (9)

where $\bar{I}_{u}$ is the set of items in $I_{u}$ that are not in $I_{u}\cap I_{v}$ .
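Eq. (9) can be sketched as follows, taking the formula exactly as printed above. The name `rjac_sim` is hypothetical, and returning 0 when there are no co-rated items (where Eq. (9) would divide by zero) is an assumption of this sketch. Note that the expression treats $u$ and $v$ differently, so the result depends on the argument order.

```python
def rjac_sim(ru, rv):
    """Relevant Jaccard similarity, following Eq. (9) as printed."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0  # assumption: avoid division by zero in 1 / |I_u ∩ I_v|
    only_u = len(ru.keys() - common)  # |I_u bar|: items rated only by u
    only_v = len(rv.keys() - common)  # |I_v bar|: items rated only by v
    denom = 1 + 1 / len(common) + only_u / (1 + only_u) + 1 / (1 + only_v)
    return 1 / denom


u = {"a": 4, "b": 3, "c": 5}
v = {"a": 2}
print(rjac_sim(u, v), rjac_sim(v, u))  # the two orders give different values
```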

In addition, several studies have combined several similarity measures to compensate for the limitations of similarity measures. [15] proposed a method that combined JAC and MSD (JMSD), in which JAC is used to capture the proportion of the co-rated items and MSD is used to obtain the information of ratings. JMSD is defined as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{JMSD}}=\textit{sim}(u,v)^{\textit{JAC}% }\cdot\textit{sim}(u,v)^{\textit{MSD}}$ (10)

[16] proposed a new similarity measure, triangle multiplying Jaccard (TMJ), which combines triangle similarity (TRI) and JAC to improve recommendation accuracy. The triangle similarity considers both the lengths of the two rating vectors and the angle between them, and is defined as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{TRI}}=1-\frac{\sqrt{\sum_{i\in I_{u}\cap I_{v}}(r_{ui}-r_{vi})^{2}}}{\sqrt{\sum_{i\in I_{u}\cap I_{v}}r_{ui}^{2}}+\sqrt{\sum_{i\in I_{u}\cap I_{v}}r_{vi}^{2}}}$ (11)

Because $\textit{sim}(u,v)^{\textit{TRI}}$ considers both the lengths of the vectors and the angle between them, it is a more reasonable similarity measure than COS. However, TRI only considers the co-rated items. To address this drawback, TMJ combines TRI with JAC through multiplication as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{TMJ}}=\textit{sim}(u,v)^{\textit{TRI}}\cdot\textit{sim}(u,v)^{\textit{JAC}}$ (12)
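A minimal sketch of Eqs. (11) and (12) is given below. The function names are hypothetical, ratings are assumed to be positive (so the denominator of TRI is nonzero), and users without co-rated items are assigned a TRI of 0 by assumption.

```python
import math


def tri_sim(ru, rv):
    """Triangle similarity over co-rated items, Eq. (11)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0  # assumption: no co-rated items means zero similarity
    dist = math.sqrt(sum((ru[i] - rv[i]) ** 2 for i in common))
    norm_u = math.sqrt(sum(ru[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(rv[i] ** 2 for i in common))
    return 1.0 - dist / (norm_u + norm_v)


def jac_sim(ru, rv):
    """Jaccard similarity, Eq. (4)."""
    union = ru.keys() | rv.keys()
    return len(ru.keys() & rv.keys()) / len(union) if union else 0.0


def tmj_sim(ru, rv):
    """Triangle multiplying Jaccard, Eq. (12)."""
    return tri_sim(ru, rv) * jac_sim(ru, rv)


# Unlike COS, TRI penalizes opposite ratings on a single co-rated item.
print(tri_sim({"a": 5}, {"a": 1}))
print(tmj_sim({"a": 3, "b": 4}, {"a": 3, "b": 4, "c": 1}))
```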

[17] proposed a new similarity measure, named Cosine-Jaccard-Mean of Divergence (CjacMD), which combines COS, JAC, and the mean measure of divergence (MMD). MMD is the most common and popular distance measure used for the computation of bio-distances based on non-metric traits, and the MMD-based similarity used in CjacMD is defined as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{MMD}}=\frac{1}{1+\frac{1}{|I_{u}\cap I_{v}|}\sum_{i\in I_{u}\cap I_{v}}\left\{(\theta_{u}-\theta_{v})^{2}-\frac{1}{|I_{u}|}-\frac{1}{|I_{v}|}\right\}}$ (13)

where $\theta_{u}$ represents a vector in which each element represents the number of times that each rating value occurs in the rating vector of user $u$ . That is, MMD has a larger value as the number of the rated items increases while the rating distributions of two users are similar. Based on MMD, CjacMD is calculated as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{CjacMD}}=\textit{sim}(u,v)^{\textit{COS}}+\textit{sim}(u,v)^{\textit{JAC}}+\textit{sim}(u,v)^{\textit{MMD}}$ (14)

In other words, CjacMD is defined as the sum of the three similarity values: COS, JAC, and MMD. In addition, COS, JAC, and MMD contribute equally to CjacMD, because they have values between 0 and 1.

[18] proposed a new similarity measure inspired by a physical resonance phenomenon, named resonance similarity (RES). RES consists of three different factors, namely the consistency, distance, and Jaccard factors, as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{RES}}=\sum_{i\in I_{u}\cap I_{v}}C(r_{ui},r_{vi},k_{1})\cdot D(r_{ui},r_{vi},k_{2},k_{3})\cdot J(u,v,k_{4})$ (15)

where $C(r_{ui},r_{vi},k_{1})$, $D(r_{ui},r_{vi},k_{2},k_{3})$, and $J(u,v,k_{4})$ represent the consistency, distance, and Jaccard factors, respectively, and the parameters $k_{1},\ldots,k_{4}$ are learned from a dataset. The consistency factor measures the consistency of two users’ ratings by modeling the users’ initial phase angles in a virtual resonance system, and the distance factor reflects the similarity of the users’ opinions on the same items. Finally, the Jaccard factor is quite similar to the traditional JAC, but it introduces the parameter $k_{4}$ to sharpen the result.

[19] suggested an improved similarity measure, which takes three impact factors of similarity into account to minimize the deviation of the similarity calculation. Moreover, a new similarity measure that employs the information entropy of user ratings, so that the users’ global rating behavior on items can be reflected, was proposed in [20]. The entropy-weighted similarity measure (EW) provided in [20] introduces a weighting factor based on entropy into traditional similarity measures, such as COS and PCC. For example, entropy-weighted COS ($\textit{COS}_{\textit{EW}}$) can be computed as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{COS}_{\textit{EW}}}=\frac{\sum_{i\in I_{u}\cap I_{v}}E(i)(r_{ui}\cdot r_{vi})}{\sqrt{\sum_{i\in I_{u}\cap I_{v}}r_{ui}^{2}}\sqrt{\sum_{i\in I_{u}\cap I_{v}}r_{vi}^{2}}}$ (16)

where $E(i)$ is the entropy for item $i$ , which is computed as follows:

$\displaystyle E(i)=-\sum_{r=r_{\text{min}}}^{r_{\text{max}}}p(r_{i}=r)\log_{2}p(r_{i}=r)$ (17)

where the probability that the rating of item $i$ is $r$ , $p(r_{i}=r)$ , is calculated as follows:

$\displaystyle p(r_{i}=r)=\frac{|\{u\in U|r_{ui}=r\}|}{|\{u\in U|r_{ui}\in[r_{\text{min}},r_{\text{max}}]\}|}$ (18)

$p(r_{i}=r)$ is the ratio of the number of ratings of item $i$ with value $r$ to the total number of ratings of item $i$. The key idea of the entropy-weighted similarity measure is that an item whose ratings have a high entropy should contribute relatively more to the similarity between two users than an item whose ratings have a low entropy.
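The item entropy in Eqs. (17) and (18) can be computed from the list of ratings an item has received. The function name `item_entropy` and the example rating lists are hypothetical; rating values that never occur are skipped, which corresponds to the usual convention that $0\log_{2}0=0$.

```python
import math
from collections import Counter


def item_entropy(item_ratings):
    """Entropy E(i) of the ratings given to one item, Eqs. (17)-(18)."""
    n = len(item_ratings)
    counts = Counter(item_ratings)
    # p(r_i = r) is the share of ratings of this item that equal r.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Everyone agrees: zero entropy. Opinions split evenly: one bit.
print(item_entropy([5, 5, 5, 5]))
print(item_entropy([1, 5, 1, 5]))
```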

3. Proposed similarity measure

Before explaining the proposed similarity measure, OS, which is the basis of the proposed similarity measure, is explained in detail. As explained in Section 1, OS is the multiplication of PNCR and ADF defined as follows [6]:

$\displaystyle\textit{sim}(u,v)^{\textit{PNCR}}=\exp\left(-\frac{N-|I_{u}\cap I_{v}|}{N}\right)$ (19)

$\displaystyle\textit{sim}(u,v)^{\textit{ADF}}=\frac{\sum_{i\in I_{u}\cap I_{v}}\exp\left(-\frac{|r_{ui}-r_{vi}|}{\text{max}(r_{ui},r_{vi})}\right)}{|I_{u}\cap I_{v}|}$ (20)

where $N$ denotes the total number of items in a dataset. PNCR has a large value when the number of co-rated items between two users is large, and ADF has a large value when the ratings of two users are similar. Unlike MSD, which uses the squared difference in ratings, ADF transforms the absolute difference in ratings by using an exponential function, which causes a sharp decrease in similarity as the absolute difference in ratings increases.

In addition, ADF divides the absolute difference by the maximum value of two ratings. Hence, the ADF value when $r_{\textit{ui}}=5$ and $r_{\textit{vi}}=4$ is greater than the ADF value when $r_{\textit{ui}}=2$ and $r_{\textit{vi}}=1$ , even if the absolute difference is the same. Even if OS is obtained by transforming some intuitive and qualitative conditions into relevant mathematical equations, such as the integral equation, the linear system of differential equations, and a non-linear system, the result of the transformation is not intuitive.
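The asymmetry described above can be checked numerically with a small sketch of Eqs. (19) and (20). The function names `pncr`, `adf`, and `os_sim` are hypothetical, ratings are assumed to be positive, and returning 0 for users without co-rated items is an assumption of this sketch.

```python
import math


def pncr(ru, rv, n_items):
    """Percentage of non-common ratings, Eq. (19); n_items is N."""
    common = len(ru.keys() & rv.keys())
    return math.exp(-(n_items - common) / n_items)


def adf(ru, rv):
    """Absolute difference of ratings, Eq. (20)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0  # assumption: no co-rated items means zero similarity
    total = sum(
        math.exp(-abs(ru[i] - rv[i]) / max(ru[i], rv[i])) for i in common
    )
    return total / len(common)


def os_sim(ru, rv, n_items):
    """OS as the product of PNCR and ADF."""
    return pncr(ru, rv, n_items) * adf(ru, rv)


# Same absolute difference of 1, but different ADF values.
high_pair = adf({"a": 5}, {"a": 4})  # exp(-1/5)
low_pair = adf({"a": 2}, {"a": 1})   # exp(-1/2)
print(round(high_pair, 4), round(low_pair, 4))
```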

To overcome this limitation of OS, this study suggests considering the sentiments of ratings rather than using their maximum value when calculating the relative difference in the ratings used in ADF. The underlying assumption of this suggestion is that the relative difference when one rating is positive and the other is negative is greater than the relative difference when both ratings are positive or negative if the absolute difference is the same.

To reflect this assumption in the relative difference, the original rating is first transformed as follows:

$\displaystyle r_{ui}^{\prime}=\frac{1}{1+\exp\left(-a\cdot\frac{r_{ui}-r_{\textit{med}}}{r_{\text{max}}}\right)}$ (21)

where $r_{\text{max}}$ is the maximum value in the rating scale and $r_{\textit{med}}$ is the reference rating to divide the positive and negative ratings, which is set as the median value in the rating scale. Figure 1 shows the relationship between the original ratings and those converted by the sigmoid function when $a=1$, $r_{\textit{med}}=3$, and $r_{\text{max}}=5$ on a rating scale of 5. When the original ratings are close to $r_{\textit{med}}$, the difference in the converted ratings sharply increases compared with when the original ratings are far from $r_{\textit{med}}$. Using the converted ratings, it is possible to assign a large value to the relative difference when the sentiments of the two ratings differ, as illustrated in Fig. 1.

Figure 1.

Sigmoid transformation of ratings.
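The transformation in Eq. (21) can be reproduced with a few lines of code. The function name `sigmoid_rating` is hypothetical, and the defaults $r_{\textit{med}}=3$ and $r_{\text{max}}=5$ correspond to the 5-point scale discussed above. The example compares two rating pairs with the same absolute difference of 2: one pair straddles $r_{\textit{med}}$ and the other does not.

```python
import math


def sigmoid_rating(r, a=1.0, r_med=3.0, r_max=5.0):
    """Converted rating r' of Eq. (21)."""
    return 1.0 / (1.0 + math.exp(-a * (r - r_med) / r_max))


for a in (1, 5):
    converted = {r: sigmoid_rating(r, a) for r in (1, 2, 3, 4, 5)}
    # Ratings 2 and 4 have opposite sentiments; 1 and 3 do not.
    diff_cross = abs(converted[4] - converted[2])
    diff_same = abs(converted[3] - converted[1])
    print(a, round(diff_cross, 4), round(diff_same, 4))
```

In both settings the pair with mismatched sentiments receives the larger converted difference, and the gap widens as $a$ grows.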

Using the converted ratings, the modified version of ADF (MADF) is defined as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{MADF}}=\frac{\sum_{i\in I_{u}\cap I_{v}}\exp\left(-\frac{|r_{ui}^{\prime}-r_{vi}^{\prime}|}{z}\right)}{|I_{u}\cap I_{v}|}$ (22)

where $z$ is the scale parameter. Unlike ADF, MADF does not use the maximum value of the two ratings, which is used to obtain the relative difference in ADF; therefore, the asymmetry according to the magnitude of ratings disappears in MADF. The proposed similarity measure, named ‘OS on the ratings converted by the sigmoid function (SIGOS)’, is defined as follows:

$\displaystyle\textit{sim}(u,v)^{\textit{SIGOS}}=\textit{sim}(u,v)^{\textit{PNCR}}\cdot\textit{sim}(u,v)^{\textit{MADF}}$ (23)
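Putting Eqs. (19) and (21)-(23) together, SIGOS might be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in the experiments: all function names are hypothetical, the defaults correspond to a 5-point scale, and users without co-rated items are assigned an MADF of 0 by assumption.

```python
import math


def sigmoid_rating(r, a=1.0, r_med=3.0, r_max=5.0):
    """Converted rating r' of Eq. (21)."""
    return 1.0 / (1.0 + math.exp(-a * (r - r_med) / r_max))


def pncr(ru, rv, n_items):
    """Percentage of non-common ratings, Eq. (19); n_items is N."""
    common = len(ru.keys() & rv.keys())
    return math.exp(-(n_items - common) / n_items)


def madf(ru, rv, a=1.0, z=1.0):
    """Modified ADF on sigmoid-converted ratings, Eq. (22)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0  # assumption: no co-rated items means zero similarity
    total = sum(
        math.exp(-abs(sigmoid_rating(ru[i], a) - sigmoid_rating(rv[i], a)) / z)
        for i in common
    )
    return total / len(common)


def sigos(ru, rv, n_items, a=1.0, z=1.0):
    """SIGOS as the product of PNCR and MADF, Eq. (23)."""
    return pncr(ru, rv, n_items) * madf(ru, rv, a, z)


# Same absolute difference of 2: mismatched sentiments score lower.
mismatch = sigos({"a": 2}, {"a": 4}, n_items=1, a=5)
match = sigos({"a": 1}, {"a": 3}, n_items=1, a=5)
print(round(mismatch, 4), round(match, 4))
```

Because the sigmoid is symmetric around $r_{\textit{med}}$, the pairs (5, 4) and (2, 1) now receive the same MADF value, in contrast to ADF.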

4. Experimental procedure

In our experimental study, four datasets were utilized: MovieLens100K, MovieLens1M, Yahoo-Music, and FilmTrust. MovieLens100K and MovieLens1M were obtained from the GroupLens Research website, and the Yahoo-Music data were obtained from Yahoo Research by request. The FilmTrust data are open to anyone through the Harvard Dataverse. The properties of the four datasets in terms of the number of users, number of items, and density are summarized in Table 1.

For each experiment, 5-fold cross-validation was applied for the evaluation; that is, for each fold, 80% of the data were used for training and the rest were used for testing, and this was repeated five times for each dataset. Each user had at least 20 ratings for all datasets; therefore, all users could be included in the training and test sets. In the training process, the similarity values between users were calculated to determine the $k$-nearest neighbor users of each user, where $k$ was selected from $\{10,20,\ldots,90,100\}$. Next, in the testing phase, the ratings of unrated items for each user were calculated.

In this study, SIGOS was compared with OS and other traditional or recently developed similarity measures such as COS, PCC, TMJ, and RJAC. For SIGOS, $z$ was set to 1 to ensure a sufficient range in the converted ratings between 0 and 1, and $a$ was selected among $\{1,2,3,4,5\}$ . The overall experimental procedure is illustrated in Fig. 2. All the similarity measures and collaborative filtering processes were implemented in Python.

Table 1
Properties of datasets for experiments

Dataset	Users	Items	Ratings	Density
MovieLens100K	943	1,682	100,000	6.3%
MovieLens1M	6,040	3,706	1,000,209	4.4%
Yahoo-Music	15,400	1,000	354,200	2.3%
FilmTrust	1,508	2,071	35,497	1.1%

Figure 2.

Experimental procedure.

For the evaluation of the predicted ratings and recommendation performance, mean absolute error (MAE) and F1 were used. MAE is a representative metric for evaluating the prediction accuracy and measures the average difference between the predicted ratings and their corresponding actual ratings by users. MAE is defined as follows:

$\displaystyle\text{MAE}=\frac{1}{|U|}\sum_{u\in U}\frac{\sum_{i\in I_{u,\textit{test}}}|r_{ui}-\hat{r}_{ui}|}{|I_{u,\textit{test}}|}$ (24)

In addition, F1, which is the harmonic mean of precision and recall, measures the performance of the recommendations. In this study, the set of items that are actually relevant to the user ( $R_{\textit{true}}$ ) and the set of items that are predicted to be relevant to users ( $R_{\textit{pred}}$ ) are defined as follows:

$\displaystyle R_{\textit{true}}=\{i|r_{u,i}>\bar{r}_{u}\}$ (25)

$\displaystyle R_{\textit{pred}}=\{i|\hat{r}_{u,i}>\bar{r}_{u}\}$ (26)

In other words, items whose predicted ratings were higher than the user’s average rating in the training set were determined as the recommended items. Then, precision and recall are defined as follows:

$\displaystyle\textit{Precision}=\frac{|\{i|i\in R_{\textit{true}}\text{ and }i\in R_{\textit{pred}}\}|}{|R_{\textit{pred}}|}$ (27)

$\displaystyle\textit{Recall}=\frac{|\{i|i\in R_{\textit{true}}\text{ and }i\in R_{\textit{pred}}\}|}{|R_{\textit{true}}|}$ (28)

Using precision and recall, F1 is defined as follows:

$\displaystyle\text{F1}=\frac{2\cdot\textit{Precision}\cdot\textit{Recall}}{\textit{Precision}+\textit{Recall}}$ (29)

After predicting the ratings of items in each validation set, MAE and F1 were computed using the predicted ratings. Then, the average values of MAE and F1 for the five validation sets were calculated and these average values were used to compare the performance of SIGOS with that of other similarity measures.
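The two metrics can be sketched as follows. The function names and toy values are hypothetical. The MAE function follows Eq. (24) by averaging per-user errors; the F1 function covers Eqs. (25)-(29) for a single user, since how F1 is aggregated across users is not specified here, and returning 0 when there are no hits is an assumption of this sketch.

```python
def mae(actual, predicted):
    """MAE of Eq. (24): the mean over users of each user's mean absolute error."""
    per_user = [
        sum(abs(actual[u][i] - predicted[u][i]) for i in actual[u]) / len(actual[u])
        for u in actual
    ]
    return sum(per_user) / len(per_user)


def f1_for_user(actual_u, predicted_u, user_mean):
    """F1 of Eqs. (25)-(29) for one user; user_mean is the training-set mean."""
    r_true = {i for i, r in actual_u.items() if r > user_mean}
    r_pred = {i for i, r in predicted_u.items() if r > user_mean}
    hits = len(r_true & r_pred)
    if hits == 0:
        return 0.0  # assumption: F1 is 0 when precision and recall are both 0
    precision = hits / len(r_pred)
    recall = hits / len(r_true)
    return 2 * precision * recall / (precision + recall)


# Toy test-set ratings and predictions (hypothetical values).
actual = {"u": {"a": 5, "b": 4, "c": 1}}
predicted = {"u": {"a": 4.5, "b": 2.0, "c": 3.5}}
print(mae(actual, predicted))
print(f1_for_user(actual["u"], predicted["u"], user_mean=3.0))
```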

Table 2

Evaluation results using different $a$ corresponding to MovieLens100K

	MAE						F1
$k$	$a=1$	$a=2$	$a=3$	$a=4$	$a=5$		$a=1$	$a=2$	$a=3$	$a=4$	$a=5$
10	0.9011	0.8480	0.8234	0.8106	0.8052		0.6586	0.6763	0.6850	0.6894	0.6892
20	0.8678	0.8288	0.8064	0.7934	0.7863		0.6615	0.6743	0.6832	0.6880	0.6903
30	0.8446	0.8086	0.7930	0.7835	0.7783		0.6632	0.6774	0.6838	0.6880	0.6901
40	0.8242	0.7935	0.7816	0.7749	0.7710		0.6672	0.6811	0.6862	0.6904	0.6912
50	0.8100	0.7843	0.7737	0.7678	0.7653		0.6692	0.6828	0.6873	0.6900	0.6910
60	0.7964	0.7764	0.7669	0.7624	0.7608		0.6714	0.6846	0.6883	0.6910	0.6922
70	0.7863	0.7691	0.7614	0.7584	0.7570		0.6749	0.6858	0.6898	0.6908	0.6924
80	0.7770	0.7639	0.7576	0.7555	0.7542		0.6782	0.6873	0.6901	0.6927	0.6933
90	0.7714	0.7601	0.7549	0.7533	0.7525		0.6794	0.6884	0.6909	0.6930	0.6939
100	0.7672	0.7572	0.7533	0.7517	0.7512		0.6817	0.6886	0.6916	0.6926	0.6941

Table 3

Evaluation results using different $a$ corresponding to MovieLens1M

	MAE						F1
$k$	$a=1$	$a=2$	$a=3$	$a=4$	$a=5$		$a=1$	$a=2$	$a=3$	$a=4$	$a=5$
10	0.8844	0.8257	0.7931	0.7821	0.7785		0.6880	0.6955	0.7000	0.7033	0.7046
20	0.8752	0.8194	0.7909	0.7743	0.7668		0.6797	0.6923	0.6986	0.7040	0.7071
30	0.8676	0.8128	0.7883	0.7724	0.7632		0.6770	0.6911	0.6976	0.7036	0.7070
40	0.8619	0.8082	0.7863	0.7712	0.7613		0.6745	0.6902	0.6971	0.7026	0.7064
50	0.8566	0.8042	0.7838	0.7701	0.7606		0.6728	0.6895	0.6965	0.7024	0.7058
60	0.8516	0.8008	0.7810	0.7690	0.7597		0.6708	0.6887	0.6967	0.7021	0.7052
70	0.8462	0.7970	0.7786	0.7673	0.7588		0.6694	0.6882	0.6967	0.7017	0.7048
80	0.8415	0.7939	0.7762	0.7655	0.7577		0.6689	0.6876	0.6965	0.7016	0.7044
90	0.8370	0.7914	0.7739	0.7637	0.7565		0.6680	0.6870	0.6966	0.7015	0.7044
100	0.8328	0.7890	0.7721	0.7623	0.7553		0.6674	0.6865	0.6965	0.7012	0.7043

Table 4

Evaluation results using different $a$ corresponding to Yahoo-Music

	MAE						F1
$k$	$a=1$	$a=2$	$a=3$	$a=4$	$a=5$		$a=1$	$a=2$	$a=3$	$a=4$	$a=5$
10	1.1037	1.0973	1.0922	1.0848	1.0793		0.6476	0.6470	0.6450	0.6439	0.6411
20	1.0864	1.0818	1.0773	1.0728	1.0689		0.6430	0.6421	0.6399	0.6378	0.6365
30	1.0771	1.0726	1.0698	1.0649	1.0602		0.6387	0.6388	0.6365	0.6360	0.6353
40	1.0715	1.0667	1.0632	1.0603	1.0554		0.6352	0.6357	0.6337	0.6328	0.6331
50	1.0669	1.0621	1.0589	1.0560	1.0523		0.6326	0.6335	0.6318	0.6305	0.6316
60	1.0627	1.0581	1.0556	1.0523	1.0494		0.6315	0.6316	0.6308	0.6296	0.6298
70	1.0582	1.0540	1.0514	1.0492	1.0462		0.6305	0.6299	0.6298	0.6278	0.6282
80	1.0540	1.0510	1.0485	1.0459	1.0433		0.6295	0.6290	0.6286	0.6272	0.6272
90	1.0506	1.0481	1.0459	1.0429	1.0402		0.6282	0.6277	0.6275	0.6268	0.6265
100	1.0472	1.0452	1.0434	1.0406	1.0374		0.6273	0.6267	0.6261	0.6255	0.6257

Table 5

Evaluation results using different $a$ corresponding to FilmTrust

	MAE						F1
$k$	$a=1$	$a=2$	$a=3$	$a=4$	$a=5$		$a=1$	$a=2$	$a=3$	$a=4$	$a=5$
10	0.6968	0.6946	0.6917	0.6802	0.6669		0.5975	0.5945	0.5943	0.5916	0.5840
20	0.6949	0.6946	0.6868	0.6767	0.6637		0.5740	0.5714	0.5740	0.5690	0.5658
30	0.6919	0.6872	0.6802	0.6669	0.6552		0.5644	0.5642	0.5657	0.5614	0.5585
40	0.6841	0.6774	0.6663	0.6541	0.6413		0.5603	0.5591	0.5593	0.5559	0.5633
50	0.6748	0.6646	0.6522	0.6382	0.6291		0.5534	0.5567	0.5576	0.5617	0.5647
60	0.6613	0.6484	0.6376	0.6274	0.6199		0.5541	0.5562	0.5594	0.5617	0.5658
70	0.6473	0.6356	0.6257	0.6187	0.6126		0.5585	0.5613	0.5667	0.5692	0.5730
80	0.6372	0.6270	0.6190	0.6133	0.6091		0.5582	0.5645	0.5667	0.5716	0.5755
90	0.6306	0.6211	0.6148	0.6094	0.6068		0.5564	0.5634	0.5683	0.5739	0.5776
100	0.6251	0.6164	0.6114	0.6079	0.6061		0.5585	0.5674	0.5729	0.5765	0.5821

Figure 3.

Comparison results by MAE.

Figure 4.

Comparison results by F1.

5. Results

Before comparing SIGOS with the other similarity measures, the optimal $a$ was determined. Tables 2–5 show the evaluation results of SIGOS for each dataset. For MovieLens100K and MovieLens1M, the best prediction performance was achieved when $a$ was 5, regardless of the evaluation metrics. However, in the case of Yahoo-Music and FilmTrust, the lowest MAE was obtained with $a=5$, whereas the highest F1 was obtained with $a=1$.

As $a$ increases, the difference between positive and negative ratings affects the similarity between users more strongly. In other words, when $a$ is large, the difference between ratings with the same sentiment is considered smaller than that between ratings with different sentiments. The F1 results therefore suggest that, for Yahoo-Music and FilmTrust, it is helpful not to weight the sentiment of the ratings too heavily to obtain high-quality recommendations. Yahoo-Music and FilmTrust have lower densities than MovieLens100K and MovieLens1M, implying that the number of items co-rated by two users is likely to be larger in MovieLens100K and MovieLens1M than in Yahoo-Music and FilmTrust. Considering the characteristics of the datasets and the evaluation results depending on $a$, it is better to set $a$ to a small value to achieve a high F1 when the number of co-rated items is not large enough to reliably measure the similarity between users.

Next, Figs. 3 and 4 show the evaluation results obtained by MAE and F1 for each dataset, varying with the number of nearest neighbors. For comparison, $a$ of SIGOS was set to the best value based on the evaluation results shown in Tables 2–5. In the case of MovieLens100K, SIGOS shows the smallest MAE and highest F1 values, regardless of $k$, which implies that SIGOS outperforms the other similarity measures. In the case of MovieLens1M, SIGOS showed smaller MAE values than the other similarity measures, except for TMJ. SIGOS performs slightly worse than TMJ when $k$ is large; however, when $k$ is small, TMJ exhibits much poorer performance than SIGOS. In terms of F1, SIGOS achieved the best performance for MovieLens1M, as it did for MovieLens100K. In the case of Yahoo-Music, SIGOS was superior to the other measures in terms of MAE, except for RJAC with $k>30$. In terms of F1, SIGOS and OS showed similar results and outperformed the other methods. Finally, for FilmTrust, SIGOS shows a lower MAE than OS, COS, PCC, and RJAC regardless of $k$, but it shows a lower MAE than TMJ only when $k$ is greater than 60. In the case of F1, SIGOS shows relatively poor performance for FilmTrust compared with the results for the other datasets. However, SIGOS achieves higher F1 values than OS when $k$ is greater than 20.

6. Conclusion

This study proposes SIGOS, a new similarity measure that enhances OS, a recently developed similarity measure for recommendation systems. When calculating the relative difference in ratings, SIGOS considers whether the sentiments of the ratings from two different users match: the relative difference is treated as larger when the sentiments of the ratings differ than when they are identical.

The experimental results on four datasets, MovieLens100K, MovieLens1M, Yahoo-Music, and FilmTrust, showed that SIGOS generally outperformed OS for UBCF in terms of the prediction accuracy evaluated by MAE and F1. The proposed similarity is derived from the assumption that resolving the asymmetry according to the magnitude of the ratings, while considering whether the sentiments of the ratings agree, will improve the performance of OS. Compared with OS, SIGOS significantly improved the prediction accuracy for the ratings in terms of MAE. In particular, the improvement in MAE was remarkable for the MovieLens1M and FilmTrust datasets. In terms of F1, SIGOS showed better recommendation performance than OS, except for the Yahoo-Music dataset. These results imply that the shortcomings of OS raised in this study degrade the recommendation performance, and they support the hypothesis that the recommendation performance improves when these shortcomings are eliminated. Moreover, SIGOS exhibits excellent F1 performance when $k$ is small. This means that, compared with the other similarity measures, SIGOS evaluates the similarity with other users well even for users who have provided ratings for only a few items.

Moreover, the performance of SIGOS improved as the value of $a$ increased. As $a$ increases, the difference on the converted rating scale between negative ratings (1 or 2 on a 5-point scale) and positive ratings (4 or 5 on a 5-point scale) increases. This implies that the similarity between two users decreases significantly when the sentiments of their ratings for the same item do not coincide. In other words, when measuring the similarity between users for recommendations, it is important to consider not only the difference in rating values but also whether the sentiments of the ratings match.
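
The effect of $a$ described above can be illustrated with a small Python sketch. It assumes a plain logistic function centered at the scale midpoint 3, which may differ in detail from the transformation that SIGOS actually applies; the helper names `sigmoid_rating` and `gap` are hypothetical.

```python
import math


def sigmoid_rating(r, a, midpoint=3.0):
    """Map a raw rating onto (0, 1) with a logistic curve of steepness a.

    Assumption: the curve is centered at the midpoint of a 1-5 scale.
    """
    return 1.0 / (1.0 + math.exp(-a * (r - midpoint)))


def gap(r1, r2, a):
    """Absolute difference between two ratings on the converted scale."""
    return abs(sigmoid_rating(r1, a) - sigmoid_rating(r2, a))


for a in (0.5, 1.0, 2.0, 4.0):
    same_sentiment = gap(4, 5, a)      # both ratings positive
    opposite_sentiment = gap(2, 4, a)  # one negative, one positive
    print(f"a={a}: same={same_sentiment:.4f}, opposite={opposite_sentiment:.4f}")
```

Under this assumed transform, the gap between ratings 2 and 4 grows from about 0.24 at $a = 0.5$ to about 0.96 at $a = 4$, while the gap between ratings 4 and 5 shrinks to about 0.02 at $a = 4$, so a disagreement in sentiment dominates the converted difference when $a$ is large.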

This study also has some limitations. SIGOS does not consider users’ rating behaviors: the threshold that separates positive from negative ratings may differ from user to user. Therefore, we will improve SIGOS to reflect users’ rating behaviors. In future work, we will also investigate the performance of SIGOS on other evaluation metrics, such as the normalized discounted cumulative gain (NDCG).
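
For reference, the normalized discounted cumulative gain mentioned above can be sketched in Python as follows. This uses one common formulation (linear gain with a base-2 logarithmic discount); the function names and the choice of true ratings as gains are assumptions made for illustration, not part of the experiments reported here.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg_at_k(ratings_in_predicted_order, k):
    """NDCG@k, using each item's true rating as its gain.

    ratings_in_predicted_order: true ratings of the candidate items,
    sorted by the recommender's predicted score (best first).
    """
    top_k = list(ratings_in_predicted_order)[:k]
    ideal = sorted(ratings_in_predicted_order, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(top_k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: the recommender ranks a 3-star item above a 5-star item.
print(ndcg_at_k([3, 5, 4], k=2))
```

Unlike MAE and F1, this metric rewards placing highly rated items near the top of the recommendation list, which is why it complements the two metrics used in this study.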

Footnotes

https://grouplens.org/datasets/movielens/.

https://webscope.sandbox.yahoo.com/.

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/AKVGJ9.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1F1A1054496).

References

1. Hwangbo, Kim, Cha. Recommendation system development for fashion retail e-commerce. Electronic Commerce Research and Applications. 2018; 28: 94-101. doi: 10.1016/j.elerap.2018.01.012. https://www.sciencedirect.com/science/article/pii/S1567422318300152.

2. Barkan, Koenigstein. ITEM2VEC: Neural item embedding for collaborative filtering. In: IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). 2016; pp. 1-6. doi: 10.1109/MLSP.2016.7738886.

3. Acilar, Arslan. A collaborative filtering method based on artificial immune network. Expert Systems with Applications. 2009; 36(4): 8324-8332. doi: 10.1016/j.eswa.2008.10.029. https://www.sciencedirect.com/science/article/pii/S0957417408007100.

4. Gupta, Gadge. Performance analysis of recommendation system based on collaborative filtering and demographics. In: International Conference on Communication, Information & Computing Technology (ICCICT). 2015; pp. 1-6. doi: 10.1109/ICCICT.2015.7045675.

5. Basilico, Hofmann. Unifying collaborative and content-based filtering. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML. Association for Computing Machinery, New York, NY, USA. 2004; p. 9. ISBN 1581138385. doi: 10.1145/1015330.1015394.

6. Gazdar, Hidri. A new similarity measure for collaborative filtering based recommender systems. Knowledge-Based Systems. 2020; 188: 105058. doi: 10.1016/j.knosys.2019.105058.

7. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in Artificial Intelligence. 2009.

8. Sheugh, Alizadeh. A note on Pearson correlation coefficient as a metric of similarity in recommender system. In: AI & Robotics (IRANOPEN). 2015; pp. 1-6. doi: 10.1109/RIOS.2015.7270736.

9. Bobadilla, Ortega, Hernando. A collaborative filtering similarity measure based on singularities. Information Processing & Management. 2012; 48(2): 204-217. doi: 10.1016/j.ipm.2011.03.007. https://www.sciencedirect.com/science/article/pii/S0306457311000409.

10. Chen. An improved collaborative recommendation algorithm based on optimized user similarity. The Journal of Supercomputing. 2016; 72(7): 2565-2578. doi: 10.1007/s11227-015-1518-5.

11. Sarwar, Karypis, Konstan, Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, WWW ’01. Association for Computing Machinery, New York, NY, USA. 2001; pp. 285-295. ISBN 1581133480. doi: 10.1145/371920.372071.

12. Shardanand, Maes. Social information filtering: algorithms for automating “word of mouth”. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI. ACM Press/Addison-Wesley Publishing Co., USA. 1995; pp. 210-217. ISBN 0201847051. doi: 10.1145/223904.223931.

13. Jamali, Ester. TrustWalker: A random walk model for combining trust-based and item-based recommendation. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09. Association for Computing Machinery, New York, NY, USA. 2009; pp. 397-406. ISBN 9781605584959. doi: 10.1145/1557019.1557067.

14. Bag, Kumar, Tiwari. An efficient recommendation generation using relevant Jaccard similarity. Information Sciences. 2019; 483: 53-64. doi: 10.1016/j.ins.2019.01.023.

15. Bobadilla, Serradilla, Bernal. A new collaborative filtering metric that improves the behavior of recommender systems. Knowledge-Based Systems. 2010; 23(6): 520-528. doi: 10.1016/j.knosys.2010.03.009. https://www.sciencedirect.com/science/article/pii/S0950705110000444.

16. Sun, Zhang, Dong, Zhang, Min. Integrating triangle and Jaccard similarities for recommendation. PLoS ONE. 2017; 12(8): e0183570. doi: 10.1371/journal.pone.0183570.

17. Suryakant, Mahara. A new similarity measure based on mean measure of divergence for collaborative filtering in sparse environment. Procedia Computer Science. 2016; 89: 450-456. doi: 10.1016/j.procs.2016.06.099. https://www.sciencedirect.com/science/article/pii/S1877050916311644.

18. Tan. An efficient similarity measure for user-based collaborative filtering recommender systems inspired by the physical resonance principle. IEEE Access. 2017; 5: 27211-27228. doi: 10.1109/ACCESS.2017.2778424.

19. Feng, Fengs, Zhang, Peng. An improved collaborative filtering method based on similarity. PLoS ONE. 2018; 13(9): e0204003. doi: 10.1371/journal.pone.0204003.

20. Lee. Using entropy for similarity measures in collaborative filtering. Journal of Ambient Intelligence and Humanized Computing. 2020; 11(1): 363-374. doi: 10.1007/s12652-019-01226-0.

