Abstract
A recommendation system matches users with items, providing appropriate items to users and effectively helping them find items that may be of interest. The most commonly used recommendation method is collaborative filtering. However, a recommendation system based on collaborative filtering can be injected with false data that create false ratings to push or nuke specific items. Such attacks undermine users’ trust in the recommendation system, whose value depends on providing trustworthy recommendations. Many algorithms have therefore been developed for detecting attacks. This article proposes a method to detect attacks based on the beta distribution. Previous researchers assumed that the attacker attacked only one target item in the user data, whereas this research simulated an attacker attacking multiple target items. The results showed a detection rate of more than 80% and a false rate within 16%.
Introduction
E-commerce has substantially changed the business environments by introducing applications that strive to provide a wide variety of products and information to consumers. However, information overload makes choosing the right products difficult for consumers. Fortunately, the development of data analysis techniques has enabled researchers to find efficient ways to help consumers to identify their preference by utilizing recommendation systems [25, 35].
With the prevalence of recommendation systems, people have become accustomed to recommendation lists, but because these systems usually collect ratings data from the public, some users may manipulate the ratings to promote or demote certain products. For example, many companies hire employees to monitor the ratings and comments posted on websites to emphasize their products’ strengths or to distinguish their products from those of their rivals. Rivals may also unscrupulously use false ratings, opinions, or comments to degrade their competitors’ products. As a result, these artificial manipulations may cause consumers to develop biased opinions on certain products, which can further affect their friends [17, 32]. Some well-known examples have been publicized recently. Amazon updated its commodity evaluation system in order to maintain credibility and combat false ratings [2]. Steam, the world’s largest PC game trading platform, was also affected by false comments and ratings [22].
The artificial manipulation of recommendation systems based on collaborative filtering has been classified into shilling attacks and profile injection attacks [15]. Shilling attacks consist of two techniques: push attacks that promote a particular product by providing high ratings, and nuke attacks that deliberately provide low ratings. Both push and nuke attacks encourage the recommendation systems to recommend certain products, which influences the customers’ purchasing intention [15].
To cope with these types of malicious attacks, researchers have used several algorithms to detect shilling attacks. These can be categorized as user-based and item-based methods [20]. User-based detection algorithms view user rating profiles as rater–item matrices to identify abnormal raters [15, 30]. These methods aim to identify malicious users, but not the items under attack. Therefore, the systems may know that some users are suspicious, but do not know which items are under attack. In contrast, item-based methods detect the attacked items according to an item–rater matrix. Bhaumik et al. [30] proposed an item-based method based on statistical process control (SPC) to calculate the upper and lower bounds of ratings for each item. Items with ratings above the upper bound or below the lower bound were identified as being attacked. However, this method requires a homogeneous distribution and density of the rating data to produce the correct answer. As a result, SPC-based methods have to partition data into several categories according to the rating distributions, and detection accuracy is very sensitive to the appropriateness of the partition. To make things worse, categories without a sufficient amount of data have to be discarded.
Lam and Riedl [33] suggested that item-based approaches are more robust to attacks than user-based ones [24]. The present research proposes a method for detecting attacked items that is based on the beta distribution of the item ratings. The proposed method does not require any data partitioning and does not discard any category due to low data volume. The experimental results showed that the detection rate can exceed 80% while the false rate stays as low as 16%. For comparison, the detection rate of SPC can reach 80%, but its false rate can exceed 60% in some categories.
The remainder of this paper is organized as follows. Section 2 provides a review of the existing literature concerning malicious attacks on recommender systems. Section 3 describes the proposed method in detail. Section 4 outlines the experimental procedure and results. Finally, Section 5 offers a discussion and conclusion.
Literature review
Modes of shilling attack
The term shilling attack was first proposed in 2002 [33]. It refers to falsifying users’ rating data through attack models so that the data of fake users mix in with the normal data in the rating matrix. Figure 1 shows the format of typical attacking data [1, 5, 10, 21, 30]. Given m items in the targeted recommender system, the attack data consist of
An attack profile.
Shilling attacks may incorporate two types of profile injections: push and nuke. Push attacks promote scores of
Average attack mode
The average attack mode randomly selects
Bandwagon attack mode
With a bandwagon attack, popular items are chosen for
Segment attack mode
Instead of affecting the general public, segment attacks are designed to affect a specific group preferring particular items collected in
Love/hate attack mode
A love/hate attack is simpler than the previous attack modes: it gives the highest (lowest) rating to the target item and the lowest (highest) rating to the other items for push (nuke) attacks. Items in
Among these types of attacks, the most commonly utilized are the random and average attack modes. As the average attack mode is more efficient and difficult to detect than the other modes [9, 23], in this study we dealt with the push and pull attack, which assigns an irregular volume of positive and negative ratings in
Detection method
Numerous methods have been proposed to detect fake users [5, 11, 15, 21, 27, 30]. Chung et al. [15] used a probabilistic method based on the beta distribution to detect attackers, whereas the present paper uses the beta distribution to find out which items were attacked. O’Mahony et al. [27] proposed the detection of attackers using proximity selection in order to filter suspicious users through clustering. The cluster center is regularly checked for changes, and if any abnormal data affect the cluster center, then they are regarded as attack data [27]. Furthermore, a PCA-based method was proposed to filter anomalous profiles [5]. The ratings of items in the user profiles were interpreted as features. As the outliers (namely, the attack profiles) and the general profiles shared low commonality, their correlation should be zero [5, 15, 28]. Another approach utilizes classification techniques to classify profiles [21]. In addition, Zhang et al. [18] adopted the concept of LDA (latent Dirichlet allocation) to group users’ rating data and to identify the attackers. However, attacker profiles are difficult to obtain because attacker information is a small fraction of the total user information.
Item-based approaches strive to detect items under attack. Bhaumik et al. [30] utilized statistical process control (SPC) to detect outliers, which were regarded as attack profiles. Items are separated into groups according to the density and the ratings received. Density is classified as low, medium, or high. Ratings received are classified as low or high average. The combinations of density and ratings received create six possible groups. After discarding the group of high density and low average, the remaining five groups are preserved for outlier detection. The average ratings of normal items are assumed to lie within three standard deviations of the average rating of the group to which the items belong.
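The three-deviation rule described above can be illustrated with a minimal Python sketch. It is a simplification for illustration only and not the procedure of Bhaumik et al. [30]: the function name `spc_outliers`, the sample data, and the choice to compute the limits from the population standard deviation of the item averages within one group are all assumptions.

```python
from statistics import mean, pstdev

def spc_outliers(item_ratings, k=3.0):
    """Flag items whose average rating lies outside center +/- k deviations.

    item_ratings: dict mapping item id -> list of ratings, for ONE group.
    Assumption: limits are derived from the item averages of that group.
    """
    averages = {item: mean(r) for item, r in item_ratings.items() if r}
    values = list(averages.values())
    center = mean(values)
    spread = pstdev(values)
    lower, upper = center - k * spread, center + k * spread
    flagged = [i for i, avg in averages.items() if avg < lower or avg > upper]
    return flagged, (lower, upper)

# Made-up group: 20 ordinary items rated 3 and one item pushed to 5.
group = {f"m{i}": [3, 3, 3] for i in range(20)}
group["target"] = [5, 5, 5]
flagged, limits = spc_outliers(group)
print(flagged, limits)
```

Because the limits depend on the spread within the group, a group with few items or very heterogeneous ratings produces wide limits, which is one way to see why the grouping matters so much for this kind of method.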
Many current research studies cover the detection of false review comments with the text mining approach. This is because consumers use reviews to understand products as the basis for their consumption decisions, and so reviews of products on e-commerce websites have become important [32]. According to research, there are many spam reviews on websites such as Amazon.com and hopZilla.com [17]. This not only affects the brand manufacturers, but also the recommender system provided by e-commerce vendors. However, the identification of false comments is not the subject of this study. This research aims to detect items under attack without having to collect balanced numbers of training cases for classification purposes.
Proposed methodology
Beta distribution
In this research, a beta distribution was used to detect the attack items in the system. A beta distribution is a probability distribution based on past events that uses a statistical approach to predict future behavior [29].
The following formula shows the probability density function (PDF) of beta distributions [3, 4]:
where 0
When observed in a randomized trial as binary independent events, p represents the number of successes and q represents the number of failures,
Changes in the parameters
When applied to recommendation system ratings, a high average rating produces a beta distribution that is skewed to the right. Conversely, when the average rating is low, the beta distribution is skewed to the left. When the numbers of the high and the low scores are approximately equal, the beta distribution is evenly distributed. Figure 2 shows the beta distribution PDF.
Beta distribution PDF.
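As a hedged illustration of how the shape parameters move the mass of the distribution, the following Python sketch evaluates the standard beta density with the standard library only. The function names `beta_pdf` and `beta_mean`, the symbols `a` and `b`, and the three example parameter pairs are assumptions made for this sketch and are not taken from the paper.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_mean(a, b):
    """Expected value of Beta(a, b)."""
    return a / (a + b)

# Hypothetical shape parameters for three kinds of items.
shapes = {"mostly high ratings": (8, 2),
          "mostly low ratings": (2, 8),
          "balanced ratings": (5, 5)}
for label, (a, b) in shapes.items():
    grid = [i / 100 for i in range(1, 100)]
    peak = max(grid, key=lambda x: beta_pdf(x, a, b))
    print(f"{label}: mean={beta_mean(a, b):.2f}, density peaks near x={peak:.2f}")
```

With mostly high ratings the density concentrates near 1, with mostly low ratings near 0, and with balanced ratings around 0.5.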
An item under attack should have a seriously skewed distribution. That is, the distribution of the item should be very different from the distributions of the others. In other words, the average expected value of all the items should become an extreme value on the distribution of the attacked item. Therefore, in this study, several beta distributions were defined to identify abnormal items that treated the average expected values as outliers.
Let
Let
With the above definition, different beta distributions can be derived from different combinations of
Sample user-item matrix
Given a beta distribution of item
Upper and lower bounds with quantile
In this study, an item is exceptional if the associated rating exhibits an abnormal distribution, which is defined as the expected values of the entire user-item matrix falling outside the upper or lower bounds.
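The bound check can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's DIA algorithm: the helper names `beta_cdf`, `beta_quantile`, and `is_exceptional` are hypothetical, the shape parameters are assumed to be positive integers (so the CDF can be computed from a binomial tail), the example parameter values are made up, and the mapping from rating counts to shape parameters is not reproduced here. The default quantile of 0.002 is the value chosen in the experiments.

```python
import math

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive INTEGER a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    total = 0.0
    for j in range(a, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1) + j * log_x + (n - j) * log_1mx)
        total += math.exp(log_term)
    return min(total, 1.0)

def beta_quantile(prob, a, b, tol=1e-10):
    """Invert the CDF by bisection (the CDF is increasing on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def is_exceptional(overall_expected, a, b, quantile=0.002):
    """True if the overall expected value falls outside the item's bounds."""
    lower = beta_quantile(quantile, a, b)
    upper = beta_quantile(1 - quantile, a, b)
    return overall_expected < lower or overall_expected > upper

# Hypothetical numbers: overall expected value 0.6 on a 0-1 scale.
print(is_exceptional(0.6, a=13, b=9))  # item resembling the overall pattern
print(is_exceptional(0.6, a=41, b=2))  # item with almost only high ratings
```

A smaller quantile widens the interval between the bounds, so fewer items are flagged; a larger quantile narrows it and flags more items.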
Table 5 shows the two exceptional items that exclude
Identifying exceptional items based on Table 4
Because of the features of a beta distribution, the shape of the beta PDF is different for different
Such features are utilized to describe the skewness of the preference scores with five different combinations. The first one represents the overall ratings. The fourth one tries to catch a bandwagon attack. The second and third attempt to prevent the nuke and push attacks, respectively. The fifth was designed to detect extreme attacks that used the highest and the lowest scores to wedge the attacks.
Let
where
The pseudocode of the algorithm, called Detection of Items under Attack (DIA) is given in Fig. 3.
Pseudo-code of rating scores.
To examine the accuracy of the detected results, the detection and false rates are defined as follows. The detection rate is the number of attacked items that are correctly detected divided by the total number of attacked items [15]. The false rate is defined as the number of unattacked items identified as attacked divided by the number of items [15].
Let DI be the set of detected items and AI be the set of attacked items.
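The two rates can be computed directly from the sets DI and AI. The sketch below follows the verbal definitions given above; the function names, the example sets, and the reading of "the number of items" as the total number of items in the system are assumptions.

```python
def detection_rate(detected, attacked):
    """Share of attacked items that were detected: |DI & AI| / |AI|."""
    attacked = set(attacked)
    if not attacked:
        raise ValueError("the set of attacked items must not be empty")
    return len(set(detected) & attacked) / len(attacked)

def false_rate(detected, attacked, n_items):
    """Share of items wrongly flagged as attacked: |DI - AI| / n_items.

    Assumption: the denominator is the total number of items.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return len(set(detected) - set(attacked)) / n_items

# Made-up example: 50 items, 3 attacked, 4 flagged by the detector.
DI = {3, 7, 12, 20}
AI = {3, 7, 9}
print(detection_rate(DI, AI))          # 2 of the 3 attacked items were found
print(false_rate(DI, AI, n_items=50))  # 2 of the 50 items were wrongly flagged
```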
Data description
To evaluate the results of the proposed detection method, the MovieLens dataset [19] published by GroupLens was used. This dataset contains 100,000 ratings of 1682 movies by 943 raters. The sparsity of the rating data is approximately 93.7%. Each rater reviewed at least 20 movies using whole-number scores from 1 to 5. The minimum score of 1 indicates that the rater dislikes the movie, whereas the maximum score of 5 signals that the rater highly approves of the movie. The average rating of all the users is 3.53. On average, approximately 7% of the users rated a given item [15]; that is, each movie was rated by 59 users on average.
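The summary figures follow from the three counts stated above, as the short calculation below shows. The variable names are chosen for this sketch only.

```python
n_ratings, n_movies, n_users = 100_000, 1682, 943

density = n_ratings / (n_users * n_movies)  # share of filled cells
sparsity = 1 - density                      # share of empty cells
ratings_per_movie = n_ratings / n_movies
ratings_per_user = n_ratings / n_users

print(f"sparsity: {sparsity:.1%}")                    # sparsity: 93.7%
print(f"ratings per movie: {ratings_per_movie:.1f}")  # ratings per movie: 59.5
print(f"ratings per user: {ratings_per_user:.1f}")    # ratings per user: 106.0
```

The density works out to roughly 6.3%, which is in the neighborhood of the approximately 7% quoted in the text.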
Attack simulation
Following prior works [6, 7, 8, 9, 10, 15, 25, 28, 30, 31], attack profiles are generated with the following parameters: type of attack, filler size, attack size, and number of attacked items. The attack type decides whether the attack is a nuke attack or a push attack. The former tries to destroy, while the latter tries to enhance, the reputation of the items being attacked. The attack size is the number of attack profiles injected and is measured as a percentage of the pre-attack user count [9, 10, 30]. The filler size is the number of unrated cells that are filled with fake ratings when creating the attack profiles [28]. As, on average, 7% of items are rated in normal profiles, the filler size used in the experiment in this study was set around this number. The number of attacked items determines how many items are under attack at the same time.
Based on previous studies, the attack size was set to range from 1% to 10% and the filler sizes were set between 2% and 12% [7, 8, 15, 28, 31].
Finding the quantile
The quantile of a beta distribution is critical to the detection and false alarm rates. To decide upon a suitable quantile, three experiments with gradually increasing attack and filler sizes were conducted. In the first experiment, the attack and filler sizes were 3% and 4%, respectively; in the second experiment, the attack and filler sizes were 6% and 7%, respectively; and in the third experiment, the attack and filler sizes were 9% and 10%, respectively. In Fig. 4 readers can see that the false rate gradually increased with an increase in the attack sizes. The detection rate was less stable with a relatively good performance at 0.002. As a result, the quantile was set at 0.002.
Test quantile from 0.001 to 0.009.
In a previous study [15], the attack targeted only one item. After finding the optimal quantile, the attacked items were selected at random and assigned the highest or the lowest rating according to whether the attack was a push or a nuke attack, respectively. Then, the filler items were selected according to the filler size and filled in with the average rating. The number of profiles to be filled in was determined by the attack size. Under the same parameters, the procedure was performed 10 times, and each metric was averaged to obtain the final result. The following experiments were performed separately for the attack size and the filler size.
In the case of a push attack, Fig. 5 displays the detection rate and the false rates when the attack size was increased from 1% to 10% while the filler size was 6%. The detection rate increased with an increase in the attack size and reached 100%. At the same time, the false rate gradually increased from 10% to 12%.
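The generation procedure described above can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' implementation: the function name `make_attack_profiles`, its parameters, the target item indices, the rounding of profile and filler counts, and the use of a single rounded average rating for all filler items are hypothetical choices.

```python
import random

def make_attack_profiles(n_users, n_items, targets, attack_size, filler_size,
                         mean_rating, push=True, r_min=1, r_max=5, seed=None):
    """Build fake profiles as dicts {item index: rating}.

    attack_size: fraction of the pre-attack user count (e.g. 0.05).
    filler_size: fraction of all items used as filler items (e.g. 0.06).
    """
    rng = random.Random(seed)
    n_profiles = round(attack_size * n_users)
    n_fillers = round(filler_size * n_items)
    target_set = set(targets)
    candidates = [i for i in range(n_items) if i not in target_set]
    target_rating = r_max if push else r_min  # push: highest, nuke: lowest
    filler_rating = round(mean_rating)        # assumption: whole-number average
    profiles = []
    for _ in range(n_profiles):
        profile = {t: target_rating for t in target_set}
        for item in rng.sample(candidates, n_fillers):
            profile[item] = filler_rating
        profiles.append(profile)
    return profiles

# Hypothetical push attack on three items at MovieLens scale.
profiles = make_attack_profiles(n_users=943, n_items=1682,
                                targets=[10, 250, 999],
                                attack_size=0.05, filler_size=0.06,
                                mean_rating=3.53, push=True, seed=0)
print(len(profiles), len(profiles[0]))
```

Repeating the call with different seeds and averaging the resulting metrics would mirror the 10 repetitions mentioned in the text.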
Detection and false rates for attack sizes of 1%–10% in the case of a push attack.
In the case of a push attack, for filler sizes between 2% and 12%, Fig. 6 shows that the detection rate was more than 80% and the false rate was approximately 16%. Compared with the results for varying attack sizes, the detection rate remained consistently high across the filler sizes. This suggests that the attack size, rather than the filler size, had the more significant effect on the detection rate.
Detection and false rates for filler sizes of 2%–12% in the case of a push attack.
Previous experiments simulated a push attack; thus, this research also experimented with a nuke attack. Figure 7 shows that when the attack size was low, the detection rate was not high, but as the attack size increased, the detection rate also increased to 100%. Moreover, the false rate was maintained at around 12%.
Detection and false rates for attack sizes of 1%–10% in the case of a nuke attack.
For filler sizes between 2% and 12%, Fig. 8 shows that the detection rate was more than 90% and the false rate was approximately 16%. The detection rate was therefore good across all filler sizes.
Detection and false rates for filler sizes of 2%–12% in the case of a nuke attack.
For all the parameter settings, the detection rate was more than 80%, and the false rate was relatively stable at approximately 12%–16%. Of the two attack types, the results for the nuke attack were more stable and better, indicating that the beta distribution method had a better detection rate for nuke attacks. With multiple attacked items, the attack size affected the detection result: when the attack size was large, the detection rate was usually better.
To demonstrate the consistency of the proposed method, nine additional sets of experiments were performed. Both nuke and push attacks were tested nine more times.
Figures 9 and 10 compare the detection and false rates of the first experiment with the averages over all 10 experiments for push attacks. The dark and gray lines show the performance of the first experiment and the average of all experiments, respectively. The performance is quite consistent across all 10 cases.
Performance comparison of push attack for attack size from 1%–10%.
Performance comparison of push attack for filler sizes from 2%–12%.
Figures 11 and 12 compare the performance for nuke attacks in the first experiment with the average of the 10 experiments. The results show that the performance of the proposed algorithm is very consistent.
Performance comparison of nuke attack for attack sizes from 1%–10%.
Performance comparison of nuke attack for filler sizes from 2%–12%.
The performance of the proposed method was compared with that of SPC [30]. In the original SPC study, the data were divided into five groups (HDHR, LDLR, LDHR, MDHR, and MDLR) according to the number of ratings and the average rating. The average rating and standard deviation of the items were then computed to derive the upper and lower control limits, which were used to find out which items were being attacked. The detection and false rates on average were approximately 50% and 16%, respectively.
Even though SPC reached a reasonable performance with data grouping, the division of data lacked theoretical support. In contrast, the proposed method did not require any additional grouping of data. To make a fair comparison, the following experiment therefore did not include the grouping for the SPC method.
In addition to the effects of the attack size, the number of attacked items, and the filler size, the detection and false rates of SPC were investigated.
The attack size and the filler size were varied to compare the proposed method with SPC. The results for the push attack and the nuke attack are shown in Figs. 13 and 14, respectively. Both methods had detection rates of 80% and above, and in some situations, the beta distribution method had a higher detection rate than SPC did. The false rates, however, differed clearly: the false rate of SPC was approximately 60%. Although SPC had a high detection rate, it also had a seriously high false rate. In contrast, the beta distribution method exhibited more stable detection results and a lower false rate.
Detection and false rates for attack sizes of 1%–10% and filler sizes of 2%–12% in the case of a push attack.
Detection and false rates for attack sizes of 1%–10% and filler sizes of 2%–12% in the case of a nuke attack.
The e-commerce industry is booming. The recommendation system has been applied in various services, and its recommended content influences users’ purchase behaviors. Moreover, because of the open nature of recommendation systems, malicious users may be tempted to use false information to influence the recommendation list, rendering the system misleading. This is an attack on the recommendation system and in the long term may erode users’ trust in it, leading to the recommendations losing significance and a substantial reduction in business-related interests.
Most research thus far has focused on identifying attacking users through an analysis of the user–item matrix [7, 15, 30]. However, this present research proposes a beta distribution method that can detect the attacked items in order to help the recommendation system filter them and improve the process of building recommendation lists. This study analyzes the user–item matrix to identify the attacked items.
The overall detection rate is higher than 80%, and the false rate is approximately 16%. These results remain stable as the parameters change and are more stable than those of the other detection method tested. However, the dataset containing 943 users and 1682 movies is substantially smaller than those of commercially used recommendation systems. Therefore, future research should use a larger amount of ratings data for analysis and consider how ratings change over different time series.
