Developing a hybrid collaborative filtering recommendation system with opinion mining on purchase review

Abstract

The most commonly used algorithm in recommendation systems is collaborative filtering. Despite its wide use, however, the prediction accuracy of this algorithm is unexceptional. Furthermore, it is questionable whether quantitative data such as product ratings or purchase histories reflect users’ actual tastes. In this article, we propose a method that uses user review data extracted with opinion mining in product recommendation systems. To evaluate the proposed method, we perform a product recommendation test on Amazon product data, with and without the additional opinion mining results on Amazon purchase review data. The performance of these two variants is compared by means of precision, recall, true positive recommendation (TPR) and false positive recommendation (FPR). In this comparison, a large improvement in prediction accuracy was observed when the opinion mining data were taken into account. Based on these results, we answer two main questions: ‘Why is the collaborative filtering algorithm not effective?’ and ‘Do quantitative data such as product rating or purchase history reflect users’ actual tastes?’

Keywords

Collaborative filtering, hybrid recommendation system, opinion mining, purchase review

1. Introduction

Recently, advances in communication technology, the spread of Internet use through computers and smartphones, and the creation of payment and logistics systems have advanced e-commerce and created an environment in which one can easily buy a desired product online anywhere and at any time [1,2]. Owing to these changes in communication technology and the economic environment, consumption patterns have changed rapidly, and online shopping has increased [3]. However, since it is not easy to find the desired product among a large number of products, people have come to want an environment in which they can select, from among many products, those most suitable to their personal tastes or needs [4]. Accordingly, interest has grown in personalised recommendation systems that help users find the desired information quickly and easily [5].

A recommendation system analyses users’ evaluation ratings for products or their product selection data, and it recommends the products selected by neighbouring users with similar tendencies or predicts the products a user might prefer [6]. Collaborative filtering (CF) is the algorithm most frequently used in recommendation systems. It predicts a target user’s preferences for content the user has not yet experienced by searching for similar users based on the user’s content-preference history [7,8]. The CF technique has been used successfully in various fields for personalised recommendations. However, the CF algorithm has the drawback that its predictive ability is not particularly high [9]. In addition, the question has been raised whether the quantitative data mostly used in existing CF techniques, such as the selected content or the ratings for that content, can represent the preferences of users [10]. Therefore, in this study, in order to improve the predictive ability of the CF technique, we propose a recommendation system that applies opinion mining to after-purchase reviews in addition to quantitative data. The contributions of this study are as follows. First, by using purchase reviews written by users, the recommendation system can consider users’ preferences more objectively than conventional recommendation methods. Second, the study shows that the performance of the recommendation system increases when review data extracted with opinion mining are used.

The remainder of this article is organised as follows. Section 2 presents related work, section 3 outlines the proposed recommendation system, section 4 describes our experimental evaluation and section 5 details the results and discussion. Section 6 presents the conclusion and future work.

2. Related research

2.1. Opinion mining

Opinion mining, also known as sentiment analysis, refers to the use of natural language processing, text analysis and computational linguistics to identify subjective information in source materials. Generally speaking, opinion mining aims to determine the attitude of a speaker or a writer with respect to some topic, or the overall contextual polarity of a document. The attitude may be his or her judgement, evaluation or affective state. Opinion mining covers a range of topics, such as prediction of market size for certain services and products [11], consumer reaction [12], subjectivity or objectivity detection [13,14], classification of sentiment polarity at word level, sentence level or document level [15–17] as well as aspect–opinion pair extraction [18,19]. Generally, opinion mining is conducted in one of two ways: a coarse-grained method or a fine-grained method [20]. Coarse-grained sentiment analysis aims to judge the overall sentiment polarity of a document using unsupervised machine learning [21,22], semi-supervised machine learning [23] and supervised machine learning methods [24,25]. Fine-grained sentiment analysis aims to detect the sentiment polarity and strength towards certain product aspects at word level [26].

As an example of opinion mining, Liu and Zhang [27], addressing sentiment analysis and opinion mining, studied the construction of a sentiment classifier that can identify positive, negative and neutral sentiments about an article using a corpus and Twitter. Abulaish et al. [28] studied an opinion mining system that identifies product features and opinions in the review documents of product users.

In another study, Alrababah et al. [11] proposed a domain-independent approach for identifying explicit opinionated features and attributes that are strongly related to a specific domain product using lexicographer files in WordNet. In their approach, N-gram analysis and the SentiStrength opinion lexicon have been employed to support the extraction of opinionated features [12].

2.2. Collaborative Filtering

Recommender systems typically produce a list of recommendations in one of two ways: through collaborative or content-based filtering. CF approaches build a model from a user’s past behaviour (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by others and then use the model to predict items (or ratings for items) that the user may be interested in. Content-based filtering approaches use a series of discrete characteristics of an item in order to recommend additional items with similar properties. In recent years, CF approaches have become extremely common and are applied in a variety of domains, such as movies, music, news, books, research articles, social tags and products in general [29]. CF approaches are divided into two basic types: item-based algorithms and user-based algorithms. The former is based on the similarity of products, while the latter is based on the similarity of users. Both algorithms need to identify the user profiles [30]. The item-based algorithms are more widely used in e-commerce. Recommender systems employ rules to determine the relevance of items to users who have not yet interacted with those items. CF approaches measure user similarity from a user’s past behaviour to generate his or her profile [31]. Experimental results indicate that CF algorithms can obtain higher prediction accuracy when considering opinion-rich texts, and sentiment analysis thus enables the improvement of recommender systems [32]. However, the results of these studies, which compile statistics on and quantify the sentiments expressed in users’ reviews and use them to recommend contents to users, have thus far appeared to be insufficient.

2.3. Combination of opinion mining and Collaborative Filtering

Only a few attempts have been made to combine CF methods with sentiment analysis. The combination has been explored in two ways: considering the sentiment of users towards product aspects in contexts where ratings are not available, or modifying existing CF algorithms to incorporate opinion similarities [33]. The findings suggest that this analytical framework is applicable to multiple domains where only a small number of labelled reviews (reviews with ratings) are available. This seminal work motivates our research on opinion-enhanced CF methods. Topic-based CF methods have also been proposed in recent years. In practice, users often voice their preferences through online reviews, which are even more valuable than overall ratings [34]. Both user-level LDA CF (ULCF) and topic model–based CF (TMCF) are topic-based CF algorithms that consider not only numerical ratings but also textual reviews. The average mean absolute error (MAE) values of TMCF are 0.571 and 0.833 for two data-sets, respectively, which are much better than the baseline. ULCF shows the same trend as TMCF. However, the topic model does not consider sentiment analysis, so it cannot incorporate user opinions at the fine-grained level. It has also been shown that combining user preferences and user opinions may generate accurate recommendation results (an MAE of 1.19, compared with 3.10 for item-based CF and 2.96 for user-based CF (UBCF)) [35]. Most recommender systems are based on ratings, and CF methods cannot work without overall scores. In such cases, textual reviews may serve as a useful proxy from which the overall rating can be estimated through sentiment analysis [36]. Besides the overall ratings, users’ personality (e.g. pessimists and optimists) can be used to improve CF as well [31].

Furthermore, combining purchase reviews with sentiment analysis may generate more accurate recommendations, and this may be a promising direction [32,37]. Although past studies suggest that sentiment analysis helps CF performance [33], research to date is still insufficient. Accordingly, in this study, we aim to improve the performance of the recommendation system using a hybrid CF that benefits from opinion mining.

3. Proposed recommendation system

3.1. Proposed recommendation system process

The recommendation system proposed in this study is built on the purchase reviews written by users who bought products from e-commerce sites, on the assumption that those reviews represent users’ preferences. In the existing recommendation systems (see Figure 1), recommendations are made only with the evaluation score between 1 and 5 points that users give through a product evaluation process. By contrast, the recommendation system proposed in this study makes recommendations using both the existing evaluation score Score = (S₁, S₂, S₃, …, S_N) and a text score Text_Score = (TS₁, TS₂, TS₃, …, TS_N), which is obtained by statistically summarising and quantifying the user’s purchase review Text = (T₁, T₂, T₃, …, T_N) through the opinion mining process.

Figure 1.

The proposed recommendation process.

3.2. Proposed recommendation system method

In this study, recommendation accuracy is analysed with six models. The first model uses the existing data, to which opinion mining is not applied, and serves as the baseline for analysing recommendation accuracy. This model makes recommendations from data consisting of the evaluation score (Score) only and is referred to as OMNAD (opinion mining non-applied data).

The remaining five models apply the text score, which is the result of opinion mining, to OMNAD. The first of these applies opinion mining but makes recommendations from data in which ‘Score’ and ‘Text Score’ are kept as separate values. It is referred to as OMAD (opinion mining applied data). The second model makes recommendations from data consisting of the sum of ‘Score’ and ‘Text Score’, as in formula (1). This model is referred to as AS_OMAD (addition score opinion mining applied data).

AS_OMAD = S + TS

(1)

The third model makes recommendations from data consisting of the value obtained by applying a ceiling function to the average of ‘Score’ and ‘Text Score’, as shown in formula (2). This model is referred to as CS_OMAD (ceiling score opinion mining applied data).

CS_OMAD = \lceil \frac{S + TS}{2} \rceil

(2)

The fourth model makes recommendations from data in which ‘Score’ and ‘Text Score’ are each normalised to figures between 1 and 100, because the ranges of ‘Score’ and ‘Text Score’ differ, as in formula (3). This model is referred to as NS_OMAD (normalisation score opinion mining applied data).

NS_OMAD = (\frac{S - min (S)}{max (S) - min (S)}) \cdot 100, (\frac{TS - min (TS)}{max (TS) - min (TS)}) \cdot 100

(3)

The fifth model makes recommendations from data consisting of the sum of ‘Score’ and ‘Text Score’ after each has undergone the same normalisation process as in NS_OMAD, as in formula (4). This model is referred to as NAS_OMAD (normalisation addition score opinion mining applied data).

NAS_OMAD = (\frac{S - min (S)}{max (S) - min (S)}) \cdot 100 + (\frac{TS - min (TS)}{max (TS) - min (TS)}) \cdot 100

(4)
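Formulas (1) to (4) can be illustrated with a minimal Python sketch. The function names `min_max_100` and `combine_scores` are hypothetical, the sample values are the Score and Text_Score entries of the five example rows in Table 4, and the minimum and maximum are taken over those five rows only (an assumption made for illustration; the study presumably normalises over the whole data set). As written, formula (3) maps the smallest value to 0 and the largest to 100.

```python
import math


def min_max_100(values):
    """Min-max normalise a list of numbers and scale by 100 (formulas (3) and (4))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: the formula would divide by zero, so return zeros
        # (a convention chosen for this sketch only).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) * 100 for v in values]


def combine_scores(scores, text_scores):
    """Build the inputs of the four combined models from Score (S) and Text Score (TS)."""
    ns_s = min_max_100(scores)
    ns_ts = min_max_100(text_scores)
    pairs = list(zip(scores, text_scores))
    return {
        "AS_OMAD": [s + ts for s, ts in pairs],                  # formula (1)
        "CS_OMAD": [math.ceil((s + ts) / 2) for s, ts in pairs],  # formula (2)
        "NS_OMAD": list(zip(ns_s, ns_ts)),                       # formula (3)
        "NAS_OMAD": [a + b for a, b in zip(ns_s, ns_ts)],        # formula (4)
    }


scores = [4, 1, 3, 5, 2]        # Score column of Table 4
text_scores = [5, -1, 2, 2, 0]  # Text_Score column of Table 4
models = combine_scores(scores, text_scores)
print(models["AS_OMAD"])  # [9, 0, 5, 7, 2]
print(models["CS_OMAD"])  # [5, 0, 3, 4, 1]
print(models["NAS_OMAD"])
```

OMAD itself needs no transformation in this sketch, since it keeps ‘Score’ and ‘Text Score’ as two separate values.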

4. Experimental evaluation

4.1. Data set

This study analyses opinion mining and recommendation accuracy based on Amazon users’ purchase histories and after-purchase reviews. The Amazon data are structured as illustrated in Table 1. Opinion mining was carried out on 120 K reviews from the ‘Amazon fine food reviews data (October, 2012)’.

Table 1.

Opinion mining non-applied Amazon data example.

ID	Product ID	User ID	Score	Purchase review
1	B006K2ZZ7K	A1UQRSCLF8GW1T	5	Great taffy at a great price. There was a wide assortment of yummy taffy. Delivery was very quick. If your a taffy lover, this is a deal.
2	B0064KO0BU	A34D61RQILOKIJ	1	Besides being smaller than runts, they look the same and have the same consistency. Unfortunately, they taste nothing like banana runts … nor do they even taste good. Yucky stuff. Trying to return with vendor.
3	B004K2IHUO	A3D1TXE98KRKYO	4	Arrived slightly thawed. My parents wouldn’t accept it. However, the company was very helpful and issued a full refund.
4	B000UUIYHQ	AWIR8TQZGSVV9	3	I was a little disappointed with the taste (or lack thereof) of these cookies. They are low in sugar and taste like. If you want a healthy alternative and don’t mind sacrificing some taste, then this isn’t a bad option.
5	B002Y2QSLC	A2Y00L1N2C8F66	2	This coffee is a disappointment. The taste goes nowhere … no full body and aroma. So much for the private reserve alluring trap, really!

4.2. Opinion mining

Opinion mining in this study is a series of processes that analyse a text and determine whether it holds a positive, negative or neutral opinion; it is performed based on the emotional polarity of words, which are the minimum units of the document [38,39]. Opinion mining analyses the polarity of the text by matching users’ after-purchase reviews against a pre-built emotion dictionary. Because the results of opinion mining vary depending on the emotion dictionary used, it is necessary to use a pre-validated emotion dictionary [40].

The emotion dictionary used in this study is the Sentiment Lexicon Consisting of Lists of Strings by Liu et al. [41]. This dictionary consists of 2006 positive words and 4783 negative words. Table 2 shows examples of the positive and negative words.

Table 2.

Positive words and negative words example.

Type of words	Word example
Positive words	Accommodative, adaptive, advanced, blossom, confidence, congratulate, encouragingly, endear, endearing, endorse, endorsed, energetic, energise, pleasant, pleasantly, pleased, pleases, pleasing, pleasingly, pleasurable, pleasurably, pleasure, plentiful, slick, smart, smarter, smartest, trivially, trophy, …
Negative words	Abominably, abominate, abomination, abort, aborted, aborts, abrade, abrasive, abrupt, abruptly, blind, blunder, bondage, brashness, detestably, detested, detesting, detracted, disaster, disbelieve, grievous, grievously, grim, grimace, grind, gripe, infiltrator, infiltrators, infirm, inordinate, revolting, unwell, …
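The dictionary matching described above can be sketched as follows. The article does not spell out the exact scoring rule, so this sketch assumes the simplest one that is compatible with the integer scores reported later: +1 for each positive word and −1 for each negative word. The two word sets are a hypothetical mini-lexicon for illustration, not the actual 2006 and 4783 entries, and `text_score` is a hypothetical function name.

```python
import re

# Hypothetical mini-lexicon for illustration only; the study uses the full
# positive and negative word lists of the sentiment lexicon.
POSITIVE = {"great", "pleasant", "smart", "yummy", "helpful"}
NEGATIVE = {"disappointment", "disaster", "grim", "yucky", "unfortunately"}


def text_score(review):
    """Assumed rule: count of positive words minus count of negative words."""
    tokens = re.findall(r"[a-z']+", review.lower())
    positives = sum(1 for token in tokens if token in POSITIVE)
    negatives = sum(1 for token in tokens if token in NEGATIVE)
    return positives - negatives


print(text_score("Great taffy at a great price."))   # 2
print(text_score("Yucky stuff. What a disaster."))   # -2
print(text_score("It is okay."))                     # 0
```

Under this assumed rule a review with no matching words receives a neutral score of 0, and longer reviews can reach large positive or negative totals.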

4.3. User-Based Collaborative Filtering

The UBCF method, among CF methods, calculates the similarity between users from their profile histories to find the most similar neighbours; it then recommends the items to which those neighbours gave good reviews [42,43].

4.3.1. Standard analysis set-up

In this study, to compare the performance of the data with and without the opinion mining results, we used precision, recall, true positive recommendation (TPR) and false positive recommendation (FPR). Precision is the percentage of the recommended contents that the user actually likes, and recall is the percentage of the contents the user actually likes that the system recommends. TPR is the percentage of contents selected by a user among the contents recommended by the system. FPR is the percentage of contents not selected by a user among the contents recommended by the system. Furthermore, in a UBCF method, the most appropriate number of neighbours is found using an F-measure calculated from precision and recall [36]. The numerical values required for the verification of the recommendation system are shown in Table 3. We also predicted that the recommendation accuracy would increase as the number of recommended items increases. Therefore, the number of items recommended to each user was varied (one, three or five items), and the analysis results were compared. Preliminary experiments showed that recommending more than five items has no effect on the accuracy, and such settings are therefore not included in the graphs.

Table 3.

Numerical value required for the verification of the recommendation system.


Actual / Predicted	Negative	Positive
Negative	a	b
Positive	c	d

(1)	$precision = \frac{d}{b + d}, \quad recall = \frac{d}{c + d}$
(2)	$TPR \text{ (true positive recommendation)} = \frac{d}{b + d}$
(3)	$FPR \text{ (false positive recommendation)} = \frac{b}{b + d}$
(4)	$F\text{-}measure = \frac{2 \times precision \times recall}{precision + recall}$
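As a worked illustration of these measures, the following sketch computes precision, recall and the F-measure from the four cells of Table 3. Precision is taken here as d / (b + d), following the prose definition (the share of recommended items that the user actually likes); the function names and the confusion-matrix counts are hypothetical.

```python
def precision(b, d):
    """Share of recommended items (b + d) that the user actually likes (d)."""
    return d / (b + d) if (b + d) else 0.0


def recall(c, d):
    """Share of the items the user likes (c + d) that were recommended (d)."""
    return d / (c + d) if (c + d) else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall, as in formula (4) of Table 3."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: a = 90, b = 6, c = 2, d = 2.
p = precision(b=6, d=2)   # 2 / 8 = 0.25
r = recall(c=2, d=2)      # 2 / 4 = 0.5
print(p, r, f_measure(p, r))

# Plugging in the one-item OMNAD precision and recall reported in Table 5.
f_omnad = f_measure(0.0732, 0.0368)
print(round(f_omnad, 4))
```

The last value comes out at about 0.049, close to but not identical with the 0.0492 in Table 6, which is reported as an average F.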

In this study, 80% of the data are designated as training data. In addition, the analysis was performed only on products that were purchased more than three times, to prevent the performance degradation that may occur when too few items are available for the recommendation computation. Moreover, the similarity between users was measured with cosine similarity, as in formula (5). For UBCF, $x, y$ are the two users subjected to the similarity calculation, and $r_{x, i}, r_{y, i}$ are their evaluation values for an item $i$ that the two users evaluated in common. The sum in the numerator runs over the items evaluated by both users, while $I_{x}$ and $I_{y}$ are the sets of items evaluated by each user. The cosine similarity takes a value from −1 to 1.

\cos (\vec{x}, \vec{y}) = \frac{\sum_{i \in I_{xy}} r_{x, i} r_{y, i}}{\sqrt{\sum_{i \in I_{x}} r_{x, i}^{2}} \sqrt{\sum_{i \in I_{y}} r_{y, i}^{2}}}

(5)
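A minimal sketch of formula (5) and of a user-based recommendation step built on it is given below. The article states only that UBCF recommends items that similar neighbours rated well, so the similarity-weighted sum used to rank candidate items is an assumption of this sketch, as are the function names, the parameters `k` (number of neighbours) and `n` (number of recommended items) and the toy ratings.

```python
import math


def cosine_similarity(ratings_x, ratings_y):
    """Formula (5); each argument maps item id -> rating for one user."""
    common = set(ratings_x) & set(ratings_y)
    numerator = sum(ratings_x[i] * ratings_y[i] for i in common)
    norm_x = math.sqrt(sum(r * r for r in ratings_x.values()))
    norm_y = math.sqrt(sum(r * r for r in ratings_y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return numerator / (norm_x * norm_y)


def recommend(target, all_ratings, k=10, n=5):
    """Top-n unseen items for `target`, drawn from its k most similar users."""
    neighbours = sorted(
        ((cosine_similarity(all_ratings[target], ratings), user)
         for user, ratings in all_ratings.items() if user != target),
        reverse=True,
    )[:k]
    candidate_scores = {}
    for similarity, user in neighbours:
        if similarity <= 0:
            continue  # ignore dissimilar users
        for item, rating in all_ratings[user].items():
            if item not in all_ratings[target]:
                # Assumed ranking rule: similarity-weighted sum of ratings.
                candidate_scores[item] = candidate_scores.get(item, 0.0) + similarity * rating
    return sorted(candidate_scores, key=lambda i: (-candidate_scores[i], i))[:n]


# Hypothetical toy ratings (user -> {item: rating}).
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 5, "B": 3, "C": 4},
    "u3": {"A": 1, "D": 5},
}
print(recommend("u1", ratings, k=2, n=2))  # ['C', 'D']
```

In the study the rating fed into the similarity would be Score for OMNAD or one of the combined values (for example AS_OMAD or NAS_OMAD) for the proposed models.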

5. Result and discussion

5.1. Opinion mining

In this study, opinion mining was applied to Amazon’s purchase review text data: the reviews were summarised statistically, quantified according to whether they express positive (+), negative (−) or neutral opinions, and then presented as objective information.

In opinion mining, a positive opinion appears as a positive number, and the number increases as the degree of positivity gets stronger. Conversely, a negative opinion appears as a negative number, and the number decreases as the degree of negativity gets stronger. As a result, opinions ranging from −21 to 56 points appeared; however, some comments showed excessively low or high scores. For this reason, as shown in Figure 2, 269 data entries with excessively high or low scores were classified as noise and deleted, and only the remaining 119,731 opinions between −10 points and 20 points were used.

Figure 2.

The opinion mining results.
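The noise-removal step can be sketched as a simple range filter. Whether the bounds of −10 and 20 are inclusive is not stated, so inclusive bounds are assumed here; `drop_outliers` and the sample scores are hypothetical.

```python
def drop_outliers(text_scores, low=-10, high=20):
    """Keep scores within [low, high] (bounds assumed inclusive); count the rest as noise."""
    kept = [score for score in text_scores if low <= score <= high]
    return kept, len(text_scores) - len(kept)


sample = [-21, -10, 0, 2, 3, 20, 56]  # hypothetical text scores
kept, removed = drop_outliers(sample)
print(kept)     # [-10, 0, 2, 3, 20]
print(removed)  # 2

# Consistency check on the reported counts: 120,000 reviews minus 269 noise entries.
remaining = 120_000 - 269
print(remaining)  # 119731
```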

In the opinion data, opinions scoring two points appeared most frequently, 19,804 times (16.54%), followed by opinions scoring three points, which appeared 17,944 times (14.99%). Opinions scoring one point, opinions scoring four points and neutral opinions were the next most frequent, appearing 16,380 times (13.68%), 13,409 times (11.20%) and 10,932 times (9.13%), respectively.

The value derived from the results of opinion mining is called the ‘Text Score’. In some cases, the Text Score differs from the evaluation score given to the content. Examples of data derived from the opinion mining results are shown in Table 4.

Table 4.

Opinion mining applied Amazon data example.

ID	Product ID	User ID	Score	Text_Score	Purchase Review
1	B001SATU8E	A1CR1DS15Z7XO1	4	5	Makes a tasty, super easy meal, fast. BUT high in calories. The instructions say to saute the veggies first but I recommend cooking the chicken first. The chicken takes longer to cook and the raw chicken on top of veggies just.
2	B001EPPI84	A27TZ4WBU7N0YF	1	−1	No tea flavour at all. Just whole brunch of artificial flavours. It is not returnable. I wasted 20+ bucks.
3	B004K2IHUO	A1ZKFQLHFZAEH9	3	2	Not what I was expecting in terms of the company’s reputation for excellent home delivery products.
4	B0093NIWVO	A1JT114SOITFFO	5	2	Fresh, a great way to get a little chocolate in my life without a million calories. They taste just like chocolate pudding.
5	B001EPPI84	A3Q0IDQ03S0158	2	0	It is okay. I would not go out of my way to buy it again.

5.2. User-Based Collaborative Filtering

In this study, we applied UBCF to OMNAD and OMAD on the Amazon data and compared the results. Figure 3 shows the analysis used to find the most suitable number of neighbours through the F-measure of OMNAD and the five proposed models. In OMNAD, the F-measure was highest for 10 neighbours (F = 0.0425), and in OMAD, it was also highest for 10 neighbours (F = 0.039174). In AS_OMAD, the F-measure was highest for 10 neighbours (F = 0.0655), and in CS_OMAD, for 20 neighbours (F = 0.0672). In NS_OMAD, the F-measure was highest for 40 neighbours (F = 0.2007), and in NAS_OMAD, for 20 neighbours (F = 0.2417). In other words, the most appropriate number of neighbours is 10 for OMNAD, OMAD and AS_OMAD, 20 for CS_OMAD and NAS_OMAD, and 40 for NS_OMAD.

Figure 3.

The F-measure for each model.

5.2.1. OMNAD

The recall and precision results in Figure 4 vary with the number of recommended items (one, three or five) produced per user. In OMNAD, with the number of neighbours set to 10, recall showed its best values of 0.0368, 0.0488 and 0.0543, and precision showed its best values of 0.0732, 0.0385 and 0.0265, for one, three and five recommended products, respectively.

Figure 4.

The OMNAD recall and precision and TPR and FPR graph.

In the TPR and FPR analyses, with 10 neighbours, the setting that showed the highest performance in the recommendation system, TPR was 0.0368, 0.0488 and 0.0543 and FPR was 0.0000460, 0.0001434 and 0.0002420 for one, three and five recommended products, respectively.

5.2.2. OMAD

The recall and precision results for OMAD are shown in Figure 5. With the number of neighbours set to 10, recall showed its best values of 0.0328, 0.0469 and 0.0502, and precision showed its best values of 0.0662, 0.0365 and 0.0242, for one, three and five recommended products, respectively.

Figure 5.

The OMAD recall and precision and TPR and FPR graph.

In the TPR and FPR analyses, with 10 neighbours, the setting that showed the highest performance in the recommendation system, TPR was 0.0328, 0.0469 and 0.0502 and FPR was 0.0000458, 0.0001419 and 0.0002395 for one, three and five recommended products, respectively.

5.2.3. AS_OMAD

The recall and precision results for AS_OMAD are shown in Figure 6. With the number of neighbours set to 10, recall showed its best values of 0.0516, 0.0702 and 0.0803, and precision showed its best values of 0.1165, 0.0635 and 0.0459, for one, three and five recommended products, respectively.

Figure 6.

The AS_OMAD recall and precision and TPR and FPR graph.

In the TPR and FPR analyses, with 10 neighbours, the setting that showed the highest performance in the recommendation system, TPR was 0.0516, 0.0702 and 0.0803 and FPR was 0.0000423, 0.0001347 and 0.0002288 for one, three and five recommended products, respectively.

5.2.4. CS_OMAD

The recall and precision results for CS_OMAD are shown in Figure 7. With the number of neighbours set to 20, recall showed its best values of 0.0598, 0.0834 and 0.0923, and precision showed its best values of 0.1048, 0.0588 and 0.0408, for one, three and five recommended products, respectively.

Figure 7.

The CS_OMAD recall and precision and TPR and FPR graph.

In the TPR and FPR analyses, with 20 neighbours, the setting that showed the highest performance in the recommendation system, TPR was 0.0598, 0.0834 and 0.0923 and FPR was 0.0000388, 0.0001225 and 0.0002081 for one, three and five recommended products, respectively.

5.2.5. NS_OMAD

The recall and precision results for NS_OMAD are shown in Figure 8. With the number of neighbours set to 40, recall showed its best values of 0.1529, 0.1906 and 0.1955, and precision showed its best values of 0.4601, 0.2289 and 0.1451, for one, three and five recommended products, respectively.

Figure 8.

The NS_OMAD recall and precision and TPR and FPR graph.

In the TPR and FPR analyses, with 40 neighbours, the setting that showed the highest performance in the recommendation system, TPR was 0.1529, 0.1906 and 0.1955 and FPR was 0.0000174, 0.0000744 and 0.0001375 for one, three and five recommended products, respectively.

5.2.6. NAS_OMAD

The recall and precision results for NAS_OMAD are shown in Figure 9. With the number of neighbours set to 20, recall showed its best values of 0.2039, 0.2500 and 0.2572, and precision showed its best values of 0.5093, 0.2383 and 0.1503, for one, three and five recommended products, respectively.

Figure 9.

The NAS_OMAD recall and precision and TPR and FPR graph.

In the TPR and FPR analyses, with 20 neighbours, the setting that showed the highest performance in the recommendation system, TPR was 0.2039, 0.2500 and 0.2572 and FPR was 0.0000203, 0.0000948 and 0.0001763 for one, three and five recommended products, respectively. Furthermore, in all models, recall and TPR decreased as the number of neighbours increased.

Further analysis of the number of recommended items showed that, as the number increased, precision decreased while recall increased, and both TPR and FPR increased. In other words, although the probability that an item recommended by the system is one of the user’s favourite items decreases, the probability that one of the user’s favourite items is recommended increases. Also, as the number of recommended items increases, the ratio of recommended items that were selected by users increases. However, the proportion of items that users did not select also increases.

Except for the OMAD model, the four other proposed models showed increased performance compared with OMNAD. In other words, it was demonstrated that the application of opinion mining can improve the performance of the recommendation system. Among the proposed models, the two models that applied normalisation (NS_OMAD and NAS_OMAD) showed the greatest improvement in performance. As illustrated in Table 5, the recall, precision, TPR and FPR values confirm how significantly the performance improved in the models that applied normalisation compared with the initially proposed OMAD model. Table 6 summarises how the F-measure changes with the number of recommended items in the existing OMNAD model and the five proposed models. According to these tables, the performance decreased a little in OMAD compared with OMNAD; however, the remaining four models demonstrated an improvement in performance. In particular, the two models that applied normalisation (NS_OMAD and NAS_OMAD) showed an improvement in performance about three times greater than that of the previous two models. Figure 10 compares each model’s F-measure and confirms the improved performance of the proposed models.

Table 5.

Model-specific performance evaluation measure comparison.

Recommended number		One item	Three items	Five items
OMNAD	Recall	0.0368	0.0488	0.0543
	Precision	0.0732	0.0385	0.0265
	TPR	0.0368	0.0488	0.0543
	FPR	0.0000460	0.0001434	0.0002420
OMAD	Recall	0.0328	0.0469	0.0502
	Precision	0.0662	0.0365	0.0242
	TPR	0.0328	0.0469	0.0502
	FPR	0.0000458	0.0001419	0.0002395
AS_OMAD	Recall	0.0516	0.0702	0.0803
	Precision	0.1165	0.0635	0.0459
	TPR	0.0516	0.0702	0.0803
	FPR	0.0000423	0.0001347	0.0002288
CS_OMAD	Recall	0.0598	0.0834	0.0923
	Precision	0.1048	0.0588	0.0408
	TPR	0.0598	0.0834	0.0923
	FPR	0.0000388	0.0001225	0.0002081
NS_OMAD	Recall	0.1529	0.1906	0.1955
	Precision	0.4601	0.2289	0.1451
	TPR	0.1529	0.1906	0.1955
	FPR	0.0000174	0.0000744	0.0001375
NAS_OMAD	Recall	0.2039	0.2500	0.2572
	Precision	0.5093	0.2383	0.1503
	TPR	0.2039	0.2500	0.2572
	FPR	0.0000203	0.0000948	0.0001763

OMNAD: opinion mining non-applied data; TPR: true positive recommendation; FPR: false positive recommendation; OMAD: opinion mining applied data; AS_OMAD: addition score opinion mining applied data; CS_OMAD: ceiling score opinion mining applied data; NS_OMAD: normalisation score opinion mining applied data; NAS_OMAD: normalisation addition score opinion mining applied data.

Table 6.

Model-specific F-measure comparison.

Recommended number		One item	Three items	Five items
OMNAD	Average F	0.0492	0.0430	0.0356
OMAD	Average F	0.0438	0.0410	0.0326
AS_OMAD	Average F	0.0715	0.0667	0.0584
CS_OMAD	Average F	0.0761	0.0689	0.0566
NS_OMAD	Average F	0.2296	0.2080	0.1666
NAS_OMAD	Average F	0.2912	0.2440	0.1897

OMNAD: opinion mining non-applied data; OMAD: opinion mining applied data; AS_OMAD: addition score opinion mining applied data; CS_OMAD: ceiling score opinion mining applied data; NS_OMAD: normalisation score opinion mining applied data; NAS_OMAD: normalisation addition score opinion mining applied data.

Figure 10.

The model-specific F-measure comparison.

6. Conclusion and future work

In the previously conducted recommendation study based on the CF algorithm, the following concerns were raised. First, compared with the superiority of the CF algorithm, its predictive ability is not high. Second, it was questioned whether quantitative data, such as evaluation scores and selected contents, can represent the preferences of users. Therefore, in this study, we proposed a recommendation system based on opinion mining utilising after-purchase reviews. The system makes recommendations through two steps. First, it conducts opinion mining utilising summarised data of after-purchase reviews. Second, it recommends contents by applying the results of opinion mining to the existing data. Therefore, the recommendation method being proposed in this study, based on text mining and using users’ after-purchase reviews, reaches two conclusions: (1) with regard to product recommendation, using after-purchase reviews that users wrote after experiencing the contents directly (in addition to the quantitative data used in the existing recommendation method), it can consider users’ preferences objectively. (2) It demonstrated that such a recommendation system’s predictability increases through the use of after-purchase reviews written by users when recommending the products.

This study shows that applying opinion mining improves recommendation performance; however, a problem in interpreting reviews was also found. For instance, when an after-purchase review contains a negative expression made up of two or more words, such as ‘NOT GOOD’, the recommendation system recognised ‘Not’ and ‘Good’ separately and thus may not interpret the meaning correctly. In addition, not every user writes a purchase review, which can be an obstacle to applying the recommendation system proposed in this study. In future work, we shall prepare solutions to the problems mentioned above and, at the same time, develop methods for applying the proposed approach effectively in actual situations.
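
The ‘NOT GOOD’ problem can be illustrated with a small sketch. It is not the opinion mining procedure used in this study: the lexicon `LEXICON`, the negator list `NEGATORS` and both scoring functions are hypothetical, and the negation-aware variant shows only one common heuristic (flipping the polarity of the word that directly follows a negator), not a solution proposed by the authors.

```python
# Hypothetical toy lexicon and negator list, for illustration only.
LEXICON = {"good": 1, "great": 1, "bad": -1, "poor": -1}
NEGATORS = {"not", "no", "never"}


def naive_score(review):
    """Score every token independently, so a negator contributes nothing."""
    return sum(LEXICON.get(token, 0) for token in review.lower().split())


def negation_aware_score(review):
    """Flip the polarity of a sentiment word directly preceded by a negator."""
    score = 0
    negate = False
    for token in review.lower().split():
        if token in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        score += -polarity if negate else polarity
        negate = False  # the flip applies only to the very next token
    return score


if __name__ == "__main__":
    for text in ("NOT GOOD", "not bad", "good"):
        print(text, naive_score(text), negation_aware_score(text))
```

Here the token-by-token score treats ‘NOT GOOD’ as positive because only ‘good’ is in the lexicon, whereas the negation-aware variant scores it as negative. The sketch splits on whitespace only and ignores punctuation and longer-range negation (for example ‘not very good’), which a practical system would also need to handle.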

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

This work is supported by the ICT R&D programme of MSIP/IITP (Development of distribution and diffusion service technology through individual and collective intelligence to digital contents, 2016) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP; no. R1610941).
