Abstract
Web 2.0 strengthens the bidirectional communication capabilities of online reviews, which makes online reviews personalized and asymmetric. Despite the attention paid to online personalized recommendation systems, the influence of consumer characteristics on consumer repurchase intention is insufficiently examined in the extant literature. To address this issue, this study proposes a new online personalized recommendation approach based on the perceived value of consumer characteristics. The proposed framework has two components. The first is a linguistic information transformation model, which converts online reviews into unbalanced linguistic label clouds. The second is an online recommendation approach built on this linguistic information transformation model. A series of experiments is conducted on hotel assessment data from four cities and the electronic consumer records of four randomly selected consumers. Results show that the proposed cluster method is useful for identifying consumer characteristics and generating personalized recommendations. Overall, the method reduces computation and provides a reference point based on consumer characteristics.
Keywords
Introduction
With the rise of Web 2.0, an increasing number of new technologies have emerged to improve the bidirectional communication capabilities of online reviews. Consumers can easily post comments or reviews anywhere and anytime, but online reviews are personalized because of their anonymity, openness, and autonomy. To stimulate consumer repurchase intention and build long-term relationships, vendors tend to recommend as many resources (e.g., web pages, products, and advertisements) as they can [1]. Consequently, potential consumers face information overload when they make purchasing decisions online. To address this issue, a new personalized recommendation model that accounts for the characteristics of online consumers is needed.
A personalized recommendation system is regarded as a means of helping potential consumers separate interesting or useful objects from those they do not like. The preferences of consumers are measured with their perceived value, which combines shopping value and product value [2]. Usually, e-service quality, product quality, and sacrifice are used to describe the perceived value of online shoppers [3]. The linkage between means–end chain (MEC) theory [4] and value-repurchase intention [5] indicates that the perceived value of consumers motivates their repurchase intention. Benefits and sacrifices deeply influence the perceived value of online shoppers [6]. Consumer characteristics [7], such as gender and age, are also related to consumer repurchase intention.
Fuzzy-based recommendation systems [8] are fairly successful solutions for facilitating access to the perceived value of online users. Fuzzy logic tools are useful for assessing the preferences [9] and needs of online users. The two main recommendation paradigms are collaborative filtering-based recommendation [10] and content-based recommendation [11]. User preferences are important to recommendation generation and thus are used to obtain accurate recommendations. Linguistic variables [12, 13] can help to model uncertain and vague preferences, and they are widely employed in multi-criteria decision making (MCDM) [14–16]. Expressing preference information of consumers on alternatives is beneficial [17]. The assessments and judgments of consumers should be precisely transformed from a qualitative form to a quantitative one [18]. Several linguistic approaches represent qualitative aspects with linguistic variables based on symmetrically distributed linguistic term sets [19].
A main feature of online reviews is the bucket effect: consumers are upset by negative online reviews, and their purchase intention is reduced accordingly [20]. Researchers suggest that tourists may pay more attention to negative information than to positive and neutral information. To emphasize the effect of negative online reviews, unbalanced linguistic term sets [21] offer a good solution. Unlike symmetrically distributed linguistic term sets, unbalanced linguistic term sets allow a series of linguistic variables to be distributed unevenly. In these situations, the consumer's reference domain is unbalanced, with one side higher than the other [22]. Such a set can be expressed as $S_3 = \{ s_{1/3} = \text{extremely poor (EP)},\ s_{2/3} = \text{poor (P)},\ s_1 = \text{fair (F)},\ s_{3/2} = \text{good (G)},\ s_3 = \text{extremely good (EG)} \}$. An unbalanced multiplicative linguistic label set is useful for managing incomplete information.
In extant studies, fuzzy logic tools are often applied to recommendation systems. The cloud model is an effective tool for dealing with linguistic assessment information, because it can transform qualitative concepts into quantitative instantiations. The fuzziness of a qualitative concept is described with a normal (Gaussian) membership function, whereas its randomness is described with a normal distribution. The cloud model has been used to assess linguistic information in MCDM problems [23]. A method for linguistic decision-making problems based on probability theory and fuzzy set theory has been proposed [24]. Linguistic variables and clouds [25] have been used to transform linguistic information and applied to multi-criteria group decision making (MCGDM). To select a reference point according to the evaluations of potential or similar consumers regarding certain items, a method based on the cloud model and prospect theory has been proposed [26, 27]. The trapezium cloud, which is more general than the normal cloud, has also attracted considerable attention in MCDM problems with linguistic information [28].
Although the aforementioned models can be applied effectively to recommendation systems, they have the following limitations. First, most research on linguistic term sets, such as the 2-tuple linguistic model [29] and the hesitant fuzzy linguistic term set [30], fails to consider the imbalance of reviews. Negative reviews, which spread quickly online and can harm brand equity, are discounted [31]. Statistical results displayed on websites also play a leading role when consumers make purchasing decisions: consumer purchase intention decreases or disappears when the ratio of negative reviews exceeds the consumer's psychological expectation. Existing recommendation paradigms [11, 12] overlook the consumer's degree of dislike in the final results and cannot handle large-scale linguistic data. Second, Web 2.0 allows consumers to comment individually on their consumption experiences and supports interaction across all kinds of network environments, so vendors can capture large-scale information about their consumers, such as personal information. Traditional aggregation operators [32], such as ordered weighted averaging and uncertain linguistic aggregation operators, can neither reflect the consumer's degree of dislike nor handle large-scale linguistic data. Recommendation paradigms and data reduction techniques [33, 34] should therefore be improved to address this challenge.
To overcome these drawbacks, this study develops an applicable online personalized recommendation system. Its main innovations and contributions are summarized as follows. First, to reflect the unbalanced influences of negative and positive reviews, an unbalanced linguistic label set with clouds is developed to transform review information into a high-dimensional normal cloud, in which each dimension evaluates one linguistic label. Second, a reference point is provided for each potential consumer by evaluating his or her characteristics, and this reference point is used to handle large-scale data. A two-step cluster method is provided to cluster the consumer profile, which is expressed as a hyperspherical region. Third, to reduce the sample size, an anomaly detection method and the cluster method are combined: the data are first filtered to discard anomalous records; the recommendation data are then divided into several clusters, and the consumer assessment data are clustered to determine the hyperspherical region that serves as the consumer's reference point. Finally, the cluster of recommendation data closest to the reference point is recommended to the consumer, which reduces the sample size.
The remainder of this paper is organized as follows. Section 2 describes the cluster method, the unbalanced multiplicative linguistic label set, and the cloud model. Section 3 discusses the unbalanced multiplicative linguistic label cloud and the unbalanced linguistic label cloud based on the perceived value of consumers. Section 4 puts forward a new online purchasing recommender system based on the model. Section 5 presents an empirical study of hotel bookings whose statistical results support the relevant theory. Finally, Section 6 concludes the paper with a discussion of limitations and future research directions.
Related work
Cluster method
Cluster analysis, or clustering, divides objects so that similar objects fall into the same group. A cluster is usually defined as a group with small distances among its members, a dense area of the data space, or an interval or particular statistical distribution. Two algorithms are described below: density-based spatial clustering of applications with noise (DBSCAN) [35] and the k-nearest neighbors (k-NN) algorithm. DBSCAN is a density-based clustering algorithm that requires two parameters, ɛ (eps) and n (the minimum number of points), which together define a dense region. DBSCAN detects the number of groups k conveniently and rapidly, and its quality depends on the distance measure. However, DBSCAN performs poorly on data sets with large differences in density, because no single pair (ɛ, n) suits all clusters.
The k-NN algorithm is a non-parametric, instance-based learning algorithm [36]. Its result depends on the number of groups k, which must first be defined by the user.
To overcome these shortcomings, a two-step clustering method that combines DBSCAN and the k-NN algorithm is provided. DBSCAN determines k when ɛ and n are given; the k-NN algorithm then divides the set of objects into k groups. The combination thus compensates for the shortcomings of each algorithm used alone.
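The two-step idea can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the text names the second step k-NN, but because that step partitions the data into a user-supplied number k of groups, the sketch substitutes k-means (Lloyd's algorithm), seeded with the DBSCAN cluster centroids. Euclidean distance, the function names (`dbscan`, `two_step`), and the toy weight vectors are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two weight vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, -1 marks noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited
    cluster = -1

    def neighbors(i: int) -> List[int]:
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(more)
    return [lab if lab is not None else -1 for lab in labels]


def centroid(pts: List[Point]) -> Point:
    return tuple(sum(col) / len(pts) for col in zip(*pts))


def two_step(points: List[Point], eps: float, min_pts: int, iters: int = 100):
    """Step 1: DBSCAN fixes k. Step 2: k-means (assumed) partitions all points."""
    labels = dbscan(points, eps, min_pts)
    k = len({lab for lab in labels if lab != -1})
    if k == 0:
        raise ValueError("DBSCAN found no dense region; adjust eps or min_pts")
    # Seed k-means with the DBSCAN cluster centroids (noise points excluded).
    centers = [centroid([p for p, lab in zip(points, labels) if lab == c]) for c in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:  # assignments have converged
            break
        centers = new_centers
    return k, centers, assign


if __name__ == "__main__":
    # Toy (weight of "extremely good", weight of "extremely poor") pairs.
    data = [(0.10, 0.010), (0.12, 0.020), (0.11, 0.015),
            (0.60, 0.000), (0.62, 0.010), (0.61, 0.005),
            (0.35, 0.200)]  # the last point is isolated
    k, centers, assign = two_step(data, eps=0.05, min_pts=2)
    print(k, assign)
```

Seeding k-means with the DBSCAN centroids keeps the two steps consistent: DBSCAN decides how many groups exist, and k-means then assigns every point, including points DBSCAN marked as noise, to one of those groups.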
Unbalanced multiplicative linguistic label set
In the context of online shopping, the preference information of decision makers on objects (attributes or alternatives) is expressed with linguistic labels. To the best of our knowledge, most existing linguistic variables are uniformly and symmetrically distributed. Nevertheless, the granularity between adjacent labels may vary across online decision makers. Negative comments deeply influence consumers; to address this, unbalanced linguistic labels are used to assess linguistic variables in online shopping. The unbalanced linguistic term set based on numerical scales is defined as follows:
A unique constant $\lambda > 0$ exists such that $s_i - s_j = \lambda (i - j)$. With $S^- = \{ s \mid s \in S_g,\ s < s_0 \}$ and $S^+ = \{ s \mid s \in S_g,\ s > s_0 \}$, the cardinalities of $S^-$ and $S^+$ are equal.
A new unbalanced multiplicative linguistic label set is defined as follows [37]:
where $s_\tau$ in $S_g$ is a multiplicative linguistic label, $g$ ($g \ge 2$) is a positive integer, and the cardinality of $S_g$ is $2g - 1$. The unbalanced multiplicative linguistic label set $S_g$ exhibits the following characteristics: $s_\alpha > s_\beta$ if $\alpha > \beta$, and a reciprocal operator $\operatorname{rec}(s_\beta) = s_\alpha$ exists such that $\alpha\beta = 1$. In particular, $\operatorname{rec}(s_1) = s_1$.
For example, a set of five multiplicative linguistic labels $S_g$ can be $S_3 = \{ s_{1/3} = \text{extremely poor (EP)},\ s_{2/3} = \text{poor (P)},\ s_1 = \text{fair (F)},\ s_{3/2} = \text{good (G)},\ s_3 = \text{extremely good (EG)} \}$ (see Fig. 1).

Set of five multiplicative linguistic labels S3.
Figure 1 shows that the middle label $s_1$ marks the "center" of the assessments, with the remaining multiplicative linguistic labels placed reciprocally around it. In particular, $s_{1/g}$ and $s_g$ ($g \ge 2$) are the lower and upper limits of the multiplicative linguistic labels in $S_g$, respectively.
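The defining equation of $S_g$ is not reproduced in this text, but the example $S_3$ and the weight vector $W$ used later suggest the indices $\tau \in \{1/g, 2/g, \ldots, (g-1)/g, 1, g/(g-1), \ldots, g/2, g\}$. The Python sketch below builds $S_g$ under that assumption and checks the two stated properties (ordering and the reciprocal operator); the helper names `label_indices`, `rec`, and `S3_NAMES` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List


def label_indices(g: int) -> List[Fraction]:
    """Indices tau of S_g, assumed: 1/g, ..., (g-1)/g, 1, g/(g-1), ..., g/2, g."""
    if g < 2:
        raise ValueError("g must be an integer >= 2")
    lower = [Fraction(i, g) for i in range(1, g)]      # 1/g ... (g-1)/g
    upper = [Fraction(g, g - i) for i in range(1, g)]  # g/(g-1) ... g
    return lower + [Fraction(1)] + upper               # 2g - 1 labels, ascending


def rec(tau: Fraction) -> Fraction:
    """Reciprocal operator: rec(s_tau) = s_(1/tau); in particular rec(s_1) = s_1."""
    return 1 / tau


# Names of the five labels of S_3 as listed in the text.
S3_NAMES: Dict[Fraction, str] = dict(
    zip(label_indices(3), ["EP", "P", "F", "G", "EG"])
)

if __name__ == "__main__":
    print([str(t) for t in label_indices(3)])  # ['1/3', '2/3', '1', '3/2', '3']
    print(S3_NAMES[rec(Fraction(2, 3))])       # prints G, the reciprocal of P
```

Exact fractions are used so that the reciprocal of every index is again an index of $S_g$ without floating-point round-off.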
To transform qualitative concepts to quantitative expressions, the definition and properties of cloud models are outlined, and a cloud is shown in Fig. 4.

Clustering algorithm.

Reference points of each consumer.

Clusters of hotel data.
The $3En$ rule is used to divide all elements (cloud droplets) of a cloud into key, basic, peripheral, and weak peripheral elements. Table 1 lists these ranges in mathematical notation.
Distribution of cloud droplet contribution range
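To make the $3En$ rule concrete, the sketch below uses the standard forward normal cloud generator of the cloud model: for each droplet, $En'$ is drawn from $N(En, He^2)$, the position $x$ from $N(Ex, En'^2)$, and the certainty degree is $\exp(-(x - Ex)^2 / (2 En'^2))$. It then measures the share of droplets within $\pm En$, $\pm 2En$, and $\pm 3En$ of $Ex$. The exact interval boundaries in Table 1 are not reproduced here, and the values $Ex = 2.5$, $En = 0.5$, $He = 0.05$ are illustrative assumptions on the universe $[0, 5]$.

```python
import math
import random
from typing import List, Optional, Tuple


def forward_normal_cloud(ex: float, en: float, he: float, n: int,
                         seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Forward normal cloud generator: returns n droplets (x, mu)."""
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        en_i = abs(rng.gauss(en, he))   # En' ~ N(En, He^2), used as a std dev
        x = rng.gauss(ex, en_i)         # droplet position x ~ N(Ex, En'^2)
        if en_i == 0:
            mu = 1.0                    # degenerate case: x equals Ex
        else:
            mu = math.exp(-((x - ex) ** 2) / (2 * en_i ** 2))  # certainty degree
        drops.append((x, mu))
    return drops


def share_within(drops: List[Tuple[float, float]], ex: float, en: float, m: float) -> float:
    """Fraction of droplets with |x - Ex| <= m * En."""
    return sum(1 for x, _ in drops if abs(x - ex) <= m * en) / len(drops)


if __name__ == "__main__":
    drops = forward_normal_cloud(ex=2.5, en=0.5, he=0.05, n=10_000, seed=1)
    for m in (1, 2, 3):
        print(m, round(share_within(drops, 2.5, 0.5, m), 3))
```

With these settings the printed shares should come out close to 0.68, 0.95, and 0.997, which is why droplets beyond $Ex \pm 3En$ are usually ignored.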
The perceived value of consumers is transformed into cloud in this section.
Unbalanced multiplicative linguistic label cloud
Online reviews can significantly influence the purchasing decisions of consumers [38]. In the Internet era, online reviews essentially embody consumer characteristics, and the semantics and distribution involved in the assessments of consumers are usually unevenly distributed. To obtain the comprehensive linguistic scale values, a transformation between unbalanced multiplicative linguistic label and clouds is introduced. The transformation of qualitative concepts to quantitative values is realized objectively and interchangeably with cloud model.
Cloud transformation has been studied previously. Wang [28] proposed a method for cloud generation that combines an exponential-scale linguistic function with a $-n \sim n$ scale. However, this method [28] can only handle uniformly and symmetrically distributed variables. The present study improves this linguistic assessment scale so that asymmetrically distributed variables can also be handled.
Equation (5) has the advantages of an index scale and a
To preserve all information, the mapping from $s_\tau$ to the cloud $Y_\tau(Ex_\tau, En_\tau, He_\tau)$ is obtained in three steps.
(1) $Ex_\tau$ for $\tau$ is calculated as:
(2) Calculate $En_\tau$.
Let a droplet of the cloud be $(x, y)$.
Then,
$En_\tau$ is considered the mean value of
(3) Calculate $He_\tau$.
Given that
where $En'^{+}$ is expressed as:
After conversion, the clouds are integrated as in [28]. The integrated clouds are used to compare the alternatives $A_i$ ($i = 1, 2, \cdots, m$).
A method is proposed to acquire the perceived value of consumers with the unbalanced multiplicative linguistic label cloud. First, identifying the factors that affect the perceived value of consumers is crucial. Although the perceived value is a synthesis of benefits and sacrifice, most existing studies of consumers' perceived value simply transform the linguistic evaluations into quantitative instantiations and aggregate them. The unbalanced multiplicative linguistic label set can capture the differences between adjacent linguistic terms.
To examine the influence of consumer characteristics on benefits and sacrifice thoroughly, a conceptual framework based on the unbalanced linguistic label cloud is provided. $W = (\omega_{1/g}, \cdots, \omega_{(g-1)/g}, \omega_1, \omega_{g/(g-1)}, \cdots, \omega_g)$ is considered a synthesis of the perceived value of all consumers. Let $N$ be the overall number of consumers and $n_\tau$ the number of consumers who give the rating $s_\tau$; then $\omega_\tau$ is given as follows:
We aim to use the associated weight vector $\omega_\tau$ and explore the effects of $\omega_g$ and $\omega_{1/g}$ on buying behavior. The consumption records of consumers, derived from websites, are used to mine their personalized perceived value. Fuzzy logic tools are used to cluster the historical data of consumers and provide a hyperspherical region $CL$, which is regarded as a threshold. This threshold motivates consumer purchase intention and varies across individuals.
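The equation for $\omega_\tau$ is not reproduced in this text; a natural reading of the definitions of $N$ and $n_\tau$ is $\omega_\tau = n_\tau / N$, that is, the share of consumers who gave label $s_\tau$. The Python sketch below computes such a weight vector for the five labels of $S_3$ under that assumption; `label_weights`, `LABELS`, and the sample ratings are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

# Label names of S_3 from low to high, as listed in the text.
LABELS = ("EP", "P", "F", "G", "EG")


def label_weights(ratings: Iterable[str], labels: Sequence[str] = LABELS) -> Dict[str, float]:
    """omega_tau = n_tau / N (assumed form): share of ratings equal to each label."""
    counts = Counter(ratings)
    unknown = set(counts) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = sum(counts.values())  # N, the overall number of ratings
    if total == 0:
        raise ValueError("no ratings given")
    return {lab: counts[lab] / total for lab in labels}  # counts[lab] is n_tau


if __name__ == "__main__":
    reviews = ["EG"] * 6 + ["G"] * 3 + ["EP"]
    print(label_weights(reviews))  # {'EP': 0.1, 'P': 0.0, 'F': 0.0, 'G': 0.3, 'EG': 0.6}
```

Under this reading, $\omega_{1/g}$ and $\omega_g$ are the shares of "extremely poor" and "extremely good" ratings, the two weights plotted in the experiments.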
The two-step cluster method is used to cluster the historical data of consumers and of recommended products. Through this method, the historical data of the recommended products are divided into several classes, which reduces the sample size.
In this section, a framework of fuzzy-based recommendation system is constructed with unbalanced multiplicative linguistic label cloud.
Initialization
Anomalous data are detected through signs of deviation across a fuzzy boundary [45]. The data are divided into five grades, namely A (extremely good), B (good), C (fair), D (poor), and E (extremely poor), as follows:
The first step in our algorithm is to initialize an electronic consumer record (ECR). The consumer record is built and applied in classification, and it consists of two parts: the consumer profile and the consumer evaluation data. The content of each element is detailed as follows:
The ECR provides users with complete electronic archive management and network query functions.
A two-step method is employed for clustering consumer evaluation data. The first step determines the number of clusters k with DBSCAN, and the second step clusters the consumer evaluation data with k-NN. Figure 2 illustrates the clustering algorithm.
DBSCAN is used to determine $k$ for given $\varepsilon$ and $n$. Based on the obtained $k$, k-NN clusters the data of $W$ from the ECR. The reference points are denoted as $CL_j$, $1 \le j \le k$, and the set of clusters as $cl_m$, $1 \le m \le M$, where $k$ is the number of the consumer's choice patterns, $cl_m$ represents a cluster of consumer evaluation data, and $M$ is the number of clusters. Each $cl_m$ ($1 \le m \le M$) and each $CL_j$ ($1 \le j \le k$) is a hyperspherical region. The center of hyperspherical region $m$ is a vector $W_m$ expressed with the unbalanced multiplicative linguistic label cloud; $W_m$ is the cluster center of $m$.
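The text treats every cluster as a hyperspherical region centered at $W_m$ but does not state how the radius is chosen. The Python sketch below assumes that the center is the mean of the member vectors and that the radius is the largest Euclidean distance from the center to a member; `HyperSphere` and `region_from_cluster` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class HyperSphere:
    center: Tuple[float, ...]  # plays the role of W_m
    radius: float

    def contains(self, w: Sequence[float], tol: float = 1e-12) -> bool:
        """True if weight vector w lies inside the region (boundary included)."""
        return math.dist(self.center, w) <= self.radius + tol


def region_from_cluster(members: List[Sequence[float]]) -> HyperSphere:
    """Center = mean of members; radius = distance to the farthest member (assumed rule)."""
    if not members:
        raise ValueError("a cluster needs at least one member")
    dim = len(members[0])
    center = tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
    radius = max(math.dist(center, m) for m in members)
    return HyperSphere(center, radius)


if __name__ == "__main__":
    cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
    region = region_from_cluster(cluster)
    print(region.center, round(region.radius, 4))
```

Taking the farthest member as the radius guarantees that every member lies inside its own region; a tighter rule (for example, a quantile of member distances) would trade coverage for compactness.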
A purchasing decision-making approach is proposed in this subsection. This approach links the consumers’ overall evaluation and judgment to purchasing decision behavior. The product or service criteria facilitate the values or goals that consumers in the same group use to determine consumers’ perceived values. The perceived values are the final goals that trigger consumers’ behavior [5].
The concept of distance is used to find the optimal selection, measured as the distance between cluster centers. Assume that $CL_j$, $1 \le j \le k$, and $cl_m$, $1 \le m \le M$, are any two hyperspherical regions. The distance $d_C(CL_j, cl_m)$ between them is given as follows:
In general, the perceived values of consumers are the key factor influencing their choice patterns. To obtain these perceived values, the consumer evaluation data are clustered with the two-step method, yielding one or more clusters. The cluster centers $CL_j$, $1 \le j \le k$, are the perceived values of the consumers in the same cluster.
The associated weight vectors of the evaluations and judgments of online products or services are clustered. Then, the centers $cl_m$, $1 \le m \le M$, are obtained, and the distance between each $cl_m$ and $CL_j$, $1 \le j \le k$, is calculated. Purchasing decision behavior is triggered when $cl_m$ is nearest to $CL_j$. The reference point $CL_j$, $1 \le j \le k$, is used to obtain the threshold. The associated weight vector generated from $cl_m$, $1 \le m \le M$, is selected when
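Equation (21) for $d_C$ is not reproduced in this text. Since the distance is described as the distance between cluster centers, the Python sketch below assumes the Euclidean distance between center vectors and selects, for each reference point $CL_j$, the hotel cluster $cl_m$ whose center is nearest. The function names and all numbers are hypothetical and are not taken from Tables 4 or 5.

```python
import math
from typing import Dict, Sequence, Tuple


def nearest_cluster(reference: Sequence[float],
                    centers: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (name, distance) of the hotel cluster center closest to a reference point."""
    if not centers:
        raise ValueError("no hotel clusters given")
    best = min(centers, key=lambda name: math.dist(reference, centers[name]))
    return best, math.dist(reference, centers[best])


def recommend(reference_points: Dict[str, Sequence[float]],
              hotel_centers: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Map every reference point CL_j to its nearest hotel cluster cl_m."""
    return {ref: nearest_cluster(vec, hotel_centers)[0]
            for ref, vec in reference_points.items()}


if __name__ == "__main__":
    # Hypothetical 2-D centers: (weight of "extremely good", weight of "extremely poor").
    user_refs = {"CL1": (0.55, 0.01), "CL2": (0.30, 0.02)}
    hotels = {"cl1": (0.60, 0.02), "cl2": (0.28, 0.03), "cl3": (0.10, 0.20)}
    print(recommend(user_refs, hotels))  # {'CL1': 'cl1', 'CL2': 'cl2'}
```

Because a consumer may have several reference points (User 2 has two), the sketch returns one recommended cluster per reference point rather than a single overall winner.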
In this section, experimental studies are conducted, and the obtained results are presented.
Description of data sets
To test the performance of the approach proposed in this paper, a hotel-booking problem is used to illustrate the model's effects. Hotel data were collected from http://www.tripadvisor.cn on May 26, 2016, using a data-collection tool (http://www.skieer.com). Four cities and four consumers, who are senior critics on the related tourism websites, were randomly selected to verify the feasibility and robustness of the proposed method. The computations were performed in MATLAB 2013 on a notebook computer with an Intel(R) Core(TM) i5-3210M CPU.
Experiment evaluation
To demonstrate the validity of this method, the hotel-booking problem was addressed with our model, and a comprehensive evaluation of the hotels, which constitutes a purchasing decision-making procedure, is presented. The main procedures are outlined in the following steps:
First, anomalous data were examined using the method described in Subsection 4.1. Then, an ECR was established (see Table 2).
ECR of consumers
The linguistic variables were then transformed into unbalanced linguistic label clouds. Given the universe $[X_{\min}, X_{\max}] = [0, 5]$, an unbalanced multiplicative linguistic term set with five labels can be transformed into five clouds (see Table 3).
Five-label linguistic clouds
The reference point of each consumer can be determined with the proposed two-step cluster method. Table 4 lists the reference points of the four selected consumers.
Reference points of four consumers
Table 4 indicates two clusters in the data of User 2, which may be attributed to different travel purposes or demands. To visualize the data of the four consumers more intuitively, the associated weights of "extremely good" and "extremely poor" were plotted, with the weight of "extremely good" on the x-axis and the weight of "extremely poor" on the y-axis. The reference points of all consumers are shown in Fig. 3.
Figure 3 reveals significant individual differences in consumer behavior. In most cases, Users 2, 3, and 4 did not select hotels whose weight of "extremely poor" exceeded 0.03. By contrast, for User 1, the threshold for the weight of "extremely poor" is far beyond 0.03. Therefore, heterogeneity is observed among consumers. The threshold for the weight of "extremely good" also varies across individuals: that of User 4 is over 0.5, which is higher than that of User 1.
Moreover, hotel review data were also calculated using the two-step cluster method. Four cities were selected, including Beijing, Haikou, Shanghai, and Sanya. The center of the hyper-spherical region is determined and shown in Table 5, and hotel review data are visually presented in Fig. 4.
Center of hyperspherical region
The proposed model reveals a set of interesting findings. Figure 4 shows that consumption service levels vary across cities and that each city has different preferences for hotel types. Specifically, Haikou and Shanghai each contain a type of hotel that no reviewer rated as "extremely good."
The distance between $CL_j$, $1 \le j \le k$, and $cl_m$, $1 \le m \le M$, can be calculated with Equation (21). The closest center of the hyperspherical regions is marked on the chart. The data of User 1 are used for clustering, and the results are shown in Fig. 5(A).

User 1’s data and our recommended data.
The hotel review data that User 1 had chosen were then eliminated, and the new center of User 1 was used to look for an appropriate hotel. The result is shown in Fig. 5(B). After the chosen data are eliminated, the influence of the perceived value becomes weaker, yet the same cluster is still selected in all four cities. However, the new center in Sanya is not far from the data that were not chosen. The differences between the results with and without elimination are very slight. In general, our empirical results are consistent with the hotels that the consumers actually chose, which suggests that consumer characteristics are reflected in online shopping experiences and remain stable. The data of other consumers confirm these results; the procedures are omitted here.
The applicability of the proposed approach is verified as follows. The hotel data in Haikou and the data of User 2 are selected to test our method. The traveler type, price, and rank are obtained from TripAdvisor. There are four traveler types: family travel (FT), couples (CP), traveling alone (TA), and business travel and tourism (BT). The recommendation results are shown in Table 6.
Recommendation result from Haikou hotel data
The data in Table 6 indicate that our method recommends two different types of hotel according to the two clusters of User 2 obtained in Step 2. The two recommended hotels differ clearly in price, rank, assessment, and traveler type, which shows that our method can provide personalized recommendations. The data of other consumers confirm these results and are omitted.
To validate system robustness, this section conducts several experiments to test the predictive validity of the proposed model.
(1) Reliability of selected data
In order to verify the reliability and feasibility of selected data, data on five cities and four consumers with different types were used. These data are sufficient to illustrate the proposed method.
(2) Parameter sensitivity
To measure the sensitivity of the results to changes in the surrounding conditions, several experiments were repeated. The stability of the clustering method and the robustness of the proposed recommendation system were used to assess parameter sensitivity. The results indicate that the parameter k is stable.
To examine the stability of the cluster method and the recommendation system, the data of the same city were tested four times. The results are shown in Table 7.
Cluster result from Shanghai hotel data
Comparing Table 5 with Table 7 shows that the proposed cluster method is highly stable: the centers and the number of clusters are almost unchanged. Tests on the data of the other four cities were repeated in a similar manner, and the results are also stable; these experimental results are omitted. Therefore, all cluster results are stable, and the cluster method is robust.
Second, the stability of the new recommendation system was tested four times. The results are stable, as shown in Fig. 6.

Clusters of Sanya hotel data.
The robustness of the new recommendation system is tested from two aspects, namely, cities and consumers. First, two cities were used, and the results are shown in Fig. 7.

Hotel recommendation for User 2.
The results reveal that our recommended data contain the hotels the consumer actually adopted, which indicates that the system is reasonably reliable. Results for the cities not shown are similar and are omitted. Therefore, the robustness of the new recommendation system is verified.
Second, two consumers were used to test robustness. The recommendation results are stable, as shown in Fig. 8. The results of Users 1 and 2, shown in Fig. 5, are omitted because they are also stable. The hotels recommended to consumers include the hotels they have stayed at.

Hotel recommendation for Sanya.
The effectiveness of the proposed recommendation approach is verified with a comparative study. The approach developed by Zhang et al. [27] based on prospect theory and cloud model was used to conduct this comparison and the analysis is based on the data of hotel recommendation presented under Step 5 in subsection 5.2. The recommendation results are shown in Table 8.
Recommendation results from Haikou hotel data
The final recommendations of our method and of the method proposed in [27] differ. Based on prospect theory and cloud aggregation operators, the approach in [27] uses the difference between expectations and outcomes to measure gains and losses. However, the aggregation operators in [27] ignore the negative perceived value. Therefore, the hotel recommended in [27] is the top-ranked one but not necessarily the most appropriate. Our method evaluates the choice patterns of different consumers and uses them to provide different personalized recommendations, recommending two hotels according to the consumer's choice patterns. The first option is the Longquan Zhixing Hotel, which does not provide specific traveler-type labels, and the second option is the Hainan Junhua Haiyi Hotel. The traveler type of most consumers is business travel or tourism. By contrast, the approach in [27] cannot identify the different choice patterns of consumers; it yields only one reference point for User 2, which does not match his or her selection demands. Our method recommends appropriate hotels. The hotel recommended in [27] ranks first on the website, but its price is higher than that of the hotel recommended by our method and may exceed what the consumer is willing to pay.
The proposed recommendation approach can provide more appropriate hotels for consumers than the compared method. The method proposed in this paper does not aim to provide the highest-ranked option; instead, it identifies personalized options intended to maximize consumer satisfaction.
This study proposes an online recommendation approach based on a two-step cluster method and unbalanced linguistic label clouds. The unbalanced linguistic label set is first extended with clouds, and the two-step cluster method is then used to measure the perceived value of consumers. The main contributions of this paper are as follows. Unbalanced linguistic label clouds reflect the unbalanced influences of negative and positive reviews, so the influence of negative reviews is strengthened. The personalized perceived value of consumers is extracted with the cluster method and regarded as a threshold, which is the final goal that triggers behavior and serves as a filter to reduce the sample size.
In our future research, we aim to improve the uncertain linguistic transformation approach and develop new fuzzy-based recommendation systems, such as group recommendation and recommendation with deep learning.
Footnotes
Acknowledgments
The author would like to thank the editors and the anonymous referees for their valuable and constructive comments and suggestions, which greatly helped to improve this paper. This work was supported by the National Natural Science Foundation of China (No. 71571193).
