A multi-granularity fuzzy computing model for sentiment classification of Chinese reviews

Abstract

With the rapid growth of user-generated content online, unsupervised methods, which do not require labeled training data, have become increasingly important in sentiment classification. However, the performance of existing unsupervised methods is unsatisfactory, largely because they usually ignore sentence structure and the ambiguity of sentiment intensity. To address these problems, we propose a multi-granularity fuzzy computing model with two innovations. First, we propose a multi-granularity computing method to compute the sentiment intensity of reviews. Specifically, we decompose reviews into three levels of language units (words, phrases and sentences) and compute the sentiment intensity of reviews by combining rule-based and statistics-based methods. Second, we construct a fuzzy classifier to handle the ambiguity of sentiment intensity, and we propose two self-supervised methods that use pseudo-labeled training data to learn the optimum parameters of the fuzzy classifier. Experimental results on four datasets show that our model improves accuracy by 6.25% on average over competitive baselines in sentiment classification of Chinese reviews.

Keywords

Sentiment classification; unsupervised methods; fuzzy computing model; multi-granularity computing

1 Introduction

With the continuous growth of social networks such as Twitter, Facebook and various forums, user-generated content such as consumer feedback, status updates on social networking services (SNS) and microblogs is proliferating on the Internet. With this surge of subjective content, mining opinions from it has become a new focus in natural language processing (NLP) and web mining [1]. Since the early 2000s, sentiment classification, a specific category of text classification, has drawn increasing attention from researchers and has become a popular research topic in handling user-generated content [2, 3].

Up to now, much research has been conducted on sentiment classification of subjective texts [4]. In document-level sentiment classification, a subjective text is mainly classified as having positive or negative polarity. For example, it is reasonable to classify a product review that expresses an overall positive or negative opinion about the product as positive or negative. In this paper, we focus on document-level sentiment classification.

At present, there are two main types of methods for document-level sentiment classification: supervised methods and unsupervised methods [5]. Many supervised methods from topic classification have been applied to sentiment classification, but their performance is not satisfactory [6]. Unlike traditional topic classification [7], sentiment classification has been shown to be highly sensitive to the domain of the data [8]. To perform well, supervised methods require the domain of the training data to match that of the test data [9]. However, annotating data for each domain is costly. Thus, unsupervised methods are more desirable in terms of efficiency and cost [10].

Existing unsupervised methods have two main problems. First, when computing the sentiment intensity of reviews, they employ only the sentiment polarity or sentiment intensity of words and phrases; sentence types and relations between sentences are seldom taken into account [11]. However, different sentence types and relations between sentences affect the sentiment of a sentence differently. For example, a declarative sentence and an interrogative sentence with the same content may express opposite polarities. Second, when classifying reviews according to their sentiment intensity, these methods formulate sentiment classification as an either-or problem and usually ignore the fuzziness of sentiment intensity [12]. However, when two reviews have close sentiment intensities, it is difficult to reliably assign them opposite polarities.

To solve these two problems, we propose a multi-granularity fuzzy computing model (MGFCM) for sentiment classification of reviews. Our main contributions are twofold.

Firstly, we propose a multi-granularity computing method that computes the sentiment intensity of reviews at three levels of language components: words, phrases and sentences. Specifically, we compute the sentiment intensity of a review by merging the sentiment intensities of the words, phrases and sentences in it. On the word level, we distinguish two types of sentiment words and propose a corresponding computation method for each type. On the phrase level, we propose a method based on a fixed-length sliding window to construct sentiment phrases and compute their sentiment intensity. On the sentence level, we propose a rule-based method that considers three types of sentences and two types of logical relations between sentences.

Secondly, given the fuzziness of sentiment intensity, we construct a fuzzy classifier and its classification function using fuzzy sets and the principle of maximum membership degree. Furthermore, we propose two self-supervised methods that use pseudo-labeled training data to learn the parameter of the fuzzy classifier. Experimental results demonstrate the validity of both methods.

The rest of this paper is organized as follows. Section 2 presents related work. Section 3 describes the multi-granularity fuzzy computing model and its two key parts. Section 4 describes the construction of a key sentiment morpheme lexicon (KSML) and a quantified whole sentiment lexicon (QWSL). Section 5 analyzes experimental results on four review datasets to evaluate the performance of our model. Finally, we summarize the work and draw conclusions.

2 Related work

As a popular research topic in natural language processing and web mining, sentiment classification has been studied extensively since 2002 [1, 4]. Existing methods fall into two main categories: supervised methods and unsupervised methods [5]. Supervised methods include many traditional text classification methods, such as Naive Bayes [29], Support Vector Machines [30] and Neural Networks [31]. Unsupervised methods primarily involve two key steps [12–15]: first, computing the sentiment intensity of reviews; second, classifying each review according to its sentiment intensity. Since our method is unsupervised, we focus below on related work in unsupervised sentiment classification.

As the basis of unsupervised sentiment classification, identifying the polarity of sentiment words is a research focus. There are three mainstream types of methods for identifying the polarity of Chinese sentiment words. The first is thesaurus-based methods, which evaluate the similarity between undetermined sentiment words and known reference sentiment words by their distances in a thesaurus [16–18]. The second is corpus-based methods, which evaluate this similarity with statistical methods on a corpus [18–20]. The third is morpheme-based methods, which directly calculate the sentiment intensity of sentiment words from the sentiment intensities of Chinese sentiment morphemes [21–24].

After obtaining the sentiment polarity of sentiment words, previous work computed the sentiment intensity of a review by simply summing the sentiment polarity of each sentiment word or phrase in it, with a weight determined largely by the degree adverbs around each word or phrase [11, 23]. Sentence types and relations between sentences are seldom taken into account in computing the sentiment intensity of reviews.

For the second step, existing research classifies reviews by comparing their sentiment intensity with a single threshold of 0, which ignores the fuzzy region between sentiment categories [10–12, 23]. However, both common experience and research in NLP indicate that the sentiment intensity of reviews involves fuzziness [25–27], so it is inappropriate to identify the polarity of reviews with a crude either-or method.

Nevertheless, some previous research has investigated the use of fuzzy sets in sentiment classification [27, 32]. For example, Wang et al. proposed an unsupervised fuzzy computing model to identify the polarity of Chinese sentiment words [32]. Fu and Wang proposed an unsupervised method using fuzzy sets for sentiment classification of Chinese sentences [27]. Wang et al. developed a supervised ensemble learning method that predicts the sentiment intensity of consumers with an online sequential extreme learning machine and an intuitive fuzzy set [28].

Our work differs from previous work in the proposed multi-granularity fuzzy computing model, which has two key steps. First, to calculate the sentiment intensity of reviews, we propose a multi-granularity computing method that uses three levels of language components: words, phrases and sentences. This method uses sentence types and relations between sentences, which previous work seldom takes into account when computing the sentiment intensity of reviews. Second, to set the parameters of the unsupervised fuzzy classifier, we investigate two self-supervised methods. These methods use pseudo-labeled training data and thus avoid annotating data for each subject domain; to the best of our knowledge, this approach has not previously been used to set the parameters of unsupervised sentiment classifiers.

Multi-granular fuzzy linguistic models have been applied successfully in areas such as information retrieval, recommender systems and decision making [33]. However, to the best of our knowledge, they have not been applied to sentiment classification of reviews. Unlike existing multi-granular fuzzy linguistic models, our model is an unsupervised framework for sentiment classification of Chinese reviews. It uses three levels of language granularity (words, phrases and sentences) to compute the sentiment intensity of reviews, and then identifies the sentiment polarity of reviews based on fuzzy sets and their sentiment intensity.

3 A multi-granularity fuzzy computing model

3.1 General framework of multi-granularity fuzzy computing model

Unsupervised methods primarily involve two steps. The first step is to compute the sentiment intensity of reviews. Existing unsupervised methods employ only the sentiment polarity or sentiment intensity of words and phrases, computed by statistics-based methods, and always ignore sentence types and relations between sentences [11]. However, different sentence structures affect the sentiment polarity and intensity of a sentence differently. The second step is to classify reviews based on their sentiment intensity. Existing unsupervised methods treat sentiment classification as an either-or problem and usually ignore the fuzziness of sentiment intensity [12].

To solve the two problems, we propose a multi-granularity fuzzy computing model (MGFCM) for sentiment classification of reviews. The general framework of MGFCM is described in Fig. 1.

The general framework of MGFCM consists of three components: the review datasets (RD); the key sentiment morpheme lexicon (KSML) together with the quantified whole sentiment lexicon (QWSL); and the model MGFCM itself. RD serves as the test data for verifying the performance of MGFCM. KSML and QWSL serve as the base lexicons of MGFCM. MGFCM is the kernel and is composed of two key parts.

The first part of MGFCM involves a multi-granularity computing method for calculating sentiment intensity of reviews. Firstly, we break down reviews into three levels of language components by a top-down strategy. Then we adopt a bottom-up strategy to compute sentiment intensity si (w_j) of sentiment word w_j, sentiment intensity si (p_i) of sentiment phrase p_i and sentiment intensity si (s_i) of sentence s_i. Finally, we compute sentiment intensity si (r_i) of review r_i.

The second part of MGFCM constructs a fuzzy classifier for sentiment classification of reviews. We first construct the classification function f_k (si (r_i)) of the fuzzy classifier and then propose two self-supervised methods to learn its parameter k.

The two key parts of MGFCM are described in detail below.

3.2 Multi-granularity computing method

When computing the sentiment intensity of reviews, existing research concentrates only on the word and phrase levels and does not consider the sentence level. However, different sentence structures affect the sentiment polarity and intensity of a sentence differently. We therefore propose a multi-granularity computing method, which calculates the sentiment intensity of each language unit and sums these to obtain the overall sentiment intensity of a review. The whole process of the multi-granularity computing method is described in Fig. 2.

3.2.1 Computing methods on the word level

On the word level, we divide sentiment words into two categories, static sentiment words and active sentiment words. The static sentiment words consist of both words in and outside QWSL. The types of sentiment words and the corresponding method of computing sentiment intensity are shown in Table 1.

For static sentiment words in QWSL, we obtain their sentiment intensity by looking them up in the lexicon. For static sentiment words outside QWSL, we compute their sentiment intensity with a morpheme-based statistical method, as follows.

1. For each word w_j which is outside QWSL, split the word into morphemes m_ij.

2. For each morpheme m_ij in word w_j, look it up in KSML. If m_ij is in KSML, take its sentiment intensity directly; otherwise, set its sentiment intensity to 0.

3. For each word w_j, compute sentiment intensity si (w_j) by equation (1). $si (w_{j}) = \frac{1}{number (m_{ij}, w_{j})} \sum_{i = 1}^{number (m_{ij}, w_{j})} si (m_{ij})$ (1)

Here number (m_ij, w_j) is the number of morphemes m_ij in the sentiment word w_j, and si (m_ij) is the sentiment intensity of morpheme m_ij.
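Equation (1) can be sketched in a few lines. This is a minimal illustration, assuming that each Chinese character is treated as one morpheme and that KSML is available as a dictionary from morpheme to intensity; the function name and the toy KSML values below are hypothetical and are not taken from the paper's lexicon.

```python
from typing import Dict


def word_intensity_from_morphemes(word: str, ksml: Dict[str, float]) -> float:
    """Equation (1): mean KSML intensity of the word's morphemes.

    Assumption: one Chinese character is treated as one morpheme.
    Morphemes missing from KSML contribute 0, as in step 2.
    """
    morphemes = list(word)  # hypothetical segmentation: character = morpheme
    if not morphemes:
        return 0.0
    total = sum(ksml.get(m, 0.0) for m in morphemes)
    return total / len(morphemes)


# Toy KSML entries, for illustration only.
toy_ksml = {"美": 1.0, "好": 0.6, "丑": -1.0}
print(word_intensity_from_morphemes("美好", toy_ksml))  # 0.8
print(word_intensity_from_morphemes("丑事", toy_ksml))  # -0.5 ("事" is not in KSML)
```

Averaging rather than summing keeps the word intensity within the same range as the morpheme intensities, regardless of word length.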

The sentiment intensity of an active sentiment word depends on its context. We therefore compute it from the nearest preceding noun or adjective, found with a variable-length sliding window, following the sentiment consistency principle. The concrete steps are as follows.

1. For each active sentiment word w_k, obtain its initial sentiment intensity si_initial (w_k) by looking it up in the active sentiment lexicon provided by Guohong Fu [27].

2. Taking the active sentiment word as the midpoint, use a variable-length sliding window to search for preceding sentiment words. The maximum length of the window is the distance between the active sentiment word and the first word of the sentence.

3. Compute the sentiment intensity of the active sentiment word from the sentiment intensity si (w_nearest) of the nearest noun or adjective w_nearest, as shown in equation (2): $si(w_k) = \operatorname{sign}(si(w_{nearest})) \times si_{initial}(w_k)$ (2), where $\operatorname{sign}(si(w_{nearest})) = \begin{cases} 1, & si(w_{nearest}) \geq 0 \\ -1, & si(w_{nearest}) < 0 \end{cases}$

For example, in the sentence “The performance-cost ratio of the pad is high”, the initial intensity of the active sentiment word “high” is 1. Since the sentiment intensity of the preceding word “performance-cost ratio” is 0, the sentiment intensity of “high” in this sentence is 1 (1 × 1 = 1) according to equation (2). In contrast, in the sentence “The risk of investing in the stock is high”, the preceding word “risk” has a sentiment intensity of -1, so the sentiment intensity of “high” here is -1 (-1 × 1 = -1) according to equation (2).
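The three steps and the two examples above can be sketched as follows. This is a minimal sketch, assuming sentences are already segmented and POS-tagged; the `Token` tuple layout and the tags `"n"` and `"a"` for nouns and adjectives are hypothetical. Tokens are given in Chinese word order with English glosses, so in 平板的性价比高 the nearest noun before 高 ("high") is 性价比 ("performance-cost ratio").

```python
from typing import List, Optional, Tuple

# (text, part_of_speech, static_intensity); "n"/"a" are hypothetical tags.
Token = Tuple[str, str, float]


def sign(x: float) -> int:
    # Sign function of equation (2): an intensity of 0 counts as positive.
    return 1 if x >= 0 else -1


def active_word_intensity(tokens: List[Token], k: int, initial: float) -> float:
    """Equation (2) for the active sentiment word at position k.

    The window grows leftward from the active word, up to the sentence
    start, and stops at the first noun or adjective it meets.
    """
    nearest: Optional[Token] = None
    for j in range(k - 1, -1, -1):
        if tokens[j][1] in ("n", "a"):
            nearest = tokens[j]
            break
    # Assumption: if no noun/adjective is found, sign(0) = 1 keeps the initial value.
    si_nearest = nearest[2] if nearest is not None else 0.0
    return sign(si_nearest) * initial


ratio = [("pad", "n", 0.0), ("'s", "u", 0.0),
         ("performance-cost ratio", "n", 0.0), ("high", "a", 0.0)]
risk = [("invest", "v", 0.0), ("stock", "n", 0.0), ("'s", "u", 0.0),
        ("risk", "n", -1.0), ("high", "a", 0.0)]
print(active_word_intensity(ratio, 3, 1.0))  # 1.0
print(active_word_intensity(risk, 4, 1.0))   # -1.0
```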

3.2.2 Computing methods on the phrase level

Taking sentiment words as midpoints, we assemble the sentiment phrases with a sliding window of fixed length and some syntax rules. The size of the sliding window is 5 words. The specific rules are shown in Table 2.

When calculating the sentiment intensity of a sentiment phrase p_k, we use the following equation to account for the effect of adverbs. $si(p_k) = f(class(adverb)) \times si(w_k)$ (3) Here $f(class(adverb)) = \begin{cases} -1, & class(adverb) \in \text{negative adverbs} \\ \alpha, & class(adverb) \in \text{degree adverbs} \end{cases}$

Here class (adverb) is the type of adverb, α is the weighting coefficient of the degree adverb.

We divide degree adverbs into five levels according to their degree and assign each level a different weighting coefficient. The levels of degree adverbs and their weighting coefficients are defined in Table 3. We set β = 0.4 based on experience and the prior knowledge in [12].

For example, in the sentence “The pad is very beautiful”, we construct the sentiment phrase “very beautiful” according to Table 2. The sentiment intensity of the sentiment word “beautiful” is 1, obtained by looking it up in QWSL. The weighting coefficient of the degree adverb “very” is 1.6 (4 × 0.4 = 1.6) based on Table 3. So the sentiment intensity of the sentiment phrase “very beautiful” is 1.6 (1.6 × 1 = 1.6) according to equation (3).
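Equation (3) and the worked example can be sketched as below. This sketch assumes, based on the example 4 × 0.4 = 1.6, that a degree adverb's coefficient α equals its level times β = 0.4. Table 3 is not reproduced here, so apart from "very" (level 4) the adverb levels and the negative-adverb list are hypothetical placeholders.

```python
from typing import Optional

BETA = 0.4  # base weighting unit stated in the text

# Hypothetical stand-in for Table 3: only "very" -> 4 is implied by the example.
DEGREE_LEVEL = {"slightly": 1, "rather": 2, "quite": 3, "very": 4, "extremely": 5}
NEGATIVE_ADVERBS = {"not", "never", "no"}  # hypothetical negation list


def phrase_intensity(adverb: Optional[str], word_si: float) -> float:
    """Equation (3): si(p_k) = f(class(adverb)) * si(w_k)."""
    if adverb is None:
        return word_si  # assumption: an unmodified word keeps its intensity
    if adverb in NEGATIVE_ADVERBS:
        return -1.0 * word_si
    if adverb in DEGREE_LEVEL:
        alpha = DEGREE_LEVEL[adverb] * BETA  # e.g. "very": 4 * 0.4 = 1.6
        return alpha * word_si
    return word_si  # unknown adverb: treated as having no effect


print(phrase_intensity("very", 1.0))  # "very beautiful" -> 1.6
print(phrase_intensity("not", 1.0))   # "not beautiful"  -> -1.0
```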

3.2.3 Computing methods on the sentence level

On the sentence level, we consider different sentence patterns and relationships between sentences to deal with the sentiment intensity of sentences.

Firstly, we merge the sentiment intensities of the words and phrases in a sentence to obtain the overall sentiment intensity of the sentence.

Secondly, based on punctuation, we segment reviews into sentences of different types, namely declarative, interrogative and exclamatory sentences, and adopt a different strategy for each type. The specific strategies are shown in Table 4.

From a linguistic perspective, an exclamatory sentence strengthens the sentiment of its content. Therefore, when computing the sentiment intensity of an exclamatory sentence, we enhance it by setting the weighting coefficient to 2.

For example, the sentiment intensity of “The pad is very beautiful” is 1.6 according to Section 3.2.2, so the sentiment intensity of the exclamatory sentence “The pad is very beautiful!” is 3.2 (2 × 1.6 = 3.2) based on Table 4.

Thirdly, we analyze the logical relations between sentences, mainly by extracting conjunctions. We focus on adversative and summary relations between sentences. The processing methods are shown in Table 5.

Based on experience and linguistic knowledge, summary sentences reflect the overall sentiment of a review. Therefore, when computing the sentiment intensity of a summary sentence, we strengthen it by setting the weighting coefficient to 2.
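The sentence-level rules can be sketched as follows. The exclamatory weight of 2 and the summary weight of 2 come from the text; the interrogative weight of -1 is a hypothetical stand-in for Table 4 (motivated only by the remark that a question may reverse polarity), adversative handling from Table 5 is not modeled, and combining the two weights by multiplication is also an assumption.

```python
from typing import List, Tuple

# Sentence-type weights keyed by end punctuation (ASCII and full-width).
# "!" -> 2 follows the text; "?" -> -1 is a HYPOTHETICAL stand-in for Table 4.
TYPE_WEIGHT = {".": 1.0, "。": 1.0, "!": 2.0, "！": 2.0, "?": -1.0, "？": -1.0}
SUMMARY_WEIGHT = 2.0  # from the text: summary sentences are weighted by 2


def sentence_intensity(content_si: float, end_punct: str,
                       is_summary: bool = False) -> float:
    weight = TYPE_WEIGHT.get(end_punct, 1.0)  # unknown mark: treat as declarative
    if is_summary:
        weight *= SUMMARY_WEIGHT  # assumption: the two weights multiply
    return weight * content_si


def review_intensity(sentences: List[Tuple[float, str, bool]]) -> float:
    """si(r_i) as the sum of the weighted sentence intensities."""
    return sum(sentence_intensity(si, p, summ) for si, p, summ in sentences)


print(sentence_intensity(1.6, "!"))  # "The pad is very beautiful!" -> about 3.2
```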

3.3 Constructing method of the fuzzy classifier

After obtaining the sentiment intensity of reviews with the multi-granularity computing method, and given the fuzziness of sentiment intensity, we construct a fuzzy classifier for sentiment classification of reviews and propose two self-supervised methods to learn its parameter. We first treat sentiment categories as fuzzy sets and define their membership functions. Then, according to the principle of maximum membership degree, we build a classification function for the fuzzy classifier. Finally, we propose two self-supervised methods to learn the parameter of the classification function.

3.3.1 Designing the membership function of fuzzy sets

Because of the fuzziness of natural language, especially of sentiment intensity, we use fuzzy sets rather than classical (Cantor) sets to describe sentiment categories when estimating the sentiment category of reviews.

Given the review set R = {r_i} and the sentiment intensity si (r_i) of each review r_i, we define the positive sentiment category of R as the fuzzy set P. $P = \{(r_i, \mu_P(r_i)) \mid r_i \in R\}$ (4)

Here μ_P (r_i) is the membership function of r_i in the positive sentiment category P. We choose a semi-trapezoid function as the membership function, as shown in (5). $\mu_P(r_i) = \begin{cases} 0, & si(r_i) < \alpha \\ \frac{si(r_i) - \alpha}{\beta - \alpha}, & \alpha \leq si(r_i) \leq \beta \\ 1, & si(r_i) > \beta \end{cases}$ (5)

Here r_i is a review, si (r_i) is the sentiment intensity of r_i, α, β are adjustable parameters which determine the boundary of membership function.

Similarly, we define the negative sentiment category of R as the fuzzy set N. $N = \{(r_i, \mu_N(r_i)) \mid r_i \in R\}$ (6)

Here μ_N (r_i) is the membership function of r_i in the negative sentiment category N. A semi-trapezoid function is again chosen as the membership function. $\mu_N(r_i) = \begin{cases} 1, & si(r_i) < \alpha \\ \frac{\beta - si(r_i)}{\beta - \alpha}, & \alpha \leq si(r_i) \leq \beta \\ 0, & si(r_i) > \beta \end{cases}$ (7)

Here r_i is a review, si (r_i) is the sentiment intensity of r_i, α, β are adjustable parameters.
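Equations (5) and (7) translate directly into code. The sketch below assumes α < β, as the semi-trapezoid shape requires, and adds a hypothetical `classify` helper that applies the maximum-membership principle, assigning ties to N.

```python
def mu_positive(si: float, alpha: float, beta: float) -> float:
    """Equation (5): semi-trapezoid membership in the positive set P."""
    assert alpha < beta, "the semi-trapezoid requires alpha < beta"
    if si < alpha:
        return 0.0
    if si > beta:
        return 1.0
    return (si - alpha) / (beta - alpha)


def mu_negative(si: float, alpha: float, beta: float) -> float:
    """Equation (7): semi-trapezoid membership in the negative set N."""
    assert alpha < beta, "the semi-trapezoid requires alpha < beta"
    if si < alpha:
        return 1.0
    if si > beta:
        return 0.0
    return (beta - si) / (beta - alpha)


def classify(si: float, alpha: float, beta: float) -> str:
    # Maximum membership; a tie (equal memberships) is assigned to N.
    return "P" if mu_positive(si, alpha, beta) > mu_negative(si, alpha, beta) else "N"


print(mu_positive(0.5, -1.0, 1.0), mu_negative(0.5, -1.0, 1.0))  # 0.75 0.25
```

Because the two functions are mirror images, μ_P(r_i) + μ_N(r_i) = 1 for every review.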

3.3.2 Constructing classification function of the fuzzy classifier

We need to set the values of parameters α, β to complete μ_P (r_i) and μ_N (r_i). After completing μ_P (r_i) and μ_N (r_i), according to the principle of maximum membership, we can identify the sentiment categories of reviews.

In order to decrease the number of parameters, after defining μ_P (r_i) and μ_N (r_i), we unite μ_P (r_i) and μ_N (r_i) into one classification function f_k (si (r_i)) of fuzzy classifier, according to the principle of maximum membership.

$f_k(si(r_i)) = \arg\max_{C \in \{P, N\}} \mu_C(r_i) = \begin{cases} r_i \in N, & si(r_i) \leq \frac{\alpha + \beta}{2} \\ r_i \in P, & si(r_i) > \frac{\alpha + \beta}{2} \end{cases}$ (8)

Here we define $k = \frac{α + β}{2}$

Instead of setting the two parameters α and β directly, we reduce them to a single parameter k through f_k (si (r_i)) and set only the value of k.
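The step from the membership functions (5) and (7) to the threshold rule in (8) can be made explicit. The derivation below assumes α < β, which the semi-trapezoid functions already require.

```latex
\begin{aligned}
&\text{For } \alpha \le si(r_i) \le \beta:\\
&\quad \mu_P(r_i) \ge \mu_N(r_i)
  \iff \frac{si(r_i) - \alpha}{\beta - \alpha} \ge \frac{\beta - si(r_i)}{\beta - \alpha}
  \iff si(r_i) - \alpha \ge \beta - si(r_i)
  \iff si(r_i) \ge \frac{\alpha + \beta}{2} = k.\\
&\text{For } si(r_i) < \alpha:\quad \mu_N(r_i) = 1 > 0 = \mu_P(r_i), \text{ and } si(r_i) < \alpha < k.\\
&\text{For } si(r_i) > \beta:\quad \mu_P(r_i) = 1 > 0 = \mu_N(r_i), \text{ and } si(r_i) > \beta > k.
\end{aligned}
```

At si(r_i) = k both memberships equal 1/2, and (8) assigns such ties to N. Since μ_P(r_i) + μ_N(r_i) = 1 everywhere, only the midpoint k affects the classification result, which is why α and β can be replaced by k.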

3.3.3 Computing parameter k of the classification function

After constructing the classification function of the fuzzy classifier, we set the value of parameter k. At present, unsupervised methods are the mainstream way to set the parameters of the classification function of a fuzzy classifier, while supervised methods are usually not used because labeled training data are lacking.

Parameter k directly determines the threshold of the fuzzy classifier. In order to get an ideal parameter, we propose two self-supervised methods to set the value of parameter k.

1. Self-supervised method based on the initial pseudo-labeled training datasets

Existing unsupervised methods set parameter k of the classification function using only the sentiment intensity of reviews, without labeled training data, but their accuracy is far from satisfactory [27]. We therefore turn to supervised methods. However, existing supervised methods have two weaknesses. First, labeled training data are indispensable. Second, the domain of the training data must match that of the test data, and annotating training data for each domain is very costly.

To overcome these two weaknesses, we develop two self-supervised methods that use pseudo-labeled training data, an approach that, to the best of our knowledge, has not been used before for this purpose. The details are as follows.

Firstly, for each review r_i in review datasets RD = {r_i}, we compute sentiment intensity si (r_i) of r_i by the multi-granularity computing method which is presented in Section 3.2.

Secondly, we sort the reviews in RD = {r_i} in descending order of si (r_i), obtaining a new review sequence RD^T = {r_m}.

Thirdly, for the review sequence RD^T = {r_m}, we denote the number of positive reviews in RD^T as N_P (RD^T), the number of negative reviews as N_N (RD^T), and the total number of reviews as N (RD^T).

Fourthly, with l = 2.5, we select the N_P (RD^T)/l top-ranked reviews of RD^T = {r_m} as the initial positive training dataset PT_RD^T and the N_N (RD^T)/l bottom-ranked reviews as the initial negative training dataset NT_RD^T, as shown in (9) and (10). $PT\_RD^{T} = \{r_m \mid m = 1, 2, \ldots, \frac{N_P(RD^T)}{l}\}$ (9) $NT\_RD^{T} = \{r_m \mid m = N(RD^T) - \frac{N_N(RD^T)}{l} + 1, \ldots, N(RD^T)\}$ (10)

Finally, we obtain the optimum value of parameter k from the initial pseudo-labeled training datasets by equation (11), that is, as the midpoint between the mean sentiment intensities of PT_RD^T and NT_RD^T. $k = \frac{1}{2} \left( \frac{l}{N_P(RD^T)} \sum_{m = 1}^{N_P(RD^T)/l} si(r_m) + \frac{l}{N_N(RD^T)} \sum_{m = N(RD^T) - N_N(RD^T)/l + 1}^{N(RD^T)} si(r_m) \right)$ (11)

Here l = 2.5, N_P (RD^T) is the number of positive reviews in RD^T, N_N (RD^T) is the number of negative reviews in RD^T and N (RD^T) is the number of reviews in RD^T.
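The first self-supervised method can be sketched as a short function over the list of review intensities. It takes N_P(RD^T) and N_N(RD^T) as inputs, as the text does. Rounding N_P/l and N_N/l down to whole reviews (with a minimum of one) is an assumption, since the text does not say how non-integer counts are handled.

```python
from typing import List


def self_supervised_k(intensities: List[float], n_pos: int, n_neg: int,
                      l: float = 2.5) -> float:
    """Equations (9)-(11): estimate k from the initial pseudo-labeled sets.

    intensities: si(r_i) for every review in RD.
    n_pos, n_neg: N_P(RD^T) and N_N(RD^T) as defined in the text.
    Assumption: N_P/l and N_N/l are rounded down (at least one review each).
    """
    ranked = sorted(intensities, reverse=True)   # RD^T in descending si
    top = max(1, int(n_pos / l))
    bottom = max(1, int(n_neg / l))
    pt = ranked[:top]                            # PT_RD^T, equation (9)
    nt = ranked[len(ranked) - bottom:]           # NT_RD^T, equation (10)
    # Equation (11): midpoint of the two mean intensities.
    return 0.5 * (sum(pt) / len(pt) + sum(nt) / len(nt))


si_values = [3.0, 2.0, 1.0, -0.5, -1.0, -4.0]
print(self_supervised_k(si_values, 5, 5))  # 0.0 (means 2.5 and -2.5)
```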

2. Self-supervised method based on the updated initial pseudo-labeled training datasets

The first four steps are the same as those of the self-supervised method above.

In the fifth step, we train an SVM classifier on PT_RD^T and NT_RD^T and use it to classify the remaining reviews in RD^T. If a review has a sentiment intensity greater than 0 and the SVM classifier labels it positive, we add it to PT_RD^T. If a review has a sentiment intensity less than 0 and the SVM classifier labels it negative, we add it to NT_RD^T. This yields new PT_RD^T and NT_RD^T.

In the sixth step, we repeat the fifth step with the new PT_RD^T and NT_RD^T. The iteration ends when no new review is added to PT_RD^T or NT_RD^T.

Finally, we apply equation (11) to the updated pseudo-labeled training datasets to obtain the optimum value of parameter k.
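The iterative expansion in steps five and six can be sketched as a self-training loop. The text uses an SVM but does not specify its features; to keep the sketch self-contained, the classifier is passed in as a function, and a simple nearest-centroid trainer over hypothetical 1-D feature vectors stands in for the SVM. In practice one would plug in an SVM trained on whatever review representation is used.

```python
from typing import Callable, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]
Predictor = Callable[[Vector], int]


def nearest_centroid_trainer(xs: List[Vector], ys: List[int]) -> Predictor:
    """Stand-in for the SVM: labels +1/-1, both classes must be present."""
    def centroid(label: int) -> Vector:
        pts = [x for x, y in zip(xs, ys) if y == label]
        return tuple(sum(col) / len(pts) for col in zip(*pts))

    c_pos, c_neg = centroid(1), centroid(-1)

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return lambda x: 1 if sq_dist(x, c_pos) <= sq_dist(x, c_neg) else -1


def expand_pseudo_labels(features: Sequence[Vector], si: Sequence[float],
                         pos_idx: List[int], neg_idx: List[int],
                         trainer: Callable[[List[Vector], List[int]], Predictor]
                         ) -> Tuple[List[int], List[int]]:
    """Steps 5-6: grow PT_RD^T and NT_RD^T until no review is added."""
    pos, neg = set(pos_idx), set(neg_idx)
    while True:
        labeled = sorted(pos | neg)
        predict = trainer([features[i] for i in labeled],
                          [1 if i in pos else -1 for i in labeled])
        new_pos: Set[int] = set()
        new_neg: Set[int] = set()
        for i in range(len(features)):
            if i in pos or i in neg:
                continue  # only the remaining reviews are classified
            label = predict(features[i])
            if si[i] > 0 and label == 1:
                new_pos.add(i)
            elif si[i] < 0 and label == -1:
                new_neg.add(i)
        if not new_pos and not new_neg:
            return sorted(pos), sorted(neg)
        pos |= new_pos
        neg |= new_neg


def k_from_sets(si: Sequence[float], pos: List[int], neg: List[int]) -> float:
    # Equation (11) on the expanded sets: midpoint of the two mean intensities.
    mean_pos = sum(si[i] for i in pos) / len(pos)
    mean_neg = sum(si[i] for i in neg) / len(neg)
    return 0.5 * (mean_pos + mean_neg)
```

A review whose classifier label disagrees with the sign of its intensity (for example, predicted positive but with si < 0) is never added, so the loop terminates once only such conflicting reviews remain.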

The values of parameter k on the four datasets are presented in Table 10 in Section 5.2.

4 Construction of KSML and QWSL

In existing Chinese sentiment lexicons, sentiment words are divided into two classes, positive and negative, and the sentiment intensity of sentiment words is not further quantified. To address this problem, based on three existing Chinese sentiment lexicons (the Tsinghua University sentiment lexicon (TUSL), the National Taiwan University sentiment lexicon (NTUSL) and Hownet), we first construct a KSML and then, based on KSML, construct a QWSL. The whole process of our method is shown in Fig. 3.

Our method includes three steps. The first step is to construct a key sentiment word set (KSWS), which consists of a positive key sentiment word set (P_KSWS) and a negative key sentiment word set (N_KSWS). We propose a method to extract unambiguous P_KSWS and N_KSWS.

The second step is to construct a key sentiment morpheme lexicon (KSML), which consists of a positive key sentiment morpheme list (P_KSML) and a negative key sentiment morpheme list (N_KSML). We first split each sentiment word in KSWS into morphemes and collect these morphemes into a key sentiment morpheme set (KSMS). Then we compute the sentiment intensity of each morpheme in KSMS with a statistical method.

The third step is to construct a quantified whole sentiment lexicon (QWSL), which consists of an unambiguous positive whole sentiment lexicon (P_WSL) and an unambiguous negative whole sentiment lexicon (N_WSL). We first construct a whole sentiment word set (WSWS) by merging typical Chinese sentiment lexicons, and then compute the sentiment intensity of each word in WSWS using the morphemes in KSML. This yields QWSL.

4.1 Constructing KSWS

When constructing KSWS, we notice that the polarities of some sentiment words are inconsistent across sentiment lexicons. Each of the three existing Chinese sentiment lexicons SL_i consists of a positive sentiment word list (P_SWL_i) and a negative sentiment word list (N_SWL_i), and some sentiment words have ambiguous polarity. Table 6 presents the number of sentiment words whose polarities are ambiguous between P_SWL_i and N_SWL_i.

Table 6 shows that some sentiment words have ambiguous polarities between P_SWL_i and N_SWL_i, which demonstrates that the polarity of sentiment words is not always consistent across sentiment lexicons. We therefore eliminate these ambiguous sentiment words from P_SWL_i and N_SWL_i. Finally, we construct KSWS by choosing the sentiment words that are unambiguous in polarity and included in at least two sentiment lexicons, as shown in (12) and (13). $\text{P\_KSWS} = \bigcup_{i<j} (\text{P\_SWL}_i \cap \text{P\_SWL}_j) - \left[ \left( \bigcup_{i<j} (\text{P\_SWL}_i \cap \text{P\_SWL}_j) \right) \cap \left( \bigcup_{i<j} (\text{N\_SWL}_i \cap \text{N\_SWL}_j) \right) \right]$ (12) $\text{N\_KSWS} = \bigcup_{i<j} (\text{N\_SWL}_i \cap \text{N\_SWL}_j) - \left[ \left( \bigcup_{i<j} (\text{P\_SWL}_i \cap \text{P\_SWL}_j) \right) \cap \left( \bigcup_{i<j} (\text{N\_SWL}_i \cap \text{N\_SWL}_j) \right) \right]$ (13)

Here i, j ∈ {1, 2, 3}. The numbers of sentiment words in SL_i and KSWS are presented in Table 7.
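Equations (12) and (13) map onto plain set operations. In this sketch each lexicon's positive and negative lists are Python sets; the toy word lists in the example are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple


def in_at_least_two(lists: List[Set[str]]) -> Set[str]:
    """Union over i < j of (L_i ∩ L_j): words present in two or more lists."""
    result: Set[str] = set()
    for a, b in combinations(lists, 2):
        result |= a & b
    return result


def build_ksws(pos_lists: List[Set[str]],
               neg_lists: List[Set[str]]) -> Tuple[Set[str], Set[str]]:
    pos2 = in_at_least_two(pos_lists)
    neg2 = in_at_least_two(neg_lists)
    conflict = pos2 & neg2  # the bracketed term shared by (12) and (13)
    return pos2 - conflict, neg2 - conflict


# Toy lists for three lexicons (hypothetical contents).
pos = [{"好", "美丽", "快"}, {"好", "美丽", "快"}, {"好", "便宜"}]
neg = [{"差", "快"}, {"差", "贵"}, {"贵", "快"}]
p_ksws, n_ksws = build_ksws(pos, neg)
print(sorted(p_ksws), sorted(n_ksws))
```

Here "快" is dropped because it reaches the two-lexicon threshold on both sides, and "便宜" is dropped because only one lexicon lists it.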

4.2 Constructing KSML

KSML consists of sentiment morphemes and their sentiment intensities. Constructing the sentiment morphemes and computing their sentiment intensities involves four key steps.

1. For each word in KSWS, we split it into morphemes and consequently construct a key sentiment morpheme set (KSMS).

2. For each morpheme m_i in KSMS, we calculate its positive frequency f_p (m_i) of appearing in P_KSWS and its negative frequency f_n (m_i) of appearing in N_KSWS by (14) and (15). $f_p(m_i) = \frac{number(m_i, \text{P\_KSWS})}{number(\text{P\_KSWS})}$ (14) $f_n(m_i) = \frac{number(m_i, \text{N\_KSWS})}{number(\text{N\_KSWS})}$ (15)

Here number (m_i, P_KSWS) is the number of positive sentiment words that contain morpheme m_i, number (m_i, N_KSWS) is the number of negative sentiment words that contain m_i, number (P_KSWS) is the number of sentiment words in P_KSWS, and number (N_KSWS) is the number of sentiment words in N_KSWS.

3. For each morpheme m_i in KSMS, we compute its positive sentiment intensity si_p (m_i) and negative sentiment intensity si_n (m_i) by (16) and (17). $si_p(m_i) = \frac{f_p(m_i)}{f_p(m_i) + f_n(m_i)}$ (16) $si_n(m_i) = \frac{f_n(m_i)}{f_p(m_i) + f_n(m_i)}$ (17)

4. With si_p (m_i) and si_n (m_i) available, we compute the sentiment intensity si (m_i) of m_i by (18). $si(m_i) = si_p(m_i) - si_n(m_i)$ (18)
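Steps 1 to 4 can be sketched as follows, assuming again that each Chinese character is one morpheme. A word is counted once per morpheme even if the character repeats, matching "the number of positive sentiment words containing the morpheme". The toy word sets are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def morpheme_counts(words: Iterable[str]) -> Counter:
    """number(m, set): how many words in the set contain morpheme m."""
    counts: Counter = Counter()
    for w in words:
        counts.update(set(w))  # assumption: one Chinese character = one morpheme
    return counts


def build_ksml(p_ksws: List[str], n_ksws: List[str]) -> Dict[str, float]:
    pos_c, neg_c = morpheme_counts(p_ksws), morpheme_counts(n_ksws)
    n_p, n_n = len(p_ksws), len(n_ksws)
    ksml: Dict[str, float] = {}
    for m in set(pos_c) | set(neg_c):            # KSMS
        f_p = pos_c[m] / n_p if n_p else 0.0     # equation (14)
        f_n = neg_c[m] / n_n if n_n else 0.0     # equation (15)
        si_p = f_p / (f_p + f_n)                 # equation (16)
        si_n = f_n / (f_p + f_n)                 # equation (17)
        ksml[m] = si_p - si_n                    # equation (18)
    return ksml


ksml = build_ksml(["美丽", "美好", "好看"], ["丑陋", "不好"])
print(round(ksml["好"], 4))  # 0.1429: f_p = 2/3, f_n = 1/2, so si = 1/7
```

The denominator f_p + f_n is never zero, because every morpheme in KSMS comes from at least one word in P_KSWS or N_KSWS.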

4.3 Constructing QWSL

We combine the three typical Chinese sentiment lexicons (TUSL, NTUSL and Hownet) into a quantified whole sentiment lexicon (QWSL). To ensure that the polarity of the sentiment words is unambiguous, we eliminate sentiment words with ambiguous polarities from QWSL. The specific method of constructing QWSL is as follows.

Given the lexicons SL_i, each consisting of P_SWL_i and N_SWL_i, we build an unambiguous positive whole sentiment word set (P_WSWS) and an unambiguous negative whole sentiment word set (N_WSWS) according to (19) and (20). $\text{P\_WSWS} = \bigcup_i \text{P\_SWL}_i - \left( \bigcup_i \text{P\_SWL}_i \cap \bigcup_i \text{N\_SWL}_i \right)$ (19) $\text{N\_WSWS} = \bigcup_i \text{N\_SWL}_i - \left( \bigcup_i \text{P\_SWL}_i \cap \bigcup_i \text{N\_SWL}_i \right)$ (20)

Here i ∈ {1, 2, 3} indexes the three lexicons. After obtaining the unambiguous whole sentiment word set (WSWS), we compute, for each word w_j in WSWS, its sentiment intensity si(w_j) according to (21). $si(w_{j}) = \frac{1}{\mathrm{number}(m_{i}, w_{j})} \sum_{i = 1}^{\mathrm{number}(m_{i}, w_{j})} si(m_{i})$ (21)

Here number(m_i, w_j) is the number of morphemes m_i contained in the sentiment word w_j, and si(m_i) is the sentiment intensity of m_i given by (18). Finally, the words w_j together with their intensities si(w_j) make up QWSL.
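Equations (19) to (21) can be sketched as plain set operations followed by an average. Note that (19) is equivalent to the union of the P_SWL_i minus the union of the N_SWL_i, since removing the intersection from a set removes exactly the elements it shares with the other set. The sketch below is illustrative and rests on hypothetical conventions that the text does not fix: one morpheme per character, WSWS taken as P_WSWS ∪ N_WSWS, and morphemes absent from KSMS treated as neutral (0.0) while still counted in number(m_i, w_j). The function names are made up for this example.

```python
def build_wsws(lexicons: list[tuple[set[str], set[str]]]) -> tuple[set[str], set[str]]:
    """Eqs. (19)-(20): drop words that are positive somewhere and negative somewhere."""
    all_pos = set().union(*(p for p, _ in lexicons))  # union of the P_SWL_i
    all_neg = set().union(*(n for _, n in lexicons))  # union of the N_SWL_i
    ambiguous = all_pos & all_neg
    return all_pos - ambiguous, all_neg - ambiguous


def word_intensity(word: str, si_morpheme: dict[str, float]) -> float:
    """Eq. (21): mean of si(m_i) over the morphemes of `word`.

    Hypothetical conventions: one morpheme per character, and a morpheme
    missing from `si_morpheme` (i.e. not in KSMS) counts as neutral 0.0.
    """
    if not word:
        raise ValueError("empty word")
    # len(word) plays the role of number(m_i, w_j) under the one-char convention.
    return sum(si_morpheme.get(m, 0.0) for m in word) / len(word)


def build_qwsl(lexicons: list[tuple[set[str], set[str]]],
               si_morpheme: dict[str, float]) -> dict[str, float]:
    """QWSL: every unambiguous word paired with its sentiment intensity."""
    p_wsws, n_wsws = build_wsws(lexicons)  # WSWS assumed to be their union
    return {w: word_intensity(w, si_morpheme) for w in p_wsws | n_wsws}
```

With three toy lexicons in which 美好 is listed as positive in one and negative in another, `build_wsws` drops 美好 from both sides, and `build_qwsl` scores the remaining words by averaging their characters' intensities.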

5 Performance evaluation

In this section, we first describe the datasets and metrics. Second, we discuss how to choose the optimum parameter k of MGFCM. Third, we compare the performance of different methods and demonstrate the effectiveness of MGFCM. Fourth, we report statistical significance tests of the different methods. Finally, we discuss the effect that different values of parameter k have on the accuracy of MGFCM.

5.1 Datasets and metric

To demonstrate the effectiveness of our model across different domains and on unbalanced data, we chose three balanced datasets (books, hotels and notebooks) and one unbalanced dataset (hotels) from different domains as test datasets. The four datasets were provided by Songbo Tan (http://www.datatang.com/data/11936/). Each dataset consists of positive and negative reviews. We refer to the four datasets as Books, Hotels, Notebooks and Hotels(U); they are summarized in Table 8.

Since our task is sentiment classification of reviews, we use four classification metrics: macro-averaged precision, recall and F1 measure, and accuracy. They are defined as follows. $\mathrm{Macro\_P} = \frac{1}{2} \left( \frac{R_{1}}{R_{1} + W_{2}} + \frac{R_{2}}{R_{2} + W_{1}} \right)$ (22) $\mathrm{Macro\_R} = \frac{1}{2} \left( \frac{R_{1}}{R_{1} + W_{1}} + \frac{R_{2}}{R_{2} + W_{2}} \right)$ (23) $\mathrm{Macro\_F1} = \frac{2 \times \mathrm{Macro\_P} \times \mathrm{Macro\_R}}{\mathrm{Macro\_P} + \mathrm{Macro\_R}}$ (24) $\mathrm{Accuracy} = \frac{R_{1} + R_{2}}{W_{1} + W_{2} + R_{1} + R_{2}}$ (25)

Here R₁, R₂, W₁, W₂ are defined in Table 9.
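Table 9 is not reproduced in this section. From the form of (22) and (23), R₁ and R₂ appear to be the correctly classified reviews of classes 1 and 2, W₁ the class-1 reviews misclassified as class 2, and W₂ the class-2 reviews misclassified as class 1; the sketch below assumes that reading. The zero-denominator guard is a hypothetical convenience, not part of the definitions.

```python
def classification_metrics(r1: int, r2: int, w1: int, w2: int) -> dict[str, float]:
    """Macro precision, macro recall, macro F1 and accuracy per eqs. (22)-(25)."""

    def ratio(num: float, den: float) -> float:
        # Guard against empty classes; returning 0.0 is a hypothetical choice.
        return num / den if den else 0.0

    # Eq. (22): per-class precision, averaged over the two classes.
    macro_p = 0.5 * (ratio(r1, r1 + w2) + ratio(r2, r2 + w1))
    # Eq. (23): per-class recall, averaged over the two classes.
    macro_r = 0.5 * (ratio(r1, r1 + w1) + ratio(r2, r2 + w2))
    # Eq. (24): harmonic mean of the macro-averaged precision and recall.
    macro_f1 = ratio(2 * macro_p * macro_r, macro_p + macro_r)
    # Eq. (25): fraction of all reviews classified correctly.
    accuracy = ratio(r1 + r2, r1 + r2 + w1 + w2)
    return {"macro_p": macro_p, "macro_r": macro_r,
            "macro_f1": macro_f1, "accuracy": accuracy}
```

For instance, `classification_metrics(40, 30, 10, 20)` gives Macro_P ≈ 0.7083, Macro_R ≈ 0.7, Macro_F1 ≈ 0.7041 and accuracy 0.7. Note that (24) takes the harmonic mean of the macro-averaged precision and recall, which can differ from averaging the two per-class F1 scores; the sketch follows (24) as written.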

To evaluate the overall performance of our model, we compared it with existing methods and with several variants of our own model on the four datasets. The methods are described as follows:

MBSL: the method based on a sentiment lexicon presented in [10]. MBSL computes the sentiment intensity of a review using only sentiment words and sentiment phrases. The sentiment intensity of a sentiment word is obtained by looking it up in the sentiment lexicon, and that of a sentiment phrase is obtained by taking into account the effects of negation adverbs and degree adverbs. MBSL then classifies a review by comparing its sentiment intensity with the threshold “0”.

MBSLFS: the method based on a sentiment lexicon and fuzzy sets presented in [27], which was also designed for sentiment classification of Chinese reviews. Like MBSL, MBSLFS computes the sentiment intensity of a review using only sentiment words and sentiment phrases. Unlike MBSL, however, it classifies a review by comparing its sentiment intensity with a threshold “k” rather than “0”.

MBMGC: the method based on multi-granularity computing, which computes the sentiment intensity of a review with the multi-granularity computing method and compares it with a single threshold “0”.

MBMGFCM1: the method based on the multi-granularity fuzzy computing model whose parameter is set by the self-supervised method using the initial pseudo-labeled training datasets.

MBMGFCM2: the method based on the multi-granularity fuzzy computing model whose parameter is set by the self-supervised method using the updated initial pseudo-labeled training datasets.

To further demonstrate the effectiveness of our methods, we also built a supervised method that learns parameter k of MGFCM. For each dataset, we randomly chose 40% of the reviews as the training set and used the remaining 60% as the test set. We used the training set to learn parameter k of MGFCM and the test set to evaluate its performance. We repeated this experiment 10 times and report the average over the 10 runs. The method is described as follows.

MBMGFCML: the method based on the multi-granularity fuzzy computing model whose parameter is set using the labeled training datasets. The only difference between MBMGFCML and MBMGFCM1 is that MBMGFCML estimates the parameter k of the fuzzy classifier from labeled training data, whereas MBMGFCM1 uses the initial pseudo-labeled training data.
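The 40%/60% protocol with 10 repetitions can be sketched as follows. The fuzzy classifier's classification function belongs to Section 3.3 and is not reproduced here, so `accuracy_at` uses a hypothetical crisp stand-in (predict positive when a review's intensity exceeds k), and k is learned by a hypothetical grid search that maximizes training accuracy. Both are assumptions rather than the authors' procedure; `data` holds (sentiment intensity, gold label) pairs with 1 for positive and 0 for negative.

```python
import random
from statistics import mean


def accuracy_at(k: float, data: list[tuple[float, int]]) -> float:
    """Accuracy of a hypothetical crisp rule: label 1 (positive) if score > k."""
    correct = sum((1 if score > k else 0) == label for score, label in data)
    return correct / len(data)


def repeated_split_eval(data: list[tuple[float, int]], candidate_ks: list[float],
                        runs: int = 10, train_frac: float = 0.4,
                        seed: int = 0) -> float:
    """Average test accuracy over `runs` random train/test splits (40%/60%)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        # Learn k on the labeled training split; max() keeps the first best k.
        best_k = max(candidate_ks, key=lambda k: accuracy_at(k, train))
        scores.append(accuracy_at(best_k, test))
    return mean(scores)
```

Fixing the random seed makes the 10 splits reproducible; swapping `accuracy_at` for the actual fuzzy classification function would leave the split-learn-test-average loop unchanged.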

We conducted experiments on the four datasets to address three questions:

1. How should the parameter k in the classification function of the fuzzy classifier be set?

2. How well does our model perform in sentiment classification of reviews?

3. How do different values of parameter k affect the accuracy of our model?

5.2 Parameter initialization of MGFCM

In MGFCM, we proposed two self-supervised methods to learn parameter k: one based on the initial pseudo-labeled training datasets and the other based on the updated initial pseudo-labeled training datasets.

First, we took 40% of the reviews from both ends of the list of reviews sorted in descending order as the initial pseudo-labeled training datasets. We then applied the two self-supervised methods described in Section 3.3.3 to find the optimum value of parameter k, obtaining one optimum value from each method. The resulting parameter values are shown in Table 10.
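Selecting the initial pseudo-labeled data can be sketched as below. This is a hypothetical illustration: it assumes the reviews are ranked by the sentiment intensity computed by the multi-granularity method, and that the 40% is split evenly between the two ends (the text does not say how it is divided). The update step used by MBMGFCM2 is not shown.

```python
def initial_pseudo_labels(reviews: list[tuple[str, float]],
                          fraction: float = 0.4) -> list[tuple[str, int]]:
    """Pseudo-label the reviews at both ends of the intensity ranking.

    Hypothetical convention: `fraction` is split evenly, so the top
    fraction/2 are labelled positive (1) and the bottom fraction/2 negative (0).
    """
    ranked = sorted(reviews, key=lambda r: r[1], reverse=True)  # descending
    n_end = int(len(ranked) * fraction / 2)
    if n_end == 0:
        return []  # too few reviews to take anything from either end
    top = [(text, 1) for text, _ in ranked[:n_end]]
    bottom = [(text, 0) for text, _ in ranked[-n_end:]]
    return top + bottom
```

With ten reviews scored 0 to 9, the two highest-scored reviews are labelled positive and the two lowest-scored negative; the middle 60% stay unlabelled.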

5.3 Performance comparison of different methods

To verify the performance of our model, we compared our methods (MBMGC, MBMGFCM1 and MBMGFCM2) with MBSL, MBSLFS and MBMGFCML on the four datasets. The results are shown in Table 11 and Fig. 4.

Table 11 and Fig. 4 show that all four classification indicators (precision, recall, F1 and accuracy) of MBMGC are higher than those of MBSL on all four datasets. The average accuracies of MBMGC and MBSL over the four datasets are 0.7748 and 0.7141, respectively, so MBMGC is about 8.5% ((0.7748-0.7141)/0.7141) more accurate than MBSL. This indicates that the multi-granularity computing method is more effective and accurate than methods that consider only the sentiment polarity or sentiment intensity of words and phrases.

We can also see that the accuracy of MBSLFS is higher than that of MBSL, and that the accuracies of MBMGFCM1 and MBMGFCM2 are higher than that of MBMGC on all four datasets. MBMGFCM1 and MBMGFCM2 improve accuracy over MBMGC by about 3.77% ((0.8040-0.7748)/0.7748) and 4.48% ((0.8095-0.7748)/0.7748), respectively. This shows that sentiment classifiers based on fuzzy sets are more effective than those based on classical (Cantor) sets.

The average accuracies of MBMGFCM1 and MBMGFCM2 are 0.8040 and 0.8095, so MBMGFCM2 is 0.68% ((0.8095-0.8040)/0.8040) more accurate than MBMGFCM1. This demonstrates that, for setting the parameter k of MGFCM, the self-supervised method based on the updated initial pseudo-labeled training datasets is superior to the one based on the initial pseudo-labeled training datasets.

On the Hotels and Hotels(U) datasets, MBMGFCM1 improves accuracy over MBMGC by only 0.28% ((0.7965-0.7943)/0.7943) and 0.35% ((0.823-0.8201)/0.8201), respectively. On the Books and Notebooks datasets, however, the improvements are 8.9% ((0.8073-0.741)/0.741) and 6.1% ((0.7893-0.7438)/0.7438). This shows that the performance of our methods varies across domains, which we attribute to differences in the distribution of sentiment words across domains.
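As a quick consistency check on the figures in the paragraph above, the per-dataset accuracies can be averaged and the relative gains recomputed with the (new - old)/old formula used throughout this section. The numbers are copied from the text; nothing new is introduced.

```python
from statistics import mean

# Per-dataset accuracies quoted in the text.
mbmgc = {"Hotels": 0.7943, "Hotels(U)": 0.8201, "Books": 0.741, "Notebooks": 0.7438}
mbmgfcm1 = {"Hotels": 0.7965, "Hotels(U)": 0.823, "Books": 0.8073, "Notebooks": 0.7893}


def relative_gain(new: float, old: float) -> float:
    """Relative accuracy improvement, (new - old) / old."""
    return (new - old) / old


gains = {d: relative_gain(mbmgfcm1[d], mbmgc[d]) for d in mbmgc}
for d, g in gains.items():
    print(f"{d}: {g:.2%}")
print(f"averages: {mean(mbmgc.values()):.4f} vs {mean(mbmgfcm1.values()):.4f}")
```

The gains print as 0.28%, 0.35%, 8.95% and 6.12%, which round to the quoted 0.28%, 0.35%, 8.9% and 6.1%, and the per-dataset accuracies average to 0.7748 and 0.8040, the averages reported for MBMGC and MBMGFCM1.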

The average accuracies of MBMGFCML and MBMGFCM1 are 0.8048 and 0.8040, so MBMGFCML is only 0.1% ((0.8048-0.8040)/0.8040) more accurate than MBMGFCM1. This demonstrates that the self-supervised MBMGFCM1 is competitive with MBMGFCML, which uses labeled training data. Moreover, MBMGFCM2 is 0.58% ((0.8095-0.8048)/0.8048) more accurate than MBMGFCML, which shows that MBMGFCM2, trained on more pseudo-labeled data, outperforms MBMGFCML, trained on less labeled data.
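The relative improvements quoted for the average accuracies in this subsection can be recomputed in one place. The averages below are the ones reported in the text; the pairs list each comparison the text makes.

```python
# Average accuracies over the four datasets, as reported in this section.
avg = {"MBSL": 0.7141, "MBMGC": 0.7748, "MBMGFCM1": 0.8040,
       "MBMGFCM2": 0.8095, "MBMGFCML": 0.8048}

# (better, baseline) pairs compared in the text.
pairs = [("MBMGC", "MBSL"), ("MBMGFCM1", "MBMGC"), ("MBMGFCM2", "MBMGC"),
         ("MBMGFCM2", "MBMGFCM1"), ("MBMGFCML", "MBMGFCM1"),
         ("MBMGFCM2", "MBMGFCML")]

gains = {}
for better, base in pairs:
    # Relative improvement (new - old) / old, as used throughout the text.
    gains[(better, base)] = (avg[better] - avg[base]) / avg[base]
    print(f"{better} vs {base}: {gains[(better, base)]:.2%}")
```

Running it prints 8.50%, 3.77%, 4.48%, 0.68%, 0.10% and 0.58%, matching the figures quoted in this subsection.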

5.4 Statistical significance tests of different methods

To measure the statistical significance of the differences between our methods and the other models, we first applied a chi-squared test across all six methods; the results are shown in Table 12. We then applied the chi-squared test between each of our methods and each of the two baselines; the results are shown in Table 13.

Table 12 shows that the chi-squared statistic χ² exceeds the critical value $\chi^{2}_{0.05}(5) = 11.07$ on all four datasets, so at the α = 0.05 level there are significant differences among the six methods.

Furthermore, Table 13 shows that the χ² values between each of our four methods (MBMGC, MBMGFCML, MBMGFCM1 and MBMGFCM2) and each of the two baselines (MBSL and MBSLFS) are all greater than the critical value $\chi^{2}_{0.05}(1) = 3.84$ on all four datasets. Hence, at the α = 0.05 level, our methods differ significantly from the two baseline methods.
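The text does not give the contingency tables behind Tables 12 and 13. A plausible setup, assumed here, is one row per method holding [correct, incorrect] counts: six methods give a 6 × 2 table with (6 - 1)(2 - 1) = 5 degrees of freedom, and a pairwise comparison gives a 2 × 2 table with 1 degree of freedom, consistent with the critical values 11.07 and 3.84 quoted above. The Pearson statistic is sketched below without a continuity correction; the counts in the example are invented for illustration and are not the paper's data.

```python
def chi_squared(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    No continuity (Yates) correction is applied.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis that all methods perform alike.
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Critical values at alpha = 0.05 quoted in the text, keyed by degrees of freedom.
CRITICAL_005 = {1: 3.84, 5: 11.07}

# Illustrative counts only (not the paper's data): [correct, incorrect] per method.
example = [[800, 200], [700, 300]]
stat, dof = chi_squared(example)
print(f"chi2 = {stat:.2f}, dof = {dof}, significant = {stat > CRITICAL_005[dof]}")
```

For the example table this prints `chi2 = 26.67, dof = 1, significant = True`.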

5.5 Parameter optimization of MGFCM

To study how the choice of parameter k affects the accuracy of sentiment classification, we ran experiments with different values of k on the four datasets. The results are presented in Fig. 5.

Fig. 5 shows that the accuracy of sentiment classification varies with k, so different choices of k can lead to different results. In general, however, MGFCM consistently achieves higher accuracy than MBSL and MBSLFS.

6 Conclusion

In this paper, we propose a multi-granularity fuzzy computing model for sentiment classification of Chinese reviews. The main contributions of this paper are twofold. First, a multi-granularity computing method is devised to compute the sentiment intensity of reviews by synthesizing language units at all levels. Second, given the fuzziness of sentiment intensity, a fuzzy classifier is built to facilitate sentiment classification of reviews. Furthermore, two self-supervised methods are designed to set an optimum parameter for the classification function of the fuzzy classifier.

Acknowledgments

We thank Lei Liu, Qing Cheng and the anonymous reviewers for helpful comments. This work was supported in part by the National Natural Science Foundation of China (U1405254, 61472092, 61402115, 61271392).

References

1. Pang and Lee, Opinion mining and sentiment analysis, Found Trends Inf Retr 2(1-2) (2008), 1–135.

2. Thelwall, Buckley and Paltoglou, Sentiment in Twitter events, J Am Soc Inf Sci Technol 62(2) (2011), 406–418.

3. O’Leary D.E., Blog mining-review and extensions: “From each according to his opinion”, Decis Support Syst 51(4) (2011), 821–830.

4. Bing, Sentiment analysis and opinion mining, Synthesis Lectures on Human Language Technologies 5(1) (2012), 1–167.

5. Kim and Lee, Sentiment visualization and classification via semi-supervised nonlinear dimensionality reduction, Pattern Recognition 47(2) (2014), 758–768.

6. Pang, Lee and Vaithyanathan, Thumbs up?: Sentiment classification using machine learning techniques, In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing – Volume 10, EMNLP ’02, Stroudsburg, PA, USA, 2002, pp. 79–86. Association for Computational Linguistics.

7. Wang B.-K., Huang Y.-F., Yang W.-X. and Xing, Short text classification based on strong feature thesaurus, Journal of Zhejiang University Science C 13(9) (2012), 649–659.

8. Xiao and Guo, Feature space independent semisupervised domain adaptation via kernel matching, IEEE Transactions on Pattern Analysis and Machine Intelligence 37(1) (2015), 54–66.

9. Pan S.J., Ni, Sun J.-T., Yang and Chen, Cross-domain sentiment classification via spectral feature alignment, In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, New York, NY, USA, 2010, pp. 751–760. ACM.

10. Zhai, Xu and Jia, An empirical study of unsupervised sentiment classification of Chinese reviews, Tsinghua Science & Technology 15(6) (2010), 702–708.

11. Turney P.D., Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews, In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, Stroudsburg, PA, USA, 2002, pp. 417–424. Association for Computational Linguistics.

12. Taboada, Brooke, Tofiloski, Voll and Stede, Lexicon-based methods for sentiment analysis, Comput Linguist 37(2) (2011), 267–307.

13. Lewis D.D., Naive (Bayes) at forty: The independence assumption in information retrieval, In Proceedings of the 10th European Conference on Machine Learning, ECML ’98, London, UK, Springer-Verlag, 1998, pp. 4–15.

14. Joachims, Text categorization with support vector machines: Learning with many relevant features, In Proceedings of the 10th European Conference on Machine Learning, ECML ’98, London, UK, Springer-Verlag, 1998, pp. 137–142.

15. Huang G.-B., Zhu Q.-Y. and Siew C.-K., Extreme learning machine: Theory and applications, Neurocomputing 70(1-3) (2006), 489–501.

16. Neviarouskaya, Prendinger and Ishizuka, Sentiful: A lexicon for sentiment analysis, IEEE Trans Affect Comput 2(1) (2011), 22–36.

17. Tan and Zhang, An empirical study of sentiment analysis for Chinese documents, Expert Systems with Applications 34(4) (2008), 2622–2629.

18. Neviarouskaya, Prendinger and Ishizuka, Affect analysis model: Novel rule-based approach to affect sensing from text, Nat Lang Eng 17(1) (2011), 95–135.

19. Mohammad, Dunne and Dorr, Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus, In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 – Volume 2, EMNLP ’09, Stroudsburg, PA, USA, 2009, pp. 599–608. Association for Computational Linguistics.

20. Williams G.K. and Anand S.S., Predicting the polarity strength of adjectives using WordNet, In Adar E., Hurst M., Finin T., Glance N.S., Nicolov N. and Tseng B.L., editors, ICWSM, The AAAI Press, 2009.

21. Dragut E.C., Yu, Sistla and Meng, Construction of a sentimental word dictionary, In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM ’10, New York, NY, USA, 2010, pp. 1761–1764. ACM.

22. Kanayama and Nasukawa, Fully automatic lexicon expansion for domain-oriented sentiment analysis, In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP ’06, Stroudsburg, PA, USA, 2006, pp. 355–363. Association for Computational Linguistics.

23. Wang, Min, Huang, Liu, Li, Sun and Sun, Chinese reviews sentiment classification based on quantified sentiment lexicon and fuzzy set, In 2013 International Conference on Information Science and Technology (ICIST), 2013, pp. 677–680.

24. Yuen R.W.M., Chan T.Y.W., Lai T.B.Y., Kwong O.Y. and T’sou B.K.Y., Morpheme-based derivation of bipolar semantic orientation of Chinese words, In Proceedings of the 20th International Conference on Computational Linguistics, COLING ’04, Stroudsburg, PA, USA, 2004. Association for Computational Linguistics.

25. L.-W., Lee L.-Y. and Chen H.-H., Opinion extraction, summarization and tracking in news and blog corpora, In Proceedings of AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs, 2006.

26. L.-W., Huang T.-H. and Chen H.-H., Using morphological and syntactic structures for Chinese opinion analysis, In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 – Volume 3, EMNLP ’09, Stroudsburg, PA, USA, 2009, pp. 1260–1269. Association for Computational Linguistics.

27. , Kit and Webster J.J., Chinese word segmentation as morpheme-based lexical chunking, Inf Sci 178(9) (2008), 2282–2296.

28. Baccianella, Esuli and Sebastiani, SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining, In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta, Malta, 2010. European Language Resources Association (ELRA).

29. Zhou, Chen and Wang, Fuzzy deep belief networks for semi-supervised sentiment classification, Neurocomputing 131 (2014), 312–322.

30. and Wang, Chinese sentence-level sentiment classification based on fuzzy sets, In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING ’10, Stroudsburg, PA, USA, 2010, pp. 312–319. Association for Computational Linguistics.

31. Wang, Huang, Wu and Xing, A fuzzy computing model for identifying polarity of Chinese sentiment words, Computational Intelligence and Neuroscience 2015 (2015), 13.

32. Wang, Qian and Feng X.-Q., Predicting consumer sentiments using online sequential extreme learning machine and intuitionistic fuzzy sets, Neural Computing and Applications 22(3-4) (2013), 479–489.

33. Morente-Molinera J.A., Pérez I.J., Ureña M.R. and Herrera-Viedma, On multi-granular fuzzy linguistic modeling in group decision making problems: A systematic review and future trends, Knowledge-Based Systems 74 (2015), 49–60.