Semi-supervised collective extraction of opinion target and opinion word from online reviews based on active labeling

Abstract

Online reviews play important roles in many Web applications such as e-business and government intelligence, since such user-generated content (UGC) contains rich user opinions. Opinion targets and opinion words are a pair of core objects for expressing user opinions in reviews, and extracting them is crucial for opinion mining. However, traditional extraction methods have various limitations, such as ignoring the opinion relationship, restricting the word span, and propagating errors through iterative expansion, all of which reduce the extraction performance. To address these deficiencies, we first propose a supervised method based on the constrained word alignment model to extract opinion targets and opinion words collectively. To tackle the time-consuming and error-prone manual annotation required by the supervised method, we further devise a semi-supervised extraction method based on active learning. In this method, we design a sample uncertainty-based sampling strategy and a feature evidence-based one to choose the most informative samples for manual labeling. Finally, a series of experiments on a real-world dataset shows that our approaches significantly outperform several state-of-the-art baselines.

Keywords

Collective extraction; opinion target; opinion word; active learning; uncertainty measurement

1 Introduction

Opinion mining, also called sentiment analysis, has received a great deal of attention in the past few years [1]. It aims at understanding the opinions that users express in various media such as text, images and music. Online reviews play an important role in many practical Web applications, especially e-commerce and government intelligence, in the thriving Internet environment. According to a report by Cone Inc. [2], about four out of five consumers have changed their purchase intention based solely on negative information, and positive information has a similar influence on decision making. However, it is impracticable to investigate and analyze users’ opinions in massive collections of reviews manually, which creates an urgent need to handle such tasks automatically.

Compared with coarse-grained opinion mining, such as mining on the document level [3–5] and the topic level [6–8], fine-grained opinion mining [9–11] is more suitable for practical applications, since it explores users’ opinions more finely and precisely. In this case, both the opinion target and the opinion word are crucial objects. The former describes what the user opinion is focused on, and the latter indicates what type of opinion the user wants to express. For example, in the sentence “the screen is clear and smooth”, “screen” is the opinion target, and “clear” and “smooth” are the opinion words. These objects contain all of the information on the user’s opinion in this comment. Thus, identifying and extracting the opinion targets and opinion words from reviews is one of the key tasks of opinion mining.

The existing methods usually utilize the modification relationship to extract the pairs of opinion target and opinion word (opinion pairs for short). These methods include the nearest neighbor-based methods [12, 13] and the syntactic pattern-based ones [14–16]. The nearest neighbor-based methods use a fixed-length window to mine opinion targets and opinion words based on the words’ modification relationship, which yields relatively poor extraction precision due to the limitation of the predefined window size. The syntactic pattern-based methods avoid the limitation of word span, but the predefined syntactic patterns reduce the extraction recall. Moreover, most of the existing methods apply an iterative expansion strategy to enlarge the opinion target set and the opinion word set alternately. In such cases, errors that occur in earlier iterations propagate to the subsequent extraction steps.

To tackle the problems encountered by existing methods, we first propose a supervised collective extraction method for opinion pairs. In this framework, we treat all of the possible combinations of two different words in a sentence as opinion pair candidates. Inspired by the Word Alignment Model [17], we capture such relationships with a monolingual constrained Word Alignment Model, in which a word can only form opinion pair candidates with different words in the same sentence. As shown in Fig. 1, “phone” generates five opinion pair candidates with different words in the first sentence, and “works” generates two candidates. For these candidates, we can construct a vector space model through feature engineering. Then, the extraction problem for opinion pairs can be treated as a classification task. In this way, we avoid the limitations of the word span and of predefined syntactic patterns. In addition, the error propagation that occurs in the alternating iterative extraction of opinion targets and opinion words does not arise, since we extract these two objects simultaneously.

Fig.1

Mining the pair of opinion target and opinion word using a constrained Word Alignment Model.

In summary, the main contributions of our work are as follows:

We propose a supervised collective extraction method for opinion pairs based on the constrained word alignment model, in which the extraction is treated as a binary classification task. Compared with previous nearest-neighbor rules and syntactic patterns, the model can capture long-span modification relations and avoids the problems of error propagation and dependence on parsing performance that are encountered by existing methods.

We further design a semi-supervised extraction method based on active learning, since labeling samples for training is time-consuming and error-prone. In this framework, the most informative samples are chosen for human labeling, so that good extraction performance can be achieved with fewer labeled samples.

In order to choose the most informative samples for labeling, we explore two sampling strategies. The first one focuses on the sample’s uncertainty, and the other one is based on the features’ evidence.

We construct a real-world dataset for the collective extraction of opinion pairs. Moreover, we carry out a series of experiments to verify the proposed methods. The experimental results show that the extraction effectiveness of our proposed methods is significantly better than that of several state-of-the-art competitors.

The rest of the paper is organized as follows. We introduce the related work in Section 2. Section 3 states the problem formally. Section 4 proposes a supervised collective extraction method for opinion pairs based on the constrained word alignment model. Section 5 presents a semi-supervised extraction method for opinion pairs that integrates an active labeling process to tackle the time-consuming and error-prone problem of manual annotation, and also explores two sampling strategies. Section 6 reports the experiments and discussions that evaluate the proposed methods. Section 7 draws the conclusions.

2 Related works

Opinion mining has been an active research field in recent years, in which mining opinion targets and opinion words has received much attention due to their importance. Previous methods mainly focused on utilizing the opinion relations among words to extract the opinion pairs. The intuition is that an opinion target tends to occur together with an opinion word, and that there is a strong modification relationship between them.

Some previous works used nearest neighbor-based methods to capture such modification relationships, assuming that an opinion target should be surrounded by opinion word(s) within a given window, and vice versa. Hu and Liu first applied association rule mining to find the product features, and then extracted the nearest adjective that modifies a product feature as the opinion word [12]. Wang et al. used a fixed-length window to filter the opinion pair candidates, and calculated their association scores with a revised mutual information measure [13]. In such methods, the modification relationship cannot be captured precisely due to the limitation of the window size, especially for long-span combinations, which reduces the extraction performance.

Syntactic information such as dependency trees is another important clue for mining opinion pairs [14, 18], and syntax-based methods can avoid the limitation of the window size. In [15], Zhang et al. reported that syntax-based methods could yield better performance than nearest neighbor-based methods for small or medium corpora. Popescu et al. used syntactic patterns to extract opinion target candidates, and computed the point-wise mutual information (PMI) score between a candidate and a product category to refine the extracted results [14]. Kobayashi et al. used syntactic patterns learned via pattern mining to extract opinion pairs in [19]. In [16], Li et al. extracted product features and opinion words through pattern-based bootstrapping, and exploited Prevalence and Reliability to assess both patterns and features. The performance of syntax-based methods depends heavily on the parsing performance. Because reviews often contain grammar mistakes, typos and improper punctuation, parsing errors propagate and degrade the extraction effectiveness.

Extracting opinion pairs has also been regarded as a sequence labeling task [20, 21], where classical sequence labeling methods such as Conditional Random Fields (CRFs) [22] and the Hidden Markov Model (HMM) [23] can be used to construct the extractor. These supervised methods need labeled training samples, and labeling is a time-consuming and error-prone process. Moreover, if the training samples are insufficient, or if the training samples and the samples to be extracted belong to different domains, the extraction performance will be poor. Some graph-based extraction methods were proposed recently. Liu et al. presented an alignment-based approach with graph co-ranking to collectively extract opinion targets and opinion words in [24]. Wang et al. utilized a sorting algorithm on a graph to estimate the confidence of all candidates in [25].

The Double Propagation model applies a bootstrapping strategy to implement the extraction, generating the opinion word set and the opinion target set alternately. Qiu et al. expanded a domain sentiment lexicon and an opinion target set iteratively by a Double Propagation method. In [26], they exploited direct dependency relations between words to extract opinion targets and opinion words iteratively. Zhang et al. extended Qiu’s method with other patterns, such as phrase and sentence patterns, to increase the extraction recall in [15]. In this type of method, the opinion word set and the opinion target set are treated separately and are enlarged alternately from a predefined seed set. Our approach differs from such methods in that we extract the pairs of opinion target and opinion word from the reviews directly.

3 Problem statement

For convenience of narration, we first introduce some symbols. Let S = {s₁, s₂, ⋯, s_n} be the sentence set, in which s_i is a sentence from the online review corpus; O_T is the candidate opinion target set, and ${ot}_{i}^{j} \in O_{T}$ is the j^th candidate opinion target in sentence s_i; O_W is the candidate opinion word set, and ${ow}_{i}^{k} \in O_{W}$ is the k^th candidate opinion word in sentence s_i.

Users’ opinion expression mainly depends on the opinion targets and the opinion words. Thus, we can summarize a user’s opinion in the sentence s_i with an ordered pair $〈 {ot}_{i}^{j}, {ow}_{i}^{k} 〉$ , which is called an opinion pair in the rest of this paper. Notably, we require that an opinion target and the corresponding opinion word(s) belong to the same sentence, which is generally consistent with users’ expression habits. This restriction reduces the number of candidate ordered pairs significantly without loss of extraction performance.

Formally, we try to learn a function $f : O^{'} \to C$ that identifies the true pairs of opinion target and opinion word among the candidates. Here, $O^{'} \subseteq O_{T} \times O_{W}$ is the set of candidate opinion pairs and C = {0, 1}, where 1 means that a candidate opinion pair is a true opinion pair and 0 means that it is a false one. Our objective can be written as follows. $f^{*} = \underset{f \in F}{argmin} \sum_{〈 {ot}_{i}^{j}, {ow}_{i}^{k} 〉 \in O'} loss (f (〈 {ot}_{i}^{j}, {ow}_{i}^{k} 〉), l)$ (1) where F is the hypothesis set of functions, loss is a predefined loss function such as the 0-1 loss, the hinge loss or the squared loss, and l ∈ C is the class label that the candidate opinion pair $〈 {ot}_{i}^{j}, {ow}_{i}^{k} 〉$ should be mapped to by f.

4 The supervised collective extraction for opinion targets and opinion words

As discussed above, in the constrained Word Alignment Model each word can form opinion pair candidates with the different words in the same sentence.

4.1 The constrained word alignment model

Given a sentence s_i with N words, let the sentence’s word set be $S_{i} = \{w_{i}^{1}, w_{i}^{2}, \dots, w_{i}^{N}\}$ , where $w_{i}^{j}$ is the j^th word of s_i. The word alignment $A = \{〈 w_{i}^{j}, w_{i}^{k} 〉 | j \in \{1, \dots, N\}, k \in \{1, \dots, N\}\}$ can be obtained by maximizing the word alignment probability of the sentence, according to the following equation. $A^{*} = \underset{A}{arg \max} P (A | S_{i})$ (2) where $〈 w_{i}^{j}, w_{i}^{k} 〉 \in A$ means that the word $w_{i}^{j}$ is aligned with the word $w_{i}^{k}$ .

To reduce the training space, we require that $w_{i}^{j}$ is different from $w_{i}^{k}$ , although all of the words in S_i can be candidates for $〈 w_{i}^{j}, w_{i}^{k} 〉$ . Formally, given a constrained alignment $\hat{A} = \{〈 w_{i}^{j}, w_{i}^{k} 〉 | j \in \{1, \dots, N\}, k \in \{1, \dots, N\}, w_{i}^{j} \neq w_{i}^{k}\}$ , Equation 2 can be rewritten as follows. $A^{*} = \underset{A}{arg \max} P (A | S_{i}, \hat{A})$ (3)

The probability of the constrained alignment sequence can be evaluated by Equation 4. $P (A | S_{i}) \propto \prod_{j = 1}^{N} t (w_{i}^{j} | w_{i}^{k})$ (4) where $t (w_{i}^{j} | w_{i}^{k})$ is the probability of the word $w_{i}^{j}$ collocating with its aligned word $w_{i}^{k}$ .

Graph-based confidence estimation methods, such as the random walk (used in [24]) and HITS (used in [15]), can be used to determine $t (w_{i}^{j} | w_{i}^{k})$ . In this work, we instead treat the estimation of $t (w_{i}^{j} | w_{i}^{k})$ as a supervised learning process, which is more flexible for integrating different features such as part-of-speech (POS) and location information.
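The constrained candidate set of Section 4.1 can be illustrated with a short Python sketch. It is only an illustration under our own assumptions: the function name `candidate_pairs` and the whitespace tokenization are hypothetical, and the example sentence is the one used in the Introduction.

```python
from itertools import permutations

def candidate_pairs(sentence_words):
    """Return all ordered pairs (w_j, w_k) of words from one sentence.

    Pairs are built from distinct positions, and pairs whose two words are
    identical are dropped, mirroring the constraint w_j != w_k.
    """
    return [
        (w_j, w_k)
        for w_j, w_k in permutations(sentence_words, 2)
        if w_j != w_k
    ]

# Hypothetical tokenization by whitespace; a real system would use a tokenizer.
words = "the screen is clear and smooth".split()
pairs = candidate_pairs(words)
print(len(pairs))                     # six distinct words give 6 * 5 ordered pairs
print(("screen", "clear") in pairs)   # the true opinion pair is among the candidates
```

The number of candidates grows quadratically with the sentence length, which is consistent with the large candidate counts reported for the dataset in Section 6.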

4.2 Feature engineering

For brevity, let X be the random variable representing an opinion pair candidate $〈 {ot}_{i}^{j}, {ow}_{i}^{k} 〉$ , and let x be a particular instance of X. Each instance can be described as an n-dimensional vector x = [x₁, x₂, …, x_n], where n is an integer depending on the number of features used to describe the instance. Similarly, let Y be the class variable of the instance X, and let y ∈ C be a particular value of Y. Then, the labeled sample set, in which the samples’ labels are known, can be denoted as L = {〈x⁽ⁱ⁾, y⁽ⁱ⁾〉|i = 1, ⋯, N₁}, and the unlabeled sample set as U = {x⁽ⁱ⁾|i = 1, ⋯, N₂}.

We use seven features to represent different properties of opinion pair instances, as shown in Table 1. F1 and F2 describe the POS information of the opinion target ${ot}_{i}^{j}$ and the opinion word ${ow}_{i}^{k}$ respectively. F3 and F4 describe the location of ${ot}_{i}^{j}$ and ${ow}_{i}^{k}$ in sentence s_i respectively. F5 represents the distance between ${ot}_{i}^{j}$ and ${ow}_{i}^{k}$ , which is measured by the number of words between them. F6 is the frequency of ${ot}_{i}^{j}$ in the review corpus. F7 denotes the dependency representation of ${ot}_{i}^{j}$ and ${ow}_{i}^{k}$ , which describes the grammatical relationship between them in sentence s_i. Existing tools, such as the Stanford Parser [27], can be used to parse the dependencies between the individual words of a sentence.

Table 1
The feature representation for the opinion pair instance $〈 {ot}_{i}^{j}, {ow}_{i}^{k} 〉$

Features	Description
F1	The part-of-speech of ${ot}_{i}^{j}$
F2	The part-of-speech of ${ow}_{i}^{k}$
F3	The location of ${ot}_{i}^{j}$
F4	The location of ${ow}_{i}^{k}$
F5	The distance between ${ot}_{i}^{j}$ and ${ow}_{i}^{k}$
F6	The frequency of ${ot}_{i}^{j}$
F7	The dependency representation of ${ot}_{i}^{j}$ and ${ow}_{i}^{k}$
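As a rough illustration of Table 1, the following Python sketch assembles the seven features for one candidate pair. Everything beyond the feature list itself is an assumption made for the example: the names `PairFeatures` and `extract_features` are hypothetical, positions are counted from 0, and the POS tags and the dependency relation are written by hand rather than produced by a parser.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class PairFeatures:
    pos_target: str    # F1: part-of-speech of the opinion target
    pos_word: str      # F2: part-of-speech of the opinion word
    loc_target: int    # F3: position of the target in the sentence (0-based, assumed)
    loc_word: int      # F4: position of the opinion word in the sentence
    distance: int      # F5: number of words between the two
    freq_target: int   # F6: frequency of the target in the corpus
    dependency: str    # F7: dependency label, "none" if no direct relation

def extract_features(tokens, tags, j, k, corpus_freq, dependencies):
    """Build the feature record for the candidate (tokens[j], tokens[k])."""
    relation = dependencies.get((j, k), dependencies.get((k, j), "none"))
    return PairFeatures(
        pos_target=tags[j],
        pos_word=tags[k],
        loc_target=j,
        loc_word=k,
        distance=abs(j - k) - 1,
        freq_target=corpus_freq[tokens[j]],
        dependency=relation,
    )

tokens = ["the", "screen", "is", "clear", "and", "smooth"]
tags = ["DT", "NN", "VBZ", "JJ", "CC", "JJ"]   # hand-assigned tags, not parser output
corpus_freq = Counter(tokens)                  # toy corpus: this single sentence
dependencies = {(3, 1): "nsubj"}               # hand-written relation for the example
features = extract_features(tokens, tags, 1, 3, corpus_freq, dependencies)
print(features)
```

Categorical features such as the POS tags and the dependency label would still have to be encoded numerically (for example, one-hot) before being passed to a classifier.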

Based on the vectorization of instances, we can use existing classification methods, such as Logistic Regression and the Support Vector Machine, to make predictions for an unlabeled sample. For example, the parametric model assumed by Logistic Regression is as follows. $\begin{matrix} P (Y = 0 | x^{(i)}) = \frac{1}{1 + e^{\sum_{j = 0}^{n} ω_{j} x_{j}^{(i)}}} \\ P (Y = 1 | x^{(i)}) = \frac{e^{\sum_{j = 0}^{n} ω_{j} x_{j}^{(i)}}}{1 + e^{\sum_{j = 0}^{n} ω_{j} x_{j}^{(i)}}} \end{matrix}$ (5)

Here, $x_{0}^{(i)} = 1$ , so that ω₀ acts as the bias term. In this model, we use the probability values in Equation 5 to estimate the probability $t (w_{i}^{j} | w_{i}^{k})$ in Equation 4. In this scenario, the class of an instance, which describes whether two candidate objects make up an opinion pair, can be predicted by Equation 6, where a positive sign corresponds to class 1 and a negative sign to class 0.

$Y = sgn (\sum_{j = 0}^{n} ω_{j} x_{j}^{(i)})$ (6)
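The following Python sketch evaluates the Logistic Regression model of Equations 5 and 6 for a single instance. The weight and feature values are made up for illustration, and the function names are hypothetical; the first input component is fixed to 1 so that the first weight plays the role of the bias term.

```python
import math

def class_probabilities(weights, x):
    """Return (P(Y=0|x), P(Y=1|x)) as in Equation 5; x[0] is expected to be 1."""
    z = sum(w * v for w, v in zip(weights, x))
    p1 = 1.0 / (1.0 + math.exp(-z))   # algebraically equal to e^z / (1 + e^z)
    return 1.0 - p1, p1

def predict(weights, x):
    """Return class 1 if the weighted sum is positive, else class 0 (Equation 6)."""
    z = sum(w * v for w, v in zip(weights, x))
    return 1 if z > 0 else 0

weights = [-1.0, 2.0, 0.5]   # illustrative values, not learned from data
x = [1.0, 0.8, -0.4]         # x[0] = 1 is the bias input
p0, p1 = class_probabilities(weights, x)
print(p0, p1, predict(weights, x))
```

Predicting class 1 when the weighted sum is positive is the same as predicting the class whose probability in Equation 5 exceeds 0.5.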

5 The active learning-based semi-supervised extracting method

Generally, supervised extraction methods have strong generalization ability and achieve good extraction effectiveness when sufficient training samples are available. However, labeling samples manually for training is a time-consuming and error-prone process. In fact, not all labeled samples provide the same amount of information for training the model. For example, a small number of support vectors play the key role in learning the decision boundary of an SVM. Therefore, choosing the most informative samples to label for training is an effective way to reduce the labeling workload without loss of extraction effectiveness.

Assume that we have a small labeled sample set L = {〈x⁽ⁱ⁾, y⁽ⁱ⁾〉|i = 1, ⋯, N₁} and a large unlabeled sample set U = {x⁽ⁱ⁾|i = 1, ⋯, N₂}, with N₁ << N₂. Our goal is to obtain an extraction model with relatively strong generalization ability, starting from L and using as little human annotation of U as possible. The main steps of the active learning-based semi-supervised extraction method are described in Fig. 2. Firstly, we use the small labeled sample set L to train an extraction model, which may generalize relatively weakly due to the lack of training samples. Then, the model is applied to make predictions for the unlabeled samples. Based on the prediction results, we choose the most informative samples for humans to label, and the newly labeled samples are added to the labeled sample set for training. This process is executed iteratively until a predetermined stop condition holds.

Fig.2

Iterative process for active labeling.

5.1 Active labeling based on samples’ uncertainty

A key problem in the above process is to ensure that the samples chosen for manual labeling are the most informative ones. Uncertainty sampling [28] is one of the most widely used methods in active learning. With this strategy, the extraction model chooses the samples whose predictions it is most uncertain about, that is, the samples located near the decision boundary. Thus, we can use the margin of confidence to measure the uncertainty:

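$x^{*} = \underset{x^{(i)} \in U}{arg min} (P (y^{(k)} | x^{(i)}) - P (y^{(j)} | x^{(i)}))$ (7) where $P (y^{(k)} | x^{(i)})$ is the probability that x⁽ⁱ⁾ is labeled as y^(k), $y^{(k)} = \underset{y \in C}{arg max} P (y | x^{(i)})$ is the most probable class and $y^{(j)} = \underset{y \in C \setminus \{y^{(k)}\}}{arg max} P (y | x^{(i)})$ is the second most probable one.

Since C contains only two classes, $P (y^{(j)} | x^{(i)}) = 1 - P (y^{(k)} | x^{(i)})$ , so minimizing the margin in Equation 7 is equivalent to minimizing the confidence of the most probable class:

$x^{*} = \underset{x^{(i)} \in U}{arg min} P (y^{(k)} | x^{(i)})$ (8)

By Equations 7 and 8, we can measure the uncertainty on the sample level.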
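A minimal Python sketch of the sample-level criteria in Equations 7 and 8 follows, assuming that the model outputs P(Y = 1 | x) for each unlabeled sample. The function names and the probability values are hypothetical.

```python
def margin(p1):
    """Margin between the most and second most probable class (Equation 7)."""
    return abs(p1 - (1.0 - p1))

def least_confidence(p1):
    """Probability of the most probable class (Equation 8)."""
    return max(p1, 1.0 - p1)

def most_uncertain(probabilities, k, score=margin):
    """Return the indices of the k samples with the smallest score."""
    order = sorted(range(len(probabilities)), key=lambda i: score(probabilities[i]))
    return order[:k]

# Illustrative values of P(Y = 1 | x) for five unlabeled samples.
probs = [0.95, 0.52, 0.10, 0.45, 0.70]
print(most_uncertain(probs, 2))
print(most_uncertain(probs, 2, score=least_confidence))
```

For a binary problem the margin equals twice the least-confidence score minus one, so both criteria rank the samples identically and select the ones closest to P = 0.5.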

5.2 Measuring sample’s uncertainty with its features’ evidences

Measuring the uncertainty on the sample level is a coarse-grained strategy. We can explore a sample’s uncertainty at a finer granularity, which may achieve better effectiveness. Therefore, we further consider using the features’ evidence to evaluate the uncertainty, based on the work in [29]. Let $P_{x^{(i)}}$ be the set of features providing evidence for the class “1” and $N_{x^{(i)}}$ be the set of features providing evidence for the class “0”: $\begin{matrix} P_{x^{(i)}} & = & \{x_{j}^{(i)} | ω_{j} x_{j}^{(i)} > 0\} \\ N_{x^{(i)}} & = & \{x_{k}^{(i)} | ω_{k} x_{k}^{(i)} < 0\} \end{matrix}$

Then, the evidence that instance x⁽ⁱ⁾ provides for the class “1” is:

$E_{1} (x^{(i)}) = \sum_{x_{j}^{(i)} \in P_{x^{(i)}}} ω_{j} x_{j}^{(i)}$ (9) and the evidence that instance x⁽ⁱ⁾ provides for the class “0” is:

$E_{0} (x^{(i)}) = - \sum_{x_{k}^{(i)} \in N_{x^{(i)}}} ω_{k} x_{k}^{(i)}$ (10)

Then, we would like to choose the instances with relatively large uncertainty values, which have both large E₀ (x⁽ⁱ⁾) and E₁ (x⁽ⁱ⁾):

$x^{*} = \underset{x^{(i)} \in U}{arg max} (E_{1} (x^{(i)}) \times E_{0} (x^{(i)}))$ (11)
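The evidence-based score of Equations 9 to 11 can be sketched in Python as follows. The weights and instances are invented for illustration and the function names are hypothetical; whether the bias term counts as evidence is not specified here, so the sketch simply uses every component that is passed in.

```python
def evidence(weights, x):
    """Return (E1, E0): evidence for class 1 and for class 0 (Equations 9 and 10)."""
    contributions = [w * v for w, v in zip(weights, x)]
    e1 = sum(c for c in contributions if c > 0)
    e0 = -sum(c for c in contributions if c < 0)
    return e1, e0

def evidence_uncertainty(weights, x):
    """Score maximized in Equation 11: large only if both evidences are large."""
    e1, e0 = evidence(weights, x)
    return e1 * e0

weights = [0.5, 2.0, -1.5, 1.0]   # illustrative weights
a = [1.0, 2.0, 3.0, 0.5]          # strong evidence for both classes
b = [1.0, 0.2, 0.5, 0.0]          # weak evidence for both classes
print(evidence(weights, a), evidence_uncertainty(weights, a))
print(evidence(weights, b), evidence_uncertainty(weights, b))
```

Both instances have a weighted sum close to zero, so a sample-level measure would consider them similarly uncertain, whereas the evidence product ranks the instance with strongly conflicting features higher.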

In general, we choose the top k uncertain instances for the next training step to improve efficiency. The semi-supervised extraction algorithm for opinion pairs based on active labeling is described in Algorithm 1. After the initial extraction model is trained with a small labeled sample set (Line 1), we obtain an extractor with relatively weak generalization ability. The extractor is used to make predictions for the unlabeled samples in U (Lines 4 and 5). The top k uncertain instances are chosen for manual annotation (Line 6), and the labeled sample set L is enlarged with the manually labeled samples (Line 7). The extractor is then retrained with the new training set, and its performance is measured again (Lines 9 and 10). These steps are repeated until the change in performance is less than a predetermined threshold.

Algorithm 1 Active Labeling-based Semi-supervised Extraction (ALSE)

Input: the labeled sample set L,

the unlabeled sample set U,

the initial model $M$ ,

the stop threshold λ

Output: the trained model $M$

1: Training $M$ with L;

2: Measuring $M$ ’s performance p;

3: repeat

4: Computing $T = \{〈 x^{(i)}, {\hat{y}}^{(i)} 〉 | x^{(i)} \in U, {\hat{y}}^{(i)} = M (x^{(i)})\}$ ;

5: Evaluating the uncertainty of $〈 x^{(i)}, {\hat{y}}^{(i)} 〉$ in T with Equation 11;

6: Constructing S = {〈x⁽ⁱ⁾, y⁽ⁱ⁾〉} by labeling manually the top k uncertain instances in T;

7: L = L ∪ S;

8: $U = U - \{x^{(i)} | 〈 x^{(i)}, y^{(i)} 〉 \in S\}$ ;

9: Training $M$ with L;

10: p′ = p and remeasuring $M$ ’s performance p;

11: until |p - p′| < λ

12: return $M$ ;
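The loop of Algorithm 1 can be sketched in plain Python as below. This is a toy illustration under several assumptions that are not part of the algorithm as stated: the model is a small logistic regression trained by gradient ascent, performance is measured as accuracy on a held-out validation set, the human annotator is simulated by an `oracle` function, and `max_rounds` is an added safeguard. All names are hypothetical.

```python
import math
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def train(labeled, epochs=200, lr=0.5):
    """Fit logistic-regression weights by batch gradient ascent."""
    w = [0.0] * len(labeled[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in labeled:
            err = y - sigmoid(dot(w, x))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wj + lr * gj / len(labeled) for wj, gj in zip(w, grad)]
    return w

def accuracy(w, data):
    return sum((dot(w, x) > 0) == (y == 1) for x, y in data) / len(data)

def evidence_product(w, x):
    contrib = [wj * xj for wj, xj in zip(w, x)]
    return sum(c for c in contrib if c > 0) * -sum(c for c in contrib if c < 0)

def alse(labeled, unlabeled, oracle, validation, k=5, threshold=1e-3, max_rounds=20):
    labeled, unlabeled = list(labeled), list(unlabeled)
    w = train(labeled)                                  # Line 1
    p = accuracy(w, validation)                         # Line 2
    for _ in range(max_rounds):                         # Lines 3 and 11
        if not unlabeled:
            break
        ranked = sorted(unlabeled, key=lambda x: evidence_product(w, x),
                        reverse=True)                   # Lines 4 and 5
        labeled += [(x, oracle(x)) for x in ranked[:k]] # Lines 6 and 7
        unlabeled = ranked[k:]                          # Line 8
        w = train(labeled)                              # Line 9
        p_prev, p = p, accuracy(w, validation)          # Line 10
        if abs(p - p_prev) < threshold:                 # Line 11
            break
    return w, len(labeled)

random.seed(0)

def oracle(x):
    """Simulated annotator for the toy data: class 1 iff x[1] > x[2]."""
    return 1 if x[1] - x[2] > 0 else 0

def sample():
    return [1.0, random.uniform(-1, 1), random.uniform(-1, 1)]

pool = [sample() for _ in range(200)]
validation = [(x, oracle(x)) for x in (sample() for _ in range(100))]
seed = [(x, oracle(x)) for x in pool[:10]]
w, n_labeled = alse(seed, pool[10:], oracle, validation)
print(n_labeled, accuracy(w, validation))
```

Replacing `evidence_product` with a margin-based or random score yields the other two sampling strategies compared in Section 6.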

6 Evaluation

6.1 Experimental setting

Dataset and setting We construct our dataset based on the review collection of Hu and Liu [12], which comes from Amazon.com and C|net.com. Our dataset covers five products: Canon camera (Canon for short), Nikon camera (Nikon), Nokia cellular phone (Nokia), Creative labs mp3 player (Creative) and Apex DVD player (Apex). Some statistical information is shown in Table 2. We apply the Stanford parser for part-of-speech tagging, and the Support Vector Machine (SVM) for classification.

Table 2
Some statistical information on the constructed dataset

Product	# of review sentences	# of opinion pair candidates
Canon	597	103027
Nikon	346	52926
Nokia	546	74940
Creative	1716	274543
Apex	740	90200

Baselines To solve the problem of collective extraction of opinion targets and opinion words, we propose a supervised collective extraction method (SCE for short) and an active labeling-based semi-supervised extraction method (ALSE) in this paper. For ALSE, we compare three sampling strategies: (1) random sampling (ALSE_Ram); (2) sample uncertainty (ALSE_Sam); (3) feature evidence (ALSE_Fea). The following state-of-the-art methods are treated as the competitors.

NNR. Hu and Liu [12] first mine product features as the opinion targets, and then use nearest neighbor rules based on the features to extract opinion words. These two steps are processed iteratively to enlarge and refine the result set of opinion targets and opinion words.

DP. This is a typical syntax-based method proposed in [26]. In this method, opinion associations in sentences are first mined with several syntax-based patterns. Then, a Double Propagation process is used to extract opinion pairs.

CR_WP. Liu et al. [30] first construct a heterogeneous graph to model semantic relations and opinion relations. Then, a co-ranking process is applied to estimate the candidates’ confidences.

Evaluation indices The precision (P), recall (R) and F-score (F_β) are applied to measure the effectiveness of opinion pair extraction. They are defined as follows. $P = \frac{tp}{tp + fp}$ (12) $R = \frac{tp}{tp + fn}$ (13) $F_{β} = (1 + β^{2}) \cdot \frac{P \cdot R}{β^{2} \cdot P + R}$ (14) where tp is the number of real opinion pairs correctly predicted as 1, fp is the number of false opinion pairs erroneously predicted as 1, fn is the number of real opinion pairs erroneously predicted as 0, and tn is the number of false opinion pairs correctly predicted as 0. β is a non-negative real value that adjusts the relative weights of recall and precision. We set β = 1 (namely F₁) in our experiments, so that recall and precision are weighted evenly.
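Equations 12 to 14 translate directly into the short Python sketch below. The function names are hypothetical, and returning 0 when a denominator is zero is a convention assumed for the example rather than something stated above.

```python
def precision(tp, fp):
    """Equation 12; returns 0.0 when nothing is predicted as 1 (assumed convention)."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Equation 13; returns 0.0 when there are no real opinion pairs (assumed convention)."""
    return tp / (tp + fn) if tp + fn else 0.0

def f_beta(p, r, beta=1.0):
    """Equation 14; beta = 1 weights precision and recall evenly."""
    denominator = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denominator if denominator else 0.0

# Precision and recall reported for SCE on the Canon domain in Table 3.
print(round(f_beta(0.86, 0.82), 2))  # → 0.84
```

The printed value agrees with the F₁ entry reported for SCE on Canon in Table 3.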

6.2 Results and discussions

In this paper, we aim at extracting opinion pairs from online reviews. Since our dataset is constructed from online product reviews, the products or product features are the opinion targets in our experiments. The first experiment compares the extraction effectiveness of the proposed methods and the competitors. We compare the precision, recall and F₁ values of the different methods for five different products in Table 3.

Table 3
Extraction effectiveness comparisons for different methods in various domains

		The state-of-the-art methods			Our methods	
		NNR	DP	CR_WP	SCE	ALSE_Fea
Canon	P	0.71	0.83	0.83	0.86	0.86
	R	0.81	0.80	0.83	0.82	0.83
	F₁	0.76	0.81	0.83	0.84	0.84
Nikon	P	0.68	0.87	0.85	0.87	0.86
	R	0.79	0.82	0.86	0.86	0.87
	F₁	0.73	0.84	0.85	0.86	0.86
Nokia	P	0.69	0.89	0.86	0.88	0.89
	R	0.77	0.83	0.88	0.90	0.88
	F₁	0.73	0.86	0.87	0.89	0.88
Creative	P	0.67	0.79	0.80	0.85	0.85
	R	0.79	0.81	0.84	0.86	0.87
	F₁	0.73	0.80	0.82	0.85	0.86
Apex	P	0.70	0.91	0.91	0.91	0.91
	R	0.78	0.84	0.86	0.88	0.85
	F₁	0.74	0.87	0.88	0.89	0.88

As shown in Table 3, our methods match or outperform all of the state-of-the-art baselines in terms of F₁ in all domains. NNR achieves the worst effectiveness. We think the main reason is that the nearest-neighbor method applies a fixed-length window to harvest product features or opinion words, which limits the extraction precision and recall. DP is a syntax pattern-based method that can achieve relatively high precision, but the predefined syntax patterns lead to a low recall. Moreover, DP requires some initial opinion word seeds to bootstrap the extraction process, which also limits the extraction performance. CR_WP assumes that opinion targets should be nouns or noun phrases and that opinion words should be adjectives. We consider this a disadvantage of CR_WP, since users’ sentiment can also be expressed with other types of words such as verbs and adverbs.

Another main observation is that both SCE and ALSE_Fea are effective extraction methods in our experiments. The former is a fully supervised method and the latter is a semi-supervised one. The extraction effectiveness of ALSE_Fea is similar to that of SCE once ALSE_Fea reaches a stable state. However, it is worth noting that ALSE_Fea uses far fewer training samples than SCE. Specifically, the number of training samples used by ALSE_Fea is 26% less than that of SCE in Canon, 27% in Nikon, 39% in Nokia, 31% in Creative and 29% in Apex. We compare SCE and ALSE_Fea further in a later experiment.

In Section 4, we propose seven features to construct the vector space for candidate opinion pairs. The following experiment measures the contribution of each feature type to the extraction effectiveness. We remove different types of features and evaluate the F₁ value of SCE in different domains in Fig. 3. For example, we remove the POS-type features (F1 and F2 in Table 1) and implement the extraction with the supervised method SCE based on the remaining features. This variant is denoted as SCE_Pos in Fig. 3. Similarly, SCE_Loc means that the location-type features (F3 and F4 in Table 1) are removed, while SCE_Dis removes F5, SCE_Fre removes F6 and SCE_Rel removes F7.

Fig.3

Different features’ effectiveness.

As shown in Fig. 3, almost all features contribute positively to the extraction effectiveness in the different domains, except the distance feature (F5). Namely, the F₁ value decreases significantly after we remove the POS-type features, the location-type features, the frequency feature or the dependency relationship feature. The distance feature, in contrast, seems to contribute little in this experiment. We think the main reason is that the distance information is already implied by the location-type features. Hence, the extraction effectiveness varies only slightly when we remove the distance feature alone.

Our first experiment verified that the proposed semi-supervised method is comparable with the fully supervised one in extraction effectiveness in different domains. We further compare the extraction performance of ALSE_Fea and SCE in Fig. 4, in which the horizontal axis describes the proportion of training samples used in each experiment. For example, 0.3 means that only 30% of the labeled samples in the training set are used to train the model. We find that the effectiveness curves of ALSE_Fea lie above those of SCE in all domains. That means ALSE_Fea achieves better extraction effectiveness when it uses the same number of training samples as SCE. Equivalently, ALSE_Fea needs fewer labeled samples to train the model for the same F₁ value.

Fig.4

The comparison between the supervised method and the semi-supervised one.

In the last experiment, we investigate the different sampling strategies used in the active labeling process of the semi-supervised extraction method. As discussed above, we can choose the samples for manual labeling by three sampling strategies: random sampling (ALSE_Ram), sampling based on the sample’s uncertainty (ALSE_Sam) and sampling based on feature evidence (ALSE_Fea). As shown in Fig. 5, ALSE_Sam improves over ALSE_Ram by 25.75% in Canon, 38.7% in Nikon, 30.88% in Nokia, 23.51% in Creative and 35.94% in Apex. ALSE_Fea improves over ALSE_Ram by 27.99%, 39.51%, 30.14%, 26.45% and 37.34% in the five domains respectively. Thus, both ALSE_Sam and ALSE_Fea are effective at picking the most informative samples for labeling. Moreover, ALSE_Fea performs better than ALSE_Sam in most domains. This means that the fine-grained evaluation of sample uncertainty can choose more informative samples in the active labeling process of our semi-supervised extraction method.

Fig. 5

Different strategies on choosing samples for labeling.
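
The contrast between random sampling and uncertainty-based sampling can be illustrated with a minimal sketch. It assumes that the current extractor assigns each unlabeled candidate opinion pair a probability of being a correct pair, and it treats the candidates whose probability lies closest to 0.5 as the most uncertain. This is the classic uncertainty criterion and not necessarily the exact measure behind the strategies compared above. The function names, the batch size `k` and the example probabilities are hypothetical.

```python
import random


def select_random(num_candidates, k, seed=0):
    """Random baseline: pick k candidate indices uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_candidates), k))


def select_uncertain(probs, k):
    """Pick the k candidates whose predicted probability is closest to 0.5.

    probs[i] is the (assumed) probability that candidate i is a correct
    opinion pair under the current model.
    """
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


# Hypothetical model outputs for five unlabeled candidate pairs.
probs = [0.90, 0.52, 0.10, 0.45, 0.70]
to_label = select_uncertain(probs, k=2)    # indices of the two least certain candidates
baseline = select_random(len(probs), k=2)  # indices chosen without using the model
```

In an active labeling loop, the selected candidates would be annotated manually, added to the training set, and the model retrained before the next selection round.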

7 Conclusion

This paper focuses on an important task in opinion mining, namely the collective extraction of opinion targets and opinion words. We first propose a supervised method that extracts such opinion pairs based on the constrained word alignment model. However, the supervised method relies on manual annotation, which is time-consuming and error-prone. We therefore propose a semi-supervised extraction method based on active labeling. To choose the most informative samples for labeling, we design a sample uncertainty-based strategy and a feature evidence-based one. Finally, we construct a real-world dataset on the basis of an open review set to verify the effectiveness of the proposed methods. The experimental results show that our approaches outperform other state-of-the-art baselines in different product domains.

In the future, we will focus on two directions. First, we plan to improve the extraction performance by mining implicit targets and contrasted opinions. Second, we will explore semi-automatic, crowdsourcing-based techniques for labeling candidate opinion pairs in order to construct a large-scale dataset, which can be used to improve the extractor’s generalization and to evaluate new methods and models for mining opinion targets and opinion words.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 61562014, 61763007, U1501252, 61662013), the Guangxi Natural Science Foundation (No. 2015GXNSFAA139303), the project of Guangxi Key Laboratory of Trusted Software, the project of Guangxi Key Laboratory of Automatic Detecting Technology and Instruments (No. YQ17111), and the general Scientific Research Project of Guangxi Provincial Department of Education (No. 2017KY0195).

		State-of-the-art methods			Our methods
		NNR	DP	CR_WP	SCE	ALSE_Fea
Canon	P	0.71	0.83	0.83	0.86	0.86
	R	0.81	0.80	0.83	0.82	0.83
	F₁	0.76	0.81	0.83	0.84	0.84
Nikon	P	0.68	0.87	0.85	0.87	0.86
	R	0.79	0.82	0.86	0.86	0.87
	F₁	0.73	0.84	0.85	0.86	0.86
Nokia	P	0.69	0.89	0.86	0.88	0.89
	R	0.77	0.83	0.88	0.90	0.88
	F₁	0.73	0.86	0.87	0.89	0.88
Creative	P	0.67	0.79	0.80	0.85	0.85
	R	0.79	0.81	0.84	0.86	0.87
	F₁	0.73	0.80	0.82	0.85	0.86
Apex	P	0.70	0.91	0.91	0.91	0.91
	R	0.78	0.84	0.86	0.88	0.85
	F₁	0.74	0.87	0.88	0.89	0.88
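
The F₁ rows above can be cross-checked against the P and R rows. The following sketch assumes the standard definition of F₁ as the harmonic mean of precision and recall. Since the tabulated P and R values are themselves rounded to two decimals, a recomputed F₁ may differ from the tabulated one in the last digit. The function name is chosen here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Canon, NNR column: P = 0.71, R = 0.81 (tabulated F1: 0.76)
canon_nnr = round(f1_score(0.71, 0.81), 2)

# Nokia, SCE column: P = 0.88, R = 0.90 (tabulated F1: 0.89)
nokia_sce = round(f1_score(0.88, 0.90), 2)
```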

Table 2
Some statistical information on the constructed dataset

Product	# of review sentences	# of opinion pair candidates
Canon	597	103027
Nikon	346	52926
Nokia	546	74940
Creative	1716	274543
Apex	740	90200