Sentiment classification based on dependency-relationship embedding and attention mechanism

Abstract

Aspect-based sentiment classification, a fine-grained sentiment analysis task, aims to predict the sentiment polarity for a specified aspect. However, the existing aspect-based sentiment classification approaches cannot fully model the dependency-relationship between words and are easily disturbed by irrelevant aspects. To address this problem, we propose a novel approach named Dependency-Relationship Embedding and Attention Mechanism-based LSTM (DA-LSTM). DA-LSTM first merges the word hidden vector output by LSTM with the dependency-relationship embedding to form a combined vector. This vector is then fed, together with the aspect information, into the attention mechanism, which suppresses interference from irrelevant aspects, to calculate the final word representation for sentiment classification. Our extensive experiments on benchmark data sets clearly show the effectiveness of DA-LSTM.

Keywords

Aspect-based sentiment analysis; sentiment classification; dependency-relationship; attention mechanism

1 Introduction

Massive amounts of text data have been accumulated in recent years. In particular, various types of review data, such as hotel reviews, product reviews, and hotspot reviews, are closely bound up with people’s daily lives. Reviews are generally short but carry strong opinions and emotions. Analyzing the emotions implied in these reviews helps us determine whether the attitude toward their subjects is positive or negative. However, it is very difficult to extract the emotional tendency from extensive review data manually. Sentiment analysis [1] in the field of natural language processing aims to interpret and classify emotions in subjective data, which can effectively solve such problems. According to the granularity of the existing research, it can be categorized into document-based, sentence-based, and aspect-based sentiment analysis. For review data, aspect-based sentiment analysis offers more targeted analysis than the other two. For example, the sentence “This cup is small but of good quality.” would be assigned negative polarity for the aspect “cup” and positive polarity for the aspect “quality”.

Aspect-based sentiment analysis has been investigated mainly based on four approaches: feature-based, dictionary-based, traditional machine learning-based, and deep learning-based. Feature-based or dictionary-based methods first retrieve aspects and sentiment words in review and then calculate the sentiment polarity pointed to by sentiment words according to prior knowledge. Due to the use of prior knowledge such as sentiment dictionary, these methods dig out information in text quickly and achieve good performance. However, they are heavily dependent on external knowledge. The randomness of words used in review makes it difficult for them to extract infrequent sentiment words. Therefore, these methods cannot fully discover the aspects and corresponding sentiment words in the review.

Traditional machine learning-based sentiment analysis approaches construct features that express the semantic information of words and input these features into machine learning models such as SVM [2] and NB [3] to determine the aspect sentiment polarity. They also exploit models such as LDA [4] to automatically mine the sentiment information of aspects in review and classify these aspects. These methods focus on the semantics of the words themselves but ignore the semantic information existing between different words.

Recently, the application of deep learning has become more and more extensive, such as network representation learning [5, 6], recommendation system [7], and so on. Sentiment analysis methods based on deep learning also showed promising performance and attracted a great deal of attention [8]. However, these methods require abundant labeled data that is not easily available in reality. Therefore, some works try to solve this problem by introducing transfer learning and generative adversarial networks. Specifically, they first learn a traditional language model or an end-to-end encoder through extensive unlabeled data to represent text. Then they perform semi-supervised learning based on the encoding model through a little labeled data and fine-tune it to get the final classification model. These methods take local context information of the word into account while calculating the word representation to complete the aspect sentiment polarity prediction. However, due to the limitations of the neural network model, such approaches cannot mine the global context information of words and perform targeted analysis on specific aspects of review.

To solve the above problems, we propose an LSTM model based on dependency-relationship embedding and attention mechanism (DA-LSTM), which is applied to the sentiment classification task of product reviews. Figure 1 illustrates the architecture of a standard LSTM [9], which performs well in mining the semantic information of text. The proposed DA-LSTM takes the hidden vectors of sentiment words output by LSTM as its basis, and fuses them with the dependency-relationship through the BI function to preserve the dependency between words and the global context information. In addition, it takes the aspect vector into consideration and finds the sentiment word most related to the target aspect through the attention mechanism. Finally, the obtained word representation is fed into the softmax function to predict the sentiment polarity of the aspect. The main contributions of this paper are summarized as follows:

We propose a novel method for word vector generation. Different from the existing methods, our method fuses the dependency-relationship embedding with the hidden vector obtained by LSTM, which also preserves the dependency information and context information between words.

We design an attention mechanism. The extracted aspect information and the combined word vector are fed into the attention mechanism, which adaptively assigns larger weights to words related to the target aspect to avoid interference from irrelevant words.

We conduct experiments on three real-world review datasets, namely Laptop14, Restaurant15, and Restaurant16. The experiments demonstrate that the proposed model achieves excellent performance compared to the baselines.

Fig. 1

The architecture of a standard LSTM. {w_s0, w_s1, ⋯ , w_sn } represent the word vector set of a review whose length is n. {h₀, h₁, ⋯ , h_n } denote the hidden vector set.

2 Related work

2.1 Aspect-based sentiment classification with neural networks

Deep learning has achieved fantastic performance in aspect-based sentiment analysis. Zhao et al. [10] proposed convolutional feature extraction and Long Short-Term Memory, an underlying network structure for performing text sentiment analysis. They exploited a neural network to learn high-level representations (that is, embedding space) of text to achieve the purpose of predicting the sentiment polarity. Cong et al. [11] modeled sentiment polarity in text based on bi-directional LSTM. Xue et al. [12] added two convolution layers to the embedding layer of the CNN model. Based on the given aspect information, words related to aspects are automatically extracted, so that CNN can be used to predict the sentiment information of aspects. The idea of Noh et al. [13] is to represent aspects through location information and utilize CNN to construct aspect maps and extract aspect sentiment words. Zhou et al. [14] considered the location information of each aspect in their model and enhanced the aspect-based sentiment classification performance by transferring hierarchical knowledge from the resource-rich sentence-based sentiment classification dataset. Tang et al. [15] proposed a network architecture based on the idea of adversarial learning, which uses labeled source domain data to predict the sentiment label of unlabeled target domain data.

Deep learning performs very well at capturing the semantic information of words, but CNN can only capture local features of words. Because reviews vary in length, CNN tends to ignore emotional words that are semantically related to an aspect but far away from it. LSTM can capture the contextual information of words, but the aspect sentiment words that have an important influence on aspect-based sentiment classification are not fully represented. These limitations hinder the performance improvement of aspect-based sentiment classification.

2.2 Aspect-based sentiment classification with attention mechanism

With the continuous research on the attention mechanism, aspect-based sentiment analysis based on attention has become a popular research topic. Wang et al. [16] proposed an attention-based LSTM model for sentiment classification. Assuming that sentiment representations of several aspects are used as input, the attention mechanism would focus on the sentiment aspects that are beneficial to the classification results. Tang et al. [17] proposed a multi-layer deep learning model with shared parameters. This model utilizes the attention mechanism to learn the context weight of words in text based on location and content information, and calculates the text representation through the weight. The representation of the last layer is used for sentiment classification. Wang et al. [18] proposed a mixed attention mechanism, in which local attention is applied to acquire word information with syntactic dependency, and global attention is applied to retrieve all words in the text and acquire word context information. Finally, a gated mechanism is designed for aspect-based sentiment analysis. Ma et al. [19] designed a strategy that combines feature representation and word embedding to enhance the attention mechanism, and presented a feature-based composite memory network to solve the fine-grained sentiment classification problem. Tang et al. [20] extracted the word with the maximum attention weight from the text as seed information and masked it. They repeated this process to expand the seed word set and utilized regularization to standardize the results. Finally, the obtained word features are input into the sentiment analysis model to predict aspect sentiment polarity. Lu et al. [21] used a general extractor, a sentiment extractor, a negative extractor, and a degree extractor to model the hidden state of the text. Then, they designed an interactive rule attention mechanism to learn the association information between context and aspects, thereby improving the performance of aspect-based sentiment analysis. Liu et al. [22] introduced an adversarial training method to get unbiased attention, which avoids excessive attention to sentiment words unrelated to the target aspect. Then, they proposed an Embedding-Preserving Gating Mechanism to dynamically incorporate target-related features into the word representation and retain the original word information.

The advantage of the attention mechanism is that it can mine the key information in texts. However, there are many words used to describe different aspects in the review, which causes the attention mechanism to be easily disturbed by other aspects when calculating the weight of the target aspect. In addition, attention mechanism is often applied together with neural networks. When calculating the attention weight based on the representation output by the neural network, the sentiment word information related to the target aspect may have been discarded.

To overcome the above limitations of neural networks in sentiment classification and of the attention mechanism in aspect sentiment word mining, we first combine the dependency-relationship embedding and the hidden vector. Then, the combined vector and the aspect vector are fed into the attention mechanism to calculate the final word representation for sentiment classification. The dependency-relationship embedding adds syntactic dependency information between words, and the aspect vector prevents other aspects from interfering with the target aspect. In this way, the attention mechanism only considers sentiment words related to the target aspect and gives a more accurate representation of sentiment words. Finally, the obtained word representation is input into the softmax function, and the aspect-based sentiment classification is carried out on the resulting probability distribution.

3 DA-LSTM

In this section, we first provide the overall structure of the proposed DA-LSTM and then introduce the details of DA-LSTM in sentiment word extraction and aspect-based sentiment classification.

3.1 Overall framework

The proposed DA-LSTM model first obtains the hidden vector that encodes context information through LSTM. Then, the hidden vector and dependency-relationship embedding are combined to form the combination vector which contains both the context information and dependency-relationship. The aspect information extracted from the review is provided to the attention mechanism together with the combination vector to mine the word information closely related to the target aspect. Finally, the aspect-based sentiment classification is performed through the calculated word representation. This process is shown in Algorithm 1. The framework of the proposed DA-LSTM is shown in Fig. 2.

Fig. 2

The overall framework of the proposed DA-LSTM. {e_{w_s0}, e_{w_s1}, ⋯ , e_{w_sn}} represents the word dependency-relationship vector set. {c_{w_s0}, c_{w_s1}, ⋯ , c_{w_sn}} denotes the combination vector set. e_aspect is the aspect vector. γ is a vector consisting of attention weights and r is a weighted representation of a review with a given aspect. h′ denotes the final feature vector related to the target aspect in the review.

3.2 Sentiment word extraction

Each word in the review has a dependency-relationship with its previous and subsequent words. LSTM can preserve the long-term dependencies in the review and effectively link historical information. Due to the existence of the input gate and the forget gate, LSTM automatically judges the availability of information to save useful information and discards useless information. In addition, LSTM can avoid the problem of vanishing or exploding gradient. Therefore, most models used for sentiment analysis are based on LSTM layers, through which sentiment information is extracted from word embedding.

Due to the above advantages of LSTM, we adopt it to extract sentiment information in this paper. The result of aspect-based sentiment classification depends on the sentiment polarity of the words related to the target aspect. LSTM encodes the context information of the words in the review, but cannot highlight the sentiment words that reflect a specific aspect. Therefore, we fuse the dependency-relationship embedding and the hidden vector output by LSTM, so that the sentiment words are fully semantically represented in the combined vector. To ensure that there is a close relationship between the highlighted sentiment word and the target aspect, we introduce the aspect vector and utilize the attention layer to ensure the collocation of the aspect and sentiment words.

Hidden vector. As shown in Fig. 2, for a given input review w_s = (w_s1, w_s2, ⋯ , w_sn), where {w_s1, w_s2, ⋯ , w_sn} represents the word vector set, the hidden vector h_t is calculated by Equations 1–5:

$i_{t} = \sigma (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i})$ (1)
$f_{t} = \sigma (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f})$ (2)
$o_{t} = \sigma (W_{o} x_{t} + U_{o} h_{t - 1} + b_{o})$ (3)
$c_{t} = f_{t} \odot c_{t - 1} + i_{t} \odot \tanh (W_{g} x_{t} + U_{g} h_{t - 1} + b_{g})$ (4)
$h_{t} = o_{t} \odot \tanh (c_{t})$ (5)

where σ refers to the sigmoid function and ⊙ denotes the element-wise multiplication operator. W_i, W_f, W_o and U_i, U_f, U_o are the input and recurrent weight matrices of the input gate, forget gate, and output gate (for each gate, the two matrices together form a weight matrix in $\mathbb{R}^{d \times 2 d}$), and b_i, b_f, $b_{o} \in \mathbb{R}^{d}$ are the corresponding biases. W_g, U_g, and b_g are the analogous parameters of the candidate cell state in Equation 4. All of these parameters are learned during the training process. x_t represents the input unit of LSTM (i.e., the word embedding vector in review w_s). h_t is the vector of the hidden layer, which contains the context information of each word in the input w_s captured by LSTM. We regard the last hidden vector h_n as the representation of the review.
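To make the recurrence concrete, the following is a minimal sketch of one LSTM step in plain Python. It is an illustration under stated assumptions, not the implementation used in the experiments: the recurrent weights of every gate are written U_* (following the W_g / U_g pattern of Equation 4), and the helper names `affine`, `lstm_step`, `init_params`, and `run_lstm`, as well as the random toy parameters, are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xj for w, xj in zip(W_row, x))
        + sum(u * hj for u, hj in zip(U_row, h))
        + bk
        for W_row, U_row, bk in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: gates (Eq. 1-3), cell state (Eq. 4), hidden vector (Eq. 5)."""
    i_t = [sigmoid(z) for z in affine(p["W_i"], x_t, p["U_i"], h_prev, p["b_i"])]
    f_t = [sigmoid(z) for z in affine(p["W_f"], x_t, p["U_f"], h_prev, p["b_f"])]
    o_t = [sigmoid(z) for z in affine(p["W_o"], x_t, p["U_o"], h_prev, p["b_o"])]
    g_t = [math.tanh(z) for z in affine(p["W_g"], x_t, p["U_g"], h_prev, p["b_g"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def init_params(d_in, d, seed=0):
    """Random toy parameters (hypothetical helper, not trained weights)."""
    rng = random.Random(seed)
    p = {}
    for gate in "ifog":
        p["W_" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d)]
        p["U_" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d)]
        p["b_" + gate] = [0.0] * d
    return p


def run_lstm(X, p):
    """Run the cell over the word vectors X and return all hidden vectors."""
    d = len(p["b_i"])
    h, c = [0.0] * d, [0.0] * d
    hidden = []
    for x_t in X:
        h, c = lstm_step(x_t, h, c, p)
        hidden.append(h)
    return hidden


# Toy review with two 3-dimensional word vectors and hidden size 2.
params = init_params(d_in=3, d=2)
hidden = run_lstm([[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]], params)
```

Here `hidden[-1]` plays the role of h_n, the vector taken as the representation of the review.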

Dependency-relationship. Following the design of Li et al. [23], we get the word dependency-relationship vector e_{w_sn} through the following method. We first construct a directed dependency graph G_s based on the dependency-relationship set A_s. Assuming that there are dependencies between w_s1 and w_s2, and between w_s2 and w_s3, then A_s = {(w_s1, w_s2), (w_s2, w_s3), ⋯ , (w_sm, w_sn)} is the collection of dependency pairs in s, and G_s contains the semantic structure of s. Then, the dependency path set L_s is extracted from G_s through graph traversal and transformed into a key sub-sentence set S_s = {SubSent_s1, SubSent_s2, ⋯ , SubSent_sj} according to the grammatical rules, where L_s = {path_s1, path_s2, ⋯ , path_sj} and path_sj denotes the j-th dependency path in the s-th sentence. Finally, all SubSent items in S_s are fed into the word embedding model to calculate the dependency-relationship embedding e_{w_sn}. Because the dependency embedding is taken into account, words with similar structure and dependency-relationship are closer to each other in the vector space.
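The construction of G_s and the extraction of L_s can be sketched as follows. This is only an illustration: the dependency pairs for the example sentence are a hand-written, hypothetical parse, the traversal is assumed to be a depth-first enumeration of root-to-leaf paths, and joining the words of a path stands in for the grammatical rules that produce sub-sentences, which are not specified here.

```python
from collections import defaultdict


def build_graph(pairs):
    """Directed dependency graph G_s as an adjacency list built from A_s."""
    graph = defaultdict(list)
    for head, dependent in pairs:
        graph[head].append(dependent)
    return graph


def dependency_paths(pairs):
    """Enumerate root-to-leaf paths (a stand-in for L_s) by depth-first traversal."""
    graph = build_graph(pairs)
    dependents = {dep for _, dep in pairs}
    roots = [head for head in graph if head not in dependents]
    paths = []

    def walk(node, path):
        # Skip nodes already on the path so a malformed, cyclic input terminates.
        children = [c for c in graph.get(node, []) if c not in path]
        if not children:
            paths.append(path)
            return
        for child in children:
            walk(child, path + [child])

    for root in roots:
        walk(root, [root])
    return paths


# Hypothetical (head, dependent) pairs for "This cup is small but of good quality."
pairs = [
    ("small", "cup"),
    ("cup", "This"),
    ("small", "is"),
    ("small", "quality"),
    ("quality", "good"),
]
paths = dependency_paths(pairs)
# Placeholder for the rule-based conversion of each path into a sub-sentence.
sub_sentences = [" ".join(path) for path in paths]
```

In a full pipeline the pairs would come from a dependency parser, and `sub_sentences` would be passed to a word embedding model to train the dependency-relationship embeddings.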

Combination vector. The dependency-relationship vector e_{w_sn} is merged into h_n to form the combination vector c_{w_sn}. Inspired by Pham et al. [24], we exploit the BI composition function proposed in the vector combination model to fuse h_n and e_{w_sn}. The BI function captures binary information through non-linear bi-grams and is defined as:

$v_{x} = \sum_{i = 1}^{n} f ([x_{i - 1} + x_{i}])$ (6)

where v_x is the vector form of x, and f ([a + b]) denotes the element-wise weighted addition of two vectors a and b. f (·) is the hyperbolic tangent function, which is defined as follows:

$f (y) = \tanh (y) = \frac{e^{y} - e^{- y}}{e^{y} + e^{- y}}$ (7)

The function f learns the interaction information and captures the useful combination information between the two vectors. It performs a weighted addition operation on h_n and e_{w_sn} to generate a combined vector c_{w_sn}, which not only contains the context information of the word but also highlights the other word information related to the aspect. c_{w_sn} is computed according to the composition function as follows:

$c_{w_{sn}} = \sum_{i = 1}^{n} f (H \odot [e_{w_{s (i - 1)}} + e_{w_{s (i)}}])$ (8)

where $H \in \mathbb{R}^{d \times n}$ represents a matrix generated by LSTM, which is composed of the hidden vectors [h₁, h₂, ⋯ , h_n], and d is the dimension of the hidden layer. After embedding the dependency-relationship, the combination vector c_{w_sn} more fully represents the words that are closely related to the aspect semantically, which provides stronger word representations for the subsequent sentiment classification.
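As a rough illustration of the composition, the sketch below implements Equation 6 with unit weights and one possible reading of Equation 8, in which each bi-gram sum of dependency embeddings is multiplied element-wise by the hidden vector at the same position before tanh is applied. The weighting scheme and the exact way H enters the product are not spelled out in the text, so both choices, and the function names `bi_compose` and `combine`, are assumptions.

```python
import math


def bi_compose(vectors):
    """Equation 6 with unit weights: sum of tanh(x_{i-1} + x_i) over adjacent pairs."""
    dim = len(vectors[0])
    out = [0.0] * dim
    for prev, cur in zip(vectors, vectors[1:]):
        for k in range(dim):
            out[k] += math.tanh(prev[k] + cur[k])
    return out


def combine(hidden, dep_emb):
    """One reading of Equation 8.

    hidden[i] and dep_emb[i] are the hidden vector and the dependency
    embedding at position i (index 0 is only used as the left neighbour).
    """
    dim = len(hidden[0])
    out = [0.0] * dim
    for i in range(1, len(dep_emb)):
        for k in range(dim):
            bigram = dep_emb[i - 1][k] + dep_emb[i][k]
            out[k] += math.tanh(hidden[i][k] * bigram)
    return out


# Toy example with two positions and dimension 2.
hidden = [[1.0, 0.5], [2.0, -1.0]]
dep_emb = [[0.25, 0.1], [0.25, 0.3]]
c = combine(hidden, dep_emb)
```

Because tanh is bounded, each bi-gram contributes a value in (-1, 1) per dimension, so the magnitude of the combined vector grows at most linearly with the review length.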

3.3 Aspect-based sentiment classification

A review usually contains several aspects. However, the combination vector c_{w_sn} cannot identify the target aspect in the review or the sentiment word closely related to the target aspect. Besides, sentiment words unrelated to the target aspect will affect the sentiment polarity prediction. The attention mechanism can selectively capture the words closely related to the aspect in the review. Therefore, in addition to the combination vector c_{w_sn}, we also consider the aspect vector and the attention mechanism to assign larger weights to the sentiment words related to the target aspect.

We follow the method proposed by Li et al. [23] to obtain the aspect set in the review and encode each aspect by a one-hot vector. We first retrieve the number of aspects based on the aspect set. Then the aspect vector e_aspect is generated for each aspect according to the position factor, which indicates its position in the review. Finally, the aspect vector e_aspect and the combination vector c_{w_sn} are input into the attention mechanism to generate the attention weight vector γ and the weighted hidden representation r. Inspired by Wang et al. [16], γ and r are calculated by:

$M = \tanh \left( \begin{bmatrix} c_{w_{sn}} \\ e_{aspect} \end{bmatrix} \right)$ (9)
$\gamma = \mathrm{softmax} (w^{T} M)$ (10)
$r = C \gamma^{T}$ (11)

where w denotes the projection parameters and C denotes the matrix of combination vectors. γ is a vector consisting of attention weights and r is a weighted representation of a review with a given aspect. The final representation of the review is given by:

$h' = \tanh (r + c_{w_{sn}})$ (12)

h′ represents the embedding vector related to the target aspect in the review, that is, it only contains the sentiment word information associated with the target aspect. After deriving h′, the softmax function converts it into a conditional probability distribution to predict the sentiment polarity of the target aspect in the input review:

$\hat{y} = \mathrm{softmax} (W h' + b)$ (13)

where W and b are parameters learned during training, and $\hat{y}$ is the predicted probability distribution over the three classes.
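The attention step and the final classification can be sketched in a few lines of plain Python. This is a hedged illustration rather than the exact model: every combination vector is concatenated with the aspect vector to form one column of M, C is assumed to hold all combination vectors of the review, the last one is used as c_{w_sn} in Equation 12, and the toy numbers and the names `attend` and `classify` are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(C, e_aspect, w):
    """C: n combination vectors of size d; e_aspect: aspect vector of size a;
    w: projection vector of size d + a. Returns (gamma, r)."""
    scores = []
    for c in C:
        m = [math.tanh(v) for v in c + e_aspect]             # Eq. 9, one column of M
        scores.append(sum(wk * mk for wk, mk in zip(w, m)))  # w^T M
    gamma = softmax(scores)                                  # Eq. 10
    d = len(C[0])
    r = [sum(g * c[k] for g, c in zip(gamma, C)) for k in range(d)]  # Eq. 11
    return gamma, r


def classify(C, e_aspect, w, W, b):
    """Return class probabilities (Eq. 13) and the attention weights."""
    gamma, r = attend(C, e_aspect, w)
    h_final = [math.tanh(rk + ck) for rk, ck in zip(r, C[-1])]  # Eq. 12
    logits = [sum(wjk * hk for wjk, hk in zip(row, h_final)) + bj
              for row, bj in zip(W, b)]
    return softmax(logits), gamma


# Toy review with three words, d = 2, a two-dimensional aspect vector, three classes.
C = [[0.2, -0.1], [0.5, 0.3], [0.0, 0.4]]
e_aspect = [1.0, 0.0]
w = [0.3, -0.2, 0.1, 0.0]
W = [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.4]]
b = [0.0, 0.0, 0.0]
probs, gamma = classify(C, e_aspect, w, W, b)
```

The weights in `gamma` sum to one, so `r` is a convex combination of the combination vectors, and words with larger scores dominate the final representation.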

3.4 Optimization objective

In this paper, the parameters used in our model are gradually updated through gradient descent. Since the set of class labels is {positive, negative, neutral}, sentiment classification is regarded as a multi-class classification problem. Therefore, we adopt cross-entropy as the loss function. Cross-entropy measures the distance between two probability distributions, and it can effectively avoid the slowdown of learning during gradient descent.

$loss = - \sum_{i} \sum_{j} y_{i}^{j} \log \hat{y}_{i}^{j} + \lambda \lVert \theta \rVert^{2}$ (14)

where the first term is the cross-entropy function and the second term is the L2 regularization, which can effectively suppress over-fitting. y and $\hat{y}$ denote the true distribution and the predicted distribution for the review, respectively. i and j are the indexes of the review and the class, respectively. λ is the regularization coefficient, θ is the set of all the parameters involved in our model, and ∥ · ∥² is the squared L2-norm.
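A small worked instance of Equation 14 may help. The sketch below evaluates the loss for one-hot labels and predicted distributions; the function name `da_lstm_loss`, the flattened parameter list, and the small constant `eps` (added only to avoid log(0)) are assumptions made for the illustration.

```python
import math


def da_lstm_loss(y_true, y_pred, params, lam, eps=1e-12):
    """Cross-entropy over reviews i and classes j plus lam * ||theta||^2 (Eq. 14).

    y_true and y_pred are lists of per-review distributions over the classes;
    params is a flat list of all model parameters.
    """
    cross_entropy = -sum(
        y * math.log(p + eps)
        for y_i, p_i in zip(y_true, y_pred)
        for y, p in zip(y_i, p_i)
    )
    l2 = sum(t * t for t in params)
    return cross_entropy + lam * l2


# One review labelled "positive", predicted with probability 0.5.
value = da_lstm_loss(
    y_true=[[1, 0, 0]],
    y_pred=[[0.5, 0.25, 0.25]],
    params=[1.0, 2.0],
    lam=0.1,
)
# value is approximately log(2) + 0.1 * (1 + 4)
```

With one-hot labels only the log-probability of the true class contributes to the first term, so the loss falls as the model assigns more probability to the correct polarity.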

4 Experiments

In this section, we conduct experiments on the SemEval datasets and compare DA-LSTM with representative aspect-based sentiment classification methods to verify its effectiveness. Note that all input word vectors are initialized by GloVe in our experiments.

Algorithm 1 Dependency-Relationship Embedding and Attention Mechanism-based LSTM (DA-LSTM) Algorithm
Input: the word vector set w_s; the number of iterations iters
Output: the probability distribution $\hat{y}$ for the aspect
for iter in range(iters) do
    for i in range(1, n + 1) do
        calculate h_i according to Equations 1–5
        obtain e_{w_si} based on the dependency pairs A_s
        calculate c_{w_si} according to Equations 6–8
    end for
    obtain e_aspect based on the position factor
    r ← Att(c_{w_sn}, e_aspect)
    get h′ according to Equation 12
    get $\hat{y}$ according to Equation 13
    calculate Loss according to Equation 14
    update θ by gradient descent using $\frac{\partial Loss}{\partial θ}$
end for
return $\hat{y}$

4.1 Datasets

SemEval is an authoritative public dataset in the field of aspect extraction. Laptop14 is a review dataset for laptops released by SemEval in 2014, Restaurant15 and Restaurant16 are review datasets for restaurants released by SemEval in 2015 and 2016, respectively. We extract aspects from reviews of laptop and restaurant on these datasets and identify the aspect sentiment polarity. The validity of the proposed approach DA-LSTM is verified by aspect-based sentiment classification results on three datasets. The detailed descriptions of datasets Laptop14, Restaurant15, and Restaurant16 are shown in Table 1.

Table 1
Experimental data description

Datasets	Positive aspects	Negative aspects	Neutral aspects
Laptop14	1350	648	1014
Restaurant15	1662	759	78
Restaurant16	1898	858	611

Note that this paper merges the training set and test set of Laptop14, Restaurant15, and Restaurant16 respectively, and inputs the merged dataset into DA-LSTM to predict the sentiment polarity of aspects in reviews.

4.2 Performance measures

Performance is evaluated with two measures, Accuracy and Macro-F1, which are detailed in this section.

Accuracy. This index is the ratio of the number of correctly classified samples to the total number of samples. Accuracy is computed over all samples and does not distinguish whether a predicted sample is positive or negative. It is worth mentioning that when the distribution of samples is not balanced, this index is of little significance.

$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ (15)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Macro-F1. F1 is the harmonic mean of Precision and Recall. Precision and Recall only measure the performance on a single class. To measure the overall classification performance of the model, F1 is computed for each class and the per-class scores are averaged with equal weight (macro-averaging).

$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$ (16)

where Precision = TP/(TP + FP) denotes the ratio of the number of correctly predicted positive samples to the number of all samples that are predicted to be positive, and Recall = TP/(TP + FN) denotes the ratio of the number of correctly predicted positive samples to the number of truly positive samples.
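Both measures are easy to compute by hand for a three-class problem. The following sketch treats each class in turn as the "positive" class when counting TP, FP, and FN; the label strings and the toy predictions are made up for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (Equation 16 per class)."""
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


labels = ["positive", "negative", "neutral"]
y_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "neutral", "positive", "positive"]

acc = accuracy(y_true, y_pred)             # 4 of 6 predictions are correct
macro = macro_f1(y_true, y_pred, labels)   # mean of the F1 scores 2/3, 1/2 and 1
```

In this toy case the rare "neutral" class counts as much as the frequent "positive" class in Macro-F1, which is why the measure is informative on imbalanced data such as Restaurant15.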

4.3 Baselines

The proposed DA-LSTM is compared with the following state-of-the-art approaches for aspect-based sentiment classification.

LSTM [19]: The standard LSTM takes the last hidden vector as the sentence representation and applies a linear layer so that its dimension equals the number of classes. Then, the softmax activation is applied to the resulting vector to complete sentiment polarity recognition.

IAN [25]: This approach interactively learns context information and aspect information, and an attention mechanism is applied to generate the representation of aspect and its context. Then, they are concatenated with semantic features to predict the sentiment polarity of aspect.

BERT [26]: A language representation approach that utilizes bidirectional transformer to model word representation. BERT inserts a specific token at the beginning of each sequence and adds a fully connected layer at the position of the token in the last encoding layer. The output hidden vector is sent to softmax to complete the aspect-based sentiment classification.

GCAE [12]: A sentiment analysis model based on CNN and gating mechanism. The two convolution layers added on top of the embedding layer are used as gating units. Each gating unit has two nonlinear gates linked to the convolution layer. The gating unit can effectively extract multi granularity n-gram features in each receptive field, and automatically extract the sentiment information of a specific aspect to predict its sentiment polarity.

AMEN-AMSAN [13]: A fine-grained sentiment analysis model based on a dual CNN model. The first CNN extracts the location information of the aspect and obtains an aspect map through weakly supervised learning. Then the second CNN performs aspect-based sentiment classification through the aspect map.

TNet-ATT(+AS) [20]: A progressive self-supervised attention learning approach based on the ASC model. This method automatically mines useful attention supervision information from the training corpus in an iterative manner and uses regularization to expand words in the corpus that have a sentiment impact on the aspect.

LSTM+AdvATT+EPGating+CNN (LAEC) [22]: Two mechanisms are designed for fine-grained sentiment analysis. The unbiased attention mechanism makes the search for aspect sentiment words fairer through adversarial training. The embedding-preserving gating mechanism adjusts the word vector according to the relevance of the word and the target. In addition to preserving the original semantic information, the obtained vector supplements the context sentiment information.

MAN [27]: A multi-attention network for aspect-based sentiment classification. Intra-level attention employs a transformer to encode the input sentence in parallel and preserve long-distance sentiment relations, while inter-level attention uses a global and a local attention module to capture the entire relation and interactions at word level.

PAHT [14]: A position-aware hierarchical transfer model, which models multi-level position information by transferring hierarchical knowledge from the dataset. It presents aspect-based positional attention at the word and segment levels and transfers the knowledge learned through pre-training into ASC at four levels: embedding, word, segment, and classifier. This captures salient information about a specific aspect and makes up for the limited data for ASC.

4.4 Experimental results

The experimental results of all approaches on Laptop14, Restaurant15, and Restaurant16 are shown in Table 2, with Accuracy and Macro-F1 as the evaluation metrics. The results indicate that DA-LSTM outperforms the other approaches on every metric except Macro-F1 on Restaurant16, where MAN is slightly higher (73.28 vs. 72.49). Compared with the other approaches, the maximum improvements of DA-LSTM on Laptop14, Restaurant15, and Restaurant16 are 13.23, 10.8, and 13 percentage points in Accuracy and 16.13, 18, and 20.04 percentage points in Macro-F1. DA-LSTM shows excellent performance in aspect-based sentiment classification tasks, specifically:

Table 2
Experimental results (evaluation metrics: Accuracy and Macro-F1, in %)

Models	Laptop14 Accuracy	Laptop14 Macro-F1	Restaurant15 Accuracy	Restaurant15 Macro-F1	Restaurant16 Accuracy	Restaurant16 Macro-F1
LSTM	66.45	59.42	75.59	52.93	73.64	52.45
IAN	72.16	66.07	81.39	55.68	80.17	55.92
BERT	76.54	72.48	79.39	64.75	75.51	63.83
GCAE	75.29	67.85	82.19	67.53	81.64	69.71
AMEN-AMSAN	–	–	–	–	83.94	–
TNet-ATT (+AS)	77.62	71.84	81.73	69.08	86.25	71.52
LAEC	–	–	83.17	68.09	–	–
MAN	78.13	73.2	82.65	69.1	85.87	73.28
PAHT	75.71	69.55	82.05	68.85	85.81	67.11
DA-LSTM (W/O D)	75.64	70.18	79.59	62.28	83.16	68.52
DA-LSTM (W/O A)	76.45	70.39	81.23	65.11	83.4	70.18
DA-LSTM (W/O D+A)	71.14	62.51	75.59	59.28	78.62	65.03
DA-LSTM	79.68	75.55	86.39	70.93	86.64	72.49

DA-LSTM outperforms word representation approaches such as LSTM and BERT. On the one hand, LSTM only mines the long-term dependencies in the review and, by its nature, does not highlight the sentiment words related to the aspect. BERT has achieved quite high performance in many natural language processing tasks. However, like LSTM, BERT has difficulty analyzing multiple aspects in a single sample separately, which limits its performance in the aspect-based sentiment classification task. On the other hand, LSTM and BERT have obvious advantages in text semantic mining tasks and play a huge role in paragraph-level sentiment analysis. In fine-grained sentiment analysis, however, the more targeted DA-LSTM achieves better results.

Compared with attention-based models such as IAN, TNet-ATT (+AS), LAEC, and MAN, the maximum improvements of DA-LSTM on Laptop14 and Restaurant15 are 7.52 and 5 percentage points in Accuracy and 9.48 and 15.25 percentage points in Macro-F1. Besides, it achieves a competitive result on Restaurant16. The above attention-based baselines take the review as a whole and adopt interactive learning, adversarial training, and other techniques to seek sentiment words. These approaches disregard the interaction between different aspects, so they easily incorporate words that are not related to the target aspect into the representation, which leads to erroneous sentiment polarity prediction. DA-LSTM aims at retrieving the sentiment word most closely related to a single aspect in the review through the attention mechanism, which avoids the interference of irrelevant sentiment words to a certain extent. The experimental results show that DA-LSTM does have a better ability to represent aspect sentiment words.

Compared with CNN-based approaches such as GCAE, AMEN-AMSAN and PAHT, DA-LSTM improves Accuracy by up to 4.39%, 4.34% and 5%, and Macro-F1 by up to 7.7%, 3.4% and 5.38%, on Laptop14, Restaurant15 and Restaurant16, respectively. DA-LSTM adequately considers and mines the relationship between the aspect and the sentiment words in the review. Unlike models equipped with gating mechanisms and CNNs, DA-LSTM takes the review as the basic unit of analysis and exploits the dependency-relationship in the hidden layer to express the semantic information between words more fully, which yields better word representations than the above baselines. The aspect vector is then used to select more targeted sentiment words, which gives DA-LSTM a stronger ability to predict aspect sentiment polarity than GCAE and AMEN-AMSAN.

4.5 Ablation experiments

To further verify the advantages of the dependency-relationship embedding and the attention mechanism, we conducted an ablation experiment; the results are shown in Table 2. Here, “W/O D”, “W/O A” and “W/O D+A” denote “without dependency-relationship embedding”, “without attention mechanism” and “without dependency-relationship embedding and attention mechanism”, respectively. We can draw the following conclusions:

The experimental results show that removing either component, or both, degrades the performance of the model to varying degrees. The word hidden vectors output by the LSTM lose some information, and the dependency-relationship embedding can supplement them with useful information to a certain extent. Besides, DA-LSTM uses the attention mechanism to assign larger weights to the sentiment words closely related to the target aspect, which effectively prevents the interference of irrelevant words. Therefore, both components play an important role.

The dependency-relationship embedding contributes more than the attention mechanism. For example, on Laptop14, the model with only the dependency-relationship embedding gains 5.31% in Accuracy and 7.88% in Macro-F1 over the model with neither component, whereas the model with only the attention mechanism gains 4.5% and 7.67%. A possible reason is that the attention mechanism only redistributes weights over the existing information, whereas the dependency-relationship embedding adds useful information.

These components are more helpful for classification on datasets with an unbalanced category distribution. For example, Restaurant15 has only 78 neutral aspects. On this dataset, the model with both components has a 10.8% Accuracy improvement (86.39 vs. 75.59) and an 11.65% Macro-F1 improvement (70.93 vs. 59.28 in Table 2) over the model without either component; the Accuracy gain is the largest among the three datasets. This suggests that our model supplies more usable information for categories that contain only a few samples.
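As a quick check on the single-component comparison above, the gains can be recomputed directly from the Table 2 rows. The following is a minimal sketch; the dictionary layout and the helper name `gains_over_base` are illustrative and not part of the original experiments.

```python
# Table 2 ablation rows as (Accuracy, Macro-F1) pairs, one pair per dataset.
DATASETS = ("Laptop14", "Restaurant15", "Restaurant16")
TABLE2 = {
    "W/O D":   [(75.64, 70.18), (79.59, 62.28), (83.16, 68.52)],  # attention only
    "W/O A":   [(76.45, 70.39), (81.23, 65.11), (83.40, 70.18)],  # dependency embedding only
    "W/O D+A": [(71.14, 62.51), (75.59, 59.28), (78.62, 65.03)],  # neither component
}


def gains_over_base(variant, base="W/O D+A"):
    """Return {dataset: (Accuracy gain, Macro-F1 gain)} in percentage points."""
    result = {}
    for name, (acc, f1), (base_acc, base_f1) in zip(
        DATASETS, TABLE2[variant], TABLE2[base]
    ):
        result[name] = (round(acc - base_acc, 2), round(f1 - base_f1, 2))
    return result


dependency_only = gains_over_base("W/O A")  # keeps the dependency embedding
attention_only = gains_over_base("W/O D")   # keeps the attention mechanism

for name in DATASETS:
    print(name, "dependency only:", dependency_only[name],
          "attention only:", attention_only[name])
```

On all three datasets the dependency-only variant gains more than the attention-only variant on both metrics, which is consistent with the second conclusion.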

5 Conclusion

In this paper, we propose an aspect-based sentiment classification model (DA-LSTM) based on dependency-relationship embedding and an attention mechanism. The word context representations generated by most existing approaches do not highlight the sentiment words closely related to the target aspect. Based on the BI combination function, DA-LSTM combines the hidden vector output by the LSTM with the dependency-relationship embedding to form a novel word vector that carries both context and dependency semantics. Afterward, the weights assigned by the attention mechanism direct more attention to the sentiment words related to the target aspect. Finally, the word representation calculated by the attention mechanism is fed to a softmax layer to perform aspect-based sentiment classification. Thus, the proposed DA-LSTM overcomes the drawbacks of existing methods, and the experimental results on three real-world review datasets demonstrate its effectiveness compared with state-of-the-art baselines.
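To make the pipeline summarized above concrete, the following is a minimal, untrained sketch in plain Python. It is not the authors' implementation: plain concatenation stands in for the BI combination function, and the tanh scoring form, the vector dimensions and the random parameters `w_att` and `w_out` are all assumptions chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def classify(hidden, dep_emb, aspect, w_att, w_out):
    """Combine, attend with the aspect, then classify (illustrative only)."""
    # ASSUMPTION: list concatenation stands in for the BI combination function.
    combined = [h + d for h, d in zip(hidden, dep_emb)]
    # ASSUMPTION: aspect-aware score per word, tanh(w_att . [combined_i ; aspect]).
    scores = [math.tanh(dot(w_att, c + aspect)) for c in combined]
    weights = softmax(scores)
    # Attention-weighted sum of the combined word vectors.
    dim = len(combined[0])
    sentence = [sum(a * c[k] for a, c in zip(weights, combined)) for k in range(dim)]
    # Softmax over three polarity classes (positive, negative, neutral).
    probs = softmax([dot(row, sentence) for row in w_out])
    return weights, probs


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


random.seed(0)
n_words, h_dim, d_dim, a_dim = 5, 4, 3, 4          # hypothetical sizes
hidden = [rand_vec(h_dim) for _ in range(n_words)]  # stand-in LSTM hidden vectors
dep_emb = [rand_vec(d_dim) for _ in range(n_words)] # stand-in dependency embeddings
aspect = rand_vec(a_dim)                            # stand-in aspect vector
w_att = rand_vec(h_dim + d_dim + a_dim)             # untrained attention parameters
w_out = [rand_vec(h_dim + d_dim) for _ in range(3)] # untrained output layer

weights, probs = classify(hidden, dep_emb, aspect, w_att, w_out)
print("attention weights:", [round(w, 3) for w in weights])
print("class probabilities:", [round(p, 3) for p in probs])
```

With trained parameters, the attention weights would concentrate on the sentiment words tied to the given aspect; here they only show the shape of the computation.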

Acknowledgment

This research is partly supported by the National Natural Science Foundation of China (Grant Nos. 62072288, 61702306, U1435215 and 71772107), the National Key R&D Plan (No. 2018YFC0831002), the Taishan Scholar Program of Shandong Province (Grant No. ts20190936), the Natural Science Foundation of Shandong Province (Grant No. ZR2018BF013), and the Scientific Research Foundation of Shandong University of Science and Technology for Innovative Team (Grant No. 2015TDJH102).

Table 1 Experimental data description (number of aspects per sentiment polarity)
Datasets	Positive	Negative	Neutral
Laptop14	1350	648	1014
Restaurant15	1662	759	78
Restaurant16	1898	858	611
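
The class imbalance discussed in the ablation analysis can be read off Table 1 by converting the counts to shares. The following is a minimal sketch; the helper name `class_shares` and the dictionary layout are illustrative.

```python
# Aspect counts per sentiment polarity, copied from Table 1.
TABLE1 = {
    "Laptop14":     {"positive": 1350, "negative": 648, "neutral": 1014},
    "Restaurant15": {"positive": 1662, "negative": 759, "neutral": 78},
    "Restaurant16": {"positive": 1898, "negative": 858, "neutral": 611},
}


def class_shares(counts):
    """Return each class's share of the dataset as a percentage (one decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


shares = {name: class_shares(counts) for name, counts in TABLE1.items()}
for name, share in shares.items():
    print(name, share)
```

Neutral aspects make up only about 3% of Restaurant15, far less than in the other two datasets, which is why Macro-F1 on this dataset is especially sensitive to how the minority class is handled.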
