Abstract
Aspect-based sentiment classification, a fine-grained sentiment analysis task, aims to predict the sentiment polarity for a specified aspect. However, existing aspect-based sentiment classification approaches cannot fully model the dependency-relationship between words and are easily disturbed by irrelevant aspects. To address this problem, we propose a novel approach named Dependency-Relationship Embedding and Attention Mechanism-based LSTM (DA-LSTM). DA-LSTM first merges the word hidden vector output by LSTM with the dependency-relationship embedding to form a combined vector. This vector is then fed into the attention mechanism together with the aspect information, which avoids interference from irrelevant aspects, to calculate the final word representation for sentiment classification. Extensive experiments on benchmark datasets show the effectiveness of DA-LSTM.
Introduction
Massive amounts of text data have been accumulated in recent years. In particular, various types of review data such as hotel reviews, product reviews, and hotspot reviews are closely bound up with people’s daily lives. Reviews are generally short but carry strong opinions and emotions. Analyzing the emotions implied in these reviews helps us determine whether their subjects are viewed positively or negatively. However, it is very difficult to extract the emotional tendency from extensive review data manually. Sentiment analysis [1] in the field of natural language processing aims to interpret and classify emotions in subjective data, which can effectively solve such problems. According to the level of granularity of existing research, it can be categorized into document-based, sentence-based, and aspect-based sentiment analysis. For review data, aspect-based sentiment analysis has better directional analysis capabilities than the other two. For example, the sentence “This cup is small but of good quality.” would be assigned negative polarity for the aspect “cup” but positive polarity for the aspect “quality”.
Aspect-based sentiment analysis has been investigated mainly based on four approaches: feature-based, dictionary-based, traditional machine learning-based, and deep learning-based. Feature-based or dictionary-based methods first retrieve aspects and sentiment words in review and then calculate the sentiment polarity pointed to by sentiment words according to prior knowledge. Due to the use of prior knowledge such as sentiment dictionary, these methods dig out information in text quickly and achieve good performance. However, they are heavily dependent on external knowledge. The randomness of words used in review makes it difficult for them to extract infrequent sentiment words. Therefore, these methods cannot fully discover the aspects and corresponding sentiment words in the review.
Traditional machine learning-based sentiment analysis approaches construct features that express the semantic information of words and input these features into machine learning models such as SVM [2] and NB [3] to determine the aspect sentiment polarity. They also exploit models such as LDA [4] to automatically mine the sentiment information of aspects in review and classify these aspects. These methods focus on the semantics of the words themselves but ignore the semantic information existing between different words.
Recently, the application of deep learning has become more and more extensive, such as network representation learning [5, 6], recommendation system [7], and so on. Sentiment analysis methods based on deep learning also showed promising performance and attracted a great deal of attention [8]. However, these methods require abundant labeled data that is not easily available in reality. Therefore, some works try to solve this problem by introducing transfer learning and generative adversarial networks. Specifically, they first learn a traditional language model or an end-to-end encoder through extensive unlabeled data to represent text. Then they perform semi-supervised learning based on the encoding model through a little labeled data and fine-tune it to get the final classification model. These methods take local context information of the word into account while calculating the word representation to complete the aspect sentiment polarity prediction. However, due to the limitations of the neural network model, such approaches cannot mine the global context information of words and perform targeted analysis on specific aspects of review.
To solve the above problems, we propose an LSTM model based on dependency-relationship embedding and attention mechanism (DA-LSTM), which is applied to the sentiment classification task of product reviews. Figure 1 illustrates the architecture of a standard LSTM [9], which performs well in mining the semantic information of text. The proposed DA-LSTM takes the hidden vectors of sentiment words output by LSTM as its basis and fuses them with the dependency-relationship embedding through the BI function to preserve the dependency between words and the global context information. In addition, it takes the aspect vector into consideration and finds the sentiment word most related to the target aspect through the attention mechanism. Finally, the obtained word representation is fed into the softmax function to predict the sentiment polarity of the aspect. The main contributions of this paper are summarized as follows:
(1) We propose a novel method for word vector generation. Different from existing methods, our method fuses the dependency-relationship embedding with the hidden vector obtained by LSTM, which preserves both the dependency information and the context information between words.
(2) We design an attention mechanism. The extracted aspect information and the combined word vector are fed into the attention mechanism, which adaptively assigns larger weights to words related to the target aspect to avoid interference from irrelevant words.
(3) We conduct experiments on three real-world review datasets, namely Laptop14, Restaurant15, and Restaurant16. The experiments demonstrate that the proposed model achieves excellent performance compared to the baselines.

Figure 1. The architecture of a standard LSTM. $\{w_{s0}, w_{s1}, \dots, w_{sn}\}$ represents the word vector set of a review whose length is n. $\{h_0, h_1, \dots, h_n\}$ denotes the hidden vector set.
Aspect-based sentiment classification with neural networks
Deep learning has achieved fantastic performance in aspect-based sentiment analysis. Zhao et al. [10] proposed convolutional feature extraction and Long Short-Term Memory, an underlying network structure for performing text sentiment analysis. They exploited a neural network to learn high-level representations (that is, embedding space) of text to achieve the purpose of predicting the sentiment polarity. Cong et al. [11] modeled sentiment polarity in text based on bi-directional LSTM. Xue et al. [12] added two convolution layers to the embedding layer of the CNN model. Based on the given aspect information, words related to aspects are automatically extracted, so that CNN can be used to predict the sentiment information of aspects. The idea of Noh et al. [13] is to represent aspects through location information and utilize CNN to construct aspect maps and extract aspect sentiment words. Zhou et al. [14] considered the location information of each aspect in their model and enhanced the aspect-based sentiment classification performance by transferring hierarchical knowledge from the resource-rich sentence-based sentiment classification dataset. Tang et al. [15] proposed a network architecture based on the idea of adversarial learning, which uses labeled source domain data to predict the sentiment label of unlabeled target domain data.
Deep learning performs well in capturing the semantic information of words, but CNN can only capture local features of words. Because reviews vary in length, CNN tends to ignore sentiment words that are semantically related to an aspect but far away from it. LSTM can capture the contextual information of words, but aspect sentiment words, which have an important influence on aspect-based sentiment classification, are not fully represented. These limitations hinder the performance improvement of aspect-based sentiment classification.
Aspect-based sentiment classification with attention mechanism
With the continued study of attention mechanisms, attention-based aspect-based sentiment analysis has become a popular research direction. Wang et al. [16] proposed an attention-based LSTM model for sentiment classification. Assuming that sentiment representations of several aspects are used as input, the attention mechanism focuses on the sentiment aspects that are beneficial to the classification results. Tang et al. [17] proposed a multi-layer deep learning model with shared parameters. This model utilizes the attention mechanism to learn the context weight of words in text based on location and content information, and calculates the text representation through the weights. The representation of the last layer is used for sentiment classification. Wang et al. [18] proposed a mixed attention mechanism, in which local attention is applied to acquire word information with syntactic dependency, and global attention is applied to retrieve all words in the text and acquire word context information. Finally, a gated mechanism is designed for aspect-based sentiment analysis. Ma et al. [19] designed a strategy that combines feature representation and word embedding to enhance the attention mechanism, and presented a feature-based composite memory network to solve the fine-grained sentiment classification problem. Tang et al. [20] extracted the word with the maximum attention weight from the text as seed information and masked it. They repeated this process to expand the seed word set and utilized regularization to standardize the results. Finally, the obtained word features are input into the sentiment analysis model to predict aspect sentiment polarity. Lu et al. [21] used a general extractor, a sentiment extractor, a negative extractor, and a degree extractor to model the hidden state of the text. Then, they designed an interactive rule attention mechanism to learn the association information between context and aspects, thereby improving the performance of aspect-based sentiment analysis. Liu et al. [22] introduced an adversarial training method to obtain unbiased attention, which avoids excessive attention to sentiment words unrelated to the target aspect. Then, they proposed an Embedding-Preserving Gating Mechanism to dynamically incorporate target-related features into the word representation while retaining the original word information.
The advantage of the attention mechanism is that it can mine the key information in texts. However, there are many words used to describe different aspects in the review, which causes the attention mechanism to be easily disturbed by other aspects when calculating the weight of the target aspect. In addition, attention mechanism is often applied together with neural networks. When calculating the attention weight based on the representation output by the neural network, the sentiment word information related to the target aspect may have been discarded.
To overcome the above limitations of neural networks in sentiment classification and of attention mechanisms in aspect sentiment word mining, we first combine the dependency-relationship embedding and the hidden vector. Then, the combined vector and the aspect vector are fed into the attention mechanism to calculate the final word representation for sentiment classification. The dependency-relationship embedding adds syntactic dependency information between words, and the aspect vector avoids the interference of other aspects with the target aspect. In this way, the attention mechanism only considers sentiment words related to the target aspect and gives a more accurate representation of sentiment words. Finally, the obtained word representation is input into the softmax function, and aspect-based sentiment classification is carried out according to the resulting probability distribution.
DA-LSTM
In this section, we first provide the overall structure of the proposed DA-LSTM and then introduce the details of DA-LSTM in sentiment word extraction and aspect-based sentiment classification.
Overall framework
The proposed DA-LSTM model first obtains the hidden vector that encodes context information through LSTM. Then, the hidden vector and dependency-relationship embedding are combined to form the combination vector which contains both the context information and dependency-relationship. The aspect information extracted from the review is provided to the attention mechanism together with the combination vector to mine the word information closely related to the target aspect. Finally, the aspect-based sentiment classification is performed through the calculated word representation. This process is shown in Algorithm 1. The framework of the proposed DA-LSTM is shown in Fig. 2.

Fig. 2. The overall framework of the proposed DA-LSTM. $\{e_{w_{s0}}, e_{w_{s1}}, \dots, e_{w_{sn}}\}$ represents the word dependency-relationship vector set. $\{c_{w_{s0}}, c_{w_{s1}}, \dots, c_{w_{sn}}\}$ denotes the combination vector set. $e_{aspect}$ is the aspect vector. $\gamma$ is a vector consisting of attention weights and $r$ is a weighted representation of a review with a given aspect. $h'$ denotes the final feature vector related to the target aspect in the review.
Each word in the review has a dependency-relationship with its previous and subsequent words. LSTM can preserve the long-term dependencies in the review and effectively link historical information. Due to the existence of the input gate and the forget gate, LSTM automatically judges the availability of information to save useful information and discards useless information. In addition, LSTM can avoid the problem of vanishing or exploding gradient. Therefore, most models used for sentiment analysis are based on LSTM layers, through which sentiment information is extracted from word embedding.
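The gating behaviour described above can be illustrated with a minimal NumPy sketch of a standard LSTM cell. This is a textbook formulation, not the authors' implementation; the function names, the gate ordering inside the stacked weight matrices, and the toy dimensions are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4*d, m) input weights, U: (4*d, d) recurrent weights, b: (4*d,) bias.
    Assumed gate order in the stacked matrices: input, forget, output, candidate.
    """
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:d])           # input gate: how much new information to store
    f = sigmoid(z[d:2 * d])       # forget gate: how much old memory to keep
    o = sigmoid(z[2 * d:3 * d])   # output gate
    g = np.tanh(z[3 * d:4 * d])   # candidate memory content
    c = f * c_prev + i * g        # updated cell state
    h = o * np.tanh(c)            # hidden vector for this word
    return h, c

def lstm_hidden_vectors(X, W, U, b):
    """Run the cell over word vectors X (n, m) and return hidden vectors (n, d)."""
    d = U.shape[1]
    h, c = np.zeros(d), np.zeros(d)
    hidden = []
    for x in X:
        h, c = lstm_step(x, h, c, W, U, b)
        hidden.append(h)
    return np.stack(hidden)

# Toy example: 6 words, 5-dimensional word vectors, 4-dimensional hidden state.
rng = np.random.default_rng(0)
m, d, n = 5, 4, 6
W = rng.normal(scale=0.1, size=(4 * d, m))
U = rng.normal(scale=0.1, size=(4 * d, d))
b = np.zeros(4 * d)
X = rng.normal(size=(n, m))
H = lstm_hidden_vectors(X, W, U, b)
```

Each row of `H` plays the role of one hidden vector $h_n$ in the notation of this paper.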
Due to the above advantages of LSTM, we adopt it to extract sentiment information in this paper. The result of aspect-based sentiment classification depends on the sentiment polarity of the words related to the target aspect. LSTM encodes the context information of the words in the review, but cannot highlight the sentiment words that reflect a specific aspect. Therefore, we fuse the dependency-relationship embedding and the hidden vector output by LSTM, so that the sentiment words are fully semantically represented in the combined vector. To ensure that the highlighted sentiment word is closely related to the target aspect, we introduce the aspect vector and utilize the attention layer to pair the aspect with its sentiment words.
The function $f_y$ learns the interaction information and captures the useful combination information between the two vectors. $f_y$ performs a weighted addition operation on $h_n$ and $e_{w_{sn}}$ to generate a combined vector $c_{w_{sn}}$, which not only contains the context information of the word but also highlights the other word information related to the aspect. $c_{w_{sn}}$ is computed according to the composition function $f_y$.
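As a minimal sketch of such a weighted addition, one possible reading is $c_{w_{sn}} = \tanh(W_h h_n + W_e e_{w_{sn}} + b)$. The exact form of the composition function is an assumption here: the weight matrices `W_h` and `W_e`, the bias `b`, and the tanh nonlinearity are hypothetical choices for illustration and are not taken from the authors' formulation.

```python
import numpy as np

def combine(H, E, W_h, W_e, b):
    """Fuse hidden vectors with dependency-relationship embeddings.

    H: (n, d) hidden vectors, E: (n, k) dependency-relationship embeddings.
    W_h: (d, d), W_e: (d, k), b: (d,) are hypothetical learned parameters.
    Returns combined vectors of shape (n, d).
    """
    # Weighted addition of the two projected inputs, then a squashing nonlinearity.
    return np.tanh(H @ W_h.T + E @ W_e.T + b)

# Toy example: 6 words, hidden size 4, dependency embedding size 3.
rng = np.random.default_rng(0)
n, d, k = 6, 4, 3
H = rng.normal(size=(n, d))
E = rng.normal(size=(n, k))
W_h = np.eye(d)
W_e = rng.normal(scale=0.1, size=(d, k))
b = np.zeros(d)
C = combine(H, E, W_h, W_e, b)
```

Projecting both inputs into the same space before adding them lets the two vectors have different dimensions, which is why the sketch uses two separate weight matrices.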
There are several aspects in the review. However, the combination vector $c_{w_{sn}}$ cannot identify the target aspect in the review or the sentiment word closely related to the target aspect. Besides, sentiment words unrelated to the target aspect will affect the sentiment polarity prediction. The attention mechanism can directionally capture the words closely related to the aspect in the review. Therefore, in addition to the combination vector $c_{w_{sn}}$, we also consider the aspect vector and the attention mechanism to assign larger weights to the sentiment words related to the target aspect.
We follow the method proposed by Li et al. [23] to obtain the aspect set in the review and encode each aspect by a one-hot vector. We first retrieve the number of aspects based on the aspect set. Then the aspect vector $e_{aspect}$ is generated for each aspect according to the position factor, which indicates its position in the review. Finally, the aspect vector $e_{aspect}$ and the combination vector $c_{w_{sn}}$ are input into the attention mechanism to generate the attention weight vector $\gamma$ and the weighted hidden representation $r$. The calculation of $\gamma$ and $r$ is inspired by Wang et al. [16].
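A minimal sketch of this step, assuming the attention takes the same general form as the attention-based LSTM of Wang et al. [16] with the combination vectors in place of the plain hidden vectors: $M = \tanh([W_c C; W_v e_{aspect} \otimes 1_n])$, $\gamma = \mathrm{softmax}(w^{\top} M)$, and $r$ is the $\gamma$-weighted sum of the combination vectors. The parameter names `W_c`, `W_v`, and `w` are hypothetical, and the position factor is omitted, so a plain one-hot vector stands in for $e_{aspect}$.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def aspect_attention(C, e_aspect, W_c, W_v, w):
    """Aspect-conditioned attention over combination vectors.

    C: (n, d) combination vectors, e_aspect: (a,) aspect vector.
    W_c: (d, d), W_v: (a, a), w: (d + a,) are hypothetical learned parameters.
    Returns attention weights gamma (n,) and weighted representation r (d,).
    """
    n = C.shape[0]
    proj_words = C @ W_c.T                          # (n, d)
    proj_aspect = np.tile(W_v @ e_aspect, (n, 1))   # same aspect repeated for every word
    M = np.tanh(np.concatenate([proj_words, proj_aspect], axis=1))  # (n, d + a)
    gamma = softmax(M @ w)                          # one weight per word
    r = gamma @ C                                   # weighted sum of combination vectors
    return gamma, r

# Toy example: 6 words, combination size 4, 3 aspects in the review.
rng = np.random.default_rng(0)
n, d, num_aspects = 6, 4, 3
C = rng.normal(size=(n, d))
e_aspect = np.eye(num_aspects)[1]   # one-hot vector for the second aspect
W_c = rng.normal(scale=0.1, size=(d, d))
W_v = rng.normal(scale=0.1, size=(num_aspects, num_aspects))
w = rng.normal(size=d + num_aspects)
gamma, r = aspect_attention(C, e_aspect, W_c, W_v, w)
```

Because the aspect vector is concatenated to every word before scoring, changing the target aspect changes the scores of all words, which is how the weights can shift away from words that describe other aspects.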
In this paper, the parameters used in our model are gradually updated through gradient descent. Since the set of class labels is {positive, negative, neutral}, sentiment classification is regarded as a multi-class classification problem. Therefore, we adopt cross-entropy as the loss function. Cross-entropy measures the distance between two probability distributions, and it effectively avoids the slowdown of learning during gradient descent.
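The loss can be sketched for a single example as follows. This is the standard softmax cross-entropy rather than code from the paper; the label ordering in `LABELS` and the helper names are assumptions for illustration.

```python
import math

# Assumed label order; only the set {positive, negative, neutral} comes from the text.
LABELS = ["positive", "negative", "neutral"]

def softmax(logits):
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, gold_label):
    """Negative log-probability assigned to the gold class."""
    probs = softmax(logits)
    return -math.log(probs[LABELS.index(gold_label)])

def grad_logits(logits, gold_label):
    """Gradient of the loss w.r.t. the logits: predicted probabilities minus one-hot target."""
    grad = softmax(logits)
    grad[LABELS.index(gold_label)] -= 1.0
    return grad

loss_uniform = cross_entropy([0.0, 0.0, 0.0], "neutral")    # model has no preference
loss_wrong = cross_entropy([5.0, 0.0, 0.0], "negative")     # confident but wrong
grad_wrong = grad_logits([5.0, 0.0, 0.0], "negative")
```

The gradient with respect to the logits is simply the predicted distribution minus the one-hot target, so a confidently wrong prediction still receives a large gradient. This is one common explanation for why cross-entropy does not suffer from slow learning in that situation.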
In this section, we conduct experiments on the SemEval datasets and compare DA-LSTM with representative aspect-based sentiment classification methods to verify its effectiveness. Note that all input word vectors are initialized with GloVe in our experiments.
Datasets
SemEval is an authoritative public dataset in the field of aspect extraction. Laptop14 is a review dataset for laptops released by SemEval in 2014, Restaurant15 and Restaurant16 are review datasets for restaurants released by SemEval in 2015 and 2016, respectively. We extract aspects from reviews of laptop and restaurant on these datasets and identify the aspect sentiment polarity. The validity of the proposed approach DA-LSTM is verified by aspect-based sentiment classification results on three datasets. The detailed descriptions of datasets Laptop14, Restaurant15, and Restaurant16 are shown in Table 1.
Table 1. Experimental data description
Note that this paper merges the training set and test set of Laptop14, Restaurant15, and Restaurant16 respectively, and inputs the merged dataset into DA-LSTM to predict the sentiment polarity of aspects in reviews.
Performance is analyzed using two measures, Accuracy and Macro-F1, which are detailed in this section.
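For reference, the two measures can be computed as in the sketch below, using their standard definitions: Accuracy is the fraction of correctly classified aspects, and Macro-F1 is the unweighted mean of the per-class F1 scores. The function names and the toy labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of aspects whose predicted polarity matches the gold polarity."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred, labels=("positive", "negative", "neutral")):
    """Unweighted mean of per-class F1 scores, so rare classes count equally."""
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy example with four aspects.
gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "negative", "negative", "neutral"]
acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
```

Because Macro-F1 averages over classes rather than over samples, it is more sensitive than Accuracy to errors on small classes such as the neutral class.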
The proposed DA-LSTM is compared with the following state-of-the-art approaches for aspect-based sentiment classification.
LSTM [19]: The standard LSTM takes the last hidden vector as the sentence representation and applies a linear layer so that its dimension equals the number of classes. Then, softmax is applied to the resulting vector to complete sentiment polarity recognition.
IAN [25]: This approach interactively learns context information and aspect information, and an attention mechanism is applied to generate the representation of aspect and its context. Then, they are concatenated with semantic features to predict the sentiment polarity of aspect.
BERT [26]: A language representation approach that utilizes bidirectional transformer to model word representation. BERT inserts a specific token at the beginning of each sequence and adds a fully connected layer at the position of the token in the last encoding layer. The output hidden vector is sent to softmax to complete the aspect-based sentiment classification.
GCAE [12]: A sentiment analysis model based on CNN and gating mechanism. The two convolution layers added on top of the embedding layer are used as gating units. Each gating unit has two nonlinear gates linked to the convolution layer. The gating unit can effectively extract multi granularity n-gram features in each receptive field, and automatically extract the sentiment information of a specific aspect to predict its sentiment polarity.
AMEN-AMSAN [13]: A fine-grained sentiment analysis model based on a dual CNN model. The first CNN extracts the location information of the aspect and obtains an aspect map through weakly supervised learning. Then the second CNN performs aspect-based sentiment classification through the aspect map.
TNet-ATT(+AS) [20]: A progressive self-supervised attention learning approach based on the ASC model. This method automatically mines useful attention supervision information from the training corpus in an iterative manner and uses regularization to expand words in the corpus that have a sentiment impact on the aspect.
LSTM+AdvATT+EPGating+CNN(LAEC) [22]: Two mechanisms are designed for fine-grained sentiment analysis. The unbiased attention mechanism, obtained through adversarial training, makes the search for aspect sentiment words fairer. The embedding-preserving gating mechanism adjusts the word vector according to the relevance of the word to the target. In addition to preserving the original semantic information, the obtained vector supplements context sentiment information.
MAN [27]: A multi-attention network for aspect-based sentiment classification. Intra-level attention employs a transformer to encode the input sentence in parallel and preserve long-distance sentiment relations, while inter-level attention uses a global and a local attention module to capture the entire relation and interactions at word level.
PAHT [14]: A position-aware hierarchical transfer model, which models multi-level position information by transferring hierarchical knowledge from the dataset. It presents aspect-based positional attention at the word and segment levels and transfers the knowledge learned through pre-training into ASC at four levels: embedding, word, segment and classifier. This captures salient information for a specific aspect and makes up for the limited data for ASC.
Experimental results
The experimental results of all approaches on Laptop14, Restaurant15, and Restaurant16 are shown in Table 2, and the evaluation metrics are Accuracy and Macro-F1. The experimental results indicate that the DA-LSTM model is significantly superior to the other approaches. Compared with the other approaches, the maximum improvements of DA-LSTM on Laptop14, Restaurant15, and Restaurant16 are 13.23%, 10.8%, and 13% in Accuracy and 16.13%, 18%, and 20.04% in Macro-F1. DA-LSTM shows excellent performance in aspect-based sentiment classification tasks, specifically:
Table 2. Experimental results (evaluation metrics: Accuracy and Macro-F1)
DA-LSTM performs better than word representation approaches such as LSTM and BERT. On the one hand, LSTM only mines the long-term dependencies in the review and, by its nature, does not highlight the sentiment word related to the aspect. BERT has achieved quite high performance in many natural language processing tasks. However, like LSTM, it is difficult for BERT to analyze multiple aspects in a single sample separately, which limits its performance in the aspect-based sentiment classification task. On the other hand, LSTM and BERT have obvious advantages in text semantic mining tasks and play a major role in paragraph-level sentiment analysis, but in fine-grained sentiment analysis the more targeted DA-LSTM achieves better results.
Compared with attention-based models such as IAN, TNet-ATT(+AS), LAEC and MAN, the maximum improvements of DA-LSTM on Laptop14 and Restaurant15 are 7.52% and 5% in Accuracy and 9.48% and 15.25% in Macro-F1. Besides, it achieves a competitive result on Restaurant16. These attention-based baselines take the review as a whole and adopt interactive learning, adversarial training and other techniques to seek sentiment words. They disregard the interaction between different aspects, so words unrelated to the target aspect are easily incorporated into the representation, which leads to erroneous sentiment polarity predictions. DA-LSTM aims at retrieving the sentiment word most closely related to a single aspect in the review through the attention mechanism, which avoids the interference of irrelevant sentiment words to a certain extent. The experimental results show that DA-LSTM has a better ability to represent aspect sentiment words.
Compared with CNN-based approaches such as GCAE, AMEN-AMSAN and PAHT, the improvements of DA-LSTM on Laptop14, Restaurant15, and Restaurant16 are up to 4.39%, 4.34%, and 5% in Accuracy and 7.7%, 3.4%, and 5.38% in Macro-F1. DA-LSTM adequately considers and mines the relationship between aspect and sentiment word in the review. Unlike models equipped with gating mechanisms and CNNs, DA-LSTM takes the review as the basic unit and exploits the dependency-relationship in the hidden layer to fully express semantic information between words, resulting in better word representations than the above baselines. A more targeted sentiment word is then acquired according to the aspect vector, which gives DA-LSTM a stronger ability to predict aspect sentiment polarity than GCAE and AMEN-AMSAN.
To further verify the advantages of dependency-relationship embedding and attention mechanism, we conducted an ablation experiment with the results shown in Table 2. Among them, “W/O D”, “W/O A” and “W/O D+A” denote “without dependency-relationship embedding”, “without attention mechanism” and “without dependency-relationship embedding and attention mechanism”, respectively. We can draw the following conclusions:
(1) If either or both components are removed, the performance of the model declines to varying degrees. The word hidden vector output by LSTM has lost some information, and the dependency-relationship embedding can supplement it with useful information to a certain extent. Besides, DA-LSTM assigns larger weights to sentiment words closely related to the target aspect through the attention mechanism, which effectively prevents the interference of irrelevant words. Therefore, both components play an important role.
(2) Dependency-relationship embedding contributes more than the attention mechanism. For example, on Laptop14, the Accuracy and Macro-F1 increases of the model with only the dependency-relationship embedding are 5.31% and 7.88%, while those of the model with only the attention mechanism are 4.5% and 7.67%. A possible reason is that the attention mechanism only redistributes weights over existing information, whereas the dependency-relationship embedding adds useful information.
(3) These components are more helpful for classification on datasets with an unbalanced category distribution. For example, Restaurant15 has only 78 neutral aspects. On this dataset, the model with both components has a 10.8% Accuracy improvement and an 8.65% Macro-F1 improvement over the model without either component, which is the highest among the three datasets. This suggests that our model adds more usable information for categories that contain only a few samples.
Conclusion
In this paper, we propose an aspect-based sentiment classification model (DA-LSTM) based on dependency-relationship embedding and attention mechanism. The word context representations generated by most existing approaches do not highlight the sentiment words closely related to the target aspect. Based on the BI combination function, DA-LSTM combines the hidden vector output by the LSTM and the dependency-relationship embedding to form a novel word vector with context and dependency semantics. Afterward, the weights assigned by the attention mechanism direct more attention to the sentiment words related to the target aspect. Finally, the word representation calculated by the attention mechanism is provided to the softmax function to perform aspect-based sentiment classification. Thus, the proposed DA-LSTM overcomes the drawbacks of existing methods, and the experimental results on three real-world review datasets demonstrate its effectiveness compared with state-of-the-art baselines.
Acknowledgment
This research is partly supported by the National Natural Science Foundation of China (Grant No. 62072288, 61702306, U1435215 and 71772107), the National Key R&D Plan (No. 2018YFC0831002), the Taishan Scholar Program of Shandong Province (Grant No. ts20190936), the Natural Science Foundation of Shandong Province (Grant No. ZR2018BF013), Scientific Research Foundation of Shandong University of Science and Technology for Innovative Team (Grant No. 2015TDJH102).
