An Iterative Multi-aspect term Extraction and Polarity Detection Approach Based on spanBERT for Aspect-based Sentiment Analysis

Abstract

Aspect-based sentiment analysis (ABSA) is a fine-grained analysis of the sentiments and opinions expressed in a text document, and it sets aside the naïve assumption that a document expresses an opinion about a single topic or aspect. The main objective of ABSA is to spot, extract and identify the polarity of the different entities and aspects in an opinionated document. Based on previous work, ABSA can be categorized into three subtasks: Aspect-Category Sentiment Analysis (ACSA), Opinion Target Expression Sentiment Analysis (OTESA) and Aspect-Term Sentiment Analysis (ATSA). This research presents an end-to-end multi-task approach that performs the three categories of ABSA in a single pipeline. Three multitask learning classifiers were built on top of the baseline spanBERT language model, which was originally pretrained for span extraction. The input to the model consists of two merged segments, an entity premise and a context hypothesis, in a fashion similar to the reading comprehension downstream task in natural language processing. The three downstream tasks were built on the contextualized output embeddings of the pretrained spanBERT, coupled with a cross-layer attention mechanism, to associate context with aspect-term span extraction, aspect sentiment polarity detection and entity-aspect entailment. A span masking approach is also proposed to handle text with multiple aspects using an iterative output-to-input loopback. The span masking process replaces each word in a previously detected span of text with the special [MASK] token and then feeds the entire sentence back into the encoder input of the model for the next run. This technique forces the encoder to look elsewhere for the next span prediction. The loopback span masking terminates when the span classifiers predict the special token [CLS] as both the beginning and the end of the span, signaling the absence of any further span to be extracted.
Experimental results validate the approach: it outperforms most of the compared research on benchmark ABSA datasets.

Keywords

BERT; spanBERT; aspect-based sentiment analysis; sentiment analysis

Introduction

Aspect-based sentiment analysis (ABSA) is a subset of sentiment analysis that deals with a more granular and in-depth analysis of the various aspects of the language and their associated expressed sentiments. In contrast to traditional sentiment analysis, which naively assumes that the entire text expresses an opinion or sentiment on a single subject or aspect and therefore merely attempts to classify the text as positive, negative, or neutral, ABSA dives deeper to analyze the different specific topics, aspects and attributes being discussed in the text and their associated context, and then evaluates the sentiment on each aspect individually. This granular approach not only provides more detailed insights but also offers valuable information for businesses, marketers, and decision-makers to understand customer opinions more comprehensively and realistically on diverse aspects of their products, services, or policies (Zhang et al., 2023).

However, in spite of its enormous potential, ABSA presents a plethora of challenges for researchers in the field of NLP. One of the primary challenges is the need for accurate aspect extraction. Identifying and extracting the relevant aspects from text, especially in domains with diverse and evolving vocabularies, remains a daunting task. Researchers grapple with developing algorithms and techniques that can effectively capture the subtle nuances of language and accurately identify the aspects under discussion. Another significant challenge lies in sentiment analysis at the aspect level. Analyzing sentiment within the context of specific aspects requires a deeper understanding of language semantics and context. Researchers face the dilemma of developing models that can accurately capture the sentiment polarity expressed towards each aspect while accounting for variations in language usage, cultural nuances, and contextual ambiguity (Bordoloi & Biswas, 2023; Zhang et al., 2023).

Research in sentiment analysis has blossomed in recent times due to the vast availability of opinions expressed online by users about products, businesses, services, and policies. Businesses, organizations, and agencies are taking advantage of this by using users’ feedback to better understand the general perception of their products or policies and to inform future decisions and policy adjustments. Sentiment analysis is one of the effective methods for providing polarity analysis of such voluminous opinionated texts. Sentiment analysis models have evolved over the years and are implemented at various levels of granularity according to the scope and requirements. Generally, four major levels at which sentiment analysis is implemented can be identified (Bordoloi & Biswas, 2023). These levels include the document level (Li & Li, 2013; Pang et al., 2002), sentence or phrase level (Nguyen & Nguyen, 2018; Wilson et al., 2005), word level (Bollegala et al., 2013; Li et al., 2014), and entity or aspect level (Li & Lu, 2017; Quan & Ren, 2014).

The standard sentiment analysis method, also known as the coarse-grained method, covers the first three levels. The general assumption in coarse-grained sentiment analysis is that the entire text or document expresses an opinion about a single topic, and the document is therefore labeled as having either negative, positive, or neutral sentiment towards that topic. However, modern businesses and services require fine-grained analysis: for a conglomerate corporation, for example, users might express a mix of opinions about different products or topics in the same text. Therefore, entities and their aspects or attributes need to be analyzed independently for more accurate and informative sentiment polarity detection. This gives an accurate reflection of the sentiments about individual topics or products and equips business owners and policy makers to isolate an aspect of the business or policy for further action. Aspect-based sentiment analysis is the subtask of sentiment analysis that deals directly with this fine-grained level of analysis (Liu & Zhang, 2012). It is fundamental to the success of a business to listen to its clients, understand exactly what the customer is saying and engage when necessary. The ABSA approach to sentiment analysis of textual data provides valuable insights by processing the huge volume of customer feedback on social media platforms such as Twitter, Facebook and many more, where people regularly post their opinions on all kinds of businesses.

Inspired by the work in (Pontiki et al., 2016), a number of studies have classified ABSA into two or three categories in accordance with the two subtasks described in (Pontiki et al., 2016). In our opinion, and based on the two subtasks, ABSA can be categorized into three subclasses: (1) Aspect-Category Sentiment Analysis (ACSA), (2) Opinion Target Expression Sentiment Analysis (OTESA) and (3) Aspect-Term Sentiment Analysis (ATSA). ACSA and OTESA correspond to slots 1 and 2 of subtask 1, with slot 3 (polarity detection) being collapsed into one subclass, and ATSA corresponds to subtask 2 as described in (Pontiki et al., 2016). In ATSA, given an opinionated user text or review about a target entity (e.g., Restaurants, mobile phone, manufacture etc.), the challenge is to identify and extract the various aspects or attributes of the entity discussed by the user in the text (e.g., battery, screen, software for the entity Mobile phone) and the polarity of the sentiments expressed towards those aspects. In ACSA, given a predefined set of entities $E = \{E_{1}, E_{2}, E_{3}, \dots, E_{n}\}$, aspect categories $A = \{A_{1}, A_{2}, A_{3}, \dots, A_{n}\}$ and an opinionated user text, the task is to return the matching pairs $\{E_{1}, A_{1}\}$, $\{E_{2}, A_{2}\}$, etc. that are discussed in the sentence and also to identify the polarity of each $\{E_{i}, A_{i}\}$ category pair. The last one, OTESA, is very similar to the coreference task in NLP.

The objective of OTESA, given a user review about an entity-aspect category pair, is to extract all the different parts of the text that discuss a predetermined entity-aspect category pair, together with the overall polarity of the extracted parts. While ATSA uses an extractive approach to identify the aspect-term as it appears in the text, ACSA must have a deeper understanding of the semantic relations between the predefined $\{E_{n}, A_{n}\}$ pairs and the context in which they are used in the text. For instance, for a predefined entity-aspect category pair $\{E_{1}, A_{1}\} = \{\text{Restaurant}, \text{Service}\}$, if the user's text mentions “…poor delivery” when discussing the entity Restaurant, then even though there is no specific mention of the aspect Service, ACSA should identify the term delivery as referring to the aspect category Service, since both have a similar semantic implication despite the syntactic difference (Gou et al., 2023; Mao et al., 2022).

The three major contributions of this research are: (1) proposing a unified framework for conducting the three categories of ABSA (i.e., ACSA, OTESA and ATSA) in a single pipeline using a cross-attention mechanism; (2) proposing a novel approach to multiple-aspect detection and extraction, with associated context, using an exhaustive iterative masking approach; and (3) proposing an approach to the automatic generation of ground-truth training data from public ABSA datasets using masking, for effective multiple-aspect detection within a text.

Literature Review

The ABSA task is a subcategory of sentiment analysis that poses an enormous challenge and requires an approach with rich linguistic understanding. It is no surprise that most of the research in this field employs natural language processing techniques based on time-series deep recurrent neural networks, self-attention-based deep neural networks, or a combination of both (Augustyniak et al., 2021). These algorithms have recorded state-of-the-art performance, outperforming humans in many tasks. A sizeable number of models rely on gated recurrent deep neural networks with attention mechanisms, such as long short-term memory (LSTM) layers (Hochreiter & Schmidhuber, 1997) and Gated Recurrent Units (GRU) (Gao & Glowacka, 2016). They extract opinionated information from embedding vectors and then apply attention mechanisms (Bahdanau et al., 2016) to ensure that the models focus on the span of text relevant to the given entity or aspect of interest. A few of these models are the Attention-based LSTM with Aspect Embedding (Wang et al., 2016), which was used for ACSA, and Target-Dependent Sentiment Classification (Tang et al., 2016), Gated Neural Networks (Zhang et al., 2016) and the Recurrent Attention Memory Network (RAM) (Chen et al., 2017), which were used for ATSA. (Gao & Glowacka, 2016) proposed cross-domain tagging with adversarial training. (Li et al., 2019a) presented a model with a two-layer LSTM, where the lower LSTM is trained for Aspect Term Extraction (ATE) and the upper LSTM for unified aspect-polarity tagging. The gates in those networks control what information is retained as the vector propagates through the network and therefore serve as an important memory for short and long dependencies within the text. The attention mechanism plays a vital role in computing a contextualized vector and computes similarity scores between the contextualized vectors and the target vectors, which is important for associating context with attention.
The context vectors encode both the aspect and the sentiment polarity information, and the similarity scores are applied across all feature dimensions irrespective of the distinction between aspect and sentiment polarity (Wang et al., 2021). One of the fundamental shortcomings of recurrent neural network models is their sequential nature: tokens are processed one at a time, which results in a huge computational cost and time consumption. As the model size grows, the attention layer computation grows exponentially, and so does the normalization of the similarity scores of all the tokens in the sentence (Wang et al., 2016). In addition, some models, such as the one proposed in (Chen et al., 2017), require positional information between words and targets to produce a weighted LSTM, which can be problematic and unreliable with noisy inputs. More sophisticated LSTM cells and attention mechanisms can yield higher accuracy, but at a huge computation and memory cost. (Peters et al., 2018) proposed deep contextualized word representations in a model called ELMo. This word embedding technique creates a vector space using bidirectional LSTMs trained on a language modeling objective.

With the popularization of the self-attention networks proposed by (Vaswani et al., 2017), recurrent operations were abolished to address most of the inefficiencies of recurrent neural networks, such as long-range dependencies, and this propelled the evaluation of new models in ABSA. Pretrained models such as Bidirectional Encoder Representations from Transformers, BERT (Devlin et al., 2019), were trained on a huge corpus of data for general language understanding. BERT is trained in a self-supervised manner using Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). Another pretrained language model, XLNet (Yang et al., 2019), aimed at improving on BERT by introducing a variant of language modeling called permutation language modeling. Instead of predicting masked words independently as in BERT, or in a strict left-to-right order as in conventional language models, XLNet predicts target words based on different orders of the source words. spanBERT (Joshi et al., 2020) is another pretrained model inspired by BERT; it developed a pretraining method that improves on BERT's two training objectives (MLM and NSP) with span MLM and a span boundary objective. The model used a well-trained BERT and emphasized the extraction of spans of text for the task at hand. (Li et al., 2019c) leveraged the pretrained BERT model for end-to-end ABSA. The authors in (Søgaard & Goldberg, 2016) designed multiple lower-level tasks with task-related knowledge to assist the learning of the main task. (Li & Xiao, 2020) implemented a system for span extraction and propaganda detection in news articles by building a model for Span Identification (SI) based on SpanBERT. They developed a hybrid model for Technique Classification (TC), composed of three sub-models: two BERT models with different training methods and a feature-based Logistic Regression model.
(Wang et al., 2021) proposed end-to-end aspect-based sentiment analysis with hierarchical multi-task learning, which combines auxiliary tasks and the main task in a hierarchical multi-task learning framework. Multi-task Learning (MTL) allows the training of a multi-task problem like ABSA to be combined in the same model. MTL is a learning paradigm in machine learning that enables joint learning of multiple tasks (Caruana, 1997) with the aim of improving the correlational and general understanding and the performance of the model in all related tasks. This type of training has been applied in various tasks (Rei, 2017; Yang et al., 2017). However, most of these works combine multiple losses applied to the output layer. The idea of multi-task learning with low-level auxiliary tasks was first introduced by (Søgaard & Goldberg, 2016). They found that low-level tasks are better kept at the lower layers, enabling the higher-level tasks to make use of the lower-level tasks’ knowledge.

Inspired by the aforementioned works, we propose a multi-task learning approach to ABSA based on a baseline spanBERT pretrained language model, capable of handling the three categories of ABSA in a single pipeline using a cross-attention layer. Three downstream tasks were built on the contextualized output embeddings of the pretrained spanBERT to perform (1) aspect-term span extraction, (2) aspect sentiment polarity detection and (3) entity-aspect entailment, which are combined to address the three ABSA subtasks. A span masking approach is also proposed to handle text with multiple aspects using an iterative output-to-input loopback. The span masking process replaces each word in a previously detected span of text with the special [MASK] token and then feeds the entire sentence back into the encoder input of the model for the next run. This technique forces the encoder to look elsewhere for the next span prediction. The loopback span masking terminates when the span classifiers predict the special token [CLS] as both the beginning and the end of the span, which signifies the absence of any further span to be extracted. To achieve multiple-aspect extraction, we refined the ground-truth dataset to conform with the proposed iterative approach.

Proposed Method

In this section we present the details of the overall proposed pipeline for performing the three categories of the ABSA task, as depicted in Figure 1. The input to the model consists of two merged segments, the entity premise and the aspect-hypothesis, similar to the reading comprehension task. These input segments are fed into the baseline spanBERT, which computes the contextualized output embeddings. The contextualized output embeddings are then used by three pipelines: Entity-Aspect Entailment (EAE), Aspect-Term Span Extraction (ATSE) and cross-layer attention, which together with the intermediate output of ATSE computes the Aspect Sentiment Polarity (ASP). Detailed explanations of the various parts are provided in the relevant subsections below.

Figure 1.

Proposed Framework Iterative Multi-Aspect Term Extraction and Polarity Detection with spanBERT for ABSA Tasks.

Multitask Input Framework

The input to the model is similar to that of the reading comprehension question-answering downstream task first suggested in (Devlin et al., 2019), where a single merged input is used and a segment embedding is added to differentiate the premise/question segment from the hypothesis/paragraph segment. In our case the premise is the entity, whereas the context text, i.e. the users’ review, is used as the hypothesis. Given a sequence of entity tokens $E = (e_{1}, e_{2}, e_{3}, \dots, e_{n})$ of length $n$, $E \in R^{n \times h}$, with each token represented by a WordPiece embedding (Zhang et al., 2019) vector of length $h$, and a hypothesis or review sequence $R = (r_{1}, r_{2}, r_{3}, \dots, r_{m})$ of length $m$ ($m ≫ n$), we define the input token sequence $X$ as the concatenation of $E$ and $R$ as follows: $X = \{[\mathrm{CLS}], E, [\mathrm{SEP}], R, [\mathrm{SEP}]\}$, $X \in R^{s \times h}$, where $s = m + n + 3$. The tokens $[\mathrm{CLS}]$ and $[\mathrm{SEP}]$ are special tokens used for classification and segment separation respectively. Therefore, the baseline spanBERT model takes the input token sequence $X \ni (E, R)$ and produces a contextualized output sequence representation $H \in R^{s \times h}$ of the input, as depicted in Figure 2.
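The merged input described above can be illustrated with a minimal sketch. It assumes whitespace-separated tokens in place of the WordPiece tokenizer, and the function name `build_input` and the segment-id convention (0 for the premise, 1 for the hypothesis) are illustrative choices rather than details taken from the paper.

```python
from typing import List, Tuple

CLS, SEP = "[CLS]", "[SEP]"


def build_input(entity_tokens: List[str],
                review_tokens: List[str]) -> Tuple[List[str], List[int]]:
    """Merge the entity premise and the review hypothesis into one sequence.

    Returns the token sequence X and one segment id per token
    (assumption: 0 marks the premise segment, 1 the hypothesis segment).
    """
    tokens = [CLS] + entity_tokens + [SEP] + review_tokens + [SEP]
    # [CLS] + n entity tokens + [SEP] belong to the premise segment.
    segment_ids = [0] * (len(entity_tokens) + 2)
    # m review tokens + the final [SEP] belong to the hypothesis segment.
    segment_ids += [1] * (len(review_tokens) + 1)
    return tokens, segment_ids


entity = ["laptops"]                                                # n = 1
review = "great design but it has low battery capacity !".split()  # m = 9
tokens, segment_ids = build_input(entity, review)
print(len(tokens))  # s = m + n + 3
```

With one entity token and nine review tokens the merged sequence has $s = 9 + 1 + 3 = 13$ positions, which matches the length formula in the text.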

Figure 2.

Multitask Input Framework for the Proposed System.

Multitask Training Objective and Data Preparation

A ternary learning objective is proposed to perform three tasks: aspect-term extraction $σ_{s}$, entity-attribute entailment classification $σ_{e}$ and aspect sentiment polarity classification $σ_{p}$. For instance, for a given multi-aspect hypothesis input, “Great design but it has low battery capacity!”, we can define the target output $T_{j} = (σ_{e}, σ_{s}, σ_{p})$, and additional ground-truth training targets can be generated from the aspect-hypothesis to curate the input sequences $X_{j}$ for training the proposed model, as shown in Table 1.

Table 1.

Input Data and Targets Preparation.

$X_{j}$		$T_{j}$
Entity-premises	Aspect-hypothesis	$σ_{s}$ (Start index, end index)	$σ_{e}$	$σ_{p}$
Laptops	Great design but it has low battery capacity.	$[(n + 2 + m_{i}), (n + 2 + m_{j})]$	True	positive
Laptops	Great [MASK] but it has low battery capacity.	$[(n + 2 + m_{i}), (n + 2 + m_{j})]$	True	negative
Laptops	Great design but it has low [MASK] capacity.	$[(n + 2 + m_{i}), (n + 2 + m_{j})]$	True	positive
Laptops	Great [MASK] but it has low [MASK] capacity.	$[1, 1]$	True	conflict
Restaurants	Great design but it has low battery capacity.	$[(n + 2 + m_{i}), (n + 2 + m_{j})]$	False	positive
Restaurants	Great [MASK] but it has low battery capacity.	$[(n + 2 + m_{i}), (n + 2 + m_{j})]$	False	negative
Restaurants	Great design but it has low [MASK] capacity.	$[(n + 2 + m_{i}), (n + 2 + m_{j})]$	False	positive
Restaurants	Great [MASK] but it has low [MASK] capacity.	$[1, 1]$	False	conflict
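The expansion shown in Table 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' preprocessing code: the names `Aspect`, `Sample` and `expand` are hypothetical, aspects are given as inclusive token-index spans sorted by position, `None` stands for the [CLS] "no aspect" span target, and the all-masked variant is labelled `conflict` only because Table 1 does so.

```python
from typing import List, NamedTuple, Optional, Tuple

MASK = "[MASK]"


class Aspect(NamedTuple):
    start: int      # index of the first aspect token in the review
    end: int        # index of the last aspect token (inclusive)
    polarity: str


class Sample(NamedTuple):
    entity: str
    tokens: List[str]
    span: Optional[Tuple[int, int]]  # None stands for the [CLS] target
    entailed: bool
    polarity: str


def mask_aspects(tokens: List[str], aspects: List[Aspect]) -> List[str]:
    """Return a copy of tokens with every token of the given aspects masked."""
    out = list(tokens)
    for aspect in aspects:
        for i in range(aspect.start, aspect.end + 1):
            out[i] = MASK
    return out


def expand(tokens: List[str], aspects: List[Aspect],
           entity: str, wrong_entity: str) -> List[Sample]:
    """Build the masked variants of one review and pair them with two entities."""
    first = aspects[0]
    # (i) unaltered review: the target is the first aspect occurrence.
    variants = [(list(tokens), (first.start, first.end), first.polarity)]
    # (ii) one copy per aspect, with every other aspect masked.
    for keep in aspects:
        others = [a for a in aspects if a is not keep]
        variants.append((mask_aspects(tokens, others),
                         (keep.start, keep.end), keep.polarity))
    # (iii) all aspects masked: nothing is left to extract.
    variants.append((mask_aspects(tokens, aspects), None, "conflict"))
    # Pair every variant with a matching and a mismatching entity.
    samples = []
    for ent, entailed in ((entity, True), (wrong_entity, False)):
        for toks, span, polarity in variants:
            samples.append(Sample(ent, toks, span, entailed, polarity))
    return samples


review = "Great design but it has low battery capacity .".split()
aspects = [Aspect(1, 1, "positive"), Aspect(6, 6, "negative")]
samples = expand(review, aspects, "Laptops", "Restaurants")
```

For the two-aspect example review this yields four variants per entity and eight samples in total, the same count as the rows of Table 1 (the row order differs).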

Iterative Multi-Targets Aspect-Term Span Extraction

The goal is to extract a contiguous span of tokens from the input sequence that represents an aspect-term or attribute of a given entity. (Seo et al., 2017; Xu et al., 2018) worked on extractive question answering in reading comprehension, where the task is to extract a continuous span of text from the document as the answer to a particular question. (Lee et al., 2016; Rajpurkar et al., 2016) studied several span extraction techniques, including BIO prediction, which uses the sequence tagging concept, and boundary prediction. They concluded that span boundary prediction methods have numerous advantages over sequence-tagging methods. This observation was revalidated by the work in (Wang & Jiang, 2017) as applied to answer prediction methods. One shortcoming of the above span extraction methods is their limitation to extracting one span within a sequence. (Hu et al., 2019) proposed a multi-target span extraction method to extract multiple opinion targets from a given sequence for ABSA polarity classification. They used the boundary prediction approach for span extraction as suggested in extractive question answering (Seo et al., 2017; Wang & Jiang, 2017; Xu et al., 2018). During training, they generated two separate binary sparse vectors, one for the start boundaries and one for the end boundaries, for each sequence. They assigned ones to the tokens within a sequence that correspond to span starts and span ends, while assigning zeros to the rest of the sequence. Thereafter, they built two classifiers that minimize the negative log probabilities between the labels and predictions. At inference, they proposed a heuristic multi-span decoding algorithm that takes the top-K indices chosen from the two predictions and scores the candidate spans under a heuristic regularization.

Our approach to multi-target span extraction also uses boundary prediction but extracts one span at a time, iteratively, until all spans are exhausted (which is determined when the [CLS] token is predicted as the span index). The concept forces the model to concentrate on a localized first instance of an aspect-term in the sequence. The iterative approach is only required at inference time, since during training the data is prepared (see Table 1) in such a way that, for multi-aspect inputs, one aspect instance is exposed at a time whereas all other aspects are masked. Another situation is when none of the multiple aspects is masked; in this instance the model learns to extract only the first occurrence of an aspect-term span. To predict the start and end tokens in the sequence, we first compute the start logit confidence scores $q^{st} \in R^{s}$ and the end logit confidence scores $q^{en} \in R^{s}$ via separate linear transformations. These scores are converted to probability distributions over the tokens for the start, $p^{st} \in R^{s}$, and the end, $p^{en} \in R^{s}$, as shown in equations (1) and (2).

$$q^{st} = \mathrm{GELU}(W_{st} H^{T}), \quad p^{st} = \mathrm{softmax}(q^{st}) \tag{1}$$

$$q^{en} = \mathrm{GELU}(W_{en} H^{T}), \quad p^{en} = \mathrm{softmax}(q^{en}) \tag{2}$$

where $W_{st} \in R^{h}$ and $W_{en} \in R^{h}$ are trainable weight vectors for start and end token prediction respectively. GELU is the Gaussian Error Linear Unit activation function, which is defined through the standard Gaussian cumulative distribution function.
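Equations (1) and (2) can be traced on a toy example. The sketch below uses only the Python standard library; the matrix `H` and the weight vectors are made-up numbers chosen for illustration, and `boundary_probs` is a hypothetical helper name.

```python
import math
from typing import List


def gelu(x: float) -> float:
    """Exact GELU: x times the standard Gaussian CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def boundary_probs(H: List[List[float]], w: List[float]) -> List[float]:
    """Apply q = GELU(w . h_t) to every token vector h_t, then softmax."""
    logits = [gelu(sum(wi * hi for wi, hi in zip(w, row))) for row in H]
    return softmax(logits)


# Toy contextualized output: s = 4 tokens, hidden size h = 2 (made-up values).
H = [[0.1, 0.0], [2.0, 0.5], [0.3, 1.5], [0.0, 0.2]]
w_st = [1.0, 0.0]   # hypothetical start weights
w_en = [0.0, 1.0]   # hypothetical end weights

p_st = boundary_probs(H, w_st)
p_en = boundary_probs(H, w_en)
start = max(range(len(p_st)), key=p_st.__getitem__)
end = max(range(len(p_en)), key=p_en.__getitem__)
print(start, end)
```

Taking the arg-max of each distribution gives the predicted span boundaries; in the full model the weights are learned and `H` comes from spanBERT.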

Consequently, two one-hot vectors can be computed for the start $y^{st} \in R^{s}$ and end $y^{en} \in R^{s}$ of the first occurrence of an aspect in each input sequence. We then define the training objective loss $L_{span}$ as the sum of the negative log probabilities of the true start and end positions under the two predicted distributions:

$$L_{span} = -\sum_{i=1}^{k} \left( \sum_{j=1}^{s} y_{i,j}^{st} \log (p_{i,j}^{st}) + \sum_{j=1}^{s} y_{i,j}^{en} \log (p_{i,j}^{en}) \right) \tag{3}$$

where $s$ is the sequence length.
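Because the targets are one-hot, each inner sum in the span loss keeps only the log probability at the true boundary index. A minimal sketch of that computation follows; the function name `span_loss` and the example probabilities are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# One training item: (start probabilities, end probabilities,
#                     true start index, true end index)
Item = Tuple[Sequence[float], Sequence[float], int, int]


def span_loss(batch: List[Item]) -> float:
    """Sum of negative log probabilities of the true start and end positions."""
    total = 0.0
    for p_start, p_end, start_idx, end_idx in batch:
        # With one-hot targets only the true positions contribute.
        total -= math.log(p_start[start_idx]) + math.log(p_end[end_idx])
    return total


batch = [
    ([0.1, 0.7, 0.2], [0.2, 0.2, 0.6], 1, 2),
    ([0.8, 0.1, 0.1], [0.9, 0.05, 0.05], 0, 0),
]
loss = span_loss(batch)
```

A confident and correct prediction drives both log terms towards zero, so the loss shrinks as the predicted boundary distributions concentrate on the true positions.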

Aspect Term Sentiment Polarity Classification

The sentiment polarity detection here aims to determine the sentiment polarity relevant to the aspect-term in question, not to the entire input text. Therefore, we introduce an attention computation that enables the model to pay attention only to the part of the sequence associated with the aspect-term of interest. This is done by computing single-head cross-attention scores between the average of $q^{st}$ and $q^{en}$ and the contextualized output sequence vector $H$ (see Figure 1). This average span score vector serves as the query $Q$, while the contextual vector $H$ is linearly projected to create the key $K$ and value $V$ vectors as follows:

$$V = W_{v} H^{T}, \quad K = W_{k} H^{T}, \quad Q = W_{Q} \left( \frac{q^{st} + q^{en}}{2} \right) \tag{4}$$

$$H_{att} = \mathrm{softmax}\left( \frac{Q K^{T}}{\sqrt{d_{k}}} \right) V \tag{5}$$

where $W_{v} \in R^{h \times h}$, $W_{k} \in R^{h \times h}$ and $W_{Q} \in R^{h}$ are trainable weights.

$$q^{s} = W_{s} H_{att}, \quad p^{s} = \mathrm{softmax}(q^{s}) \tag{6}$$

Then, the training objective is to minimize the loss function $L_{s}$, the sum of the categorical cross entropy between the true aspect-term polarity $y^{s}$ and the predicted label $p^{s}$:

$$L_{s} = -\sum_{i=1}^{k} \sum_{j=1}^{l} y_{i,j}^{s} \log (p_{i,j}^{s}) \tag{7}$$

where $k$ is the number of sentiment classes.

Entity-Aspect Entailment Classification

Given a set of predefined entities, the task is to determine whether a review or aspect-hypothesis refers to one of the predefined entities. Usually, such a downstream task is done by pooling the contextualized output classification vector $h_{[CLS]}$, which is then linearly projected to obtain binary confidence scores as to whether the two segments are entailed or not (Devlin et al., 2019; Joshi et al., 2020). In a similar way, to compute the entailment confidence score $q^{e}$ between the entity-premise and aspect-hypothesis segments, we first apply a dropout layer with a rate of 10% to the vector $h_{[CLS]}$ to suppress overfitting. Then a linear transformation is applied with a trainable weight vector $W_{e} \in R^{h}$, and the score is passed to a sigmoid activation function to generate the normalized binary entailment probability $p^{e}$, as depicted in Figure 1 and equation (8).

$$q^{e} = W_{e} \, \mathrm{DropOut}(h_{[CLS]}), \quad p^{e} = \mathrm{sigmoid}(q^{e}) \tag{8}$$

The training objective is to minimize the binary cross entropy $L_{e}$ between the predicted binary probability $p^{e}$ and the true entailment values $y^{e}$ for all the samples in the training data:

$$L_{e} = -\sum_{i=1}^{k} \left( y_{i}^{e} \log (p_{i}^{e}) + (1 - y_{i}^{e}) \log (1 - p_{i}^{e}) \right) \tag{9}$$

where $k$ is the total number of training input sequences in the training data.
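The entailment head can be sketched in a few lines. The sketch assumes the standard binary cross entropy, which includes a $(1 - y)\log(1 - p)$ term for non-entailed pairs, and it omits dropout because dropout is only active during training. The helper names and the small clipping constant `eps` are illustrative choices.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def entailment_prob(h_cls: List[float], w_e: List[float]) -> float:
    """Project the [CLS] vector to a single score and squash it to (0, 1)."""
    score = sum(w * h for w, h in zip(w_e, h_cls))
    return sigmoid(score)


def binary_cross_entropy(probs: List[float], labels: List[int],
                         eps: float = 1e-12) -> float:
    """Summed binary cross entropy over all samples."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


p = entailment_prob([0.4, -0.2, 1.0], [0.5, 0.5, 1.5])  # made-up vectors
loss = binary_cross_entropy([0.9, 0.2], [1, 0])
```

In the example the matched pair (label 1) is predicted with probability 0.9 and the mismatched pair (label 0) with probability 0.2, so both terms contribute a small penalty.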

Multitask Training

For the ternary tasks, the overall training objective is to maximize the likelihood of the training targets $T_{j} = (σ_{e}, σ_{s}, σ_{p})$ given an input text sequence $X_{j}$ over a training data set $D = \{T_{j}, X_{j}\}$, as described in (10).

$$L(D) = \prod_{j=1}^{|D|} \prod_{(σ_{e}, σ_{s}, σ_{p}) \in T_{j}} P((σ_{e}, σ_{s}, σ_{p}) \mid X_{j}) \tag{10}$$

The log-likelihood of $X_{j}$ is captured by (11):

$$\begin{aligned} L(X_{j}) = & \sum_{(σ_{e}, σ_{s}, σ_{p}) \in T_{j}} \log P((σ_{e}, σ_{s}, σ_{p}) \mid X_{j}) \\ = & \sum_{σ_{e} \in T_{j}} \sum_{(σ_{s}, σ_{p}) \in T_{j} / σ_{e}} \left( \log P(σ_{e} \mid X_{j}) + \log P((σ_{s}, σ_{p}) \mid σ_{e}, X_{j}) \right) \\ = & \sum_{σ_{e} \in T_{j}} \left( \sum_{(σ_{s}, σ_{p}) \in T_{j} / σ_{e}} \log P(σ_{e} \mid X_{j}) \right) \\ & + \sum_{σ_{e} \in T_{j}} \left( \sum_{(σ_{s}, σ_{p}) \in T_{j} / σ_{e}} \log P(σ_{s} \mid σ_{e}, X_{j}) + \log P(σ_{p} \mid σ_{e}, X_{j}) \right) \end{aligned} \tag{11}$$

Since the sentence entity $σ_{e}$ is conditionally independent given $σ_{s}$ and $σ_{p}$ and the sentence $X_{j}$, equation (11) can be rewritten as:

$$\begin{aligned} L(X_{j}) = & \sum_{σ_{e} \in T_{j}} | T_{j} / σ_{e} | \log P(σ_{e} \mid X_{j}) \\ & + \sum_{σ_{e} \in T_{j}} \left( | T_{j} / σ_{e} | \log P(σ_{s} \mid σ_{e}, X_{j}) + \sum_{σ_{p} \in T_{j} / σ_{e}} | T_{j} / σ_{e} | \log P(σ_{p} \mid σ_{e}, X_{j}) \right) \end{aligned} \tag{12}$$

Weighting the three task terms and summing over the data set gives the overall objective:

$$\begin{aligned} L(D) = & \, β \sum_{j=1}^{|D|} \sum_{σ_{e} \in T_{j}} \log P(σ_{e} \mid X_{j}) \\ & + γ \sum_{j=1}^{|D|} \sum_{σ_{e} \in T_{j}} \left( \sum_{σ_{s} \in T_{j} / σ_{e}} \log P(σ_{s} \mid σ_{e}, X_{j}) \right) \\ & + δ \sum_{j=1}^{|D|} \sum_{σ_{e} \in T_{j}} \log P(σ_{p} \mid σ_{e}, X_{j}) \end{aligned} \tag{13}$$

where $β$, $γ$ and $δ \in [0, 1]$ are optimization weights that prioritize the importance of a task with respect to the others. Algorithm 1 explains the logical implementation of the proposed aspect-term sentiment analysis.
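In practice, maximizing a weighted log-likelihood of this form is commonly implemented as minimizing a weighted sum of the per-task losses. The sketch below shows that combination; the function name `multitask_loss`, the default weights and the example loss values are assumptions made for illustration, since the tuned values of the weights are not given in this section.

```python
def multitask_loss(l_entail: float, l_span: float, l_polarity: float,
                   beta: float = 1.0, gamma: float = 1.0,
                   delta: float = 1.0) -> float:
    """Weighted sum of the entailment, span and polarity losses.

    Each weight must lie in [0, 1]; a larger weight gives the
    corresponding task more influence on the shared encoder.
    """
    for name, weight in (("beta", beta), ("gamma", gamma), ("delta", delta)):
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {weight}")
    return beta * l_entail + gamma * l_span + delta * l_polarity


# Hypothetical per-batch losses and weights.
total = multitask_loss(0.2, 1.0, 0.5, beta=0.5, gamma=1.0, delta=0.25)
```

Setting a weight to zero switches the corresponding task off, which is one simple way to run the kind of ablation mentioned later in the experiments.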

Algorithm 1.

Implementation of Aspect-term sentiment analysis

Input: text data, $T$

Output: tuple $(E_{i}, A_{i}, P_{i})$, where:

• $E_{i}$: Entity

• $A_{i}$: Aspect-term

• $P_{i}$: Polarity of the aspect

while $A_{i} \neq$ None do
    $X \leftarrow$ compute input vector sequence from $T$
    $H \leftarrow$ compute contextualized vectors from $X$
    $A_{i} \leftarrow$ extract aspect-term with ATSE
    if $A_{i} \neq$ [CLS] token then
        $P_{i} \leftarrow$ classify $A_{i}$ polarity with ASP
        $T_{mask} \leftarrow$ mask out the detected aspect in $T$
        $T \leftarrow T_{mask}$ (update $T$ with the masked text)
    else
        return $(E_{i}, A_{i}, P_{i})$
    end if
end while
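The control flow of Algorithm 1 can be sketched in Python with the two model heads replaced by stand-in callables. Everything model-related here is hypothetical: `toy_extract` and `toy_polarity` are lexicon lookups that only mimic the interface of ATSE and ASP, `None` stands for a [CLS] prediction, and the iteration cap is a safety guard added for the sketch rather than part of the algorithm.

```python
from typing import Callable, List, Optional, Tuple

MASK = "[MASK]"
Span = Optional[Tuple[int, int]]  # None stands for a [CLS] prediction


def iterative_extract(
    entity: str,
    tokens: List[str],
    extract_span: Callable[[str, List[str]], Span],
    classify_polarity: Callable[[str, List[str], Tuple[int, int]], str],
) -> List[Tuple[str, str, str]]:
    """Extract (entity, aspect, polarity) triples one span at a time."""
    tokens = list(tokens)  # work on a copy; spans are masked in place
    results = []
    # Safety guard (assumption): at most one iteration per token.
    for _ in range(len(tokens)):
        span = extract_span(entity, tokens)
        if span is None:  # [CLS] predicted: no aspect left
            break
        start, end = span
        aspect = " ".join(tokens[start:end + 1])
        polarity = classify_polarity(entity, tokens, span)
        results.append((entity, aspect, polarity))
        # Mask the detected span so the next pass looks elsewhere.
        for i in range(start, end + 1):
            tokens[i] = MASK
    return results


# Stand-ins for the trained heads (purely illustrative).
LEXICON = {"design": "positive", "battery": "negative"}


def toy_extract(entity: str, tokens: List[str]) -> Span:
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            return (i, i)
    return None


def toy_polarity(entity: str, tokens: List[str], span: Tuple[int, int]) -> str:
    return LEXICON[tokens[span[0]]]


review = "Great design but it has low battery capacity .".split()
triples = iterative_extract("Laptops", review, toy_extract, toy_polarity)
print(triples)
```

Each pass masks the span it has just found, so the stand-in extractor, like the real span head, is pushed to the next unmasked aspect until nothing remains.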

Experiments

This section presents a series of results obtained from experiments conducted to validate the relevance and efficacy of the proposed system. Firstly, experiments were conducted based on the setup in Figure 1 to train and fine-tune the three different spanBERT baseline configurations. At inference time, the setup is run in three different modes to accomplish the three tasks of ATSA, ACSA and OTESA. The results from the proposed system were then compared with those of other approaches to demonstrate the level of improvement realized. To further probe the effectiveness of the system, we conducted additional experiments as an ablation study to closely monitor the effect or contribution of the important components of the proposed system. Lastly, we present some discussion, with visualization, of the vectors' perplexity and other relevant parameters.

Training Procedures

All experiments were conducted on an NVIDIA T4 GPU, which has 320 Turing Tensor Cores and 16GB of GPU memory, with an additional 16GB of system memory. We used the same optimizer, Adaptive Moment Estimation (Adam), throughout the experiments. The optimizer performs both the regularization on the training batches and the minimization of the prediction loss. The regularization is based on weight decay and not on the moments.

In contrast to the original BERT implementation, the training approach differs in several key aspects. Firstly, while BERT randomly selects 10 masks for each sequence during data processing, we use unique masks at each epoch. Secondly, we do not employ short-sequence strategies, such as sampling shorter sequences with a low probability (0.1) and pre-training with smaller sequence lengths (128 tokens for 90% of training steps). Instead, throughout the experiments, sequences of up to 512 tokens were used until reaching a document boundary. Similar to BERT, we implement a learning rate warm-up over the initial 1,000 steps, reaching a peak value of 1e-4, followed by linear decay. We maintain the Adam hyperparameters ($β_{1} = 0.9$, $β_{2} = 0.999$) and adopt a decoupled weight decay of 0.1. Additionally, we retain a dropout rate of 0.1 across all layers and attention weights, alongside a GeLU activation function. Notably, we diverge from the optimization approach by extending training to 10,000 thousand steps and using an epsilon value of 1e-8 for AdamW, resulting in superior convergence to an optimal set of model parameters. Our implementation employs a batch size of 256 sequences, with a maximum of 512 tokens per sequence.
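The warm-up and decay schedule described above can be written as a small function. The warm-up length (1,000 steps) and the peak value (1e-4) come from the text; decaying linearly all the way to zero at the final step is an assumption, and the total step count is left as a parameter, so the value 10,000 used in the example is purely hypothetical.

```python
def learning_rate(step: int, total_steps: int,
                  peak: float = 1e-4, warmup_steps: int = 1000) -> float:
    """Linear warm-up to `peak`, then linear decay to zero at `total_steps`."""
    if step < warmup_steps:
        return peak * (step / warmup_steps)
    remaining = max(total_steps - step, 0)
    decay_span = max(total_steps - warmup_steps, 1)
    return peak * (remaining / decay_span)


# Hypothetical total of 10,000 steps, for illustration only.
for step in (0, 500, 1000, 5500, 10000):
    print(step, learning_rate(step, total_steps=10000))
```

The rate rises from zero to the peak during warm-up and then falls back in a straight line, so the midpoint of the decay phase sits at half the peak value.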

Dataset

The datasets used in the experiment come from the Workshop on Semantic Evaluation (SemEval-2016) Task 5, held in San Diego, California, in 2016 (Pontiki et al., 2016). The original collection consists of 39 datasets (19 for training and 20 for testing) with a total of 70,790 manually annotated ABSA labels covering 8 languages and 7 domains: restaurants, laptops, hotels, mobile phones, digital cameras, museums and telecommunication. Except for the telecommunication domain, which is formed from customers' tweets, all other domains consist of customer reviews (Pontiki et al., 2016).

Datasets from the restaurant and laptop subdomains are available in English and were therefore used in this experiment as a baseline to create an extended version that suits the training requirements of the proposed system. The SemEval-2016 laptops dataset consists of 2500 unique review sentences annotated with 2923 {E#A, polarity} tuples, and the restaurants dataset consists of 2000 unique review sentences annotated with 2499 {E#A, OTE, polarity} tuples. The sentiment annotation consists of four classes: positive, negative, neutral and conflict. The conflict class applies when there is contrasting sentiment towards an aspect in the review. We extended these datasets by performing three actions (see Figure 1) on each sample: (i) keeping the original sample unaltered; (ii) creating as many copies of the original sample as there are aspects in the sample and, for each copy, leaving one aspect term unaltered while replacing all other aspects with a special token; and (iii) creating a copy of the sample in which all aspect terms are replaced with the [MASK] token. Additionally, each review sample is further made into two samples by pairing it with an entity term from a matching and from a mismatching domain to create entity-aspect hypothesis pairing conditions. In total, 8346 and 6998 training samples were formed from the restaurant and laptop domains respectively.
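The three augmentation actions and the entity pairing can be illustrated with a short sketch. It is not the preprocessing script used to build the extended datasets: the function names, the representation of aspects as inclusive token-index spans, and the use of [MASK] as the special token in action (ii) are assumptions made for the example.

```python
from typing import List, Tuple

MASK = "[MASK]"
Span = Tuple[int, int]  # inclusive (start, end) token indices of an aspect term


def mask_spans(tokens: List[str], spans: List[Span]) -> List[str]:
    """Return a copy of tokens with every token inside the spans masked."""
    out = list(tokens)
    for start, end in spans:
        for i in range(start, end + 1):
            out[i] = MASK
    return out


def augment(tokens: List[str], aspects: List[Span]) -> List[List[str]]:
    """Build the variants of one sample following actions (i) to (iii)."""
    variants = [list(tokens)]                       # (i) original sample
    for keep in aspects:                            # (ii) one copy per aspect
        others = [s for s in aspects if s != keep]
        variants.append(mask_spans(tokens, others))
    variants.append(mask_spans(tokens, aspects))    # (iii) all aspects masked
    return variants


def pair_with_entities(variant: List[str], matching: str, mismatching: str):
    """Pair a variant with a matching and a mismatching entity term."""
    return [(matching, variant, True), (mismatching, variant, False)]


tokens = "great design but it has low battery capacity".split()
variants = augment(tokens, aspects=[(1, 1), (6, 6)])  # "design", "battery"
pairs = pair_with_entities(variants[0], "laptop", "restaurant")
```

For a sample with a single aspect, action (ii) yields a copy identical to the original, so a real pipeline may want to deduplicate; the sketch does not.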

Evaluation Metrics

To evaluate the performance of the proposed model, four standard evaluation metrics, i.e., accuracy (A), precision (P), recall (R), and F1 score (F1), are utilized. The accuracy metric is used to evaluate aspect-term span extraction (ATSE) and Entity-Aspect Entailment (EAE), whereas the remaining metrics evaluate the aspect sentiment polarity (ASP). For AOP and ASTE, the number of predicted pairs and triplets is compared to the actual number in the given dataset. Details about the evaluation metrics can be found in the references (Augustyniak et al., 2021; Zhang et al., 2023).
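For concreteness, the four metrics can be computed as in the sketch below. It uses the standard textbook definitions; the function names are illustrative, and how per-class scores are averaged over the four polarity classes is not specified here, so no averaging is shown.

```python
from typing import Sequence, Tuple


def accuracy(gold: Sequence, pred: Sequence) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Works on percentages as well as fractions.
print(round(f1_score(91.97, 93.45), 2))
```

As a worked example, a precision of 91.97 and a recall of 93.45 give an F1 of about 92.70.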

Aspect-Term Sentiment Analysis (ATSA)

In ATSA, the challenge is, given an opinionated user's text or review about a target entity (e.g., restaurant, mobile phone, manufacturer, etc.), to extract the various aspects or attributes of the entity discussed by the user in the text (e.g., battery, screen, software for the entity mobile phone) and the polarity of the sentiments expressed towards those aspects. To accomplish the task, ATSE and ASP are utilized. For multi-aspect inputs, the input sentence is run iteratively at inference: each run returns an aspect, the detected aspect is replaced with the special token [MASK], and the masked sentence is fed back to the system until the special token [CLS] is returned as the chosen aspect, which indicates that the aspect terms have been exhausted. Each time an aspect is detected, its polarity is predicted by the polarity classifier. Figures 3 and 4 depict how a multi-aspect input sentence “[CLS] laptop [SEP] great design but it has low battery capacity. [SEP]” is run at inference. In the first iteration (Figure 3), ATSE correctly predicts the start and end positions of the first aspect term {“design”} by scoring them highest. In the second iteration (Figure 4), after the aspect term has been replaced with [MASK], the next aspect term {“battery”} is also correctly detected by ATSE. The last iteration, which is not included here, scores [CLS] highest, indicating the absence of any further aspect term.
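The iterative loopback can be summarized by the following sketch. The span and polarity classifiers are passed in as callables because the actual spanBERT heads are not reproduced here; `toy_span` and `toy_polarity` are hypothetical stand-ins based on a tiny word list, used only to make the loop runnable.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"


def extract_aspects(
    tokens: List[str],
    predict_span: Callable[[List[str]], Tuple[int, int]],
    predict_polarity: Callable[[List[str], int, int], str],
    max_iters: int = 10,
) -> List[Tuple[str, str]]:
    """Repeatedly extract one aspect, record its polarity, mask it, and rerun."""
    tokens = list(tokens)
    found = []
    for _ in range(max_iters):
        start, end = predict_span(tokens)
        if start == 0 and end == 0:      # [CLS] at position 0: nothing left
            break
        aspect = " ".join(tokens[start:end + 1])
        found.append((aspect, predict_polarity(tokens, start, end)))
        for i in range(start, end + 1):  # mask the span before the next run
            tokens[i] = MASK
    return found


def toy_span(tokens: List[str]) -> Tuple[int, int]:
    """Stand-in for ATSE: first unmasked word from a tiny aspect lexicon."""
    for i, tok in enumerate(tokens):
        if tok in {"design", "battery"}:
            return i, i
    return 0, 0


def toy_polarity(tokens: List[str], start: int, end: int) -> str:
    """Stand-in for ASP: look at the word just before the aspect."""
    return {"great": "positive", "low": "negative"}.get(tokens[start - 1], "neutral")


sentence = ["[CLS]", "laptop", "[SEP]", "great", "design", "but", "it",
            "has", "low", "battery", "capacity", ".", "[SEP]"]
result = extract_aspects(sentence, toy_span, toy_polarity)
```

The `max_iters` cap is a safeguard added for the sketch so that a classifier which never predicts [CLS] cannot loop forever.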

Figure 3.

Aspect Term Span Extraction Scores for Detecting the Start and end Point Position of an Aspect Term from a Multi-Aspect-Input Sentence “[CLS] Laptop [SEP] Great Design but it has low Battery Capacity. [SEP]” During the First Iteration.

Figure 4.

Aspect Term Span Extraction Scores for Detecting the Start and End Point Position of an Aspect Term from a Multi-aspect-input Sentence “[CLS] Laptop [SEP] Great [MASK] but it has Low Battery Capacity. [SEP]” During the Second Iteration.

At each iteration, when an aspect is detected, ASP computes the sentiment polarity associated with that aspect, taking the cross-attention output as its input. Figure 5 shows a heatmap of the average scores for the multi-aspect input sentence “[CLS] laptop [SEP] great design but it has low battery capacity. [SEP]”, with darker blue regions indicating higher attention scores, which clearly associate the sentiment indicators with the extracted aspect terms, i.e., {“great design”} and {“low battery capacity”}. This indicates the selective and associative capacity of the proposed model, enabling it to make the right call for the ATSA task.

Figure 5.

Heatmap of the Cross-Attention Layers Scores. It is the Average Scores of the Multi-Aspect-Input Sentence “[CLS] Laptop [SEP] Great Design but it has low Battery Capacity. [SEP]” with Darker Blue Regions Indicating Higher Attention Scores.

Table 2 presents the scores from ATSE (accuracy) and ASP (precision, recall and F1). For the sentiment polarity, all four categories in the dataset (positive, negative, neutral and conflict) are used.

Table 2.

ATSA Results from ATSE and ASP of the Proposed Method.

	ATSA for Laptop 14				ATSA for Restaurant 15
	Span (%)	Precision (P)	Recall (R)	F-1	Span (%)	Precision (P)	Recall (R)	F-1
$\mathrm{spanBERT}_{\mathrm{small}}$	93.40	91.97	93.45	92.70	84.40	87.11	81.32	84.11
$\mathrm{spanBERT}_{\mathrm{base}}$	94.90	92.81	97.10	94.90	86.67	88.90	85.00	86.90
$\mathrm{spanBERT}_{\mathrm{large}}$	95.30	93.31	97.30	95.30	86.80	89.00	85.13	87.02

Aspect-Category Sentiment Analysis (ACSA)

In the context of SemEval Task 5, ACSA is to extract a pairing of entity types, E = {LAPTOP, KEYBOARD, CUSTOMER SUPPORT, RESTAURANT, FOOD}, and attribute types, A = {USABILITY, QUALITY, PRICE}, discussed in a user text. The objective is to extract the correct pairing and perform sentiment polarity identification on the correct entity-attribute pairs. At inference, the entity segment of the input text is changed iteratively over a given number of predefined entities, each time using the EAE classifier to predict whether the entity matches the aspect-hypothesis segment (user input). As soon as a match is detected, the system proceeds to ATSA mode to determine the sentiment polarity of the pair in the input. If the entailment is false or the pair is not in the predefined entity-attribute set, the pairing is discarded and no polarity is computed by ATSA. Table 3 provides the results from the experiment.
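The ACSA inference loop described above can be outlined as follows. The EAE classifier and the ATSA polarity step are represented by callables; `toy_entails`, `toy_polarity` and the cue words in `CUES` are hypothetical placeholders that exist only so the outline runs, and they say nothing about how the trained classifiers behave.

```python
from typing import Callable, Dict, Iterable

# Hypothetical cue words, for illustration only.
CUES = {"LAPTOP": {"battery", "screen"}, "FOOD": {"pizza", "taste"}}


def run_acsa(
    review: str,
    entities: Iterable[str],
    entails: Callable[[str, str], bool],
    polarity_of: Callable[[str, str], str],
) -> Dict[str, str]:
    """Try each predefined entity; compute polarity only for entailed ones."""
    results = {}
    for entity in entities:
        if not entails(entity, review):
            continue                      # mismatch: pairing is discarded
        results[entity] = polarity_of(entity, review)  # switch to ATSA mode
    return results


def toy_entails(entity: str, review: str) -> bool:
    """Stand-in for the EAE classifier."""
    words = set(review.lower().replace(".", "").split())
    return bool(CUES.get(entity, set()) & words)


def toy_polarity(entity: str, review: str) -> str:
    """Stand-in for the ATSA polarity step."""
    return "negative" if "low" in review.lower().split() else "positive"


result = run_acsa(
    "great design but it has low battery capacity.",
    ["LAPTOP", "FOOD", "RESTAURANT"],
    toy_entails,
    toy_polarity,
)
```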

Table 3.

Entailment Results from EAE Classifier and ASP of the Proposed Method.

	Laptop 14				Restaurant 15
	EAE (%)	Precision (P)	Recall (R)	F-1	EAE (%)	Precision (P)	Recall (R)	F-1
$\mathrm{spanBERT}_{\mathrm{small}}$	98.40	90.45	93.00	91.71	98.95	86.00	80.11	83.21
$\mathrm{spanBERT}_{\mathrm{base}}$	99.20	91.90	96.30	93.79	99.67	87.40	84.70	85.74
$\mathrm{spanBERT}_{\mathrm{large}}$	99.50	92.11	96.55	94.28	99.80	89.00	84.55	86.12

Opinion Target Expression Sentiment Analysis (OTESA)

The OTESA task is to extract all the different parts of the text which discuss a predetermined entity-aspect category pair from the user's review and then combine these extracted opinions to predict the overall polarity of that entity-aspect category. To utilize our model for this task, we first run the system for the ACSA task and record all the correctly matched entity-aspect pairs and their corresponding polarities. For matched entity-aspect pairs which appear more than once in a text, the overall polarity is computed by summing the individual scores for each pair's sentiment polarity. Positive sentiment is assigned a score of +1, negative sentiment a score of −1, and neutral and conflict are both assigned 0. Therefore, a positive final score is interpreted as positive sentiment, a negative score as negative sentiment and a zero score as either neutral or conflict. It is worth noting that the actual span of the text which talks about the entity-aspect pair is not used, since the system is only trained to extract the aspect alone without its surrounding context; however, the sentiment polarity classifier encodes this local context via the attention mechanism and is therefore relied upon to perform OTESA. Table 4 below provides some results.
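The score aggregation for repeated entity-aspect pairs can be written in a few lines. The function name, the E#A string keys and the label returned for a zero total are illustrative choices; the +1, −1 and 0 scores follow the description above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SCORES = {"positive": 1, "negative": -1, "neutral": 0, "conflict": 0}


def aggregate_polarity(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Sum per-mention scores for each entity-aspect pair and map to a label."""
    totals = defaultdict(int)
    for pair, polarity in pairs:
        totals[pair] += SCORES[polarity]
    labels = {}
    for pair, total in totals.items():
        if total > 0:
            labels[pair] = "positive"
        elif total < 0:
            labels[pair] = "negative"
        else:
            labels[pair] = "neutral or conflict"  # a zero total cannot separate the two
    return labels


mentions = [
    ("LAPTOP#QUALITY", "positive"),
    ("LAPTOP#QUALITY", "positive"),
    ("LAPTOP#QUALITY", "negative"),
    ("LAPTOP#PRICE", "negative"),
    ("LAPTOP#USABILITY", "conflict"),
]
overall = aggregate_polarity(mentions)
```

Note that one positive and one negative mention of the same pair also sum to zero, so the zero case covers mixed opinions as well as genuinely neutral ones.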

Table 4.

EAE and OTESA Results.

	Laptop 14				Restaurant 15
	EAE (%)	Precision (P)	Recall (R)	F-1	EAE (%)	Precision (P)	Recall (R)	F-1
$\mathrm{spanBERT}_{\mathrm{small}}$	98.40	89.85	92.50	90.21	98.95	84.00	80.00	81.25
$\mathrm{spanBERT}_{\mathrm{base}}$	99.20	91.11	95.83	91.99	99.67	85.40	84.00	83.89
$\mathrm{spanBERT}_{\mathrm{large}}$	99.50	91.02	94.23	92.98	99.80	87.40	83.15	84.72

Ablation and Comparison

To have a clear understanding of the influence of the different components of the proposed model, we conducted the experiment under different architecture and model setups to investigate the influence of introducing the cross-attention (Ca) layer, the multi-learning (Ml) objective and the iterative multi-aspect extraction (It). Firstly, we use spanBERT as the baseline for all the investigations across the three tasks. Each task (ATSA, ACSA or OTESA) is investigated under four cases. In case 1, both the iterative approach and the cross-attention layer are used. In case 2, the cross-attention layer is retained but the experiment is conducted with the non-iterative approach. Case 3 omits the cross-attention layer but runs in iterative mode. Lastly, in case 4, neither the cross-attention layer nor the iterative approach is utilized. Results from these setups for the three subcategories of ABSA are presented in Table 5.

Table 5.

Ablation Results from the Proposed Method.

	Cases	Laptop 14				Restaurant 15
	Cases	Span (%)	Precision (P)	Recall (R)	F-1	Span (%)	Precision (P)	Recall (R)	F-1
ATSA	1	91.30	93.25	96.20	94.70	93.55	96.00	90.40	93.11
	2	89.70	89.57	90.75	90.86	89.30	87.53	88.91	90.13
	3	86.70	90.35	91.10	88.45	89.90	91.90	86.25	86.55
	4	86.75	82.80	86.20	81.55	80.60	83.00	81.85	83.85
ACSA	1	93.73	94.75	97.25	95.98	95.89	97.40	92.90	95.01
	2	92.20	91.25	94.93	90.25	92.52	95.10	89.30	88.18
	3	86.10	85.70	85.20	86.30	85.60	86.20	81.00	84.00
	4	83.45	81.20	83.05	82.35	82.80	78.60	79.75	81.75
OTESA	1	86.10	85.70	85.20	85.45	85.60	86.20	81.00	83.52
	2	84.20	77.60	82.50	82.50	79.60	83.20	71.40	75.00
	3	82.30	77.60	76.10	81.70	79.70	77.35	78.55	79.70
	4	83.45	81.20	83.05	82.35	82.80	78.60	79.75	81.75

It can be deduced that across all the ABSA subcategories and on almost all the metrics, the best performance is achieved when both the cross-attention layer and the iterative multi-aspect detection are used. A significant improvement of over 6% has been recorded. When the iterative multi-aspect approach is not used (case 2), the performance also drops across most of the cases, though less severely than when the cross-attention layer is excluded (cases 3 and 4). This underlines the level of influence the cross-attention layer exerts on overall system performance. Without the cross-attention layers, the model performs poorly, mainly due to its inability to properly associate segments of the input text with the right aspect term. Similarly, the iterative approach for multi-aspect inputs reduces the model complexity in training and enhances its performance as a result, thereby making aspect-term detection easier for the model. This effect can be seen in Table 5, where on average a 1% improvement is recorded with the iterative multi-aspect extraction technique.

Despite the significant improvement of the proposed method, the OTESA subcategory presents a more challenging prospect, with performance lagging far behind the other two subtasks. This challenge may be connected to the fact that, after the extraction and polarity determination of the entity-aspect pairs in multi-aspect text, the overall polarity has to be deduced from the individual entity-aspect polarities, which is prone to additional performance loss.

To provide a comparison with state-of-the-art methods, Table 6 presents the average scores on the three tasks (ATSA, ACSA and OTESA) from several approaches using different techniques, including transformer-based approaches (using the BERT architecture) and recurrent deep neural network approaches (e.g., LSTM and CNN-LSTM). The approaches include DECNN (Xu et al., 2018), ALSTM (Wang et al., 2016), DECNN-dTrans (He et al., 2018), ALSTM (He et al., 2018), DREGCN (Liang et al., 2020), HGGNN (Liu et al., 2020), IMN, IMN-BERT (He et al., 2019), INABSA (Li et al., 2019b), E2E-ABSA, E2E-ABSA-BERT (Wang et al., 2021). In addition, a multi-view prompting technique, MvP (Gou et al., 2023), has shown some promise recently and is therefore included in the comparison (Ashir & Taha, 2024).

Table 6.

Results of Comparison with Other Methods.

	Laptop 14			Restaurant 15
	Precision (P)	Recall (R)	F-1	Precision (P)	Recall (R)	F-1
CMLA-ALSTM	76.80	70.25	53.68	68.55	81.03	54.79
CMLA-dTrans	76.80	72.38	55.56	68.55	82.27	56.09
IMN	78.46	73.21	57.66	69.80	83.38	57.91
DREGCN	77.78	77.18	59.66	69.36	86.03	59.71
HGGNN	80.12	74.51	59.61	70.73	82.53	58.37
DECNN-ALSTM	78.38	70.46	55.05	68.32	80.32	55.10
DECNN-dTrans	78.38	73.10	56.60	68.32	82.65	56.28
INABSA	77.34	72.30	55.88	69.40	82.56	57.38
MNN	76.94	70.40	53.80	70.24	80.79	56.57
E2E-ABSA	80.83	74.71	60.42	70.86	84.51	59.92
IMN-BERT	78.47	77.18	60.34	72.55	84.37	60.76
SPAN-BERT	80.48	77.60	61.78	69.54	82.79	58.58
E2E-ABSA-BERT	82.03	75.80	63.69	70.93	84.17	60.68
MvP	65.30	76.30	69.44	73.10	63.44	72.76
Proposed	89.93	87.25	88.57	84.00	90.40	87.08

Conclusion

In this research, a unified framework has been proposed to solve the three major subtasks of aspect-based sentiment analysis. spanBERT has been used as the baseline language model for context understanding in the downstream tasks. The major contributions are the utilization of a cross-attention layer for aspect association, an iterative multi-aspect extraction and a unified multitask approach for both training and inference. Ablation results showed that the cross-attention layers have positively impacted the results; without them, the model performance decreases, mainly due to the lack of proper association of the segments of the input with the right aspect term. Similarly, the iterative approach for multi-aspect inputs reduces the model complexity in training and enhances its performance as a result. In contrast to many ABSA models in the literature, our approach provides an integrated framework where all the subtasks can be run from one pipeline. The method deployed to preprocess the dataset to conform to the iterative multi-aspect term extraction has also been effective. Experimental results and comparison with other approaches validated the approach, as superior performance was obtained, outperforming most of the compared research on the Restaurant and Laptop datasets.

Footnotes

ORCID iDs

Abubakar M Ashir

Mohammed Abdulghani Taha

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Ashir, A. M., & Taha, M. A. (2024). BERT with an augmented cross-attention decoder (BERT-ACD) for binary and fine-grained multiband sentiment detection. Intelligenza Artificiale, 19(1), 3–16. https://doi.org/10.1177/17248035241291687

Augustyniak, Ł., Kajdanowicz, T., & Kazienko, P. (2021). Comprehensive analysis of aspect term extraction methods using various text embeddings. Comput Speech Lang, 69(1). https://doi.org/10.1016/j.csl.2021.101217

Bahdanau, D., Cho, K., & Bengio, Y. (2016). Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473.

Bollegala, D., Weir, D., & Carroll, J. (2013). Cross-domain sentiment classification using a sentiment sensitive thesaurus. IEEE Trans Knowl Data Eng, 25(8), 1719–1731. https://doi.org/10.1109/TKDE.2012.103

Bordoloi, M., & Biswas, S. K. (2023). Sentiment analysis: A survey on design framework, applications and future scopes. Artif Intell Rev. https://doi.org/10.1007/s10462-023-10442-2

Caruana, R. (1997). Multitask learning. Mach Learn, 28(1), 41–75. https://doi.org/10.1023/A:1007379606734

Chen, P., Sun, Z., Bing, L., & Yang, W. (2017). Recurrent attention network on memory for aspect sentiment analysis. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 452–461). Association for Computational Linguistics. https://doi.org/10.18653/v1/D17-1047

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. https://doi.org/10.18653/v1/N19-1423

Gao, Y., & Glowacka, D. (2016). Deep gate recurrent neural network. In Durrant, R. J., & Kim, K.-E. (Eds.), Proceedings of the 8th Asian conference on machine learning, Proceedings of Machine Learning Research, vol. 63. The University of Waikato (pp. 350–365). PMLR. [Online]. Available: https://proceedings.mlr.press/v63/gao30.html

Gou, Z., Guo, Q., & Yang, Y. (2023). MvP: Multi-view prompting improves aspect sentiment tuple prediction.

He, R., Lee, W. S., Ng, H. T., & Dahlmeier, D. (2018). Exploiting document knowledge for aspect-level sentiment classification. In Gurevych, I., & Miyao, Y. (Eds.), Proceedings of the 56th annual meeting of the association for computational linguistics (volume 2: Short papers) (pp. 579–585). Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-2092

He, R., Lee, W. S., Ng, H. T., & Dahlmeier, D. (2019). An interactive multi-task learning network for End-to-End aspect-based sentiment analysis. In Korhonen, A., Traum, D., & Màrquez, L. (Eds.), Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 504–515). Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1048

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Comput., 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

Hu, M., Peng, Y., Huang, Z., Li, D., & Lv, Y. (2019). Open-domain targeted sentiment analysis via span-based extraction and classification. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 537–546). Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1051

Joshi, M., Chen, D., Liu, Y., Weld, D. S., Zettlemoyer, L., & Levy, O. (2020). SpanBERT: Improving pre-training by representing and predicting spans. Trans Assoc Comput Linguist, 8, 64–77. https://doi.org/10.1162/tacl_a_00300

Lee, K., Kwiatkowski, T., Parikh, A. P., & Das, D. (2016). Learning recurrent span representations for extractive question answering. CoRR, abs/1611.01436. https://doi.org/10.48550/arXiv.1611.01436

Li, X., Bing, L., Li, P., & Lam, W. (2019a). A unified model for opinion target extraction and target sentiment prediction. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, in AAAI’19/IAAI’19/EAAI’19. AAAI Press. https://doi.org/10.1609/aaai.v33i01.33016714

Li, X., Bing, L., Li, P., & Lam, W. (2019b). A unified model for opinion target extraction and target sentiment prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 6714–6721. https://doi.org/10.1609/aaai.v33i01.33016714

Li, X., Bing, L., Zhang, W., & Lam, W. (2019c). Exploiting BERT for end-to-end aspect-based sentiment analysis. Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019) (pp. 34–41). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-5505

Li, Y.-M., & Li, T.-Y. (2013). Deriving market intelligence from microblogs. Decis Support Syst, 55(1), 206–217. https://doi.org/10.1016/j.dss.2013.01.023

Li, H., & Lu, W. (2017). Learning latent sentiment scopes for entity-level sentiment analysis. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, in AAAI’17 (pp. 3482–3489). AAAI Press.

Li, J., & Xiao, L. (2020). Syrapropa at SemEval-2020 task 11: BERT-based models design for propagandistic technique and span detection. Proceedings of the Fourteenth Workshop on Semantic Evaluation (pp. 1808–1816). International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.237

Li, X., Xie, H., Chen, L., Wang, J., & Deng, X. (2014). News impact on stock price return via sentiment analysis. Knowl Based Syst, 69, 14–23. https://doi.org/10.1016/j.knosys.2014.04.022

Liang, Y., Meng, F., Zhang, J., Chen, Y., Xu, J., & Zhou, J. (2020). A dependency syntactic knowledge augmented interactive architecture for end-to-end aspect-based sentiment analysis.

Liu, S., Li, W., Wu, Y., Su, Q., & Sun, X. (2020). Jointly modeling aspect and sentiment with dynamic heterogeneous graph neural networks.

Liu, B., & Zhang, L. (2012). A survey of opinion mining and sentiment analysis. In Aggarwal, C. C., & Zhai, C. (Eds.), Mining text data (pp. 415–463). Springer US. https://doi.org/10.1007/978-1-4614-3223-4_13

Mao, Y., Shen, Y., Yang, J., Zhu, X., & Cai, L. (2022). Seq2Path: Generating sentiment tuples as paths of a tree. In Muresan, S., Nakov, P., & Villavicencio, A. (Eds.), Findings of the association for computational linguistics: ACL 2022 (pp. 2215–2225). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-acl.174

Nguyen, H., & Nguyen, M. L. (2018). A deep neural architecture for sentence-level sentiment classification in twitter social networking. In Hasida, K., & Pa, W. P. (Eds.), Computational linguistics (pp. 15–27). Springer Singapore.

Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002) (pp. 79–86). Association for Computational Linguistics. https://doi.org/10.3115/1118693.1118704

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. https://doi.org/10.48550/arXiv.1802.05365

Pontiki, M., et al. (2016). SemEval-2016 task 5: Aspect based sentiment analysis. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (pp. 19–30). Association for Computational Linguistics. https://doi.org/10.18653/v1/S16-1002

Quan, C., & Ren, F. (2014). Unsupervised product feature extraction for feature-oriented opinion determination. Inf Sci (N Y), 272, 16–28. https://doi.org/10.1016/j.ins.2014.02.063

Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2383–2392). Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1264

Rei, M. (2017). Semi-supervised multitask learning for sequence labeling. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2121–2130). Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-1194

Seo, M. J., Kembhavi, A., Farhadi, A., & Hajishirzi, H. (2017). Bidirectional attention flow for machine comprehension. 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net. [Online]. Available: https://openreview.net/forum?id=HJ0UKP9ge

Søgaard, A., & Goldberg, Y. (2016). Deep multi-task learning with low level tasks supervised at lower layers. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 231–235). Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-2038

Tang, D., Qin, B., Feng, X., & Liu, T. (2016). Effective LSTMs for target-dependent sentiment classification. Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 3298–3307). The COLING 2016 Organizing Committee. [Online]. Available: https://aclanthology.org/C16-1311

Vaswani, A., et al. (June 2017). Attention Is All You Need. [Online]. Available: http://arxiv.org/abs/1706.03762

Wang, Y., Huang, M., Zhu, X., & Zhao, L. (2016). Attention-based LSTM for aspect-level sentiment classification. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 606–615). Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1058

Wang, S., & Jiang, J. (2017). Machine comprehension using match-LSTM and answer pointer. 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net. [Online]. Available: https://openreview.net/forum?id=B1-q5Pqxl

Wang, X., Xu, G., Zhang, Z., Jin, L., & Sun, X. (2021). End-to-end aspect-based sentiment analysis with hierarchical multi-task learning. Neurocomputing, 455, 178–188. https://doi.org/10.1016/j.neucom.2021.03.100

Wilson, T., Wiebe, J., & Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Mooney, R., Brew, C., Chien, L.-F., & Kirchhoff, K. (Eds.), Proceedings of human language technology conference and conference on empirical methods in natural language processing (pp. 347–354). Association for Computational Linguistics. [Online]. Available: https://aclanthology.org/H05-1044

Xu, H., Liu, B., Shu, L., & Yu, P. S. (2018). Double embeddings and CNN-based sequence labeling for aspect extraction. In Gurevych, I., & Miyao, Y. (Eds.), Proceedings of the 56th annual meeting of the association for computational linguistics (volume 2: Short papers) (pp. 592–598). Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-2094

Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2019). XLNet: Generalized autoregressive pretraining for language understanding. Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc.

Yang, Z., Salakhutdinov, R., & Cohen, W. W. (2017). Transfer learning for sequence tagging with hierarchical recurrent networks.

Zhang, W., Feng, Y., Meng, F., You, D., & Liu, Q. (2019). Bridging the gap between training and inference for neural machine translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 4334–4343). Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1426

Zhang, W., Li, X., Deng, Y., Bing, L., & Lam, W. (2023). A survey on aspect-based sentiment analysis: Tasks, methods, and challenges. IEEE Trans Knowl Data Eng, 35(11), 11019–11038. https://doi.org/10.1109/TKDE.2022.3230975

Zhang, M., Zhang, Y., & Vo, D.-T. (2016). Gated neural networks for targeted sentiment analysis. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, in AAAI’16 (pp. 3087–3093). AAAI Press.