Abstract
Aspect-based sentiment analysis (ABSA) is a fine-grained analysis of the sentiments and opinions expressed in a text document, which sets aside the naïve assumption that a document expresses an opinion about a single topic or aspect. The main objective of ABSA is to spot and extract the different entities and aspects in an opinionated document and to identify their polarity. Based on previous work, ABSA can be categorized into three subtasks: aspect-category sentiment analysis (ACSA), opinion target expression sentiment analysis (OTESA) and aspect-term sentiment analysis (ATSA). This research presents an end-to-end multi-task approach that performs the three categories of ABSA in a single pipeline. Three multi-task classifiers were built on top of the baseline spanBERT language model, which was originally pretrained for span extraction. The input to the model consists of two merged segments, an entity premise and a context hypothesis, in a fashion similar to the reading-comprehension downstream task in natural language processing. The three downstream tasks were built on the contextualized output embeddings of pretrained spanBERT, coupled with a cross-layer attention mechanism to associate context with aspect-term span extraction, aspect sentiment polarity detection and entity-aspect entailment. A span masking approach was also proposed to handle text with multiple aspects using an iterative output-to-input loopback. The span masking process replaces each word in a previously detected span of text with a special [MASK] token and then feeds the entire sentence back into the encoder input of the model for the next run. The technique forces the encoder to look elsewhere for the next span prediction. The loopback span masking terminates when the span classifiers predict the special token [CLS] as both the beginning and the end of the span, signaling the absence of any further span to extract.
Experimental results on benchmark ABSA datasets validate the approach, which outperforms most of the compared methods.
Introduction
Aspect-based sentiment analysis (ABSA) is a subset of sentiment analysis that deals with a more granular, in-depth analysis of the various aspects discussed in a text and the sentiments expressed towards them. In contrast to traditional sentiment analysis, which naively assumes that the entire text expresses an opinion or sentiment on a single subject or aspect and therefore merely attempts to classify the text as positive, negative, or neutral, ABSA analyzes the specific topics, aspects and attributes being discussed in the text together with their associated context, and then evaluates the sentiment on each aspect individually. This granular approach not only provides more detailed insights but also offers valuable information for businesses, marketers, and decision-makers to understand customer opinions more comprehensively and realistically on diverse aspects of their products, services, or policies (Zhang et al., 2023).
However, in spite of its enormous potential, ABSA presents a plethora of challenges for researchers in the field of NLP. One of the primary challenges is the need for accurate aspect extraction. Identifying and extracting the relevant aspects from text, especially in domains with diverse and evolving vocabularies, remains a daunting task. Researchers grapple with developing algorithms and techniques that can effectively capture the subtle nuances of language and accurately identify the aspects under discussion. Another significant challenge lies in sentiment analysis at the aspect level. Analyzing sentiment within the context of specific aspects requires a deeper understanding of language semantics and context. Researchers face the dilemma of developing models that can accurately capture the sentiment polarity expressed towards each aspect while accounting for variations in language usage, cultural nuances, and contextual ambiguity (Bordoloi & Biswas, 2023; Zhang et al., 2023).
Research in sentiment analysis has blossomed in recent times owing to the vast availability of online user opinions about products, businesses, services, and policies. Businesses, organizations, and agencies take advantage of this by using users’ feedback to better understand the general perception of their products or policies, which informs future decisions and policy adjustments. Sentiment analysis is one of the effective methods for providing polarity analysis of such voluminous opinionated text. Sentiment analysis models have evolved over the years and are implemented at various levels of granularity according to scope and requirements. Generally, four major levels at which sentiment analysis is implemented can be identified (Bordoloi & Biswas, 2023): the document level (Li & Li, 2013; Pang et al., 2002), the sentence or phrase level (Nguyen & Nguyen, 2018; Wilson et al., 2005), the word level (Bollegala et al., 2013; Li et al., 2014), and the entity or aspect level (Li & Lu, 2017; Quan & Ren, 2014).
The standard sentiment analysis method, also known as the coarse-grained method, covers the first three levels. The general assumption in coarse-grained sentiment analysis is that the entire text or document expresses an opinion about a single topic, and the document is therefore labeled as having negative, positive, or neutral sentiment towards that topic. However, modern businesses and services require fine-grained analysis: for a conglomerate, for example, users might express a mixture of opinions about different products or topics in the same text. Therefore, entities and their aspects or attributes need to be analyzed independently for more accurate and informative sentiment polarity detection. This gives an accurate reflection of the sentiments about individual topics or products and equips business owners and policy makers to isolate the aspects of the business or policy that need further action. Aspect-based sentiment analysis is the subtask of sentiment analysis that deals directly with this fine-grained level of analysis (Liu & Zhang, 2012). It is fundamental to business success to listen to clients, understand exactly what the customer is saying and engage when necessary. The ABSA approach to sentiment analysis of textual data provides valuable insights by processing large volumes of online customer feedback on social media platforms such as Twitter and Facebook, where people regularly post their opinions on all kinds of businesses.
Inspired by the work in (Pontiki et al., 2016), a number of studies have classified ABSA into two or three categories in accordance with the two subtasks described there. In our opinion, and based on those two subtasks, ABSA can be categorized into three subclasses: (1) Aspect-Category Sentiment Analysis (ACSA), (2) Opinion Target Expression Sentiment Analysis (OTESA) and (3) Aspect-Term Sentiment Analysis (ATSA). ACSA and OTESA correspond to slots 1 and 2 of subtask 1, with slot 3 (polarity detection) being collapsed into one subclass, and ATSA corresponds to subtask 2 as described in (Pontiki et al., 2016). In ATSA, the challenge, given an opinionated user text or review about a target entity (e.g., restaurant, mobile phone, manufacturer, etc.), is to identify and extract the various aspects or attributes of the entity discussed by the user in the text (e.g., battery, screen, software for the entity mobile phone) and the polarity of the sentiments expressed towards those aspects. In ACSA, given a predefined set of entities
The objective, given a user review about an entity-aspect category pair, is to extract all the parts of the text that discuss a predetermined entity-aspect category pair and to determine the overall polarity of the extracted parts. While ATSA uses an extractive approach to identify the aspect term as it appears in the text, ACSA must have a deeper understanding of semantic relations between the predefined
The three major contributions of this research are: (1) a unified framework for conducting the three categories of ABSA (i.e., ACSA, OTESA and ATSA) in a single pipeline using a cross-attention mechanism; (2) a novel approach to detecting and extracting multiple aspects, with their associated context, from text using an exhaustive iterative masking procedure; and (3) an approach to automatically generating ground-truth training data from public ABSA datasets using masking, for effective multiple-aspect detection within a text.
Literature Review
The ABSA task is a subcategory of sentiment analysis that poses an enormous challenge and requires an approach with rich linguistic understanding. It is no surprise that most of the research in this field employs natural language processing techniques based on deep recurrent neural networks, self-attention-based deep neural networks, or a combination of both (Augustyniak et al., 2021). These algorithms have recorded state-of-the-art performance, outperforming humans in many tasks. A sizeable number of models rely on gated recurrent deep neural networks with attention mechanisms, such as long short-term memory (LSTM) layers (Hochreiter & Schmidhuber, 1997) and Gated Recurrent Units (GRU) (Gao & Glowacka, 2016). They extract opinionated information from embedding vectors and then apply attention mechanisms (Bahdanau et al., 2016) to ensure that the model attends to the span of text relevant to the given entity or aspect of interest. A few of these models are Attention-based LSTM with Aspect Embedding (Wang et al., 2016), which was used for ACSA, and Target-Dependent Sentiment Classification (Tang et al., 2016), Gated Neural Networks (Zhang et al., 2016) and Recurrent Attention Memory Network (RAM) (Chen et al., 2017), which were used for ATSA. Gao and Glowacka (2016) propose cross-domain tagging with adversarial training. Li et al. (2019a) presented a model with a two-layer LSTM, where the lower LSTM is trained for aspect term extraction (ATE) and the upper LSTM for unified aspect-polarity tagging. The gates in these networks control what information is retained as the vector propagates through the network and therefore serve as an important memory for short- and long-range dependencies within the text. The attention mechanism plays a vital role in computing a contextualized vector and computes similarity scores between the contextualized vectors and the target vectors, which is important for associating context with the attended aspect.
The context vectors encode both aspect and sentiment polarity information, and the similarity scores are applied across all feature dimensions irrespective of the distinction between aspect and sentiment polarity (Wang et al., 2021). One of the fundamental shortcomings of recurrent neural network models is their sequential nature: tokens are processed one at a time, which results in high computational cost and long processing time. As the model size grows, the cost of the attention layer grows steeply, as does the cost of normalizing the similarity scores over all tokens in the sentence (Wang et al., 2016). In addition, some models, such as the one proposed in (Chen et al., 2017), require positional information between words and targets to produce a weighted LSTM, which can be problematic and unreliable with noisy inputs. More sophisticated LSTM cells and attention mechanisms can yield higher accuracy, but at a large computation and memory cost. Peters et al. (2018) proposed deep contextualized word representations in a model called ELMo. This word embedding technique creates a vector space using bidirectional LSTMs trained on a language modeling objective.
The popularization of self-attention networks, proposed by Vaswani et al. (2017), abolished the recurrent operation to address most of the inefficiencies of recurrent neural networks, such as the handling of long-range dependencies, and propelled the evaluation of new models in ABSA. Pretrained models such as Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019) were trained on a huge corpus of data for general language understanding. BERT is trained in a self-supervised manner with two objectives, masked language modeling (MLM) and next sentence prediction (NSP). Another pretrained language model, XLNet (Yang et al., 2019), aimed to improve on BERT by introducing a variant of language modeling called permutation language modeling. Instead of predicting masked words independently as in BERT, XLNet predicts target words under different orderings of the source words (no strict left-to-right order). spanBERT (Joshi et al., 2020) is another pretrained model inspired by BERT; it developed a pretraining method that improves on BERT's two training objectives (MLM and NSP) with span-level masking and a span boundary objective. The model builds on a well-trained BERT and emphasizes the extraction of spans of text for the task at hand. Li et al. (2019c) leverage pretrained BERT for end-to-end ABSA. Søgaard and Goldberg (2016) design multiple lower-level tasks with task-related knowledge to assist the learning of the main task. Li and Xiao (2020) implemented a system for span extraction and propaganda detection in news articles by building a Span Identification (SI) model based on SpanBERT. They developed a hybrid model for Technique Classification (TC) composed of three sub-models: two BERT models with different training methods and a feature-based logistic regression model.
Wang et al. (2021) proposed end-to-end aspect-based sentiment analysis with hierarchical multi-task learning, which combines auxiliary tasks and the main task in a hierarchical multi-task learning framework. Multi-task learning (MTL) enables the subtasks of a multi-task problem like ABSA to be trained jointly in the same model. MTL is a machine learning paradigm that enables joint learning of multiple tasks (Caruana, 1997) with the aim of improving the model's general understanding and its performance on all related tasks. This type of training has been applied in various tasks (Rei, 2017; Yang et al., 2017). However, most of these works combine multiple losses applied at the output layer. The idea of multi-task learning with low-level auxiliary tasks was first introduced by Søgaard and Goldberg (2016). They found that low-level tasks are better kept at the lower layers, enabling the higher-level tasks to make use of the knowledge of the lower-level tasks.
Inspired by the aforementioned works, we propose a multi-task learning approach to ABSA based on a baseline spanBERT pretrained language model that handles the three categories of ABSA in a single pipeline using a cross-attention layer. Three downstream tasks were built on the contextualized output embeddings of pretrained spanBERT to perform (1) aspect-term span extraction, (2) aspect sentiment polarity detection and (3) entity-aspect entailment, which are combined to address the three ABSA subtasks. A span masking approach was also proposed to handle text with multiple aspects using an iterative output-to-input loopback. The span masking process replaces each word in a previously detected span of text with the special token [MASK] and then feeds the entire sentence back into the encoder input of the model for the next run. The technique forces the encoder to look elsewhere for the next span prediction. The loopback span masking terminates when the span classifiers predict the special token [CLS] as both the beginning and the end of the span, which signifies the absence of any further span to extract. To achieve multiple-aspect extraction, we refined the ground-truth dataset to conform with the proposed iterative approach.
Proposed Method
In this section we present the details of the overall proposed pipeline for performing the three categories of the ABSA task, as depicted in Figure 1. The input to the model consists of two merged segments, an entity premise and an aspect hypothesis, similar to a reading-comprehension task. These input segments are fed into the baseline spanBERT, which computes the contextualized output embeddings. The contextualized output embeddings are then used by three pipelines: Entity-Aspect Entailment (EAE), Aspect-Term Span Extraction (ATSE), and cross-layer attention, which together with the intermediate output of ATSE computes the Aspect Sentiment Polarity (ASP). Detailed explanations of the various parts are provided in the relevant subsections below.

Proposed Framework Iterative Multi-Aspect Term Extraction and Polarity Detection with spanBERT for ABSA Tasks.
Multitask Input Framework
The input to the model is similar to that of the reading-comprehension question answering downstream task first suggested in (Devlin et al., 2019), where a single merged input is used and a segment embedding is added to differentiate the premise/question segment from the hypothesis/paragraph segment. In our case the premise is the entity, whereas the context text or the user's review is used as the hypothesis. Given a sequence of entity tokens

Multitask Input Framework for the Proposed System.
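The merged two-segment input can be illustrated with a minimal sketch. It assumes whitespace tokens rather than the subword tokenizer a spanBERT model would actually use, and the function name `build_input` is hypothetical; only the [CLS]/[SEP] layout and the segment distinction come from the description above.

```python
from typing import List, Tuple


def build_input(entity_tokens: List[str],
                review_tokens: List[str]) -> Tuple[List[str], List[int]]:
    """Pack an entity premise and a review hypothesis into one sequence.

    Assumption: whitespace tokens stand in for subword tokens.
    """
    tokens = ["[CLS]"] + entity_tokens + ["[SEP]"] + review_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the entity and the first [SEP];
    # segment 1 covers the review and the closing [SEP].
    segment_ids = [0] * (len(entity_tokens) + 2) + [1] * (len(review_tokens) + 1)
    return tokens, segment_ids


review = "great design but it has low battery capacity .".split()
tokens, segment_ids = build_input(["laptop"], review)
print(list(zip(tokens, segment_ids)))
```

The segment ids play the role of the segment embedding index that separates the premise from the hypothesis.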
Multitask Training Objective and Data Preparation
A ternary learning objective function is proposed to perform three tasks of aspect term extraction
Input Data and Targets Preparation.
Iterative Multi-Targets Aspect-Term Span Extraction
The goal is to extract a contiguous span of tokens from the input sequence that represents an aspect term or attribute of a given entity. Seo et al. (2017) and Xu et al. (2018) worked on extractive question answering in reading comprehension, where the task is to extract a continuous span of text from the document as the answer to a particular question. Lee et al. (2016) and Rajpurkar et al. (2016) studied several span extraction techniques, including BIO prediction, which uses sequence tagging, and boundary prediction. They concluded that span boundary prediction methods have numerous advantages over sequence-tagging methods. This observation was revalidated by Wang and Jiang (2017) for answer prediction. One shortcoming of the above span extraction methods is that they are limited to extracting one span per sequence. Hu et al. (2019) proposed a multi-target span extraction method to extract multiple opinion targets from a given sequence for ABSA polarity classification. They used the boundary prediction approach for span extraction as suggested in extractive question answering (Seo et al., 2017; Wang & Jiang, 2017; Xu et al., 2018). During training, they generated two separate sparse binary vectors for the start and end boundaries of each sequence, assigning ones to the tokens that correspond to span starts and span ends and zeros to the rest of the sequence. They then built two classifiers trained to minimize the negative log probabilities of the labels under the predictions. At inference, they proposed a heuristic multi-span decoding algorithm that takes the top-K indices from the two predicted score vectors and ranks the candidate spans by a heuristically regularized score.
Our approach to multi-target span extraction also uses boundary prediction but extracts one span at a time, iteratively, until all spans are exhausted (which is determined when the [CLS] token is predicted as the span index). The concept forces the model to concentrate on a localized first instance of an aspect term in the sequence. The iterative approach is only required at inference time: during training, the data is prepared (see Table 1) so that for multi-aspect inputs one aspect instance is exposed at a time while all other aspects are masked. Another case is when none of the multiple aspects is masked; in this instance the model learns to extract only the first occurrence of an aspect-term span. To predict the start and end token in the sequence, we first compute the logits start confidence scores
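To make the boundary-label idea concrete, the following is a minimal sketch of how start and end binary vectors could be built for a sequence with several target spans. The function name `boundary_labels` and the token indices are illustrative assumptions, not taken from the cited work.

```python
from typing import List, Tuple


def boundary_labels(seq_len: int,
                    spans: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Return sparse binary start/end vectors for inclusive (start, end) spans."""
    start = [0] * seq_len
    end = [0] * seq_len
    for s, e in spans:
        start[s] = 1  # token that opens a target span
        end[e] = 1    # token that closes a target span
    return start, end


tokens = "[CLS] laptop [SEP] great design but it has low battery capacity . [SEP]".split()
# Hypothetical gold spans: "design" at index 4 and "battery" at index 9.
start_vec, end_vec = boundary_labels(len(tokens), [(4, 4), (9, 9)])
print(start_vec)
print(end_vec)
```

Two classifiers trained against such vectors predict a start score and an end score per token; a single-token aspect simply has its start and end at the same index.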
Consequently, 2 one-hot vectors can be computed for the start
Aspect Term Sentiment Polarity Classification
The sentiment polarity detection here aims to determine the sentiment polarity relevant to the aspect term in question, not that of the entire input text. Therefore, we introduced an attention computation that enables the model to attend only to the part of the sequence associated with the aspect term of interest. This is done by computing single-head cross-attention scores between the average of
Then, training objective is to minimize the loss function
Entity-Aspect Entailment Classification
Given a set of predefined entities, the task is to determine whether a review or aspect hypothesis refers to one of the predefined entities. Usually, such a downstream task is done by pooling the contextualized output classification vector
The training objective is to minimize binary cross entropy
Multitask Training
For the ternary tasks, the overall training objective is to maximize the likelihood for s training targets
The log-likelihood of
Implementation of Aspect-term sentiment analysis
Experiments
This section presents a series of results obtained from the experiments to validate the relevance and efficacy of the proposed system. First, experiments were conducted based on the setup in Figure 1 to train and fine-tune the three different spanBERT baseline configurations. At inference time, the setup is run in three different modes to accomplish the three tasks of ATSA, ACSA and OTESA. Results from the proposed system are compared with those of other methods to demonstrate the level of improvement realized. To further probe the effectiveness of the system, we conducted additional ablation experiments to closely monitor the contributions of important components of the proposed system. Lastly, we present some discussion with visualization of the vectors' perplexity and other relevant parameters.
Training Procedures
All experiments were conducted on an NVIDIA T4 GPU, which has 320 Turing Tensor Cores and 16 GB of GPU memory, with an additional 16 GB of system memory. We used the same optimizer, Adam (adaptive moment estimation), throughout the experiments. The optimizer performs both regularization on the training batches and minimization of the prediction loss. The regularization is based on weight decay and not on the moments.
In contrast to the original BERT implementation, the training approach differs in several key aspects. Firstly, while BERT randomly selects 10 masks for each sequence during data processing, we utilize unique masks at each epoch. Secondly, we do not employ short-sequence strategies, such as sampling shorter sequences with a low probability (0.1) and pre-training with smaller sequence lengths (128 tokens for 90% of training steps). Instead, throughout the experiments, sequences of up to 512 tokens were used, continuing until a document boundary was reached. Similar to BERT, we implement a learning rate warm-up over the initial 1,000 steps, reaching a peak value of 1e-4, followed by linear decay. We maintain the hyperparameters β1 = 0.9 and β2 = 0.999 and adopt a decoupled weight decay of 0.1. Additionally, we retain a dropout rate of 0.1 across all layers and attention weights, alongside a GeLU activation function. Notably, we diverge from the optimization approach by extending training to 10,000 thousand steps and utilizing an epsilon value of 1e-8 for AdamW, resulting in superior convergence to an optimal set of model parameters. Our implementation employs a batch size of 256 sequences, with a maximum of 512 tokens per sequence.
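The warm-up and linear-decay schedule described above can be sketched as a plain function. The 1,000 warm-up steps and the 1e-4 peak come from the text; the total number of steps is left as a required argument, and the value used in the example is a placeholder assumption, not the training length used in the experiments.

```python
def learning_rate(step: int, total_steps: int,
                  peak_lr: float = 1e-4, warmup_steps: int = 1000) -> float:
    """Linear warm-up to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to the peak value.
        return peak_lr * step / warmup_steps
    # Decay linearly from the peak down to 0 at the final step.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


TOTAL = 10_000  # placeholder value for illustration only
for step in (0, 500, 1000, 5500, 10_000):
    print(step, learning_rate(step, TOTAL))
```

In a real training loop the same shape would typically be supplied to the optimizer through a scheduler rather than computed by hand.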
Dataset
The datasets used in the experiments come from the workshop on semantic evaluation (SemEval-2016) Task 5, held in San Diego, California, in 2016 (Pontiki et al., 2016). The original collection consists of 39 datasets (19 for training and 20 for testing) with a total of 70,790 manually annotated ABSA labels covering 8 languages and 7 domains: restaurants, laptops, hotels, mobile phones, digital cameras, museums and telecommunication. Except for the telecommunication domain, which is formed from customer tweets, all other domains consist of customer reviews (Pontiki et al., 2016).
Datasets from the restaurant and laptop domains were available in English and were therefore used in this experiment as the basis for an extended version that suits the training requirements of the proposed system. The SemEval-2016 laptops dataset consists of 2500 unique review sentences annotated with 2923 {E#A, polarity} tuples, and the restaurants dataset consists of 2000 unique review sentences annotated with 2499 {E#A, OTE, polarity} tuples. The sentiment annotation consists of four classes: positive, negative, neutral and conflict. The conflict class applies when a review expresses contrasting sentiments towards an aspect. We extended these datasets by performing three actions (see Figure 1) on each sample: (i) keeping the original sample unaltered; (ii) creating as many copies of the original sample as there are aspects in the sample and, in each copy, leaving one aspect term unaltered while replacing all other aspects with a special token; and (iii) creating a copy of the sample in which all aspect terms are replaced with the [MASK] token. Additionally, each review sample is further made into two samples by pairing it with an entity term from a matching and from a mismatching domain to create the entity-aspect hypothesis pairing conditions. In total, 8346 and 6998 training samples were formed from the restaurant and laptop domains, respectively.
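The three expansion actions can be sketched as follows. This is an illustration under stated assumptions: tokens are whitespace-split, aspects are given as inclusive (start, end) index pairs, the special token is taken to be [MASK] in action (ii) as well, and a target of `None` stands for "no aspect left" (the case in which the model is expected to point at [CLS]). The helper names are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def mask_spans(tokens: List[str], spans: List[Span]) -> List[str]:
    """Replace every token inside the given inclusive spans with [MASK]."""
    out = list(tokens)
    for s, e in spans:
        for i in range(s, e + 1):
            out[i] = "[MASK]"
    return out


def expand_sample(tokens: List[str],
                  aspects: List[Span]) -> List[Tuple[List[str], Optional[Span]]]:
    """Return (tokens, target_span) training pairs for one annotated review."""
    aspects = sorted(aspects)
    # (i) original sample; the target is the first aspect occurrence.
    samples = [(list(tokens), aspects[0] if aspects else None)]
    # (ii) one copy per aspect, with every other aspect masked.
    for keep in aspects:
        others = [a for a in aspects if a != keep]
        samples.append((mask_spans(tokens, others), keep))
    # (iii) all aspects masked; nothing is left to extract.
    samples.append((mask_spans(tokens, aspects), None))
    return samples


review = "great design but it has low battery capacity .".split()
samples = expand_sample(review, [(1, 1), (6, 6)])
for toks, target in samples:
    print(" ".join(toks), "->", target)
```

For a review with a single aspect, action (ii) reproduces the original sample, so a practical pipeline might deduplicate the output.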
Evaluation Metrics
To evaluate the performance of the proposed model, four standard evaluation metrics are used: accuracy (A), precision (P), recall (R), and F1 score (F1). Accuracy is used to evaluate ATSE for span extraction and Entity-Aspect Entailment (EAE), whereas the remaining metrics evaluate the Aspect Sentiment Polarity (ASP). For AOP and ASTE, the number of predicted pairs and triplets is compared to the actual number in the given dataset. Details about the evaluation metrics can be found in (Augustyniak et al., 2021; Zhang et al., 2023).
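For reference, the per-class precision, recall and F1 definitions can be written out as a short sketch. The label sequences are made-up examples, and how per-class scores are averaged across the four polarity classes is not specified here, so the sketch stops at a single class.

```python
from typing import List, Tuple


def precision_recall_f1(gold: List[str], pred: List[str],
                        label: str) -> Tuple[float, float, float]:
    """Per-class precision, recall and F1 for one polarity label."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = ["positive", "negative", "positive", "neutral"]   # illustrative labels
pred = ["positive", "positive", "positive", "neutral"]
p, r, f1 = precision_recall_f1(gold, pred, "positive")
accuracy = sum(g == q for g, q in zip(gold, pred)) / len(gold)
print(p, r, f1, accuracy)
```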
Aspect-Term Sentiment Analysis (ATSA)
In ATSA, the challenge, given an opinionated user text or review about a target entity (e.g., restaurant, mobile phone, manufacturer, etc.), is to extract the various aspects or attributes of the entity discussed by the user in the text (e.g., battery, screen, software for the entity mobile phone) and the polarity of the sentiments expressed towards those aspects. To accomplish the task, ATSE and ASP are utilized. For multi-aspect inputs, the input sentence is run iteratively at inference: each run returns an aspect, the detected aspect is replaced with the special token [MASK], and the masked sentence is fed back to the system until the special token [CLS] is returned as the chosen aspect, which indicates that the aspect terms have been exhausted. Each time an aspect is detected, its polarity is predicted by the polarity classifier. Figures 3 and 4 depict how the multi-aspect input sentence “[CLS] laptop [SEP] great design but it has low battery capacity. [SEP]” is run at inference. In the first iteration (Figure 3), ATSE correctly predicts the start and end positions of the first aspect term, {“design”}, by scoring them highest. In the second iteration (Figure 4), after the aspect term has been replaced with [MASK], the next aspect term, {“battery”}, is also correctly detected by ATSE. The last iteration, which is not included, scores [CLS] highest, indicating the absence of any further aspect term.
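The iterative masking loop at inference can be sketched independently of the model. In the sketch below, `predict_span` is a stand-in for the ATSE classifiers and `toy_predictor` is a hypothetical lexicon lookup used only to make the example runnable; returning index 0 for both boundaries plays the role of predicting [CLS]. The `max_iters` guard is an added safety assumption.

```python
from typing import Callable, List, Tuple


def iterative_extract(tokens: List[str],
                      predict_span: Callable[[List[str]], Tuple[int, int]],
                      max_iters: int = 10) -> List[str]:
    """Extract aspect terms one at a time, masking each before the next run."""
    tokens = list(tokens)
    found = []
    for _ in range(max_iters):
        start, end = predict_span(tokens)
        if start == 0 and end == 0:
            break  # [CLS] chosen as the span: no aspect term is left
        found.append(" ".join(tokens[start:end + 1]))
        for i in range(start, end + 1):
            tokens[i] = "[MASK]"  # force the next run to look elsewhere
    return found


def toy_predictor(tokens: List[str]) -> Tuple[int, int]:
    """Hypothetical stand-in: return the first token found in a small lexicon."""
    lexicon = {"design", "battery"}
    for i, tok in enumerate(tokens):
        if tok in lexicon:
            return i, i
    return 0, 0


sentence = "[CLS] laptop [SEP] great design but it has low battery capacity . [SEP]".split()
aspects = iterative_extract(sentence, toy_predictor)
print(aspects)
```

With a trained model, `predict_span` would take the argmax of the start and end scores, and the polarity classifier would be called on each extracted span inside the loop.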

Aspect Term Span Extraction Scores for Detecting the Start and end Point Position of an Aspect Term from a Multi-Aspect-Input Sentence “[CLS] Laptop [SEP] Great Design but it has low Battery Capacity. [SEP]” During the First Iteration.

Aspect Term Span Extraction Scores for Detecting the Start and End Point Position of an Aspect Term from a Multi-aspect-input Sentence “[CLS] Laptop [SEP] Great [MASK] but it has Low Battery Capacity. [SEP]” During the Second Iteration.
At each iteration in which an aspect is detected, ASP computes the sentiment polarity associated with that aspect, taking the cross-attention output as its input. Figure 5 shows a heatmap of the average scores for the multi-aspect input sentence “[CLS] laptop [SEP] great design but it has low battery capacity. [SEP]”, with darker blue regions indicating higher attention scores, which clearly associate the sentiment indicators with the extracted aspect terms, i.e., {“great design”} and {“low battery capacity”}. This indicates the selective and associative capacity of the proposed model, enabling it to make the right call for the ATSA task.

Heatmap of the Cross-Attention Layers Scores. It is the Average Scores of the Multi-Aspect-Input Sentence “[CLS] Laptop [SEP] Great Design but it has low Battery Capacity. [SEP]” with Darker Blue Regions Indicating Higher Attention Scores.
Table 2 presents the scores from ATSE (accuracy) and ASP (precision, recall and F1). For the sentiment polarity, all four categories in the dataset (positive, negative, neutral and conflict) are used.
ATSA Results from ATSE and ASP of the Proposed Method.
Aspect-Category Sentiment Analysis (ACSA)
In the context of SemEval Task 5, ACSA is to extract a pairing of entity types, E = {LAPTOP, KEYBOARD, CUSTOMER SUPPORT, RESTAURANT, FOOD}, and attribute types, A = {USABILITY, QUALITY, PRICE}, discussed in a user text. The objective is to extract the correct pairing and to identify the sentiment polarity of the correct entity-attribute pairs. At inference, the entity segment of the input text is changed iteratively over the predefined entities, each time using the EAE classifier to predict whether the entity matches the aspect-hypothesis segment (the user input). As soon as a match is detected, the system proceeds to ATSA mode to determine the sentiment polarity of the pair in the input. If the entailment is false or the pair is not in the predefined entity-attribute set, the pairing is discarded and no polarity is computed by ATSA. Table 3 provides the results from the experiment.
Entailment Results from EAE Classifier and ASP of the Proposed Method.
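The ACSA inference loop over the predefined entities can be sketched as below. Both `toy_entails` and `toy_atsa` are hypothetical stand-ins for the EAE classifier and the ATSA mode; the cue words and the returned polarities are made up for illustration, and the sketch assumes every matching entity is processed.

```python
from typing import Callable, Dict, List, Tuple

ENTITIES = ["LAPTOP", "KEYBOARD", "CUSTOMER SUPPORT", "RESTAURANT", "FOOD"]


def run_acsa(review: List[str],
             entails: Callable[[str, List[str]], bool],
             atsa: Callable[[str, List[str]], List[Tuple[str, str]]]
             ) -> Dict[str, List[Tuple[str, str]]]:
    """Try each predefined entity as the premise; run ATSA only on matches."""
    results = {}
    for entity in ENTITIES:
        if entails(entity, review):          # EAE says the review is about it
            results[entity] = atsa(entity, review)
        # otherwise the pairing is discarded and no polarity is computed
    return results


def toy_entails(entity: str, review: List[str]) -> bool:
    """Hypothetical stand-in for the EAE classifier (keyword cues)."""
    cues = {"LAPTOP": {"battery", "design"}, "FOOD": {"pizza"}}
    return any(tok in cues.get(entity, set()) for tok in review)


def toy_atsa(entity: str, review: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for ATSA mode (fixed output)."""
    return [("design", "positive"), ("battery", "negative")]


review = "great design but it has low battery capacity .".split()
acsa_out = run_acsa(review, toy_entails, toy_atsa)
print(acsa_out)
```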
Opinion Target Expression Sentiment Analysis (OTESA)
The OTESA task is to extract all the parts of the user's review that discuss a predetermined entity-aspect category pair and then combine these extracted opinions to predict the overall polarity of that entity-aspect category. To use our model for this task, we first run the system for the ACSA task and record all the correctly matched entity-aspect pairs and their corresponding polarities. For matched entity-aspect pairs that appear more than once in a text, the overall polarity is computed by summing the individual scores of each pair's sentiment polarity. Positive sentiment is assigned a score of +1, negative sentiment a score of −1, and neutral and conflict are both assigned 0. Therefore, a positive final score is interpreted as positive sentiment, a negative score as negative sentiment, and a zero score as either neutral or conflict. It is worth noting that the actual span of text that discusses the entity-aspect pair is not used, since the system is trained to extract only the aspect without its surrounding context; however, the sentiment polarity classifier encodes this local context via the attention mechanism and is therefore relied upon for OTESA. Table 4 below provides some results.
EAE and OTESA Results.
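The score-summing rule used to obtain the overall OTESA polarity is simple enough to state as code. The score mapping follows the description above; the function name and the combined "neutral/conflict" label for a zero sum are illustrative choices.

```python
from typing import List

SCORES = {"positive": 1, "negative": -1, "neutral": 0, "conflict": 0}


def overall_polarity(polarities: List[str]) -> str:
    """Sum per-occurrence scores for one entity-aspect pair and map the sign."""
    total = sum(SCORES[p] for p in polarities)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral/conflict"  # a zero sum cannot separate the two classes


print(overall_polarity(["positive", "positive", "negative"]))
print(overall_polarity(["positive", "negative"]))
print(overall_polarity(["negative", "conflict"]))
```

A zero sum merges the neutral and conflict cases, which is one place where the aggregation step can lose information relative to the per-aspect predictions.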
Ablation and Comparison
To understand the influence of the different components of the proposed model, we conducted experiments under different architecture and model setups to investigate the effect of the cross-attention (Ca) layer, the multi-task learning (Ml) objective and iterative multi-aspect extraction (It). We use spanBERT as the baseline for all three tasks. Each task (ATSA, ACSA or OTESA) is investigated under four cases. In case 1, both the iterative approach and the cross-attention layer are used. In case 2, the cross-attention layer is retained but the experiment is run non-iteratively. Case 3 omits the cross-attention layer but runs in iterative mode. In case 4, neither the cross-attention layer nor the iterative approach is used. Results from these setups for the three subcategories of ABSA are presented in Table 5.
Ablation Results from the Proposed Method.
It can be deduced that across all the ABSA subcategories, and on almost all the metrics, the best performance is achieved when both the cross-attention layer and iterative multi-aspect detection are used. A significant improvement of over 6% has been recorded. When the iterative multi-aspect approach is not used (case 2), performance also drops in most settings, though less severely than when the cross-attention layer is excluded (cases 3 and 4). This underlines the level of influence the cross-attention layer exerts on overall system performance. Without the cross-attention layer, the model performs poorly, mainly due to its inability to properly associate segments of the input text with the right aspect-term. Similarly, the iterative approach for multi-aspect inputs reduces the model complexity in training and consequently enhances its performance, making aspect-term detection easier for the model. This effect can be seen in Table 5, where an average improvement of 1% is recorded with the iterative multi-aspect extraction technique.
Despite the significant improvement achieved by the proposed method, the OTESA subcategory presents a more challenging prospect, with performance lagging far behind the other two subtasks. This challenge is likely connected to the fact that, after the extraction and polarity determination of the entity-aspect pairs in multi-aspect text, the overall polarity has to be deduced from the individual entity-aspect polarities, a step which is prone to additional performance loss.
To provide a comparison with state-of-the-art methods, Table 6 presents the average scores on the three tasks (ATSA, ACSA and OTESA) from several approaches using different techniques, including transformer-based approaches (using the BERT architecture) and recurrent deep neural network approaches (e.g., LSTM and CNN-LSTM). The approaches include DECNN (Xu et al., 2018), ALSTM (Wang et al., 2016), DECNN-dTrans (He et al., 2018), ALSTM (He et al., 2018), DREGCN (Liang et al., 2020), HGGNN (Liu et al., 2020), IMN, IMN-BERT (He et al., 2019), INABSA (Li et al., 2019b), E2E-ABSA, E2E-ABSA-BERT (Wang et al., 2021). In addition, a multi-view prompting technique, MvP (He et al., 2018), has recently shown some promise and is therefore included in the comparison (Ashir & Taha, 2024).
Results of Comparison with Other Methods.
Conclusion
In this research, a unified framework has been proposed to solve the three major subtasks of aspect-based sentiment analysis. spanBERT has been used as the baseline language model for context understanding in the downstream tasks. The major contributions are the use of a cross-attention layer for aspect association, an iterative multi-aspect extraction scheme and a unified multitask approach for both training and inference. Ablation results showed that the cross-attention layers have a positive impact on the results; without them, model performance decreases, mainly because segments of the input are not properly associated with the right aspect-term. Similarly, the iterative approach for multi-aspect inputs reduces the model complexity in training and consequently enhances its performance. In contrast to many ABSA models in the literature, our approach provides an integrated framework in which all the subtasks can be run from one pipeline. The method used to preprocess the dataset to conform to the iterative multi-aspect term extraction has also been effective. Experimental results and comparisons with other approaches validated the approach, which outperformed most of the compared research on the Restaurant and Laptop datasets.
Footnotes
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
