Multi-feature fusion and dual-channel networks for sentiment analysis

Abstract

With the rapid proliferation of textual data from sources such as social media, online comments, and news articles, sentiment analysis has become increasingly important. However, existing deep learning methods have largely overlooked the role of part-of-speech (POS) and emotional words in understanding the emotion of a text. To address this, this paper proposes a sentiment analysis approach that combines multiple features with a dual-channel network. First, a vector representation of the text is obtained with the Robustly Optimized BERT Pretraining Approach (RoBERTa). Second, the POS features and word-level emotion features are each updated with self-attention, which assigns weights to their tokens. The word, POS, and emotion features are then concatenated, and a linear layer reduces their dimensionality and fuses them. Finally, the fused feature vector is fed into a dual-channel network composed of a Bidirectional Gated Recurrent Unit (BiGRU) and a Deep Pyramid Convolutional Neural Network (DPCNN). Experimental results show that the proposed method achieves higher classification accuracy than the comparison methods on five sentiment analysis datasets, and ablation experiments support the effectiveness of each component.

Keywords

Sentiment analysis, part-of-speech, RoBERTa, bidirectional gated recurrent unit, deep pyramid convolutional neural network

1 Introduction

Sentiment analysis represents a deeply investigated area within the realm of Natural Language Processing (NLP), with its fundamental objective being the identification and interpretation of sentiments and emotional nuances present in textual content [1]. The surge in the volume of textual data, spanning social media posts, online commentary, and journalistic reports, has magnified the importance of sentiment analysis. Its applications are not confined to merely gauging consumer responses to products and services; rather, they extend to offering insightful analyses of societal trends and public sentiments. Therefore, sentiment analysis emerges as a field with significant potential for application across sectors such as business, social media surveillance, and the examination of public opinions [2].

Nevertheless, sentiment analysis confronts several hurdles, notably arising from the complexity inherent in textual data. Texts frequently contain a mix of sentiments, characterized by varying emotional tones, ambiguities, and multiple meanings. Traditional approaches to sentiment analysis often fall short in adeptly navigating these complex emotional layers. As a result, there has been a consistent effort among researchers to refine the accuracy and robustness of sentiment analysis techniques to better meet the needs of real-world applications [3].

This paper focuses on examining sentiment analysis methodologies that incorporate multi-feature fusion and dual-channel networks to improve the efficiency and effectiveness of sentiment analysis. Multi-feature fusion is a strategy that combines different types of feature information to bolster the model’s performance. Through the integration of textual, emotional, and part-of-speech (POS) features, we aim to achieve a nuanced understanding of the emotions conveyed in the text. POS features, which delineate the grammatical function of words within a text [4], aid the model in grasping the grammatical constructs and relationships of POS in the text. For instance, information on POS such as nouns, verbs, and adjectives can provide insights into the manner and intensity of emotional expression. Emotional features are pivotal in distinguishing the emotional content within a text, allowing the model to discern between emotional and non-emotional expressions, thereby enabling more precise assessments. Given the wide array of emotions and varied expressions found in texts, multi-feature fusion is essential for enabling models to capture this diversity of sentiments. This approach, in turn, enhances the depth and adaptability of sentiment analysis.

The dual-channel network introduced in this paper combines a Bidirectional Gated Recurrent Unit (BiGRU) [5] and a Deep Pyramid Convolutional Neural Network (DPCNN) [6] into a deep learning framework for textual data analysis. This architecture exploits two distinct neural network models to capture a wide range of information and features in the text. The BiGRU, a Recurrent Neural Network (RNN) suited to sequential data, captures contextual nuances in text. Because it is bidirectional, it incorporates both preceding and succeeding context, which is vital in sentiment analysis because sentiment depends heavily on context. The DPCNN, in turn, extracts features within a local scope, identifying local patterns and attributes in the text. Operating BiGRU and DPCNN side by side lets the dual-channel network consider both global context and local features, which improves sentiment analysis performance. Deep neural networks, while powerful, are prone to overfitting, particularly when data are limited. Although the dual-channel design adds parameters and layers, combining two structurally distinct encoders (BiGRU and DPCNN) provides complementary views of the text, which is intended to improve generalization across datasets and domains and to reduce the risk of overfitting [7].

In summary, the main contributions of this paper are as follows.

–
This paper explores the effective fusion of various types of feature information, encompassing textual features, POS features, and emotion features, aiming to enhance the performance of sentiment analysis.
–
In multi-feature fusion, this paper introduces a self-attention mechanism to identify the relatively important POS tokens and emotion tokens.
–
This paper constructs a dual-channel network comprising BiGRU and DPCNN. This design exploits the strengths of both networks, taking into account both global context and local features. It is intended both to mitigate the risk of overfitting and to enable multilevel feature extraction.

2 Literature review

2.1 Multi-feature fusion for sentiment analysis

Literature [8] proposes an emotion analysis model based on multi-feature fusion, which combines contextual, syntactic, and semantic information to enhance emotion recognition, and introduces an attention mechanism based on a contextual emotion space to better capture the semantic information of sentences. Literature [9] proposes a poetic sentiment analysis model that integrates multiple encoders to extract text features at various levels; this richer feature information is intended to improve the accuracy of text semantics and the learning and generalization ability of the model. Literature [10] introduces an emotion analysis model for enterprise brand evaluation that integrates deep features extracted from word vectors with named entities extracted using a Conditional Random Field (CRF) [11], and obtains emotion results with a Support Vector Machine (SVM) [12]. Literature [13] proposes a multi-feature fusion GCN sentiment analysis method that constructs text graphs from several features, namely aspect relations, word dependency relations, and semantic relations. Literature [14] presents an emotion analysis model for microblog comments that uses emojis to assess the emotional tendencies of microblogs more accurately: emoji vectors and text vectors are fed into an attention mechanism, which assigns weights to the extracted features to highlight important details. Finally, Literature [15] proposes an emotion analysis method based on bilingual feature fusion, which combines Chinese and English to mitigate the effect of deviations in sentence-level results, and introduces an aspect-word attention mechanism so that the model can capture dependency information about aspect words.

2.2 Dual-channel networks for sentiment analysis

Literature [16] proposes a two-channel sentiment analysis method based on BiGRU and CNN that combines global and local attention to explore the interaction between contextual semantic features and local structural features, and introduces psychological characteristics to integrate psychological information into emotion classification. Literature [17] presents an emotion analysis model for course evaluation that builds a dual-channel network of CNN and Bidirectional Long Short-Term Memory (BiLSTM); word vectors are obtained with BERT, and an attention mechanism is applied after the BiLSTM. Literature [18] introduces a sentiment analysis network based on a CNN-BiGRU architecture, which obtains text vectors with BERT and fuses the outputs of the two channels with an attention mechanism. Literature [19] proposes a dual-channel Chinese sentiment analysis method based on multi-feature fusion, which combines word features and feeds them into BiLSTM and BiGRU, respectively, to extract contextual information, then fuses the features through an attention mechanism. Literature [20] constructs a two-channel Chinese sentiment analysis model that obtains vector representations from language models such as BERT and Word2vec [21], passes them through a dual-channel network of CNN and BiLSTM, and determines the emotion category with the help of an attention mechanism.

Fusing multiple features is a key strategy for improving the precision of sentiment analysis, and several studies [8–10, 14, 15] have made significant contributions in this area. However, the range of features they incorporate remains narrow, which limits their usefulness for sentiment analysis and constrains the models' ability to generalize. Dual-channel networks, by contrast, can assess emotional information from different perspectives, and studies [16, 17, 19, 20] have used dual-channel frameworks to improve emotion classification accuracy. However, the CNNs in these studies are shallow, which limits their capacity for comprehensive feature extraction and causes them to overlook much of the global information in emotional text.

3 Proposed methodology

3.1 Method structure

The network structure of the proposed method is shown in Fig. 1.

Fig. 1

The network structure of the proposed RoBERTa-BiGRU-DPCNN method.

As depicted in Fig. 1, the proposed method integrates RoBERTa, BiGRU, and DPCNN. For brevity, this combination is referred to below as the RoBERTa-BiGRU-DPCNN (BGC) framework. First, RoBERTa converts the text data into vector representations. These vectors are then enriched with POS and emotion features to yield multi-feature vectors, which serve as input to a dual-channel network consisting of BiGRU and DPCNN. After normalization, the framework outputs predictions over the emotion categories. The complete procedure is outlined in Algorithm 1.

3.2 Data preprocessing

As outlined in Algorithm 1, the dataset must be preprocessed before it can be fed to the model. First, irrelevant symbols and noise are removed from the raw text. The text is then converted into RoBERTa input encodings with the tokenizer provided by the transformers library [21]. Next, each word is annotated with its POS tag and emotion category using the NLTK toolkit [22]. For example, for the text ['I', 'like', 'to', 'watch', 'action', 'movies'], the corresponding POS sequence is ['PRP', 'VBP', 'TO', 'VB', 'NN', 'NNS'], and the emotion categories might be ['neu', 'pos', 'neu', 'neu', 'neu', 'neu'].

To enhance the semantic accuracy of POS vector representations, our study leverages the Continuous Bag of Words (CBOW) algorithm from Word2Vec for training the POS sequences. In addressing the representation of emotional words, the emotional valence of a word is identified by selecting the category with the highest score according to NLTK’s analysis. Given that emotional words are categorized into three distinct groups, their vector representation is simplified using one-hot encoding, facilitating a more direct interpretation.
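The two preprocessing steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the per-word emotion scores are assumed to arrive as a dict with 'neg', 'neu', and 'pos' keys (the shape returned by NLTK's VADER `polarity_scores`; whether the authors used VADER is itself an assumption), and for CBOW the sketch only builds the (context, target) pairs that the algorithm trains on, rather than training Word2Vec. The helpers `cbow_pairs` and `emotion_one_hot` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Fixed order of the three emotion categories (assumed; any fixed order works).
EMOTION_CATEGORIES = ("neg", "neu", "pos")


def cbow_pairs(tags: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build CBOW examples: the surrounding POS tags are used to predict the centre tag."""
    pairs = []
    for i, target in enumerate(tags):
        context = tags[max(0, i - window):i] + tags[i + 1:i + 1 + window]
        if context:  # a single-token sequence offers no context
            pairs.append((context, target))
    return pairs


def emotion_one_hot(scores: Dict[str, float]) -> List[int]:
    """Keep the highest-scoring emotion category and one-hot encode it."""
    best = max(EMOTION_CATEGORIES, key=lambda c: scores.get(c, 0.0))
    return [1 if c == best else 0 for c in EMOTION_CATEGORIES]


pos_tags = ["PRP", "VBP", "TO", "VB", "NN", "NNS"]
print(cbow_pairs(pos_tags, window=1)[1])  # (['PRP', 'TO'], 'VBP')
# Hypothetical scores for one word; real values would come from the sentiment tool.
print(emotion_one_hot({"neg": 0.0, "neu": 0.2, "pos": 0.8}))  # [0, 0, 1]
```

In practice the pairs would be handled internally by a Word2Vec implementation in CBOW mode (for example gensim's `Word2Vec` with `sg=0`); spelling them out shows that each POS vector is learned from the tags that typically surround it.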

3.3 BGC

3.3.1 RoBERTa

RoBERTa is constructed upon the Transformer architecture, which has established itself as the cornerstone for numerous state-of-the-art NLP models. It signifies a notable progression in the field of NLP, expanding upon the achievements of BERT. The input structure of RoBERTa remains analogous to that of BERT, as illustrated in Fig. 2.

Fig. 2

The input composition of RoBERTa.

Compared with BERT, RoBERTa changes its pretraining objective, data scale, and training procedure. Like BERT, it follows a two-step process of pretraining and fine-tuning: during pretraining, the model is trained on a large text corpus to learn contextual word representations. The key change lies in the pretraining objective. RoBERTa removes BERT's Next Sentence Prediction (NSP) task and trains only with the Masked Language Modeling (MLM) objective, in which the model predicts randomly masked tokens within a sentence. It also replaces BERT's static masking with dynamic masking, generating a new mask pattern each time a sequence is fed to the model. These adjustments improve RoBERTa's ability to model language and context [23].
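The MLM corruption step with dynamic masking can be sketched as follows. This is an illustrative sketch, not RoBERTa's actual data pipeline: it assumes BERT's standard recipe (select about 15% of tokens; replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged), which RoBERTa retains, and it ignores special tokens. The `mask_id` and `vocab_size` values are hypothetical placeholders; a real pipeline would take them from the tokenizer.

```python
import random
from typing import List, Optional, Tuple


def dynamic_mask(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    rng: random.Random,
    mask_prob: float = 0.15,
) -> Tuple[List[int], List[Optional[int]]]:
    """Corrupt a sequence for MLM; labels are None where no prediction is required."""
    inputs = list(token_ids)
    labels: List[Optional[int]] = [None] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # most tokens are left out of the MLM loss
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80% of selected tokens: mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: the original token is kept unchanged
    return inputs, labels


# Dynamic masking: each pass over the same sentence draws a fresh mask pattern.
rng = random.Random(0)
sentence = list(range(100, 120))
for epoch in range(2):
    corrupted, targets = dynamic_mask(sentence, mask_id=1, vocab_size=1000, rng=rng)
    print(epoch, [i for i, t in enumerate(targets) if t is not None])
```

Static masking would call the function once during preprocessing and reuse the result in every epoch; calling it afresh for each pass is the whole difference.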

RoBERTa is also trained on far more data than BERT: about 160 GB of text, compared with about 16 GB for BERT. It does not, however, enlarge BERT's architecture; RoBERTa-large, for instance, has the same 24 layers and hidden size of 1024 as BERT-large. Its stronger results across NLP benchmarks [24] come instead from this larger corpus together with larger batch sizes and more training steps [25].

3.3.2 Multi-feature fusion

The self-attention mechanism [26] is a core technique in deep learning and is especially widely used in NLP. It lets a model dynamically assign attention weights to every position in a sequence according to how the positions relate to one another, which strengthens its ability to capture context and semantic links. In this study, self-attention is applied to the POS and emotion sequences for two purposes. The first is to raise the dimensionality of the POS and emotion features to match the hidden size of RoBERTa, so that they can be combined with the text representation. The second is to identify the relatively important POS tokens and emotion tokens within their respective sequences, which is expected to sharpen the model's focus. The attention mechanism is formulated as follows.

Q, K, V = \mathrm{Linear}(Vec)

(1)

{Vec}_{updated} = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V

(2)

where d_k is the dimension of the key vectors.
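The projection and attention steps can be illustrated with a dependency-free sketch of standard scaled dot-product self-attention, scaling the scores by the square root of the key dimension. This is a sketch under assumptions, not the authors' implementation: the weight matrices are hypothetical toy values, biases are omitted, and the value projection `w_v` merely stands in for the projection that lifts POS or emotion vectors to RoBERTa's hidden size.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Scaled dot-product self-attention over x of shape (seq_len, d_in); biases omitted."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)  # linear projections, Eq. (1)
    d_k = len(w_k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)  # attention-weighted values, Eq. (2)


# Toy POS sequence: 3 tokens with 2-dim vectors, projected up to 4 dims by w_v
# (standing in for the projection to RoBERTa's hidden size).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[0.5, -0.5], [0.3, 0.8]]
w_v = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
print(self_attention(x, w, w, w_v))
```

Each output row is a convex combination of the value rows, so tokens that score highly against many others dominate the updated sequence; this is how the weighting singles out the relatively important POS and emotion tokens.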

In Equations (1) and (2), Linear denotes a trainable linear projection (a weight matrix plus bias) that maps the input sequence Vec to the query, key, and value matrices Q, K, and V, and softmax normalizes the attention scores. After the POS and emotion sequences have been updated with Equations (1) and (2), they are concatenated with the original text sequence. The result is passed through a linear transformation followed by the Mish activation function [27], defined as Mish(x) = x · tanh(softplus(x)) with softplus(x) = ln(1 + e^x). The linear transformation serves two purposes: it reduces the feature dimensionality, which simplifies subsequent feature extraction, and its trainable parameters learn how to fuse features from the different sources. Mish is chosen because it is smooth and non-monotonic, which is intended to yield smoother feature vectors and to help reduce overfitting.
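The fusion step (concatenation, linear reduction, Mish) can be sketched per token as below. The sizes and weights are hypothetical toy values; in the described model the concatenated vector would be three times RoBERTa's hidden size and the linear layer would reduce it, but the exact output width is not stated, so it is left as a parameter here.

```python
import math
from typing import List

Vector = List[float]


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def fuse(word: Vector, pos: Vector, emo: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Concatenate per-token word, POS and emotion vectors, project linearly, apply Mish."""
    concat = word + pos + emo
    if any(len(row) != len(concat) for row in weight) or len(weight) != len(bias):
        raise ValueError("weight must be (out_dim, len(concat)) and bias must be (out_dim,)")
    return [mish(sum(w * x for w, x in zip(row, concat)) + b) for row, b in zip(weight, bias)]


# Toy sizes: 2-dim word, POS and emotion vectors (6 in total) reduced to 2 dims.
w = [[0.5, 0.0, 0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5, 0.0, 0.5]]
print(fuse([1.0, -1.0], [0.2, 0.1], [0.0, 1.0], w, [0.0, 0.0]))
```

Writing softplus as max(x, 0) + log1p(exp(-|x|)) avoids overflow for large inputs while computing the same value as ln(1 + e^x).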

3.3.3 Dual-channel networks

The BiGRU extends the conventional GRU, an architecture widely used in NLP and sequence modeling, by processing the sequence in both the forward and backward directions, thereby capturing richer context and relationships within sequential data. The GRU is an efficient alternative to the LSTM: it uses only two gates (update and reset) and no separate cell state, so it has fewer parameters than an LSTM of the same size while often delivering comparable performance [28]. Figure 3 illustrates the structure of the BiGRU.

Fig. 3

BiGRU.
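A GRU cell and its bidirectional wrapper can be sketched in plain Python to make the forward/backward concatenation concrete. This is an illustrative sketch only; the experiments use a hidden size of 512, which would normally be handled by a framework layer such as PyTorch's `nn.GRU(bidirectional=True)`. The sketch follows one common form of the update, h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t; some references and libraries swap the roles of z and 1 - z, which amounts to relabelling the gate. The helper `constant_params` is hypothetical and exists only for the demo.

```python
import math
from typing import Dict, List

Vector = List[float]
Params = Dict[str, list]  # keys W_g, U_g, b_g for gates g in {z, r, h}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_input(p: Params, g: str, x: Vector, h: Vector) -> Vector:
    """W_g x + U_g h + b_g for one gate."""
    return [
        sum(w * xi for w, xi in zip(w_row, x)) + sum(u * hi for u, hi in zip(u_row, h)) + b
        for w_row, u_row, b in zip(p[f"W_{g}"], p[f"U_{g}"], p[f"b_{g}"])
    ]


def gru_step(p: Params, x: Vector, h: Vector) -> Vector:
    z = [sigmoid(v) for v in gate_input(p, "z", x, h)]  # update gate
    r = [sigmoid(v) for v in gate_input(p, "r", x, h)]  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in gate_input(p, "h", x, reset_h)]  # candidate state
    # New state interpolates between the previous state and the candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(p: Params, xs: List[Vector], hidden: int) -> List[Vector]:
    h = [0.0] * hidden
    states = []
    for x in xs:
        h = gru_step(p, x, h)
        states.append(h)
    return states


def bigru(p_fwd: Params, p_bwd: Params, xs: List[Vector], hidden: int) -> List[Vector]:
    """Concatenate forward and backward states per position (2 * hidden values each)."""
    fwd = run_gru(p_fwd, xs, hidden)
    bwd = run_gru(p_bwd, xs[::-1], hidden)[::-1]  # re-align backward states with positions
    return [f + b for f, b in zip(fwd, bwd)]


def constant_params(input_dim: int, hidden: int, value: float = 0.0) -> Params:
    """Hypothetical demo helper: every weight and bias set to the same value."""
    p: Params = {}
    for g in ("z", "r", "h"):
        p[f"W_{g}"] = [[value] * input_dim for _ in range(hidden)]
        p[f"U_{g}"] = [[value] * hidden for _ in range(hidden)]
        p[f"b_{g}"] = [value] * hidden
    return p


params = constant_params(input_dim=3, hidden=2, value=0.1)
print(bigru(params, params, [[1.0, 0.0, 0.5], [0.2, 0.3, 0.0]], hidden=2))
```

Each position ends up with the forward state (summarizing everything before and including it) next to the backward state (summarizing everything after and including it), which is the "preceding and succeeding context" the text refers to.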

The DPCNN builds on the principles of TextCNN to address text classification and sentiment analysis tasks [29]. It is a deep convolutional neural network designed to capture both local and global characteristics of text data. Its distinctive feature is a pyramid structure: a stack of convolution blocks, each of which captures features at a progressively higher level of abstraction, reflecting the hierarchical nature of text. DPCNN also uses residual (shortcut) connections, which ease the flow of information across layers, mitigate the vanishing gradient problem, make the network easier to train, and help preserve information throughout the network. Between blocks, DPCNN downsamples with max-pooling (size 3, stride 2), which halves the sequence length, retains the most salient features, reduces computation, and curbs the risk of overfitting [30]. This repeated halving is what gives the network its pyramid shape. The architecture of DPCNN is depicted in Fig. 4.

Fig. 4

DPCNN.
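The pyramid can be illustrated with a one-channel toy in plain Python. It is a simplified sketch, not the authors' implementation: features are scalars rather than 256-filter vectors, a single fixed kernel is shared by every convolution, and biases are omitted. It does preserve the DPCNN pattern of pre-activation, equal-width convolution blocks with a shortcut connection, followed by max-pooling with size 3 and stride 2. The input length of 270 is taken from the IMDb average length in Table 1.

```python
from typing import List, Tuple


def conv1d_same(seq: List[float], kernel: List[float]) -> List[float]:
    """Length-preserving 1-D convolution with zero padding (odd kernel length)."""
    half = len(kernel) // 2
    padded = [0.0] * half + seq + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(seq))]


def relu(seq: List[float]) -> List[float]:
    return [max(0.0, v) for v in seq]


def residual_block(seq: List[float], kernel: List[float]) -> List[float]:
    """Two pre-activation, equal-width convolutions plus a shortcut connection."""
    out = conv1d_same(relu(seq), kernel)
    out = conv1d_same(relu(out), kernel)
    return [a + b for a, b in zip(seq, out)]


def max_pool(seq: List[float], size: int = 3, stride: int = 2) -> List[float]:
    """Downsampling step: roughly halves the sequence length."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, stride)]


def dpcnn_toy(seq: List[float], kernel: List[float]) -> Tuple[float, List[int]]:
    """Return the final pooled feature and the sequence length at each pyramid level."""
    lengths = [len(seq)]
    seq = residual_block(seq, kernel)
    while len(seq) >= 3:
        seq = residual_block(max_pool(seq), kernel)
        lengths.append(len(seq))
    return max(seq), lengths


_, levels = dpcnn_toy([0.1] * 270, kernel=[0.2, 0.5, 0.2])
print(levels)  # [270, 134, 66, 32, 15, 7, 3, 1]
```

Because every block keeps the width fixed while pooling halves the length, the cost per block also roughly halves, so the total cost of the stack stays bounded even as the receptive field grows to cover the whole text.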

Although the DPCNN can extract global features to some extent, it may not capture global context as comprehensively as RNN architectures. To capture both global and local information in sentiment texts more thoroughly, this study therefore combines BiGRU with DPCNN, harnessing the complementary strengths of RNN and CNN architectures.
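The text does not spell out how the outputs of the two channels are merged before normalization. A common choice, assumed here, is to concatenate the BiGRU and DPCNN sentence vectors, apply a linear layer, and normalize with softmax. The sketch below shows that head with hypothetical toy weights; it is not a description of the authors' exact fusion layer.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dual_channel_head(gru_vec: Vector, cnn_vec: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Concatenate the two channel outputs, apply a linear layer, normalise with softmax."""
    joint = gru_vec + cnn_vec  # assumed fusion by concatenation
    logits = [sum(w * x for w, x in zip(row, joint)) + b for row, b in zip(weight, bias)]
    return softmax(logits)


# Toy example: 2-dim BiGRU vector, 2-dim DPCNN vector, two sentiment classes.
probs = dual_channel_head([0.3, -0.1], [0.8, 0.2], [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 1.0, 0.0]], [0.0, 0.0])
print(probs)
```

Concatenation keeps the two channels' features separate until the classifier, so the linear layer can learn how much weight to give global (BiGRU) versus local (DPCNN) evidence.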

4 Experimental results

In this section, we will experimentally verify the effectiveness and superiority of the method in this paper.

4.1 Experimental datasets

To demonstrate the effectiveness of the proposed method, experiments are conducted on five widely used sentiment analysis datasets: IMDb [31], MR [32], SST-2 [33], Twitter [34], and Amazon [35]. Their details are presented in Table 1.

Table 1
Dataset statistics

Dataset	Training	Test	Val	Classes	Avg Length
Imdb¹	25000	15000	10000	2	270
MR²	7463	2132	1066	2	20
SST-2³	16000	2000	2000	2	10
Twitter⁴	16000	2000	2000	6	19
Amazon⁵	16000	2000	2000	2	74

¹ http://ai.stanford.edu/∼amaas/data/sentiment/ ² https://www.cs.cornell.edu/people/pabo/movie-review-data/ ³ https://www.cs.cornell.edu/people/pabo/movie-review-data/ ⁴ https://github.com/dair-ai/emotion_dataset ⁵ https://registry.opendata.aws/

As presented in Table 1, the datasets cover a range of text lengths, from long texts (IMDb) to short texts (MR, SST-2, and Twitter), ensuring a broad spectrum of textual data for analysis. The SST-2 data used here is a subset of the original corpus, with the training, test, and validation partitions randomly sampled from it. The Twitter dataset consists of English-language tweets labeled with six basic emotions: anger, fear, joy, love, sadness, and surprise. The Amazon dataset consists of customer reviews collected from the Amazon platform, labeled with two sentiment polarities.

4.2 Evaluation metrics

Sentiment analysis falls within the realm of classification tasks. Therefore, in this paper, we have selected accuracy, recall, and F1 score as the evaluation metrics for the model. The calculation methods are illustrated in Eqs. (3), (4), and (5).

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(3)

Recall = \frac{TP}{TP + FN}

(4)

F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}

(5)

Here, True Positives (TP) is the number of positive samples correctly predicted as positive, False Positives (FP) the number of negative samples incorrectly predicted as positive, True Negatives (TN) the number of negative samples correctly predicted as negative, and False Negatives (FN) the number of positive samples incorrectly predicted as negative [36]. The Precision term in Eq. (5) is defined as Precision = TP / (TP + FP).
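Eqs. (3) to (5) are written for the binary case. For multi-class data such as Twitter, a common approach, assumed here, is to compute precision, recall, and F1 one-vs-rest for each class and then macro-average them, with precision taken as TP / (TP + FP). The sketch below follows the convention of averaging per-class F1 scores; an alternative takes the harmonic mean of macro precision and macro recall, and the two can differ slightly.

```python
from typing import Dict, Hashable, Sequence, Tuple


def safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def counts_for(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> Tuple[int, int, int]:
    """One-vs-rest TP, FP and FN for a single class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def macro_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    if not y_true:
        raise ValueError("need at least one sample")
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp, fp, fn = counts_for(y_true, y_pred, label)
        prec = safe_div(tp, tp + fp)
        rec = safe_div(tp, tp + fn)
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))  # Eq. (5) per class
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),  # Eq. (3)
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,  # Eq. (4), averaged over classes
        "f1": sum(f1s) / n,
    }


print(macro_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

With imbalanced classes, macro recall can sit well below accuracy, because each minority class counts as much as the majority class in the average.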

4.3 Experimental setting

The model is trained with a learning rate of 2e-5. Training runs for three iterations by default and stops early when the validation loss fails to decrease for three consecutive batches. The batch size is 32, and each sentence is padded or truncated to the average sentence length of its dataset. The Adam optimizer is used, and a dropout rate of 0.5 is applied to mitigate overfitting. The RoBERTa model has a hidden size of 1024 (the large configuration), the BiGRU has a hidden size of 512, and the DPCNN uses 256 filters.
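The reported settings can be collected into a small configuration object together with the length normalization and early-stopping rule. This is a sketch under assumptions: the field names are hypothetical, "three iterations" is read here as three epochs, each early-stopping check is treated as one validation evaluation, and `pad_id=0` is a placeholder (with a Hugging Face tokenizer the real value is available as `tokenizer.pad_token_id`).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported in the text (field names are hypothetical)."""
    learning_rate: float = 2e-5
    max_epochs: int = 3        # "three iterations", read here as epochs (assumption)
    patience: int = 3          # checks without validation-loss improvement before stopping
    batch_size: int = 32
    dropout: float = 0.5
    roberta_hidden: int = 1024
    bigru_hidden: int = 512
    dpcnn_filters: int = 256


def pad_or_truncate(ids: Sequence[int], max_len: int, pad_id: int = 0) -> List[int]:
    """Fix every sequence to max_len (e.g. the dataset's average length)."""
    kept = list(ids[:max_len])
    return kept + [pad_id] * (max_len - len(kept))


def stop_index(val_losses: Sequence[float], patience: int) -> Optional[int]:
    """Index of the check at which early stopping fires, or None if it never fires."""
    best = float("inf")
    bad_checks = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, bad_checks = loss, 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                return i
    return None


cfg = TrainConfig()
print(pad_or_truncate([5, 6, 7], 5), stop_index([1.0, 0.9, 0.95, 0.96, 0.97], cfg.patience))
```

A frozen dataclass keeps the settings immutable and in one place, which makes it harder for a run to silently diverge from the reported configuration.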

4.4 Baselines

In this section, we will briefly introduce the baseline models of this paper.

Machine Learning. Classical machine learning approaches to sentiment analysis commonly use classifiers such as Random Forests (RF) [37], k-Nearest Neighbors (KNN) [38], and Support Vector Machines (SVM) [39]. Here, the text is vectorized with Term Frequency-Inverse Document Frequency (TF-IDF) [40] and then fed into the classifier to produce predictions.
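The TF-IDF weighting used for these baselines can be sketched in a few lines. This sketch uses the textbook form, relative term frequency times log(N / df); library implementations often differ. scikit-learn's `TfidfVectorizer`, for instance, uses raw counts, a smoothed idf of ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row by default. Which variant the baselines used is not stated.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Textbook TF-IDF: relative term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # documents containing each term
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


docs = [["a", "good", "movie"], ["a", "bad", "movie"], ["a", "good", "plot"]]
for vec in tfidf(docs):
    print({t: round(w, 3) for t, w in vec.items()})
```

Terms that occur in every document (here "a") receive zero weight, which is why TF-IDF suppresses function words and highlights discriminative words such as "good" and "bad".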

Deep Learning. The deep learning baselines are BiLSTM [41], BiGRU [42], CNN [43], FastText [44], and Graph Convolutional Network (GCN) [45]. For BiLSTM and BiGRU, the final output is taken as the sentence representation and passed through a linear layer for prediction. The CNN model follows the TextCNN architecture with convolution kernel sizes of 2, 3, and 4. For GCN, data preprocessing and training follow reference [45]. Word vectors for the RNN, GCN, and CNN models are initialized with pre-trained GloVe embeddings [46].
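The TextCNN baseline's kernel sizes (2, 3, 4) act as n-gram detectors whose responses are max-pooled over time. The sketch below shows that mechanism with one toy filter per size and 1-dimensional embeddings; real TextCNN layers use many filters per size, bias terms, and GloVe-sized embeddings, all of which are omitted here.

```python
from typing import Dict, List

Vector = List[float]
Filter = List[Vector]  # k rows, each of embedding size d


def conv_max_over_time(embeddings: List[Vector], filt: Filter) -> float:
    """Slide one k x d filter over the sentence, apply ReLU, keep the maximum response."""
    k = len(filt)
    if len(embeddings) < k:
        return 0.0  # no window fits; real implementations usually pad instead
    best = 0.0  # ReLU outputs are never negative
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        score = sum(w * x for f_row, e_row in zip(filt, window) for w, x in zip(f_row, e_row))
        best = max(best, score)
    return best


def textcnn_features(embeddings: List[Vector], filters: Dict[int, List[Filter]]) -> Vector:
    """Concatenate max-pooled responses of all filters, grouped by kernel size."""
    feats = []
    for size in sorted(filters):
        for filt in filters[size]:
            if len(filt) != size:
                raise ValueError(f"filter registered under size {size} has {len(filt)} rows")
            feats.append(conv_max_over_time(embeddings, filt))
    return feats


sentence = [[1.0], [2.0], [3.0], [4.0]]  # four tokens with 1-dim toy embeddings
filters = {2: [[[1.0], [1.0]]], 3: [[[1.0], [0.0], [-1.0]]], 4: [[[0.25]] * 4]}
print(textcnn_features(sentence, filters))  # [7.0, 0.0, 2.5]
```

Max-over-time pooling makes the feature vector length independent of sentence length, but it also discards where in the sentence a pattern occurred; this shallow, single-layer design is the limitation the DPCNN channel is meant to address.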

Recent Related Works. To demonstrate the advantage of the proposed method, we also compare it against the most closely related recent techniques. For multi-feature fusion, we use the method of reference [8], referred to as the Contextual Information-Syntactic Information-Semantic Information (CSS) model. For the dual-channel approach, we use the BiGRU-CNN method of reference [18].

4.5 Results

This section presents the experimental results together with a brief analysis. Tables 2 and 3 report the test accuracy, recall, and F1 score of each method. Note that these results are macro-averaged.

Table 2
Classification results on Imdb, MR, and SST-2 (%)

	Imdb			MR			SST-2
Method	Accuracy	Recall	F1	Accuracy	Recall	F1	Accuracy	Recall	F1
SVM	89.64	89.44	89.55	76.11	76.08	76.05	84.88	84.76	84.70
KNN	88.55	88.61	88.64	75.69	75.50	75.58	83.18	83.26	83.28
RF	87.88	87.94	87.95	74.58	74.66	74.60	84.06	84.22	84.18
BiLSTM	85.66	86.11	85.77	77.23	77.32	77.28	85.26	85.66	85.41
BiGRU	84.98	84.85	84.90	77.56	77.48	77.50	85.42	85.44	85.50
CNN	86.11	86.08	86.15	77.64	77.68	76.70	85.88	85.60	85.61
GCN	85.26	85.44	85.33	76.77	76.58	76.66	84.56	84.50	84.58
FastText	84.20	84.33	84.28	76.33	76.40	76.42	83.26	83.44	83.30
CSS[8]	92.08	92.11	92.10	79.20	79.31	79.22	87.32	87.64	87.44
BiGRU-CNN[18]	92.12	92.28	92.23	78.66	78.58	78.70	88.56	88.58	88.70
Proposed method	92.38	92.46	92.40	79.74	79.75	79.69	88.80	88.76	88.82

Table 3

Classification results on Twitter and Amazon (%)

	Twitter			Amazon
Method	Accuracy	Recall	F1	Accuracy	Recall	F1
SVM	88.55	80.51	82.59	85.10	85.10	85.10
KNN	75.25	58.35	63.97	75.80	75.80	75.76
RF	85.45	77.72	79.60	78.20	78.20	78.19
BiLSTM	88.33	88.24	88.36	86.12	86.33	86.22
BiGRU	88.50	88.32	88.18	85.65	86.22	86.33
CNN	89.12	88.92	88.61	87.84	88.62	88.34
GCN	89.06	89.64	89.33	88.16	88.32	88.18
FastText	86.88	86.71	86.77	86.12	83.33	86.33
CSS[8]	92.89	92.88	93.12	92.98	92.88	92.67
BiGRU-CNN[18]	93.56	93.64	93.34	93.20	93.12	92.96
Proposed method	94.32	94.26	94.28	93.86	93.64	93.78

As shown in Tables 2 and 3, the proposed method achieves the best classification performance on all five datasets. On IMDb, it reaches an accuracy of 92.38%, 0.26 percentage points higher than BiGRU-CNN and 0.30 points higher than CSS. On MR, its accuracy of 79.74% exceeds BiGRU-CNN by 1.08 points and CSS by 0.54 points. On SST-2, its accuracy of 88.80% exceeds BiGRU-CNN by 0.24 points and CSS by 1.48 points. On Twitter, its accuracy of 94.32% exceeds BiGRU-CNN by 0.76 points and CSS by 1.43 points. On Amazon, its accuracy of 93.86% exceeds BiGRU-CNN by 0.66 points and CSS by 0.88 points. These results suggest that using RoBERTa as the language model produces more precise semantic text vectors than both the static GloVe embeddings used by the deep learning baselines and the BERT representations used by BiGRU-CNN. Moreover, the BiGRU-DPCNN configuration outperforms BiGRU-CNN, suggesting that DPCNN extracts features more effectively than a shallow CNN. Taken together, these findings indicate that the proposed method is effective for sentiment analysis on both short and long texts. The ablation studies in Figs. 5 through 9, where ‘MF’ and ‘Att’ denote ‘Multi-Feature’ and ‘Attention Mechanism’, respectively, provide further evidence.
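The accuracy margins discussed above are differences in percentage points between columns of Tables 2 and 3. A short script that recomputes them directly from the table values is a cheap way to catch transcription errors in the prose; the numbers below are copied from the tables, and nothing else is assumed.

```python
# Accuracy (%) copied from Tables 2 and 3: (proposed, BiGRU-CNN [18], CSS [8]).
ACCURACY = {
    "IMDb": (92.38, 92.12, 92.08),
    "MR": (79.74, 78.66, 79.20),
    "SST-2": (88.80, 88.56, 87.32),
    "Twitter": (94.32, 93.56, 92.89),
    "Amazon": (93.86, 93.20, 92.98),
}


def margins(table):
    """Percentage-point gap of the proposed method over each baseline."""
    return {
        name: (round(ours - bigru_cnn, 2), round(ours - css, 2))
        for name, (ours, bigru_cnn, css) in table.items()
    }


for name, (vs_bigru_cnn, vs_css) in margins(ACCURACY).items():
    print(f"{name}: +{vs_bigru_cnn:.2f} pp vs BiGRU-CNN, +{vs_css:.2f} pp vs CSS")
```

Rounding to two decimals matches the precision of the tables and hides floating-point noise such as 93.86 - 93.20 evaluating to 0.6599999999999966.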

Fig. 5

Ablation experiment results on Imdb.

Fig. 6

Ablation experiment results on MR.

Fig. 7

Ablation experiment results on SST-2.

Fig. 8

Ablation experiment results on Twitter.

Fig. 9

Ablation experiment results on Amazon.

The ablation results indicate that RoBERTa outperforms BERT in classification performance on all five datasets, underscoring its effectiveness as a language model. Adding POS and emotion features to RoBERTa improves accuracy by 0.28, 0.17, 0.17, 0.46, and 0.43 percentage points on IMDb, MR, SST-2, Twitter, and Amazon, respectively. This indicates that integrating POS and emotion information improves the model's ability to discern the sentiment conveyed in sentences.

Adding the attention mechanism to the multi-feature model, referred to as RoBERTa+MF(Att), increases accuracy by 0.30, 0.14, 0.33, 0.28, and 0.29 percentage points across the five datasets, illustrating that attention helps the model concentrate on salient POS and emotion cues. Adding the BiGRU-DPCNN dual-channel network on top of RoBERTa+MF(Att) yields larger gains of 0.50, 0.88, 0.82, 0.78, and 0.99 points, and this combination also outperforms single-channel variants that use only BiGRU or only DPCNN. These findings suggest that the dual-channel framework extracts global contextual sentiment information while also capturing local cues. Overall, the ablation results in Figs. 5 through 9 support the effectiveness of each component of the proposed method.

5 Conclusion

This research introduces a sentiment analysis framework that incorporates POS and emotion features, together with a dual-channel architecture combining BiGRU and DPCNN, to improve model performance. RoBERTa is used instead of BERT to obtain more accurate semantic representations of text. Integrating POS and emotion features captures emotional nuances in the text; this departs from conventional sentiment analysis approaches, which rely predominantly on sentiment lexicons and syntactic analysis and often neglect the information carried by POS tags. The proposed method combines these elements, enabling a richer understanding of both the syntactic and semantic dimensions of the emotions conveyed in text. In addition, the dual-channel network combines BiGRU and DPCNN, which process information at different granularities. BiGRU models sequences and captures long-term dependencies within text, whereas DPCNN extracts local features, making it effective at recognizing local patterns and sentiment expressions. Combining the two channels allows the model to analyze text more thoroughly and improves sentiment analysis performance. Experiments on the IMDb, MR, SST-2, Twitter, and Amazon datasets show that the proposed method outperforms conventional sentiment analysis models.
In sum, integrating POS and emotion features with a dual-channel network improves model performance, and this approach offers a direction for future research on textual sentiment analysis, with the potential to improve both the precision and the practical applicability of sentiment analysis tools.

Conflict of interest

The authors declare no competing interests.

Footnotes

Acknowledgments

This work was supported in part by the Liaoning Natural Science Foundation Program under Grant 2019-MS-036, in part by the Liaoning Provincial Science and Technology Department under Grant 1655706734383, and in part by the Basic Scientific Research Projects of Colleges and Universities of Liaoning Provincial Department of Education under Grant LJKMZ20220826.

References

1. Wankhade, Rao A.C.S., Kulkarni, A survey on sentiment analysis methods, applications, and challenges, Artificial Intelligence Review 55(7) (2022), 5731–5780.

2. Babu N.V., Kanaga E.G.M., Sentiment analysis in social media data for depression detection using artificial intelligence: A review, SN Computer Science 3 (2022), 1–20.

3. Abdullah, Ahmet, Deep learning in sentiment analysis: Recent architectures, ACM Computing Surveys 55(8) (2022), 1–37.

4. Liu, Guan, Yang, et al., Effective method for making Chinese word vector dynamic, Journal of Intelligent & Fuzzy Systems 45(1) (2023), 941–952.

5. Xiu, Liu, Qiu, et al., A method of sentiment analysis and visualized interaction based on ERNIE-tiny and BiGRU, Applied Sciences 13(10) (2023).

6. Shah, Patel, Sanghvi, et al., A comparative analysis of logistic regression, random forest and KNN models for the text classification, Augmented Human Research 5 (2020), 1–16.

7. Babu N.V., Kanaga E.G.M., Sentiment analysis in social media data for depression detection using artificial intelligence: A review, SN Computer Science 3 (2022), 1–20.

8. Zhuang, Liu T.T., et al., Implicit sentiment analysis based on multi-feature neural network model, Soft Computing (2022), 1–10.

9. Lingli, Yadong, Qikai, et al., SA-Model: Multi-feature fusion poetic sentiment analysis based on a hybrid word vector model, in: 2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), IEEE (2022), pp. 984–988.

10. Business brand research based on multi-feature fusion emotion analysis, Frontiers in Psychology 13 (2022), 939304.

11. Huang, Zhang, et al., CRF-GCN: An effective syntactic dependency model for aspect-level sentiment analysis, Knowledge-Based Systems 260 (2023), 110125.

12. AlBadani, Shi, Dong, A novel machine learning approach for sentiment analysis on Twitter incorporating the universal language model fine-tuning and SVM, Applied System Innovation 5(1) (2022), 13.

13. Huang, Fukumoto, et al., Multi-feature and Multi-channel GCNs for aspect based sentiment analysis, in: International Conference on Database and Expert Systems Applications, Springer Nature Switzerland, Cham (2023), pp. 158–172.

14. Wang, Multi-feature microblog sentiment analysis based on BERT-AttBiGRU model, in: Proceedings of the 3rd Asia-Pacific Conference on Image Processing, Electronics and Computers (2022), pp. 967–974.

15. Ding, Long, Bilingual multi-feature sentiment analysis based on relationship degree, in: International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023), SPIE (2023), vol. 12718, pp. 482–490.

16. Wang, Feng, Liu, et al., Dual BiGRU-CNN-based sentiment classification method combining global and local attention, The Journal of Supercomputing (2023), 1–39.

17. Yan, Liu, et al., Sentiment analysis and topic mining using a novel deep attention-based parallel dual-channel model for online course reviews, Cognitive Computation 15(1) (2023), 304–322.

18. Zhen, Shang, Zhang, Sentiment analysis of hybrid network model based on attention, International Journal of Software Innovation (IJSI) 11(1) (2023), 1–17.

19. Zhao, Yang, Dual channel Chinese sentiment analysis of characters and words based on deep learning, in: 2023 IEEE 6th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), IEEE (2023), vol. 6, pp. 231–235.

20. Ning, Gaoya, et al., Chinese text sentiment analysis based on dual channel attention network with hybrid word embedding, Data Analysis and Knowledge Discovery 7(3) (2023), 58–68.

21. Wolf T., Debut L., Sanh V., et al., Transformers: State-of-the-Art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (2020), pp. 38–45.

22. Bird S., Klein E., Loper E., Natural Language Processing with Python, O'Reilly Media Inc. (2009).

23. Tan K.L., Lee C.P., Anbananthen K.S.M., et al., RoBERTa-LSTM: A hybrid model for sentiment analysis with transformer and recurrent neural network, IEEE Access 10 (2022), 21517–21525.

24. Ghasiya, Okamura, Investigating COVID-19 news across four nations: A topic modeling and sentiment analysis approach, IEEE Access 9 (2021), 36645–36656.

25. You, Han, Peng, et al., ASK-RoBERTa: A pretraining model for aspect-based sentiment classification via sentiment knowledge mining, Knowledge-Based Systems 253 (2022), 109511.

26. Vaswani, Shazeer, Parmar, et al., Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).

27. Misra, Mish: A self regularized non-monotonic activation function, arXiv preprint arXiv:1908.08681 (2019).

28. Dong, Yang, Cao, A text classification model based on GCN and BiGRU fusion, in: Proceedings of the 8th International Conference on Computing and Artificial Intelligence (2022), pp. 318–322.

29. Zhang M.J., Pang, Cai, et al., DPCNN-based models for text classification, in: 2023 IEEE 10th International Conference on Cyber Security and Cloud Computing (CSCloud), IEEE (2023), pp. 363–368.

30. Huang, Nie, Huang, Text sentiment classification method based on DPCNN and BiLSTM, ITM Web of Conferences 45 (2022), 01040.

31. Maas A.L., Daly R.E., et al., Learning word vectors for sentiment analysis, in: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL) (2011).

32. Pang B., Lee L., Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, in: Proceedings of ACL (2005), pp. 115–124.

33. Socher, Perelygin, et al., Recursive deep models for semantic compositionality over a sentiment treebank, in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (2013), pp. 1631–1642.

34. Saravia, Liu H.C.T., Huang Y.H., et al., CARER: Contextualized affect representations for emotion recognition, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018), pp. 3687–3697.

35. Zhang, Zhao, LeCun, Character-level convolutional networks for text classification, Advances in Neural Information Processing Systems 28 (2015).

36. Liu, Guan, Yang, et al., Transformer and graph convolutional network for text classification, International Journal of Computational Intelligence Systems 16(1) (2023), 161.

37. Maruf, Javed, Babri H.A., Improving text classification performance with random forests-based feature selection, Arabian Journal for Science and Engineering 41 (2016), 951–964.

38. Chen, Zhou L.J., Da Li, et al., The Lao text classification method based on KNN, Procedia Computer Science 166 (2020), 523–528.

39. Goudjil, Koudil, Bedda, et al., A novel active learning method using SVM for text classification, International Journal of Automation and Computing 15 (2018), 290–298.

40. Kim S.W., Gil J.M., Research paper classification systems based on TF-IDF and LDA schemes, Human-centric Computing and Information Sciences 9 (2019), 1–21.

41. Hameed, Garcia-Zapirain, Sentiment classification using a single-layered BiLSTM model, IEEE Access 8 (2020), 73992–74001.

42. Kenarang, Farahani, Manthouri, BiGRU attention capsule neural network for Persian text classification, Journal of Ambient Intelligence and Humanized Computing 13(8) (2022), 3923–3933.

43. Kim, Convolutional neural networks for sentence classification, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014), pp. 1746–1751.

44. Bojanowski, Grave, Joulin, et al., Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics 5 (2017), 135–146.

45. Yao, Mao, Luo, Graph convolutional networks for text classification, in: Proceedings of the AAAI Conference on Artificial Intelligence 33(01) (2019), pp. 7370–7377.

46. Pennington, Socher, Manning C.D., GloVe: Global vectors for word representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014), pp. 1532–1543.


Table 1
Dataset information

Dataset    Training   Test    Validation   Classes   Avg. length
IMDb¹      25000      15000   10000        2         270
MR²        7463       2132    1066         2         20
SST-2³     16000      2000    2000         2         10
Twitter⁴   16000      2000    2000         6         19
Amazon⁵    16000      2000    2000         2         74