Abstract
Sentiment analysis, which determines the emotional polarity (positive, negative, or neutral) of source texts, is a crucial task. Multilingual sentiment analysis techniques were developed to analyze data in several languages, but a notable lack of resources remains one of the primary issues. Furthermore, existing methods for multilingual sentiment analysis have limitations in data dependency, reliability, robustness, and computational complexity. To tackle these shortcomings, this research proposes a multilingual improved multi-attention Deep Learning model (M2PSC-DL), which combines a Bi-directional Long Short-Term Memory (BiLSTM) classifier with improved attention mechanisms. A multi-metric graph embedding technique encodes the data to provide a more contextual representation. Additionally, the improved Positional Spatial Channel (PSC) attention increases the model's capability to extract relevant features during training, which leads to more accurate sentiment analysis. The research also proposes an improved sigmoid activation that addresses the vanishing gradient issue and helps the model avoid gradient saturation. The validation results demonstrate that the M2PSC-DL model attains 96.26% accuracy, 96.06% precision, and 96.18% recall on the XED dataset, outperforming the traditional methods.
Keywords
Introduction
Nowadays, social media carries people's opinions, feelings, and emotions across the internet all over the world [1, 2]. People from many countries express their feelings by posting their activities, reviewing products, and discussing trending topics on multiple social media platforms, formally or informally, often in code-mixed languages [3]. People from different cultural and linguistic backgrounds hold different opinions on a subject and communicate in mixed languages according to their comfort [4, 5, 6, 7]. Code-mixed writing is the use of two or more languages in a conversation; for example, in India, Hindi and English are the most commonly preferred language pair, because Hindi is an official language in India [8, 9, 10]. However, such conversations pose multiple challenges for detecting and analyzing each person's feelings, including difficulty in accurate detection, the complexity of developing a robust model, and limited resources. These issues make Natural Language Processing (NLP), which plays a significant role in analyzing feelings in code-mixed texts and messages, quite challenging [10].
Sentiment analysis refers to analyzing the feelings behind a text conversation, such as anger, sadness, happiness, and joy, and categorizing them as positive, negative, or neutral [11]; more deeply, it also refers to extracting the subjectivity and polarity of the text [12, 13]. During sentiment analysis, wrong translation, misunderstanding of the text, and merging two or more languages in a conversation, such as “This song bohutachhahai” instead of “This is a nice song”, lead to large output errors [14]. Code-mixed conversation contains grammatical errors, informal text, and other deliberate errors, which make it hard to train a model and to analyze an individual's sentiments [7]. The code-mixed datasets required for sentiment analysis pose further challenges, such as spelling variants of a single word, e.g., “plz”, “please”, and “pleeeeez”, and equivocal words, that is, words with similar sounds [15]. To overcome these limitations and challenges, techniques such as machine learning (ML), deep learning (DL), and transfer learning have been applied to sentiment analysis, using multiple classifiers to handle code-mixed languages.
Unsupervised and semi-supervised learning approaches offer considerable capacity for collecting data, annotating it, and developing free tools that bridge the gaps toward accurate sentiment analysis of bilingual or multilingual text [10]. The multi-task learning (MTL) method is often preferred for improving system performance by learning several different tasks effectively and efficiently [16, 17, 13] and by adopting various methods and techniques for better detection performance. Researchers have suggested three chief families of methods for sentiment analysis, namely the lexicon-based approach, ML [18], and DL [19]; among these, the DL techniques provide meta-level feature representations [20] on multilingual datasets [21] for better sentiment detection [14]. The ML techniques addressed the problem of imbalanced class label distribution in sentiment analysis and detection [2]. The DL approach provides high accuracy and greater capability to analyze sentiment in bilingual text [7, 3]. Because the bilingual data were quite noisy, n-gram-based methods were deployed to improve the hypothetical size of the datasets [3].
The research aims to analyze the sentiment polarity of multilingual text data using the proposed M2PSC-DL model. To obtain better results, the data quality is improved using the data pre-processing approaches described in this research. The TF-IDF and multi-metric graph embedding features provide the contextual information of the input data. The improved PSC attention mechanism introduces two activation functions, namely the improved sigmoid activation and EBV, which reduce the vanishing gradient issues and minimize system complexity. The proposed M2PSC-DL model leverages the benefits of the BiLSTM classifier, and the improved PSC attention mechanisms enhance its feature extraction ability, which leads to more effective sentiment analysis.
Improved Sigmoid Activation and Equiangular Basis Vectors (EBV): The research proposes an improved sigmoid activation function in the channel and spatial attention mechanisms, which minimizes the issues related to vanishing gradients. In the positional attention mechanism, the softmax activation is replaced with EBV, which addresses the overfitting and generalization problems. The improved activation functions thus enable the model to learn complex data and produce better results.
Multilingual improved multi-attention-based Deep Learning model (M2PSC-DL): The M2PSC-DL model combines the improved PSC attention with the BiLSTM classifier. The model processes sequence data in both directions and analyzes the sentiment of the texts with better accuracy. In addition, equipping the PSC attention with EBV and the improved sigmoid activation function improves both the model tuning procedure and the capacity to extract features.
The paper is structured as follows: Section 2 reviews the literature on established methods; Section 3 explains the methodology of the M2PSC-DL model; Section 4 presents the results of the model; and Section 5 concludes the research and outlines its future scope.
The review of the literature examines similar research on multilingual sentiment analysis and its shortcomings.
Yongfeng Dong et al. [22] utilized a DL technique with a capsule network for analyzing sentiment from multilingual data. The DL model acted as a decoder to obtain a better probability representation. However, the model's reduced capacity for generalization on a new dataset raised the verification loss, and the dynamic routing iterations reduced the efficiency of the system. Kalim Sattar et al. [14] implemented a multilingual BERT model to analyze cross-lingual sentiments, which effectively examines aspect-based sentiments. The integration of CNN with the BERT model and the word embedding approaches reduced the training time and the exploding gradient issues. However, this approach does not identify and analyze the morphological similarity between the languages for sentiment analysis.
Shashi Shekhar et al. [3] modeled a deep learning method to identify the sentiment polarity of text. The model leveraged a skip-gram-based embedding technique, which assessed the probability of the text. In addition, a voting technique was used for classification, which made the model more scalable than other conventional methods. However, the model does not produce accurate results on a large dataset. Cach N. Dang et al. [23] implemented a hybrid deep learning technique that increased performance compared to a single DL technique. Furthermore, the use of ensemble mechanisms boosted the model's scalability and reliability. Nevertheless, the hybrid technique may pose overfitting problems and vanishing gradient issues.
R. Srinivasan and C.N. Subalalitha [2] used a machine-learning technique to detect sentiment from non-code-mixed data. SMOTE and ADASYN re-sampling overcame the class imbalance in the multilingual data and supported accurate sentiment analysis. However, the model may fail to detect multi-class sentiments, which may impact the generalizability of the system. Pragati Goel et al. [15] presented an ML model for bilingual sentiment analysis that achieved the highest accuracy among the compared methods. However, with naïve Bayes, bilingual text cannot be handled by simple translation models, and the model increased the computational complexity.
Gauri Takawanea, Abhishek Phaltanka et al. [10] implemented a language augmentation model that accurately identifies the multiple languages in a text conversation and analyzes the sentiment. However, the method required high computational expense. Bharathi Raja Chakravarthi et al. [24] presented a fused technique for offensive language detection, in which the combination of MPNet and a CNN model extracts the key information from the sentence and provides accurate sentiment analysis. However, this method was applied only to Dravidian languages and cannot be used for sentiment analysis in other languages, which may limit the applicability of the model.
Challenges
The deep learning approach does not identify and analyze the morphological similarities between the languages to analyze the sentiments [25].
The hybrid Character-trigrams LSTM model and NB method cannot be used in other pairs of code-mixed language [3].
In MPNet and CNN methods, the MPNet provides numerous parameters for analyzing the data which produces high computational cost and time for execution [24].
The re-sampling techniques SMOTE and ADASYN leave multiple gaps in handling the class imbalance problem, and the F1 score still needs to be improved for sentiment analysis [2].
Using BiLSTM, the code mixed text cannot be worked under simple translation models and the model increases the computational complexity [15].
Proposed methodology for multilingual sentiment analysis
Sentiment analysis of data from social networks, including Facebook and Twitter, is becoming increasingly popular. Although a large body of work exists, many issues remain to be resolved, such as enhancing model reliability, cutting down processing time, and using methods designed for particular data types and domains. To resolve these issues, this research proposes a DL technique for sentiment analysis of multilingual data. The input data obtained from the XED database [26] and a real-world dataset is passed to the preprocessing stage, where transliteration, stop word removal, special character removal, and tokenization are applied to improve the structure of the data. TF-IDF features and multi-metric graph embedding features are then extracted from the preprocessed data. Following feature extraction, the data is passed to the proposed Multilingual Multi-Attention BiLSTM model, which detects the sentiment polarity of the text. The proposed multi-attention strategy helps train the model for accurate and efficient sentiment analysis on multilingual data by optimizing the classifier's hyperparameters. The schematic depiction of the proposed sentiment analysis approach is shown in Fig. 1.
Fig. 1. Schematic illustration of the proposed methodology.
The model uses multilingual data from a real-world data source as input; in this case, the input data may be expressed mathematically as
where
Data preprocessing is a crucial step in multilingual sentiment analysis as it enhances the model’s reliability and data quality. In this research, the input data
Transliteration
Transliteration is the technique of converting a text from one script to another while preserving pronunciation [28], which makes the text more consistent and aids the model's learning. In this research, transliteration involves transferring words from a regional language into English letters without altering their meaning. As a result, the output word is written in English letters but is not part of the English vocabulary.
Stop word removal
Common elements of an English text, such as conjunctions and pronouns, are referred to as stop words and are not useful for sentiment analysis. By eliminating the stop words, the model can focus on meaningful information, which improves its performance [29].
Tokenization
The task of splitting a data stream into words is referred to as tokenization [30]. Any sentiment analysis model must include tokenization, which separates the words, numerical values, and other characters of a given document. The resulting units are referred to as tokens, and they are used to identify the meaningful keywords in the text [31].
Special character removal
In the special character removal stage letter cases are frequently removed, punctuation marks are treated as separators, and numbers used in a document should be ignored. As a result, words are split into verbs and nouns. This aids in a precise analysis of the words [32]. Indexing the text into data vectors is crucial. Additionally, the HTML tags and special characters such as
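The cleaning steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in this research: the stop-word list is a small hypothetical sample, the regular expressions are assumptions about what counts as a tag or a special character, the steps are applied in a convenient order rather than the order of the subsections, and transliteration is left out because it needs a language-specific mapping.

```python
import html
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "this", "and", "or", "it", "i", "to", "of"}

TAG_RE = re.compile(r"<[^>]+>")       # HTML tags such as <br> or <p class="x">
NON_WORD_RE = re.compile(r"[^\w\s]")  # punctuation and other special characters
DIGIT_RE = re.compile(r"\d+")         # numbers are ignored


def preprocess(text: str) -> list[str]:
    """Return cleaned, lower-cased tokens with stop words removed."""
    text = html.unescape(text)         # "&amp;" becomes "&" and is stripped below
    text = TAG_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)  # punctuation acts as a separator
    text = DIGIT_RE.sub(" ", text)
    tokens = text.lower().split()      # whitespace tokenization
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("<p>This song is bohut achha hai!!! 10/10 &amp; plz share</p>"))
```

Note that the transliterated Hindi tokens pass through unchanged, since the sample stop-word list covers English only.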
TF-IDF-based dependency features
Term Frequency-Inverse Document Frequency (TF-IDF) is a widely used method for determining term weights in a sentence. TF signifies the number of occurrences of a word and IDF signifies the inverse document frequency [33], which together indicate the importance of a word in a sentence. The TF-IDF can be expressed mathematically as [34]
where
where indicates the term frequency,
where the frequency word in the database is denoted as
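As an illustration of the weighting described above, the following minimal sketch computes TF-IDF vectors for tokenized sentences. It assumes the plain textbook variant, tf(t, d) · log(N / df(t)), which may differ from the exact formulation in [34]; the function name `tf_idf` and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for term in vocab:
            tf = counts[term] / len(doc) if doc else 0.0  # relative term frequency
            idf = math.log(n_docs / df[term])             # assumed textbook IDF
            row.append(tf * idf)
        vectors.append(row)
    return vocab, vectors


if __name__ == "__main__":
    sentences = [["nice", "song"], ["bad", "song"], ["nice", "movie"]]
    vocab, vectors = tf_idf(sentences)
    for sentence, vector in zip(sentences, vectors):
        print(sentence, dict(zip(vocab, vector)))
```

Under this variant a term that occurs in every sentence receives a weight of zero, which is why smoothed IDF forms are often preferred in practice.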
In this multi-metric graph embedding technique the graph is constructed using the TF-IDF vectors, furthermore, the distance measures such as Euclidean distance
The distance between the two vectors is evaluated using the Euclidean distance, which is mathematically calculated as follows [35]
When two sentences are represented as term vectors, the correlation between the vectors indicates the similarity of the sentences to each other, which is measured as the cosine of the angle between the vectors.
The Jaccard index calculates the similarity as the size of the intersection divided by the size of the union of the objects. The Jaccard coefficient compares the total weight of the terms shared by the two sentences with the total weight of the terms that appear in either sentence but are not shared.
The cosine distance between the two vectors is expressed as follows [cos 1],
The dot product of the two vectors in the graph is computed by the linear kernel, which essentially computes their similarity [36].
Therefore, the distance measures efficiently calculate the connection between the sentences to evaluate the sentiment of the text. If the similarity value is minimum, then the sentences belong to the same sentiment. The calculated weights are then provided to the proposed model to detect the sentiment of the text. The combined graph embedding can be calculated as follows
The constructed multi-metric graph embedding features with the dimension of
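The four measures used to build the graph can be sketched as follows. This is a minimal illustration under stated assumptions: the Jaccard measure is implemented in its weighted form (sum of minima over sum of maxima) for non-negative TF-IDF vectors, the function names are hypothetical, and the way the individual measures are fused into the combined embedding is not reproduced here, so the sketch only returns the per-edge values.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def linear_kernel(u, v):
    """Linear kernel, i.e. the dot product of the two vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    norm = math.sqrt(linear_kernel(u, u)) * math.sqrt(linear_kernel(v, v))
    return linear_kernel(u, v) / norm if norm else 0.0


def weighted_jaccard(u, v):
    """Weighted Jaccard for non-negative vectors: sum(min) / sum(max)."""
    denom = sum(max(a, b) for a, b in zip(u, v))
    return sum(min(a, b) for a, b in zip(u, v)) / denom if denom else 0.0


def multi_metric_edges(vectors):
    """Return {(i, j): {metric_name: value}} for every pair of sentence vectors."""
    edges = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            u, v = vectors[i], vectors[j]
            edges[(i, j)] = {
                "euclidean": euclidean(u, v),
                "cosine_distance": 1.0 - cosine_similarity(u, v),
                "jaccard": weighted_jaccard(u, v),
                "linear_kernel": linear_kernel(u, v),
            }
    return edges


if __name__ == "__main__":
    toy_vectors = [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
    for pair, metrics in multi_metric_edges(toy_vectors).items():
        print(pair, metrics)
```

Keeping the measures separate on each edge makes it easy to try different fusion rules, for example a weighted sum, without recomputing the pairwise values.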
Sentiment analysis is a significant process that determines the polarity of a text. Existing ML and DL methods for sentiment analysis have drawbacks, including computational complexity, reliability, interpretability, and data requirements. To overcome these issues, this research proposes a novel M2PSC-DL model that uses the improved multi-attention technique along with the BiLSTM classifier. The BiLSTM model effectively analyzes sentiment from sequence data. This research combines the improved positional, spatial, and channel attention mechanisms into an improved multi-attention to increase the performance of the model on sentiment analysis tasks. Figure 2 shows the architecture of the M2PSC-DL model.
Fig. 2. Architecture of the M2PSC-DL model.
In the proposed model, four BiLSTM neural networks are implemented with two hidden layers that receive the output of the graph embedding layer. An LSTM can store the history of input data and has demonstrated the ability to identify patterns in data where order matters. LSTM uses the output of the previous unit as well as the current input to compute an output vector, which is then used as the input of the subsequent unit. It fixes the RNN's long-term dependency issue and improves the ability to identify and exploit long-distance dependencies in the data. However, LSTM is unable to encode data in reverse order [22]. Thus, to provide contextual information more effectively, BiLSTM is used. By processing the input in both directions, the model can learn patterns both forward and backward, which allows it to recognize patterns that could go undetected when parsing the data in a single direction. To create a single result, the outputs of the backward and forward LSTMs are combined [37].
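The bidirectional idea can be illustrated with a short sketch. To keep it small, a plain element-wise tanh recurrent cell with scalar weights stands in for the LSTM cell (so there are no gates or cell state), and the weights `w_in` and `w_rec` are arbitrary illustrative values; only the forward pass, the reversed pass, and the per-step concatenation correspond to the description above.

```python
import math


def rnn_step(x, h, w_in, w_rec):
    """One step of a scalar-weight tanh cell (a simplified stand-in for an LSTM cell)."""
    return [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]


def run_direction(sequence, w_in, w_rec):
    """Run the cell over the sequence in order and return the state at every step."""
    h = [0.0] * len(sequence[0])
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec)
        states.append(h)
    return states


def bidirectional(sequence, w_in=0.5, w_rec=0.8):
    """Concatenate forward and backward hidden states for each time step."""
    forward = run_direction(sequence, w_in, w_rec)
    # Process the reversed sequence, then flip back so step t lines up again.
    backward = run_direction(sequence[::-1], w_in, w_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    toy_sequence = [[1.0, -1.0], [0.5, 0.0], [0.0, 2.0]]  # 3 time steps, 2 features
    for step, state in enumerate(bidirectional(toy_sequence)):
        print(step, [round(value, 4) for value in state])
```

Each output step has twice the feature width of a single direction: the first half summarizes the tokens seen so far, and the second half summarizes the tokens still to come.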
The hidden state representations for both forward
The output of the BiLSTM_1 layers can be combined as follows,
The output of the BiLSTM_2 layers can be combined as follows
The output of the BiLSTM_3 layers can be combined as follows
The outputs of the BiLSTM layers are reshaped into the size of
In this research, the improved PSC attention mechanism combines the improved positional, spatial, and channel attention mechanisms, which overcomes the issues of standard attention techniques. The positional attention encodes the local features of the relevant information, which improves the capability of the model. The spatial attention map can be calculated using the softmax activation; however, one of the primary disadvantages of softmax is the high number of parameters it needs, which grows with the number of classes. This can result in poor generalization performance and overfitting, particularly in datasets with a high number of classes. Furthermore, softmax is not highly resilient to noise and can yield inconsistent results on noisy inputs. To mitigate these issues, this research uses an alternative known as Equiangular Basis Vectors (EBV) [38]. EBVs predefine fixed basis vectors for the categories, which has the potential to improve computational efficiency, particularly for large-scale training data. The workflow of the improved positional attention is as follows. Initially, the output from the BiLSTM layer
where
where
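The general idea behind predefined equiangular vectors can be sketched as follows. This is not the construction of [38], and it does not show how EBV is wired into the positional attention of the proposed model, which cannot be recovered from the text here. As a stand-in, the sketch uses the simplex construction (one-hot vectors with the mean removed, then normalized), which yields unit vectors whose pairwise cosine is exactly -1/(K-1) for K classes, and it assigns an embedding to the class whose fixed vector is closest in angle. The function names are hypothetical.

```python
import math


def simplex_basis(n_classes):
    """Return n_classes (>= 2) unit vectors with pairwise cosine -1 / (n_classes - 1)."""
    basis = []
    for i in range(n_classes):
        # One-hot vector for class i with the mean (1 / n_classes) subtracted.
        vec = [(1.0 if j == i else 0.0) - 1.0 / n_classes for j in range(n_classes)]
        norm = math.sqrt(sum(c * c for c in vec))
        basis.append([c / norm for c in vec])
    return basis


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def predict(embedding, basis):
    """Pick the class whose fixed basis vector is closest in angle to the embedding."""
    return max(range(len(basis)), key=lambda k: cosine(embedding, basis[k]))


if __name__ == "__main__":
    basis = simplex_basis(3)  # e.g. positive, negative, neutral
    print(round(cosine(basis[0], basis[1]), 6))
    print(predict([0.9, 0.1, -0.2], basis))
```

Because the basis vectors are fixed in advance, no class-specific weights have to be learned for the final comparison, which is the property the text attributes to EBV.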
The objective of channel attention in enhanced PSC is to learn a 1D weight by increasing the relationship between each channel of the feature map by the relevant channel, which can concentrate more on the relevant semantic information for the current task. Initially, to efficiently learn weight representations, the spatial data is aggregated using average
Additionally, maxout gives the model greater robustness and generalization, and the hyperparameters allow managing the complexity of the model.
The max-out activation, which combines the rectified linear unit (ReLU) and the leaky ReLU, overcomes the vanishing gradient issue and can be computed using the following equation [41]
where
The improved sigmoid at the current state
Thus the improved sigmoid activation eliminates the vanishing gradient by introducing sigmoid and maxout functions that minimize the training cost and aid in improved performance. Similarly in an improved spatial attention mechanism along with the convolutional layer, average, and max pooling layers, the sigmoid activation is replaced with improved sigmoid function. By learning a 2D spatial weight map
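To make the motivation concrete, the sketch below shows the standard maxout activation over affine pieces, with ReLU and leaky ReLU as two-piece special cases, next to the gradient of the ordinary sigmoid, which collapses toward zero for large inputs. The improved sigmoid of this research is not reproduced, since its exact formula is not available in the text; the leaky slope of 0.01 and the names used are illustrative assumptions.

```python
import math


def maxout(x, pieces):
    """Maxout over affine pieces: max over k of (w_k * x + b_k)."""
    return max(w * x + b for w, b in pieces)


# ReLU and leaky ReLU written as two-piece maxout units (slope, bias).
RELU = [(1.0, 0.0), (0.0, 0.0)]         # max(x, 0)
LEAKY_RELU = [(1.0, 0.0), (0.01, 0.0)]  # max(x, 0.01 * x), assumed slope 0.01


def sigmoid(x):
    """Ordinary logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; it peaks at 0.25 and vanishes for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)


if __name__ == "__main__":
    for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
        print(x, maxout(x, RELU), maxout(x, LEAKY_RELU), sigmoid_grad(x))
```

The printed gradients show the saturation that the text refers to: far from zero the sigmoid passes almost no gradient, whereas the piecewise-linear units keep a constant slope on the active piece.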
The outputs of the improved PSC mechanisms are combined using the addition operation that extracts the valuable features and aids in improved sentiment analysis.
The combined output
Where,
The M2PSC-DL model’s performance and the comparative evaluation with the implemented methods are discussed in this section.
Experimental setup
The experiments are conducted in Python on a Windows 10 operating system with 16 GB of RAM.
Dataset description
In this research, the XED [26] and web-scraped datasets are used to train and validate the M2PSC-DL method. The XED dataset includes 30 k English and 25 k Finnish sentences with human annotations, together with projected annotations for 30 more languages, adding fresh resources for numerous low-resource languages. The web-scraped dataset, also known as the LLM dataset, contains 3375 rows and 29 different categories of sentiments.
Performance metrics
The model's performance is evaluated using accuracy, precision, and recall. Accuracy is the ratio of correct predictions to the total number of predicted cases and measures the overall correctness of the model. Precision, also referred to as positive predictive value, is the percentage of relevant examples among the retrieved examples. Recall measures quantity, whereas precision measures quality: a model with higher precision returns more relevant results than irrelevant ones, while a model with higher recall returns most of the relevant results.
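The three metrics can be computed directly from predicted and true labels, as in the minimal sketch below. It assumes a binary setting with one designated positive class; how the metrics are averaged over the sentiment classes in the experiments is not stated, and the function name and toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for one designated positive class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_pos = sum(t != positive and p == positive for t, p in pairs)
    false_neg = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: share of true positives that the model recovered.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1]
    print(classification_metrics(y_true, y_pred))
```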
Fig. 3. Multi-metric graph embeddings of the proposed M2PSC-DL model.
Figure 3 depicts the graph structure obtained from the multi-metric graph embedding which is generated based on the TF-IDF vectors.
Fig. 4. Performance evaluation with TP for XED dataset.
Performance evaluation with TP for XED dataset
Figure 4 depicts the performance assessment of the M2PSC-DL model across TP and epoch values. The M2PSC-DL model obtains 96.25% accuracy for TP 90 at epoch 500, which shows that it analyzes sentiment with high accuracy and minimal error. Additionally, the precision and recall reach 96.05% and 96.17% for TP 80 at epoch 500. Thus, the improved attention enhances the ability of the M2PSC-DL model to detect the sentiment polarity. The improved PSC attention mechanism captures the sequential relationships in the data and tunes the weights of the M2PSC-DL model.
Performance evaluation with TP for real-world dataset
Figure 5 depicts the performance evaluation of the M2PSC-DL model across TP and epoch values using the real-world dataset. The M2PSC-DL model obtains 95.02% accuracy for TP 90 at epoch 500, which shows that it analyzes sentiment with high accuracy and minimal error. Additionally, the precision and recall reach 95.11% and 95.42% for TP 80 at epoch 500. Thus, the improved attention enhances the ability of the M2PSC-DL model to detect the sentiment polarity. The improved PSC attention mechanism captures the sequential relationships in the data, and the channel attention tunes the weights of the M2PSC-DL model.
Fig. 5. Performance evaluation with TP for real-world dataset.
The performance of the M2PSC-DL model is compared with the traditional Naïve Bayes [15], Perceptron [43], Convolutional Neural Network and Long Short-Term Memory (CNN-BiLSTM) [44], and Fused DCNN [24] methods.
Comparative evaluation with TP for XED dataset
The comparison of the M2PSC-DL model with the established methods for TP using the XED dataset is shown in Fig. 6. At TP 90, the M2PSC-DL model attains 96.25% accuracy, which is superior to the conventional Perceptron by 14.84%, Naïve Bayes by 7.70%, BiLSTM by 14.98%, CNN-LSTM by 10%, and fused DCNN by 2.78%. Similarly, the M2PSC-DL model obtains 96.05% precision, an improvement of 5.43%, 2.29%, 11.36%, 6.64%, and 6.83% over the traditional methods. For TP 90, the M2PSC-DL model attains 96.17% recall, which is superior to the traditional techniques. Thus, the attention used in the M2PSC-DL model improves its ability to analyze the sentiment of the input text. The data preprocessing techniques reduce the dimensionality of the input data and the computational cost. The improved PSC attention mechanism captures the sequential relationships in the data and enhances the feature extraction ability of the model.
Fig. 6. Comparative evaluation with TP for XED dataset.
Comparative discussion of the M2PSC-DL model with the existing methods
Fig. 7. Comparative evaluation with TP for real-world dataset.
The comparison of the M2PSC-DL model with the established methods for TP using the real-world dataset is shown in Fig. 7. At TP 90, the M2PSC-DL model attains 95.02% accuracy, which is superior to the conventional Perceptron by 6.53%, Naïve Bayes by 9.38%, BiLSTM by 7.27%, CNN-LSTM by 12.17%, and fused DCNN by 3.45%. Similarly, the M2PSC-DL model obtains 95.11% precision, an improvement of 11.35%, 3.27%, 7.99%, 8.95%, and 6.88% over the traditional methods. For TP 90, the M2PSC-DL model attains 95.42% recall, which is superior to the traditional techniques. Thus, the attention used in the M2PSC-DL model improves its ability to analyze the sentiment of the input text. The data preprocessing techniques reduce the computational cost and the dimensionality of the input data. The improved PSC attention mechanism captures the sequential relationships in the data and enhances the feature extraction ability of the model.
Comparative discussion
Table 1 shows a comparison between the M2PSC-DL model and the other established approaches using the real-world and XED datasets. The comparison shows that the M2PSC-DL model attains higher performance and that, with the help of the improved PSC attention and the DL model, it mitigates the issues related to computational complexity, vanishing gradients, and generalization.
Conclusion
The research proposed the M2PSC-DL model for multilingual sentiment analysis, which leverages the advantages of deep learning with improved multi-attention techniques. The data is encoded using the multi-metric graph embedding technique to express contextual information. Furthermore, the improved PSC attention strengthens the model's ability to extract features during training, which raises the accuracy of sentiment analysis. Additionally, the research developed an improved sigmoid activation to address the vanishing gradient problem, which helps keep the model clear of gradient saturation. When the performance of the M2PSC-DL model is compared with other established methodologies, the outcomes show that the model performs significantly better than the conventional approaches, achieving 96.26% accuracy, 96.06% precision, and 96.18% recall on the XED dataset. Future work will investigate the efficacy of hybrid methodologies for sentiment analysis across numerous fusion strategies and datasets, to obtain a deeper understanding of particular domains such as marketing, business, or medicine.
