CoBiCo: A model using multi-stage ConvNet with attention-based Bi-LSTM for efficient sentiment classification

Abstract

The rapid growth of social media and specialized websites that provide critical product reviews has resulted in a massive collection of information for customers worldwide. These data could contain a wealth of information, such as product reviews, market forecasts, and the polarity of sentiments. Machine learning and deep learning algorithms provide the capabilities needed to analyse sentiment in such data. In today’s competitive markets, it is critical to grasp reviewer opinions and sentiments by extracting and analyzing their characteristics. This research aims to develop an optimised model for evaluating sentiments and assigning them to the proper categories. It proposes a novel hybrid model that integrates the advantages of two deep learning methods, Dual LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network), with a word embedding technique. The performance of different word embedding techniques is compared to select the best embedding for the proposed model. Furthermore, a multi-convolution approach with attention-oriented BiLSTM is applied. Standard metrics were used to validate the performance of the proposed model. The results indicate that the proposed model achieves an improved accuracy of 96.56%, higher than that of the other models compared.

Keywords

Emotion classification, deep learning, long short-term memory networks, word embeddings, opinion mining

1. Introduction

The technique for automatic extraction of sentiment data from unstructured text is known as sentiment analysis. Several fields, including machine learning, natural language processing (NLP), and data mining, use sentiment analysis [1]. The goal of this process is to determine the orientation of users’ reviews using word information taken from their context. Organizations employ sentiment analysis to learn about their customers’ reactions to their products and services. The extensive usage of social media platforms provides opportunities for people to get feedback on products, services, events, etc. [2]. These comments are frequently based on user interactions with the product or service, which may include positive or negative feelings about items or services. The ability to recognise unfavourable customer feedback is essential to an organization’s progress [3]. These insights will aid businesses in improving their products and services, allowing them to make more profits. Statistical machine learning methods work well for simple sentiment analysis scenarios, but they are not appropriate for more complicated text classification problems [4].

The encoding of words as vectors is an essential step in sentiment analysis. Word2Vec [5] and GloVe [6], based on distributed representation, are now the most frequently used word embedding technologies. Because word vectors can capture many contextual factors to describe texts, they are highly beneficial for many tasks involving semantic similarity. According to previous research on sentiment classification [7], deep learning methods are now preferred over machine learning methods such as Support Vector Machines (SVM), Naive Bayes, Decision Trees, and Random Forests, which are commonly employed for classification.

The pioneers of deep learning-based models for sentiment classification are convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Due to the cost-effectiveness of large-scale manufacturing of efficient Graphics Processing Unit cards and the availability of large datasets, deep learning has emerged as a superior method to machine learning in sentiment classification applications [8]. Compatibility between text representation techniques and their algorithms is critical in many text classification tasks. Therefore, it becomes feasible to provide high-performance classification with the assistance of a suitable representation of texts and a classification model.

With the advancements in cognitive computing and artificial intelligence, deep learning techniques have emerged as an increasingly popular method for sentiment analysis in several fields [9, 10]. As a result, many researchers use neural networks to automatically build feature representations from text sequences. The prominent deep learning algorithms for sentiment categorisation nowadays include RNN, CNN, and LSTM. Huang et al. [11] suggested combining bidirectional LSTM (BiLSTM) and CRF models to get superior outcomes by using both backward input and forward input characteristics. Ma and Hovy [12] proposed a combined approach using CNNs and LSTM-CRF to identify entities. On the CoNLL-2003 corpus, Chiu and Nichols [13] used the BiLSTM combined with the CNN model to achieve significant results. At the same time, Lample et al. [14] used the hybrid BiLSTM-CRF model with text-level representation for better analysis of sentiments.

Each deep learning approach has a distinctive property that is actively used to achieve a particular objective or assess datasets. The text representation technique, which translates texts into numeric form, has both benefits and disadvantages. A combined approach of the ideal text representation method and the appropriate machine learning algorithm for emotion analysis is critical for resolving difficulties and obtaining higher accuracy. No single text representation approach (e.g., word embedding, character embedding) captures user sentiments completely in numeric form. A new hybrid model is proposed to address these issues and is evaluated on a dataset of tweets about various airline services in India. The model uses deep learning techniques that employ multiple representations of data.

Based on the studies above, the following research questions arise:

R.Q.1 How can CNN be hybridized with multiple layers of Bi-LSTM (DualLSTM) for the classification of a textual dataset?
R.Q.2 What is the effect of tweet length on the performance of the proposed model, with tweets categorized by text length as Very Short, Short, Medium, Long, or Very Long?
R.Q.3 How can the efficacy of the proposed model for text-based sentiment classification be tested against baseline models and advanced deep learning approaches?

To answer the above research questions and achieve the objective, the main contributions of this research are as follows:

(1) To achieve optimised sentiment classification, three popular word embedding techniques, FastText, Keras, and Word2Vec, are applied to the datasets to extract the semantics in the texts. The effectiveness of these techniques is tested, and the best one is selected for implementation in the proposed model.
(2) To use the advantages of different deep learning techniques, a CNN layer and DualLSTM, followed by another CNN layer, are applied. The integrated model not only uses the capacity of CNN to extract local features but also uses the bi-directional long short-term memory network to extract the global features of texts. BiLSTM is used to address what CNN cannot capture: CNN alone does not keep the contextual meaning of words intact when classifying text. Therefore, the integration of BiLSTM with CNN enhances the accuracy of feature selection in text classification. The model was applied to tweets of different lengths from user reviews to test the applicability and accuracy of classification.
(3) The proposed model is tested and compared against several baseline and advanced deep learning models to confirm its viability.

The rest of the paper is laid out as follows: Section 2 discusses relevant research, particularly in the area of sentiment analysis. The techniques used to conduct the research are discussed in Section 3. Section 4 contains details of the dataset and classifier models that were utilized in the experiments. The experimental findings are shown in Section 5, which is followed by a discussion in Section 6. The last section concludes the article and presents potential future directions.

2. Related work

Sentiment Analysis has received increasing attention from researchers as a result of the growth of NLP, and various implementations have been performed in word-level embedding. In the course of this research, it was found that hybrid algorithms produced improved classification results. Jiang et al. [15] created a text representation strategy using Bag-of-words based on sentiment topical terms, which included a deep neural network, context information, and sentiment topic terms, and which performed significantly better in Sentiment Analysis. Rezaeinia et al. [16] presented an enhanced word embedding technique based on POS tagging and sentiment lexicons that outperformed the pre-trained word embeddings utilized in Sentiment Analysis. To improve sentiment classification performance, Liu et al. [17] presented a model in which machine learning was coupled with deep learning. The efficiency of their method was demonstrated using datasets in Turkish and Chinese. Pham et al. [18] presented an approach that combines several CNNs and is centered on word embeddings from GloVe and Word2Vec, and it performed well in aspect sentiment classification tasks. Han et al. [19] developed a hybrid neural network approach for document representation that included user and product information and used CNNs and LSTM.

A novel deep learning architecture with hybrid CNNs and BiLSTM (H2CBi) features has been presented in [20], which combines the power of CNNs and BiLSTM. They obtained distinct vectors of features supplied as input to LSTM using two separate pre-trained word vectors. The BERT model for representing text, proposed by Devlin et al. [21], may better depict the modifying associations in texts and performs well in tests of Sentiment Analysis. By combining a neural topical approach into representations of word-level semantics, Liu et al. [22] suggested latent topic information of the given text, as well as a novel topic-based attention method for texts to look at the syntactic of words using the prospects of given topics for word association. BiLSTMs were used in another research [23] to extract dependency knowledge from vocabulary and the position of the sentence.

BiLSTM and CNN were used in this proposed hybrid strategy to produce n-gram features from text categorisation by applying multiple CNNs to given LSTM outputs.

In another approach, Zhou and Long [24] performed text classification on reviews of Chinese products using the Bi-LSTM approach, with a CNN extracting features from the word embeddings. The classification performance of the combined CNN and BiLSTM was superior to that of the plain CNN and BiLSTM in their experiments. In another study [25], a multilayer CNN with LSTM was utilised for analyzing the sentiments of users on a dataset from a Tibetan social media application. The characteristics were retrieved with the aid of a three-layered CNN. The collected features are sent into a two-layer LSTM network as input. This hybrid model based on deep learning outperformed CNN and LSTM, according to the results. A hybrid Bi-LSTM model based on an attention mechanism is proposed in another study [26]. The proposed model efficiently integrated BiLSTM and CNN for the classification of text using Word2Vec with an attention mechanism.

In addition, Kaladevi and Thyagarajah [27] utilised a single layer of CNN and a dual-stacked LSTM for analysing Indian tweets on a sequential basis. The characteristics produced from the CNN layer are fed into the LSTM network in their hybrid experiments. Like earlier hybrid research, the researchers utilised CNN to extract characteristics. Zhou et al. [28] proposed a dual language representational learning model using attention, in which the scattered semantics of several documents in both the source and target languages are learned. Documents are modelled using LSTM networks. They also suggested a hierarchy-based attention methodology for multilingual LSTM networks, which outperformed the benchmark data set.

Furthermore, the Convolutional Neural Network [29] can efficiently extract local information, and CNNs have also been applied to the problem of orientation analysis. Tay et al. [30] proposed AF-LSTM. To represent context and aspect terms at the word level, Aspect Fusion LSTM (AF-LSTM) employs circular correlation and circular convolution. Through the information fusion procedure, the targeted knowledge is then integrated into the representation of sentences. Yu et al. [31] utilised a multi-way Gated Recurrent Unit (GRU) combined with an attention-based mechanism to classify brief texts in e-commerce reviews, with promising results. These approaches demonstrate that coupling deep learning with an attention mechanism can improve short text sentiment analysis.

In another study, Huang et al. [32] presented an Attention-over-Attention (AOA) model. With the help of two fine-grained attention mechanisms, the AOA model extracts text-level associations between aspects and contexts, which allows the sentence representation to focus automatically on the parts of the sentence that are most important for the aspect term. The TDLSTM model was proposed by Tang et al. [33]. In this model, the sentence is split into two sections around the aspect, and the hidden states of both sections are modelled using two long short-term memory (LSTM) networks. After that, the two portions are merged to create a target-specific representation that is then fed into a softmax activation function for emotion categorization. Baziotis et al. [34] utilised LSTM coupled with an attention-based model to assign weights to decisive words, which improved the effect of keywords in emotion sentences and obtained significantly better results in emotion categorization of Twitter reviews.

Although several researchers have suggested deep neural network architectures for sentiment classification problems based on the CNN or RNN, only a few have comprehensively examined the performance of several classification models based on deep learning methods. Dong et al. [35] presented an analytical comparison of several deep learning model architectures for sentiment classification to extract insights useful for the development of sentiment classification approaches. For the identical model structure, input at the word level produced better classification results than character-level input. In another study, Hu et al. [36] showed that models built using deep learning outperformed standard techniques such as dictionary-based methods, Naive Bayes, or SVM. They did not, however, disclose quantifiable performance metrics such as F-measures or accuracy. Dzikienė et al. [37] presented a performance comparison of conventional machine learning methods (Naive Bayes Multinomial and Support Vector Machine) and deep learning approaches (LSTM and CNN) on a Lithuanian internet comments dataset. Features based on morphological and character information were used with the conventional machine learning approaches. The deep learning approaches were applied with both Word2Vec and FastText embeddings. In their experiments, LSTM outperformed the SVM and Naive Bayes Multinomial methods. Yin et al. [38] compared the results of LSTM, CNN, and GRU sentiment classification. However, their experimental outcomes were of limited use since they did not examine enough structural differences in the models, and their conclusions were based on experimental results from only one dataset.

RNN was utilised by Socher et al. [39] to tackle text categorisation difficulties. A model Sentiment Treebank was introduced that surpassed all earlier techniques on various criteria when the training was performed on the fresh tree-bank, which could accurately represent the impacts of negativity. Yang et al. [40] established a hierarchy-based text categorisation strategy based on the attention-based mechanism that effectively captured the text’s main sentiment information. Huang et al. [41] suggested combining bidirectional LSTM (BiLSTM) and CRF models to produce superior outcomes by using both backward and forward input characteristics. Sentic-LSTM was developed by Ma et al. [42] to incorporate explicit and implicit information explicitly, and an extended version of Sentic-LSTM was presented to deal with a combined work involving aspect detection of target-dependent and aspect-based classification of polarity. Gu et al. [43] presented a refinement method of word vector that improved all word vectors. By improving pre-trained vectors of the word and using intensity ratings of sentiments provided by sentimental lexicons, the proposed model performed better in sentiment analysis. By training a huge corpus of text, Peters et al. [44] proposed a textual representation strategy using a deep learning model. The researchers built a text depiction framework in the English language that incorporated grammar and sentiment elements.

Dimensional sentiment classification has gained importance because it can describe emotional states as continuous numerical values on several dimensions, such as polarity (positive–negative) and arousal (excited–calm). Another study [45] provides a multi-dimensional relation model for dimension score prediction that incorporates relationships between dimensions into deep neural networks. The authors constructed a Chinese three-dimensional corpus with valence-arousal-irony (VAI) ratings to test the suggested technique. Experiments indicate that including inter-dimensional relationships in the prediction process outperforms typical approaches that treat each dimension separately. Rafael [46] offered a novel way of processing text with a dimensional model employing normative databases and compared it to different categorical approaches. For each pseudo-document, an emotive thesaurus and a bag-of-words model are used to produce vectors, and dimensionality reduction strategies are then assessed for the categorical models. The researchers demonstrate that, rather than a feature space driven by statistics, the dimensional technique may be utilised to visualise emotions in a psychologically relevant space. In the categorical representation of emotions, different statistically driven dimensionality reduction strategies were compared to a dimensional representation based on psychologically supported data. The results demonstrate that among the categorical approaches to classification, NMF-based classification works best, and the dimensional approach performs similarly to NMF. Sven [47] describes EMOBANK, a corpus annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. The authors annotated both writer and reader emotions, and ratings for categorical basic emotions as well as VAD dimensions are available for a portion of EMOBANK.
The reader’s perspective provides both stronger IAA values and greater emotional ratings. The authors demonstrated that, using standard machine learning approaches, an automatic mapping between categorical and dimensional formats is possible with near-human performance for the bi-representationally annotated subcorpus. For the purpose of multidimensional emotion regression, Suyang [48] presented an Adversarial Attention Network, which automatically evaluates various emotion dimension scores for an input text. A shared attention layer is used to learn public word weights between two emotion dimensions. Further, a discriminator is used to learn improved word weights through adversarial training between two attention layers. According to empirical evaluation on the EMOBANK corpus, the proposed technique provides considerable gains in R-values over the state-of-the-art baselines on both the EMOBANK Reader’s and Writer’s multi-dimensional emotion regression tasks in all domains. Duyu [49] proposed learning sentiment embeddings, which are sentiment-specific word embeddings. An issue in sentiment analysis is that words with similar contexts but opposite sentiment polarity, such as positive and terrible, are mapped to adjacent word vectors. The authors addressed this issue by encoding sentiment information from texts (e.g. sentences and words) in sentiment embeddings together with word contexts. On multiple benchmark datasets for these tasks, experimental results reveal that sentiment embeddings regularly outperform context-based embeddings. In another work, Jin Wang [50] proposed a regional tree-structured CNN-LSTM model that consists of two parts, a local CNN and an LSTM, for predicting the VA ratings of texts. To increase performance even more, an area division technique is proposed to find task-relevant words and clauses so that structured information can be incorporated into VA prediction.
The suggested method outperforms regression-based, standard neural network-based, and structured methods from earlier studies, according to experimental results. Word embeddings, which provide continuous low-dimensional vector representations of words, are widely utilised in natural language processing. Word2vec and GloVe frequently fail to capture enough sentiment information, resulting in similar vector representations for words with opposite sentiment polarity (e.g., positive and terrible), which worsens sentiment analysis performance [51]. The researchers presented a word vector refinement model that uses real-valued sentiment intensity scores provided by sentiment lexicons to refine existing pretrained word vectors. The goal of the refinement model is to improve each word vector such that it is closer to words in the lexicon that are both semantically and sentimentally similar (i.e., those with similar intensity scores) and farther from sentimentally dissimilar words (i.e., those with dissimilar intensity scores). On the SemEval and Stanford Sentiment Treebank datasets, experimental results show that the proposed refinement model can improve both traditional word embeddings and previously proposed sentiment embeddings for binary, ternary, and fine-grained sentiment classification.

Several research studies are being conducted in the field of sentiment analysis. One such algorithm was proposed by M. Parimala [52] to conduct sentiment analysis on tweets about a certain incident for a specified area at various time intervals. The proposed risk assessment sentiment analysis (RASA) algorithm classifies tweets based on keywords provided by the network and calculates a sentiment score for each location. Support vector machine, Naive Bayes, maximum entropy, logistic regression, random forest, and other state-of-the-art techniques are used to validate the model. The results show that the suggested RASA outperforms XGBoost by 1% in a binary class situation and by 30% in a multiclass scenario on average when compared to all other approaches. Swarna Priya [53] proposed an effective Intrusion Detection System (IDS) using deep neural networks (DNN) in the Internet of Medical Things (IoMT) environment to predict and classify probable cyberattacks. The proposed methodology, which combines classic PCA with the bio-inspired Grey-Wolf Optimizer (PCA-GWO) algorithm, aids in the extraction of high-impact features from the dataset. As the hybrid PCA-GWO is employed as a dimensionality reduction technique, the classification accuracy is improved when compared to traditional models. The suggested PCA-GWO based Deep Neural Network classifier architecture outperforms the other commonly used classifiers.

Sentiment Analysis plays a great role in improving the quality of services of any organization. It not only identifies the possible reactions of users but also helps in improving the system based on consumer reviews. Such research was performed by Chandrasehkaran [54] on COVID-19 tweets shared by people during the pandemic. The researchers used Python-based libraries for performing sentiment analysis. The authors performed the analysis using different machine learning and deep learning (BiLSTM) algorithms and TextBlob to assess people’s sentiments when Coronavirus cases reached a high level. When compared to typical machine learning models for Twitter sentiment classification, the Bi-LSTM technique shows a higher accuracy (0.87). Abuqaddom [55] presented a novel anti-vanishing learning algorithm named OSLD (Oriented Stochastic Loss Descent). OSLD iteratively updates a randomly initialized parameter by a small positive random number, scaled by a tuned ratio of the model loss, in the opposite direction of its partial derivative sign. On five benchmark models, the research compares OSLD to stochastic gradient descent as the basic backpropagation technique and Adam as one of the best backpropagation algorithms. The results of their experiments show that OSLD is quite competitive with Adam in small and moderate depth models and surpasses Adam in very long models. Krishnan [56] presented a model that consists of six steps for tweets stored in MongoDB. The pre-processed tweets are passed to feature extraction, for which cross Holoentropy and joint Holoentropy features are developed. The constant term used in evaluating the weight function is optimised in order to improve the classification results.
A new, enhanced approach called Self Adaptable Moth Flame Optimization (SA-MFO), which is an adaptive variation of the MFO algorithm, is introduced for this optimal tuning. The proposed model outperformed other conventional models.

Based on a study of these research papers, it is understood that a variety of methods have been implemented for the classification of sentiments using deep learning techniques. The approaches described above primarily extract semantic feature information from the sentence dimension while ignoring the feature information in the dimension of the word vector. In this research, two convolution layers (ConvNet) are used with an attention-based BiLSTM layer to extract the semantic information of the local characteristics of the word vector in the word embedding dimension. This research also uses max-pooling to obtain more comprehensive local feature information. Section 3 discusses the proposed model in detail.

3. Proposed model

The core idea behind the proposed model is to use two distinct deep neural networks, namely ConvNet and DualLSTM. A hybrid ConvNet $+$ DualLSTM $+$ ConvNet model is then used to detect users’ view orientation towards the services represented in comments in tweets, as shown in Fig. 4. To reduce unnecessary information in sentences, it is required to create a base sentiment dictionary with negative, positive, and neutral texts that contains only relevant, manually included sentiment words, as well as a set of grammatical sentence rules that cover associated words and degree adverbs. Among Word2Vec, Keras, and FastText, the FastText embedding showed the best performance. Therefore, FastText is chosen as the word embedding method in the proposed research. Further, the hybrid model is proposed because ConvNet extracts local features from comments, DualLSTM captures contextual information from both directions as well as long-range dependencies, and the hybrid model combines the benefits of the complementary ConvNet and DualLSTM architectures.

Figure 1.

Proposed model of sentiment classification.

This section explains the proposed CoBiCo model in detail. The proposed model consists of the following layers:

3.1 Pre-processing layer;
3.2 Word embedding layer;
3.3 Dropout layer;
3.4 ConvNet layer;
3.5 Pooling layer;
3.6 DualLSTM layer;
3.7 Attention layer;
3.8 Flatten and dense layer;
3.9 Output layer.

3.1 Pre-processing layer

Raw information usually contains words or symbols that computers cannot comprehend. It is therefore necessary to clean the data and restructure it in an understandable format, which makes data cleaning an important stage in NLP. The preprocessing of input texts in Fig. 1 was implemented on the dataset to obtain excellent classification performance in categorising text review data. These preprocessing steps are applied before the word-based representation phase (Word Embedding) to reduce unwanted content and convert the dataset into a usable form. First, all text in the dataset is converted to lowercase. Links in the input dataset are replaced with the token “URL”. Further, irregular spacing between words is trimmed to a single space. Punctuation, numerals, and unformatted characters in tweets are eliminated. Grammatical Error Correction (GEC) by NLP-Progress is used to correct all mistakes in spelling, grammar, and punctuation.
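
The cleaning steps listed above can be illustrated with a minimal Python sketch. It is not the implementation used in this research: the regular expressions and the function name `preprocess` are assumptions made for illustration, "unformatted characters" is interpreted here as anything other than ASCII letters and whitespace, and the grammatical error correction step is omitted.

```python
import re

# Hypothetical patterns; the exact rules used in the research are not specified.
URL_PATTERN = re.compile(r"(https?://\S+|www\.\S+)")
NON_LETTER_PATTERN = re.compile(r"[^a-zA-Z\s]")   # punctuation, numerals, other symbols
WHITESPACE_PATTERN = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Apply the cleaning steps of Section 3.1 (without grammar correction)."""
    text = text.lower()                               # convert to lowercase
    text = URL_PATTERN.sub(" URL ", text)             # replace links with the token "URL"
    text = NON_LETTER_PATTERN.sub(" ", text)          # drop punctuation and numerals
    text = WHITESPACE_PATTERN.sub(" ", text).strip()  # collapse irregular spacing
    return text


print(preprocess("Flight DELAYED 3 hrs!!  see https://t.co/abc123 @airline"))
# flight delayed hrs see URL airline
```

In this sketch the link is replaced after lowercasing, so the "URL" token keeps its capital letters and survives the later removal of non-letter characters.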

3.2 Word embedding layer

The proper representation of texts is one of the most essential phases in the text classification process. Initially FastText, Keras and Word2Vec Embedding are used in this research.

The following steps are executed for converting input text to Word Vector Representation:

• The input for the process is a text $T$ with a length of $m$ tokens (words), where every token is mapped to its matching word vector representation $r$.
• The text is rearranged into a concatenated sequence of word embeddings, $T=[r_{1};r_{2};r_{3};\ldots;r_{m}]$, where $r_{i}$ is the embedding of the $i^{\text{th}}$ word, which is projected to a vector $r_{i}\in S^{d}$.
• For each text input, a sentence matrix $T\in S^{d\times m}$ is constructed, where $d$ represents the dimension of the embedding and $m$ the length of the sentence.
• The sentence matrix is then passed to the CNN layers for further processing.
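
As a rough illustration of these steps, the sketch below builds a sentence matrix of shape $d\times m$ with NumPy. The toy vocabulary, the randomly initialised embedding table, and the function name `sentence_matrix` are hypothetical; in practice the vectors would come from a trained embedding such as FastText. Zero-padding short texts and mapping unknown tokens to zero columns are assumptions of this sketch.

```python
import numpy as np


def sentence_matrix(tokens, vocab, embeddings, max_len):
    """Return a (d, max_len) matrix whose i-th column is the embedding of token i."""
    d = embeddings.shape[1]
    matrix = np.zeros((d, max_len), dtype=embeddings.dtype)
    for i, token in enumerate(tokens[:max_len]):   # truncate texts longer than max_len
        index = vocab.get(token)
        if index is not None:                      # unknown tokens stay as zero columns
            matrix[:, i] = embeddings[index]
    return matrix


# Toy vocabulary and randomly initialised embeddings (illustrative values only).
vocab = {"flight": 0, "was": 1, "great": 2}
rng = np.random.default_rng(seed=0)
embeddings = rng.normal(size=(len(vocab), 4))      # embedding dimension d = 4

T = sentence_matrix(["flight", "was", "great"], vocab, embeddings, max_len=5)
print(T.shape)  # (4, 5)
```

Each column of `T` corresponds to one $r_{i}$, and the last two columns are padding because the example text has only three tokens.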

Figure 2.
(a) ConvNet. (b) DualLSTM.

3.3 Dropout layer

To minimize overfitting in the proposed model, the dropout approach is utilized, in which some units (neurons) are temporarily removed from the network, along with their incoming and outgoing connections. Dropout inhibits model units from over-adapting to the training data, resulting in improved generalization on the validation set. In the proposed model, the dropout rate is set to 0.03.
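
The effect of dropout on a layer's activations can be sketched as follows. This is a generic "inverted dropout" illustration in NumPy, not the code used in this research; the rescaling of the surviving units by $1/(1-\text{rate})$ is an assumption that mirrors common deep learning libraries, and only the rate of 0.03 is taken from the text.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Zero each unit with probability `rate` during training; identity otherwise."""
    if not training or rate == 0.0:
        return x
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob   # True for units that are kept
    return x * mask / keep_prob              # rescale so the expected value is unchanged


rng = np.random.default_rng(seed=0)
activations = np.ones(1000)
dropped = dropout(activations, rate=0.03, rng=rng)
print(dropped.shape)  # (1000,)
```

With a rate of 0.03, roughly 3% of the units are zeroed in each training step, and the remaining units are scaled by $1/0.97$.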

3.4 ConvNet layer

As illustrated in Fig. 2(a), the architecture of the ConvNet comprises an input layer, an output layer, and five different hidden layers. The input layer accepts a textual message that has been padded to a predetermined length of words, followed by a word embedding layer. The attention layer follows the word embedding layer to extract high-level feature vectors. The attention layer is a sub-unit made up of context vectors that align the source input with the target output. Figure 4(a) shows an illustration of the attention mechanism in the upper right corner. The SpatialDropout1D (dropout) layer takes the feature vectors derived from the attention layer as inputs. On top of the dropout layer, a ConvNet layer (bottom right) with convolution filters and a ReLU activation function is applied. Finally, the probability distribution over three sentiment orientations is computed using a fully-connected dense layer with a softmax function and three units (positive, neutral, negative).

•
The ConvNet module applies a convolution operation between the text matrix $T\in S^{d\times m}$ and a filtering matrix $P\in S^{m\times k}$ ; the resulting output matrix $O$ is termed the feature map.
•
The features map is learned as per the following equation:

$\displaystyle O_{ab}=(TP)=f\left(C\circ t_{a:a+k-1,b+d-1}+y\right)$ (1)

where $y$ is the bias vector, $C$ is the weight matrix, and $f$ is the nonlinear activation function of the convolution operation. The Rectified Linear Unit (ReLU) is used as the nonlinearity because it speeds up training and validation and gave better results.
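The feature-map computation can be sketched as a sliding window over the text matrix. The sketch assumes, as is common for text CNNs, that the filter spans all $d$ embedding dimensions and $k$ consecutive words; the function names and toy values are illustrative only.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv_feature_map(T, C, y):
    """Slide a d x k filter C over the d x m text matrix T (valid convolution)."""
    d, m = T.shape
    k = C.shape[1]
    out = np.empty(m - k + 1)
    for a in range(m - k + 1):
        window = T[:, a:a + k]            # k consecutive word vectors
        out[a] = np.sum(C * window) + y   # element-wise product, summed, plus bias
    return relu(out)

T = np.arange(12, dtype=float).reshape(3, 4)  # toy matrix: d=3, m=4
C = np.ones((3, 2))                            # toy filter: k=2
print(conv_feature_map(T, C, y=-30.0))         # [0. 3. 9.]
```

The first window sums to 27, so adding the bias gives a negative value that ReLU clips to zero; the remaining two windows pass through unchanged.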
3.5 Pooling layer

The pooling layer receives the output of the convolutional layer. It reduces the content representation even further by selecting the maximum value from each pool of values and discarding the irrelevant data. The pooling procedure is represented as follows:

$\displaystyle U_{a,b}=\max\left(O_{a+b-1,b+d-1}\right)$ (2)
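A minimal sketch of max pooling over a one-dimensional feature map follows. The non-overlapping windows and the pool size of 2 are assumptions chosen for the example; the paper does not state the pool size it uses.

```python
import numpy as np

def max_pool_1d(feature_map, pool_size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    n = len(feature_map) // pool_size
    trimmed = np.asarray(feature_map[:n * pool_size], dtype=float)
    return trimmed.reshape(n, pool_size).max(axis=1)

print(max_pool_1d([0.0, 3.0, 9.0, 1.0, 4.0], pool_size=2))  # [3. 9.]
```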

3.6 DualLSTM layer

To make precise predictions, the model must capture long-range dependencies in the text data. Since the convolutional layer lacks this capacity, a DualLSTM is used to incorporate it into the proposed model. With the DualLSTM, the model learns from the data in both directions, i.e. left-to-right and right-to-left. As a result, the DualLSTM layer enhances classification accuracy. The bidirectional LSTM contains two independent LSTMs, i.e. a forward LSTM and a backward LSTM.

The forward LSTM computes the hidden state $g_{t}$ from the previous hidden state $g_{t-1}$ and the input vector $z_{t}$ , whereas the backward LSTM computes the hidden state $g_{t}$ from the future hidden state $g_{t+1}$ and the input vector $z_{t}$ .

Finally, using Eq. (3), the forward and backward vectors are merged into the final hidden state of the DualLSTM, producing the sequence of hidden output vectors $G=[g_{1};g_{2};g_{3};\ldots;g_{n}]$ .

$\displaystyle\overleftrightarrow{g}=\overrightarrow{g}\oplus\overleftarrow{g}$ (3)

The following equations are used to implement the DualLSTM cell.

Forward LSTM:

$\displaystyle X=\begin{bmatrix}g_{t-1}\\ z_{t}\end{bmatrix}$

$\displaystyle fp_{t}=\sigma(W_{fp}\cdot X+y_{fp})$ (4)

$\displaystyle ip_{t}=\sigma(W_{ip}\cdot X+y_{ip})$ (5)

$\displaystyle op_{t}=\sigma(W_{op}\cdot X+y_{op})$ (6)

$\displaystyle cp_{t}=fp_{t}\odot cp_{t-1}+ip_{t}\odot\tanh(W_{cp}\cdot X+y_{cp})$ (7)

$\displaystyle g_{t}=op_{t}\odot\lambda(cp_{t})$ (8)

Backward LSTM:

$\displaystyle X=\begin{bmatrix}g_{t+1}\\ z_{t}\end{bmatrix}$

$\displaystyle fp_{t}=\sigma(W_{fp}\cdot X+y_{fp})$ (9)

$\displaystyle ip_{t}=\sigma(W_{ip}\cdot X+y_{ip})$ (10)

$\displaystyle op_{t}=\sigma(W_{op}\cdot X+y_{op})$ (11)

$\displaystyle cp_{t}=fp_{t}\odot cp_{t+1}+ip_{t}\odot\tanh(W_{cp}\cdot X+y_{cp})$ (12)

$\displaystyle g_{t}=op_{t}\odot\lambda(cp_{t})$ (13)

where $W_{fp}$ , $W_{ip}$ , $W_{op}$ are the weight matrices and $y_{fp}$ , $y_{ip}$ , $y_{op}$ the associated biases of the forget gate, input gate, and output gate respectively; $W_{cp}$ and $y_{cp}$ are the weight matrix and bias of the candidate cell state.

$\sigma$ represents sigmoid activation function.

Element-wise multiplication is denoted by $\odot$ .

$g_{t}$ is the hidden state vector and $z_{t}$ is the input vector.

$\lambda$ is the hyperbolic tangent function.

The current state is represented by $cp_{t}$ , the previous state is represented by $cp_{t-1}$ and future state is represented by $cp_{t+1}$ .
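The cell equations and the merge in Eq. (3) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: $\oplus$ is taken to be concatenation, $\lambda$ is taken to be $\tanh$, the weights are random toy values, and the helper names (`lstm_step`, `init_params`, `run_direction`, `bilstm`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(z_t, g_prev, c_prev, params):
    """One LSTM step; g_prev and c_prev are the neighbouring hidden and cell states."""
    X = np.concatenate([g_prev, z_t])
    fp = sigmoid(params["W_fp"] @ X + params["y_fp"])   # forget gate
    ip = sigmoid(params["W_ip"] @ X + params["y_ip"])   # input gate
    op = sigmoid(params["W_op"] @ X + params["y_op"])   # output gate
    cp = fp * c_prev + ip * np.tanh(params["W_cp"] @ X + params["y_cp"])
    g = op * np.tanh(cp)                                # lambda assumed to be tanh
    return g, cp

def init_params(input_dim, hidden_dim, rng):
    """Random toy weights and zero biases for the four gate transforms."""
    params = {}
    for gate in ("fp", "ip", "op", "cp"):
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
        params[f"y_{gate}"] = np.zeros(hidden_dim)
    return params

def run_direction(Z, params, hidden_dim, reverse=False):
    """Run an LSTM over the rows of Z, left-to-right or right-to-left."""
    g = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = [None] * len(Z)
    order = range(len(Z) - 1, -1, -1) if reverse else range(len(Z))
    for t in order:
        g, c = lstm_step(Z[t], g, c, params)
        outputs[t] = g
    return np.stack(outputs)

def bilstm(Z, fwd_params, bwd_params, hidden_dim):
    """Concatenate forward and backward hidden states at every time step."""
    forward = run_direction(Z, fwd_params, hidden_dim)
    backward = run_direction(Z, bwd_params, hidden_dim, reverse=True)
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 8))   # 6 time steps, 8 input features
G = bilstm(Z, init_params(8, 4, rng), init_params(8, 4, rng), hidden_dim=4)
print(G.shape)  # (6, 8)
```

The two directions use separate parameter sets, and in the backward pass the "previous" state passed to `lstm_step` is the state at $t+1$, matching the use of $g_{t+1}$ in the backward equations.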

The DualLSTM network is used here to identify the orientation of users’ opinions, as shown in Fig. 2(b). This network design differs slightly from the one illustrated in Fig. 2(a): the ConvNet and Max Pooling layers are replaced by DualLSTM and Flatten layers, respectively. An illustration of the DualLSTM architecture and the attention mechanism is shown in the right column of Fig. 2(b). The output of the DualLSTM layer is then supplied as input to a ConvNet layer, and the result is provided as input to the attention layer.

3.7 Attention layer

There are certain words in a statement that are irrelevant for polarity detection but on the other hand, some words are decisive. The attention-based mechanism is used to provide attention to such informative content. Therefore this layer was created to automatically extract the significant terms.

Equation (14) is used to calculate the word significance vector $e_{t}$ . It takes all the DualLSTM hidden states $g_{t}$ as input to the attention layer. $W_{g}$ is the weight, $y_{h}$ the bias, and $\tanh$ the activation function.

$\displaystyle e_{t}=\tanh(W_{g}g_{t}+y_{h})$ (14)

The softmax function is then used to determine the normalized word weight $n_{t}$ with help of Eq. (15).

$\displaystyle n_{t}=\text{softmax}(e_{t})$ (15)

Finally, to create the output of the attention mechanism, Eq. (16) is used to calculate a weighted summation.

$\displaystyle cp_{t}=\sum_{t=1}^{m}n_{t}g_{t}$ (16)

The attention layer output $cp=[cp_{1},$ $cp_{2},$ $cp_{3},$ $\ldots,$ $cp_{m}]$ is provided as input for the following layer.
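Equations (14) to (16) can be sketched as below. The sketch assumes one scalar score per time step (so $W_{g}$ is a vector) and a single context vector obtained by summing over all time steps; both are simplifying assumptions, and the function name `attention` is illustrative.

```python
import numpy as np

def attention(G, W_g, y_h):
    """G: (m, h) hidden states. Returns the weights n and the context vector."""
    e = np.tanh(G @ W_g + y_h)     # Eq. (14): one score per time step
    exp_e = np.exp(e - e.max())    # numerically stable softmax
    n = exp_e / exp_e.sum()        # Eq. (15): normalized word weights
    context = n @ G                # Eq. (16): weighted sum of hidden states
    return n, context

rng = np.random.default_rng(1)
G = rng.normal(size=(5, 8))        # 5 time steps, hidden size 8 (toy values)
n, context = attention(G, W_g=rng.normal(size=8), y_h=0.0)
print(n.round(3), context.shape)
```

Words with larger scores receive larger weights, so their hidden states contribute more to the context vector.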

3.8 Flatten and dense layer

This layer converts the matrix of context retrieved from the preceding layer to a vector context that provides the input for the classification layer’s final stage. The following Eq. (17) is used to execute the flatten layer operation.

$\displaystyle f=\left[cp_{1}*cp_{2}*cp_{3}*\ldots*cp_{T}\right]$ (17)

3.9 Output layer

This is the last phase of the proposed method, which resolves the sentiment class as negative, positive, or neutral. The output of the flatten layer is passed to a softmax activation function, which calculates the likelihood of each sentiment class. The final output is computed as:

$\displaystyle O_{j}=\sum_{i=1}^{m}w_{i}*f_{i}+y$ (18)
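The flatten, dense, and softmax steps can be illustrated with the following sketch. The weights are random toy values, and the ordering of the class labels is an assumption of the example rather than something stated in the paper.

```python
import numpy as np

def softmax(x):
    shifted = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return shifted / shifted.sum()

def classify(features, W, y):
    """Flatten the features, apply a dense layer, and return class probabilities."""
    f = np.ravel(features)            # flatten the context matrix into a vector
    logits = W @ f + y                # weighted sum plus bias, one value per class
    return softmax(logits)

labels = ["positive", "neutral", "negative"]   # label order is an assumption
rng = np.random.default_rng(2)
features = rng.normal(size=(5, 8))
W = rng.normal(scale=0.1, size=(3, 40))
probs = classify(features, W, y=np.zeros(3))
print(labels[int(np.argmax(probs))], probs.round(3))
```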

4. Data acquisition and experimental setup

This section provides details about the data acquisition and experimental setup. Section 4.1 discusses the acquisition of the data and their categorization. Section 4.2 describes the hardware and software environment used for the implementation. Hyperparameter details are discussed in Section 4.3, and the evaluation and validation metrics are discussed in Section 4.4.

4.1 Data acquisition

The proposed model is implemented on a dataset of 24,235 user tweets about Indian Airlines, gathered from Twitter between 1st June 2021 and 31st August 2021. This period was chosen because domestic flights resumed after the second wave of COVID-19. A REST API-based tool named Tweepy was used to collect the tweets. Three sentiment classes are represented, i.e. positive, negative, and neutral. The dataset is divided into two distinct subsets, i.e. a Training Set and a Validation Set: 75% of the tweets (18,176) form the Training Set and 25% (6,059) form the Validation Set.

Figure 3 shows the graphical representation of review classification in different categories.

Table 1
Categorization details of tweets dataset

Dataset size	Positive	Negative	Neutral
24235	6544	9936	7755

Table 2

Categorization of training set

Dataset size	Positive	Negative	Neutral
18176	4908	7452	5816

Table 3

Categorization of validation set

Dataset size	Positive	Negative	Neutral
6059	1636	2484	1939

Figure 3.

Classified number of tweets in the dataset.

Tables 1, 2, and 3 provide the total number of tweets in the dataset, tweets in the training set, and validation set respectively.
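The per-class counts in Tables 2 and 3 are consistent with a 75/25 split applied to each class separately. The paper does not say that a stratified split was used, so this is an inference; the sketch below only reproduces the arithmetic from the Table 1 counts.

```python
# Class counts from Table 1.
class_counts = {"positive": 6544, "negative": 9936, "neutral": 7755}

def stratified_split_counts(counts, train_fraction=0.75):
    """Per-class training and validation sizes, rounding the training size down."""
    train = {label: int(n * train_fraction) for label, n in counts.items()}
    validation = {label: n - train[label] for label, n in counts.items()}
    return train, validation

train, validation = stratified_split_counts(class_counts)
print(train, sum(train.values()))            # {'positive': 4908, 'negative': 7452, 'neutral': 5816} 18176
print(validation, sum(validation.values()))  # {'positive': 1636, 'negative': 2484, 'neutral': 1939} 6059
```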

4.2 Experimental setup

Experiments were carried out using Google services. Google Drive, Google's cloud-based file storage service, was used to store the dataset. Google Colab, a free cloud-based service built on Jupyter notebooks for machine learning research in Python, was employed for this research. The Keras API with a TensorFlow backend was used to run the experiments for the proposed model. Experiments were performed on a 64-bit Windows 11 operating system with 6 GB RAM and a 3.50 GHz processor.

4.3 Parameter settings

To obtain high model performance, hyper-parameter optimization is required. Appropriate hyper-parameter settings also help avoid underfitting and overfitting. A randomized search technique was used to enhance accuracy. The hyper-parameters finalized to obtain optimized performance for the CoBiCo model are shown in Table 4.

Table 4
Hyperparameter settings for the proposed model

Parameters	Values
Size of Kernel	5
Dimension (Embedding)	Keras (300)
Filter Size	128
DualLSTM Output Size	64
Regularization function	L2
Activation	ReLU
Weight Constraints	Kernel Constraints (max norm is 3)
Batch Size	128
No of Epoch	50
Batch Normalization	Yes
Learning Rate	0.01
Optimizer	Adam

4.4 Performance metrics

To appraise the performance of the suggested CoBiCo model, a standard performance evaluation is carried out. The experiments were carried out on the Indian Airlines dataset. The Confusion Matrix is used as a performance metric for the evaluation of the proposed model. The assessment parameters of the confusion matrix are based on four fundamental inputs: False Positive( $F_{P}$ ), False Negative( $F_{N}$ ), and True Positive( $T_{P}$ ), True Negative( $T_{N}$ ).

False Positive ( $F_{P}$ ) $=$ Negative instances that are incorrectly predicted as positive.

False Negative ( $F_{N}$ ) $=$ Positive instances that are incorrectly predicted as negative.

True Positive ( $T_{P}$ ) $=$ Positive instances that are correctly predicted as positive.

True Negative ( $T_{N}$ ) $=$ Negative instances that are correctly predicted as negative.

Based on these parameters, the following performance metrics are computed: Precision ( $\alpha$ ), Recall ( $\beta$ ), F-Measure ( $\gamma$ ), and Accuracy ( $\mu$ ). These metrics are defined below:

$\displaystyle\alpha=\frac{T_{P}}{T_{P}+F_{P}}$ (19)

$\displaystyle\beta=\frac{T_{P}}{T_{P}+F_{N}}$ (20)

$\displaystyle\gamma=\frac{2}{\frac{1}{\alpha}+\frac{1}{\beta}}=\frac{2T_{P}}{2T_{P}+F_{P}+F_{N}}$ (21)

$\displaystyle\mu=\frac{T_{P}+T_{N}}{T_{P}+F_{P}+F_{N}+T_{N}}$ (22)
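For a three-class problem, these metrics are typically computed per class from the confusion matrix, treating each class in turn as the positive class, while overall accuracy is the fraction of correctly classified samples. The sketch below shows this on a toy confusion matrix; the matrix values and the function name `per_class_metrics` are made up for the example.

```python
import numpy as np

def per_class_metrics(confusion):
    """confusion[i, j] = number of samples of true class i predicted as class j."""
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp   # predicted as the class but belonging to another
    fn = confusion.sum(axis=1) - tp   # belonging to the class but predicted as another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * tp / (2 * tp + fp + fn)
    accuracy = tp.sum() / confusion.sum()
    return precision, recall, f_measure, accuracy

toy = [[8, 1, 1],
       [2, 7, 1],
       [0, 2, 8]]
precision, recall, f_measure, accuracy = per_class_metrics(toy)
print(precision, recall, f_measure, round(float(accuracy), 4))
```

In this toy matrix every row and column sums to 10, so precision, recall, and F-measure coincide for each class (0.8, 0.7, 0.8) and the overall accuracy is 23/30.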

5. Experimental results and analysis

This section presents the results and performance comparisons in detail. Section 5.1 analyses the performance of different word embeddings; based on this performance, the best embedding is selected for further implementation of the proposed model. Section 5.2 compares the proposed model with several deep learning methods in detail. Section 5.3 then analyses tweets of different lengths to test whether the proposed model provides optimized results. Finally, Section 5.4 shows the efficiency of the model based on the loss and accuracy of training and validation.

5.1 Comparison of word embeddings

The proposed CoBiCo Model has been used for the classification of sentiments. The overall performance of the word embedding is assessed using a weighted average of $\alpha$ , $\beta$ , and $\gamma$ .

Table 5
Performance of word embeddings

Word embedding	Weighted $\alpha$	Weighted $\beta$	Weighted $\gamma$	Acc (%) ( $\mu$ )
Word2Vec	92.36	92.48	92.42	92.83
Keras	94.56	94.14	94.35	95.32
FastText	94.86	94.92	94.89	95.72

Experiments were performed to evaluate the overall classification efficiency of the Word2Vec, Keras, and FastText embedding methods on the dataset. The performance of the different word embeddings is presented in Table 5. Word2Vec embedding was observed to be the least efficient, with lower accuracy than the other two methods, while the FastText embedding attained the best performance. Compared with Word2Vec, Keras embedding improved $\alpha$ , $\beta$ , $\gamma$ and $\mu$ by 2.20, 1.66, 1.93 and 2.49 percentage points respectively. Compared with Word2Vec, FastText embedding improved precision, recall, F-measure, and accuracy by 2.50, 2.44, 2.47, and 2.89 percentage points respectively. Comparing Keras and FastText, the FastText method performs better than Keras embedding on the given dataset.

5.2 Performance comparison of several deep learning methods

The classification efficiency of the proposed CoBiCo model has been compared with other deep learning methods such as CNN, BiLSTM, CNN-BiLSTM, BiLSTM-Attention, (Stacked) BiGRU, and Ensemble CNN-GRU. Since FastText embedding showed the best performance among the three embeddings, the proposed classification model is executed using FastText embedding. The observations in Table 6 show that the proposed model surpasses the other deep learning models.

Table 6
Performance comparison of deep learning methods

Models with FastText	Class 1			Class 2			Class 3			Acc% ( $\mu$ )
	$\alpha$	$\beta$	$\gamma$	$\alpha$	$\beta$	$\gamma$	$\alpha$	$\beta$	$\gamma$
CNN	84.59	85.63	85.11	85.98	86.32	86.15	83.68	84.10	83.89	85.63
BiLSTM	87.32	87.79	87.55	89.00	89.62	89.31	83.21	84.01	83.61	88.91
CNN-BiLSTM	88.76	89.10	88.93	81.68	82.15	81.91	85.25	86.01	85.63	89.36
BiLSTM-Attention	86.32	85.98	86.15	92.61	92.83	92.72	89.63	89.01	89.32	91.89
(Stacked) BiGRU	84.22	83.15	83.68	88.12	87.56	87.84	86.25	85.15	85.70	86.65
Ensemble CNN-GRU	85.12	84.32	84.72	86.25	85.55	85.90	84.35	82.19	83.26	84.32
Proposed CoBiCo	93.62	93.88	93.75	96.88	96.65	96.76	94.01	94.56	94.28	96.56

Figure 4.

Performance comparison of CNN and proposed CoBiCo.

CNN vs. Proposed CoBiCo: In this experiment, emotion classification of airline review text using the proposed model is compared with that of a single-layer CNN model (Fig. 4). In comparison to the suggested attention-based CoBiCo model, the single-layer CNN model produced unsatisfactory outcomes ( $\alpha$ : 84.59%, $\beta$ : 85.63%, and $\gamma$ : 85.11% for positive classification, $\alpha$ : 85.98%, $\beta$ : 86.32%, and $\gamma$ : 86.15% for neutral classification, $\alpha$ : 83.68%, $\beta$ : 84.10%, and $\gamma$ : 83.89% for negative classification, and $\mu$ : 85.63% accuracy). CNN performs poorly because it cannot retain the sequential order of the text, and text categorization needs this ordering information to produce improved classification results [57].

Figure 5.

Performance comparison of BiLSTM and proposed CoBiCo.

Bi-LSTM vs. Proposed CoBiCo: In the next experiment, the performance of the Bi-LSTM model was analyzed and compared with the proposed CoBiCo model (Fig. 5). The Bi-LSTM delivers lower outcomes ( $\alpha$ : 87.32%, $\beta$ : 87.79%, $\gamma$ : 87.55% for positive classification, $\alpha$ : 89.00%, $\beta$ : 89.62%, $\gamma$ : 89.31% for neutral classification, $\alpha$ : 83.21%, $\beta$ : 84.01%, $\gamma$ : 83.61% for negative classification, and $\mu$ : 88.91% accuracy) than the suggested model. The performance of the Bi-LSTM model degrades because its requirement for two LSTM cells makes the process costly in terms of memory utilization [58].

Figure 6.

Performance comparison of CNN-BiLSTM and proposed CoBiCo.

Figure 7.

Performance comparison of BiLSTM-attention and proposed CoBiCo.

Figure 8.

Performance comparison of self organizing maps and proposed CoBiCo.

CNN-BiLSTM vs. Proposed CoBiCo: Furthermore, the performance of the proposed CoBiCo model was compared with that of the CNN-BiLSTM model (Fig. 6). In comparison to the suggested technique, the CNN-BiLSTM delivers poor outcomes ( $\alpha$ : 88.76%, $\beta$ : 89.10%, $\gamma$ : 88.93% for positive classification, $\alpha$ : 81.68%, $\beta$ : 82.15%, $\gamma$ : 81.91% for neutral classification, $\alpha$ : 85.25%, $\beta$ : 86.01%, $\gamma$ : 85.63% for negative classification, and $\mu$ : 89.36% accuracy). When compared to the suggested model, CNN-BiLSTM performs poorly because it lacks the attention mechanism [59].

Figure 9.

Performance comparison of restricted boltzmann machines and proposed CoBiCo.

Figure 10.

Performance comparison of overall accuracy.

Figure 11.

Length of reviews for positive reviews.

BiLSTM-Attention vs. Proposed CoBiCo: In the next phase of comparison, the results of the BiLSTM-Attention model were compared with those of the proposed CoBiCo model (Fig. 7). Compared with the suggested technique, the BiLSTM-Attention delivers poor outcomes ( $\alpha$ : 86.32%, $\beta$ : 85.98%, $\gamma$ : 86.15% for positive classification, $\alpha$ : 92.61%, $\beta$ : 92.83%, $\gamma$ : 92.72% for neutral classification, $\alpha$ : 89.63%, $\beta$ : 89.01%, $\gamma$ : 89.32% for negative classification, and $\mu$ : 91.89% accuracy). It was observed that the attention mechanism has the disadvantage of having to attend to all source-side words for each target text, which is costly and makes translating longer sequences unfeasible [60].

Figure 12.

Length of reviews for neutral reviews.

Figure 13.

Length of reviews for negative reviews.

Figure 14.

Categorization of a dataset based on varying length.

(Stacked) BiGRU vs. Proposed CoBiCo: The results of the (Stacked) BiGRU model were compared with those of the proposed CoBiCo model (Fig. 8). Compared with the suggested technique, the (Stacked) BiGRU delivers less effective outcomes ( $\alpha$ : 84.22%, $\beta$ : 83.15%, $\gamma$ : 83.68% for positive classification, $\alpha$ : 88.12%, $\beta$ : 87.56%, $\gamma$ : 87.84% for neutral classification, $\alpha$ : 86.25%, $\beta$ : 85.15%, $\gamma$ : 85.70% for negative classification, and $\mu$ : 86.65% accuracy, as listed in Table 6). It was observed that the (Stacked) BiGRU has the disadvantage of having to adjust information weights of varying length, which results in expensive computation and lower accuracy [61].

Table 7

Tweets categories and their count

Length category	No. Of Tweets
Very Short	4524
Short	3124
Medium	7220
Long	5946
Very Long	3421

Table 8

Rule sets for length based tweet categorization

Rule 1	IF Tweet_Length $>$ 0 AND Tweet_Length is $\leqslant$ 40, Then Category is Very Short
Rule 2	IF Tweet_Length $\geqslant$ 41 AND Tweet_Length is $\leqslant$ 80, Then Category is Short
Rule 3	IF Tweet_Length $\geqslant$ 81 AND Tweet_Length is $\leqslant$ 120, Then Category is Medium
Rule 4	IF Tweet_Length $\geqslant$ 121 AND Tweet_Length is $\leqslant$ 160, Then Category is Long
Rule 5	IF Tweet_Length $\geqslant$ 161 AND Tweet_Length is $\leqslant$ 200, Then Category is Very Long
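The five rules in Table 8 map directly onto a small lookup function. The sketch below is one way to express them; the function name `length_category` and the decision to raise an error for lengths outside 1 to 200 are assumptions, since Table 8 does not say how such inputs are handled.

```python
def length_category(tweet_length):
    """Map a tweet length to its Table 8 category (Rules 1 to 5)."""
    if tweet_length <= 0 or tweet_length > 200:
        raise ValueError("tweet_length must be between 1 and 200")
    bounds = [(40, "Very Short"), (80, "Short"), (120, "Medium"),
              (160, "Long"), (200, "Very Long")]
    for upper, name in bounds:
        if tweet_length <= upper:   # first upper bound that fits decides the category
            return name

print(length_category(40), "|", length_category(41), "|", length_category(200))
# Very Short | Short | Very Long
```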

Table 9

Performance evaluation of different models on varying review length

Models with FastText	Very Short (%)			Acc %
	$\alpha$	$\beta$	$\gamma$	( $\mu$ )
(a)
CNN	81.56	81.88	81.72	76.59
BiLSTM	88.51	87.99	88.25	81.32
CNN-BiLSTM	78.65	78.11	78.38	83.96
BiLSTM-Attention	88.56	88.96	88.76	88.01
Proposed CoBiCo	96.94	96.69	96.81	96.90

Models with FastText	Short (%)			Acc%
	$\alpha$	$\beta$	$\gamma$	( $\mu$ )
(b)
CNN	79.43	78.86	79.14	75.15
BiLSTM	80.15	79.63	79.89	80.22
CNN-BiLSTM	81.56	81.98	81.77	82.16
BiLSTM-Attention	90.32	91.01	90.66	87.16
Proposed CoBiCo	96.67	96.01	96.34	96.75

Models with FastText	Medium (%)			Acc%
	$\alpha$	$\beta$	$\gamma$	( $\mu$ )
(c)
CNN	71.89	71.54	71.71	74.11
BiLSTM	86.56	85.98	86.27	79.56
CNN-BiLSTM	83.32	82.65	82.98	82.55
BiLSTM-Attention	87.36	87.52	87.44	86.32
Proposed CoBiCo	95.98	95.56	95.77	96.66

Models with FastText	Long (%)			Acc%
	$\alpha$	$\beta$	$\gamma$	( $\mu$ )
(d)
CNN	68.69	67.95	68.32	69.31
BiLSTM	83.22	83.56	83.39	82.66
CNN-BiLSTM	79.53	78.98	79.25	80.15
BiLSTM-Attention	84.11	84.35	84.23	85.01
Proposed CoBiCo	95.55	95.78	95.66	96.34

Models with FastText	Very Long (%)			Acc%
	$\alpha$	$\beta$	$\gamma$	( $\mu$ )
(e)
CNN	65.32	66.01	65.66	66.32
BiLSTM	78.56	78.11	78.33	79.01
CNN-BiLSTM	76.32	76.44	76.38	77.01
BiLSTM-Attention	82.11	82.34	82.22	83.21
Proposed CoBiCo	95.11	95.32	95.21	96.34

Ensemble CNN-GRU vs. Proposed CoBiCo: In the final phase of comparison, the results of the Ensemble CNN-GRU model were compared with those of the proposed CoBiCo model (Fig. 9). Compared with the suggested technique, the Ensemble CNN-GRU delivers poor outcomes ( $\alpha$ : 85.12%, $\beta$ : 84.32%, $\gamma$ : 84.72% for positive classification, $\alpha$ : 86.25%, $\beta$ : 85.55%, $\gamma$ : 85.90% for neutral classification, $\alpha$ : 84.35%, $\beta$ : 82.19%, $\gamma$ : 83.26% for negative classification, and $\mu$ : 84.32% accuracy, as listed in Table 6). It was observed that the Ensemble CNN-GRU has the limitation of a higher error rate at every level. The high error rate resulted in poor accuracy, and the method also proved to be very costly in terms of computation and prediction time [62].

The observations in Table 6 show that, except for the proposed model, none of the models performs consistently across all classes. The experimental outcomes show that the BiLSTM model exhibits much lower performance for negative class classification, whereas CNN-BiLSTM shows degraded performance for neutral class classification. The BiLSTM-Attention model shows relatively lower performance for positive class classification. Finally, the proposed model shows a consistent and the highest level of performance ( $\alpha$ : 93.62%, $\beta$ : 93.88%, $\gamma$ : 93.75% for positive classification, $\alpha$ : 96.88%, $\beta$ : 96.65%, $\gamma$ : 96.76% for neutral classification, $\alpha$ : 94.01%, $\beta$ : 94.56%, $\gamma$ : 94.28% for negative classification, and $\mu$ : 96.56% accuracy).

The overall classification accuracy of the different deep learning methods is shown in Fig. 10. The CNN method achieves 85.63% accuracy and BiLSTM 88.91%, whereas the CNN-BiLSTM and BiLSTM-Attention methods achieve 89.36% and 91.89% respectively. The proposed CoBiCo model outperforms all other models with an accuracy of 96.56%.

5.3 Comparison of performance and validation on varying lengths of reviews

It is also worth noting that the dataset includes comments of varying lengths. The shortest review is one word long, while the longest is 200 words long. The average review length for the whole corpus is 21.02 words. The class-wise length description is depicted in Figs 11–13.

Figure 15.

Performance comparison of the proposed model on different varying length. (a) Very short length. (b) Short length. (c) Medium length. (d) Long length. (e) Very long length.

Figure 11 represents the length variation for positive class reviews. Figures 12 and 13 show the length variation in neutral and negative class tweets respectively. Negative reviews are observed to be the longest, with an average length of 24.21 words. Positive and neutral reviews are considerably shorter, with average lengths of 14.05 and 13.55 words respectively.

Based on the observations in the varying length of the tweets, further analysis is performed to investigate the efficiency of the proposed model on the distinct datasets extracted from the original dataset. The Gaussian function is used to distribute the dataset into five different categories of tweet-length as shown in Fig. 14. Table 7 shows the number of tweets based on the varying lengths.

Based on observations from Fig. 14, the following fuzzy rules are designed for analysis of the performance of the proposed model. The rules are described in Table 8.

Figure 16.

Comparison of the proposed model for accuracy and loss using ROC curves. (a) Training accuracy. (b) Training loss. (c) Validation accuracy. (d) Validation loss.

The performance evaluation of the different deep learning models is shown in Table 9(a)–(e). In Table 9(a), the CNN model shows the lowest performance of all, with an accuracy of 76.59%, while the proposed model performs best with an accuracy of 96.90%. In Tables 9(b) and 9(c), the proposed model shows accuracies of 96.75% and 96.66% for short and medium length tweets respectively. Tables 9(d) and 9(e) show that the performance of the CNN model degrades significantly for longer reviews, as its accuracy falls to 69.31% and 66.32% for long and very long reviews respectively. Its precision, recall, and F-measure also degrade significantly, to 65.32%, 66.01%, and 65.66% respectively, for the very long category of tweet reviews. Tables 9(d) and 9(e) demonstrate that the BiLSTM model achieves higher accuracy than CNN-BiLSTM for the longer length categories, whereas Table 9(a)–(c) shows that CNN-BiLSTM achieves higher accuracy than the BiLSTM model for the shorter categories. The BiLSTM-Model shows consistent performance for all length categories of reviews.

Figure 15(a)–(e) shows the performance comparison of several deep learning methods based on the observations for varying lengths of reviews. The observations show the consistency of the proposed model in all categories of lengths and state the effectiveness of the model.

5.4 Analysis of accuracy and loss using ROC curve

To verify the correctness and feasibility of the proposed model, Training Accuracy and Training Loss were compared among the deep learning models. A receiver operating characteristic (ROC) curve graph [49] is used to illustrate the evaluation results. In the ROC curve graph of Fig. 16(a), the X-axis denotes the number of epochs and the Y-axis denotes the training accuracy of the model. The area under the curve ranges from 0 to 1, and values near 1 imply that the model is performing well. As observed in the figure, the performance of the proposed model is better than that of the other models. Figure 16(b) shows the training loss of all the models considered for comparison. The proposed model shows promising results in terms of training loss.

In Fig. 16(c) and 16(d), it is noticeable that the proposed model exhibits better validation performance than the other deep learning models. The observations obtained from the ROC curves in Fig. 16(a) to 16(d) demonstrate that the proposed model achieves the highest accuracy and the lowest loss in both training and validation.

6. Conclusion

This research presented a sentiment analyzer for extracting people’s views on Indian Airlines’ services as stated on social media. Three word embedding approaches, Word2Vec, Keras, and FastText, were trained and tested, and FastText performed the best in terms of accuracy. Further experiments on deep learning models were therefore implemented using FastText embeddings. The research findings revealed that the proposed CoBiCo model performed well on the obtained dataset, outperforming the baseline classifiers. The proposed model achieved an F-measure of 93.75% for the classification of positive reviews, 96.76% for neutral reviews, and 94.28% for negative reviews. The overall accuracy achieved by the proposed model is 96.56%, which significantly outperforms the other deep learning models. The results show that the other models did not perform consistently across the positive, negative, and neutral classes, whereas the proposed model proved to be consistent. Considering the varying length of reviews, further experiments were performed on the dataset using lower, medium, and higher review lengths. The findings show the effectiveness and validity of the proposed model, since it performed consistently with an F-measure of 96.17% for lower review length, 96.34% for medium review length, and 96.01% for higher review length. The proposed model achieves the highest accuracy and lowest loss in training and validation. The outcomes confirm the utility of the proposed CoBiCo model as a viable option for analysing users’ emotions expressed about services on social media.

References

Pang

Lee

. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In: Proc. 42nd Annu. Meeting Assoc. Comput. Linguistics. 2004; 271-278.

Hemmatian

Sohrabi

. A survey on classification techniques for opinion mining and sentiment analysis. Artif. Intell. Rev. 2019; 52(3): 1495-1545.

Zhu

Zhang

. Impact of online consumer reviews on sales: The moderating role of product and consumer characteristics. J. Marketing. 2010; 74(2): 133-148.

Huang

Chen

Zheng

Dong

. Deep sentiment representation based on CNN and LSTM. In: Proc. Int. Conf. Green Informat. 2017; 30-33.

Mikolov

Chen

Corrado

Dean

. Efficient estimation of word representations in vector space. In: Proc. Int. Conf. Learn. Represent. (ICLR). 2013; 1-12.

Pennington

Socher

Manning

. Glove: Global vectors for word representation. In: Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP). 2014; 1532-1543.

Rehman

Malik

Raza

Ali

. A hybrid CNN-LSTM model for improving accuracy of movie reviews sentiment analysis. Multimedia Tools Appl. 2019; 78(18): 26597-26613.

Chollet

. Deep Learning With Python. Shelter Island: Manning; 2017.

Chen

Herrera

Hwang

. Cognitive computing: Architecture, technologies and intelligent applications. IEEE Access. 2018; 6: 19774-19783.

10.

Hwang

Chen

. Big_Data Analytics for Cloud, IoT and Cognitive Computing. Hoboken, NJ, USA: Wiley; 2017.

11.

Huang

. Bidirectional LSTM-CRF models for sequence tagging. Aug. 2015, arXiv: 1508.01991.

12.

Hovy

. End-to-end sequence labeling via bi-directional LSTM-CNNS-CRF. Mar. 2016, arXiv: 1603.01354.

13.

Chiu

JPC

Nichols

. Named entity recognition with bidirectional LSTM-CNNs. Nov. 2015, arXiv: 1511.08308.

14.

Lample

Ballesteros

Subramanian

Kawakami

Dyer

. Neural architectures for named entity recognition. Mar. 2016, arXiv: 1603.01360.

15.

Jiang

Gao

Chen

. Study on text representation method based on deep learning and topic information. Computing. 2019; 102(3): 623-642.

16.

Rezaeinia

Rahmani

Ghodsi

Veisi

. Sentiment analysis based on improved pre-trained word embeddings. Expert Syst. Appl. 2019; 117: 139-147.

17.

Liu

Deng

Chen

. A hybrid method for bilingual text sentiment classification based on deep learning. In: Proc. 17th IEEE/ACIS Int. Conf. Softw. Eng., Artif. Intell., Netw. Parallel/Distrib.Comput. (SNPD). 2016; 93-98.

18.

Pham

. Exploiting multiple word embeddings and one-hot character vectors for aspect-based sentiment analysis. Int. J. Approx.Reasoning. 2018; 103: 1-10.

19.

Han

Bai

. Augmented sentiment representation by learning context information. Neural Comput. Appl. 2019; 31(12): 8475-8482.

20.

Wint

Manabe

Aritsugi

. Deep learning based sentiment classification in social network services datasets. In: Proc. IEEE Int. Conf. Big Data, Cloud Comput., Data Sci. Eng. (BCD). 2018; 91-96.

21.

Devlin

Chang

Lee

Toutanova

. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proc. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol. 2019; 4171-4186.

22.

Liu

Cao

Yin

. Bi-level attention model for sentiment analysis of short texts. IEEE Access. 2019; 7: 119813-119822.

23.

Guo

Jin

Duan

. An integrated neural model for sentence classification. In: Proc. Chin. Control Decis. Conf. (CCDC). 2018; 6268-6273.

24.

Zhou

Long

. Sentiment analysis of text based on CNN and bi-directional LSTM model. In: Proc. 24th Int. Conf. Autom. Comput. (ICAC). 2018; 1-5.

25.

Sun

Tian

Liang

. Tibetan micro-blog sentiment analysis based on mixed deep learning. In: Proc. Int. Conf. Audio, Lang. Image Process. (ICALIP). 2018; 109-112.

26.

Zheng

. A hybrid bidirectional recurrent convolutional neural network attention-based model for text classification. IEEE Access. 2019; 7: 106673-106685.

27. Kaladevi, Thyagarajah. Integrated CNN- and LSTM DNN-based sentiment analysis over big social data for opinion mining. Behav Inf Technol. 2019; 1-9.

28. Zhou, Wan, Xiao. Attention-based LSTM network for cross-lingual sentiment classification. In: Proc. Conf. Empirical Methods Natural Lang. Process. 2016; 247-256.

29. Lawrence, Giles, Tsoi, Back. Face recognition: A convolutional neural-network approach. IEEE Trans Neural Netw. 1997; 8(1): 98-113.

30. Tay, Tuan, Hui. Learning to attend via word-aspect associative fusion for aspect-based sentiment analysis. In: Proc. 32nd AAAI Conf. Artif. Intell. (AAAI). 2018; 5956-5963.

31. Zhao, Wang. Attention-based bidirectional gated recurrent unit neural networks for sentiment analysis. In: Proc. 2nd Int. Conf. Artif. Intell. Pattern Recognit. Cham, Switzerland: Springer. 2019; 67-78.

32. Huang, Carley. Aspect level sentiment classification with attention-over-attention neural networks. In: Proc. Conf. Social Comput. 2018; 197-206.

33. Tang, Qin, Feng, Liu. Effective LSTMs for target-dependent sentiment classification. In: Proc. COLING 26th Int. Conf. Comput. Linguistics. 2016; 3298-3307.

34. Baziotis, Pelekis, Doulkeridis. DataStories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis. In: Proc. 11th Int. Workshop Semantic Eval. 2017; 747-754.

35. Seo, Kim, Kang. Comparative study of deep learning-based sentiment classification. IEEE Access. 2020; 8: 6861-6875.

36. Rui, Zeng, Chen, Fan. Text sentiment analysis: A review. In: Proc. IEEE 4th Int. Conf. Comput. Commun. (ICCC). Dec. 2018; 2283-2288.

37. Kapočiūtė-Dzikienė, Damaševičius, Woźniak. Sentiment analysis of Lithuanian texts using traditional and deep learning approaches. Computers. 2019; 8(1): 4.

38. Yin, Kann, Schütze. Comparative study of CNN and RNN for natural language processing. 2017, arXiv: 1702.01923.

39. Socher, Perelygin, Chuang, Manning, Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In: Proc. Conf. Empirical Methods Natural Lang. Process. 2013; 1631-1642.

40. Yang, Dyer, Smola, Hovy. Hierarchical attention networks for document classification. In: Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol. 2016; 1480-1489.

41. Huang. Bidirectional LSTM-CRF models for sequence tagging. Aug. 2015, arXiv: 1508.01991.

42. Peng, Khan, Cambria, Hussain. Sentic LSTM: A hybrid network for targeted aspect-based sentiment analysis. Cognit Comput. 2018; 10(4): 639-650.

43. Zhang, Hou, Song. A position-aware bidirectional attention network for aspect-level sentiment analysis. In: Proc. Int. Conf. Comput. Linguistics. 2018; 774-784.

44. Peters, Neumann, Iyyer, Gardner, Clark, Lee, Zettlemoyer. Deep contextualized word representations. J Assoc Comput Linguistics. 2018; 1: 2227-2237.

45. Xie, Lin, Wang. A multi-dimensional relation model for dimensional sentiment analysis. Information Sciences. 2021; 579: 832-844.

46. Calvo, Kim. Emotions in text: Dimensional and categorical models. Computational Intelligence. 2013; 29: 527-543.

47. Buechel, Hahn. EmoBank: Studying the impact of annotation perspective and representation format on dimensional emotion analysis. In: Proc. 15th Conf. Eur. Chapter Assoc. Comput. Linguistics: Volume 2, Short Papers. Valencia, Spain: Association for Computational Linguistics. 2017; 578-585.

48. Zhu, Zhou. Adversarial attention modeling for multi-dimensional emotion regression. In: Proc. 57th Annual Meeting Assoc. Comput. Linguistics. Florence, Italy: Association for Computational Linguistics. 2019; 471-480.

49. Tang, Wei, Qin, Yang, Liu, Zhou. Sentiment embeddings with applications to sentiment analysis. IEEE Trans Knowl Data Eng. 2016; 28(2): 496-509.

50. Wang, L-C, Lai, Zhang. Tree-Structured Regional CNN-LSTM Model for Dimensional Sentiment Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2020; 28: 581-591. doi: 10.1109/TASLP.2019..

51. L-C, Wang, Lai, Zhang. Refining Word Embeddings Using Intensity Scores for Sentiment Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2018; 26(3): 671-681. doi: 10.1109/TASLP.2017.2788182.

52. Parimala, Swarna Priya, Reddy, PKM, Chowdhary, Poluru, Khan. Spatiotemporal-based sentiment analysis on tweets for risk assessment of event using deep learning approach. Software: Practice and Experience. 2021; 51: 550-570. doi: 10.1002/spe.2851.

53. Swarna Priya, Maddikunta, PKR, Parimala, Koppu, Gadekallu, Chowdhary, Alazab. An effective feature engineering for DNN using hybrid PCA-GWO for intrusion detection in IoMT architecture. Computer Communications. 2020; 160: 139-149. doi: 10.1016/j.comcom.2020.05.048.

54. Chandrasekaran, Hemanth. Deep Learning and TextBlob Based Sentiment Analysis for Coronavirus (COVID-19) Using Twitter Data. International Journal on Artificial Intelligence Tools. 2022; 31(1): 2250011. doi: 10.1142/S0218213022500117.

55. Abuqaddom, Mahafzah, Faris. Oriented stochastic loss descent algorithm to train very deep multi-layer neural networks without vanishing gradients. Knowledge-Based Systems. 2021; 230: 107391. doi: 10.1016/j.knosys.2021.107391.

56. Krishnan, Elayidom, Santhanakrishnan. Optimization assisted convolutional neural network for sentiment analysis with weighted holoentropy-based features. International Journal of Information Technology & Decision Making. 2021; 20(4): 1261-1297.

57. What are some of the limitations or drawbacks of Convolutional Neural Networks? [Online]. Available: https://www.quora.com/What-are-some-of-thelimitations-or-drawbacks-of-Convolutional-Neural-Networks.

58. Zhu, Gao, Zhang, Liu, Zhang. A bi-directional LSTM CNN model with attention for aspect-level text classification. Future Internet. 2018; 10(12): 116. doi: 10.3390/fi10120116.

59. Yao. Attention-based BiLSTM neural networks for sentiment classification of short texts. In: Proceedings of the International Conference on Information Science and Cloud Computing. 2017; 110-117.

60. Zhang, Wang, Zhang. YNU-HPCC at SemEval-2018 task 1: BiLSTM with attention based sentiment analysis for affect in tweets. In: Proceedings of the 12th International Workshop on Semantic Evaluation. 2018; 273-278.

61. Yin, Liu, Fang. Sentiment analysis based on BiGRU information enhancement. Journal of Physics: Conference Series. 2021; 1748: 032054. doi: 10.1088/1742-6596/1748/3/032054.

62. Ahmed, Islam, AKMM, Shatabda. An ensemble 1D-CNN-LSTM-GRU model with data augmentation for speech emotion recognition. arXiv preprint arXiv: 2112.05666.

