Abstract
The rapid growth of social media and specialized websites that provide critical product reviews has resulted in a massive collection of information for customers worldwide. These data can contain a wealth of information, such as product reviews, market forecasts, and the polarity of sentiments. Machine learning and deep learning algorithms provide the capabilities needed to analyse sentiment in such data. In today’s competitive markets, it is critical to grasp reviewer opinions and sentiments by extracting and analyzing their characteristics. This research aims to develop an optimised model for evaluating sentiments and assigning them to the proper categories. It proposes a novel hybrid model that integrates the advantages of two deep learning methods, Dual LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network), with a word embedding technique. The performance of different word embedding techniques is compared to select the best embedding for the proposed model. Furthermore, a multi-convolution approach with an attention-oriented BiLSTM is applied. Standard metrics were applied to validate the performance of the proposed model. The results indicate that the proposed model achieves an accuracy of 96.56%, superior to the other models compared.
Keywords
Introduction
The technique for automatically extracting sentiment data from unstructured text is known as sentiment analysis. Several fields, including machine learning, natural language processing (NLP), and data mining, use sentiment analysis [1]. The goal of this process is to determine the orientation of users’ reviews using word information taken from their context. Organizations employ sentiment analysis to learn about their customers’ reactions to their products and services. The extensive usage of social media platforms provides opportunities for people to get feedback on products, services, events, etc. [2]. These comments are frequently based on user interactions with the product or service and may include positive or negative feelings about it. The ability to recognise unfavourable customer feedback is essential to an organization’s progress [3]. These insights aid businesses in improving their products and services, allowing them to increase profits. Statistical machine learning methods work well for simple sentiment analysis scenarios, but they are not appropriate for more complicated text classification problems [4].
The encoding of words as vectors is an essential step in sentiment analysis. Word2Vec [5] and GloVe [6], which are based on distributed representations, are now the most frequently used word embedding techniques. Because word vectors can capture many contextual factors when describing texts, they are highly beneficial for tasks involving semantic similarity. According to previous research on sentiment classification [7], deep learning methods are now preferred over traditional machine learning methods such as Support Vector Machines (SVM), Naive Bayes, Decision Trees, and Random Forests, which are commonly employed for classification.
Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) pioneered deep learning-based sentiment classification. Owing to the cost-effective large-scale manufacturing of efficient Graphics Processing Unit cards and the availability of large datasets, deep learning has emerged as a superior alternative to traditional machine learning in sentiment classification applications [8]. Compatibility between text representation techniques and the algorithms that consume them is critical in many text classification tasks. High-performance classification therefore becomes feasible when a suitable text representation is paired with a suitable classification model.
With the advancements in cognitive computing and artificial intelligence, deep learning techniques have emerged as an increasingly popular method for sentiment analysis in several fields [9, 10]. As a result, many researchers use neural networks to automatically build feature representations from text sequences. The prominent deep learning algorithms for sentiment categorisation nowadays include RNN, CNN, and LSTM. Huang et al. [11] suggested combining bidirectional LSTM (BiLSTM) and CRF models to get superior outcomes by using both backward input and forward input characteristics. Ma and Hovy [12] proposed a combined approach using CNNs and LSTM-CRF to identify entities. On the CoNLL-2003 corpus, Chiu and Nichols [13] used the BiLSTM combined with the CNN model to achieve significant results. At the same time, Lample et al. [14] used the hybrid BiLSTM-CRF model with text-level representation for better analysis of sentiments.
Each deep learning approach has distinctive properties that are exploited to achieve a particular objective or to assess particular datasets. Each text representation technique, which translates texts into numeric form, has both benefits and disadvantages. Combining the most suitable text representation method with an appropriate machine learning algorithm is therefore critical for resolving these difficulties and obtaining higher accuracy in sentiment analysis. No single text representation approach (e.g., word embedding, character embedding) is fully complete as a numeric representation of user sentiments. To address these issues, a new hybrid model is proposed and evaluated on a dataset of tweets about various airline services in India. The hybrid model uses deep learning techniques that employ multiple representations of the data.
Based on the studies above, the following research questions arise:
(1) How can CNN be hybridized with multiple layers of Bi-LSTM (DualLSTM) for the classification of a textual dataset? (2) How does the performance of the proposed model vary with tweet length, when tweets are grouped by text length into the categories Very Short, Short, Medium, Long, and Very Long? (3) How can the efficacy of the proposed model for text-based sentiment classification be tested against baseline models and advanced deep learning approaches?
To answer the above research questions and achieve the objective, the main contributions of this research are as follows:
To achieve optimised sentiment classification, three popular word embedding techniques, FastText, Keras, and Word2Vec, are applied to the datasets to extract the semantics of the texts. The effectiveness of these techniques is tested, and the best one is selected for the proposed model. To combine the advantages of different deep learning techniques, a CNN is applied, followed by DualLSTM and another CNN layer. The integrated model not only uses the capacity of CNN to extract local features but also exploits the bidirectional long short-term memory network to extract the global features of texts. BiLSTM is used to address what CNN cannot capture: a CNN on its own does not keep the contextual meaning of words intact during text classification. Integrating BiLSTM with CNN therefore enhances the accuracy of feature selection in text classification. The model was applied to tweets of different lengths to test the applicability and accuracy of the classification. The proposed model is compared against several baseline and advanced deep learning models to confirm its viability.
The rest of the paper is laid out as follows: Section 2 discusses related research, particularly in the area of sentiment analysis. The techniques used to conduct the research are discussed in Section 3. Section 4 contains details of the dataset and the classifier models used in the experiments. The experimental findings are shown in Section 5, followed by a discussion in Section 6. The last section concludes the article and presents potential future directions.
Sentiment Analysis has received increasing attention from researchers as a result of the growth of NLP, and various approaches have been implemented using word-level embeddings. In the course of this research, it was discovered that hybrid algorithms produced improved classification results. Jiang et al. [15] created a bag-of-words text representation strategy based on sentiment topic terms, which combined a deep neural network, context information, and sentiment topic terms and performed significantly better in Sentiment Analysis. Rezaeinia et al. [16] presented an enhanced word embedding technique based on POS tagging and sentiment lexicons that outperformed the pre-trained word embeddings used in Sentiment Analysis. To improve sentiment classification performance, Liu et al. [17] presented a model in which machine learning was coupled with deep learning. Their research demonstrates the efficiency of the method using datasets in Turkish and Chinese. Pham et al. [18] presented an approach that combines several CNNs and is centered on word embeddings from GloVe and Word2Vec; it performed well in aspect sentiment classification tasks. Han et al. [19] developed a hybrid neural network approach for document representation that included user and product information and used CNNs and LSTM.
A novel deep learning architecture with hybrid CNN and BiLSTM features (H2CBi), which combines the power of CNNs and BiLSTM, was presented in [20]. Using two separate pre-trained word vectors, the authors obtained distinct feature vectors that were supplied as input to the LSTM. The BERT model for text representation, proposed by Devlin et al. [21], can better capture the modifying relationships in texts and performs well in Sentiment Analysis tests. Liu et al. [22] incorporated a neural topic model into word-level semantic representations to capture the latent topic information of a text, and proposed a novel topic-based attention method that examines words through their association with the given topics. BiLSTMs were used in another study [23] to extract dependency knowledge from the vocabulary and the position of the sentence.
BiLSTM and CNN were used in this proposed hybrid strategy to produce n-gram features from text categorisation by applying multiple CNNs to given LSTM outputs.
In another approach, Zhou and Long [24] classified text in reviews of Chinese products using Bi-LSTM, with a CNN extracting features from the word embeddings. The combination of CNN and BiLSTM classified more successfully than either a simple CNN or a simple BiLSTM in their experiments. In another study [25], a multilayer CNN with LSTM was used to analyse user sentiment on a dataset of Tibetan social media posts. The features were extracted with a three-layer CNN and fed into a two-layer LSTM network as input. According to the results, this hybrid deep learning model outperformed CNN and LSTM alone. A hybrid Bi-LSTM model based on an attention mechanism is proposed in another study [26]. That model efficiently integrated BiLSTM and CNN for text classification using Word2Vec with an attention mechanism.
In addition, Kaladevi and Thyagarajah [27] used a single CNN layer and a dual-stacked LSTM to analyse Indian tweets sequentially. In their hybrid experiments, the features produced by the CNN layer are fed into the LSTM network. As in earlier hybrid research, the researchers used the CNN to extract features. Zhou et al. [28] proposed an attention-based bilingual representation learning model, in which the distributed semantics of documents in both the source and target languages are learned. Documents are modelled using LSTM networks. They also suggested a hierarchical attention mechanism for the bilingual LSTM networks, which performed well on the benchmark dataset.
Furthermore, the Convolutional Neural Network [29] can efficiently extract local information, and CNNs have also been used to handle orientation analysis. AF-LSTM was proposed by Tay et al. [30]. To model context and aspect terms at the word level, Aspect Fusion LSTM (AF-LSTM) employs circular correlation and circular convolution. Through the information fusion procedure, the target information is then integrated into the sentence representation. Yu et al. [31] used a multi-way Gated Recurrent Unit (GRU) combined with an attention mechanism to classify short texts in e-commerce reviews, with promising results. These approaches demonstrate that coupling deep learning with an attention mechanism can improve short-text sentiment analysis.
In another study, Huang et al. [32] presented an Attention-over-Attention (AOA) model. Using two fine-grained attention mechanisms, the AOA model extracts text-level associations between aspects and contexts, which allows the sentence representation to focus automatically on the parts of the sentence that matter most for the aspect term. The TD-LSTM model was proposed by Tang et al. [33]. In that model, the sentence is split around the aspect into two sections, and the hidden states of both sections are modelled using two long short-term memory (LSTM) networks. The two parts are then merged to create a target-specific representation that is passed to a softmax function for emotion categorization. Baziotis et al. [34] used an LSTM coupled with an attention-based mechanism that assigns weights to decisive words, which strengthened the effect of keywords in emotional sentences and obtained significantly better results in emotion categorization of Twitter reviews.
Although several researchers have proposed deep neural network architectures for sentiment classification based on CNNs or RNNs, only a few have comprehensively examined the performance of several deep learning-based classification models. Dong et al. [35] presented an analytical comparison of several deep learning architectures for sentiment classification to extract insights useful for developing sentiment classification approaches. For an identical model structure, word-level input produced better classification results than character-level input. In another study, Hu et al. [36] showed that deep learning models outperformed standard techniques such as dictionary-based methods, Naive Bayes, or SVM. They did not, however, disclose quantifiable performance metrics such as F-measure or accuracy. Dzikienė et al. [37] compared conventional machine learning methods (Multinomial Naive Bayes and Support Vector Machine) with deep learning approaches (LSTM and CNN) on a dataset of Lithuanian internet comments. The conventional machine learning approaches were applied to features based on morphological and character information, and the deep learning approaches were applied to both Word2Vec and FastText embeddings. In their experiments, LSTM outperformed the SVM and Multinomial Naive Bayes methods. Yin et al. [38] compared the sentiment classification results of LSTM, CNN, and GRU. However, their experimental outcomes were of limited use because they did not cover enough structural variation in the models, and their conclusions were based on experimental results from only one dataset.
RNN was utilised by Socher et al. [39] to tackle text categorisation difficulties. A model Sentiment Treebank was introduced that surpassed all earlier techniques on various criteria when the training was performed on the fresh tree-bank, which could accurately represent the impacts of negativity. Yang et al. [40] established a hierarchy-based text categorisation strategy based on the attention-based mechanism that effectively captured the text’s main sentiment information. Huang et al. [41] suggested combining bidirectional LSTM (BiLSTM) and CRF models to produce superior outcomes by using both backward and forward input characteristics. Sentic-LSTM was developed by Ma et al. [42] to incorporate explicit and implicit information explicitly, and an extended version of Sentic-LSTM was presented to deal with a combined work involving aspect detection of target-dependent and aspect-based classification of polarity. Gu et al. [43] presented a refinement method of word vector that improved all word vectors. By improving pre-trained vectors of the word and using intensity ratings of sentiments provided by sentimental lexicons, the proposed model performed better in sentiment analysis. By training a huge corpus of text, Peters et al. [44] proposed a textual representation strategy using a deep learning model. The researchers built a text depiction framework in the English language that incorporated grammar and sentiment elements.
Dimensional sentiment classification has gained importance because it can describe emotional states as continuous numerical values on several dimensions, such as polarity (positive–negative) and arousal (excited–calm). One study [45] provides a multi-dimensional relation model for dimension score prediction that incorporates relationships between dimensions into deep neural networks. The authors constructed a Chinese three-dimensional corpus with valence-arousal-irony (VAI) ratings to test the technique. Experiments indicate that including inter-dimensional relationships in the prediction process outperforms typical approaches that treat each dimension separately. Rafael [46] offered a novel way of processing text with a dimensional model that employs normative databases and compared it to different categorical approaches. For each pseudo-document, an emotive thesaurus and a bag-of-words model are used to produce vectors, and dimensionality reduction strategies are then assessed for the categorical models. The researchers demonstrate that, rather than a feature space driven by statistics, the dimensional technique can be used to visualise emotions in a psychologically relevant space. For the categorical representation of emotions, different statistically driven dimensionality reduction strategies were compared to a dimensional representation based on psychologically supported data. The results demonstrate that NMF-based classification works best among the categorical techniques, and that the dimensional approach performs similarly to NMF. Sven [47] describes EMOBANK, a corpus annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. The authors annotated both writer and reader emotions, and ratings for categorical basic emotions as well as VAD dimensions are available for a portion of EMOBANK.
The reader’s perspective provides both stronger inter-annotator agreement (IAA) values and greater emotional ratings. The authors demonstrated that, using standard machine learning approaches, an automatic mapping between categorical and dimensional formats is possible with near-human performance for the bi-representationally annotated subcorpus. For multidimensional emotion regression, Suyang [48] presented an Adversarial Attention Network, which automatically predicts the scores of several emotion dimensions for an input text. A discriminator is used to learn improved word weights through adversarial training between two attention layers. To learn the word weights shared between two emotion dimensions, a shared attention layer is used. According to an empirical evaluation on the EMOBANK corpus, the technique provides considerable gains in R-values over the state-of-the-art baselines on both the EMOBANK reader’s and writer’s multi-dimensional emotion regression tasks in all domains. Duyu [49] proposed learning sentiment embeddings, which are sentiment-specific word embeddings. Context-based embeddings cause a problem in sentiment analysis because words with similar contexts but opposite sentiment polarity, such as positive and terrible, are mapped to adjacent word vectors. The authors addressed this issue by encoding sentiment information from texts (e.g. sentences and words) in the sentiment embeddings together with word contexts. Experimental results on multiple benchmark datasets reveal that sentiment embeddings regularly outperform context-based embeddings. In another work, Jin Wang [50] proposed a regional tree-structured CNN-LSTM model that consists of two parts, a regional CNN and an LSTM, for predicting the VA ratings of texts. To increase performance even further, a region division technique is proposed to find task-relevant words and clauses so that structured information can be incorporated into VA prediction.
According to experimental results, the suggested method outperforms regression-based, standard neural network-based, and structured methods from earlier studies. Word embeddings, which provide continuous low-dimensional vector representations of words, are widely used in natural language processing. Word2vec and GloVe frequently fail to capture enough sentiment information, so that words with opposite sentiment polarity (e.g., positive and terrible) receive similar vector representations, which degrades sentiment analysis performance [51]. The researchers presented a word vector refinement model that uses real-valued sentiment intensity scores provided by sentiment lexicons to refine existing pre-trained word vectors. The goal of the refinement model is to adjust each word vector so that it lies closer to semantically and sentimentally similar words (i.e., those with similar intensity scores) and farther from sentimentally dissimilar words (i.e., those with dissimilar intensity scores) in the lexicon. On the SemEval and Stanford Sentiment Treebank datasets, experimental results show that the proposed refinement model can improve both traditional word embeddings and previously proposed sentiment embeddings for binary, ternary, and fine-grained sentiment classification.
Many researchers are conducting studies in the field of sentiment analysis. One such algorithm was proposed by M. Parimala [52] to conduct sentiment analysis on tweets about a certain incident for a specified area at various time intervals. The proposed risk assessment sentiment analysis (RASA) algorithm classifies tweets based on keywords provided by the network and calculates a sentiment score for each location. State-of-the-art techniques such as support vector machine, Naive Bayes, maximum entropy, logistic regression, and random forest are used to validate the model. The results show that, on average and compared with all other approaches, RASA outperforms XGBoost by 1% in the binary-class setting and by 30% in the multiclass setting. Swarna Priya [53] proposed an effective Intrusion Detection System (IDS) using deep neural networks (DNN) in the Internet of Medical Things (IoMT) environment to predict and classify probable cyberattacks. The methodology combines classic PCA with the bio-inspired Grey Wolf Optimizer (PCA-GWO), which aids in extracting high-impact features from the dataset. Because the hybrid PCA-GWO is employed as a dimensionality reduction technique, the classification accuracy improves compared with traditional models. The PCA-GWO based deep neural network classifier outperforms the other commonly used classifiers it was compared with.
Sentiment Analysis plays a major role in improving the quality of services of any organization. It not only identifies the likely reactions of users but also helps in improving the system based on consumer reviews. Chandrasehkaran [54] performed such research, analysing the sentiment of COVID-19 tweets shared by people during the pandemic. The researchers used Python-based libraries for the sentiment analysis. The authors used different machine learning algorithms, a deep learning algorithm (BiLSTM), and TextBlob to assess people’s sentiments when Coronavirus cases reached a high level. Compared with typical machine learning models for Twitter sentiment classification, the Bi-LSTM technique shows a higher accuracy (0.87). Abuqaddom [55] presented a novel anti-vanishing learning algorithm named OSLD (Oriented Stochastic Loss Descent). OSLD iteratively updates a randomly initialized parameter by a small positive random number, scaled by a tuned ratio of the model loss, in the direction opposite to the sign of its partial derivative. On five benchmark models, the research compares OSLD to stochastic gradient descent as the basic backpropagation technique and to Adam as one of the best backpropagation algorithms. The experimental results show that OSLD is quite competitive with Adam in models of small and moderate depth and surpasses Adam in very deep models. Krishnan [56] presented a six-step model for tweets stored in MongoDB. The tweets are pre-processed, and features named cross holoentropy and joint holoentropy are then extracted. The constant term used in evaluating the weight function is optimised to improve the classification results.
A new, enhanced approach called Self Adaptable Moth Flame Optimization (SA-MFO), which is an adaptive variation of the MFO algorithm, is introduced for this optimal tuning. The proposed model outperformed other conventional models.
These studies show that a variety of deep learning methods have been implemented for sentiment classification. The approaches described above primarily extract semantic features along the sentence dimension while ignoring the features along the word vector dimension. In this research, two convolution layers (ConvNet) are used with an attention-based BiLSTM layer to extract the semantic information of the local characteristics of the word vector in the word embedding dimension. This research also uses max-pooling to obtain more comprehensive local feature information. Section 3 discusses the proposed model in detail.
Proposed model
The core idea behind the proposed model is to use two distinct deep neural networks, namely ConvNet and DualLSTM. Then a hybrid ConvNet
Fig. 1. Proposed model of sentiment classification.
In this section, the proposed COBICO model is explained in detail. The proposed model consists of the following layers:
Pre-processing layer; Word embedding layer; Dropout layer; ConvNet layer; Pooling layer; DualLSTM layer; Attention layer; Flatten and dense layer; Output layer.
Raw information usually contains words or symbols that computers cannot comprehend. The data must therefore be cleaned and restructured into an understandable format, which makes data cleaning an important stage in NLP. The preprocessing of input texts shown in Fig. 1 was applied to the dataset to obtain good classification performance on the text review data. These preprocessing steps are applied before the word representation phase (word embedding) to remove unwanted content and convert the dataset into a usable form. First, all text in the dataset is converted to lowercase. Links in the input dataset are replaced with the token “URL”. Irregular spacing between words is trimmed to a single space. Punctuation, numerals, and unformatted characters in tweets are removed. Grammatical Error Correction (GEC) by NLP-Progress is used to correct mistakes in spelling, grammar, and punctuation.
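The cleaning steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `preprocess`, the URL pattern, and the choice to keep only ASCII letters are assumptions, and the grammatical error correction step is omitted because the tool's interface is not described in the text.

```python
import re

# Assumed pattern for links; the paper does not specify how links are detected.
URL_RE = re.compile(r"(https?://\S+|www\.\S+)")

def preprocess(tweet):
    """Apply the described cleaning steps to one tweet (GEC step omitted)."""
    text = tweet.lower()                       # 1. lowercase everything
    text = URL_RE.sub(" URL ", text)           # 2. replace links with the token "URL"
    text = re.sub(r"[^a-zA-Z\s]", " ", text)   # 3. drop punctuation, numerals, other symbols
    text = re.sub(r"\s+", " ", text).strip()   # 4. collapse irregular spacing
    return text

cleaned = preprocess("Great service!!  Landed 20 mins early :) https://t.co/xyz")
print(cleaned)
```

Lowercasing is applied before the link substitution so that the inserted "URL" token stays distinguishable from ordinary lowercase words.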
Word embedding layer
The proper representation of texts is one of the most essential phases in the text classification process. Initially, FastText, Keras, and Word2Vec embeddings are used in this research.
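The following steps are executed to convert the input text into a word vector representation:
The input for the process is a text denoted as $T = \{w_1, w_2, \ldots, w_n\}$, where $n$ is the number of words. The text is rearranged into a concatenated sequence of word embeddings, $x_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n$, where $x_i \in \mathbb{R}^{d}$ is the $d$-dimensional embedding of word $w_i$.
For each text input, a sentence matrix $S \in \mathbb{R}^{n \times d}$ is formed. The sentence matrix is then passed to the CNN layers for further processing.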
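As a rough illustration of these steps, the sketch below stacks one embedding vector per token into a fixed-size sentence matrix. The toy embedding table, the helper name `sentence_matrix`, zero-padding to a fixed length, and mapping unknown words to a zero vector are all assumptions made for the example; a real setup would load FastText, Keras, or Word2Vec vectors instead.

```python
def sentence_matrix(tokens, embeddings, max_len, dim):
    """Stack one embedding per token into a max_len x dim matrix (list of rows)."""
    pad = [0.0] * dim
    # Assumption: out-of-vocabulary tokens are mapped to the zero vector.
    rows = [list(embeddings.get(tok, pad)) for tok in tokens[:max_len]]
    # Assumption: shorter texts are zero-padded up to max_len rows.
    rows += [list(pad) for _ in range(max_len - len(rows))]
    return rows

# Hypothetical 2-dimensional embedding table, for illustration only.
toy_embeddings = {"great": [0.1, 0.3], "flight": [0.5, -0.2]}
S = sentence_matrix(["great", "flight", "today"], toy_embeddings, max_len=4, dim=2)
```

Each row of `S` corresponds to one word position, which is the layout the convolution layer slides over.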
Fig. 2. (a) ConvNet. (b) DualLSTM.
To minimize overfitting in the proposed model, dropout is used: some units (neurons) are temporarily removed from the network during training, along with their incoming and outgoing connections. Dropout prevents units from over-adapting to the training data, resulting in improved generalization on the validation set. In the proposed model, the dropout rate is set to 0.03.
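The effect of dropout on a layer's activations can be sketched as follows. This is an illustrative version of the widely used "inverted" dropout, in which surviving activations are rescaled by 1/(1 - rate); whether the proposed model rescales in exactly this way is an assumption, and the function name `dropout` is hypothetical.

```python
import random

def dropout(values, rate=0.03, training=True, rng=None):
    """Zero each activation with probability `rate`; rescale the rest (inverted dropout)."""
    if not training or rate == 0.0:
        return list(values)          # dropout is disabled at inference time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

activations = [1.0] * 1000
dropped = dropout(activations, rate=0.03, rng=random.Random(0))
```

With a rate of 0.03, roughly 3% of the units are removed on each training step, and none are removed during validation.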
ConvNet layer
As illustrated in Fig. 2(a), the architecture of the ConvNet comprises an input layer, an output layer, and five hidden layers. The input layer accepts a textual message that has been padded to a predetermined number of words and is followed by a word embedding layer. The attention layer follows the word embedding layer to extract high-level feature vectors. The attention layer is a sub-unit made up of context vectors that align the source input with the target output. Figure 4(a) shows an illustration of the attention mechanism in the upper right corner. The SpatialDropout1D layer (for dropout) takes the feature vectors derived from the attention layer as inputs. On top of the dropout layer, a ConvNet layer (bottom right) with convolution filters and a ReLU activation function is applied. Finally, the probability distribution over the three sentiment orientations (positive, neutral, negative) is computed using a fully-connected dense layer with three units and a softmax function.
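The ConvNet module applies a convolution operation “*” between the sentence matrix $S \in \mathbb{R}^{n \times d}$ and a filter matrix $F \in \mathbb{R}^{k \times d}$ that spans $k$ consecutive words. The feature map is learned as per the following equation:
$$z_i = \mathrm{ReLU}(F * S_{i:i+k-1} + b) \tag{1}$$
where $S_{i:i+k-1}$ denotes the rows of $S$ for words $i$ to $i+k-1$, $b$ is a bias term, and $z = [z_1, z_2, \ldots, z_{n-k+1}]$ is the resulting feature map.
The pooling layer receives the output of the convolutional layer. The pooling layer reduces the representation even further by selecting the maximum value from each pool of values and discarding the irrelevant data. The procedure of pooling is represented as follows:
$$p_j = \max_{i \in P_j} z_i \tag{2}$$
where $P_j$ is the set of feature map positions covered by the $j$-th pooling window.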
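The convolution and pooling steps can be illustrated with a small pure-Python sketch. The single filter, the non-overlapping pooling windows, and the helper names `conv1d_relu` and `max_pool` are assumptions chosen for clarity; the filter sizes and pooling configuration of the proposed model are not given in this section.

```python
def conv1d_relu(S, F, b):
    """Slide a k x d filter F over the n x d sentence matrix S, then apply ReLU."""
    k = len(F)
    feature_map = []
    for i in range(len(S) - k + 1):
        total = b
        for r in range(k):
            # Dot product of filter row r with the embedding of word i + r.
            total += sum(f * s for f, s in zip(F[r], S[i + r]))
        feature_map.append(max(0.0, total))   # ReLU activation
    return feature_map

def max_pool(z, size):
    """Keep the maximum of each non-overlapping window of `size` values."""
    return [max(z[j:j + size]) for j in range(0, len(z) - size + 1, size)]

S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 1.0]]   # 5 words, d = 2
F = [[1.0, -1.0], [0.0, 1.0]]                                      # one filter, k = 2
z = conv1d_relu(S, F, b=0.0)
p = max_pool(z, size=2)
```

For this toy input the feature map has four positions and pooling halves it to two, which shows how pooling shortens the sequence that later layers must process.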
To make precise predictions, the model must capture long-range dependencies in text data. Because the convolutional layer lacks this capacity, DualLSTM is used to add it to the proposed model. Using DualLSTM, the model learns from the data in both directions, i.e. left-to-right and right-to-left. As a result, the DualLSTM layer enhances classification accuracy. The bidirectional LSTM contains two independent LSTMs, i.e. a forward LSTM and a backward LSTM.
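The hidden state $\overrightarrow{h}_t$ is produced by the forward LSTM, which reads the sequence from left to right, and the hidden state $\overleftarrow{h}_t$ is produced by the backward LSTM, which reads it from right to left.
Finally, using the following Eq. (3), the forward and backward vectors are merged into the final hidden state of the DualLSTM to create the sequence of hidden output vectors $H = [h_1, h_2, \ldots, h_n]$:
$$h_t = \overrightarrow{h}_t \oplus \overleftarrow{h}_t \tag{3}$$
The following equations, in the standard LSTM form, are used to implement the DualLSTM cell.
Forward LSTM:
$$\overrightarrow{f}_t = \sigma(W_f x_t + U_f \overrightarrow{h}_{t-1} + b_f) \tag{4}$$
$$\overrightarrow{i}_t = \sigma(W_i x_t + U_i \overrightarrow{h}_{t-1} + b_i) \tag{5}$$
$$\overrightarrow{o}_t = \sigma(W_o x_t + U_o \overrightarrow{h}_{t-1} + b_o) \tag{6}$$
$$\overrightarrow{c}_t = \overrightarrow{f}_t \odot \overrightarrow{c}_{t-1} + \overrightarrow{i}_t \odot \tanh(W_c x_t + U_c \overrightarrow{h}_{t-1} + b_c) \tag{7}$$
$$\overrightarrow{h}_t = \overrightarrow{o}_t \odot \tanh(\overrightarrow{c}_t) \tag{8}$$
Backward LSTM:
$$\overleftarrow{f}_t = \sigma(W_f x_t + U_f \overleftarrow{h}_{t+1} + b_f) \tag{9}$$
$$\overleftarrow{i}_t = \sigma(W_i x_t + U_i \overleftarrow{h}_{t+1} + b_i) \tag{10}$$
$$\overleftarrow{o}_t = \sigma(W_o x_t + U_o \overleftarrow{h}_{t+1} + b_o) \tag{11}$$
$$\overleftarrow{c}_t = \overleftarrow{f}_t \odot \overleftarrow{c}_{t+1} + \overleftarrow{i}_t \odot \tanh(W_c x_t + U_c \overleftarrow{h}_{t+1} + b_c) \tag{12}$$
$$\overleftarrow{h}_t = \overleftarrow{o}_t \odot \tanh(\overleftarrow{c}_t) \tag{13}$$
where weight matrices are represented by $W$ and $U$ and bias vectors by $b$; each direction has its own set of parameters, and the direction marks on the parameters are omitted for readability. $\sigma$ is the sigmoid function, $x_t$ is the input at time step $t$, and $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates.
Element-wise multiplication is denoted by $\odot$.
The current state is represented by $c_t$.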
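A compact way to see how the two directions are combined is the pure-Python sketch below, which runs a standard LSTM cell over the sequence in both directions and concatenates the two hidden states at every position. The parameter layout (a dictionary of `(W, U, b)` triples per gate), the helper names, and the zero-weight toy parameters are assumptions for illustration; they do not reproduce the trained model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(gate, x, h):
    """Compute W x + U h + b for one gate given as a (W, U, b) triple."""
    W, U, b = gate
    return [sum(w * xi for w, xi in zip(W_row, x))
            + sum(u * hi for u, hi in zip(U_row, h)) + b_k
            for W_row, U_row, b_k in zip(W, U, b)]

def lstm_step(params, x, h_prev, c_prev):
    f = [sigmoid(v) for v in affine(params["f"], x, h_prev)]     # forget gate
    i = [sigmoid(v) for v in affine(params["i"], x, h_prev)]     # input gate
    o = [sigmoid(v) for v in affine(params["o"], x, h_prev)]     # output gate
    g = [math.tanh(v) for v in affine(params["c"], x, h_prev)]   # candidate state
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def run_lstm(params, xs, hidden):
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs

def bilstm(fwd, bwd, xs, hidden):
    """Concatenate forward and backward hidden states at each position."""
    forward = run_lstm(fwd, xs, hidden)
    backward = run_lstm(bwd, xs[::-1], hidden)[::-1]   # re-align to input order
    return [hf + hb for hf, hb in zip(forward, backward)]

def toy_params(input_dim, hidden, candidate_bias=1.0):
    """Hypothetical zero-weight parameters, used only to exercise the code."""
    def gate(bias):
        return ([[0.0] * input_dim for _ in range(hidden)],
                [[0.0] * hidden for _ in range(hidden)],
                [bias] * hidden)
    return {"f": gate(0.0), "i": gate(0.0), "o": gate(0.0), "c": gate(candidate_bias)}

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H = bilstm(toy_params(2, 2), toy_params(2, 2), xs, hidden=2)
```

Each element of `H` has twice the hidden size, because it holds the forward state followed by the backward state for the same word.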
The DualLSTM network is used here to identify the orientation of users’ opinions, as shown in Fig. 2(b). The ConvNet and Max Pooling layers are replaced by DualLSTM and Flatten layers, respectively, in this network design, which differs somewhat from the one illustrated in Fig. 2(a). An illustration of DualLSTM architecture and the attention mechanism is shown on the right column of Fig. 2(b). The output provided from the DualLSTM layer is again supplied as input to ConvNet Layer and the resultant is provided as input to the Attention Layer.
Some words in a statement are irrelevant for polarity detection, whereas others are decisive. The attention-based mechanism is used to give more weight to such informative content. This layer was therefore created to automatically extract the significant terms.
Equation (14) is used to calculate the word significance vector
The softmax function is then used to determine the normalized word weight
Finally, to create the output of the attention mechanism, Eq. (16) is used to calculate a weighted summation.
The attention layer output
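The three attention steps (word significance, softmax normalization, weighted summation) can be sketched as follows. This is a minimal sketch assuming the widely used formulation in which each hidden state is passed through a tanh projection and scored against a learned context vector; the names `attention`, `weight`, `bias`, and `context`, and all numeric values, are illustrative assumptions rather than the trained parameters of the proposed model.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(hidden_states: List[List[float]], weight: List[List[float]],
              bias: List[float], context: List[float]) -> Tuple[List[float], List[float]]:
    """Return the normalized word weights and the attention-weighted summary vector."""
    scores = []
    for h in hidden_states:
        # Word significance vector: u = tanh(W h + b)
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(weight, bias)]
        # Similarity of u to the context vector gives the raw score of the word
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted summation of the hidden states
    summary = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
               for d in range(dim)]
    return alphas, summary


# Three words with 2-dimensional hidden states (hypothetical values).
hidden_states = [[0.1, 0.2], [0.9, 0.8], [0.0, -0.1]]
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
context = [1.0, 1.0]

alphas, summary = attention(hidden_states, weight, bias, context)
print(alphas)   # the second word receives the largest weight
print(summary)
```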
This layer converts the context matrix retrieved from the preceding layer into a context vector that provides the input for the final classification stage. The following Eq. (17) is used to perform the flatten layer operation.
This is the last phase of the proposed method, which resolves the sentiment class as negative, positive, or neutral. The output of the flatten layer is passed to a softmax activation function, which calculates the likelihood of each sentiment class. The final output is computed as:
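As an illustration of this final stage, the sketch below applies a dense layer followed by softmax to a flattened feature vector and picks the most likely class. The function name `classify`, the label order, and the weights are hypothetical values chosen for the example, not parameters of the trained model.

```python
import math
from typing import List, Tuple

LABELS = ["positive", "neutral", "negative"]  # assumed label order


def classify(features: List[float], weights: List[List[float]],
             biases: List[float]) -> Tuple[str, List[float]]:
    """Dense layer with three units followed by softmax; returns label and probabilities."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return LABELS[probs.index(max(probs))], probs


features = [1.0, 2.0]                                  # flattened context vector (toy)
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]       # one row per class (toy)
biases = [0.0, 0.0, 0.0]

label, probs = classify(features, weights, biases)
print(label)  # neutral
```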
This section provides details about the data acquisition and experimental setup. Section 4.1 discusses the acquisition of the data and their categorization. Section 4.2 provides information about the hardware and software environment used for implementation. Hyperparameter details are discussed in Section 4.3, and evaluation and validation parameters are discussed in Section 4.4.
Data acquisition
The proposed model is implemented on a dataset gathered from user tweets on Twitter about Indian Airlines between 1st June 2021 and 31st August 2021; the dataset contains 24,235 tweets. The dataset was collected for this period because domestic flights resumed after the second wave of COVID-19. A REST API-based tool named Tweepy was used to collect the tweets. Three sentiment classes are represented, i.e. positive, negative, and neutral. The dataset is divided into two distinct subsets, i.e. a Training Set and a Validation Set. 75% of the tweets (18,176 tweets) form the Training Set and 25% (6,059 tweets) form the Validation Set.
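The 75/25 split can be reproduced with a few lines of Python. This is only a sketch: the text does not state whether the tweets were shuffled or stratified by class, so the shuffling and the fixed `seed` below are assumptions, and `split_dataset` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence, train_fraction: float = 0.75,
                  seed: int = 42) -> Tuple[List, List]:
    """Shuffle the items reproducibly and cut them into training and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # floor of the training share
    return shuffled[:cut], shuffled[cut:]


tweet_ids = range(24235)  # stand-in for the collected tweets
train, validation = split_dataset(tweet_ids)
print(len(train), len(validation))  # 18176 6059
```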
Figure 3 shows the graphical representation of review classification in different categories.
Categorization details of tweets dataset
Categorization of training set
Categorization of validation set
Classified number of tweets in the dataset.
Tables 1, 2, and 3 provide the total number of tweets in the dataset, tweets in the training set, and validation set respectively.
Experiments were carried out using Google services. Google Drive, the cloud-based file storage service provided by Google, was used to store our dataset. Google Colab was employed for this research; it is a free cloud-based service provided by Google for machine learning developers, based on Jupyter notebooks, for performing machine learning research using Python. The Keras API with the TensorFlow backend was used to run the experiments for the proposed model. Experiments were performed on a 64-bit Windows 11 operating system with 6 GB RAM and a 3.50 GHz processor.
Parameter settings
Hyper-parameter optimization is needed to obtain high model performance. Hyper-parameter settings also help to avoid underfitting and overfitting. The randomized search technique was used to enhance accuracy. The hyper-parameters shown in Table 4 were finalized to obtain optimized performance for the CoBiCo model.
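The idea of randomized search can be sketched as follows. Everything specific here is an assumption made for illustration: the search space values are hypothetical and are not the settings of Table 4, and `toy_objective` is a stand-in for training the model and returning its validation accuracy.

```python
import random
from typing import Callable, Dict, List, Tuple


def randomized_search(search_space: Dict[str, List], evaluate: Callable[[Dict], float],
                      n_trials: int = 10, seed: int = 0) -> Tuple[Dict, float]:
    """Sample random configurations and keep the one with the highest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in search_space.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space, not the values reported in Table 4.
search_space = {
    "dropout": [0.2, 0.3, 0.5],
    "lstm_units": [64, 128, 256],
    "batch_size": [32, 64, 128],
}


def toy_objective(params: Dict) -> float:
    """Stand-in for validation accuracy; a real run would train and evaluate the model."""
    return -abs(params["dropout"] - 0.3) - abs(params["lstm_units"] - 128) / 1000


best_params, best_score = randomized_search(search_space, toy_objective)
print(best_params, best_score)
```

Unlike an exhaustive grid search, only `n_trials` configurations are evaluated, which keeps the cost bounded when each evaluation means training a deep model.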
Hyperparameter settings for the proposed model
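To appraise the performance of the suggested CoBiCo model, a standard performance evaluation is carried out. The experiments were carried out on the Indian Airlines dataset. The confusion matrix is used as the basis for evaluating the proposed model. Its assessment parameters are built on four fundamental counts, defined for each sentiment class:
False Positive (FP): tweets assigned to the class that actually belong to another class.
False Negative (FN): tweets of the class that are assigned to another class.
True Positive (TP): tweets of the class that are correctly assigned to it.
True Negative (TN): tweets of other classes that are correctly not assigned to the class.
Based on these parameters, the following performance metrics are computed: precision, recall, F-measure, and accuracy.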
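The standard definitions of these metrics can be made concrete with a short sketch that derives per-class precision, recall, and F-measure, the overall accuracy, and a support-weighted average from a confusion matrix. The 3×3 matrix below is toy data for illustration only, not a result from this study, and the helper names are assumptions.

```python
from typing import Dict, List, Tuple


def per_class_metrics(confusion: List[List[int]],
                      labels: List[str]) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Rows are true classes, columns are predicted classes."""
    total = sum(sum(row) for row in confusion)
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                   # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp    # predicted k, true class otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        results[label] = {"precision": precision, "recall": recall,
                          "f_measure": f_measure, "support": tp + fn}
    accuracy = sum(confusion[k][k] for k in range(len(labels))) / total
    return results, accuracy


def weighted_average(results: Dict[str, Dict[str, float]], metric: str) -> float:
    """Average a metric over classes, weighting each class by its number of true samples."""
    total_support = sum(r["support"] for r in results.values())
    return sum(r[metric] * r["support"] for r in results.values()) / total_support


labels = ["positive", "neutral", "negative"]
confusion = [[8, 1, 1],   # toy counts, not results from the paper
             [2, 6, 2],
             [0, 1, 9]]

results, accuracy = per_class_metrics(confusion, labels)
print(results["positive"]["precision"], accuracy)
print(weighted_average(results, "f_measure"))
```

Note that the support-weighted recall always equals the overall accuracy, which is a useful sanity check when reading weighted averages.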
This section presents the results and performance comparisons in detail. Section 5.1 analyses the performance of different word embeddings. Based on this performance, the best embedding is selected for further implementation of the proposed model. Section 5.2 compares the proposed model with several deep learning methods in detail. Furthermore, the analysis was performed on tweets of different lengths to test whether the proposed model provides optimized results. Finally, Section 5.4 shows the efficiency of the model based on the loss and accuracy of training and validation.
Comparison of word embeddings
The proposed CoBiCo Model has been used for the classification of sentiments. The overall performance of the word embedding is assessed using a weighted average of
Performance of word embeddings
Experiments were performed to evaluate the overall classification efficiency of the Word2Vec, Keras, and FastText embedding methods on the dataset. The performance of the different word embeddings is presented in Table 5. Word2Vec embedding was observed to be less efficient and less accurate than the other two methods. The FastText embedding method attained the best performance. Compared with Word2Vec, Keras embedding has shown improved efficiency by 2.2%, 1.66%, 1.93% and 2.49% on
The classification efficiency of the proposed CoBiCo model has been compared with other deep learning methods such as CNN, CNN-BiLSTM, BiLSTM, and BiLSTM-Attention. Since FastText embedding showed the best performance of the three embeddings, the proposed classification model is run using FastText embedding. The observations in Table 6 show that the proposed model surpassed the other deep learning models in performance.
Performance comparison of deep learning methods
Performance comparison of CNN and proposed CoBiCo.
CNN vs. Proposed CoBiCo: In this experiment, the emotion classification of airline review text using the proposed model is compared to that of a single-layer CNN model (Fig. 4). In comparison to the suggested attention-based CoBiCo model, the single-layer CNN model produced unsatisfactory outcomes (
Performance comparison of BiLSTM and proposed CoBiCo.
Bi-LSTM vs. Proposed CoBiCo: In the next experiment, the performance of the Bi-LSTM model was analyzed and compared with that of the proposed CoBiCo model (Fig. 5). The Bi-LSTM delivers lower outcomes (
Performance comparison of CNN-BiLSTM and proposed CoBiCo.
Performance comparison of BiLSTM-attention and proposed CoBiCo.
Performance comparison of self organizing maps and proposed CoBiCo.
CNN-BiLSTM vs. Proposed CoBiCo: Furthermore, the results obtained from the proposed CoBiCo were compared with those of the CNN-BiLSTM model (Fig. 6). In comparison to the suggested technique, the CNN-BiLSTM delivers poor outcomes (
Performance comparison of restricted boltzmann machines and proposed CoBiCo.
Performance comparison of overall accuracy.
Length of reviews for positive reviews.
BiLSTM-Attention vs. Proposed CoBiCo: In the next phase of comparison, experiments were performed to compare the results of the BiLSTM-Attention model with those of the proposed CoBiCo model (Fig. 7). Compared to the suggested technique, the BiLSTM-Attention delivers poor outcomes (
Length of reviews for neutral reviews.
Length of reviews for negative reviews.
Categorization of a dataset based on varying length.
(Stacked) BiGRU vs. Proposed CoBiCo: Experiments were performed to compare the results of the (Stacked) BiGRU model with those of the proposed CoBiCo model (Fig. 8). Compared to the suggested technique, the Self Organizing Maps delivers less effective outcomes (
Tweets categories and their count
Rule sets for length based tweet categorization
Performance evaluation of different models on varying review length
Ensemble CNN-GRU vs. Proposed CoBiCo: In the final phase of comparison, experiments were performed to compare the results of the Ensemble CNN-GRU model with those of the proposed CoBiCo model (Fig. 9). Compared to the suggested technique, the BiLSTM-Attention delivers poor outcomes (
The observations in Table 6 show that, except for the proposed model, none of the models performs consistently across all classes. The experimental outcomes show that the BiLSTM model exhibits much lower performance for negative class classification, whereas CNN-BiLSTM shows degraded performance for neutral class classification. The BiLSTM-Attention model shows relatively lower performance for positive class classification. Finally, the consistent and highest level of performance of the proposed model with
The overall classification accuracy of the different deep learning methods is shown in Fig. 10. The CNN method achieves 85.63% accuracy and BiLSTM achieves 88.91%, whereas the CNN-BiLSTM and BiLSTM-Attention methods achieve 89.36% and 91.89% accuracy, respectively. The proposed CoBiCo model outperforms all other models with an accuracy of 96.56%.
It is also worth noting that the dataset includes comments of varying lengths. The shortest review is one word long, while the longest is 200 words long. The average review length for the whole corpus is 21.02 words. The class-wise length description is depicted in Figs. 11–13.
Performance comparison of the proposed model on different varying length. (a) Very short length. (b) Short length. (c) Medium length. (d) Long length. (e) Very long length.
Figure 11 represents the length variation for positive class reviews. Figures 12 and 13 show the length variation in neutral and negative class tweets, respectively. The negative reviews are observed to be the longest, whereas the positive reviews have the lowest length. The average length of positive reviews is 14.05 words and the average length of neutral tweets is 13.55 words. The negative reviews have the highest average length of 24.21 words.
Based on the observations in the varying length of the tweets, further analysis is performed to investigate the efficiency of the proposed model on the distinct datasets extracted from the original dataset. The Gaussian function is used to distribute the dataset into five different categories of tweet-length as shown in Fig. 14. Table 7 shows the number of tweets based on the varying lengths.
Based on observations from Fig. 14, the following fuzzy rules are designed for analysis of the performance of the proposed model. The rules are described in Table 8.
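A length-based categorization with Gaussian membership functions can be sketched as below. The centres and widths of the five categories are hypothetical placeholders: the actual parameters behind Fig. 14 and the rules of Table 8 are not reproduced here, so this only illustrates the general technique of assigning a tweet to the category with the highest membership degree.

```python
import math
from typing import Dict, Tuple

# Hypothetical (centre, width) pairs in words; not the values used in the paper.
CATEGORIES: Dict[str, Tuple[float, float]] = {
    "very short": (3.0, 3.0),
    "short": (10.0, 5.0),
    "medium": (25.0, 8.0),
    "long": (60.0, 20.0),
    "very long": (150.0, 40.0),
}


def gaussian_membership(length: float, centre: float, sigma: float) -> float:
    """Degree (between 0 and 1) to which a length belongs to a category."""
    return math.exp(-((length - centre) ** 2) / (2 * sigma ** 2))


def categorize(length: float) -> str:
    """Assign the category whose Gaussian membership is highest."""
    return max(CATEGORIES,
               key=lambda name: gaussian_membership(length, *CATEGORIES[name]))


for n_words in (1, 25, 200):
    print(n_words, categorize(n_words))
# 1 very short
# 25 medium
# 200 very long
```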
Comparison of the proposed model for accuracy and loss using ROC curves. (a) Training accuracy. (b) Training loss. (c) Validation accuracy. (d) Validation loss.
The performance evaluation of the different deep learning models is shown in Table 9(a)–(e). In Table 9(a), the CNN model shows the lowest performance of all, with an accuracy of 76.59%, while the proposed model performs excellently with an accuracy of 96.90%. In Tables 9(b) and 9(c), the proposed model shows an accuracy of 96.75% and 96.66% for short and medium length tweets, respectively. Tables 9(d) and 9(e) show that the performance of the CNN model degrades significantly for longer reviews, as its accuracy drops to 69.31% and 66.32% for long and very long reviews. The confusion matrix parameters precision, recall, and F-measure also degrade significantly, to 65.32%, 66.01%, and 65.66% respectively, for the very long category of tweet reviews. Tables 9(d) and 9(e) demonstrate that the BiLSTM model performs better than CNN-BiLSTM on the longer categories of reviews, whereas Table 9(a)–(c) clearly shows that CNN-BiLSTM performs better than the BiLSTM model. The BiLSTM-Model shows consistent performance for all length categories of reviews.
Figure 15(a)–(e) shows the performance comparison of several deep learning methods based on the observations for varying lengths of reviews. The observations show the consistency of the proposed model in all categories of lengths and state the effectiveness of the model.
To verify the correctness and feasibility of the proposed model, a comparison of training accuracy and training loss was performed among the deep learning models. A receiver operating characteristic (ROC) curve graph [49] is used to illustrate the evaluation results. In the ROC curve graph of Fig. 16(a), the X-axis denotes the number of epochs and the Y-axis denotes the training accuracy of the model. The area under the curve is a number that ranges from 0 to 1; ROC values near 1 imply that the model is performing well. As observed in the figure, the performance of the proposed model is better than that of the other models. Figure 16(b) shows the training loss of all the models considered for comparison. The proposed model shows promising results in terms of training loss.
In Fig. 16(c) and 16(d), it is noticeable that the proposed model exhibited better performance than the other deep learning models. The observations obtained from the ROC curves in Fig. 16(a) to 16(d) demonstrate that the proposed model achieves the highest accuracy and the lowest losses in training and validation.
Conclusion
This research provided a sentiment analyzer for extracting people’s views on Indian Airlines’ services stated on social media. Word2Vec, Keras, and FastText were the three pre-trained word embedding approaches that were trained and tested, and FastText performed the best in terms of word vectorization accuracy. Further experiments on deep learning models were implemented using FastText embeddings as it has shown the highest accuracy. The research findings revealed that the proposed CoBiCo model worked admirably on the obtained dataset, even beating the baseline classifier. The proposed model achieved an F-measure of 93.75% for the classification of positive reviews, 96.76% for neutral, and 94.28% for negative reviews classification. The overall accuracy achieved from the proposed model is 96.56%, which significantly outperforms other deep learning models. The results show that other models did not perform consistently for positive, negative, and neutral classes and the proposed model proved to be consistent. Considering the varying nature in the length of reviews, the experiments were further performed on the dataset using lower review length, medium review length, and higher review length. The findings of experiments show the effectiveness and validity of the proposed model since the model performed consistently with an F-measure of 96.17% for lower review length, 96.34% for medium review length, and 96.01% for higher review length. The proposed model achieves the highest accuracy and lowest loss in training and validation. The outcomes confirmed the utility of the proposed CoBiCo model as a viable option for addressing users’ emotions expressed for services on social media.
