Abstract
Customer feedback is useful for product development and can increase the sales of a product. User reviews on e-commerce websites provide valuable information about the product. Sentiment analysis of text reviews helps to analyze users’ sentiment about the product and to predict its sales. Existing sentiment analysis techniques use Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models, which suffer from the vanishing gradient and overfitting problems. In this study, a Word Embedding Attention-Bi-directional LSTM (WEA-BiLSTM) model is used for the sentiment analysis of user reviews. Initially, the Amazon review dataset is collected and processed in the word embedding stage. The CNN is utilized to extract features from the input dataset, which helps to provide higher performance for a smaller amount of training data. The Word Embedding Attention (WEA) technique gives higher weight to words having a strong relation with a class. This increases the class-wise performance of the model, and thus the precision and recall values. Finally, the WEA technique is used in a BiLSTM to increase the classification performance. The Balanced Cross-Entropy is proposed to maintain the gradient and solve the vanishing gradient problem in the network. The WEA-BiLSTM model achieves 97.4% accuracy and 86.8% precision, whereas the existing CNN model achieves 97.1% accuracy and 85.4% precision in sentiment analysis.
Introduction
Consumer feedback is useful business information for assessing quality and improving products for the benefit of customers, and it gives an idea of what customers expect from new products. Various deep learning techniques were applied to accurately predict customers’ opinions from mobile phone reviews of Amazon products [1]. On e-commerce websites such as Flipkart, eBay, and Amazon, and on review websites such as Yelp, Rotten Tomatoes, and Trip Advisor, thousands of users leave reviews about products and services. Some users leave reviews about products or services on social media. Therefore, customer reviews and feedback shared on a common platform about services or products influence new customers’ perspectives towards institutions, organizations, services, and products [2]. In natural language processing, one of the basic tasks is to learn low-dimensional word vector representations from a large dataset. Existing word embedding techniques learn word vectors from contextual semantic information and grammar while ignoring the sentiment information of words. Some techniques that use sentiment information in reviews do not consider certain words in various domains [3]. Sentiment analysis combines text mining, computational linguistics, and natural language processing to evaluate, calibrate, derive and analyze textual data in terms of documents, phrases, sentences, etc. Recently, natural language processing has gained considerable attention due to its efficiency in extracting useful information [4]. Sentiment analysis aims to classify the given text into three categories of user emotions: Positive, Neutral, and Negative [5].
Sentiment analysis is applied to input text to determine its sentiment polarity: positive, neutral, or negative. Online merchants and retailers now ask their customers to share their opinions about products. Consequently, more opinions are created every day, and it is hard to process this data to extract useful information from customer reviews [6]. Recently, deep learning technologies have been applied to sentiment analysis and have provided more promising results. Various neural network-based architectures such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) were applied to improve the efficiency of sentiment analysis [7]. Deep learning models such as the Convolutional Neural Network (CNN) and LSTM provide higher efficiency in sentiment analysis [8, 9, 10]. The objectives and contributions of this study are given below:
The word embedding technique is proposed to give a higher weight value to words having a strong relation to the classes. This helps the model learn faster and increases its class-wise performance. The Balanced Cross Entropy is applied to maintain the gradient in the network, based on the loss calculation, and so reduce the vanishing gradient problem. The CNN-based feature extraction technique is applied to generate more features in the convolutional layer, which helps the model provide higher performance with less training data. The Word Embedding Attention-Bi-directional Long Short-Term Memory (WEA-BiLSTM) model has higher performance in sentiment analysis classification than existing techniques; it addresses the vanishing gradient problem and decreases overfitting in the classification.
This research study is structured as follows: sentiment analysis research is reviewed in Section 2; the Word Embedding Attention (WEA) and Balanced Cross Entropy are explained in Section 3; the results of the WEA-BiLSTM model are given in Section 4; lastly, Section 5 provides the conclusion.
Sentiment analysis helps to find the response of the user related to the product and this technique can be used for sales prediction or to improve the product quality. Sentiment analysis is a challenging process in large e-commerce review data due to the presence of unstructured data. Recent research related to sentiment analysis was reviewed in this Section.
Feature extraction methods
Banbhrani et al. [11] applied the sentiment analysis technique for the prediction of review ratings based on feature extraction techniques. The model provides significant features such as number of sentences, emoticons, hashtags, elongated words, punctuation marks, numerical words, number of capitalized words, Term Frequency Inverse Document Frequency (TF-IDF), and SentiWordNet-based statistical features. Sentiment classification was performed using random multimodal deep learning and extracted features. The developed technique was a supervised technique and deep learning technique has an overfitting problem.
Wei and Song [12] constructed a classification model based on text sentiment, in which a multi-input matrix combined sentiment features as the input of a multi-channel CNN that extracts sentiment features for text sentiment classification. They also developed an image sentiment classification model using face images and global image merging. A supervision module on the CNN with weighted loss was applied to extract facial sentiment features, and the whole-image sentiment was fused with the facial target sentiment to determine image sentiment polarity. The text and image sentiment outputs were combined by a decision fusion method for classification. The fusion of features creates an overfitting problem, and irrelevant features affect the classification performance.
Shobana and Murali [13] applied the skip-gram technique for semantic and contextual feature extraction of words. LSTM model was applied for learning the complex textual pattern and the Adaptive Particle Swarm Optimization (APSO) model was applied for weight parameters. The Adaptive Particle Swarm Optimization model has a local optima trap and lower efficiency in the classification.
Feature selection methods
Zhang et al. [14] applied an attention model of interactive attributes that considers relevant attributes and their interactive relationships to enhance performance on reviews. Multiple interactive attribute encoders consider all the attributes to increase the sentiment classification performance. After the local text encoder, the multiple interactive attribute encoders were used to extract hidden information for text representations by aligning attribute features through self-attention with bilinear interaction. A multi-loss objective function was applied to improve the performance, and the model was tested on three datasets: Amazon, Yelp and IMDB. The developed technique was a supervised model and has a vanishing gradient problem. Gokalp et al. [15] applied the Iterated Greedy (IG) technique for feature selection in sentiment classification. The Iterated Greedy model was tested on the Amazon product dataset and showed significant results. The developed technique is limited by imbalanced data and overfitting.
Classification methods
Chen et al. [16] applied sentiment classification techniques that analyze users’ review habits to increase the performance of Hierarchical Neural Networks. The training set is partitioned by user, and each user’s reviews are aggregated as that user’s historical reviews. The LSTM-based Hierarchical Neural Network was applied to train on target and historical reviews to obtain document representations. The model considers the similarities between the multiple historical reviews and the document representation of the target review. The LSTM model has a vanishing gradient problem and the Hierarchical Neural Network model has an overfitting problem in classification.
Zhao et al. [17] applied an attention sharing mechanism and a parameter transferring technique for cross-domain sentiment classification; the model consists of a Target Domain Network and a Source Domain Network. Pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) and Global Vectors for Word Representation (GloVe) were applied as input to the Hierarchical Attention Network. To fine-tune the words, the parameter transferring technique at the word and sentence levels was applied from Target Domain Network to Source Domain Network. Overfitting degrades the model’s performance.
Sazzed and Jayarathna [18] applied a hybrid Self-supervised Sentiment Analyzer that uses a lexicon-based technique and a machine learning technique for sentiment classification from unlabelled data. The Lexical Rule-based Sentiment Analyzer was applied to predict the semantic orientation of each review together with a prediction confidence score. The confidence score helps to provide highly accurate pseudo labels for the Self-supervised Sentiment Analyzer, which uses machine learning techniques to improve classification performance for complex and less polarized reviews. The performance of the Self-supervised Sentiment Analyzer and the Lexical Rule-based Sentiment Analyzer was compared with existing unsupervised techniques, and they achieved higher performance.
Rezapour [19] applied three machine learning classifiers, Decision Tree, Support Vector Machine (SVM) and Naïve Bayes, for sentiment analysis. Sparse word removal eliminated almost all sentiment terms of the negative reviews, because positive reviews made up the majority of the reviews. Training the model separately for positive and negative reviews helps to increase learning capacity. Retaining stop words, using various N-grams, a separate tokenization process, and combining review titles with their contents were applied to improve model performance. The learning capacity of the model was low and the feature selection process was not effective.
Kumar et al. [20] applied a hybrid technique of a three-step semi-supervised model that jointly learns aspects and sentiment from review sentences. Each aspect of seed words in a small set was considered and constructed sentiment class for respective semantics in vocabularies of coherent class. The Part-of-Speech (POS) tags were used to construct vocabularies to label sentences subset from training datasets. The semi-automatic technique was applied to label the data and this induced noise in the label during annotation. The overfitting problem in the joint learning method degrades the performance of classification.
Geetha and Renuka [21] performed sentiment analysis on consumer review data to classify positive and negative feelings. Various models such as SVM, LSTM and Naïve Bayes were used for review classification. The deep learning technique of the BERT Base Uncased model was used to solve the problem of sentiment analysis. An improved performance was provided by the BERT model with high accuracy and good prediction than the existing technique. The training of the BERT model generates more weights that create overfitting in classification.
Dadhich and Thankachan [22] performed automatic sentiment identification using K-Nearest Neighbor, Random Forest, SentiWordNet, Logistic Regression, and Naïve Bayes on English text from product reviews of Flipkart and Amazon. Five key parameters of existing sentiment analysis methods were presented in their study, together with the Product Comment Summarizer and Analyzer system. A generic and automatic comment analyzer was used to find sentiment polarity; it summarizes and classifies the comments as neutral, negative or positive. Rakshit et al. [23] applied analytics-based statistical techniques to primary data from a sample of millennials to examine effective communication for e-marketers selling through individual human connections. Twitter data was used for sentiment analysis to analyze the performance of Amazon and Flipkart during the Big Billion Day Sales 2019 in India. The Naïve Bayes classifier applied in the model assumes that the factors are independent and has lower efficiency in learning feature importance.
Priyadarshini and Cotton [24] applied an LSTM-CNN Deep Neural Network tuned by grid search. Baseline models including CNN-LSTM, LSTM-CNN, Neural Networks, K-Nearest Neighbors, and CNN were considered in this research. The developed model displays significant improvement, but overfitting reduces its effectiveness. Zhang et al. [25] applied a hybrid model to the Amazon review dataset, in which neural networks were applied for the collaborative filtering process. This model applies embedding techniques so that it can work with a smaller amount of training data. The model is limited by overfitting, which degrades its performance.
Proposed Word Embedding Attention-Bi-directional Long Short-Term Memory (WEA-BiLSTM) model
In this research, the Amazon Review dataset is used as input and given to the CNN model for feature extraction. The attention layer assigns weight values to the features, which are then passed to the BiLSTM model for sentiment classification. The Balanced Cross-Entropy is applied to maintain the gradient in the network. The flow of the WEA-BiLSTM model for sentiment classification is illustrated in Fig. 1.
The flow of Word Embedding Attention-Bi-Directional Long Short-Term Memory (WEA-BiLSTM) model for sentiment classification.
The dataset is a large crawl of Amazon product reviews. It includes 82.83 million unique reviews from about 20 million users and covers 9.35 million items over the period from May 1996 to July 2014. Its metadata includes category, helpfulness votes, item-to-item relationships, price, product image, reviews and ratings, sales rank, and timestamps.
Word embedding
Word embeddings can be computed in various ways. Each word is placed in a multi-dimensional vector space, and the data scientist decides how many dimensions this space has, typically by experimenting with various dimensions to see which yields the best outcome. Word embeddings are dense vectors with far fewer dimensions than sparse one-hot representations. In addition, the direction and distance of the vectors represent the semantic connections between words. Because it has a larger vocabulary, the embedding technique tends to assign representations even to words that are only occasionally encountered in training.
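As a small illustration of how vector direction can encode a semantic connection, the sketch below compares toy embeddings with cosine similarity. The three 4-dimensional vectors are invented for illustration only and are not taken from the dataset or from any trained model.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings (assumed values, illustration only).
embeddings = {
    "good": np.array([0.9, 0.1, 0.3, 0.0]),
    "great": np.array([0.8, 0.2, 0.4, 0.1]),
    "poor": np.array([-0.7, 0.1, 0.2, 0.9]),
}

sim_close = cosine_similarity(embeddings["good"], embeddings["great"])
sim_far = cosine_similarity(embeddings["good"], embeddings["poor"])
print(f"good vs great: {sim_close:.3f}")
print(f"good vs poor:  {sim_far:.3f}")
```

With these assumed vectors, the pair pointing in similar directions scores higher than the pair pointing in different directions.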
Existing pre-trained word embeddings often perform poorly in sentiment analysis tasks because some sentiment words have similar syntactic and semantic characteristics in the corpus. This study proposes a word embedding technique to enhance the efficiency of sentence-level sentiment classification. This sentiment enhancement technique fully utilizes the mapping between word embeddings and the corresponding sentiment orientations. Words are first transformed into word embeddings, and an emotion mapping vector is then applied to each word embedding. The word embeddings and their emotion mapping vectors are then combined to create sentiment-enhanced representations. The predicted sentiment orientations are obtained after reducing the dimensions of these representations through a fully connected layer. This output is then passed to the feature extraction stage, where the CNN is employed.
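The steps in the paragraph above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact method of this study: the combination is assumed to be a concatenation, all weights are random rather than trained, and the names `emotion_vectors`, `sentiment_enhanced`, and the chosen dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "battery", "is", "great", "terrible"]
embed_dim, senti_dim, n_classes = 8, 3, 3  # assumed sizes

# Randomly initialised tables stand in for trained parameters.
word_embeddings = rng.normal(size=(len(vocab), embed_dim))
emotion_vectors = rng.normal(size=(len(vocab), senti_dim))  # hypothetical emotion mapping vectors
W = rng.normal(size=(embed_dim + senti_dim, n_classes)) * 0.1
b = np.zeros(n_classes)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sentiment_enhanced(token_ids):
    """Combine each word embedding with its emotion vector, then project."""
    combined = np.concatenate(
        [word_embeddings[token_ids], emotion_vectors[token_ids]], axis=1
    )
    # The fully connected layer reduces the combined vector to class scores.
    orientation = softmax(combined @ W + b)
    return combined, orientation

token_ids = np.array([vocab.index(w) for w in ["the", "battery", "is", "great"]])
combined, orientation = sentiment_enhanced(token_ids)
print(combined.shape, orientation.shape)
```

In this sketch `combined` is what would be handed to the CNN stage, while `orientation` is a per-word distribution over the three polarity classes.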
Convolutional Neural Network (CNN)
CNN is excellent at extracting features from images and has been demonstrated to be highly effective at identifying patterns that are challenging to identify using conventional techniques. CNN employs a feature extractor in the training process. The feature extractor is made up of special types of neural network layers, the weights of which are determined during training. In the feature extraction process, multiple convolution layers are followed by max-pooling and an activation function [26]. A crucial part of the CNN architecture is the convolution layer, which carries out feature extraction. Feature extraction usually entails combining linear and nonlinear operations, such as the convolution operation and the activation function [27]. Comprehending images comes naturally to humans, but it poses many hidden difficulties for a machine. The CNN is a deep learning model that aims to mimic human vision and is inspired by the visual cortex of the brain. In image processing, which includes detection, localization, segmentation, and classification, among other tasks, CNNs represent a significant advancement.
Architecture of Convolutional Neural Network (CNN) for feature extraction.
The major factor driving the widespread use of the CNN is its effectiveness in classification. The CNN implements convolutional layers with learnable weights and biases, similar to real neurons. As shown in Fig. 2, the fundamental components of the CNN are convolutional layers, activation functions, and fully connected layers. This article provides a succinct explanation of the CNN; a thorough description is given in [28].
Convolutional Layer: The visual system of brain contains neuronal cells that are involved in retrieving image features [29]. Each neural cell extracts different features that aid in the comprehension of an image. Convolutional layers are used to simulate neuronal cells, which allows for the extraction of properties including gradient perception, texture, colors, and edges. Convolutional filters, also known as kernels, have a size of
Activation Function: Because non-linearity predominates in real data, a non-linear data transformation is applied. This ensures that the input space can be mapped to diverse output spaces as required.
This requires real-value number
In a non-linear function, the real value number
Pooling: Pooling performs a non-linear down-sampling of the combined features. The resulting dimensionality reduction means that data processing requires fewer computational resources. To lower the spatial size, data are aggregated based on feature type or space; this makes the features more robust to rotation and translation of images and regulates overfitting. The input is divided into a set of rectangular patches and, depending on the pooling procedure, a single value is computed to replace every patch. Maximum and average pooling are the two most widely used types of pooling [31].
Fully Connected Layer: Every input is connected to every node in the subsequent layer, and weight values are assigned to these connections, as in a conventional neural network [32]. The model’s final result is the sum of the inputs multiplied by their respective weights. To carry out the classification task, the fully connected layer is coupled to the sigmoid activation function. The fully connected layer of the CNN is displayed in Fig. 3.
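The four components described above (convolution, activation, pooling, and a fully connected layer with sigmoid) can be chained as in the sketch below. It is a minimal illustration for text input, assuming one kernel per region size, ReLU as the activation, random untrained weights, and hypothetical region sizes of 2, 3, and 4 tokens; none of these choices is confirmed by the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_text(x, kernel, bias=0.0):
    """Slide a (region, dim) kernel over the token axis of x (n_tokens, dim)."""
    region = kernel.shape[0]
    n_out = x.shape[0] - region + 1
    return np.array(
        [np.sum(x[i:i + region] * kernel) + bias for i in range(n_out)]
    )

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cnn_forward(x, kernels, w_fc, b_fc):
    # Convolution -> activation -> max-pooling, once per kernel.
    pooled = np.array([relu(conv1d_text(x, k)).max() for k in kernels])
    # Fully connected layer: weighted sum of pooled features, then sigmoid.
    return sigmoid(pooled @ w_fc + b_fc), pooled

n_tokens, dim = 10, 6                      # assumed input size
x = rng.normal(size=(n_tokens, dim))       # stand-in for embedded review text
kernels = [rng.normal(size=(r, dim)) * 0.1 for r in (2, 3, 4)]
w_fc = rng.normal(size=len(kernels))
prob, pooled = cnn_forward(x, kernels, w_fc, b_fc=0.0)
print(pooled, prob)
```

Max-pooling keeps one value per feature map, so the fully connected layer receives a fixed-length vector regardless of the review length. Replacing `.max()` with `.mean()` would give average pooling instead.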
Convolutional Neural Network (CNN) with fully connected layer.
Three region CNN networks are applied in a weighted word representation layer to use knowledge of the text. The word representation is given as input and applied to a convolutional layer built on every tweet.
where word vector
Here, max-pooling layer is applied in the feature map to denote
The max-pooling attention calculation is given in Eq. (5) above and generated attention score
The attention score output is applied in BiLSTM to learn the feature context. The bi-LSTM model performs sequential maps to generate final features. The final feature context
The LSTM model is widely used in sentiment prediction because it provides superior performance on data sequences [33]. The input
The Long Short-Term Memory unit consists of input, output and forget gate in Bi-directional Long Short-Term Memory (BiLSTM).
Forget gate
Element-wise product of cell state is applied with forget gate
The output gate
Equation (11) is used to compute the hidden state
The prediction is performed using LSTM model, this model uses input as historic capacity data
Two LSTM layers are applied to learn the past and future context of each token in the sequence. One LSTM processes the sequence from left to right and the other from right to left, as in Fig. 5. A hidden forward layer and
The bi-directional network is transformed into high-level sentiment representation in fully connected dense layer to predict text sentiment polarity. Equation (12) provides the output.
where BiLSTM network [34, 35] provides feature map
Attention layer in Bi-directional Long Short-Term Memory (BiLSTM) model for sentiment classification.
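The gate computations and the two-direction pass described in this section can be sketched as below. This follows the standard LSTM formulation (input, forget, and output gates plus a candidate cell state), which is assumed here to match the unit used in this study. The weights are random and untrained, and the helper names `init_params`, `lstm_step`, `run_lstm`, and `bilstm` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_dim, hidden, rng):
    """Stacked weights for the four gates: input, forget, output, candidate."""
    W = rng.normal(size=(4 * hidden, input_dim)) * 0.1
    U = rng.normal(size=(4 * hidden, hidden)) * 0.1
    b = np.zeros(4 * hidden)
    return W, U, b

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:hidden])               # input gate
    f = sigmoid(z[hidden:2 * hidden])      # forget gate
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
    g = np.tanh(z[3 * hidden:])            # candidate cell state
    c = f * c_prev + i * g                 # element-wise cell update
    h = o * np.tanh(c)                     # hidden state
    return h, c

def run_lstm(xs, params):
    hidden = params[1].shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, *params)
        outputs.append(h)
    return np.stack(outputs)

def bilstm(xs, fwd_params, bwd_params):
    fwd = run_lstm(xs, fwd_params)               # left to right
    bwd = run_lstm(xs[::-1], bwd_params)[::-1]   # right to left, re-aligned
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
seq_len, input_dim, hidden = 7, 5, 4             # assumed sizes
xs = rng.normal(size=(seq_len, input_dim))
fwd_params = init_params(input_dim, hidden, rng)
bwd_params = init_params(input_dim, hidden, rng)
out = bilstm(xs, fwd_params, bwd_params)
print(out.shape)
```

Each row of `out` concatenates the forward hidden state (past context) and the backward hidden state (future context) for one token, which is the representation a dense layer would then map to sentiment polarity.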
The deep neural network baseline is optimized with two distinct losses, the distillation loss and the standard softmax cross-entropy, together with a small replay memory.
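A rough sketch of combining these two losses is given below. It assumes a common formulation in which the distillation term is the cross-entropy between temperature-softened outputs of a previous model and of the current model. The temperature, the `distill_weight` factor, and the function names are assumptions for illustration and are not taken from this study.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Standard softmax cross-entropy against integer class labels."""
    p = softmax(logits)
    picked = p[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def distillation_loss(new_logits, old_logits, temperature=2.0):
    """Cross-entropy between softened old-model and new-model outputs."""
    p_old = softmax(old_logits, temperature)
    log_p_new = np.log(softmax(new_logits, temperature) + 1e-12)
    return float(-np.mean(np.sum(p_old * log_p_new, axis=1)))

def total_loss(new_logits, old_logits, labels, distill_weight=1.0):
    # distill_weight is a hypothetical balancing factor.
    ce = cross_entropy(new_logits, labels)
    kd = distillation_loss(new_logits, old_logits)
    return ce + distill_weight * kd

# Toy logits for 3 samples and 3 classes (assumed values).
old_logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3], [-0.5, 0.2, 2.2]])
new_logits = np.array([[1.8, 0.7, -0.9], [0.0, 1.2, 0.6], [-0.2, 0.1, 1.9]])
labels = np.array([0, 1, 2])
loss = total_loss(new_logits, old_logits, labels)
print(f"total loss: {loss:.4f}")
```

The distillation term is smallest when the current model reproduces the previous model's softened outputs, while the cross-entropy term pulls the predictions toward the ground-truth labels.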
Model is trained using total loss
where
At each incremental step
where associated ground truth label is
The Word Embedding Attention-Bi-directional Long Short-Term Memory (WEA-BiLSTM) model is applied on the Amazon dataset for the sentiment analysis and compared with other methods. The performance and error metrics were measured on the results of WEA-BiLSTM model for comparison.
Performance of WEA-BiLSTM model for various iterations
The WEA-BiLSTM model is applied for sentiment analysis and its performance is measured for various iterations, as shown in Table 1. This shows that the model has higher accuracy due to its capacity to detect sentiment. The attention technique uses word embedding for the convolution process, which helps to provide a better feature representation. The recall value of the model is low due to the presence of text information related to sarcasm and distinguishing features. At the 400th iteration, the WEA-BiLSTM model accuracy is 95.6%, precision is 86%, recall is 76%, and F-Measure is 78.7%.
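For reference, the four metrics reported here can be computed as in the sketch below. It assumes macro-averaging over the classes, which is one common convention; the averaging actually used for the reported figures is not stated. The labels are toy values invented for illustration.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall, and F-measure."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precisions, recalls, f_measures = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f_measures.append(f)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(np.mean(precisions)),
        "recall": float(np.mean(recalls)),
        "f_measure": float(np.mean(f_measures)),
    }

# Toy labels: 0 = negative, 1 = neutral, 2 = positive (assumed encoding).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = classification_metrics(y_true, y_pred, n_classes=3)
print(metrics)
```

Under macro-averaging, the F-measure is averaged per class, so it need not equal the harmonic mean of the averaged precision and recall.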
Accuracy and F-measure of WEA-BiLSTM model for various iterations.
Precision and recall of WEA-BiLSTM model for various iterations.
The accuracy and F-measure of the WEA-BiLSTM model are measured in sentiment analysis for various iterations, as shown in Fig. 6. This shows that the WEA-BiLSTM model has high accuracy but a considerably lower F-measure. The WEA-BiLSTM model has lower performance in the negative class, and this reduces the F-measure of the model. The results in the negative class are lower because of sarcastic content in the user reviews.
The WEA-BiLSTM model classifier comparison on sentiment analysis.
The precision and recall of the WEA-BiLSTM model are measured for various iterations, as shown in Fig. 7. The WEA-BiLSTM model achieves higher precision and recall values within 50 iterations. The WEA technique speeds up the learning of the BiLSTM model based on the weight values given to the input text. The WEA technique gives higher weight to words having a strong relation with a class, and this helps to enhance the model’s effectiveness.
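The weighting idea behind WEA can be illustrated with a generic softmax attention step, sketched below. The relevance scores and feature values are invented for illustration, and the way this study derives its attention scores (from max-pooled CNN features) is not reproduced here. The function name `attention_weighting` is hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

def attention_weighting(features, scores):
    """Scale each token's features by its normalised attention weight."""
    weights = softmax(scores)
    weighted = features * weights[:, None]  # per-token re-weighting
    context = weighted.sum(axis=0)          # weighted summary vector
    return weights, weighted, context

tokens = ["the", "screen", "is", "excellent"]
# Assumed per-token feature vectors and relevance scores.
features = np.array([[0.1, 0.0], [0.4, 0.2], [0.0, 0.1], [0.9, 0.8]])
scores = np.array([0.1, 1.0, 0.1, 3.0])

weights, weighted, context = attention_weighting(features, scores)
for tok, w in zip(tokens, weights):
    print(f"{tok:>10s}: {w:.3f}")
```

Tokens with larger scores receive larger weights, so their features dominate what the subsequent BiLSTM layer sees.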
The WEA-BiLSTM classifier comparison on sentiment analysis
The WEA-BiLSTM model performance for various categories
WEA-BiLSTM precision and recall on sentiment analysis.
The performance of the WEA-BiLSTM model is compared with other classifiers, as displayed in Table 2 and Fig. 8. WEA with CNN can generate deep features and give higher weight to words having a strong relation with a class. The existing SVM model has an imbalanced data problem, and the LSTM model has a vanishing gradient problem. The APSO-LSTM model has a local optima trap and misses some potential features for the classification. The CNN model has an overfitting issue because of the additional features formed in the extraction process. The WEA-BiLSTM model has 97.4% accuracy and 86.8% precision, and the existing CNN has 97.1% accuracy and 85.4% precision in sentiment analysis.
The precision and recall values of the WEA-BiLSTM model are calculated and compared with existing classifiers on sentiment analysis, as shown in Fig. 9. The WEA-BiLSTM model uses word embedding to give a higher weight value to words having a strong relation with dataset categories. This increases WEA-BiLSTM model performance related to the product category and increases the precision and recall values. The existing CNN model has lower efficiency because of the additional features formed in the convolutional layer. LSTM has a vanishing gradient problem, and this degrades the model performance.
The WEA-BiLSTM model performance is evaluated in terms of accuracy and F-measure for several categories in the dataset, as shown in Table 3 and Fig. 10. The WEA-BiLSTM has higher performance in every category in the dataset. The WEA-BiLSTM model applies the word embedding technique to give a higher weight value to words having a strong relation with classes. The features extracted by the CNN and the weight values speed up the learning of the LSTM model in classification. The balanced cross-entropy maintains the gradient values in the model, which helps to overcome the vanishing gradient problem. The WEA-BiLSTM model has 93.85% accuracy and 93.5% precision, and the CNN model has 91.95% accuracy and 91.5% precision.
The precision and recall value of WEA-BiLSTM model for various categories in the dataset is shown in Fig. 11. The WEA-BiLSTM model has higher precision and recall value than existing techniques in sentiment analysis. The WEA-BiLSTM model applies a higher weight value for the words having strong relation with classes. This helps to increase the performance of the model class-wise, thus increasing the precision and recall value.
WEA-BiLSTM model Mean Absolute Error (MAE) on sentiment analysis
The WEA-BiLSTM model accuracy and F-measure for several categories.
The Mean Absolute Error (MAE) of the WEA-BiLSTM model for various categories of the Amazon dataset is calculated and compared with conventional methods, as displayed in Table 4 and Fig. 12. The WEA-BiLSTM performs better than the existing methods. The WEA technique gives a higher weight value to words having a strong relation with classes. The balanced cross-entropy maintains the gradient value in the network and avoids the vanishing gradient problem.
WEA-BiLSTM Root Mean Square Error (RMSE) on sentiment analysis
The WEA-BiLSTM model precision and recall for various categories.
Mean Absolute Error (MAE) of WEA-BiLSTM model on sentiment analysis.
WEA-BiLSTM model for sentiment analysis
Root Mean Square Error (RMSE) of WEA-BiLSTM model on sentiment analysis.
The WEA-BiLSTM model for user cold start.
The WEA-BiLSTM model for item cold start.
WEA-BiLSTM model for user Item cold start.
The Root Mean Square Error (RMSE) of the WEA-BiLSTM model is measured for various categories of the dataset, as displayed in Table 5 and Fig. 13. The WEA-BiLSTM shows better class-wise output due to the weight values assigned in relation to the product category. The conventional CNN suffers from overfitting due to the generation of more features in the convolutional layers.
The WEA-BiLSTM model is evaluated with Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) values for various cold start problems, as shown in Table 6. In the user cold start there is little information about the user, and in the item cold start there is little information about the products. The WEA-BiLSTM model uses CNN-based feature extraction, which helps to extract deep features and give higher weight values to words having a strong relation.
The WEA-BiLSTM model performance is measured with Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) for the user, item, and user-item cold start settings, as shown in Figs 14–16, respectively. The WEA-BiLSTM achieved better classification results than conventional models on sentiment analysis. The WEA-BiLSTM model has higher efficiency for a smaller amount of training data because it uses CNN feature extraction for classification.
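For reference, the two error metrics used in this section follow their standard definitions, sketched below. The rating values are invented for illustration and do not come from the reported tables.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average magnitude of the prediction errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared error; penalises large errors more."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy ratings on a 1-5 scale (assumed values).
y_true = [5, 3, 4, 1]
y_pred = [4, 3, 5, 3]
mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_square_error(y_true, y_pred)
print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```

RMSE is never smaller than MAE on the same predictions, and the gap widens when a few errors are much larger than the rest.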
Sentiment analysis of reviews reveals users’ sentiment about a product, which is useful for product development. Existing LSTM and CNN based models applied in sentiment analysis are limited by the vanishing gradient and overfitting problems in classification. This study proposed the WEA technique to give higher weight values to words having a strong relation with classes. The CNN model is applied for feature extraction because it generates more features in the convolutional layer. The Balanced Cross Entropy is applied to maintain the gradient in the network and solve the vanishing gradient problem. The CNN feature extraction helps to provide higher classification performance for a smaller amount of training data. Future work comprises applying a feature selection method to select relevant features and achieve higher performance on imbalanced data.
Funding
This research received no external funding.
Data availability
The datasets generated during and/or analysed during the current study are available in the Amazon Review datasets repository: https://www.kaggle.com/datasets/bittlingmayer/amazonreviews.
Conflict of interest
The authors declare that they have no conflict of interest.