Abstract
Text sentiment analysis is the process of predicting whether a segment of text carries opinionated or objective content and of analyzing the polarity of the text's sentiment. Understanding the needs and behavior of the target customer plays a vital role in the success of a business, so sentiment analysis helps the marketer improve the quality of the product and helps the shopper buy the right product. Due to its automatic learning capability, deep learning is a current research interest in natural language processing. The skip-gram architecture is used in the proposed model for better extraction of the semantic relationships and contextual information of words. The main contribution of this work, however, is an Adaptive Particle Swarm Optimization (APSO) algorithm based LSTM for sentiment analysis. LSTM is used in the proposed model to understand complex patterns in textual data. To improve the performance of the LSTM, its weight parameters are tuned by the Adaptive PSO algorithm. The opposition-based learning (OBL) method combined with the PSO algorithm forms the Adaptive Particle Swarm Optimization (APSO) algorithm, which assists the LSTM in selecting optimal weights for the environment in fewer iterations. APSO-LSTM's ability to adjust attributes such as weights and learning rates, combined with good hyperparameter choices, leads to improved accuracy and reduced loss. Extensive experiments conducted on four datasets showed that the proposed APSO-LSTM model achieved higher accuracy than classical methods such as traditional LSTM, ANN, and SVM. According to the simulation results, the proposed model outperforms the other existing models.
Introduction
There has been a significant increase in the number of Web 2.0 applications, such as online social networks and e-commerce sites, where customers freely express their opinions. Because of this expansion, a large amount of information is created. Accordingly, sentiment analysis was introduced as a tool for automatically extracting insightful and valuable information from user-generated data [1]. Sentiment analysis is one of the major natural language processing (NLP) tasks. Due to its usefulness and challenges, this field has attracted numerous researchers and professional communities [2, 3]. With the support of social media, people can share information as well as messages, opinions, and ideas. A huge number of people use websites such as Facebook, Twitter, WhatsApp, Instagram, Google Plus, and LinkedIn to express their opinions. At present about 2.46 billion people use social media, with an anticipated increase to 3.02 billion by 2021 [4].
They also share their daily life events, which leads to the collection of large and varied types of data. People like to share their experience of a specific product through posts, likes, and reviews, which gives companies a chance to gather this information and analyze the popularity of their products and services [5]. The way media is created and distributed through the uncontrolled sharing and spreading of messages is a revolution. Moreover, social media has an important impact on the business, advertising, and e-commerce industries, as it reflects consumers' behavior and perceptions of specific business plans, services, and products. Monitoring social media activity is an excellent way to measure customer loyalty, track customers' impressions of products or brands, assess the impact of campaigns and the success of marketing messages, and identify the influencers most relevant to a campaign, product, or brand. Thus, social media is recognized as the next logical marketing platform [6, 7].
Specifically, comments on products in tweets merit mining. Vendors can obtain purchasers' feedback in real time and then update their products to be more competitive in the marketplace; buyers can draw on the experience of others through these comments to decide whether to purchase a product [8]. Real-time tweets have a large influence on network transmission [9]. The opinions are also important for organizations that want to know what customers are saying, whether positive or negative. If the opinion is negative, they can improve product quality and satisfy the customers. Although finding the customer's opinion is a difficult task, many machine learning algorithms and feature extraction methods have been developed for sentiment analysis. The main contributions of the proposed work are summarized as follows. We propose the Adaptive Particle Swarm Optimization based LSTM model (APSO-LSTM) for effective sentiment analysis. As the APSO algorithm selects the weight parameters of the LSTM, computational complexity is reduced and accuracy is improved. To obtain higher overall accuracy, the skip-gram word embedding method is used; the skip-gram word2vec representation achieves superior results over other word representations. The evaluation carried out on four datasets shows that our APSO-LSTM model is effective. Furthermore, we analyze different metrics such as accuracy, recall, precision, and F-measure. The results are supported by the conducted experiments.
Literature review
Numerous machine learning algorithms are used for sentiment analysis. In [10], sentiment analysis was developed using the Naive Bayes classifier to identify whether a particular sentence is positive, negative, or neutral. In [2], sentiment analysis of online movie reviews using an enhanced k-nearest neighbor classifier was explained. In [11], sentiment analysis of customer product reviews using a support vector machine was analyzed. Moreover, deep neural networks (DNNs) have recently attained large gains in different NLP tasks, for example language modeling [12], sentiment analysis [13], syntactic parsing [14], and machine translation [15]. A recurrent neural network (RNN) is a special kind of neural network in which connections between units form a directed cycle, which allows the model to exhibit dynamic temporal behavior. One of the special variants of the RNN is the LSTM. Many researchers have recently applied LSTM to sentiment analysis, but the accuracy achieved in this area is still not sufficient. Therefore, an efficient sentiment analysis methodology is presented in this work.
The purpose of the proposed methodology is the automatic prediction of customer opinion about different products. The proposed model has three modules, namely pre-processing of text, implicit and explicit feature engineering, and sentiment-based polarity prediction. Reviews are first gathered from social media, and pre-processing is then applied to enhance the data quality. Feature engineering is done using skip-gram based word embedding. Then, the extracted features are given to the LSTM network. The remainder of the paper is organized as follows: in Section 2, some of the existing literature is discussed.
Many researchers have developed customer review based sentiment analysis; some of these works are discussed in this section. Sheng et al. [16] explained the detection of rumors based on consumer opinion. For the prediction process, they used a convolutional neural network with LSTM (CNN-LSTM), in which the LSTM was incorporated into the pooling layer of the CNN. Perception was also added as an important element of the rumor detection model. The effectiveness of this method was verified by experiment. Moreover, Zhigang et al. [17] explained a stock closing price forecast based on LSTM and sentiment analysis. First, they included investors' perceptions in stock forecasting, which improved the forecasting accuracy of the model. Second, because the stock price sequence is a complex time series with fluctuations at varying time scales, which makes accurate forecasting more challenging, they introduced a gradual decomposition of the complex stock price sequence using empirical mode decomposition (EMD), which attained better predictive accuracy. Third, they adopted LSTM because of its merit in evaluating the relationships between time-series data via its memory function. The experimental results showed that their proposed model not only improves accuracy but also reduces the delay.
Xiangua Fu et al. [18] described a lexicon-enhanced LSTM using an attention mechanism. The research focused on improving word embedding quality with a sentiment lexicon and thereby improving the sentiment classifier's accuracy. They carried out experiments on three English datasets, namely IMDB, Yelp2013, and MR, and two Chinese datasets, namely NB4000 and Book 4000. When compared with ALE-LSTM and WALE-LSTM, the proposed method obtained higher accuracies of 89%, 60.6%, 79.9%, 93%, and 96%. Guixian Xu et al. [19] proposed a BiLSTM model for sentiment analysis of comment text. Sentiment information concatenated with the traditional TF-IDF word representation is used in this work. The ReLU activation function is used, with the help of a gating mechanism, to overcome overfitting and vanishing gradient problems.
Similarly, Latif et al. [20] explained an ensemble approach for sentiment analysis. They introduced a combination of three kinds of features, namely unigrams, lexicon features, and phrases. Then, two-level ensembles were introduced for feature selection using Logistic Regression (LR), Gini Index (GI), SVM, and Information Gain (IG). Finally, the classification was done by SVM. They evaluated reviews of different products, namely books, DVDs, electronics, kitchen items, and movies. The introduced method attained maximum accuracies of 81.85%, 91.45%, 89.70%, 93.05%, and 97.60% for electronics, DVDs, books, kitchen items, and movie reviews, respectively.
In [21], Graber et al. explained aspect-based opinion mining of drug reviews. Initially, SA was performed to predict the sentiments concerning overall satisfaction, effectiveness, and side effects in user reviews of particular drugs. Then, the features were given to a logistic regression model to predict the recommended drugs based on positive reviews. This method attained a maximum accuracy of 75.8%. Da'U and Salim [22] presented an efficient neural attention based recommender system. This model is composed of an LSTM encoder, a semi-supervised topic model, a co-attention mechanism, and an analysis layer for predicting user ratings. The specialty of this model is its better capacity for learning product aspects and sentiment lexicons, which improves the efficiency of the recommender system. Kashfia Sailunaz and Reda Alhajj [23] explained the sentiment analysis task on Twitter comments. The main contribution of this work is the detection of emotions and sentiments from people's posts and tweets in social networks using the Naïve Bayes classifier. This model provides a topic-based general recommendation and a user-based customized recommendation.
In [24], Shoieb and Ajit explained an emoticon-based SA using web data. Initially, reviews were collected from the web and then pre-processed. In data pre-processing, POS tagging, stop-word removal, lemmatization, and emotion processing were applied. Then, emotion-based features were derived using SentiWordNet. After the feature extraction process, classification was done using different classifiers, namely SVM, IBK, MLP, and Naive Bayes. Among the classifiers, the Naive Bayes classifier attained the maximum precision of 84.7% for the college dataset and 83.3% for the hospital dataset. In this paper, reviews are labeled as neutral, positive, strong-positive, weak-positive, negative, strong-negative, and weak-negative. Additionally, Zeeshan et al. [25] presented a lexicon and ANN-based SA. They used the movie review dataset, which consists of two labels, positive and negative. The trained network achieved a final accuracy of 91%. Ankit et al. [26] suggested an ensemble sentiment classifier for Twitter sentiment analysis. The objective of the work is to predict the category of a tweet as positive, neutral, or negative. Four datasets, namely the Stanford Sentiment140 corpus, Health Care Reform (HCR), First GOP debate, and the Sentiment Analysis dataset, were collected from Twitter and used for testing. The proposed ensemble technique achieved 75.79%, 70.28%, 76.85%, and 73.33% on these datasets.
Weijiang et al. [27] explained a bidirectional LSTM with multi-channel features and a self-attention mechanism (SAMF-BiLSTM) for sentiment classification. The SAMF-BiLSTM model fully exploits the relationship between sentiment polarity words and target words in a sentence and does not depend on a manually compiled sentiment lexicon. Moreover, they presented the SAMF-BiLSTM-D model, based on SAMF-BiLSTM, for document-level text classification tasks. This method obtains the representation of every sentence in the document through SAMF-BiLSTM training, then uses a BiLSTM to learn the representation of all sentences, and further obtains the sentiment feature information of the whole document. Finally, they evaluated the experimental results on various datasets. The results show that SAMF-BiLSTM and SAMF-BiLSTM-D are better than other advanced methods in classification accuracy in most cases.
Moreover, Alqaryouti et al. [28] presented aspect-based sentiment analysis using government review data. This approach was adopted to address the challenges of language analysis techniques, rules, and lexicons in many sentiment analysis methods and to provide concise results. Indirect features are identified in this approach, which enhances the accuracy of the feature extraction process. Furthermore, the combined classification model surpasses the lexicon-based baseline and other rule combinations by an average accuracy of 5%. On the same dataset, the introduced method also surpasses machine learning schemes using the support vector machine (SVM). Paramita Ray and Amlan Chakrabati [29] proposed a combined rule-based and deep learning approach for aspect-level sentiment analysis. This research used dependency parsing, machine learning techniques, and a seven-layer deep convolutional neural network (CNN) for tagging each aspect in the comments.
Akyol et al. [30] described opinion mining based on a Social Impact Theory based Optimization algorithm and a whale optimization model. A trading strategy based on the quality of sentiment feedback between tweets and news, using a conventional programming optimization strategy, was discussed by Yang et al. [31]. Keshavarz et al. [32] proposed a genetic algorithm based sentiment analysis model; six different datasets were used for the experiments, and the results showed higher accuracy. In [33], the authors proposed a sentiment analysis method based on cross-domain aspects. They introduced a heterogeneous network based representation that merges different features into a single network.
Donatas Meskele and Flavius Frasincar [34] presented a neural network based ontology model as a solution for sentence-level opinion mining. A bidirectional mechanism is used to estimate the importance of the words in a given sentence based on the aspect's sentiment value. Wenxin Liang et al. [35] described a topic embedding model for short texts. A Gibbs sampling process is used in this research to enhance topic coherence. Global and local word embeddings contribute most to the performance of this model.
Yao Hu et al. [36] proposed a safety forecast model in which an LSTM network is combined with the PSO algorithm. The enhanced PSO-GD combined with LSTM is well suited to the analysis of time series data. GD methods are applied iteratively to the LSTM parameters to reduce the cost and improve the accuracy. Peng Wang et al. [37] presented a solution for lane work scheduling at toll stations using toll data. The LSTM and PSO algorithms predict the average queue length of a lane from three input parameters: the traffic volume, the average queue length, and the service time. The proposed PSO-LSTM method increased accuracy by 2% and 3% compared with the SVR model and traditional LSTM, respectively. Finally, the toll station operating cost is used to estimate the work schedule of the toll lanes.
Dataset description
For conducting experiments, four datasets are used namely, Amazon reviews [38], trip advisor [39], demonetization reviews [40], and book reviews [41]. Table 1 describes the datasets.
Description of datasets
The Amazon dataset includes 60000 Amazon customer reviews and star ratings. The 60000 reviews are classified as 25627 positive, 19567 negative, and 14806 neutral reviews. The Trip advisor dataset includes customers' reviews of 1000 hotels, provided by Datafiniti's Business Database. It consists of hotel name, location, review data, title, username, rating, and more. It has 20000 reviews, of which 10000 are positive, 7000 are negative, and 3000 are neutral. The demonetization Twitter dataset, with 12974 tweets, is classified as 2974 positive, 4936 negative, and 5064 neutral tweets. The book review dataset contains 213335 reviews, of which 177268, 38434, and 27567 are positive, negative, and neutral, respectively.
The workflow of the proposed model is depicted in Fig. 1. As shown in the figure, reviews or tweets from the Amazon, Trip advisor, demonetization, and book review datasets are pre-processed through the phases of tokenization, stop-word removal, stemming, and segregation. The pre-processed words are then represented as vectors using the one-hot encoding method. The skip-gram based word2vec model is used to map the words into a lower-dimensional space and make the representation more accurate. Finally, the input tweet or review is classified as having positive or negative polarity using the LSTM network. To improve the performance of the LSTM, optimal weight parameters are chosen using the Adaptive Particle Swarm Optimization (APSO) algorithm.
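The pre-processing phases named above (tokenization, stop-word removal, stemming) can be illustrated with a minimal sketch. The stop-word list and the suffix-stripping rule below are hypothetical stand-ins chosen for brevity, not the resources used in the experiments, and the segregation phase is not shown because the text does not specify it.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "and", "or", "of", "to", "in", "it", "this"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Crude suffix stripping, standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


tokens = preprocess("The battery is amazing and it charges quickly")
print(tokens)  # ['battery', 'amaz', 'charg', 'quickly']
```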

The Flow diagram of proposed model.
Before classifying the sentiment of tweets or reviews, the following steps are applied to datasets.
Words to a vector representation
To convert words into vectors, the one-hot encoding technique is used. Figure 2 illustrates the one-hot encoding strategy. It represents each word as a vector whose size equals that of the vocabulary dictionary. For illustration, as in Fig. 2, if there are 100 words in the dictionary, each word in a sentence is represented by a vector of dimension 100. The skip-gram algorithm is then used to map words into a lower-dimensional space and to capture their meaning in a vector.
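As a small illustration of the one-hot representation described above, the following sketch builds a vocabulary from two toy sentences and encodes a word. The function names and the toy sentences are illustrative assumptions, not part of the proposed system.

```python
def build_vocabulary(sentences):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(word, vocab):
    """Return a vector of vocabulary size with a single 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector


vocab = build_vocabulary(["good product", "bad product"])
print(vocab)                  # {'good': 0, 'product': 1, 'bad': 2}
print(one_hot("bad", vocab))  # [0, 0, 1]
```

With a 100-word dictionary the same function would return a 100-dimensional vector, which is why a lower-dimensional embedding is applied afterwards.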

An example of a one-hot encoding technique.
Word embedding maps the list of words extracted from the text into vectors with lower dimensions, usually from 10 to 1000 dimensions. Analyses based on the frequency of words in text and statements largely neglect the order of sentences, paragraphs, and words. Such an evaluation may limit the understanding of a word's importance in the sentence, because the contextual meaning of the words and their positions are excluded from the analysis. We therefore used the word embedding method to understand users' reviews more precisely and to capture the basic properties and meaning of words.
The skip-gram algorithm [42] is used in this work as it shows better accuracy in extracting semantic relationships between words. The structure of the skip-gram word2vec representation is shown in Fig. 3. The skip-gram word2vec model predicts the words that can appear in the context of the current word. The input layer takes words presented as one-hot encoded vectors. This vector identifies a word by activating just one unit out of U units,
Where the input word is represented as w_input, and v and v' denote the input and output vector representations of the word w. σ() represents the sigmoid function. The path length is denoted as L. The j-th node in the binary tree is represented as m(w, j), and child(m) denotes an arbitrarily chosen child node of m. The predicted words should then minimize the error function, and the error is determined as follows:
Where the size of the context is represented as c and the number of words in the sequence is represented as N. The goal of this function is to update the weight matrix W' with the minimum error rate. If two related words are used in the same context, both words are assigned similar vector values. Various inferences and analyses can then be made on this basis.
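For reference, the symbols defined above match the standard hierarchical-softmax formulation of skip-gram. Under the assumption that this is the formulation intended, the output probability and the error function can be written as below; the bracket notation [[·]], which equals 1 when its argument is true and -1 otherwise, is introduced here for the illustration.

```latex
% Probability of a context word w given the input word (hierarchical softmax)
p(w \mid w_{input}) = \prod_{j=1}^{L(w)-1}
  \sigma\!\Big( [\![\, m(w, j+1) = \mathrm{child}(m(w, j)) \,]\!] \cdot
  {v'_{m(w,j)}}^{\top} v_{w_{input}} \Big)

% Error over a sequence of N words with context size c
E = -\frac{1}{N} \sum_{n=1}^{N} \; \sum_{\substack{-c \le j \le c \\ j \ne 0}}
  \log p(w_{n+j} \mid w_{n})
```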

Structure of the skip-gram model.
The output vectors (features) are given as input to the classifier, which classifies the tweet or review as having a positive or negative context.
The set of features is denoted as follows,
A special type of recurrent neural network (RNN) [13], called an LSTM neural network [44], is used in the proposed model for classification. Conventional neural networks do not consider sequential factors and cannot recall earlier content. The RNN was designed to address this issue. Figure 4 displays the structure of an RNN. The hidden state Ht at time t is obtained from the input Yt and from the previous output Ht-1. It is used to compute the loss of the current layer and to determine Ht+1 of the following layer. However, because the gradient of an RNN tends to vanish, the hidden structure at each sequence index position t was upgraded to avoid the vanishing gradient problem. The resulting special RNN model, called LSTM, can capture long-distance dependency information. The LSTM differs somewhat from the general neural network module of RNNs. In an RNN [44], the repeated NN module A has a simple structure, for example a tanH layer.

Structure of RNN.
Where cH denotes the control parameters of the hidden state. tanH allows the neural network to add or remove information relative to the previous input. wH refers to the weight parameter of the hidden state Ht.
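Using the symbols just defined, the hidden-state update of a plain RNN is commonly written as follows. This is the standard form, stated under the assumptions that cH acts as a bias term and that [Ht-1, Yt] denotes the concatenation of the previous hidden state and the current input.

```latex
H_t = \tanh\!\big( w_H \, [H_{t-1},\, Y_t] + c_H \big)
```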
On the other hand, Fig. 5 shows that the LSTM consists of four neural network layers that interact in a special way. Using a specially designed structure called a “gate”, the LSTM can add information to or delete information from the memory cell state. The gate is where operational data, viz. features of the input, are selected. It consists of a sigmoid neural network layer and a multiplication operation. The sigmoid layer passes the input feature values through the sigmoid function and outputs a value in the range of 0 to 1, which describes how much of each input feature can pass through module A of the network. “0” indicates that no data is permitted to pass, and “1” indicates that all data is permitted to pass. At each sequence index t, the gates of the LSTM act together, and the sigmoid output is assumed to lie in [0, 1]. Equations (5)–(9) describe how an LSTM model works, as shown in Fig. 5.

Structure of LSTM.
The forget gate selects which information to discard or keep from the previous time step's memory:
The input gate chooses the information that should be stored:
Where,
It → input gate
σ → sigmoid function
Ht-1 → output of the previous timestamp
Yt → input to the current LSTM block
WI → weight parameter of the input gate neurons
CI → bias for the input gate
Another candidate value vector is made by a tanH layer and is denoted as follows:
The tanH allows the LSTM to add or remove information relative to the previous input. Vt denotes the candidate for the cell state at timestamp t. The control and weight parameters of the tanH layer are denoted cV and wV, respectively.
The candidate value vector is gated by the input gate, and the forget gate selects whether to keep or discard the previous information, in order to create the final memory.
Where Vt denotes the memory cell state at the current timestamp t and * represents element-wise multiplication of vectors. Finally, the output gate determines which part of the memory is eventually output:
The passed data then flows into the tanH layer for processing. Its output is a value in [-1, 1], and the output gate multiplies this value. Ot denotes the output gate. Wo is the weight parameter of the output gate neurons, σ is the sigmoid function, Ht-1 is the output obtained from the previous timestamp, Yt is the input to the current LSTM block, and co is the bias for the output gate. Finally, the output is evaluated by,
Where * denotes element-wise multiplication of vectors. The predicted output of the current block, denoted Ht, is obtained through the softmax output layer. Vt denotes the memory cell state at the current timestamp t.
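The gate computations described in this section can be summarized in a short sketch of one LSTM time step. It follows the standard LSTM formulation with the symbol names used above (wF, WI, wV, Wo for weights; cF, CI, cV, co for biases), and it assumes that each gate acts on the concatenation [Ht-1, Yt]. The forget-gate bias name cF and the dictionary layout are assumptions made for this illustration, and the softmax output layer is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Return W @ x + b for a list-of-rows matrix W and list vectors x, b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def lstm_step(y_t, h_prev, v_prev, params):
    """One LSTM step: returns the new hidden state H_t and cell state V_t."""
    z = h_prev + y_t  # concatenation [H_{t-1}, Y_t]
    f_t = [sigmoid(a) for a in affine(params["wF"], z, params["cF"])]  # forget gate
    i_t = [sigmoid(a) for a in affine(params["WI"], z, params["CI"])]  # input gate
    cand = [math.tanh(a) for a in affine(params["wV"], z, params["cV"])]  # candidate
    # New memory: keep part of the old state, add part of the candidate.
    v_t = [f * v + i * c for f, v, i, c in zip(f_t, v_prev, i_t, cand)]
    o_t = [sigmoid(a) for a in affine(params["Wo"], z, params["co"])]  # output gate
    h_t = [o * math.tanh(v) for o, v in zip(o_t, v_t)]
    return h_t, v_t


# Toy usage: hidden size 2, input size 1, arbitrary small weights.
params = {name: [[0.1, -0.2, 0.3], [0.0, 0.2, -0.1]] for name in ("wF", "WI", "wV", "Wo")}
params.update({name: [0.0, 0.0] for name in ("cF", "CI", "cV", "co")})
h_t, v_t = lstm_step([1.0], [0.0, 0.0], [0.0, 0.0], params)
print(h_t, v_t)
```

In the proposed model, the entries of these weight matrices are the quantities that APSO selects.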
Finally, the loss function of this model is estimated by calculating the mean squared error (MSE). The calculation is done as follows
Where Tt denotes the desired output and N is the number of data points over which predictions are generated. Loss is the mean squared prediction error.
The mean squared error (MSE) is the average squared difference between the estimated values and the actual values.
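The MSE definition above corresponds to the following minimal sketch; the function name and the sample values are illustrative only.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between desired and predicted outputs."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


loss = mean_squared_error([1.0, 0.0, -1.0], [0.5, 0.0, -0.5])
print(loss)  # (0.25 + 0.0 + 0.25) / 3
```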
If the estimated score is below 0, the tweet or review is considered to express a negative sentiment; if the estimated score is above 0, it is considered to express a positive sentiment; and if the estimated score is 0, it is considered neutral.
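The decision rule just stated is simple enough to express directly; the function name below is an illustrative choice.

```python
def polarity(score):
    """Map an estimated sentiment score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print([polarity(s) for s in (0.7, -0.2, 0.0)])  # ['positive', 'negative', 'neutral']
```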
To enhance the performance of the LSTM, its weight parameters are optimized using the APSO algorithm.
The PSO has four stages: initialization of the particles; evaluation of the particles with the fitness function; updating of particle positions and velocities; and updating of the particles' experience with the general knowledge of the swarm.
To increase population diversity and avoid the premature convergence of PSO, the opposition-based learning (OBL) method [45] is used with PSO. OBL combined with PSO forms the Adaptive Particle Swarm Optimization (APSO).
The APSO optimizer plays a key role in increasing the accuracy of the proposed LSTM neural network model by adjusting attributes such as weights and learning rates to reduce losses.
While evaluating a candidate solution x to a given problem, computing its opposite solution x' at the same time gives another chance of finding a solution closer to the global optimum. The concept of opposition-based learning can be integrated with neural networks as follows.
The initialization is done over the following LSTM weight parameters:
wF → weight parameter of the forget gate
WI → weight parameter of the input gate neurons
Wo → weight parameter of the output gate neurons
WV → weight parameter of the tanH layer
Besides, the opposite solution is represented as follows.
For a given problem, whenever a solution x is found, the opposite solution x' is also computed.
The real number x is chosen from the interval [a, b].
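In the standard definition of opposition-based learning, the opposite of a real number x in [a, b] is x' = a + b - x. The sketch below assumes that this standard definition is the one used here and applies it per dimension to a candidate weight vector; the function names and bounds are illustrative.

```python
def opposite(x, a, b):
    """Opposite of x within the interval [a, b]: x' = a + b - x."""
    return a + b - x


def opposite_vector(xs, bounds):
    """Apply the opposite operation per dimension, given (a, b) bounds for each one."""
    return [opposite(x, a, b) for x, (a, b) in zip(xs, bounds)]


print(opposite(2.0, 0.0, 10.0))                                    # 8.0
print(opposite_vector([0.25, -1.0], [(0.0, 1.0), (-2.0, 2.0)]))    # [0.75, 1.0]
```

Evaluating both x and x' and keeping the fitter of the two is what gives the additional chance of starting closer to the global optimum.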
The optimal solution is the solution with the least fitness value. APSO finds the optimal weights for the environment in less time. Because computational complexity increases with the number of iterations, converging in fewer iterations reduces the computational cost, and it also minimizes the loss.
In each iteration, each particle's velocity has to be adjusted toward its personal best position Pbest and the global best position Gbest. The velocity v of each particle is updated as follows,
W denotes the inertia weight, which is used in the searching process. The inertia weight decreases as the iteration count increases. It is estimated as follows,
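The stages described above can be put together in a compact sketch. It assumes the standard PSO velocity update, v = W·v + c1·r1·(Pbest - x) + c2·r2·(Gbest - x), a linearly decreasing inertia weight, and opposition-based initialization; these forms, the parameter defaults, and the toy sphere fitness (a stand-in for the LSTM loss) are assumptions for illustration rather than the exact settings of the experiments.

```python
import random


def sphere(x):
    """Toy fitness (lower is better); stands in for the LSTM training loss."""
    return sum(v * v for v in x)


def apso(fitness, dim, low, high, n_particles=20, n_iter=50,
         w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    # Opposition-based initialization: random particles plus their opposites,
    # keeping the fitter half of the combined population.
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_particles)]
    pop += [[low + high - v for v in p] for p in pop]
    pop.sort(key=fitness)
    pos = pop[:n_particles]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for t in range(n_iter):
        # Inertia weight decreases linearly as the iteration count grows.
        w = w_max - (w_max - w_min) * t / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], low), high)
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


best, best_fit, history = apso(sphere, dim=3, low=-5.0, high=5.0)
print(best_fit, history[0])
```

To tune an LSTM in this way, each particle would encode the flattened weight parameters and the fitness function would return the training loss for those weights.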

Flowchart of the APSO algorithm.
The proposed sentiment analysis on various datasets is implemented in Python 3.7 on a 64-bit Windows 2007 operating system, on a 2 GHz dual-core PC with 4 GB of main memory. In this simulation, the Amazon, Trip advisor, demonetization, and book review datasets are used. From each dataset, 80% of the data is taken for training the proposed APSO-LSTM classifier and 20% is taken for testing. Using the training dataset, pre-processing and word embedding are done, and the APSO-LSTM, LSTM, ANN, and SVM classifiers are trained. Pre-processing and word embedding are likewise done on the testing dataset. Finally, the embedded word features from the testing dataset are given as input to the trained classifiers, which produce the sentiment score of the input tweets. After the input tweets are classified (positive, negative, or neutral), the performance of each classifier is assessed based on precision, recall, accuracy, and F-score. The following section defines the performance metrics.
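The 80%/20% partition described above can be sketched as follows. The shuffling step, the seed, and the function name are assumptions for illustration, since the text does not state how the samples were assigned to the two parts.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(samples) * test_fraction))
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test


reviews = [f"review {i}" for i in range(10)]
train, test = split_dataset(reviews)
print(len(train), len(test))  # 8 2
```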
Classifiers
For performance analysis, we evaluate the classifiers using four datasets, namely Amazon reviews, Trip advisor, demonetization reviews, and book reviews. The results of classical sentiment classifiers such as ANN (artificial neural network) [25], SVM (support vector machine) [12], traditional LSTM [17], and LSTM-PSO [36, 37] are compared with those of our proposed classifier, APSO-LSTM.
Where;
a → Threshold value
ω → Weight factor
The training sample of the discriminant function is given in Equation (23).
b_i → Lagrange multiplier
a → Threshold value
u_i, v_i → Two vectors
r(u_i, u) → Kernel function
The metrics accuracy, recall, precision, and F-score are defined in terms of TP, TN, FP, and FN, where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives. The performance metrics are defined as follows.
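These four metrics have standard definitions in terms of TP, TN, FP, and FN, which the sketch below computes; the function name and the example counts are illustrative and do not come from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score is the harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


print(classification_metrics(tp=6, tn=10, fp=2, fn=2))
```

Because the F-score is the harmonic mean of precision and recall, it always lies between the two.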
To demonstrate the efficacy of the proposed technique, several comparative analyses were carried out. A good choice of feature extraction technique contributes to better sentiment prediction.
Selection of parameter for skip-gram word embedding
The hyperparameter choices for word embedding are tabulated in Table 2.
Hyperparameters for word embedding
In this paper, the skip-gram algorithm [42] is used for the feature extraction process. This model converts words into vector format. Here, the performance of the N-gram model [43] is compared with that of the proposed skip-gram model.
Table 3 shows the comparative analysis of the models with different word embedding sizes for the Amazon dataset. With a word embedding size of 100, the skip-gram model, which also reduces computational complexity, attains an accuracy of 91.5%, while the existing n-gram model attains 80.6%. At a word embedding size of 500, skip-gram based word representation obtains 95.2% accuracy, while the existing n-gram model attains 86.1%. The graphical representation corresponding to Table 3 is given in Fig. 7(a).
Accuracy obtained at different word embedding size on Amazon Review dataset

Accuracy plot for different feature extraction methods at different Word Embedding dimensions (a) using Amazon dataset (b) using Trip advisor dataset.
Table 4 shows the results obtained at different word embedding sizes using the Trip advisor dataset. The skip-gram model achieved an accuracy of 92.5% at a word embedding size of 100. As tabulated, our skip-gram model shows better performance at a word embedding size of 500. The results show the consistently highest performance of the proposed model on different datasets. The graphical representation corresponding to Table 4 is given in Fig. 7(b).
Accuracy obtained at different word embedding size on Trip advisor dataset
The skip-gram model learns better representations for rare words because the embedding vectors are not averaged when making predictions. Skip-gram based word embedding yields consistently high accuracy in models that use large corpora and a high number of dimensions.
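The point about averaging can be illustrated with a minimal sketch of how skip-gram training examples are formed. Each (center, context) pair is a separate training example, so a rare word contributes its own updates rather than being blended into an averaged context vector, as happens in averaging-based variants such as CBOW. The function name, the example sentence and the window size are assumptions for illustration; the hyperparameters actually used are those of Table 2.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions.

    Hypothetical helper: it only builds the training pairs and does not
    train an embedding.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is never its own context
                pairs.append((center, tokens[j]))
    return pairs


# Illustrative review fragment (not drawn from any of the datasets).
tokens = "the battery life is great".split()
for center, context in skipgram_pairs(tokens, window=1):
    print(center, "->", context)
```

A larger window produces more pairs per word and captures broader context, at the cost of more training examples.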
The parameter settings for the sentiment classifier are listed in Table 5.
Table 5. Parameters used in this work
The goal of the proposed methodology is to predict the reviewer's opinion using the APSO-LSTM algorithm. The proposed model is first evaluated on the Amazon review dataset to assess its performance.
The experimental results obtained on the Amazon dataset are given in Table 6. To enhance the performance of the LSTM, the APSO algorithm selects the optimal weight parameters. As per the analysis in Table 6, the proposed strategy attains a maximum accuracy of 96.8%, precision of 85.28%, recall of 76.08% and F-measure of 80.45%. This is due to the optimal weight selection using APSO. The corresponding graphical representation is given in Fig. 8.
Table 6. Performance analysis of the proposed APSO-LSTM at different iterations on the Amazon review dataset

Fig. 8. Experimental results on the Amazon dataset at different iterations.
To demonstrate the efficacy of the proposed technique, the APSO-LSTM based sentiment analysis is compared with sentiment analysis based on other algorithms, namely PSO-LSTM, LSTM, ANN and SVM. The experimental results are analyzed on four datasets: the Amazon review dataset, the Trip Advisor dataset, the demonetization dataset and the book review dataset. Table 7 shows the comparative analysis.
Table 7. Accuracy, recall, precision and F-measure of different classifiers
As Table 7 shows, on the Amazon product review dataset the proposed strategy attains the maximum accuracy of 96.8%, compared with 94.4% for PSO-LSTM, 92.5% for LSTM, 91.8% for SVM and 81.9% for ANN based sentiment analysis. The APSO-LSTM and PSO-LSTM methods attain better results than the ANN and SVM based methods. Similarly, the proposed method attains the highest precision (85.28%), recall (76.08%) and F-measure (80.4%). The comparison of the proposed method against the existing methods on the Amazon product review dataset is given in Fig. 9. On the Trip Advisor dataset, the proposed method attains the maximum accuracy of 97.8%, precision of 87.28%, recall of 78.08% and F-measure of 82.42%.

Fig. 9. Comparative analysis using the Amazon dataset.
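To make the Amazon comparison easier to read, the following sketch computes the accuracy gain of APSO-LSTM over each baseline in percentage points. The accuracy values are those reported above for the Amazon product review dataset. The variable and function names are illustrative assumptions.

```python
# Accuracies (%) on the Amazon product review dataset, as reported in the text.
amazon_accuracy = {
    "APSO-LSTM": 96.8,
    "PSO-LSTM": 94.4,
    "LSTM": 92.5,
    "SVM": 91.8,
    "ANN": 81.9,
}


def gains_over_baselines(scores, proposed="APSO-LSTM"):
    """Percentage-point gain of the proposed model over every other model."""
    best = scores[proposed]
    return {
        name: round(best - acc, 1)
        for name, acc in scores.items()
        if name != proposed
    }


gains = gains_over_baselines(amazon_accuracy)
# Print from the smallest to the largest gain.
for name, gain in sorted(gains.items(), key=lambda item: item[1]):
    print(f"{name}: +{gain} percentage points")
```

The gains are absolute differences in accuracy, not relative improvements.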
Owing to the skip-gram model and APSO, the proposed method attains better results than the existing methods. The comparison of the proposed method against the existing methods on the Trip Advisor dataset is given in Fig. 10. On the demonetization dataset, the proposed strategy attains the maximum accuracy of 93.2%, compared with 91.2% for PSO-LSTM, 79% for ANN and 88.9% for SVM based sentiment analysis. Similarly, the proposed model attains a higher precision of 82.3%, compared with 80% for PSO-LSTM, 76.55% for ANN and 78% for SVM based sentiment analysis. The comparative analysis on the demonetization review dataset is given in Fig. 11. On the book review dataset, the proposed method attains the maximum accuracy compared with the existing methods. The graphical representation for the book review dataset is given in Fig. 12.

Fig. 10. Comparative analysis using the Trip Advisor dataset.

Fig. 11. Comparative analysis using the demonetization dataset.

Fig. 12. Comparative analysis using the book review dataset.
PSO-LSTM shows superior results over the classical algorithms LSTM, SVM and ANN. Its LSTM component provides the long-term memory needed to model text sequences and mitigates the vanishing and exploding gradient problems, while PSO tunes the network weights. APSO helps the LSTM select the best weights in fewer iterations, and the computational cost falls as the number of iterations decreases. Thus the ability of APSO-LSTM to select optimal weights for the neural network, combined with good hyperparameter choices, leads to improved accuracy compared with the traditional LSTM.
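The role of opposition-based learning (OBL) in reducing the number of iterations can be illustrated with a minimal, self-contained sketch. It is not the implementation used in the experiments. A toy sphere function stands in for the LSTM loss, OBL is applied only at initialization, and the inertia weight, acceleration coefficients, bounds and all function names are assumed values chosen for illustration.

```python
import random


def sphere(x):
    """Toy objective standing in for the LSTM validation loss (assumption)."""
    return sum(v * v for v in x)


def opposite(x, lo, hi):
    """Opposition-based learning: reflect each coordinate within [lo, hi]."""
    return [lo + hi - v for v in x]


def apso_minimize(f, dim, lo, hi, n_particles=10, n_iters=50, seed=0):
    """PSO with opposition-based initialization; returns (best_position, best_value)."""
    rng = random.Random(seed)
    # OBL step: sample a swarm, add its opposite points, keep the fitter half.
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    candidates = swarm + [opposite(p, lo, hi) for p in swarm]
    candidates.sort(key=f)
    pos = [list(p) for p in candidates[:n_particles]]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    gbest = list(min(pbest, key=f))
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration coefficients
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = list(pos[i])
                if f(pbest[i]) < f(gbest):
                    gbest = list(pbest[i])
    return gbest, f(gbest)


best, loss = apso_minimize(sphere, dim=3, lo=-5.0, hi=5.0)
print(best, loss)
```

In an LSTM setting, each particle would encode a candidate weight vector and the objective would be the training or validation loss. Starting from the fitter half of the points and their opposites gives the swarm a better initial best position, which is the intuition behind reaching good weights in fewer iterations.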
In this work, an APSO-LSTM based sentiment analysis model is proposed. Skip-gram based feature extraction is used for word embedding. The skip-gram word-to-vector representation requires less memory and yields consistently higher accuracy than other word-to-vector representations. An LSTM encoder is used to capture global sentiment information. The aim of the work is to improve accuracy in sentiment analysis tasks, so the model combines APSO with LSTM to enhance performance. Using APSO to select the weight parameters of the LSTM network increases accuracy and decreases loss. APSO-LSTM prevents over-fitting and significantly reduces the training time of the large neural network. The mathematical formulations of the LSTM, the skip-gram model and APSO have been explained. Four datasets were used for the experimental analysis, and the proposed method was evaluated using accuracy, recall, precision and F-measure. The proposed methodology attained a maximum accuracy of 96.8% on the Amazon dataset, 97.8% on the Trip Advisor dataset, 93.2% on the demonetization dataset and 95.2% on the book review dataset. The empirical results demonstrate the superior performance of the proposed methodology. As future work, deep learning techniques will be incorporated to improve the results on different natural language processing tasks such as sentiment analysis, named entity recognition and text summarization.
